C for R Users

Get "Closer to the Machine..."

BB

For anyone who wants to either a) read the R source code, or b) extend R via C, it's going to be tough sledding trying to decipher the R C API without at least a basic understanding of C. Accordingly, I want to write/cite/link a little about C as a standalone programming language. To be clear, this is also for my own benefit as I am very inexperienced with C. Obviously, this will be very basic, as it would be pretty hard to cover an entire programming language in a single webpage, let alone a language that I am fairly ignorant of. Still, learning C and "getting closer to the machine" has been highly rewarding, as C is at the heart of most modern operating systems and of the interpreters for languages like R and Python (i.e., CPython). True, the syntax is ugly, but I like the fact that in a few concise statements you can perform a number of very powerful actions. However, the same concise syntax can also be confusing as hell, especially in terms of the '*' operator. Nonetheless, I think the good outweighs the bad...

Basics

C syntax is very similar to R's, considering that R, like many other languages, purposefully adopted the C/ALGOL style. Many of the actual language features and function calls in C are also similar to R. In fact, it is probably most useful to highlight differences as opposed to similarities. As K&R (i.e., Kernighan & Ritchie) note, the hardest part of C is creating a 'hello world' program...

This is the big hurdle; to leap over it you have to be able to create the program text somewhere, compile it, load it, run it, and find out where your output went. With these mechanical details mastered, everything else is comparatively easy.

The C Programming Language, Kernighan & Ritchie

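// file = helloworld.c

#include <stdio.h>

// K&R write plain main(); modern compilers want the int return type spelled out
int main(void)
{
    printf("hello, world\n");
    return 0; // exit status handed back to the operating system
}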
  

In C, main() is the entry point: execution starts there, and it is basically a wrapper for the other operations/functions that one wants to use when the file is run. printf() prints formatted text, much like cat(sprintf(...)) in R. The line at the top, #include <stdio.h>, pulls in the header for the C standard library's input/output functions. A header is just a file of declarations for functions that either come standard with C, are provided by another library, or are custom-built by a user. Headers in angle brackets '<>' are looked up on the compiler's 'search path', whereas those in quotes (e.g., '#include "myheader.h"') are looked up relative to the including '.c' file first. In reality, there is little difference between a ".h" file and a ".c" file as far as the compiler is concerned, since the preprocessor simply pastes the header's text in place of the #include line. Note also that statements are terminated with a semicolon ';' and that printf() does not add a newline for you, which is why the string ends with '\n'.

When you want to run the program, save the file with a '.c' extension and then compile it. The mechanics of this vary depending on the compiler. Most commonly, the compiler will be gcc, which comes with Rtools if you are on Windows. Those who develop exclusively for Windows often use Visual Studio, although this is most often used for C++/C# files. Because R makes use of gcc, I will focus on that. There are a lot of different flags for gcc, but on Windows you would do something like the following in the command prompt, or via shell() or system() in R.

# Make sure GCC is working
  shell('gcc --version', intern = TRUE)
  
## [1] "gcc (i686-posix-dwarf, Built by MinGW-W64 project) 4.9.3"
  ## [2] "Copyright (C) 2015 Free Software Foundation, Inc."
  ## [3] "This is free software; see the source for copying conditions.  There is NO"
  ## [4] "warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
  ## [5] ""
  
shell('gcc helloworld.c -o helloworld')
  

Assuming that it compiles correctly, there will now be a helloworld.exe on Windows (or just helloworld on Unix) in the same directory. Type the name 'helloworld' in the prompt and you will get:

hw <- shell('helloworld', intern = TRUE)
  hw
  
## [1] "hello, world"
  

It is important to note that the main() function in 'helloworld' takes no arguments and hands back nothing more than an exit status to the operating system. Functions that take arguments or return a value must declare the types of those inputs and of the return value. This can be illustrated with another example:

// divider.c

#include <stdio.h>
#include <stdlib.h>

// '//' = single-line comment, '/* ... */' = multi-line comment

// argc = number of arguments, argv[0] = name of program, argv[i] = user args

int main(int argc, char *argv[])
{
    double z; // declaring type double

    // Guard against missing arguments: argv[1] and argv[2] must exist
    if (argc < 3) {
        printf("usage: %s x y\n", argv[0]);
        return 1;
    }

    double x = atof(argv[1]); // atof() from <stdlib.h> converts a string to a double
    double y = atof(argv[2]);

    z = x / y;

    // %f = double, %d = integer, %s = string

    printf("The answer is: %f, thanks for entering %d arguments to the %s program\n", z, argc, argv[0]);

    return 0;
}
  

Again, compile, then run. I imaginatively named the file 'divider.c'.

First, compile...

shell('gcc divider.c -o divider')
  

Then run...

divider <- shell('divider 20 5', intern = TRUE)
  divider
  
## [1] "The answer is: 4.000000, thanks for entering 3 arguments to the divider program"
  

The preceding program illustrates a number of different aspects of C, including its statically typed nature (i.e., the need to declare variable types ahead of time), entering arguments via the command line (argc/argv[]), type conversions, and formatted printing, all of which are explained in the code comments. One aspect that does require a little more detail is types and their respective storage allocations. (This cheatsheet provides a helpful type guide, among other info, although note that there are differences on different machines...)

Arrays, Strings, and Pointers

Arrays, pointers, and strings are all closely related. An array is just a consecutive group of locations in memory. Arrays are declared by specifying type and length (e.g., int sample_array[5];). As you are probably aware, arrays are zero-indexed (unlike R's vectors, which start at 1). If you declare a local array without specifying values, it will contain whatever 'garbage' was in the memory location from before. You can specify values using curly brackets (e.g., int sample_array[] = {0,1,2,3,4};).

// array.c

#include <stdio.h>

int main(void)
{
    int i;
    int sample_array[5];  // uninitialized: 5 'garbage' ints (read here only for illustration)

    for (i = 0; i < 5; i++)  // typical C loop syntax (while loops are also common...)
    {
        printf("%d\n", sample_array[i]);
    }

    int sample_array2[] = {0,1,2,3,4};

    printf("\n"); // break up array output

    for (i = 0; i < 5; i++)
    {
        printf("%d\n", sample_array2[i]);
    }

    return 0;
}

Note that I skip the compilation step from here on out...

C_array <- shell('array', intern = TRUE)
  
C_array
  
##  [1] "6356864" "4200443" "4200352" "0"       "6"       ""        "0"
  ##  [8] "1"       "2"       "3"       "4"
  

As you might have guessed, it is possible to have 2-dimensional and even larger arrays, although the complexity quickly becomes unmanageable as the number of dimensions grows. A sample two-dimensional array would be declared:

type array[i][j]; or, in practice, int array[2][3]; More on this shortly...

Working with strings in C is somewhat tedious (and even more complicated with the R C API!). In C, a string is just an array of type char terminated by a null character, '\0'. As the GNU manual notes, strings can be initialized either like char my_string[6] = "hello"; or like char my_string[6] = {'h','e','l','l','o','\0'};, but in the latter case the null must be appended by hand (note the single quotes, which denote individual characters).

// strings.c

  #include <stdio.h>

  int main(void) {

    int i;
    char my_string[6] = {'h','e','l','l','o','\0'};

    // Print each char on its own line, including the terminating '\0'
    for(i=0; i<6; i++)
      {
          printf("%c\n",my_string[i]);
      }

    return 0;
  }
  

strings <- shell('strings', intern = TRUE)
  
strings
  
## [1] "h" "e" "l" "l" "o" ""
  

A lot of energy has been devoted to explaining pointers, including entire books (e.g., Understanding and Using Pointers...), so this explanation will necessarily be incomplete. Pointers 'point' to an address in memory. The '*' symbol in a variable declaration declares a pointer, as in int *new_pointer;. This can be read as a pointer that contains the address of an integer. Outside of a declaration, '*' dereferences a pointer, i.e., it reads or writes the value stored at that address. The '&' symbol is the address-of operator: it yields the address of a variable in memory. The following program attempts to illustrate this:

// pointers.c

  #include <stdio.h>

  int main () {

       int  x = 5;
       int  *ptr = &x;

       printf("Value of x = %d\n", x  );

       *ptr = 10;

       printf("Value of x = %d\n\n", x  );

       char array[10];

       printf("The address of array = %p\n", &array  );
       printf("The address of array[0] = %p\n", &array[0] );
       printf("The address of array[1] %p\n\n", &array[1] );

       array[1] = 20;

       printf("The value of array[1] = %d\n", array[1] );
       printf("The value of array[1] = %d\n", *(array + 1) );

       return 0;
  }
  
pointers <- shell('pointers', intern = TRUE)
  pointers
  
## [1] "Value of x = 5"
  ## [2] "Value of x = 10"
  ## [3] ""
  ## [4] "The address of array = 0060FE9E"
  ## [5] "The address of array[0] = 0060FE9E"
  ## [6] "The address of array[1] 0060FE9F"
  ## [7] ""
  ## [8] "The value of array[1] = 20"
  ## [9] "The value of array[1] = 20"
  

A pointer to a pointer is just a pointer that contains the address of another pointer. As one might expect, the notation is '**' instead of just '*' and this can be extended indefinitely. How is this useful?

If you want to have a list of characters (a word), you can use char *word

If you want a list of words (a sentence), you can use char **sentence

If you want a list of sentences (a monologue), you can use char ***monologue

If you want a list of monologues (a biography), you can use char ****biography

If you want a list of biographies (a bio-library), you can use char *****biolibrary

If you want a list of bio-libraries (a ??lol), you can use char ******lol

Dynamic Memory Allocation

When one declares a variable, or an array, in C, the compiler explicitly accounts for the necessary memory. Many times, however, a user may not know in advance exactly how much memory his or her program will require, which is why the C standard library (i.e., '<stdlib.h>') provides the malloc, calloc, and realloc functions. The former two allocate a requested number of bytes, usually computed with sizeof (calloc also initializes the memory to zero), whereas the latter, realloc, resizes an allocation after the fact. Finally, free releases the memory once it is no longer needed so the program doesn't leak.

The following code, which multiplies a pair of matrices into a dynamically allocated result, attempts to incorporate a number of topics discussed above. (Note that memset() may be faster, and that it is possible to avoid the per-row calloc() calls by using pointers...)

// matrix_mult.c

#include <stdio.h>
#include <stdlib.h>

// xr = x-rows, xc = x-cols, etc.

void mult(int xr, int xc, int x[xr][xc], int yr, int yc, int y[yr][yc]) {

    int i, j, k;

    int **res; // res declared as a pointer to a pointer

    // Allocate memory for an array of int pointers (one per row of the result)
    res = (int **)calloc(xr, sizeof(int *));

    // Allocate zeroed memory for each row (one int per column of the result)
    for (i = 0; i < xr; i++) {
        *(res + i) = (int *)calloc(yc, sizeof(int));
    }

    for (i = 0; i < xr; i++) {          // for rows in x...
        for (j = 0; j < yc; j++) {      //   for cols in y...
            for (k = 0; k < xc; k++) {  //     for cols in x...
                res[i][j] += x[i][k] * y[k][j]; // Increment and compute dot-product
            }
            printf("%d\n", res[i][j]);
        }
    }

    for (i = 0; i < xr; i++) {
        free(*(res + i)); // Free individual rows
    }
    free(res); // Free array of pointers
}

int main() {
    int x[2][3] = {{1,2,3},{4,5,6}};
    int y[3][2] = {{7,8},{9,10},{11,12}};
    int xr = 2, xc = 3, yr = 3, yc = 2;
    mult(xr, xc, x, yr, yc, y);
    return 0;
}

matrix_mult <- shell('matrix_mult', intern = TRUE)
  matrix_mult
  
## [1] "58"  "64"  "139" "154"
  

Finally, although these topics are a bit beyond the scope of this overview, it is important to at least briefly mention several slightly more advanced topics as these play a role in the R source code. First, in C, a structure, or struct, is just a way to group multiple related variables together for convenience and organizational purposes. In many ways, structs are an early version of modern object oriented classes in C++, Java etc. or the equivalent of the S4 class in R. As K&R note,

The only legal operations on a structure are copying it or assigning to it as a unit, taking its address with &, and accessing its members. Copy and assignment include passing arguments to functions and returning values from functions as well.

The C Programming Language, Kernighan & Ritchie

The following is a simple (read: boring) example of some struct operations. If you are interested in a more real-world example, then by all means go check out the Rinternals.h file, where you can see the R SEXPREC defined along with other relevant types.

// struct.c

  #include<stdio.h>

  struct foo

      {
            int x;
            int y;
            int z;
      };

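  // Receives a copy of the struct (pass by value), so the caller's instance is unchanged
  void copy_foo(struct foo foo)

      {
            foo.x = 5;
            foo.y = 10;
            foo.z = 15;
      }

  // Receives the struct's address, so the caller's instance is modified in place
  void alter_foo(struct foo* foo_ptr)

      {
            foo_ptr -> x = 5;
            foo_ptr -> y = 10;
            foo_ptr -> z = 15;
      }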

  int main()

      {
           struct foo instance1 = {0,1,2};
           copy_foo(instance1);

           printf("The sum of %d + %d is %d\n",instance1.x,instance1.y,instance1.z);

           struct foo instance2 = {0,1,2};
           alter_foo(&instance2);
           printf("The sum of %d + %d is %d\n",instance2.x,instance2.y,instance2.z);

           return(0);
      }
  
struct <- shell('struct', intern = TRUE)
  struct
  
## [1] "The sum of 0 + 1 is 2"   "The sum of 5 + 10 is 15"
  

Note above both the dot '.' notation for accessing members of a struct and the pointer arrow '->' notation for accessing members through a pointer. The dot notation is ubiquitous in modern object-oriented languages (e.g., JavaScript), while the arrow survives in languages such as C++ and PHP.

A union is basically a variable that can hold one of several different types. Declared much like structures, a union holds only one member at a time, and the compiler allocates enough space for whichever member is largest. Again, from K&R,

In effect, a union is a structure in which all members have offset zero from the base, the structure is big enough to hold the "widest" member, and the alignment is appropriate for all of the types in the union. The same operations are permitted on unions as on structures: assignment to or copying as a unit, taking the address, and accessing a member.

The C Programming Language, Kernighan & Ritchie

Finally, here are a number of other C-related links:
