Basic Connections & RSQLite

A Quick Run-Down

BB

Although there is fairly extensive documentation provided by R Core itself R Connections, I can find it a bit much to process at one time. Accordingly, I wanted to go over (some) of options for basic connections with base-R so I'd better understand what is available. This isn't all of them by any means both due to platform specific issues (I'm on Windows) and because also covering all the options for binary connections is somewhat redundant. I also mostly stay with examples provided by the R documentation.

Standard File Connections

There are a number of different modes for connections. Write, read, (some combination of both), or append are what you will mostly use. Again, note the separate options for binary.

  • "r" or "rt" - Open for reading in text mode.
  • "w" or "wt" - Open for writing in text mode.
  • "a" or "at" -Open for appending in text mode.
  • "rb" - Open for reading in binary mode.
  • "wb"- Open for writing in binary mode.
  • "ab" - Open for appending in binary mode.
  • "r+", "r+b" - Open for reading and writing.
  • "w+", "w+b" - Open for reading and writing, truncating file initially.
  • "a+", "a+b" - Open for reading and appending.

Writing and reading text examples from R documentation with a few additional notes sprinkled in.

# Writing Text
  zz <- file("ex.data", "w")  # open an output file connection
  cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
  close(zz)

  zz <- file("ex.data", "r") # read-only mode
  cat("One more line\n", file = zz) # can't append
  
## Error in cat("One more line\n", file = zz): cannot write to this connection
  
zz <- file("ex.data", "a")
  cat("One more line\n", file = zz)
  close(zz)
  rm(zz)
  readLines("ex.data")
  
## [1] "TITLE extra line" "2 3 5 7"          ""
  ## [4] "11 13 17"         "One more line"
  
unlink("ex.data") # deletes file, nothing left
  

It's also possible to open a file for multiple options, as noted above...

## An example of a file open for reading and writing
  Tfile <- file("test1", "w+")
  c(isOpen(Tfile, "r"), isOpen(Tfile, "w")) # both TRUE
  
## [1] TRUE TRUE
  
cat("abc\ndef\n", file = Tfile)
  readLines(Tfile)
  
## [1] "abc" "def"
  
seek(Tfile, 0, rw = "r") # reset to beginning
  
## [1] 10
  
readLines(Tfile)
  
## [1] "abc" "def"
  
cat("ghi\n", file = Tfile)
  readLines(Tfile)
  
## [1] "ghi"
  
Tfile
  
## A connection with
  ## description "test1"
  ## class       "file"
  ## mode        "w+"
  ## text        "text"
  ## opened      "opened"
  ## can read    "yes"
  ## can write   "yes"
  
close(Tfile)
  Tfile
  
## A connection, specifically, 'file', but invalid.
  
unlink("test1")
  rm(Tfile)

  ## We can do the same thing with an anonymous file.
  Tfile <- file()
  cat("abc\ndef\n", file = Tfile)
  readLines(Tfile)
  
## [1] "abc" "def"
  
close(Tfile)
  

Text Connections

Another option is text connections... "Text connections are another source of input. They allow R character vectors to be read as if the lines were being read from a text file. A text connection is created and opened by a call to textConnection, which copies the current contents of the character vector to an internal buffer at the time of creation. Text connections can also be used to capture R output to a character vector...

zz <- textConnection(LETTERS)
  readLines(zz, 2)
  
## [1] "A" "B"
  
scan(zz, "", 4) # keeps the place...
  
## [1] "C" "D" "E" "F"
  
pushBack(c("aa", "bb"), zz) # literally throws it back..
  scan(zz, "", 4)
  
## [1] "aa" "bb" "G"  "H"
  
close(zz)
  rm(zz)

  zz <- textConnection("foo", "w")
  writeLines(c("testit1", "testit2"), zz)
  cat("testit3 ", file = zz)
  isIncomplete(zz)
  
## [1] TRUE
  
cat("testit4\n", file = zz)
  isIncomplete(zz)
  
## [1] FALSE
  
close(zz)
  foo
  
## [1] "testit1"         "testit2"         "testit3 testit4"
  
rm(foo,zz)


  ## test fixed-length strings
  zz <- file("testchar", "wb")
  x <- c("a", "this will be truncated", "abc")
  nc <- c(3, 10, 3)
  writeChar(x, zz, nc, eos = NULL)
  
## Warning in writeChar(x, zz, nc, eos = NULL): writeChar: more characters
  ## requested than are in the string - will zero-pad
  
writeChar(x, zz, eos = "\r\n")
  close(zz)

  zz <- file("testchar", "rb")
  readChar(zz, nc)
  
## [1] "a"          "this will " "abc"
  
readChar(zz, nchar(x)+3) # need to read the terminator explicitly
  
## [1] "a\r\n"                      "this will be truncated\r\n"
  ## [3] "abc\r\n"
  
close(zz)
  unlink("testchar")
  

Compression

As the documentation "Choosing the type of compression involves tradeoffs: gzip, bzip2 and xz are successively less widely supported, need more resources for both compression and decompression, and achieve more compression (although individual files may buck the general trend). Typical experience is that bzip2 compression is 15% better on text files than gzip compression, and xz with maximal compression 30% better." notes, the .gz and .bz2 extensions are for compressed files. The space savings are trivial with small files, but obviously increase for larger files.

zz <- gzfile("ex.gz", "w")  # compressed file
  cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
  close(zz)
  file.size("ex.gz")
  
## [1] 55
  
readLines(zz <- gzfile("ex.gz"))
  
## [1] "TITLE extra line" "2 3 5 7"          ""
  ## [4] "11 13 17"
  
close(zz)
  unlink("ex.gz")
  zz # an invalid connection
  
## A connection, specifically, 'gzfile', but invalid.
  
rm(zz)


  zz <- bzfile("ex.bz2", "w")  # bzip2-ed file
  cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
  close(zz)
  file.size("ex.bz2")
  
## [1] 77
  
zz # print() method: invalid connection
  
## A connection, specifically, 'bzfile', but invalid.
  
print(readLines(zz <- bzfile("ex.bz2")))
  
## [1] "TITLE extra line" "2 3 5 7"          ""
  ## [4] "11 13 17"
  
close(zz)
  unlink("ex.bz2")
  rm(zz)
  

The difference is actually quite sizable with only a 30mb file "For write-mode connections, compress specifies how hard the compressor works to minimize the file size, and higher values need more CPU time and more working memory (up to ca 800Mb for xzfile(compress = 9)). For xzfile negative values of compress correspond to adding the xz argument -e: this takes more time (double?) to compress but may achieve (slightly) better compression. The default (6) has good compression and modest (100Mb memory) usage: but if you are using xz compression you are probably looking for high compression." , but obviously serious compression takes both time and computing horsepower.

x <- outer(1:2000,1:2000)
  object.size(x)
  
## 32000200 bytes
  
zz <- file("bigfile.data", "w")
  cat(x,file=zz)
  close(zz)
  file.size("bigfile.data")
  
## [1] 29072548
  
unlink("bigfile.data")
  rm(zz)

  zz <- gzfile("bigfile.gz", "w")
  cat(x,file=zz)
  close(zz)
  file.size("bigfile.gz")
  
## [1] 13187416
  
unlink("bigfile.gz")
  rm(zz)

  zz <- bzfile("bigfile.bz2", "w")
  cat(x,file=zz)
  close(zz)
  file.size("bigfile.bz2")
  
## [1] 11821782
  
unlink("bigfile.bz2")
  rm(zz)
  

URLs

It is also possible to uncompress data directly from a url, assuming it is in correct format, as the example provided by the R documentation illustrates.

## Uncompress a data file from a URL
  z <- gzcon(url("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz"))
  # read.table can only read from a text-mode connection.
  raw <- textConnection(readLines(z))
  close(z)
  dat <- read.table(raw)
  close(raw)
  dat[1:4, ]
  
##     V1 V2 V3 V4 V5 V6 V7
  ## 1   37 86  1  0  0  2  0
  ## 2   61 77  1  0  0  4  0
  ## 3 1084 75  1  0  0  3  1
  ## 4 1092 77  1  0  1  2  1
  

Sockets

One interesting, perhaps trivial, example is using socket connections to communicate between two R processes (two separate R sessions) on the same computer. Note that this prompt a firewall warning or fail if you have certain ports locked down. Also, you will want to avoid using a port that may interfere with other processes (e.g., 80).

## Two R processes communicating via non-blocking sockets
  # R process 1
  con1 <- socketConnection(port = 6011, server = TRUE)
  writeLines(LETTERS, con1)
  close(con1)

  # R process 2
  con2 <- socketConnection(Sys.info()["nodename"], port = 6011)
  # as non-blocking, may need to loop for input
  readLines(con2)
  while(isIncomplete(con2)) {
    Sys.sleep(1)
    z <- readLines(con2)
    if(length(z)) print(z)
  }
  close(con2)
  

RSQLite

There are also a number of other CRAN packages that facilitate other types of connections as well. Here, I'll cover the RSQLite package (which really is mainly DBI) to connect to a SQLite database. Note that I want to cover web-based data connections separately as there is a fair amount to cover.

RSQLite is a very simple interface with a lot of functionality. As noted above, in order to use this package it is necessary to load the DBI package first as this is what does most of the heavy lifting.

library(DBI)
  library(RSQLite)

  datasetsDb()
  
## <SQLiteConnection>
  ##   Path: C:\Users\Ben\Documents\R\win-library\3.4\RSQLite\db\datasets.sqlite
  ##   Extensions: TRUE
  
db <- datasetsDb()
  class(db)
  
## [1] "SQLiteConnection"
  ## attr(,"package")
  ## [1] "RSQLite"
  
isS4(db)
  
## [1] TRUE
  

Krill Mueller, Hadley, and Co. were very considerate and included a bunch of data sets to play around with.

dbListTables(db)
  
##  [1] "BOD"              "CO2"              "ChickWeight"
  ##  [4] "DNase"            "Formaldehyde"     "Indometh"
  ##  [7] "InsectSprays"     "LifeCycleSavings" "Loblolly"
  ## [10] "Orange"           "OrchardSprays"    "PlantGrowth"
  ## [13] "Puromycin"        "Theoph"           "ToothGrowth"
  ## [16] "USArrests"        "USJudgeRatings"   "airquality"
  ## [19] "anscombe"         "attenu"           "attitude"
  ## [22] "cars"             "chickwts"         "esoph"
  ## [25] "faithful"         "freeny"           "infert"
  ## [28] "iris"             "longley"          "morley"
  ## [31] "mtcars"           "npk"              "pressure"
  ## [34] "quakes"           "randu"            "rock"
  ## [37] "sleep"            "stackloss"        "swiss"
  ## [40] "trees"            "warpbreaks"       "women"
  
head(dbReadTable(db,'mtcars'))
  
##           row_names  mpg cyl disp  hp drat    wt  qsec vs am gear carb
  ## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
  ## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
  ## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
  ## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
  ## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
  ## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
  
con <- dbConnect(SQLite())
  dbWriteTable(con,'mtcars',mtcars)
  head(dbReadTable(con,'mtcars'))
  
##    mpg cyl disp  hp drat    wt  qsec vs am gear carb
  ## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
  ## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
  ## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
  ## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
  ## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
  ## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
  

In order to keep rownames, it is necessary to specify it as an argument. Also, take notice that either 'append' or 'overwrite' must be changed to 'TRUE'.

dbWriteTable(con,'mtcars',mtcars,row.names = TRUE, overwrite = TRUE)
  head(dbReadTable(con,'mtcars'))
  
##           row_names  mpg cyl disp  hp drat    wt  qsec vs am gear carb
  ## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
  ## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
  ## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
  ## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
  ## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
  ## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
  

You can both print the selected data on your console, or more usefully, save it to an R object. It will come out as a typical R data frame (but with stringsAsFactors = FALSE).

bad_mileage <- dbGetQuery(con,"SELECT * FROM mtcars WHERE mpg < 15;")
  class(bad_mileage)
  
## [1] "data.frame"
  
rm(bad_mileage)
  

If you want to make any changes to the table then use dbExecute. Note that SQLite does not support dropping columns. You will need to create a new table if you want to do that or use another database (e.g., MySQL, POSTGRES, etc.). You, can, however, insert new values into it.

dbExecute(con, "INSERT INTO mtcars (row_names, mpg) VALUES ('BIG ASS TRUCK',5);")
  
## [1] 1
  
dbGetQuery(con,"SELECT * FROM mtcars WHERE mpg = 5;")
  
##       row_names mpg cyl disp hp drat wt qsec vs am gear carb
  ## 1 BIG ASS TRUCK   5  NA   NA NA   NA NA   NA NA NA   NA   NA
  
# Now delete the row.

  dbExecute(con, "DELETE FROM mtcars WHERE mpg = 5;")
  
## [1] 1
  
dbGetQuery(con,"SELECT * FROM mtcars WHERE mpg = 5;")
  
##  [1] row_names mpg       cyl       disp      hp        drat      wt
  ##  [8] qsec      vs        am        gear      carb
  ## <0 rows> (or 0-length row.names)
  
# And finally, to disconnect..
  dbDisconnect(con)

  # You can check the status of the connection with,

  dbisValid(con)
  
## Error in dbisValid(con): could not find function "dbisValid"
  
# Opps, forgot about 'db', better check that..

  dbIsValid(db)
  
## [1] TRUE
  
dbDisconnect(db)
  rm(con,db)
  

Lastly, what if you want to read/write to a file?

# There may be another way to persist the .db file...

  con = dbConnect(SQLite())
  dbWriteTable(con,'mtcars',mtcars,row.names = TRUE, overwrite = TRUE)

  sqliteCopyDatabase(con,'test.db')
  dbDisconnect(con)
  rm(con)

  con = dbConnect(SQLite(), dbname="./test.db")

  dbListTables(con)
  
## [1] "mtcars"
  

And that is really all there is to it. SQLite is a very simple, but powerful database. Vis a vis R, I think it is most useful when dealing with large data sets that won't easily fit in memory where the user can query a subset of the data at a time. Additionally, it could also serve as a backend for a Shiny app Persist Data in Shiny if again, the data set is large. I should note that the DBI package can connect with other DBs like MySQL and that there are even other packages that allow connections to NoSQL databases as well NoSQL for R.

Comments



Name:


E-mail: