Saturday, December 22, 2012

R FILE I/O

REVISED: Saturday, March 2, 2013




R I/O

By default, starting R begins an interactive session that takes input from the keyboard and sends output to the screen.

I.  INPUT FUNCTIONS WHICH READ DATA INTO R

A. read.table( )

read.table( ) and read.csv( ) read tabular data into a data frame.

B. readLines( )

readLines( ) reads the lines of a text file into a character vector, one element per line.
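As a minimal sketch of reading lines, the example below writes three lines to a scratch file and reads them back. The temporary file and the variable names are assumptions made for illustration only.

```r
# Create a scratch file (hypothetical; any text file would do).
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), tmp)

# Read every line: the result is a character vector, one element per line.
all_lines <- readLines(tmp)

# Read only the first two lines by setting n.
first_two <- readLines(tmp, n = 2)

print(all_lines)
print(first_two)

# Remove the scratch file.
unlink(tmp)
```

Setting n is handy for peeking at the top of a large file before deciding how to read the rest.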

C. source( )

source( ) reads in R code files (inverse of dump( )). If the file name does not include a path, the file is taken from the "current working directory" (CWD).

D. dget( )

dget( ) reads in an R object that was written out as R code (inverse of dput( )).
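The two text-based pairs, dput( ) with dget( ) and dump( ) with source( ), can be sketched as a round trip. The object names, the temporary files, and the separate environment used for source( ) are assumptions chosen to keep the example self-contained.

```r
# A small data frame to write out and read back (illustrative data).
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# dput( ) writes one object as R code; dget( ) reads it back.
dput_file <- tempfile(fileext = ".R")
dput(df, file = dput_file)
df_copy <- dget(dput_file)

# dump( ) writes one or more named objects; source( ) recreates them.
x <- c(1.5, 2.5)
dump_file <- tempfile(fileext = ".R")
dump(c("x", "df"), file = dump_file)

# Source into a fresh environment so existing objects are not overwritten.
env <- new.env()
source(dump_file, local = env)
print(ls(env))

unlink(c(dput_file, dump_file))
```

Note the difference: dget( ) returns the object and you choose its name, while source( ) recreates the objects under the names they were dumped with.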

E. load( )

load( ) is for reading in saved workspaces.

F. unserialize( )

unserialize( ) is for reading in single R objects in binary form.
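The binary pairs can be sketched the same way: save( ) with load( ), and serialize( ) with unserialize( ). The object, the temporary file, and the fresh environment are assumptions for illustration.

```r
scores <- c(a = 90, b = 85)

# save( ) writes named objects to a binary file; load( ) restores them.
rda_file <- tempfile(fileext = ".RData")
save(scores, file = rda_file)

# Load into a fresh environment; load( ) returns the names it restored.
env <- new.env()
loaded_names <- load(rda_file, envir = env)
print(loaded_names)

# serialize( ) with connection = NULL returns the object as a raw vector.
raw_bytes <- serialize(scores, connection = NULL)
restored <- unserialize(raw_bytes)
print(restored)

unlink(rda_file)
```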

II. OUTPUT FUNCTIONS WHICH WRITE DATA FROM R TO FILES


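A. write.table( )

write.table( ) writes a data frame or matrix to a file as tabular data (counterpart of read.table( )).

B. writeLines( )

writeLines( ) writes the elements of a character vector to a text file, one per line (counterpart of readLines( )).

C. dump( )

dump( ) writes text representations of one or more R objects to a file (inverse of source( )).

D. dput( )

dput( ) writes a text representation of a single R object (inverse of dget( )).

E. save( )

save( ) writes R objects to a binary file (inverse of load( )).

F. serialize( )

serialize( ) converts a single R object to binary form (inverse of unserialize( )).

G. sink( )

sink( ) diverts R output from the screen to a file or connection. sink( ) does not redirect graphics output.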
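A minimal sketch of sink( ) in use: output is sent to a file, then restored to the screen by calling sink( ) with no arguments. The log file is a temporary file chosen for illustration.

```r
log_file <- tempfile(fileext = ".txt")

sink(log_file)                  # start sending output to the file
print(summary(c(1, 2, 3, 4)))   # goes to the file, not the screen
cat("done\n")
sink()                          # send output back to the screen

# Inspect what was captured.
captured <- readLines(log_file)
print(captured)

unlink(log_file)
```

Every sink(file) call should be matched by a sink( ) call, otherwise later output keeps going to the file.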

III. read.table( ) ARGUMENTS


> str(read.table)
function (file, header = FALSE, sep = "", quote = "\"'", dec = ".", row.names, 
    col.names, as.is = !stringsAsFactors, na.strings = "NA", colClasses = NA, 
    nrows = -1, skip = 0, check.names = TRUE, fill = !blank.lines.skip, 
    strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, 
    flush = FALSE, stringsAsFactors = default.stringsAsFactors(), fileEncoding = "", 
    encoding = "unknown", text)
>

A. file

file is the name of a file or connection.

B. header

header is a logical indicating whether the first line of the file contains the column names.

C. sep

sep is a string indicating how the columns are separated. The default, sep = "", means any white space: one or more spaces, tabs, or newlines.

D. colClasses

colClasses is a character vector indicating the class of each column in the dataset.

E. nrows

nrows is a number giving the maximum number of rows to read from the dataset.

F. skip

skip is a number indicating the number of lines to skip from the beginning.

G. comment.char

comment.char is a character string indicating the comment character.

H. stringsAsFactors

stringsAsFactors is a logical indicating whether character variables should be coded as factors.
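The arguments above can be sketched together in one call. The data is made up for illustration and is passed through the text argument instead of a file so that the example is self-contained.

```r
# Illustrative input: one junk title line, one comment line, a header, three rows.
input_lines <- c("this title line is skipped",
                 "# a comment line",
                 "id,name,score",
                 "1,alice,90.5",
                 "2,bob,85",
                 "3,carol,77.25")

scores <- read.table(text = input_lines,
                     header = TRUE,           # first remaining line holds column names
                     sep = ",",               # columns are comma separated
                     skip = 1,                # drop the title line
                     comment.char = "#",      # ignore lines starting with #
                     nrows = 3,               # read at most three data rows
                     colClasses = c("integer", "character", "numeric"),
                     stringsAsFactors = FALSE)

str(scores)
```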

IV. read.table( )


For small to moderately sized datasets, you can usually call read.table( ) without specifying any other arguments.


data <- read.table("foo.txt")


R will automatically:
Skip comment lines that begin with a # sign.
Determine the number of rows and required memory allocation.
Determine the type of variable in each column of the table.


Telling R these things directly makes R run faster and more efficiently.


read.csv( ) is the same as read.table( ) except that a comma is the default separator and header = TRUE is the default.
Read the help page for read.table( ); it provides many helpful hints.
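As a small check of that relationship, the sketch below reads the same made-up comma-separated data with both functions and compares the results. The data and variable names are assumptions for illustration.

```r
# Illustrative comma-separated data with a header line.
csv_lines <- c("id,score", "1,90.5", "2,85")

# read.csv( ) with its defaults.
from_csv <- read.csv(text = csv_lines)

# read.table( ) with the separator and header set explicitly.
from_table <- read.table(text = csv_lines, header = TRUE, sep = ",")

# Compare the two data frames.
print(all.equal(from_csv, from_table))
```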

Make a rough calculation of the memory required to store your dataset. If the dataset needs more memory than the RAM on your computer, do not try to read it into R until you have adequate RAM.
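A rough calculation might look like the sketch below. It assumes every column is numeric, which R stores at 8 bytes per value; the row and column counts are made-up figures, and the result ignores the extra working memory R needs while reading.

```r
# Hypothetical dataset dimensions.
n_rows <- 1500000
n_cols <- 120

# Numeric values take 8 bytes each.
bytes <- n_rows * n_cols * 8

# Convert to gibibytes (2^30 bytes).
gib <- bytes / 2^30
print(round(gib, 2))   # about 1.34
```

A common rule of thumb is to allow roughly twice this figure, since reading the data needs working space beyond the finished data frame.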

Set comment.char = "" if there are no commented lines in your file.

Use the colClasses argument.

Specifying this option instead of using the default can often make read.table( ) run twice as fast. You have to know the class of each column in your data frame in order to use this option. If all of the columns are "numeric", for example, you can just set colClasses = "numeric".

To determine the class of each column, read the first 100 rows and reuse the classes R detects:

initial <- read.table("dataTable.txt", nrows = 100)
classes <- sapply(initial, class)
tabAll <- read.table("dataTable.txt", colClasses = classes)

Set nrows.

This does not make R run faster but it helps with memory usage.
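The colClasses and nrows steps can be sketched end to end on a small generated file. The file, its contents, and the variable names are assumptions for illustration; a real dataTable.txt would be read the same way.

```r
# Build a small illustrative table and write it to a scratch file.
path <- tempfile(fileext = ".txt")
full <- data.frame(id = 1:500,
                   value = seq(0.5, 250, by = 0.5),
                   label = rep(c("x", "y"), 250),
                   stringsAsFactors = FALSE)
write.table(full, path, row.names = FALSE)

# Step 1: read a sample of rows and let R detect the column classes.
initial <- read.table(path, header = TRUE, nrows = 100,
                      stringsAsFactors = FALSE)
classes <- sapply(initial, class)
print(classes)

# Step 2: read the whole file, telling R the classes and the row count.
tabAll <- read.table(path, header = TRUE, colClasses = classes, nrows = 500)
str(tabAll)

unlink(path)
```

The sample must be representative: if a column only shows its true type after the sampled rows, the detected class will be wrong for the full read.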

To describe an object, you can use either the str( ) or summary( ) diagnostic function.
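A minimal sketch of both diagnostic functions on a made-up data frame:

```r
# Illustrative data frame.
df <- data.frame(id = 1:4, score = c(90.5, 85, 77.25, 60))

# str( ) shows the structure: dimensions, column names, and column types.
str(df)

# summary( ) shows per-column statistics such as min, median, and max.
print(summary(df))

# Capture the str( ) output as text, for example to log it.
structure_text <- capture.output(str(df))
```

str( ) is usually the quicker check after read.table( ), because it shows at a glance whether each column was read with the class you expected.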

file.exists("filename.R") returns TRUE or FALSE depending on whether the file exists. A file name without a path is looked up in the current working directory.
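A short sketch of guarding a read with file.exists( ). The scratch file stands in for a real script and is an assumption for illustration.

```r
# A path that does not exist yet (hypothetical scratch file).
path <- tempfile(fileext = ".R")
before <- file.exists(path)

# Create the file, then check again.
writeLines("x <- 1", path)
after <- file.exists(path)

# Only source the file if it is really there.
if (file.exists(path)) {
  source(path, local = new.env())
}

# getwd( ) shows the current working directory used for relative names.
print(getwd())

unlink(path)
```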

Enjoy R I/O!

Elcric Otto Circle






