R OBJECT CLASSES

REVISED: Saturday, March 2, 2013

Five Classes of Objects in R.

I. FIVE CLASSES OF OBJECTS IN R

Everything in R is an object. R has five basic (atomic) classes of objects, and each object can have attributes. The attributes of an object can be determined by calling attributes(objectname).

A. R has the following five basic (atomic) classes of objects:

1. character object class

a. NA represents a missing value. NA values have a class too; e.g., NA_character_ is a missing character value, while a plain NA is logical.

i. The function is.na( ) is used to test objects to see if they are NA.

2. numeric (double precision real numbers) object class

a. Inf represents infinity.

b. Numbers are numeric (double) by default; 1L, by contrast, represents the integer 1.

c. NaN ("not a number") represents the result of an undefined mathematical operation, such as 0/0. A NaN value is also NA; however, the converse is not true.

i. The function is.nan( ) is used to test objects to see if they are NaN.

3. integer object class

An integer can be created by appending the L suffix to a whole number; e.g., 12L.

4. complex (i is used for the imaginary component of a complex number; e.g., 2i) object class.

5. logical object class (TRUE or FALSE; in arithmetic, TRUE is treated as 1 and FALSE as 0).
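The five classes and the special values above can be checked at the console with class( ), is.na( ) and is.nan( ). This is a minimal sketch assuming a recent R installation; the variable names are made up for illustration.

```r
# One literal of each basic class (variable names are illustrative only)
ch <- "a"       # character
nu <- 3.14      # numeric (double precision)
it <- 12L       # integer, because of the L suffix
cx <- 1 + 2i    # complex
lg <- TRUE      # logical

classes <- c(class(ch), class(nu), class(it), class(cx), class(lg))
print(classes)                 # "character" "numeric" "integer" "complex" "logical"

# Special values
print(1 / 0)                   # Inf
print(0 / 0)                   # NaN
print(is.na(NaN))              # TRUE: every NaN is also NA
print(is.nan(NA))              # FALSE: an NA is not necessarily NaN
print(class(NA))               # "logical"
print(class(NA_character_))    # "character"
```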


B. R attributes:

R attributes can be determined by using the attributes( ) function.

1. names, dimnames

a. Names are very useful for writing self-describing code and readable objects.

b. The function names( ) can be used both to assign names to the elements of an object and to retrieve an object's names. For example, if x has no names, names(x) returns NULL. However, if you assign x <- 1:2 and then names(x) <- c("Red", "Green"), a console print of x shows the headings Red and Green above the elements 1 and 2. A console print of names(x) will now produce [1] "Red" "Green".

2. dim (dimensions; used, for example, by matrices and arrays)

3. class

4. length

5. other user defined attributes/metadata
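The attributes listed above can be set and inspected with names( ), dim( ), attr( ) and attributes( ). A minimal sketch; the attribute name "units" is a hypothetical example of user-defined metadata, not something R defines.

```r
x <- 1:2
print(names(x))                 # NULL: x has no names yet
names(x) <- c("Red", "Green")
print(x)                        # elements labelled Red and Green
print(names(x))                 # "Red" "Green"

m <- 1:6
dim(m) <- c(2, 3)               # adding a dim attribute turns m into a 2 x 3 matrix
print(attributes(m))            # $dim: 2 3

# User-defined metadata ("units" is a hypothetical attribute name)
attr(x, "units") <- "count"
print(attributes(x))            # $names and $units
print(length(x))                # 2
```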

R is a vector language. Vectors can be thought of as contiguous cells containing data. The most basic object in R is a vector, and an atomic vector can contain elements of only one of the five classes of R objects. A vector can be created with the vector( ) function, which takes two arguments: the first (mode) is the class of the elements, and the second is the length of the vector. The elements of a new numeric vector default to zero. If you assign a value to a variable at the R prompt, e.g., x <- 10, and then enter x, R will console print [1] 10. This means x is a vector of one element, and the value of that element, element [1], is 10. R does not have scalars; it stores a single number as a vector of length one.
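A short sketch of vector( ) with its mode and length arguments, and of a single number being a length-one vector. Variable names are illustrative.

```r
v <- vector("numeric", 3)     # mode = "numeric", length = 3
print(v)                      # 0 0 0: numeric elements default to zero

l <- vector("logical", 2)
print(l)                      # FALSE FALSE

ch <- vector("character", 2)
print(ch)                     # "" ""

x <- 10
print(length(x))              # 1: a "scalar" is really a vector of length one
```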

The : colon operator is used to create integer sequences. For example, x <- 1:5 assigns the integers 1, 2, 3, 4, 5 as elements to the vector x.

The c( ) (combine) function can be used to create vectors of objects. Think of c( ) as concatenate. When objects of mixed classes are combined, coercion occurs so that every element of the vector has the same class: the least common denominator of the classes involved.
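The colon operator and the coercion rules of c( ) can be seen directly with class( ). A minimal sketch with made-up values:

```r
s <- 1:5
print(class(s))      # "integer": the colon operator produces integers

a <- c(1.7, "a")     # numeric + character -> character
b <- c(TRUE, 2)      # logical + numeric   -> numeric
d <- c("a", TRUE)    # character + logical -> character
print(a)             # "1.7" "a"
print(b)             # 1 2
print(d)             # "a" "TRUE"
print(class(b))      # "numeric"
```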

The seq( ) function lets you define the intervals of a sequence as well as starting and ending values. For example, to create a sequence from 0 to 100 in increments of 20 you could use the following:

> x <- seq(0, 100, by=20)
> x
[1] 0 20 40 60 80 100
>

Objects can be explicitly coerced from one class to another by using the as.* functions. For example, y <- 0:2 is the sequence zero to two, and as.integer(y) will console print 0 1 2. as.logical(y) will console print FALSE TRUE TRUE.

as.character(y) will console print "0" "1" "2".

as.complex(y) will console print 0+0i 1+0i 2+0i.
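The same coercions as a runnable sketch. The last two lines add one case not covered above: a string that cannot be read as a number becomes NA, and R warns about it.

```r
y <- 0:2
print(as.integer(y))     # 0 1 2
print(as.logical(y))     # FALSE TRUE TRUE: zero is FALSE, nonzero is TRUE
print(as.character(y))   # "0" "1" "2"
print(as.complex(y))     # 0+0i 1+0i 2+0i

z <- c("a", "b")
print(as.numeric(z))     # NA NA, with a warning that NAs were introduced by coercion
```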

A matrix is a type of vector with a dimension attribute. The dimension attribute is a vector of length two (nrow, ncol). A matrix is indexed with double subscripts. All elements of a matrix must be of the same class (type).

Matrices are filled column-wise by default: entries start in the upper-left corner and run down the first column, then down the second column, and so on.
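Column-wise filling and double-subscript indexing can be checked with matrix( ). A minimal sketch; byrow = TRUE is shown only for contrast.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
print(dim(m))       # 2 3
print(m[2, 1])      # 2: row 2, column 1

mr <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row-wise instead
print(mr[1, ])      # 1 2 3
```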

Matrices can be created by column-binding using the cbind( ) function or row-binding using the rbind( ) function.

> x <- c(1, 2, 3)
> y <- c(4, 5, 6)
> x
[1] 1 2 3
> y
[1] 4 5 6
> z <- cbind(x, y)
> class(z)    # R 4.0 and later print [1] "matrix" "array"
[1] "matrix"
> z           # [1,] is read "row 1, all columns"
     x y
[1,] 1 4
[2,] 2 5
[3,] 3 6
> z.r <- rbind(x, y)
> z.r         # [,1] is read "all rows, column 1"
  [,1] [,2] [,3]
x    1    2    3
y    4    5    6
>

A list is heterogeneous and is analogous to a C language struct. A list is a type of vector that can contain elements from all five classes of objects. When a list is printed, the index of each element appears in double brackets; e.g., [[1]]. A list does not coerce its elements: in the following example, 7 stays numeric, "u" and "p" stay character, and FALSE stays logical. By contrast, when different classes are mixed in an atomic vector with c( ), coercion occurs so that every element is of the same class. For example, c(7, "u", "p", FALSE) is coerced to the least common denominator, character: the number 7 becomes the string "7" and the logical FALSE becomes the string "FALSE". When the least common denominator is numeric, a logical TRUE becomes a numeric 1, and a logical FALSE becomes a numeric 0 (zero).

> y <- list(7, "u", "p", FALSE) # No coercion: each element keeps its class.
> y
[[1]]
[1] 7

[[2]]
[1] "u"

[[3]]
[1] "p"

[[4]]
[1] FALSE
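To see the difference between a list and an atomic vector built from the same values, compare the classes of their elements. A minimal sketch; [[ ]] extracts one element, while [ ] returns a sub-list.

```r
y <- list(7, "u", "p", FALSE)
print(sapply(y, class))   # "numeric" "character" "character" "logical"
print(y[[1]])             # 7: [[ ]] extracts a single element
print(class(y[1]))        # "list": [ ] returns a sub-list

v <- c(7, "u", "p", FALSE)
print(v)                  # "7" "u" "p" "FALSE": c() coerces everything to character
```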

A data frame is a special type of list, created with the data.frame( ) function, that is used to store tabular data. Each element of a data frame can be thought of as a column, and every column must have the same length; that common length is the number of rows, returned by nrow( ), whereas length( ) of a data frame returns the number of columns. Unlike matrices, which require every element to be of the same class, data frames can store a different class in each column, just like lists. Data frames have a special attribute called row.names. Data frames are usually created by calling read.table( ) or read.csv( ), and a data frame can be converted to a matrix by invoking data.matrix( ).
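A small sketch of a data frame with mixed column classes. The column names and values are made up; stringsAsFactors = FALSE is passed explicitly so that the color column is character in both older and newer versions of R.

```r
df <- data.frame(id = 1:4,
                 color = c("Red", "Green", "Blue", "Red"),
                 ok = c(TRUE, FALSE, TRUE, TRUE),
                 stringsAsFactors = FALSE)
print(nrow(df))           # 4: number of rows
print(length(df))         # 3: length() counts columns, not rows
print(row.names(df))      # "1" "2" "3" "4"
print(sapply(df, class))  # id "integer", color "character", ok "logical"

# Convert only the numeric-like columns; logical values become 1/0
m <- data.matrix(df[, c("id", "ok")])
print(m)
```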

Factors are special types of vectors used to represent categorical data. A factor is an integer vector in which each integer has a label; therefore, a factor is self-describing. Factors have a separate attribute called levels. The table( ) function can be called on a factor to count how many elements fall in each level. The unclass( ) function can be called on a factor to reveal the underlying integer codes. The order of the levels can be set with the levels argument to factor( ); e.g., y <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no")). This can be important in linear modeling because the first level is used as the baseline level.
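The factor example above, run end to end. Without the levels argument, R sorts the levels alphabetically, so "no" would become the baseline.

```r
y <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
print(levels(y))     # "yes" "no": "yes" is the first (baseline) level
print(table(y))      # yes 3, no 2
print(unclass(y))    # integer codes 1 1 2 1 2, plus a levels attribute

y2 <- factor(c("yes", "yes", "no", "yes", "no"))
print(levels(y2))    # "no" "yes": default levels are sorted
```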

R is case sensitive; e.g., X and x are two different objects.

A “comma separated value” file is commonly referred to as a CSV file.

csd = read.csv(file="csd.csv", header=TRUE, sep=",") for comma separated data.

ssd = read.table(file="ssd.csv", header=TRUE, sep="") for space separated data.

tsd = read.table(file="tsd.csv", header=TRUE, sep="\t") for tab separated data.
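A self-contained round trip for the read calls above: it writes small comma- and tab-separated files to temporary paths and reads them back. The file contents are made up for illustration; tempfile( ) stands in for real file names such as csd.csv.

```r
# Comma separated data (hypothetical contents)
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ann,90", "Bob,85"), path)
csd <- read.csv(file = path, header = TRUE, sep = ",")
str(csd)

# The same data, tab separated
tpath <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "Ann\t90", "Bob\t85"), tpath)
tsd <- read.table(file = tpath, header = TRUE, sep = "\t")
str(tsd)
```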

Reading data into a statistical system for analysis and exporting the results to some other system for report writing can be a challenge. When a user reads data from a text file, it is the responsibility of that user to know and to specify the conventions used to create that file. These include the representation of missing values, the comment character, the value separator, whether a header line is present, and so forth.
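The conventions listed above map onto read.table( ) arguments such as sep, header, na.strings and comment.char. A minimal sketch; the file contents, the ";" separator and the "-99" missing-value code are hypothetical.

```r
# Hypothetical file contents: ";" separates values, "-99" marks a missing
# value, and lines starting with "#" are comments.
lines <- c("# survey extract",
           "id;score",
           "1;90",
           "2;-99",
           "3;77")
d <- read.table(text = lines, header = TRUE, sep = ";",
                na.strings = "-99", comment.char = "#")
print(d)
print(is.na(d$score))   # FALSE TRUE FALSE
```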

The eXtensible Markup Language (XML) package provides general facilities for reading and writing XML documents within R.

When writing an R script, save the code file with the .R suffix; e.g., exampleScript.R. You can copy the script from the editor with Ctrl+C and paste it into the R console with Ctrl+V, or use File > Save As to save it as exampleScript.R in your working directory and load it into R with source("exampleScript.R"). Every time you edit the script you have to save it and then source it back into R. Inside a sourced script, use print( ) to display data in the R console.
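A sketch of the save-then-source cycle. It writes a tiny hypothetical script to a temporary .R file instead of exampleScript.R, then sources it; echo = TRUE additionally echoes each line as it runs.

```r
# A hypothetical three-line script written to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("a <- 1:3",
             "total <- sum(a)",
             "print(total)"), script)

source(script)               # runs the script; print() shows 6 in the console
print(exists("total"))       # TRUE: objects created by the script now exist
source(script, echo = TRUE)  # also echoes each line as it runs
```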

Type ls( ) or objects( ) to list the objects in your current workspace.

When using R, the up-arrow key recalls previously typed commands, most recent first.

Use the following functions to double-check that your data loaded correctly:

str( )

summary( )

fix( )
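A quick sketch of str( ) and summary( ) on a small made-up data frame. fix( ) is left out because it opens an interactive editor.

```r
df <- data.frame(id = 1:3, score = c(90, 85, NA))
str(df)                  # dimensions plus each column's class and first values
print(summary(df))       # per-column min, quartiles, mean, max and NA count
print(summary(df$score)) # includes an "NA's" entry because one score is missing
```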

Open the data with Notepad++ to see how the data is separated; i.e., by comma, by space, or by tab.


Enjoy the Five Classes of Objects in R!

Elcric Otto Circle


R DATA ANALYSIS AND GRAPHICS COMPUTER PROGRAMMING LANGUAGE

Saturday, December 22, 2012
