Monday, December 24, 2012

R CLASSES AND METHODS

REVISED: Saturday, March 2, 2013




R Classes and Methods

R is object oriented. Objects are instances of classes. Vectors, data frames, matrices and lists are the major R structures; structures are different from classes.

I.  INTRODUCTION TO R CLASSES AND METHODS

A. CLASS

The R code for classes is in the methods package, which is normally loaded by default. If it is not loaded, you can load it with the function library(methods).

The setClass( ) function defines a class, which is a description, or blueprint, of a new data type.


Refer to ?Classes and ?setClass in the help documentation for details.

B. OBJECT

The new( ) function creates an object which is an instance of a class. The class( ) function can be used to determine the class of an object.

C. METHOD

Methods are dispatched by generic R functions. A method is the implementation of a generic function for an object of a particular class. Refer to ?Methods, ?setMethod, and ?setGeneric in the help documentation for details. The showMethods( ) function lists the methods defined for a particular generic function.
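The three ideas above (class, object, method) can be sketched together. This is a minimal illustration only: the class name Account, its slots, and the describe generic are hypothetical names chosen for the example, not part of any package.

```r
library(methods)

# Define a class with two slots (hypothetical example class).
setClass("Account", representation(owner = "character", balance = "numeric"))

# Create an object, which is an instance of the class.
acct <- new("Account", owner = "Ann", balance = 100)
class(acct)        # reports the class of the object
acct@balance       # slot access with the @ operator

# Define a generic function and a method for the class.
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "Account", function(object) {
  paste(object@owner, "has a balance of", object@balance)
})

msg <- describe(acct)
showMethods("describe")   # lists the methods defined for the generic
```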

II. GENERIC FUNCTIONS


The first argument of a generic function is an object of a particular class.
The generic function checks the class of the object.

A search is done to see whether a method exists for that class. If it exists, the method is called on the object. However, if there is no method for that class, a search is done for the default method for the generic. If a default method exists, the default method is called.

If a default method does not exist, an error is thrown.

You should never call methods directly. Rather, use the generic function and let the method be dispatched automatically.
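The dispatch steps described above can be sketched as follows. The classes Square and Blob and the generics area and perimeter are hypothetical; a method for the signature "ANY" is used here to play the role of the default method.

```r
library(methods)

setClass("Square", representation(side = "numeric"))
setClass("Blob", representation(label = "character"))

# The generic only dispatches; it contains no class-specific logic.
setGeneric("area", function(shape) standardGeneric("area"))

# Method for the Square class.
setMethod("area", "Square", function(shape) shape@side^2)

# A method for class "ANY" acts as the default when no better match exists.
setMethod("area", "ANY", function(shape) NA_real_)

area_square <- area(new("Square", side = 3))   # the Square method is dispatched
area_blob   <- area(new("Blob", label = "b"))  # falls back to the default

# A generic with no matching method and no default signals an error.
setGeneric("perimeter", function(shape) standardGeneric("perimeter"))
result <- tryCatch(perimeter(new("Blob", label = "b")),
                   error = function(e) "no applicable method")
```

Note that the code calls area( ) and perimeter( ), the generics, and never a method directly.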

If you write new methods for new classes, you should also consider writing methods for the following generics:

print/show

summary

plot

III. CREATING A NEW CLASS

A new class can be defined using the setClass( ) function.

You must specify the name of the class.

Specify data elements, also called slots.

Define methods for the class with the setMethod( ) function.

Use the showClass( ) function to get information about a class.

A. NEW CLASS EXAMPLE

setClass("polygon",
         representation(x = "numeric",
                        y = "numeric"))


The slots for the polygon class are x and y and they can be accessed using the @ operator.

1. A plot method can be created using the setMethod( ) function. For setMethod( ) you need to specify a generic function (plot), and a signature. A signature is a character vector indicating the classes of objects accepted by the method.

In this example, the plot method will take one type of object; i.e., a polygon object.

setMethod("plot", "polygon",
          function(x, y, ...) {
            plot(x@x, x@y, type = "n", ...)   # set up the axes without drawing points
            xp <- c(x@x, x@x[1])              # repeat the first vertex to close the outline
            yp <- c(x@y, x@y[1])
            lines(xp, yp)
          })


Notice that the slots of the polygon, the x, y coordinates, are accessed with the @ operator.

After calling setClass( ) and setMethod( ) as shown above, you can call showMethods("plot") to confirm that the polygon method is registered.
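A minimal sketch of using the polygon class and its plot method follows. The coordinates are made up for illustration, and pdf(NULL) is used as an assumption so that the example draws to a null device instead of the screen.

```r
library(methods)

setClass("polygon", representation(x = "numeric", y = "numeric"))

setMethod("plot", "polygon",
          function(x, y, ...) {
            plot(x@x, x@y, type = "n", ...)
            xp <- c(x@x, x@x[1])
            yp <- c(x@y, x@y[1])
            lines(xp, yp)
          })

# Create a polygon object with new( ); the coordinates are hypothetical.
p <- new("polygon", x = c(1, 2, 3, 4), y = c(1, 2, 3, 1))

pdf(NULL)        # null device, so nothing is written to disk
plot(p)          # the generic dispatches to the polygon method
dev.off()

has_method <- existsMethod("plot", "polygon")
```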

B. Documentation

Refer to new class examples written in packages by other people on CRAN, or in the stats4 package that comes with R, to help you understand the process.

Enjoy R Classes and Methods.


Elcric Otto Circle







How to Link to My Home Page

It will appear on your website as:

"Link to: ELCRIC OTTO CIRCLE's Home Page"

R RANDOM NUMBER SIMULATION

REVISED: Saturday, March 2, 2013




R "Random Number Simulation".

I.  GENERATING RANDOM NUMBERS

Each probability distribution in R has four functions associated with it. The function names are prefixed with one of the following letters:

A. d DENSITY


  1. dbeta( )

  2. dbinom( )

  3. dcauchy( )

  4. dchisq( )

  5. dexp( )

  6. df( )

  7. dgamma( )

  8. dgeom( )

  9. dhyper( )

10. dlogis( )

11. dlnorm( )

12. dnbinom( )

13. dnorm( )

dnorm(x, mean = 0, sd = 1, log = FALSE)

The dnorm( ) function evaluates the Normal probability density with a given mean and standard deviation at a point or vector of points.

14. dpois( )

15. dt( )

16. dunif( )

17. dweibull( )

B. p for cumulative distribution.


pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)

The pnorm( ) function evaluates the cumulative distribution function for a Normal distribution.

C. q for quantile function. 

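qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)

The qnorm( ) function evaluates the quantile function, the inverse of the cumulative distribution function, for a Normal distribution. Quantiles are the values of a variate that divide a frequency distribution into equal groups, each containing the same fraction of the total population.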
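The relationship between the d, p, and q functions can be sketched for the standard Normal distribution. The evaluation points 0, 1.96, and 0.975 are arbitrary choices for illustration.

```r
# Density of the standard Normal at 0, which equals 1 / sqrt(2 * pi).
d0 <- dnorm(0)

# Cumulative probability P(X <= 1.96) for a standard Normal.
p196 <- pnorm(1.96)

# qnorm( ) inverts pnorm( ): the quantile for probability 0.975.
q975 <- qnorm(0.975)

# Upper-tail probability P(X > 1.96) via lower.tail = FALSE.
upper <- pnorm(1.96, lower.tail = FALSE)

round(c(d0, p196, q975, upper), 4)
```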

D. r for random number generation and distributions.

  1. rbeta( )

  2. rbinom( )

  3. rcauchy( )

  4. rchisq( )

  5. rexp( )

  6. rf( )

  7. rgamma( )

  8. rgeom( )

  9. rhyper( )

10. rlogis( )

11. rlnorm( )

12. rnbinom( )

13. rnorm( )

rnorm(n, mean = 0, sd = 1)

The rnorm( ) function generates random Normal variates with a given mean and standard deviation. (The standard deviation is the square root of the variance.) rnorm( ) has three arguments. The first argument is how many numbers you want generated. The second argument is the mean of the distribution from which the numbers are drawn. The third argument is the standard deviation of that distribution.
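A short sketch of rnorm( ) in use follows. The seed 42 and the values 1000, 50, and 10 are arbitrary choices for illustration.

```r
set.seed(42)                          # arbitrary seed, for reproducibility
x <- rnorm(1000, mean = 50, sd = 10)  # 1000 draws from Normal(mean 50, sd 10)

length(x)   # the number of values requested
mean(x)     # close to 50, but not exactly 50
sd(x)       # close to 10, but not exactly 10

# Resetting the same seed reproduces the same draws.
set.seed(42)
y <- rnorm(1000, mean = 50, sd = 10)
identical(x, y)
```

The sample mean and standard deviation only approximate the requested values, because the numbers are random draws from the distribution.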

14. rpois( )

The rpois( ) function generates random Poisson variates with a given rate.

15. rt( )

16. runif( )

The runif( ) function generates random variates from a uniform distribution between a given minimum and maximum.

17. rweibull( )
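The other r functions follow the same pattern: the first argument is the number of values, followed by the parameters of the distribution. A brief sketch with arbitrary parameter values:

```r
set.seed(1)                                 # arbitrary seed

counts <- rpois(10, lambda = 2)             # 10 Poisson counts with rate 2
u      <- runif(5, min = 0, max = 1)        # 5 uniform values between 0 and 1
flips  <- rbinom(10, size = 1, prob = 0.5)  # 10 fair coin flips coded 0 or 1

counts
u
flips
```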

II. User Defined Random Numbers

myFunction <- function(n) {              # Input will be n.
  for(i in 1:n) {
    randomNumber <- runif(1, 0, 1)    # For 1 number between 0 and 1.
    print(randomNumber)
  }
}

> myFunction(10)                              # 10 is input value for n.
[1] 0.7833938
[1] 0.1486565
[1] 0.6070959
[1] 0.3871606
[1] 0.8954091
[1] 0.1125646
[1] 0.5442749
[1] 0.7796356
[1] 0.8579459
[1] 0.9691205
>
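Because runif( ) is vectorized, the loop above is not strictly needed. The following is a sketch of an alternative, with the hypothetical name myFunctionVec, that asks for all n numbers in a single call and returns them as a vector.

```r
# Hypothetical vectorized variant of myFunction( ): one call, n values.
myFunctionVec <- function(n) {
  runif(n, min = 0, max = 1)   # n numbers between 0 and 1
}

set.seed(123)                  # arbitrary seed, for reproducibility
values <- myFunctionVec(10)
print(values)
```

Returning the vector, instead of printing inside the function, lets the caller store and reuse the numbers.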

III. RANDOM SAMPLING

The sample( ) function selects randomly from a set of scalar objects allowing you to sample from arbitrary distributions.

The set.seed( ) function sets the seed when conducting a simulation. This allows you to reproduce a random number selection.


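The following are examples of R random sampling. The exact values shown depend on the R version, because R 3.6.0 changed the default sampling algorithm, so a current R session will print different numbers for the same seed: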

>set.seed(1)
>sample(1:10, 4) # Sampling without replacement.
[1] 3 4 5 7

>sample(1:10, 4) # Sampling without replacement.
[1] 3 9 8 5


>sample(letters, 5) # Sampling without replacement.
[1] "q" "b" "e" "x" "p"
 


>sample(1:10) # Permutation.
[1] 4 7 10 6 9 2 8 3 1 5


>sample(1:10) # Permutation.
[1] 2 3 4 1 9 5 10 8 6 7


>sample(1:10, replace = TRUE) # Sampling with replacement.
[1] 2 9 7 8 2 8 5 9 7 8
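The effect of set.seed( ), and sampling from an arbitrary discrete distribution, can be sketched as follows. The seed 2013 and the probabilities 0.8 and 0.2 are arbitrary choices; the prob argument of sample( ) supplies the weights.

```r
set.seed(2013)                      # arbitrary seed
first <- sample(1:10, 4)

set.seed(2013)                      # the same seed again
second <- sample(1:10, 4)

identical(first, second)            # the two draws match

# Sampling from an arbitrary discrete distribution with the prob argument.
faces  <- c("H", "T")
biased <- sample(faces, 1000, replace = TRUE, prob = c(0.8, 0.2))
table(biased) / 1000                # proportions near 0.8 and 0.2
```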


Enjoy R "Random Number Simulation".

Elcric Otto Circle







Sunday, December 23, 2012

R GRAPHICS

REVISED: Saturday, March 2, 2013




R Graphics.

Before you start plotting you might want to tell R where you want the graph you are creating stored. One way to do this is to use the pdf( "filename.pdf") function. The pdf( ) function sets the graphical output file to pdf. dev.off( ) closes the graphical output file.
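A minimal sketch of sending a plot to a PDF file follows. The file name example-plot.pdf and the use of tempdir( ) are assumptions made so the example does not write into your working directory.

```r
out_file <- file.path(tempdir(), "example-plot.pdf")  # hypothetical file name

pdf(out_file)          # open a PDF file device
plot(1:10, (1:10)^2)   # the plot is drawn into the file, not on the screen
dev.off()              # close the device so the file is complete

file.exists(out_file)
```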

I.  PLOTTING PACKAGES

The plotting and graphics engine in R is encapsulated in a few base and recommended packages.

A. graphics 


graphics package contains plotting functions for the base graphing systems, including plot( ), hist( ), boxplot( ), and many others.

B. lattice 


lattice package contains code for producing Trellis graphics, which are independent of the base graphics system, including functions like xyplot( ), bwplot( ), and levelplot( ).

C. grid 


grid package implements a different graphing system independent of the base system. The lattice system builds on top of grid. We seldom call functions from the grid package directly.

D. grDevices


grDevices package, which means graphics devices, contains all the code implementing all the various graphics devices, including X11, PDF, PostScript, PNG; etc.

II. PLOTTING PROCESS

When plotting you must make a few choices.

A. DEVICE


To what device will the plot be sent? Do you want it sent to the screen? Do you want it sent to a file?

B. VIEWING


Is the plot for temporary viewing on a console or will it become part of a presentation or a paper? Plots included in a paper or presentation will need to use a file device rather than a screen device. You should also consider who the audience will be and where it will be presented if you are preparing a presentation.

C. DATA


Is there a lot of data going into the plot; or, is it just a few points?

D. RESIZE


Do you need to be able to resize the graphics?

E. GRAPHICS SYSTEM


What graphics system will you use: base, grid, or lattice? These generally cannot be mixed.

F. BASE GRAPHICS


Base graphics are generally constructed piecemeal, with a series of function calls handling each aspect of the plot separately. This allows plotting to mirror the thought process.

G. LATTICE AND GRID GRAPHICS


A single function call is used to create lattice or grid graphics; therefore, all of the graphics parameters have to be specified at one time. Specifying everything all at once allows R to automatically calculate the spacing and font sizes.

III. BASE GRAPHICS SYSTEM

The base graphics system is the most commonly used and is a very powerful system for creating 2-D graphics.

A. FUNCTION CALL


Calling plot(x, y), or hist(x) for a histogram of a single variable, will launch a graphics device, if one is not already open, and draw the plot on the device. The dev.off( ) function closes the graphics device.

For example, the following code makes a histogram:

> str(rnorm) # str( ) shows argument list
function (n, mean = 0, sd = 1)  
> str(hist)
function (x, ...)  

> z <- rnorm(500) # Example using default arguments
> hist(z)

>

You can control which device gets the plot with the dev.set( ) function. You can close all the graphics devices with the graphics.off( ) function.
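Switching between devices can be sketched as follows. Two null PDF devices, opened with pdf(NULL), stand in for real screen or file devices; that choice is an assumption made so the example runs without opening windows or writing files.

```r
pdf(NULL); dev_a <- dev.cur()   # first device
pdf(NULL); dev_b <- dev.cur()   # second device, now the active one

dev.set(dev_a)                  # make the first device active again
hist(rnorm(100))                # this histogram is drawn on the first device
active <- dev.cur()

n_open <- length(dev.list())    # number of open devices
graphics.off()                  # close every open graphics device
```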

B. DEFAULT PLOT METHOD


If the arguments to plot are not of a special class, then the default method for plot is called. The default method for plot( ) has many arguments which will let you set the title, the x axis label, the y axis label; etc.

C. BASE GRAPHICS SYSTEM


The base graphics system has many parameters that can be set and tweaked. These are documented in ?par, the help page for the par( ) function. You should consider memorizing this help page.


You can also call example(points) to see the demonstration examples that come prepackaged with R.

IV. BASE GRAPHICS PARAMETERS

The par( ) function is used to specify global graphics parameters that affect all plots in an R session. These parameters can often be overridden as arguments to specific plotting functions.

The following are commonly used par( ) function parameters.

A. pch


pch is the plotting symbol, the default is 1 which is an open circle. 

B. lty


lty is the line type, the default is a solid line, and can be dashed, dotted; etc.

C. lwd


lwd is the line width specified as an integer multiple.

D. col


col is the plotting color, specified as a number, string, or hex code; the colors( ) function gives you a vector of colors by name. The default color is black.

E. las


las is the orientation of the axis labels on the plot.

The following are commonly used base graphics parameters.

A. bg


bg is the background color.

B. mar


mar is the margin size. 

C. oma


oma is the outer margin size, the default is 0 for all sides.

D. mfrow


mfrow is the number of plots per row and column; plots are filled row-wise.

E. mfcol


mfcol is the number of plots per row and column; plots are filled column-wise.
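A sketch of setting global parameters with par( ) and overriding others in individual plotting calls follows. The particular values are arbitrary, and pdf(NULL) is an assumption used to avoid opening a screen device.

```r
pdf(NULL)                                  # null device for illustration

# par( ) returns the previous values, so they can be restored later.
old <- par(mfrow = c(1, 2),                # 1 row, 2 columns of plots
           mar = c(4, 4, 2, 1),            # bottom, left, top, right margins
           las = 1)                        # horizontal axis labels

x <- seq(0, 2 * pi, length.out = 50)
plot(x, sin(x), pch = 19, col = "blue")         # filled circles in blue
plot(x, cos(x), type = "l", lty = 2, lwd = 2)   # dashed line, double width

layout_now <- par("mfrow")
par(old)                                   # restore the earlier settings
dev.off()
```

Here mfrow, mar, and las are set globally, while pch, col, lty, and lwd are passed as arguments to the individual plot( ) calls.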

V. BASE PLOTTING FUNCTIONS

A. plot( )


plot( ) function makes a scatter plot or other type of plot depending on the class of the object being plotted.

B. lines( )


lines( ) function adds lines to a plot given a vector of x values and a corresponding vector of y values, or a two-column matrix. The function simply connects the dots.

C. points( )


points( ) adds points to a plot.

D. text( )


text( ) function adds text labels to a plot using specified x, y coordinates.

E. title( )


title( ) function adds annotation to x, y axis labels, title, subtitle, and outer margin.

F. mtext( )


mtext( ) function adds arbitrary text to the margins, inner or outer, of the plot.

G. axis( )


axis( ) function adds axis ticks or labels.
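The piecemeal style of base graphics can be sketched by building one plot from the functions listed above. The data are simulated and the labels are made up for illustration; pdf(NULL) is an assumption used to avoid opening a screen device.

```r
pdf(NULL)                                   # null device for illustration

set.seed(10)                                # arbitrary seed
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)

plot(x, y, type = "n", axes = FALSE, xlab = "", ylab = "")  # empty canvas
points(x, y, pch = 20)                      # add the points
ord <- order(x)
lines(x[ord], y[ord], col = "grey")         # connect the dots from left to right
text(min(x), max(y), "n = 50", pos = 4)     # label at given x, y coordinates
axis(1); axis(2)                            # add the bottom and left axes
title(main = "Simulated data", xlab = "x", ylab = "y")
mtext("drawn piecemeal", side = 3, line = 0.5)  # text in the top margin

dev.off()
```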

VI. LATTICE FUNCTIONS


A. xyplot( ) 

xyplot( ) is the main function for creating scatter plots.

> library(lattice)
> library(nlme)
> xyplot(distance ~ age | Subject, data = Orthodont )
> xyplot(distance ~ age | Subject, data = Orthodont, type = "b" )
>

B. bwplot( ) 

bwplot( ) box-and-whiskers plots, also called box plots.

C. histogram( )

histogram( ) for histograms.

D. stripplot( )

stripplot( ) is like a box plot; however, it has actual points.

E. dotplot( ) 

dotplot( ) plots dots on "violin strings".

F. splom( )

splom( ) draws a scatter plot matrix, like pairs( ) in the base graphics system.

G. levelplot( )

levelplot( ) is a contour plot for plotting image data.

VII. LATTICE FUNCTIONS

Lattice functions take a formula for their first argument; e.g.:

y ~ x | f * g

On the left of the ~ is the y variable. On the right of the ~ is the x variable. 

After the | are the conditioning variables. The conditioning variables are optional.

The * indicates an interaction.

The data frame providing the variables in the formula is the second argument.

The parent frame is used if no data frame or list is passed.

There are defaults that can be used if no other arguments are passed.
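A sketch of the formula interface follows. The data frame df and its columns are hypothetical, simulated for illustration, and the example assumes the lattice package is installed, which is normally the case because it ships with R as a recommended package.

```r
library(lattice)

# Hypothetical data: y against x, conditioned on the factors f and g.
set.seed(5)        # arbitrary seed
df <- data.frame(
  x = rnorm(80),
  f = factor(rep(c("a", "b"), each = 40)),
  g = factor(rep(c("low", "high"), times = 40))
)
df$y <- 2 * df$x + rnorm(80)

# One panel for each combination of the levels of f and g.
p <- xyplot(y ~ x | f * g, data = df)

class(p)           # lattice functions return an object of class "trellis"
pdf(NULL)          # null device, used here to avoid opening a window
print(p)           # printing the object draws the plot
dev.off()
```

At the console the object is printed automatically, which is why a bare xyplot( ) call draws the plot; inside a function or a loop you need print( ).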

VIII. IMPORTANT GRAPHICS HELP PAGES

A. ?par 


B. ?plot

C. ?xyplot

D. ?plotmath

E. ?axis

Enjoy R Graphics.

Elcric Otto Circle







R DEBUGGING

REVISED: Saturday, March 2, 2013



Debugging R

Errors fall into three broad types: syntax errors, "object not found" errors, and everything else.

I.  R ERROR INDICATORS

A. MESSAGE

A generic notification or diagnostic message produced by the message( ) function; execution of the function continues.

B. WARNING

A warning indicates something is wrong but not fatal; execution of the function continues; generated by the warning( ) function.

C. ERROR

An error indicates an execution stopping fatal problem has occurred.  Errors are generated by the stop( ) function.

D. CONDITION

A condition is a generic concept for indicating that something unexpected can occur. Messages, warnings, and errors are all conditions, and programmers can create their own.
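The four indicators can be sketched with one small function. The function check_value and the condition class myCondition are hypothetical names used only for illustration.

```r
# Hypothetical function that signals each kind of condition.
check_value <- function(x) {
  message("checking value")                       # message: execution continues
  if (!is.numeric(x)) stop("x must be numeric")   # error: execution stops
  if (x < 0) warning("x is negative")             # warning: execution continues
  sqrt(abs(x))
}

# tryCatch( ) intercepts a condition and returns the handler's value instead.
res_error <- tryCatch(check_value("a"),
                      error = function(e) conditionMessage(e))

res_warning <- tryCatch(check_value(-4),
                        warning = function(w) conditionMessage(w))

res_ok <- suppressMessages(check_value(4))

# A custom condition is a list with its own class in front of "condition".
cond <- structure(list(message = "custom signal", call = NULL),
                  class = c("myCondition", "condition"))
res_custom <- withCallingHandlers(
  { signalCondition(cond); "finished" },
  myCondition = function(cnd) message("caught: ", conditionMessage(cnd)))
```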

II. FUNCTION ERRORS

How do you know something is wrong with your function?

A. DEBUGGING QUESTIONS

1. What was your input? How did you call the function?

2. What were you expecting; were you expecting output, message, other results?

3. What did you get?

4. How does what you get differ from what you were expecting?

5. Were your expectations correct in the first place?

6. Can you reproduce the problem exactly?

III. R DEBUGGING TOOLS

Most debugging takes place either through calls to browser( ) or debug( ).

A. traceback( ) 


traceback( ) prints out the function call stack after an error occurs; it does nothing if there is no error. traceback( ) is one of your most important debugging tools, so take the time to learn how to use it.

> str(traceback)
function (x = NULL, max.lines = getOption("deparse.max.lines"))
>

B. debug( ) 


debug( ) flags a user defined function for "debug mode" allowing you to step through your user defined function one line at a time.

> str(debug)
function (fun, text = "", condition = NULL)
>
 


You can single step through your R code line by line using debug( ) or browser( ).
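Flagging a function for debug mode can be sketched as follows. The function square_plus_one is hypothetical. The example only sets and clears the flag, because stepping through the function requires an interactive session.

```r
# Hypothetical function to step through.
square_plus_one <- function(x) {
  y <- x^2
  y + 1
}

debug(square_plus_one)        # flag the function for debug mode
isdebugged(square_plus_one)   # TRUE while the flag is set

# In an interactive session, calling square_plus_one(3) now would open the
# browser: n runs the next line, c continues to the end, and Q quits.

undebug(square_plus_one)      # remove the flag
flag_after <- isdebugged(square_plus_one)
value <- square_plus_one(3)   # runs normally again
```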

C. browser( ) 


browser( ) suspends the operation of a user defined function wherever it is called and puts the user defined function into debug mode.

> str(browser)
function (text = "", condition = NULL, expr = TRUE, skipCalls = 0L)
>

Use the command Q to exit browser( ) mode and return to the top-level prompt; use c to continue execution.

D. trace( ) 

trace( ) allows you to insert debugging code into a function at specific places.

> str(trace)
function (what, tracer, exit, at, print, signature, where = topenv(parent.frame( )), edit = FALSE)
>

trace( ) could be used with the lm( ) linear modeling function.

> str(lm)
function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...)  
>

E. recover( ) 

recover( ) allows you to modify the error behavior so that you can browse the error call stack.

> str(recover)
function ( )  
>

These are interactive tools designed to help you pick through a function. You can also insert print( ) or cat( ) statements in the function.
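A sketch of the cat( ) approach follows. The function mean_with_trace is hypothetical; the cat( ) line is temporary debugging output that you would remove once the problem is found.

```r
# Hypothetical function with a temporary cat( ) statement for tracing.
mean_with_trace <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    cat("i =", i, "total =", total, "\n")   # temporary debugging output
  }
  total / length(x)
}

result <- mean_with_trace(c(2, 4, 6))
```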

IV. COMMON ERRORS

A. SYNTAX ERRORS

Wrong type of closing brace, missing commas, and unmatched parentheses are common syntax mistakes.

B. OBJECT NOT FOUND

Errors of the "object not found" variety can have one of several causes. For example, the name is not spelled correctly; the capitalization is wrong; the package or file containing the object is not on the search list.

C. MISMATCHED OPERATORS

Trying to use operators intended for numeric values on character values causes an error.
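Two of these common errors can be reproduced safely by capturing the error message with tryCatch( ). The variable names are hypothetical, and the exact wording of the messages depends on your R version and language settings.

```r
value <- "10"                       # a character string, not a number

# Adding a number to a character value is an error; capture its message.
msg <- tryCatch(value + 1, error = function(e) conditionMessage(e))

# Converting the value first makes the arithmetic valid.
fixed <- as.numeric(value) + 1

# A wrongly capitalized name gives an "object not found" error.
msg2 <- tryCatch(Value + 1, error = function(e) conditionMessage(e))

msg
msg2
```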

V. REFERENCES

The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff (2009).

Enjoy debugging R!


Elcric Otto Circle







R SCOPING RULES

REVISED: Saturday, March 2, 2013




R binding.

I. R BINDING

A. search( ) FUNCTION

The search list can be found by using the search( ) function. When binding a value to a symbol, R first searches the "global environment", which is your workspace. Next, R searches the namespaces of each of the packages in the search list. R has separate namespaces for functions and non-functions, so it is possible to have an object named z and a function named z. The library( ) function is used to load a package.
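The search list and the separate treatment of functions and non-functions can be sketched as follows. Reusing the name c for a number is done here only for illustration; it is not recommended practice.

```r
search()              # ".GlobalEnv" comes first, "package:base" comes last

# A non-function object may share its name with a function.
c <- 5                # a number named c in the workspace
v <- c(c, 10)         # in call position, R still finds the function c( )
v

rm(c)                 # remove the workspace object again
```

When a name is used as a function call, R skips objects of that name that are not functions and keeps searching along the search list.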

B. SCOPING RULES

Scope determines where your variables have a value.

Scoping rules determine how a value is associated with a free variable in a function. A free variable is one that is neither a formal argument nor a local variable assigned inside the function body. R uses lexical, also called static, scoping. R uses the search list to bind a value to a symbol.

A function + an environment = a closure or function closure.

An environment is a collection of symbol value pairs. The value of free variables is searched for in the environment in which the function was defined. Typically, a function is defined in the global environment, and the value of the free variable is found in the user's workspace.

In R you can have functions defined in the body of other functions. In R it is possible for a function to return another function.

You can inspect a function's environment by calling ls(environment(f)), where f is the function; this lists the names of the objects in that environment. You can then retrieve a value by calling the get( ) function with the object name and the environment.
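A function that returns another function, and the inspection of the returned function's environment, can be sketched as follows. The names make_power, pow, cube, and square are hypothetical.

```r
# A function that returns another function (a closure).
make_power <- function(n) {
  pow <- function(x) x^n      # n is a free variable inside pow
  pow
}

cube   <- make_power(3)
square <- make_power(2)

cube(3)     # 27
square(3)   # 9

# Inspect the environment in which cube was defined.
ls(environment(cube))         # "n" "pow"
get("n", environment(cube))   # 3
```

Each call to make_power( ) creates a new environment, so cube and square carry different values of n.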

In lexical scoping the value of a free variable in a function is looked up in the environment in which the function was defined.

In dynamic scoping the value of a free variable in a function is looked up in the environment in which the function was called.

The parent frame is the calling environment in R.

When a function is defined in the global environment and subsequently called from the global environment, the defining environment and the calling environment are the same. This can sometimes give the appearance of dynamic scoping.
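The difference between lexical and dynamic scoping can be sketched with two small functions. The names f, g, and y are hypothetical.

```r
y <- 10

f <- function(x) {
  y <- 2            # local to f
  y^2 + g(x)
}

g <- function(x) {
  x * y             # y is a free variable in g
}

f(3)
# Lexical scoping (R): g looks up y where g was defined, so y is 10
# and the result is 2^2 + 3 * 10 = 34.
# Dynamic scoping would use the y in the caller f, so y would be 2
# and the result would be 2^2 + 3 * 2 = 10.
```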

Other languages that support lexical scoping include:
Scheme
Perl
Python
Common Lisp


In R all objects must be stored in memory. All functions must carry a pointer to their respective defining environments.

Enjoy R binding!

Elcric Otto Circle







Saturday, December 22, 2012

R FILE I/O

REVISED: Saturday, March 2, 2013




R I/O

An interactive session with input from the keyboard and output to the screen is begun by default when you start R.

I.  INPUT FUNCTIONS WHICH READ DATA INTO R

A. read.table( )

read.table( ) and read.csv( ), for reading tabular data.

B. readLines( )

Reading lines of a text file is performed by invoking the function readLines( ).

C. source( )

source( ) for reading in R code files (inverse of dump( )). The file is taken from the "current working directory" (CWD) if the file name does not include a path.

D. dget( )

dget( ) for reading in R code files (inverse of dput( )).

E. load( )

load( ) is for reading in saved workspaces.

F. unserialize( )

unserialize( ) is for reading in single R objects in binary form.
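Several of these input functions are the inverse of an output function, which can be sketched as a set of round trips. The object obj and the temporary files are hypothetical; source( ) is called with local = TRUE as an assumption, so that the object is re-created in the environment where the code runs.

```r
obj <- data.frame(a = 1:3, b = c("x", "y", "z"))   # hypothetical object

# dput( ) writes a text representation; dget( ) reads it back.
f1 <- tempfile(fileext = ".R")
dput(obj, file = f1)
obj1 <- dget(f1)

# dump( ) writes assignments by name; source( ) re-creates the objects.
f2 <- tempfile(fileext = ".R")
dump("obj", file = f2)
rm(obj)
source(f2, local = TRUE)

# save( ) and load( ) store and restore named objects in binary form.
f3 <- tempfile(fileext = ".rda")
save(obj, file = f3)
rm(obj)
load(f3)

# serialize( ) and unserialize( ) convert a single object to and from raw bytes.
raw_bytes <- serialize(obj, connection = NULL)
obj2 <- unserialize(raw_bytes)
```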

II. OUTPUT FUNCTIONS WHICH WRITE DATA FROM R TO FILES


A. write.table( )

B. writeLines( )

C. dump( )

D. dput( )

E. save( )

F. serialize( )

G. sink( )


The sink( ) function redirects R output from the console to a file or connection. sink( ) will not redirect graphic output.
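A sketch of sink( ) follows. The temporary file name is hypothetical. Calling sink( ) with no arguments ends the redirection.

```r
log_file <- tempfile(fileext = ".txt")   # hypothetical output file

sink(log_file)                  # start sending console output to the file
print(summary(c(1, 2, 3, 4)))
cat("done\n")
sink()                          # restore output to the console

captured <- readLines(log_file)
captured
```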

III. read.table( ) ARGUMENTS


> str(read.table)
function (file, header = FALSE, sep = "", quote = "\"'", dec = ".", row.names, 
    col.names, as.is = !stringsAsFactors, na.strings = "NA", colClasses = NA, 
    nrows = -1, skip = 0, check.names = TRUE, fill = !blank.lines.skip, 
    strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, 
    flush = FALSE, stringsAsFactors = default.stringsAsFactors(), fileEncoding = "", 
    encoding = "unknown", text)
>

A. file

file is the name of a file or connection.

B. header

header is a logical indicating if the file has a header.

C. sep

sep is a string indicating how the columns are separated. The default, sep = "", means the separator is white space: one or more spaces, tabs, or newlines.

D. colClasses

colClasses is a character vector indicating the class of each column in the dataset.

E. nrows

nrows is a number indicating the number of rows in the dataset.

F. skip

skip is a number indicating the number of lines to skip from the beginning.

G. comment.char

comment.char is a character string indicating the comment character.

H. stringsAsFactors

stringsAsFactors is a logical indicating whether character variables should be coded as factors.

IV. read.table( )


For small to moderately sized datasets, you can usually call read.table( ) without specifying any other arguments.


data <- read.table("foo.txt")


R will automatically:
Skip comment lines that begin with a # sign.
Determine the number of rows and required memory allocation.
Determine the type of variable in each column of the table.


Telling R these things directly makes R run faster and more efficiently.


read.csv( ) is the same as read.table( ) except a comma is the default separator.
Read the help page for read.table( ); it provides many helpful hints.

Make a rough calculation of the memory required to store your dataset. If the dataset memory requirement is larger than the RAM on your computer you should not use R until you acquire adequate RAM.
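The rough calculation can be sketched as simple arithmetic. The dimensions below are hypothetical; the example assumes every column is numeric, and a numeric value in R takes 8 bytes. The result covers the data alone, so treat it as a lower bound on the memory needed.

```r
rows  <- 1500000          # hypothetical number of rows
cols  <- 120              # hypothetical number of numeric columns
bytes <- rows * cols * 8  # each numeric value takes 8 bytes

bytes / 2^20              # size in MiB
bytes / 2^30              # size in GiB, roughly 1.34
```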

Set comment.char = "" if there are no commented lines in your file.

Use the colClasses argument.

Specifying this option, instead of using the default, can often make read.table( ) run twice as fast. You have to know the class of each column in your data frame in order to use this option. If all of the columns are "numeric", for example, you can just set colClasses = "numeric".

In order to determine each column's class, read the first 100 rows, collect the classes, and pass them to colClasses:
initial <- read.table("dataTable.txt", nrows = 100)
classes <- sapply(initial, class)
tabAll <- read.table("dataTable.txt", colClasses = classes)

Set nrows.
This does not make R run faster but it helps with memory usage.
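The hints above can be tried end to end on a small file. This is a sketch only: the table contents, the column names, and the temporary file are hypothetical, and the file is written first so that the example is self-contained.

```r
# Write a small hypothetical table to a temporary file.
path <- tempfile(fileext = ".txt")
df <- data.frame(id = 1:200, score = runif(200), group = rep(c("a", "b"), 100))
write.table(df, path, row.names = FALSE)

# Read the first rows only, to discover the class of each column.
initial <- read.table(path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)

# Read the whole file, telling R the classes and number of rows up front.
tabAll <- read.table(path, header = TRUE, colClasses = classes,
                     nrows = 200, comment.char = "")

str(tabAll)
```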

When you are trying to describe the object you can use either the str( ) or summary( ) diagnostic function.

file.exists( "filename.R" ) will return a TRUE or FALSE based on whether or not the file is in the current working directory.

Enjoy R I/O!

Elcric Otto Circle






