Categories
status

with, within, and transform in R.
I have found these three functions useful for working with data sets.
I don’t fully understand the explanation in the help pages and cannot guarantee that my explanation is correct, so here are some examples that show their behavior.
First of all, they need a data frame or a list, not a matrix. The three examples below each add a column that is the sum of the first two columns. Note the differences among the functions.

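  • with: returns the value of the expression (here, one column), which must then be assigned to the data frame
  • within: returns the whole data set, with the new column assigned inside the expression
  • transform: returns the whole data set, but the new column is given as a name = value argument rather than as an expression.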

with

> testwith <- data.frame(x1 = 1:10, x2 = 11:20)
> testwith$y <- with(testwith, {x1 + x2})
> testwith
   x1 x2  y
1   1 11 12
2   2 12 14
3   3 13 16
4   4 14 18
5   5 15 20
6   6 16 22
7   7 17 24
8   8 18 26
9   9 19 28
10 10 20 30

within

> testwith <- data.frame(x1 = 1:10, x2 = 11:20)
> testwith <- within(testwith, {y <- x1 + x2})
> testwith
   x1 x2  y
1   1 11 12
2   2 12 14
3   3 13 16
4   4 14 18
5   5 15 20
6   6 16 22
7   7 17 24
8   8 18 26
9   9 19 28
10 10 20 30

transform

> testwith <- data.frame(x1 = 1:10, x2 = 11:20)
> testwith <- transform(testwith, y = x1 + x2)
> testwith
   x1 x2  y
1   1 11 12
2   2 12 14
3   3 13 16
4   4 14 18
5   5 15 20
6   6 16 22
7   7 17 24
8   8 18 26
9   9 19 28
10 10 20 30
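
The three examples above give the same result, so here is a small sketch of where the functions start to differ. The data frame df and the names total, df.within, and df.transform are made up for illustration, and the comments reflect my understanding rather than the help page.

```r
# Hypothetical data frame, smaller than the one used above
df <- data.frame(x1 = 1:3, x2 = 11:13)

# with() returns the value of the expression, which need not be a column
total <- with(df, sum(x1 + x2))

# within() evaluates the block line by line, so z can use the new column y
df.within <- within(df, {
  y <- x1 + x2
  z <- y * 2
})

# transform() evaluates its arguments against the existing columns only,
# so a column created in the same call is not visible to later arguments;
# chaining two calls is one way around this
df.transform <- transform(transform(df, y = x1 + x2), z = y * 2)
```

Both df.within and df.transform end up with the columns y and z, although the order of the new columns may differ between the two.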

Some more examples at
http://stackoverflow.com/questions/1310247/in-r-do-you-use-attach-or-call-variables-by-name-or-slicing

Difference between using row names and using the entries of a column to select elements from a data frame.
Use row names to select elements from a data frame, if possible.
When using the names, the slicing index does not need to be the same length as the data,
but if the entries of one of the columns are used, the index and the data should be the same length.

Example

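test.df <- data.frame(cbind(letters[1:3], 1:3, 4:6))
row.names(test.df) <- letters[1:3]
# When the length of the index differs from the number of rows of the data frame,
# selecting by row names works, but comparing a column with the index does not:
# the shorter index is recycled (with a warning) and row 'c' is missed.
test.idx <- c('a', 'c')
test.df[test.idx, ]
test.df[test.df[[1]] == test.idx, ]  # assuming a comparison with test.idx was intended
# When the length of the index equals the number of rows of the data frame, both work
test.idx <- c('a', 'b', 'c')
test.df[test.idx, ]
test.df[test.df[[1]] == test.idx, ]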
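
If the entries of a column have to be used with an index of a different length, one possible way is %in%, which tests each entry for membership instead of comparing element by element. This is only a sketch: example.df, its column names, and the result names are invented for illustration.

```r
# Hypothetical data frame with the same layout as the example above
example.df <- data.frame(id = letters[1:3], v1 = 1:3, v2 = 4:6,
                         stringsAsFactors = FALSE)
row.names(example.df) <- example.df$id

idx <- c('a', 'c')

# Selecting by row names: the index can have any length
by.name <- example.df[idx, ]

# Selecting by column entries: %in% returns one TRUE/FALSE per row,
# so its result is always as long as the column
by.column <- example.df[example.df$id %in% idx, ]
```

Note that the row-name version returns rows in the order of the index, while the %in% version returns them in the order of the data.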

Different behaviors between data frame and matrix in R

> # Generate an artificial matrix
> test.m <- matrix(1:6, nrow = 3)
> row.names(test.m) <- c('x1', 'x2', 'x3')
> col.names(test.m) <- c('a', 'b')
Error in col.names(test.m) <- c("a", "b") :
  could not find function "col.names<-"
>
> # Generate a data frame from the matrix
> test.df <- as.data.frame(test.m)
>
>
> # Selecting elements
> ## the row names can be used to select elements from a data frame or a matrix
> test.idx <- c('x3', 'x1')
> test.df[test.idx, ]
   V1 V2
x3  3  6
x1  1  4
> test.m[test.idx, ]
   [,1] [,2]
x3    3    6
x1    1    4
>
> # Selecting elements with index having a name which is not in the data
> ## data frame returns NA rows
> ## matrix returns an error
> test.idx <- c('x4', 'x1')
> test.df[test.idx, ]
   V1 V2
NA NA NA
x1  1  4
> test.m[test.idx, ]
Error: subscript out of bounds
>
>
> # Duplicate row names
> ## duplicate row names are not allowed in data frame.
> ## duplicate row names are allowed in matrix.
> test.row.names <- c('x1', 'x2', 'x1')
> row.names(test.df) <- test.row.names
Error in `row.names<-.data.frame`(`*tmp*`, value = c("x1", "x2", "x1")) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘x1’
> rownames(test.m) <- test.row.names
>
> # names
> ## names() returns column name in data frame
> ## names() returns NULL in matrix
> names(test.df)
[1] "V1" "V2"
> names(test.m)
NULL
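
The col.names error near the top of the session happens because the function for setting column names is colnames(), which works for both matrices and data frames. Below is a short sketch that sets the column names this way and checks some of the differences in code instead of reading them off the console. The names m, df, df.missing, and m.missing.fails are made up for illustration.

```r
# Rebuild the matrix, this time with column names set via colnames()
m <- matrix(1:6, nrow = 3)
rownames(m) <- c('x1', 'x2', 'x3')
colnames(m) <- c('a', 'b')

df <- as.data.frame(m)

# A row name that is not in the data gives an NA row for a data frame ...
df.missing <- df[c('x4', 'x1'), ]

# ... and an error for a matrix; tryCatch() returns the error object
# so that it can be tested instead of stopping the script
m.missing.fails <- inherits(
  tryCatch(m[c('x4', 'x1'), ], error = function(e) e),
  "error"
)

# names() gives the column names of a data frame and NULL for a matrix,
# while colnames() gives the column names of both
names(df)
names(m)
colnames(m)
```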