Tag: R

Insert the current time into R history timestamp…

Visitez ce lien Insert the current time into R history

timestamp()

March 20, 2012

matchPWM is now part of Biostrings 2.7.47 in…

matchPWM() is now part of Biostrings 2.7.47 (in BioC devel, so you need R-2.7).

Load Biostrings:

   library(Biostrings)

Suppose 'pwm' contains a Position Weight Matrix, let's say:

   pwm <- rbind(A=c( 1,  0, 19, 20, 18,  1, 20,  7),
                C=c( 1,  0,  1,  0,  1, 18,  0,  2),
                G=c(17,  0,  0,  0,  1,  0,  0,  3),
                T=c( 1, 20,  0,  0,  0,  1,  0,  8))

Note that this is just a standard integer matrix with the 4 DNA base letters
as row names (having these row names is mandatory).
Some low-level utility functions are available for manipulating this kind of
matrix:

   > maxWeights(pwm)  # the max weight in each column
   [1] 17 20 19 20 18 18 20  8

   > maxScore(pwm)  # the max possible score
   [1] 140

Let's match 'pwm' against Human chr1:

   library(BSgenome.Hsapiens.UCSC.hg18)
   chr1 <- Hsapiens$chr1

Number of "best" matches:

   > countPWM(pwm, chr1, min.score="100%")  # takes about 5 seconds on my system
   [1] 5152

With a lower cut-off value:

   m <- matchPWM(pwm, chr1, min.score="90%")

See the 10 first matches ("first" means "smallest chromosome location", NOT "best"
matches):

   > m[1:10]
     Views on a 247249719-letter DNAString subject
   subject: TAACCCTAACCCTAACCCTAACCCTAACCCTAAC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
   views:
        start   end width
    [1] 31931 31938     8 [GTAAACAA]
    [2] 33324 33331     8 [GTAAACAT]
    [3] 38425 38432     8 [GTAAACAG]
    [4] 39177 39184     8 [GTAAACAC]
    [5] 46971 46978     8 [GTAAACAT]
    [6] 49952 49959     8 [GTAAACAT]
    [7] 70381 70388     8 [GTAAACAG]
    [8] 74359 74366     8 [GTAAACAC]
    [9] 90714 90721     8 [GTAAACAT]
   [10] 96544 96551     8 [GTAAACAC]

The speed could be improved, maybe by a factor 2 (or more, for longest PWMs).

Also maybe an additional argument could be added to let the user control how
the returned matches should be sorted ("left-to-right" or "best-first")?

Unlike MatInspector or the transfac-tool, there is no facility yet to suggest
individual cut-off values depending on the length of the PWMs.

See '?matchPWM' for more information (e.g. how to search the minus strand of
the chromosome).

https://stat.ethz.ch/pipermail/bioconductor/2008-April/022029.html

February 21, 2012

Save R function to a text file latex…
Save R function to a text file
```
latex.code <- function(){
   cat("\\begin{align}\n")
   cat("[X'X]^{-1}X'y\n")
   cat("\\end{align}\n")
}
sink(file='ols.txt')
latex.code()
sink()
```
http://stackoverflow.com/questions/6222565/how-can-i-save-text-to-a-file-in-r
February 21, 2012
Quick and easy submission of R on Sun…

Quick and easy submission of R on Sun Grid Engine

Easiest way to submit R jobs

Here are two scripts and a symlink I created to make it easy as possible to submit R jobs to your Grid:

qsub-R

If you normally do something along the lines of:

user@exec:~$ nohup nice R CMD BATCH toodles.R
Now all you need to do is:

user@submit:~$ qsub-R toodles.R
Your job 3540 (“toodles.R”) has been submitted
qsub-R is linked to submit-R, a script I wrote. It calls qsub and submits a simple shell wrapper with the R file as an argument. It ends up in the queue and eventually your output arrives in the current directory: toodles.R.o3540

Download it and install it. You’ll need to make the ‘qsub-R’ symlink to ‘3rd_party/uoa-dos/submit-R’ yourself, although there is one in the package already for lx24-x86: qsub-R.tar (10 KiB, tar)

http://www.stat.auckland.ac.nz/~kimihia/sun-grid#submitting

February 14, 2012

Subsetting ranged data with its values Example subset…

Subsetting ranged data with its values.
Example
subset reads according to the width

> paired.read.rd
RangedData with 997111 rows and 1 value column across 1 space
          space             ranges   |   strand
                    | 
1          chr4 [ 146855,  146898]   |        +
2          chr4 [1322462, 1322493]   |        -
3          chr4 [ 135547,  135703]   |        +
4          chr4 [ 965138,  965228]   |        -
5          chr4 [ 614464,  614606]   |        +
6          chr4 [ 274244,  274297]   |        +
7          chr4 [1191851, 1191994]   |        -
8          chr4 [ 310251,  310393]   |        +
9          chr4 [ 524981,  525273]   |        +
...         ...                ... ...      ...
997103     chr4 [1071785, 1071930]   |        -
997104     chr4 [ 819270,  819409]   |        -
997105     chr4 [ 951987,  952139]   |        +
997106     chr4 [ 327573,  327659]   |        -
997107     chr4 [ 343265,  343289]   |        -
997108     chr4 [ 615827,  615992]   |        +
997109     chr4 [ 615402,  615423]   |        -
997110     chr4 [ 128254,  128323]   |        +
997111     chr4 [ 659492,  659641]   |        -


> paired.read.rd[width(paired.read.rd) > 100, ]
RangedData with 623327 rows and 1 value column across 1 space
          space             ranges   |   strand
                    | 
1          chr4 [ 135547,  135703]   |        +
2          chr4 [ 614464,  614606]   |        +
3          chr4 [1191851, 1191994]   |        -
4          chr4 [ 310251,  310393]   |        +
5          chr4 [ 524981,  525273]   |        +
6          chr4 [1174028, 1174189]   |        -
7          chr4 [1174480, 1174655]   |        +
8          chr4 [ 869049,  869191]   |        -
9          chr4 [ 595415,  595565]   |        +
...         ...                ... ...      ...
623319     chr4 [ 646433,  646588]   |        +
623320     chr4 [ 227923,  228078]   |        -
623321     chr4 [1204606, 1204767]   |        -
623322     chr4 [1013562, 1013741]   |        -
623323     chr4 [1071785, 1071930]   |        -
623324     chr4 [ 819270,  819409]   |        -
623325     chr4 [ 951987,  952139]   |        +
623326     chr4 [ 615827,  615992]   |        +
623327     chr4 [ 659492,  659641]   |        -

Or subset() works, too.

January 30, 2012

To import a SAM file or other data…
To import a SAM file or other data having “#” as data using read.table, it is necessary to change the “comment.char” option.
```
test.sam <- read.table('test.sam', comment.char = "")
```
January 17, 2012
Import specific columns into R using read table…
Import specific columns into R using read.table
```
test.import <- read.table(pipe("cut -f1,2,4 data.tab"))
```
January 17, 2012
Retrieving rda into a different variable name in…
Retrieving .rda into a different variable name in R

1. Using environment
```
x <- 1
save(x, file = 'saved_x.rda')
Data.Env <- new.env()
load(file = 'saved_x.rda')
y <- Data.Env$x
```
2. Using saveRDS, readRDS for a single object
```
x <- 1
saveRDS(x, file = 'saved_x.rds')
y <- readRDS('saved_x.rds')
```
November 24, 2011

Convert a list of data frame into a…

Convert a list of data frame into a data frame
ldply in plyr package did awesome job.

I have a list with different length of data. I’d like to convert the list to a data frame with the name of the each list showing as a column. Then I can do grouping the data.
Here is an example

> test.table
$a
1 2 3 
1 1 1 
$b
1 2 3 4 
2 2 1 1 
$c
1 
1 
$d
1 3 4 
1 1 1 
$e
7 
1

> test.table.df <- ldply(test.table, data.frame)
> test.table.df
   .id Var1 Freq
1    a    1    1
2    a    2    1
3    a    3    1
4    b    1    2
5    b    2    2
6    b    3    1
7    b    4    1
8    c    1    1
9    d    1    1
10   d    3    1
11   d    4    1
12   e    7    1

Then I can do summary or grouping the data

> tapply(test.table.df$Freq, test.table.df$Var1, sum)
1 2 3 4 7 
5 3 3 2 1

> ddply(test.table.df, 'Var1', function(x) sum(x$Freq))
  Var1 V1
1    1  5
2    2  3
3    3  3
4    4  2
5    7  1

Final touch. As the grouping variable, Var1 in this example is factor. So if you want to use the level names or labels instead of levels, then you need to convert the type.

> as.numeric(levels(test.count.df$Var1))[test.count.df$Var1]
[1] 1 2 3 4 7
> test.count.df$dist <- with(test.count.df, as.numeric(levels(Var1))[Var1])                                                                                   
> test.count.df
  Var1 V1 dist
1    1  5    1
2    2  3    2
3    3  3    3
4    4  2    4
5    7  1    7

http://stackoverflow.com/questions/2851327/r-converting-a-list-of-data-frames-into-one-data-frame
http://tolstoy.newcastle.edu.au/R/help/03a/6325.html

November 16, 2011

Do NOT use 1 length x in for…

Do NOT use 1:length(x) in for statement in R.
Because when x is empty, the loop will run over 1:0, which is twice.

> y <- as.numeric()
> length(y)
[1] 0
> for (i in 1:length(y)) { print("y")}
[1] "y"
[1] "y"
> for (i in 1:seq(along=y)) { print("y")}
Error in 1:seq(along = y) : argument of length 0

November 14, 2011