Visitez ce lien Insert the current time into R history
timestamp()
Visitez ce lien Insert the current time into R history
timestamp()
matchPWM() is now part of Biostrings 2.7.47 (in BioC devel, so you need R-2.7).
Load Biostrings:
library(Biostrings)
Suppose 'pwm' contains a Position Weight Matrix, let's say:
pwm <- rbind(A=c( 1, 0, 19, 20, 18, 1, 20, 7),
C=c( 1, 0, 1, 0, 1, 18, 0, 2),
G=c(17, 0, 0, 0, 1, 0, 0, 3),
T=c( 1, 20, 0, 0, 0, 1, 0, 8))
Note that this is just a standard integer matrix with the 4 DNA base letters
as row names (having these row names is mandatory).
Some low-level utility functions are available for manipulating this kind of
matrix:
> maxWeights(pwm) # the max weight in each column
[1] 17 20 19 20 18 18 20 8
> maxScore(pwm) # the max possible score
[1] 140
Let's match 'pwm' against Human chr1:
library(BSgenome.Hsapiens.UCSC.hg18)
chr1 <- Hsapiens$chr1
Number of "best" matches:
> countPWM(pwm, chr1, min.score="100%") # takes about 5 seconds on my system
[1] 5152
With a lower cut-off value:
m <- matchPWM(pwm, chr1, min.score="90%")
See the 10 first matches ("first" means "smallest chromosome location", NOT "best"
matches):
> m[1:10]
Views on a 247249719-letter DNAString subject
subject: TAACCCTAACCCTAACCCTAACCCTAACCCTAAC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
views:
start end width
[1] 31931 31938 8 [GTAAACAA]
[2] 33324 33331 8 [GTAAACAT]
[3] 38425 38432 8 [GTAAACAG]
[4] 39177 39184 8 [GTAAACAC]
[5] 46971 46978 8 [GTAAACAT]
[6] 49952 49959 8 [GTAAACAT]
[7] 70381 70388 8 [GTAAACAG]
[8] 74359 74366 8 [GTAAACAC]
[9] 90714 90721 8 [GTAAACAT]
[10] 96544 96551 8 [GTAAACAC]
The speed could be improved, maybe by a factor 2 (or more, for longest PWMs).
Also maybe an additional argument could be added to let the user control how
the returned matches should be sorted ("left-to-right" or "best-first")?
Unlike MatInspector or the transfac-tool, there is no facility yet to suggest
individual cut-off values depending on the length of the PWMs.
See '?matchPWM' for more information (e.g. how to search the minus strand of
the chromosome).
https://stat.ethz.ch/pipermail/bioconductor/2008-April/022029.html
Save R function to a text file
latex.code <- function(){
cat("\\begin{align}\n")
cat("[X'X]^{-1}X'y\n")
cat("\\end{align}\n")
}
sink(file='ols.txt')
latex.code()
sink()
http://stackoverflow.com/questions/6222565/how-can-i-save-text-to-a-file-in-r
Quick and easy submission of R on Sun Grid Engine
Easiest way to submit R jobs
Here are two scripts and a symlink I created to make it easy as possible to submit R jobs to your Grid:
qsub-R
If you normally do something along the lines of:
user@exec:~$ nohup nice R CMD BATCH toodles.R
Now all you need to do is:
user@submit:~$ qsub-R toodles.R
Your job 3540 (“toodles.R”) has been submitted
qsub-R is linked to submit-R, a script I wrote. It calls qsub and submits a simple shell wrapper with the R file as an argument. It ends up in the queue and eventually your output arrives in the current directory: toodles.R.o3540
Download it and install it. You’ll need to make the ‘qsub-R’ symlink to ‘3rd_party/uoa-dos/submit-R’ yourself, although there is one in the package already for lx24-x86: qsub-R.tar (10 KiB, tar)
http://www.stat.auckland.ac.nz/~kimihia/sun-grid#submitting
Subsetting ranged data with its values.
Example
subset reads according to the width
> paired.read.rd
RangedData with 997111 rows and 1 value column across 1 space
space ranges | strand
|
1 chr4 [ 146855, 146898] | +
2 chr4 [1322462, 1322493] | -
3 chr4 [ 135547, 135703] | +
4 chr4 [ 965138, 965228] | -
5 chr4 [ 614464, 614606] | +
6 chr4 [ 274244, 274297] | +
7 chr4 [1191851, 1191994] | -
8 chr4 [ 310251, 310393] | +
9 chr4 [ 524981, 525273] | +
... ... ... ... ...
997103 chr4 [1071785, 1071930] | -
997104 chr4 [ 819270, 819409] | -
997105 chr4 [ 951987, 952139] | +
997106 chr4 [ 327573, 327659] | -
997107 chr4 [ 343265, 343289] | -
997108 chr4 [ 615827, 615992] | +
997109 chr4 [ 615402, 615423] | -
997110 chr4 [ 128254, 128323] | +
997111 chr4 [ 659492, 659641] | -
> paired.read.rd[width(paired.read.rd) > 100, ]
RangedData with 623327 rows and 1 value column across 1 space
space ranges | strand
|
1 chr4 [ 135547, 135703] | +
2 chr4 [ 614464, 614606] | +
3 chr4 [1191851, 1191994] | -
4 chr4 [ 310251, 310393] | +
5 chr4 [ 524981, 525273] | +
6 chr4 [1174028, 1174189] | -
7 chr4 [1174480, 1174655] | +
8 chr4 [ 869049, 869191] | -
9 chr4 [ 595415, 595565] | +
... ... ... ... ...
623319 chr4 [ 646433, 646588] | +
623320 chr4 [ 227923, 228078] | -
623321 chr4 [1204606, 1204767] | -
623322 chr4 [1013562, 1013741] | -
623323 chr4 [1071785, 1071930] | -
623324 chr4 [ 819270, 819409] | -
623325 chr4 [ 951987, 952139] | +
623326 chr4 [ 615827, 615992] | +
623327 chr4 [ 659492, 659641] | -
Or subset() works, too.
To import a SAM file or other data having “#” as data using read.table, it is necessary to change the “comment.char” option.
test.sam <- read.table('test.sam', comment.char = "")
Import specific columns into R using read.table
test.import <- read.table(pipe("cut -f1,2,4 data.tab"))
Retrieving .rda into a different variable name in R
1. Using environment
x <- 1 save(x, file = 'saved_x.rda') Data.Env <- new.env() load(file = 'saved_x.rda') y <- Data.Env$x
2. Using saveRDS, readRDS for a single object
x <- 1
saveRDS(x, file = 'saved_x.rds')
y <- readRDS('saved_x.rds')
Convert a list of data frame into a data frame
ldply in plyr package did awesome job.
I have a list with different length of data. I’d like to convert the list to a data frame with the name of the each list showing as a column. Then I can do grouping the data.
Here is an example
> test.table $a 1 2 3 1 1 1 $b 1 2 3 4 2 2 1 1 $c 1 1 $d 1 3 4 1 1 1 $e 7 1
> test.table.df <- ldply(test.table, data.frame) > test.table.df .id Var1 Freq 1 a 1 1 2 a 2 1 3 a 3 1 4 b 1 2 5 b 2 2 6 b 3 1 7 b 4 1 8 c 1 1 9 d 1 1 10 d 3 1 11 d 4 1 12 e 7 1
Then I can do summary or grouping the data
> tapply(test.table.df$Freq, test.table.df$Var1, sum) 1 2 3 4 7 5 3 3 2 1
or
> ddply(test.table.df, 'Var1', function(x) sum(x$Freq)) Var1 V1 1 1 5 2 2 3 3 3 3 4 4 2 5 7 1
Final touch. As the grouping variable, Var1 in this example is factor. So if you want to use the level names or labels instead of levels, then you need to convert the type.
> as.numeric(levels(test.count.df$Var1))[test.count.df$Var1] [1] 1 2 3 4 7 > test.count.df$dist <- with(test.count.df, as.numeric(levels(Var1))[Var1]) > test.count.df Var1 V1 dist 1 1 5 1 2 2 3 2 3 3 3 3 4 4 2 4 5 7 1 7
http://stackoverflow.com/questions/2851327/r-converting-a-list-of-data-frames-into-one-data-frame
http://tolstoy.newcastle.edu.au/R/help/03a/6325.html
Do NOT use 1:length(x) in for statement in R.
Because when x is empty, the loop will run over 1:0, which is twice.
> y <- as.numeric()
> length(y)
[1] 0
> for (i in 1:length(y)) { print("y")}
[1] "y"
[1] "y"
> for (i in 1:seq(along=y)) { print("y")}
Error in 1:seq(along = y) : argument of length 0