Tag: R

  • Insert the current time into R history timestamp…

    Visitez ce lien Insert the current time into R history

    timestamp()

  • matchPWM is now part of Biostrings 2.7.47 in…

    matchPWM() is now part of Biostrings 2.7.47 (in BioC devel, so you need R-2.7).
    
    Load Biostrings:
    
       library(Biostrings)
    
    Suppose 'pwm' contains a Position Weight Matrix, let's say:
    
       pwm <- rbind(A=c( 1,  0, 19, 20, 18,  1, 20,  7),
                    C=c( 1,  0,  1,  0,  1, 18,  0,  2),
                    G=c(17,  0,  0,  0,  1,  0,  0,  3),
                    T=c( 1, 20,  0,  0,  0,  1,  0,  8))
    
    Note that this is just a standard integer matrix with the 4 DNA base letters
    as row names (having these row names is mandatory).
    Some low-level utility functions are available for manipulating this kind of
    matrix:
    
       > maxWeights(pwm)  # the max weight in each column
       [1] 17 20 19 20 18 18 20  8
    
       > maxScore(pwm)  # the max possible score
       [1] 140
    
    Let's match 'pwm' against Human chr1:
    
       library(BSgenome.Hsapiens.UCSC.hg18)
       chr1 <- Hsapiens$chr1
    
    Number of "best" matches:
    
       > countPWM(pwm, chr1, min.score="100%")  # takes about 5 seconds on my system
       [1] 5152
    
    With a lower cut-off value:
    
       m <- matchPWM(pwm, chr1, min.score="90%")
    
    See the 10 first matches ("first" means "smallest chromosome location", NOT "best"
    matches):
    
       > m[1:10]
         Views on a 247249719-letter DNAString subject
       subject: TAACCCTAACCCTAACCCTAACCCTAACCCTAAC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
       views:
            start   end width
        [1] 31931 31938     8 [GTAAACAA]
        [2] 33324 33331     8 [GTAAACAT]
        [3] 38425 38432     8 [GTAAACAG]
        [4] 39177 39184     8 [GTAAACAC]
        [5] 46971 46978     8 [GTAAACAT]
        [6] 49952 49959     8 [GTAAACAT]
        [7] 70381 70388     8 [GTAAACAG]
        [8] 74359 74366     8 [GTAAACAC]
        [9] 90714 90721     8 [GTAAACAT]
       [10] 96544 96551     8 [GTAAACAC]
    
    The speed could be improved, maybe by a factor 2 (or more, for longest PWMs).
    
    Also maybe an additional argument could be added to let the user control how
    the returned matches should be sorted ("left-to-right" or "best-first")?
    
    Unlike MatInspector or the transfac-tool, there is no facility yet to suggest
    individual cut-off values depending on the length of the PWMs.
    
    See '?matchPWM' for more information (e.g. how to search the minus strand of
    the chromosome).
    

    https://stat.ethz.ch/pipermail/bioconductor/2008-April/022029.html

  • Save R function to a text file latex…

    Save R function to a text file

    latex.code <- function(){
       cat("\\begin{align}\n")
       cat("[X'X]^{-1}X'y\n")
       cat("\\end{align}\n")
    }
    sink(file='ols.txt')
    latex.code()
    sink()
    

    http://stackoverflow.com/questions/6222565/how-can-i-save-text-to-a-file-in-r

  • Quick and easy submission of R on Sun…

    Quick and easy submission of R on Sun Grid Engine

    Easiest way to submit R jobs

    Here are two scripts and a symlink I created to make it easy as possible to submit R jobs to your Grid:

    qsub-R

    If you normally do something along the lines of:

    user@exec:~$ nohup nice R CMD BATCH toodles.R
    Now all you need to do is:

    user@submit:~$ qsub-R toodles.R
    Your job 3540 (“toodles.R”) has been submitted
    qsub-R is linked to submit-R, a script I wrote. It calls qsub and submits a simple shell wrapper with the R file as an argument. It ends up in the queue and eventually your output arrives in the current directory: toodles.R.o3540

    Download it and install it. You’ll need to make the ‘qsub-R’ symlink to ‘3rd_party/uoa-dos/submit-R’ yourself, although there is one in the package already for lx24-x86: qsub-R.tar (10 KiB, tar)

    http://www.stat.auckland.ac.nz/~kimihia/sun-grid#submitting

  • Subsetting ranged data with its values Example subset…

    Subsetting ranged data with its values.
    Example
    subset reads according to the width

    > paired.read.rd
    RangedData with 997111 rows and 1 value column across 1 space
              space             ranges   |   strand
                        | 
    1          chr4 [ 146855,  146898]   |        +
    2          chr4 [1322462, 1322493]   |        -
    3          chr4 [ 135547,  135703]   |        +
    4          chr4 [ 965138,  965228]   |        -
    5          chr4 [ 614464,  614606]   |        +
    6          chr4 [ 274244,  274297]   |        +
    7          chr4 [1191851, 1191994]   |        -
    8          chr4 [ 310251,  310393]   |        +
    9          chr4 [ 524981,  525273]   |        +
    ...         ...                ... ...      ...
    997103     chr4 [1071785, 1071930]   |        -
    997104     chr4 [ 819270,  819409]   |        -
    997105     chr4 [ 951987,  952139]   |        +
    997106     chr4 [ 327573,  327659]   |        -
    997107     chr4 [ 343265,  343289]   |        -
    997108     chr4 [ 615827,  615992]   |        +
    997109     chr4 [ 615402,  615423]   |        -
    997110     chr4 [ 128254,  128323]   |        +
    997111     chr4 [ 659492,  659641]   |        -
    
    
    > paired.read.rd[width(paired.read.rd) > 100, ]
    RangedData with 623327 rows and 1 value column across 1 space
              space             ranges   |   strand
                        | 
    1          chr4 [ 135547,  135703]   |        +
    2          chr4 [ 614464,  614606]   |        +
    3          chr4 [1191851, 1191994]   |        -
    4          chr4 [ 310251,  310393]   |        +
    5          chr4 [ 524981,  525273]   |        +
    6          chr4 [1174028, 1174189]   |        -
    7          chr4 [1174480, 1174655]   |        +
    8          chr4 [ 869049,  869191]   |        -
    9          chr4 [ 595415,  595565]   |        +
    ...         ...                ... ...      ...
    623319     chr4 [ 646433,  646588]   |        +
    623320     chr4 [ 227923,  228078]   |        -
    623321     chr4 [1204606, 1204767]   |        -
    623322     chr4 [1013562, 1013741]   |        -
    623323     chr4 [1071785, 1071930]   |        -
    623324     chr4 [ 819270,  819409]   |        -
    623325     chr4 [ 951987,  952139]   |        +
    623326     chr4 [ 615827,  615992]   |        +
    623327     chr4 [ 659492,  659641]   |        -
    

    Or subset() works, too.

  • To import a SAM file or other data…

    To import a SAM file or other data having “#” as data using read.table, it is necessary to change the “comment.char” option.

    test.sam <- read.table('test.sam', comment.char = "")
    
  • Import specific columns into R using read table…

    Import specific columns into R using read.table

    test.import <- read.table(pipe("cut -f1,2,4 data.tab"))
    
  • Retrieving rda into a different variable name in…

    Retrieving .rda into a different variable name in R

    1. Using environment

    x <- 1
    save(x, file = 'saved_x.rda')
    Data.Env <- new.env()
    load(file = 'saved_x.rda')
    y <- Data.Env$x
    

    2. Using saveRDS, readRDS for a single object

    x <- 1
    saveRDS(x, file = 'saved_x.rds')
    y <- readRDS('saved_x.rds')
    
  • Convert a list of data frame into a…

    Convert a list of data frame into a data frame
    ldply in plyr package did awesome job.

    I have a list with different length of data. I’d like to convert the list to a data frame with the name of the each list showing as a column. Then I can do grouping the data.
    Here is an example

    > test.table
    $a
    1 2 3 
    1 1 1 
    $b
    1 2 3 4 
    2 2 1 1 
    $c
    1 
    1 
    $d
    1 3 4 
    1 1 1 
    $e
    7 
    1 
    
    > test.table.df <- ldply(test.table, data.frame)
    > test.table.df
       .id Var1 Freq
    1    a    1    1
    2    a    2    1
    3    a    3    1
    4    b    1    2
    5    b    2    2
    6    b    3    1
    7    b    4    1
    8    c    1    1
    9    d    1    1
    10   d    3    1
    11   d    4    1
    12   e    7    1
    

    Then I can do summary or grouping the data

    > tapply(test.table.df$Freq, test.table.df$Var1, sum)
    1 2 3 4 7 
    5 3 3 2 1 
    

    or

    > ddply(test.table.df, 'Var1', function(x) sum(x$Freq))
      Var1 V1
    1    1  5
    2    2  3
    3    3  3
    4    4  2
    5    7  1
    

    Final touch. As the grouping variable, Var1 in this example is factor. So if you want to use the level names or labels instead of levels, then you need to convert the type.

    > as.numeric(levels(test.count.df$Var1))[test.count.df$Var1]
    [1] 1 2 3 4 7
    > test.count.df$dist <- with(test.count.df, as.numeric(levels(Var1))[Var1])                                                                                   
    > test.count.df
      Var1 V1 dist
    1    1  5    1
    2    2  3    2
    3    3  3    3
    4    4  2    4
    5    7  1    7
    

    http://stackoverflow.com/questions/2851327/r-converting-a-list-of-data-frames-into-one-data-frame
    http://tolstoy.newcastle.edu.au/R/help/03a/6325.html

  • Do NOT use 1 length x in for…

    Do NOT use 1:length(x) in for statement in R.
    Because when x is empty, the loop will run over 1:0, which is twice.

    > y <- as.numeric()
    > length(y)
    [1] 0
    > for (i in 1:length(y)) { print("y")}
    [1] "y"
    [1] "y"
    > for (i in 1:seq(along=y)) { print("y")}
    Error in 1:seq(along = y) : argument of length 0