Category: status

  • Subsetting ranged data with its values Example subset…

    Subsetting ranged data by its values.
    Example: subset reads according to their width.

    > paired.read.rd
    RangedData with 997111 rows and 1 value column across 1 space
              space             ranges   |   strand
                        | 
    1          chr4 [ 146855,  146898]   |        +
    2          chr4 [1322462, 1322493]   |        -
    3          chr4 [ 135547,  135703]   |        +
    4          chr4 [ 965138,  965228]   |        -
    5          chr4 [ 614464,  614606]   |        +
    6          chr4 [ 274244,  274297]   |        +
    7          chr4 [1191851, 1191994]   |        -
    8          chr4 [ 310251,  310393]   |        +
    9          chr4 [ 524981,  525273]   |        +
    ...         ...                ... ...      ...
    997103     chr4 [1071785, 1071930]   |        -
    997104     chr4 [ 819270,  819409]   |        -
    997105     chr4 [ 951987,  952139]   |        +
    997106     chr4 [ 327573,  327659]   |        -
    997107     chr4 [ 343265,  343289]   |        -
    997108     chr4 [ 615827,  615992]   |        +
    997109     chr4 [ 615402,  615423]   |        -
    997110     chr4 [ 128254,  128323]   |        +
    997111     chr4 [ 659492,  659641]   |        -
    
    
    > paired.read.rd[width(paired.read.rd) > 100, ]
    RangedData with 623327 rows and 1 value column across 1 space
              space             ranges   |   strand
                        | 
    1          chr4 [ 135547,  135703]   |        +
    2          chr4 [ 614464,  614606]   |        +
    3          chr4 [1191851, 1191994]   |        -
    4          chr4 [ 310251,  310393]   |        +
    5          chr4 [ 524981,  525273]   |        +
    6          chr4 [1174028, 1174189]   |        -
    7          chr4 [1174480, 1174655]   |        +
    8          chr4 [ 869049,  869191]   |        -
    9          chr4 [ 595415,  595565]   |        +
    ...         ...                ... ...      ...
    623319     chr4 [ 646433,  646588]   |        +
    623320     chr4 [ 227923,  228078]   |        -
    623321     chr4 [1204606, 1204767]   |        -
    623322     chr4 [1013562, 1013741]   |        -
    623323     chr4 [1071785, 1071930]   |        -
    623324     chr4 [ 819270,  819409]   |        -
    623325     chr4 [ 951987,  952139]   |        +
    623326     chr4 [ 615827,  615992]   |        +
    623327     chr4 [ 659492,  659641]   |        -
    

    Alternatively, subset() works too.

  • Exchange current window with the next on Ctrl…

    Exchange the current window with the next one (Vim).
    Ctrl-w x

  • The Minus File Again following the lead of…

    The Minus File
    Again following the lead of the standard shell utilities, Perl’s open function treats a file whose name is a single minus, “-”, in a special way. If you open minus for reading, it really means to access the standard input. If you open minus for writing, it really means to access the standard output.

    UPDATE
    ========
    The minus file does not work with 3-argument open; there, “-” is taken as a literal file name.

    Bryan> How can I use the “safe” 3-argument open and still be able to read off a
    Bryan> pipe?

    You don’t. 2-arg open has to be good for something.

    And 2-arg open is perfectly safe if the second arg is a literal:

    open OTHER, "<-" or die;
    open my $handle, "<-" or die;

    Don't let anyone tell you "Always use 3-arg open" unless they also footnote it with "unless you have no variables involved".

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
    See http://methodsandmessages.posterous.com/ for Smalltalk discussion
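
    The shell convention that Perl follows here can be seen with cat, which reads standard input when given “-” as a file name. A minimal sketch; the sample text is made up for illustration:

    ```shell
    # "-" as a file name tells cat to read standard input.
    out=$(printf 'from stdin\n' | cat -)
    echo "$out"
    ```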

  • Nice tips of unix command tools 123 398…

    Nice tips of unix command tools

    123   398  17359
    317    19   2909
     39  -399  -5789
     49    33    200
    255    33   -378
    
    sort -n file 
    

    Sort the file by first column. The -n option ensures numeric (as opposed to lexicographic) sort.

    sort -k 2 -n file 
    

    Sort the file by second column. The “-k” option here denotes the column used as sort key.
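
    Note that “-k 2” starts the sort key at field 2 and extends it to the end of the line; writing the key as “-k2,2” limits it to the second column only. A small sketch that recreates the sample data in a temporary file (the variable names are arbitrary):

    ```shell
    # Recreate the sample data in a temporary file.
    tmp=$(mktemp)
    printf '%s\n' '123 398 17359' '317 19 2909' '39 -399 -5789' '49 33 200' '255 33 -378' > "$tmp"

    # Sort numerically on the second column only (-k2,2n) and keep the first
    # line, i.e. the row with the smallest second-column value.
    smallest=$(sort -k2,2n "$tmp" | head -n 1)
    echo "$smallest"
    rm -f "$tmp"
    ```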

    grep '33' file 
    

    Extract all lines containing the string “33” (in the above example, lines 4 and 5).

    grep -c '33' file 
    

    Same, but display only the number of matching lines (2 in the example), not the lines themselves. This is useful for analyzing large files of output data. For example, if a sequence of one million integers is saved as a file, one per line, “grep -c '^0$' file” will display the number of 0’s in that sequence.
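
    To count lines that consist of exactly “0”, the pattern can be anchored with “^” and “$”. A minimal sketch, with a made-up five-line sequence standing in for the large file:

    ```shell
    # Five integers, one per line; two of them are exactly 0.
    # The anchors keep 10 and 305 from being counted.
    zeros=$(printf '%s\n' 0 10 0 305 7 | grep -c '^0$')
    echo "$zeros"
    ```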

    grep -c '-' *.out 
    

    The same kind of count, here of lines containing “-”, applied to all files in the current directory matching “*.out”. For each file there is an output line of the form “filename:x”, where “x” is the number of matching lines in that file.

    sort -n file | cat -n 
    

    Sort the file, then prepend line numbers to each line. This results in the following:

         1	 39  -399  -5789
         2	 49    33    200
         3	123   398  17359
         4	255    33   -378
         5	317    19   2909
    

    http://www.math.uiuc.edu/~hildebr/computer/unixtips.html

  • To import a SAM file or other data…

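    To import a SAM file, or any other data containing “#” in its fields, with read.table, set the “comment.char” option to an empty string. By default read.table treats “#” as the start of a comment and drops the rest of the line.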

    test.sam <- read.table('test.sam', comment.char = "")
    
  • Import specific columns into R using read table…

    Import specific columns into R using read.table

    test.import <- read.table(pipe("cut -f1,2,4 data.tab"))
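
    Here cut does the column selection before R sees the data. What it emits can be checked directly in the shell; the one-line tab-separated input below is invented for illustration:

    ```shell
    # Build one tab-separated row with four fields, then keep fields 1, 2 and 4.
    row=$(printf 'chr4\t146855\t146898\t+\n' | cut -f1,2,4)
    echo "$row"
    ```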
    
  • More discussions about the coordinate systems http biostar…

    More discussions about the coordinate systems

    http://biostar.stackexchange.com/questions/6394/what-are-the-advantages-disadvantages-of-one-based-vs-zero-based-genome-coordina
    http://bergmanlab.smith.man.ac.uk/?p=36

  • 0 based half open coordinate system or interbase…

    0-based, half-open coordinate system or interbase coordinate system

    | A | C | G | T | A | C | G | T |
    0   1   2   3   4   5   6   7   8
    

    So if you wanted to describe the first base, it would be:

    chr 0 1

    and GTAC:

    chr 2 6

    http://biostar.stackexchange.com/questions/7064/bed-coordinates
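
    Converting an interbase interval to the 1-based, inclusive positions that tools such as cut -c expect only requires adding 1 to the start; the length is simply end minus start. A sketch using the sequence from the diagram (the variable names are arbitrary):

    ```shell
    seq_str=ACGTACGT
    start=2   # 0-based, half-open start
    end=6     # 0-based, half-open end (exclusive)

    # cut -c counts characters from 1 and includes both endpoints,
    # so the interval [start, end) becomes (start+1)-end.
    sub=$(printf '%s\n' "$seq_str" | cut -c "$((start + 1))-$end")
    len=$((end - start))
    echo "$sub $len"
    ```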

    Update:
    Interbase coordinates (top) and base-oriented (below) from Chado Wiki

  • Explains the SAM flags http picard sourceforge net…

    Explains the SAM flags.

    http://picard.sourceforge.net/explain-flags.html
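
    The flag is a bit field, so individual properties can be tested with a bitwise AND. A sketch in shell arithmetic; the bit values are those defined in the SAM specification, and the flag value 99 is chosen only for illustration:

    ```shell
    flag=99   # 1 + 2 + 32 + 64

    # Test single bits of the flag; each result is 1 if the bit is set, else 0.
    paired=$(( (flag & 1) != 0 ))          # 0x1:  read is paired
    proper=$(( (flag & 2) != 0 ))          # 0x2:  mapped in a proper pair
    reverse=$(( (flag & 16) != 0 ))        # 0x10: read on the reverse strand
    mate_reverse=$(( (flag & 32) != 0 ))   # 0x20: mate on the reverse strand
    first=$(( (flag & 64) != 0 ))          # 0x40: first read in the pair
    echo "$paired $proper $reverse $mate_reverse $first"
    ```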