Categories
status

Mask repetitive region using BEDtools intersectBed a reads…

Mask repetitive region using BEDtools.

intersectBed -a reads.bed -b repeat_region.bed -v
Categories
status

Viewing huge text files between specific lines $…

Viewing a huge text file between specific lines (here, lines 101 to 110).

$ sed -n 101,110p /var/log/cron

10 Awesome Examples for Viewing Huge Log Files in Unix
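On a very large file, the sed command above keeps reading to the end of the file even after line 110 has been printed. A minimal sketch of the same range print that quits right after the last wanted line; demo.log is a hypothetical stand-in generated with seq, not /var/log/cron:

```shell
# Build a hypothetical 100000-line file to stand in for a huge log.
tmpdir=$(mktemp -d)
seq 1 100000 > "$tmpdir/demo.log"

# Print lines 101-110, then quit at line 110 so sed stops reading the file.
lines=$(sed -n '101,110p;110q' "$tmpdir/demo.log")
printf '%s\n' "$lines"

rm -rf "$tmpdir"
```

The output is identical to the plain range form; only the amount of input read changes.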

Categories
status

Subsetting ranged data with its values Example subset…

Subsetting ranged data with its values.
Example
subset reads according to the width

> paired.read.rd
RangedData with 997111 rows and 1 value column across 1 space
          space             ranges   |   strand
                    | 
1          chr4 [ 146855,  146898]   |        +
2          chr4 [1322462, 1322493]   |        -
3          chr4 [ 135547,  135703]   |        +
4          chr4 [ 965138,  965228]   |        -
5          chr4 [ 614464,  614606]   |        +
6          chr4 [ 274244,  274297]   |        +
7          chr4 [1191851, 1191994]   |        -
8          chr4 [ 310251,  310393]   |        +
9          chr4 [ 524981,  525273]   |        +
...         ...                ... ...      ...
997103     chr4 [1071785, 1071930]   |        -
997104     chr4 [ 819270,  819409]   |        -
997105     chr4 [ 951987,  952139]   |        +
997106     chr4 [ 327573,  327659]   |        -
997107     chr4 [ 343265,  343289]   |        -
997108     chr4 [ 615827,  615992]   |        +
997109     chr4 [ 615402,  615423]   |        -
997110     chr4 [ 128254,  128323]   |        +
997111     chr4 [ 659492,  659641]   |        -


> paired.read.rd[width(paired.read.rd) > 100, ]
RangedData with 623327 rows and 1 value column across 1 space
          space             ranges   |   strand
                    | 
1          chr4 [ 135547,  135703]   |        +
2          chr4 [ 614464,  614606]   |        +
3          chr4 [1191851, 1191994]   |        -
4          chr4 [ 310251,  310393]   |        +
5          chr4 [ 524981,  525273]   |        +
6          chr4 [1174028, 1174189]   |        -
7          chr4 [1174480, 1174655]   |        +
8          chr4 [ 869049,  869191]   |        -
9          chr4 [ 595415,  595565]   |        +
...         ...                ... ...      ...
623319     chr4 [ 646433,  646588]   |        +
623320     chr4 [ 227923,  228078]   |        -
623321     chr4 [1204606, 1204767]   |        -
623322     chr4 [1013562, 1013741]   |        -
623323     chr4 [1071785, 1071930]   |        -
623324     chr4 [ 819270,  819409]   |        -
623325     chr4 [ 951987,  952139]   |        +
623326     chr4 [ 615827,  615992]   |        +
623327     chr4 [ 659492,  659641]   |        -

Or subset() works, too.

Categories
status

Exchange current window with the next on Ctrl…

Exchange the current window with the next one (Vim).
Ctrl-w x

Categories
status

The Minus File Again following the lead of…

The Minus File
Again following the lead of the standard shell utilities, Perl’s open function treats a file whose name is a single minus, “-”, in a special way. If you open minus for reading, it really means to access the standard input. If you open minus for writing, it really means to access the standard output.
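The shell convention that Perl follows here can be seen with cat, which reads standard input wherever "-" appears among its file operands. A small sketch; the file names header.txt and footer.txt are made up for the illustration:

```shell
tmpdir=$(mktemp -d)
printf 'header\n' > "$tmpdir/header.txt"
printf 'footer\n' > "$tmpdir/footer.txt"

# "-" stands for standard input, so the piped line lands between the two files.
combined=$(printf 'from stdin\n' | cat "$tmpdir/header.txt" - "$tmpdir/footer.txt")
printf '%s\n' "$combined"

rm -rf "$tmpdir"
```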

UPDATE
========
The minus file does not work with 3-argument open: there, "-" is taken as a literal file name.

Bryan> How can I use the “safe” 3-argument open and still be able to read off a
Bryan> pipe?

You don’t. 2-arg open has to be good for something.

And 2-arg open is perfectly safe if the second arg is a literal:

open OTHER, "<-" or die;
open my $handle, "<-" or die;

Don't let anyone tell you "Always use 3-arg open" unless they also footnote it with "unless you have no variables involved".

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion

Categories
link

Finding nearby genes using MySQL query to the…

Finding nearby genes using a MySQL query to the public server of UCSC.

http://genomewiki.ucsc.edu/index.php/Finding_nearby_genes

Categories
status

Nice tips of unix command tools 123 398…

Nice tips of unix command tools

123   398  17359
317    19   2909
 39  -399  -5789
 49    33    200
255    33   -378

sort -n file

Sort the file by first column. The -n option ensures numeric (as opposed to lexicographic) sort.

sort -k 2 -n file 

Sort the file by second column. The “-k” option here denotes the column used as sort key.
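Strictly speaking, -k 2 makes the sort key run from the second field to the end of the line. A hedged sketch that limits the key to the second column only, using a copy of the sample data in a hypothetical temporary file:

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
123   398  17359
317    19   2909
 39  -399  -5789
 49    33    200
255    33   -378
EOF

# -k2,2 starts and ends the key at field 2; the n suffix compares it numerically.
second_col=$(sort -k2,2n "$tmpdir/file" | awk '{print $2}')
printf '%s\n' "$second_col"

rm -rf "$tmpdir"
```

The awk step is only there to show the sorted second column on its own.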

grep '33' file 

Extract all lines containing the string “33” (in the above example, lines 4 and 5).

grep -c '33' file 

Same, but display only the number of matching lines (2 in the example), not the lines themselves. This is useful for analyzing large files of output data. For example, if a sequence of one million integers is saved as a file, one per line, “grep -cx '0' file” will display the number of 0’s in that sequence (the -x option restricts matches to whole lines, so 10 or 105 are not counted).
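Note that grep -c counts every line that contains the pattern anywhere, so '33' also matches 133 or 330. A minimal sketch contrasting that with a whole-line count via -x, on a hypothetical file of integers, one per line:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 0 10 33 133 0 330 33 > "$tmpdir/seq.txt"

# Substring match: counts the lines 33, 133, 330 and the second 33.
substring_count=$(grep -c '33' "$tmpdir/seq.txt")
# Whole-line match: counts only the lines that are exactly 33.
exact_count=$(grep -cx '33' "$tmpdir/seq.txt")
echo "substring: $substring_count, exact: $exact_count"

rm -rf "$tmpdir"
```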

grep -c '-' *.out 

The same kind of count, here of lines containing a minus sign, applied to all files in the current directory matching “*.out”. For each file there is an output line of the form “filename:x”, where “x” is the number of matching lines in the file.
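A small sketch of the per-file output, assuming two hypothetical result files a.out and b.out. Passing the pattern with -e is a defensive choice made here, so that a pattern beginning with "-" cannot be mistaken for an option:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 12 -5 -7 > "$tmpdir/a.out"
printf '%s\n' 3 4 > "$tmpdir/b.out"

# Count lines containing a minus sign in every *.out file of the directory.
counts=$(cd "$tmpdir" && grep -c -e '-' *.out)
printf '%s\n' "$counts"

rm -rf "$tmpdir"
```

Files without any match are still listed, with a count of 0.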

sort -n file | cat -n 

Sort the file, then prepend line numbers to each line. This results in the following:

     1	 39  -399  -5789
     2	 49    33    200
     3	123   398  17359
     4	255    33   -378
     5	317    19   2909

http://www.math.uiuc.edu/~hildebr/computer/unixtips.html

Categories
status

To import a SAM file or other data…

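To import a SAM file, or any other data that contains “#” characters, with read.table, set the “comment.char” option to an empty string; by default read.table treats “#” as the start of a comment and drops the rest of the line.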

test.sam <- read.table('test.sam', comment.char = "")
Categories
status

Import specific columns into R using read table…

Import specific columns into R using read.table

test.import <- read.table(pipe("cut -f1,2,4 data.tab"))
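
What read.table receives through the pipe can be checked by running the cut command on its own. A small sketch, assuming data.tab is tab-separated (cut's default delimiter); the sample rows are invented for the illustration:

```shell
tmpdir=$(mktemp -d)
# Hypothetical tab-separated data with four columns.
printf 'chr4\t100\t+\t7\nchr4\t200\t-\t9\n' > "$tmpdir/data.tab"

# cut splits on tabs by default; -f1,2,4 keeps columns 1, 2 and 4.
selected=$(cut -f1,2,4 "$tmpdir/data.tab")
printf '%s\n' "$selected"

rm -rf "$tmpdir"
```
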
Categories
link

Ten Simple Rules for Getting Help from Online…

Ten Simple Rules for Getting Help from Online Scientific Communities

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002202