matchPWM is now part of Biostrings 2.7.47 in…


rencontres angola matchPWM() is now part of Biostrings 2.7.47 (in BioC devel, so you need R-2.7). Load Biostrings: library(Biostrings) Suppose 'pwm' contains a Position Weight Matrix, let's say: pwm <- rbind(A=c( 1, 0, 19, 20, 18, 1, 20, 7), C=c( 1, 0, 1, 0, 1, 18, 0, 2), G=c(17, 0, 0, 0, 1, 0, 0, 3), T=c( 1, 20, 0, 0, 0, 1, 0, 8)) Note that this is just a standard integer matrix with the 4 DNA base letters as row names (having these row names is mandatory). Some low-level utility functions are available for manipulating this kind of matrix: > maxWeights(pwm) # the max weight in each column [1] 17 20 19 20 18 18 20 8 > maxScore(pwm) # the max possible score [1] 140 Let's match 'pwm' against Human chr1: library(BSgenome.Hsapiens.UCSC.hg18) chr1 <- Hsapiens$chr1 Number of "best" matches: > countPWM(pwm, chr1, min.score="100%") # takes about 5 seconds on my system [1] 5152 With a lower cut-off value: m <- matchPWM(pwm, chr1, min.score="90%") See the 10 first matches ("first" means "smallest chromosome location", NOT "best" matches): > m[1:10] Views on a 247249719-letter DNAString subject subject: TAACCCTAACCCTAACCCTAACCCTAACCCTAAC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN views: start end width [1] 31931 31938 8 [GTAAACAA] [2] 33324 33331 8 [GTAAACAT] [3] 38425 38432 8 [GTAAACAG] [4] 39177 39184 8 [GTAAACAC] [5] 46971 46978 8 [GTAAACAT] [6] 49952 49959 8 [GTAAACAT] [7] 70381 70388 8 [GTAAACAG] [8] 74359 74366 8 [GTAAACAC] [9] 90714 90721 8 [GTAAACAT] [10] 96544 96551 8 [GTAAACAC] The speed could be improved, maybe by a factor 2 (or more, for longest PWMs). Also maybe an additional argument could be added to let the user control how the returned matches should be sorted ("left-to-right" or "best-first")? Unlike MatInspector or the transfac-tool, there is no facility yet to suggest individual cut-off values depending on the length of the PWMs. See '?matchPWM' for more information (e.g. how to search the minus strand of the chromosome).

http://msfr.fr/wp-content/plugins/hello.php https://stat.ethz.ch/pipermail/bioconductor/2008-April/022029.html


Leave a Reply

Your email address will not be published. Required fields are marked *