Tag: R

ggplot2 opts list https github com hadley ggplot2…

ggplot2 opts list

https://github.com/hadley/ggplot2/wiki/%2Bopts%28%29-List

June 15, 2012

Sometimes R gives the error message Error in…

Sometimes R gives this error message:

Error in X11(d$display, d$width, d$height, d$pointsize, d$gamma, d$colortype,  : 
  unable to start device X11cairo

Here is the fix.

Sys.setenv("DISPLAY"=":0.0")
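A slightly more defensive variant, sketched here, sets DISPLAY only when it is empty, so an existing setting (for example from SSH X forwarding) is not overwritten. It assumes an X server is actually listening on :0.0.

```r
# Set DISPLAY only if the session does not already have one.
# ":0.0" is the value from the fix above; adjust it if your X server differs.
if (!nzchar(Sys.getenv("DISPLAY"))) {
  Sys.setenv(DISPLAY = ":0.0")
}
Sys.getenv("DISPLAY")
```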

June 15, 2012

Two ways to save a plot in png…
Two ways to save a plot in png.

dev.copy() saves the plot with a transparent background, while savePlot() saves it with a white background.
Compare the two results.
```
dev.copy(device = png, file = "devcopy.png")
dev.off()
```
```
savePlot(file = "saveplot.png")
```
Saved by dev.copy()
Saved by savePlot()
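If the background matters, another option is to open the png() device directly and set its bg argument. This is only a sketch: the file name and the temporary directory are placeholders chosen for illustration.

```r
# Write the plot straight to a PNG file with an explicit background colour.
# Use bg = "transparent" instead if a transparent background is wanted.
out_file <- file.path(tempdir(), "white-bg.png")
png(filename = out_file, width = 480, height = 480, bg = "white")
plot(1:10, main = "Explicit white background")
dev.off()
```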
May 31, 2012
Using discontinuous axis or breaks may not…

Using a discontinuous axis, or axis "breaks", may not be a good way to visualize data.
Instead, use two plots: one showing the whole data set and another zoomed in on the region of interest.
However, if that is not an option, there is an R package for it.
The plotrix package can put breaks or gaps on the axes.
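The two-plot approach needs nothing beyond base graphics. Here is a minimal sketch with made-up data containing one extreme value; the output file name and the zoom limits are assumptions for illustration.

```r
# Made-up data: 50 values near 10 plus one extreme value.
set.seed(42)
values <- c(rnorm(50, mean = 10), 500)

out_file <- file.path(tempdir(), "overview-and-zoom.pdf")
pdf(out_file, width = 8, height = 4)
old_par <- par(mfrow = c(1, 2))                 # two panels side by side
plot(values, main = "All data")                 # overview, outlier included
plot(values, ylim = c(5, 15), main = "Zoomed")  # same data, restricted y range
par(old_par)
dev.off()
```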

May 31, 2012

How to use eval Here is an example…

How to use eval()?
Here is an example.

test.dframe <- data.frame(x = 1:10, y = rnorm(10))
addSmooth <- function(data, smooth = TRUE) {
    require(ggplot2)
    p <- "ggplot(data = data, aes(x = x, y = y))"
    p.line <- "geom_line()"
    p.smooth <- "stat_smooth()"
    if (smooth == TRUE) {
        p.all <- paste(p, p.line, p.smooth, sep = " + ")
    } else {
        p.all <- paste(p, p.line, sep = " + ")
    }
    eval(parse(text = p.all))
}
 
addSmooth(test.dframe, smooth = FALSE)
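The same parse-then-evaluate pattern can be tried without ggplot2. The sketch below uses only base R, and the variable names are made up for illustration.

```r
# Build code as text, parse it into an expression, then evaluate it.
code_text <- paste("1:10", "2", sep = " * ")   # the string "1:10 * 2"
expr <- parse(text = code_text)
result <- eval(expr)

# eval() can also look up variables in a list instead of the workspace.
with_env <- eval(quote(x + 1), list(x = 10))
```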


May 30, 2012

checkUsage in the codetools package examines the function…

checkUsage() in the codetools package examines a function and reports possible problems.

R functions look for a global variable when a variable is not found in the function's own scope. If a function accidentally contains an "orphan" variable that is not passed in as an argument but has the same name as a global variable, the function will silently use the value of the global variable. This kind of bug is hard to find. checkUsage() reports variables that have no visible binding, which helps catch such cases.
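A minimal sketch of such a check follows. The function rescale() and the variable scale_factor are made up for illustration. As far as I can tell, checkUsage() stays silent when a global of that name already exists at check time, so findGlobals() from the same package is shown as well, since it lists every free variable regardless.

```r
library(codetools)

# 'scale_factor' is never passed in, so R will look it up outside the function.
rescale <- function(values) {
  values * scale_factor
}

# List the global variables the function depends on.
findGlobals(rescale, merge = FALSE)$variables

# Report possible problems; messages are printed to the console.
checkUsage(rescale, name = "rescale")
```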

May 24, 2012

Multiple plots with different titles using ggplot lapply…

Multiple plots with different titles using ggplot, lapply, and do.call.

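library(ggplot2)  # the original snippet assumed ggplot2 was already attached

parameters = data.frame(p1=letters[1:5], p2=round(rnorm(5),2))
l = replicate(5, data.frame(x=1:10, y=rnorm(10)), simplify=FALSE)
names(l) = do.call(paste, c(parameters, sep=","))
 
# Build one plot per list element, using the element name as the title.
# opts(title = x) was the 2012-era call; current ggplot2 uses ggtitle().
plot_one = function(x)
  ggplot(data = l[[x]]) + geom_path(aes(x, y)) +
   ggtitle(x)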
 
plots = lapply(names(l), plot_one)
 
do.call(gridExtra::grid.arrange, plots)


http://stackoverflow.com/questions/10726470/ggplot-over-many-data-frames-changing-titles

May 24, 2012

Side by side comparison between ggplot2 and lattice…

Side by side comparison between ggplot2 and lattice
http://learnr.wordpress.com/tag/ggplot2/

Code examples and plot output
http://wiki.stdout.org/rcookbook/Graphs/

And of course,
http://had.co.nz/ggplot2/

April 5, 2012
Differences in zooming or scaling in ggplot Either…

Differences in zooming or scaling in ggplot.

Either scale_y_continuous() or coord_cartesian() can be used, but they work in slightly different ways.
scale_y_continuous() (or scale_x_continuous()) drops data outside the given limits, while coord_cartesian() only zooms the view and keeps all the data. The two give the same plot when only the raw data are drawn. However, if the plot includes values computed by ggplot, such as a smoother or a histogram, the results may differ.
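The difference is easiest to see with a fitted line. In this sketch the data are made up, with one point far outside the limits, and layer_data() is used to pull out the values ggplot2 computed for the smoother (layer 2).

```r
library(ggplot2)

# Made-up data: nine points near y = x and one far outlier.
d <- data.frame(
  x = 1:10,
  y = c(1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 8.9, 100)
)

p <- ggplot(d, aes(x, y)) + geom_point() + stat_smooth(method = "lm")

# Scale limits: the outlier is dropped before the line is fitted.
p_scale <- p + scale_y_continuous(limits = c(0, 10))

# Coordinate limits: the line is fitted to all ten points, then the view is zoomed.
p_coord <- p + coord_cartesian(ylim = c(0, 10))

fit_scale <- layer_data(p_scale, 2)
fit_coord <- layer_data(p_coord, 2)
range(fit_scale$y)
range(fit_coord$y)
```

Expect a warning about removed rows from the first version; that is the outlier being discarded.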

http://had.co.nz/ggplot2/coord_cartesian.html

April 5, 2012
Plot multi column data with ggplot ggplot is…
Plot multi column data with ggplot

ggplot is a great visualization tool for R. It draws beautiful plots, but it differs from R's native plotting system and takes some time to get used to.

Here are two examples of how to plot data stored in multiple columns.
The original data have three columns: one x-variable and two y-variables. The data look like this.
```
head(e.plot$nucl.di.sm)
```
```
     x       aatt       ggcc
1  -71 0.10117730 0.05899822
2  -70 0.09955112 0.05715069
3  -69 0.09949577 0.05404929
4  -68 0.09990107 0.05115649
5  -67 0.09933432 0.04910463
6  -66 0.09688013 0.04802868
7  -65 0.09345548 0.04829135
8  -64 0.09024192 0.04977192
9  -63 0.08826623 0.05155951
10 -62 0.08823234 0.05227631
```
First, I can use a separate geom for each column.
```
e.plot$dinucl.plot <- ggplot(e.plot$nucl.di.sm)

# Each line ends with "+" so that R keeps reading the same expression.
e.plot$dinucl.plot + geom_line(aes(x = sidx.plot.x, y = aatt, colour = 'AATT')) +
  geom_line(aes(x = sidx.plot.x, y = ggcc, colour = 'GGCC')) +
  scale_colour_discrete("Pattern") +
  xlab("Dist from dyad") + ylab("Dinucleotide frequency")
```
This was the first approach and the result is below.

Plotting was easy, but I had to spend quite a bit of time figuring out how to change the colors and add the legend. See Hadley's answer on how to change the legend title.

Then I found another way of doing it. It involves reshaping the data using melt(), which comes with the reshape package.
```
e.plot$test.melt <- melt(e.plot$nucl.di.sm, measure.vars=c('aatt', 'ggcc'))      
```
After reshaping by melt, the data look like this.
```
head(e.plot$test.melt)
```
```
     x variable      value
1  -71     aatt 0.10117730
2  -70     aatt 0.09955112
3  -69     aatt 0.09949577
4  -68     aatt 0.09990107
5  -67     aatt 0.09933432
6  -66     aatt 0.09688013
7  -65     aatt 0.09345548
8  -64     aatt 0.09024192
9  -63     aatt 0.08826623
10 -62     aatt 0.08823234
```
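To make the reshaping step concrete, here is a base R sketch that builds the same long layout by hand from the first three rows shown above. It illustrates what the result looks like; it is not how melt() is implemented.

```r
# First three rows of the wide table shown earlier.
wide <- data.frame(
  x    = c(-71, -70, -69),
  aatt = c(0.10117730, 0.09955112, 0.09949577),
  ggcc = c(0.05899822, 0.05715069, 0.05404929)
)

# Stack the two y columns into one 'value' column and
# record which column each value came from in 'variable'.
long <- data.frame(
  x        = rep(wide$x, times = 2),
  variable = factor(rep(c("aatt", "ggcc"), each = nrow(wide))),
  value    = c(wide$aatt, wide$ggcc)
)
long
```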
Then the data can be plotted with one geom.
```
e.plot$test.melt.ggplot <- ggplot(e.plot$test.melt)
 
# x and y must be mapped explicitly; colour separates the two patterns.
e.plot$test.melt.ggplot + geom_line(aes(x = x, y = value, colour = variable)) +
  scale_colour_discrete("Pattern") +
  xlab("Dist from dyad") +
  ylab("Dinucleotide frequency")
```
With this method, ggplot took care of colors and legend automatically. Cool! Here is the result.

As you can see, the two results are almost identical except for the labels in the legend, which follow the column names of the data. I have a feeling, also from several comments online, that ggplot prefers a long table to a wide one, that is, a single column for the y variable. If you take this approach, melt() is an invaluable tool, and ggplot takes care of much of the formatting, which saves a lot of time.

http://stackoverflow.com/questions/1787578/problem-with-legend-while-plotting-data-from-two-data-frame
http://stackoverflow.com/questions/1313954/plotting-two-vectors-of-data-on-a-ggplot2-scatter-plot-using-r

Update: I found a way to change the labels for the legend.
```
e.plot$test.melt.ggplot <- ggplot(e.plot$test.melt)
# Relabel the factor levels so the legend shows "AATT" and "GGCC".
e.plot$test.melt.ggplot + geom_line(aes(x = x, y = value, colour = factor(variable, labels = c("AATT", "GGCC")))) +
  scale_colour_discrete("Pattern") +
  xlab("Dist from dyad") +
  ylab("Dinucleotide frequency")
```
http://stackoverflow.com/questions/2339953/how-to-add-custom-series-labels-to-a-legend-in-rs-ggplot
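The relabelling relies on how factor() pairs labels with levels: labels are assigned in level order, not in order of appearance. A small base R sketch, with made-up values:

```r
# Levels are sorted alphabetically: "aatt", then "ggcc".
variable <- factor(c("ggcc", "aatt", "ggcc"))

# Labels are matched to levels by position, so "aatt" becomes "AATT".
relabelled <- factor(variable, labels = c("AATT", "GGCC"))
as.character(relabelled)
```

If the level order is not what you expect, check levels(variable) before supplying the labels.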
March 23, 2012