
Data visualization with ggplot2

Objectives

To learn how to create publishable figures using the ggplot2 package in R.

By the end of the course, students should be able to create simple, pretty, and effective figures.

Introducing ggplot2

ggplot2 is an R graphics package from the tidyverse collection. It allows the user to create informative plots quickly by using a 'grammar of graphics' implementation, which is described as "a coherent system for describing and building graphs" (R4DS). The power of this package is that plots are built in layers, and a few changes to the code can result in very different outcomes. This makes it easy to reuse parts of the code for very different figures.

Being a part of the tidyverse collection, ggplot2 works best with long format data (i.e., tidy data), which you should already be accustomed to.

To begin plotting, let's load our tidyverse library.

    #load libraries
    library(tidyverse) # Tidyverse automatically loads ggplot2

    ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
    ## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
    ## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
    ## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
    ## ✓ readr   2.1.1     ✓ forcats 0.5.1
    ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ## x dplyr::filter() masks stats::filter()
    ## x dplyr::lag()    masks stats::lag()

We also need some data to plot, so if you haven't already, let's load the data we will need for this lesson.

    #scaled_counts
    #We used this in lesson 2 so you may not need to reload
    scaled_counts<-
      read.delim("./data/filtlowabund_scaledcounts_airways.txt",
                 as.is=TRUE)

    dexp<-read.delim("./data/diffexp_results_edger_airways.txt",
                     as.is=TRUE)

The ggplot2 template

The following represents the basic ggplot2 template:

    ggplot(data = <DATA>) +
      <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

The main components include data we want to plot, geom function(s), and mapping aesthetics. Notice the + symbol following the ggplot() function. This symbol will precede each additional layer of code for the plot, and it is important that it is placed at the end of the line. More on geom functions and mapping aesthetics to come.

Let's see this template in practice.

What is the relationship between total transcript sums per sample and the number of recovered transcripts per sample?

    #let's get some data
    #we are only interested in transcript counts greater than 100
    #read in the data
    sc<-read.csv("./data/sc.csv")

    #If you are curious how this was made; here is the code
    #scaled_counts %>% group_by(dex, SampleName) %>%
    #  summarize(Num_transcripts=sum(counts>100),TotalCounts=sum(counts))

    #let's view the data
    sc

    ##     dex SampleName Num_transcripts TotalCounts
    ## 1   trt GSM1275863           10768    18783120
    ## 2   trt GSM1275867           10051    15144524
    ## 3   trt GSM1275871           11658    30776089
    ## 4   trt GSM1275875           10900    21135511
    ## 5 untrt GSM1275862           11177    20608402
    ## 6 untrt GSM1275866           11526    25311320
    ## 7 untrt GSM1275870           11425    24411867
    ## 8 untrt GSM1275874           11000    19094104

    #let's plot
    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts))

We can easily see that there is a relationship between the number of transcripts per sample and the total transcripts recovered per sample. ggplot2 default parameters are great for exploratory data analysis. But, with only a few tweaks, we can make some beautiful, publishable figures.

What did we do in the above code?
The first step to creating this plot was initializing the ggplot object using the function ggplot(). Remember, we can look further for help using ?ggplot(). The function ggplot() takes data, mapping, and further arguments. However, none of this needs to actually be provided at the initialization phase, which creates the coordinate system from which we build our plot. But, typically, you should at least call the data at this point.

The data we called was from the data frame sc, which we created above. Next, we provided a geom function (geom_point()), which created a scatter plot. This scatter plot required mapping information, which we provided for the x and y axes. More on this in a moment.

Let's break down the individual components of the code.

    #What does running ggplot() do?
    ggplot(data=sc)

    #What about just running a geom function?
    geom_point(data=sc,aes(x=Num_transcripts, y = TotalCounts))

    ## mapping: x = ~Num_transcripts, y = ~TotalCounts
    ## geom_point: na.rm = FALSE
    ## stat_identity: na.rm = FALSE
    ## position_identity

    #what about this
    ggplot() +
      geom_point(data=sc,aes(x=Num_transcripts, y = TotalCounts))

Geom functions

A geom is the geometrical object that a plot uses to represent data. People often describe plots by the type of geom that the plot uses. --- R4DS

There are multiple geom functions that change the basic plot type or the plot representation. We can create scatter plots (geom_point()), line plots (geom_line(), geom_path()), bar plots (geom_bar(), geom_col()), lines fitted to data (geom_smooth()), heat maps (geom_tile()), geographic maps (geom_polygon()), etc.

ggplot2 provides over 40 geoms, and extension packages provide even more (see https://exts.ggplot2.tidyverse.org/gallery/ for a sampling). The best way to get a comprehensive overview is the ggplot2 cheatsheet, which you can find at http://rstudio.com/resources/cheatsheets. --- R4DS

You can also see a number of options pop up when you type geom into the console, or you can look up the ggplot2 documentation in the help tab.

We can see how easy it is to change the way the data is plotted. Let's plot the same data using geom_line().

    ggplot(data=sc) +
      geom_line(aes(x=Num_transcripts, y = TotalCounts))

Mapping and aesthetics (aes())

The geom functions require a mapping argument. The mapping argument includes the aes() function, which "describes how variables in the data are mapped to visual properties (aesthetics) of geoms" (ggplot2 R Documentation). If not included it will be inherited from the ggplot() function.

An aesthetic is a visual property of the objects in your plot.---R4DS

Mapping aesthetics include some of the following:
1. the x and y data arguments
2. shapes
3. color
4. fill
5. size
6. linetype
7. alpha

This is not an all-encompassing list.

Let's return to our plot above. Is there a relationship between treatment ("dex") and the number of transcripts or total counts?

    #adding the color argument to our mapping aesthetic
    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,color=dex))

There is potentially a relationship. ASM cells treated with dexamethasone in general have lower total numbers of transcripts and lower total counts.

Notice how we changed the color of our points to represent a variable, in this case 'dex'. To do this, we set color equal to 'dex' within the aes() function. This mapped our aesthetic, color, to a variable we were interested in exploring. Aesthetics that are not mapped to our variables are placed outside of the aes() function. These aesthetics are manually assigned and do not undergo the same scaling process as those within aes().

For example

    #map the shape aesthetic to the variable "dex"
    #use the color purple across all points (NOT mapped to a variable)
    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,shape=dex),
                 color="purple")

We can also see from this that 'dex' could be mapped to other aesthetics. In the above example, we see it mapped to shape rather than color. By default, ggplot2 will only map six shapes at a time, and if your number of categories goes beyond 6, the remaining groups will go unmapped. This is by design because it is hard to discriminate between more than six shapes at any given moment. This is a clue from ggplot2 that you should choose a different aesthetic to map to your variable. However, if you choose to ignore this functionality, you can manually assign more than six shapes.
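
As a minimal sketch of manually assigning more than six shapes, the example below uses scale_shape_manual(). Because 'dex' has only two levels, it assumes the mpg dataset that ships with ggplot2, whose class variable has seven categories. The dataset and the shape codes are illustrative choices only.

```r
library(ggplot2)

# mpg ships with ggplot2; `class` has seven categories,
# one more than the default shape palette will map
n_classes <- length(unique(mpg$class))

# manually supply one shape code per category (codes chosen arbitrarily)
shape_plot <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = c(0, 1, 2, 3, 4, 5, 6))

shape_plot
```

Even when every category receives a shape, seven shapes remain hard to tell apart, so another aesthetic such as color may still be the better choice.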

We could have just as easily mapped it to alpha, which adds a gradient to the point visibility by category.

    #map the alpha aesthetic to the variable "dex"
    #use the color purple across all points (NOT mapped to a variable)
    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,alpha=dex),
                 color="purple") #note the warning.

    ## Warning: Using alpha for a discrete variable is not advised.

Or we could map it to size. There are multiple options, so explore a little with your plots.

Other things to note:
The assignment of color, shape, or alpha to our variable was automatic, with a unique aesthetic level representing each category (i.e., 'trt', 'untrt') within our variable. You will also notice that ggplot2 automatically created a legend to explain the levels of the aesthetic mapped. We can change aesthetic parameters - what colors are used, for example - by adding additional layers to the plot. We will be adding layers throughout the tutorial.

R objects can also store figures

As we have discussed, R objects are used to store things created in R to memory. This includes plots.

    dot_plot<-ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,color=dex))

    dot_plot

We can add additional layers directly to our object. We will see how this works by defining some colors for our 'dex' variable.

Colors

ggplot2 will automatically assign colors to the categories in our data. Colors are assigned to the fill and color aesthetics in aes(). We can change the default colors by providing an additional layer to our figure. To change the color, we use the scale_color functions: scale_color_manual(), scale_color_brewer(), scale_color_grey(), etc. We can also change the names of the color labels in the legend using the labels argument of these functions.

    dot_plot +
      scale_color_manual(values=c("red","black"),
                         labels=c('treated','untreated'))

    dot_plot +
      scale_color_grey()

    dot_plot +
      scale_color_brewer(palette = "Paired")
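
When colors are supplied to scale_color_manual() as an unnamed vector, they are assigned in the order of the variable's levels (here alphabetical: 'trt', then 'untrt'). A slightly more explicit sketch names each value, so the color for each category does not depend on that order. The small data frame sc_small is a hypothetical stand-in holding three rows copied from sc, so that the example runs on its own.

```r
library(ggplot2)

# three rows copied from sc so that this example is self-contained
sc_small <- data.frame(
  dex = c("trt", "trt", "untrt"),
  Num_transcripts = c(10768, 10051, 11177),
  TotalCounts = c(18783120, 15144524, 20608402)
)

# a named vector matches each color to a category by name
named_plot <- ggplot(data = sc_small) +
  geom_point(aes(x = Num_transcripts, y = TotalCounts, color = dex)) +
  scale_color_manual(values = c(untrt = "black", trt = "red"))

named_plot
```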

Similarly, if we want to change the fill, we would use the scale_fill options. To apply scale_fill to points, we have to change the shape, as only some shapes (21 through 25) take a fill argument.

[Figure: R shapes]

    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,fill=dex),
                 shape=21,size=2) + #increase size and change points
      scale_fill_manual(values=c("purple", "yellow"))

There are a number of ways to specify the color argument, including by name, number, and hex code. The R Graph Gallery has a great resource on assigning colors in R.
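
The three ways of specifying a color can be sketched with base R's col2rgb(), which converts any valid color specification into its red, green, and blue components. The colors used here are arbitrary choices for illustration.

```r
# by name
by_name <- col2rgb("purple")

# by hex code (#RRGGBB); "#A020F0" is the hex code R uses for "purple"
by_hex <- col2rgb("#A020F0")

# by number: an index into the current palette(); the resulting color
# depends on the palette in use, so this is the least explicit option
by_number <- col2rgb(2)

# the name and the hex code describe the same color
all(by_name == by_hex)
```

Names and hex codes are usually the safer choice in figures meant for publication, because they do not depend on the palette that happens to be active.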

There are also a number of complementary packages in R that expand our color options. One of my favorites is viridis, which provides colorblind friendly palettes. randomcoloR is a great package if you need a large number of unique colors.

    library(viridis) #Remember to load installed packages before use

    ## Loading required package: viridisLite

    dot_plot + scale_color_viridis(discrete=TRUE, option="viridis")

paletteer contains a comprehensive set of color palettes, if you want to load the palettes from multiple packages all at once. See its GitHub page for details.

Facets

A way to add variables to a plot beyond mapping them to an aesthetic is to use facets or subplots. There are two primary functions to add facets, facet_wrap() and facet_grid(). If faceting by a single variable, use facet_wrap(). If multiple variables, use facet_grid(). The first argument of either function is a formula, with variables separated by a ~ (See below). Variables must be discrete (not continuous).

You should remember this plot from our reshaping example. The gene counts in the scaled_counts data were scaled to account for technical and composition differences using the trimmed mean of M values (TMM) from edgeR (Robinson and Oshlack 2010). We can easily compare scaled and unscaled counts by sample using faceting.

    #density plot
    #let's grab the data and take a look
    density_data<-read.csv("./data/density_data.csv",
                           stringsAsFactors=TRUE)

    head(density_data)

    ##           feature sample SampleName   cell   dex albut        Run avgLength
    ## 1 ENSG00000000003    508 GSM1275862 N61311 untrt untrt SRR1039508       126
    ## 2 ENSG00000000003    508 GSM1275862 N61311 untrt untrt SRR1039508       126
    ## 3 ENSG00000000419    508 GSM1275862 N61311 untrt untrt SRR1039508       126
    ## 4 ENSG00000000419    508 GSM1275862 N61311 untrt untrt SRR1039508       126
    ## 5 ENSG00000000457    508 GSM1275862 N61311 untrt untrt SRR1039508       126
    ## 6 ENSG00000000457    508 GSM1275862 N61311 untrt untrt SRR1039508       126
    ##   Experiment    Sample    BioSample transcript ref_genome .abundant      TMM
    ## 1  SRX384345 SRS508568 SAMN02422669     TSPAN6       hg38      TRUE 1.055278
    ## 2  SRX384345 SRS508568 SAMN02422669     TSPAN6       hg38      TRUE 1.055278
    ## 3  SRX384345 SRS508568 SAMN02422669       DPM1       hg38      TRUE 1.055278
    ## 4  SRX384345 SRS508568 SAMN02422669       DPM1       hg38      TRUE 1.055278
    ## 5  SRX384345 SRS508568 SAMN02422669      SCYL3       hg38      TRUE 1.055278
    ## 6  SRX384345 SRS508568 SAMN02422669      SCYL3       hg38      TRUE 1.055278
    ##   multiplier        source abundance
    ## 1   1.415149        counts  679.0000
    ## 2   1.415149 counts_scaled  960.8864
    ## 3   1.415149        counts  467.0000
    ## 4   1.415149 counts_scaled  660.8748
    ## 5   1.415149        counts  260.0000
    ## 6   1.415149 counts_scaled  367.9388

    #plot
    ggplot(data= density_data)+
      aes(x=abundance,
          color=SampleName)+ #initialize ggplot
      geom_density() + #call density plot geom
      facet_wrap(~source) + #use facet_wrap; see ~source
      scale_x_log10() #scales the x axis using a base-10 log transformation

    ## Warning: Transformation introduced infinite values in continuous x-axis
    ## Warning: Removed 140 rows containing non-finite values (stat_density).

The distributions of sample counts did not differ greatly between samples before scaling, but regardless, we can see that the distributions are more similar after scaling.

Here, faceting allowed us to visualize multiple features of our data. We were able to see count distributions by sample as well as normalized vs non-normalized counts.
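
The two warnings printed with the plot are most likely caused by abundances of zero: the base-10 logarithm of zero is -Inf, and non-finite values are dropped before the density is estimated. The sketch below uses a made-up vector (the numbers are hypothetical, not taken from density_data) to show the arithmetic, along with two common ways to avoid the warning.

```r
# hypothetical abundances, including two zeros
abundance <- c(0, 0, 5, 120, 960)

# log10(0) is -Inf, which cannot be placed on a continuous axis
logged <- log10(abundance)
logged

# number of values that would be dropped from the plot
n_dropped <- sum(!is.finite(logged))
n_dropped

# option 1: keep only positive values before plotting
positive_only <- abundance[abundance > 0]

# option 2: add a pseudocount of 1 so that zeros map to log10(1) = 0
pseudo_logged <- log10(abundance + 1)
```

Which option is appropriate depends on the analysis; dropping zeros changes the number of observations, while a pseudocount shifts every value slightly.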

Note the help options with ?facet_wrap(). How would we make our plot facets vertical rather than horizontal?

    ggplot(data= density_data)+ #initialize ggplot
      geom_density(aes(x=abundance,
                       color=SampleName)) + #call density plot geom
      facet_wrap(~source, ncol=1) + #use the ncol argument
      scale_x_log10()

    ## Warning: Transformation introduced infinite values in continuous x-axis
    ## Warning: Removed 140 rows containing non-finite values (stat_density).

We could plot each sample individually using facet_grid():

    ggplot(data= density_data)+ #initialize ggplot
      geom_density(aes(x=abundance,
                       color=SampleName)) + #call density plot geom
      facet_grid(as.factor(sample)~source) + # formula is sample ~ source
      scale_x_log10()

    ## Warning: Transformation introduced infinite values in continuous x-axis
    ## Warning: Removed 140 rows containing non-finite values (stat_density).

Using multiple geoms per plot

Because we build plots using layers in ggplot2, we can add multiple geoms to a plot to represent the data in unique ways.

    #We can combine geoms; here we combine a scatter plot with a line plot
    #add a line to our plot
    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,color=dex)) +
      geom_line(aes(x=Num_transcripts, y = TotalCounts,color=dex))

    #to make our code more effective, we can put shared aesthetics in the
    #ggplot function
    ggplot(data=sc, aes(x=Num_transcripts, y = TotalCounts,color=dex)) +
      geom_point() +
      geom_line()

    #or plot different aesthetics per layer
    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,
                     color=SampleName)) +
      geom_line(aes(x=Num_transcripts, y = TotalCounts,color=dex))

    #you can also add subsets of data in a new layer without overriding
    #preceding layers
    #let's only provide a line for the treated samples
    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,
                     color=SampleName)) +
      geom_line(data=filter(sc,dex=="trt"),
                aes(x=Num_transcripts, y = TotalCounts,color=dex))

To get multiple legends for the same aesthetic, check out the CRAN package ggnewscale.

Statistical transformations

Many graphs, like scatterplots, plot the raw values of your dataset. Other graphs, like bar charts, calculate new values to plot:

  • bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points that fall in each bin.
  • smoothers fit a model to your data and then plot predictions from the model.
  • boxplots compute a robust summary of the distribution and then display a specially formatted box.

The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation. --- R4DS

Let's plot a bar graph using the same data (sc) from above.

    #returns an error message. What went wrong?
    ggplot(data=sc) +
      geom_bar( aes(x=Num_transcripts, y = TotalCounts))

    ## Error: stat_count() can only have an x or y aesthetic.

    #What's the difference between stat identity and stat count?
    ggplot(data=sc) +
      geom_bar( aes(x=Num_transcripts, y = TotalCounts), stat="identity")

Let's look at another example.

    #Let's filter our data to only include 4 transcripts of interest
    #We used this code in the tidyverse lesson
    keep_t<-c("CPD","EXT1","MCL1","LASP1")
    interesting_trnsc<-scaled_counts %>%
      filter(transcript %in% keep_t) %>% droplevels()

    #the default here is `stat_count()`
    ggplot(data = interesting_trnsc) +
      geom_bar(mapping = aes(x = transcript, y=counts_scaled))

    ## Error: stat_count() can only have an x or y aesthetic.

    #Let's take away the y aesthetic
    ggplot(data = interesting_trnsc) +
      geom_bar(mapping = aes(x = transcript))

This is not a very useful figure, and probably not worth plotting. We could have gotten this info using str(). However, the point here is that there are default statistical transformations occurring with many geoms, and you can specify alternatives.

Let's change the stat parameter to "identity". This will plot the raw values of the normalized counts rather than how many rows are present for each transcript.

    #stat identity defaulted to a stacked barplot
    ggplot(data = interesting_trnsc) +
      geom_bar(mapping = aes(x = transcript,y=counts_scaled,
                             fill=SampleName),
               stat="identity",color="black") +
      facet_wrap(~dex)

    #What if we wanted the columns side by side
    #introducing the position argument
    ggplot(data = interesting_trnsc) +
      geom_bar(mapping = aes(x = transcript,y=counts_scaled,
                             fill=SampleName),
               stat="identity",color="black",position="dodge") +
      facet_wrap(~dex)

How do we know what the default stat is for geom_bar()? Well, we could read the documentation, ?geom_bar(). This is true of multiple geoms. The statistical transformation can often be customized, so if the default is not what you need, check out the documentation to learn more about how to make modifications. For example, you could provide custom mapping for a box plot. To do this, see the examples section of the geom_boxplot() documentation.
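
As one illustration of how the default stat differs between geoms, geom_col() (listed earlier among the bar plot geoms) uses stat_identity() by default, so it can stand in for geom_bar(stat = "identity"). The sketch below uses a small made-up data frame; toy_counts and its values are hypothetical and are not the lesson data.

```r
library(ggplot2)

# hypothetical data: one value per transcript
toy_counts <- data.frame(
  transcript = c("CPD", "EXT1", "MCL1", "LASP1"),
  counts_scaled = c(100, 250, 400, 175)
)

# geom_bar() with stat = "identity" plots the values as given
bar_identity <- ggplot(data = toy_counts) +
  geom_bar(aes(x = transcript, y = counts_scaled), stat = "identity")

# geom_col() uses the identity stat by default, so no stat argument is needed
bar_col <- ggplot(data = toy_counts) +
  geom_col(aes(x = transcript, y = counts_scaled))

bar_col
```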

Coordinate systems

ggplot2 uses a default coordinate system (the Cartesian coordinate system). This isn't super important until we want to do something like make a map (See coord_quickmap()) or pie chart (See coord_polar()).

When will we have to think about coordinate systems? We likely won't have to modify from default in too many cases (see those above). The most common circumstance in which we will likely need to change coordinate system is in the event that we want to switch the x and y axes (?coord_flip()).

    #let's return to our bar plot above
    #get horizontal bars instead of vertical bars

    ggplot(data = interesting_trnsc) +
      geom_bar(mapping = aes(x = transcript,y=counts_scaled,
                             fill=SampleName),
               stat="identity",color="black",position="dodge") +
      facet_wrap(~dex) +
      coord_flip()

Labels, legends, scales, and themes

How do we ultimately get our figures to a publishable state? The bread and butter of pretty plots really falls to the additional non-data layers of our ggplot2 code. These layers will include code to label the axes, scale the axes, and customize the legends and theme.

The default axes and legend titles come from the ggplot2 code.

    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,fill=dex),
                 shape=21,size=2) +
      scale_fill_manual(values=c("purple", "yellow"))

In the above plot, the y-axis label (TotalCounts) is the variable name mapped to the y aesthetic, while the x-axis label (Num_transcripts) is the variable name mapped to the x aesthetic. The fill aesthetic was set equal to "dex", and so this became the default title of the fill legend. We can change these labels using ylab(), xlab(), and guides() for the legend.

    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,fill=dex),
                 shape=21,size=2) +
      scale_fill_manual(values=c("purple", "yellow"),
                        labels=c('treated','untreated'))+
      #can change labels of fill levels along with colors
      xlab("Recovered transcripts per sample") + #add x label
      ylab("Total sequences per sample") #add y label

Let's change the legend title.

    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,fill=dex),
                 shape=21,size=2) +
      scale_fill_manual(values=c("purple", "yellow"),
                        labels=c('treated','untreated'))+
      #can change labels of fill levels along with colors
      xlab("Recovered transcripts per sample") + #add x label
      ylab("Total sequences per sample") + #add y label
      guides(fill = guide_legend(title="Treatment"))
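As an alternative to separate xlab(), ylab(), and guides() calls, labs() can set all three titles in one place. The following is a minimal sketch, not the lesson's own code: sc_toy is a made-up stand-in for the sc data frame so that the example runs on its own.

```r
library(ggplot2)

# Toy stand-in for the lesson's `sc` data frame (values are made up)
sc_toy <- data.frame(
  Num_transcripts = c(21000, 22500, 23000, 24500),
  TotalCounts     = c(1.5e7, 2.1e7, 2.6e7, 3.0e7),
  dex             = c("trt", "untrt", "trt", "untrt")
)

p_labs <- ggplot(data = sc_toy) +
  geom_point(aes(x = Num_transcripts, y = TotalCounts, fill = dex),
             shape = 21, size = 2) +
  scale_fill_manual(values = c("purple", "yellow"),
                    labels = c("treated", "untreated")) +
  labs(x = "Recovered transcripts per sample",  # x-axis title
       y = "Total sequences per sample",        # y-axis title
       fill = "Treatment")                      # title of the fill legend

p_labs
```

Keeping every title in a single labs() call can make it easier to see at a glance how a figure is labeled.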

We can modify the axis scales of continuous variables using scale_x_continuous() and scale_y_continuous(). Discrete (categorical variable) axes can be modified using scale_x_discrete() and scale_y_discrete().

    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,fill=dex),
                 shape=21,size=2) +
      scale_fill_manual(values=c("purple", "yellow"),
                        labels=c('treated','untreated'))+
      #can change labels of fill levels along with colors
      xlab("Recovered transcripts per sample") + #add x label
      ylab("Total sequences per sample") + #add y label
      guides(fill = guide_legend(title="Treatment")) + #label the legend
      scale_y_continuous(breaks=seq(1.0e7, 3.5e7, by = 2e6),
                         limits=c(1.0e7,3.5e7)) #change breaks and limits

    #maybe we want this on a logarithmic scale
    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,fill=dex),
                 shape=21,size=2) +
      scale_fill_manual(values=c("purple", "yellow"),
                        labels=c('treated','untreated'))+
      #can change labels of fill levels along with colors
      xlab("Recovered transcripts per sample") + #add x label
      ylab("Total sequences per sample") + #add y label
      guides(fill = guide_legend(title="Treatment")) + #label the legend
      scale_y_continuous(trans="log10") #use the trans argument
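One caveat with log scales: log10(0) is -Inf, so zeros (and negative values) have no finite position on the axis, and ggplot2 will typically warn that the transformation introduced infinite values. The sketch below uses made-up counts and assumes that dropping non-positive values is acceptable for the figure; whether that is appropriate depends on the data. It also uses scale_y_log10(), a shorthand that avoids the trans argument, which newer ggplot2 releases rename to transform.

```r
library(ggplot2)

# Toy counts including a zero (values are made up for illustration)
counts_toy <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  counts = c(0, 10, 1000, 100000)
)

log10(counts_toy$counts[1])  # -Inf: a zero cannot be placed on a log axis

# Keep only strictly positive values before applying the log scale
counts_pos <- subset(counts_toy, counts > 0)

p_log <- ggplot(counts_pos, aes(x = sample, y = counts)) +
  geom_point(size = 2) +
  scale_y_log10()  # log10-transformed continuous y scale

p_log
```

If the zeros matter, they should be reported some other way (for example in the caption) rather than silently disappearing from the plot.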

Finally, we can change the overall look of non-data elements of our plot (titles, labels, fonts, background, gridlines, and legends) by customizing ggplot2 themes. Check out ?ggplot2::theme for a list of available parameters. ggplot2 provides 8 complete themes, with theme_gray() as the default theme.

[Figure: ggplot2 complete themes]

You can also create your own custom theme and then apply it to all of your figures.

Create a custom theme to use with multiple figures.

    #Setting a theme
    my_theme <-
      theme_bw() +
      theme(
        panel.border = element_blank(),
        axis.line = element_line(),
        panel.grid.major = element_line(size = 0.2),
        panel.grid.minor = element_line(size = 0.1),
        text = element_text(size = 12),
        legend.position = "bottom",
        axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)
      )

    ggplot(data=sc) +
      geom_point(aes(x=Num_transcripts, y = TotalCounts,fill=dex),
                 shape=21,size=2) +
      scale_fill_manual(values=c("purple", "yellow"),
                        labels=c('treated','untreated'))+
      #can change labels of fill levels along with colors
      xlab("Recovered transcripts per sample") + #add x label
      ylab("Total sequences per sample") + #add y label
      guides(fill = guide_legend(title="Treatment")) + #label the legend
      scale_y_continuous(trans="log10") + #use the trans argument
      my_theme
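Instead of adding a custom theme to each plot, theme_set() can make it the session default. This is a minimal sketch: my_theme_sketch is an illustrative, shortened theme in the style of my_theme, and the built-in mtcars data stands in for the lesson data.

```r
library(ggplot2)

# A shortened custom theme (the element choices are illustrative)
my_theme_sketch <- theme_bw() +
  theme(panel.border = element_blank(),
        axis.line = element_line(),
        legend.position = "bottom")

# theme_set() installs a new default and returns the previous one
old_theme <- theme_set(my_theme_sketch)

# Plots drawn while this default is active pick up the custom theme
p_default <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p_default

# Restore the original default when finished
theme_set(old_theme)
```

Adding the theme to each plot explicitly, as the lesson does, keeps every figure self-describing; theme_set() trades that for less repetition.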

Saving plots (ggsave())

Finally, we have a quality plot ready to publish. The next step is to save our plot to a file. The easiest way to do this with ggplot2 is ggsave(). This function will save the last plot that you displayed by default. Look at the function parameters using ?ggsave().

    ggsave("Plot1.png",width=5.5,height=3.5,units="in",dpi=300)
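Relying on "the last plot displayed" can be fragile in longer scripts, so it may be safer to pass the plot object explicitly through the plot argument. The sketch below makes several assumptions for illustration: it uses the built-in mtcars data, writes to a temporary directory, and saves a PDF (a vector format) rather than a PNG. The file name is hypothetical.

```r
library(ggplot2)

p_save <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output path; tempdir() keeps the example out of the project folder
out_file <- file.path(tempdir(), "Plot1_explicit.pdf")

# The file extension determines the output device
ggsave(filename = out_file, plot = p_save,
       width = 5.5, height = 3.5, units = "in")

file.exists(out_file)  # confirm that the file was written
```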

Nice plot example

These steps can be used to create a publication-worthy figure. For example, let's create a volcano plot of our differential expression results.

A volcano plot is a type of scatterplot that shows statistical significance (P value) versus magnitude of change (fold change). It enables quick visual identification of genes with large fold changes that are also statistically significant. These may be the most biologically significant genes. --- Maria Doyle, 2021

    #get the data
    dexp_sigtrnsc<-dexp %>%
      mutate(Significant = FDR < 0.05 & abs(logFC) >= 2) %>% arrange(FDR)
    topgenes<-dexp_sigtrnsc$transcript[c(1:6)]

Plot

    #install.packages("ggrepel")
    library(ggrepel)
    ggplot(data=dexp_sigtrnsc,aes(x = logFC, y = log10(FDR))) +
      geom_point(aes( color = Significant, size = Significant,
                      alpha = Significant)) +
      geom_text_repel(data=dexp_sigtrnsc %>%
                        filter(transcript %in% topgenes),
                      aes(label=transcript),
                      nudge_y=0.5,hjust=0.5,direction="y",
                      segment.color="gray") +
      scale_y_reverse(limits=c(0,-7))+
      scale_color_manual(values = c("black", "#e11f28")) +
      scale_size_discrete(range = c(0, 2)) +
      guides(size = "none", alpha= "none")+
      my_theme

    ## Warning: Using size for a discrete variable is not advised.
    ## Warning: Using alpha for a discrete variable is not advised.
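The two warnings arise because the logical Significant column is mapped to size and alpha through discrete scales. One approach that should avoid them is to give the values explicitly with scale_size_manual() and scale_alpha_manual(). The sketch below is not the lesson's code: dexp_toy holds made-up values, the specific sizes and alpha levels are arbitrary, and it plots -log10(FDR) directly, swapping in the more common volcano-plot convention for the reversed axis used above. The axis label assumes logFC is on the log2 scale, as edgeR reports it.

```r
library(ggplot2)

# Toy differential-expression table (values are made up)
dexp_toy <- data.frame(
  logFC = c(-3.1, -0.4, 0.2, 2.6, 0.9),
  FDR   = c(1e-6, 0.40, 0.75, 1e-4, 0.03)
)
dexp_toy$Significant <- dexp_toy$FDR < 0.05 & abs(dexp_toy$logFC) >= 2

p_volcano <- ggplot(dexp_toy, aes(x = logFC, y = -log10(FDR))) +
  geom_point(aes(color = Significant, size = Significant,
                 alpha = Significant)) +
  # Named vectors tie each value to a level of the logical column
  scale_color_manual(values = c("FALSE" = "black", "TRUE" = "#e11f28")) +
  scale_size_manual(values = c("FALSE" = 0.5, "TRUE" = 2)) +
  scale_alpha_manual(values = c("FALSE" = 0.4, "TRUE" = 1)) +
  guides(size = "none", alpha = "none") +
  labs(x = "log2 fold change", y = "-log10(FDR)")

p_volcano
```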

Recommendations for creating publishable figures

(Inspired by Visualizing Data in the Tidyverse, a Coursera lesson)

  1. Consider whether the plot type you have chosen is the best way to convey your message
  2. Make your plot visually appealing
    • Careful color selection - colorblind friendly if possible
    • Eliminate unnecessary white space
    • Carefully choose themes including font types
  3. Label all axes with concise and informative labels
    • These labels should be straightforward and adequately describe the data
  4. Ask yourself "Does the data make sense?"
    • Does the data plotted address the question you are answering?
  5. Try not to mislead the audience
    • Often this means starting the y-axis at 0
    • Keep axes consistent when arranging facets or multiple plots
  6. Do not try to convey too much information in the same plot
    • Keep plots fairly simple
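Two of these recommendations translate directly into code. The sketch below uses the built-in mtcars data rather than the lesson data: a viridis color scale, which is designed to be readable with common forms of color blindness, and facets that share the same axes.

```r
library(ggplot2)

p_cb <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2) +
  # viridis palettes ship with ggplot2
  scale_color_viridis_d(name = "Cylinders") +
  # scales = "fixed" (the default) keeps axes identical across panels
  facet_wrap(~ am, scales = "fixed") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_bw()

p_cb
```

Setting scales = "free" would let each panel choose its own axis range, which can make panels look comparable when they are not.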

Complementary packages

There are many complementary R packages for creating publishable figures with ggplot2. Check out the packages cowplot and ggpubr. cowplot provides functions that make it easy to arrange multiple plots in a grid panel. Publications usually restrict the number of figures allowed, so it is helpful to be able to group multiple plots into a single figure panel. ggpubr is especially useful for beginners, providing simple code for publication-worthy figures. It also integrates statistics well, making it easy to add brackets and p-values for group comparisons.
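As a minimal sketch of the grid arrangement described here, cowplot's plot_grid() can combine two plots into one labeled panel. The example assumes cowplot is installed and uses the built-in mtcars data; the two plots are placeholders.

```r
library(ggplot2)
library(cowplot)

p_a <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p_b <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Arrange the two plots side by side with panel labels A and B
panel <- plot_grid(p_a, p_b, labels = c("A", "B"), ncol = 2)
panel
```

The combined panel can then be written to a file in the same way as a single plot.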

Resource list

  1. ggplot2 cheatsheet
  2. The R Graph Gallery
  3. The R Graphics Cookbook

Acknowledgements

Material from this lesson was adapted from Chapter 3 of R for Data Science and from a 2021 workshop entitled Introduction to Tidy Transcriptomics by Maria Doyle and Stefano Mangiola.


Source: https://btep.ccr.cancer.gov/docs/rintro/Lesson3_rmd_to_md/
