/usr/lib/R/site-library/phyloseq/doc/phyloseq-mixture-models.Rmd is in r-bioc-phyloseq 1.22.3-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 | ---
title: "Example using Negative Binomial in Microbiome Differential Abundance Testing"
output:
BiocStyle::html_document:
fig_height: 7
fig_width: 10
toc: yes
toc_depth: 2
number_sections: true
---
<!--
%% \VignetteEngine{knitr::rmarkdown}
%% \VignetteIndexEntry{phyloseq and DESeq2 on Colorectal Cancer Data}
-->
`r library("knitr")`
`r opts_chunk$set(cache=FALSE, fig.width=9, message=FALSE, warning=FALSE)`
Paul J. McMurdie and Susan Holmes
<mcmurdie@stanford.edu>
[phyloseq Home Page](http://joey711.github.io/phyloseq/)
If you find phyloseq and/or its tutorials useful, please acknowledge and cite phyloseq in your publications:
**phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data** (2013) PLoS ONE 8(4):e61217
http://dx.plos.org/10.1371/journal.pone.0061217
# Other resources
The phyloseq project also has a number of supporting online resources, most of which can by found at [the phyloseq home page](http://joey711.github.com/phyloseq/), or from the phyloseq stable release [page on Bioconductor](http://bioconductor.org/packages/release/bioc/html/phyloseq.html).
To post feature requests or ask for help, try [the phyloseq Issue Tracker](https://github.com/joey711/phyloseq/issues).
# The experimental data used in this example
In this example I use the publicly available data from a study on colorectal cancer:
[Genomic analysis identifies association of Fusobacterium with colorectal carcinoma](http://genome.cshlp.org/content/22/2/292.long).
Kostic, A. D., Gevers, D., Pedamallu, C. S., Michaud, M., Duke, F., Earl, A. M., et al. (2012). *Genome research*, 22(2), 292-298.
As a side-note, this work was published ahead of print in [Genome Research](http://genome.cshlp.org/) alongside a highly-related article from a separate group of researchers (long-live reproducible observations!): [Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma](http://genome.cshlp.org/content/22/2/299.long). In case you are interested. For the purposes of example, however, we will stick to the data from the former study, with data available at the [microbio.me/qiime](http://www.microbio.me/qiime/) server.
Data source, from methods section in article:
> The 16S gene data set consists of 454 FLX Titanium sequences spanning the V3 to V5 variable regions obtained for 190 samples (95 pairs). Detailed protocols used for 16S amplification and se- quencing are available on the HMP Data Analysis and Coordination Center website (http://www.hmpdacc.org/tools_protocols/tools_ protocols.php).
Study ID: `1457`
Project Name: `Kostic_colorectal_cancer_fusobacterium`
Study Abstract:
> The tumor microenvironment of colorectal carcinoma is a complex community of genomically altered cancer cells, nonneoplastic cells, and a diverse collection of microorganisms. Each of these components may contribute to carcino genesis; however, the role of the microbiota is the least well understood. We have characterized the composition of the microbiota in colorectal carcinoma using whole genome sequences from nine tumor/normal pairs. Fusobacterium sequences were enriched in carcinomas, confirmed by quantitative PCR and 16S rDNA sequence analysis of 95 carcinoma/normal DNA pairs, while the Bacteroidetes and Firmicutes phyla were depleted in tumors. Fusobacteria were also visualized within colorectal tumors using FISH. These findings reveal alterations in the colorectal cancer microbiota; however, the precise role of Fusobacteria in colorectal carcinoma pathogenesis requires further investigation.
# Import data with phyloseq, convert to DESeq2
Start by loading phyloseq.
```{r load-phyloseq, message=FALSE, warning=FALSE}
library("phyloseq"); packageVersion("phyloseq")
```
Defined file path, and import the published OTU count data into R.
```{r filepath}
filepath = system.file("extdata", "study_1457_split_library_seqs_and_mapping.zip", package="phyloseq")
kostic = microbio_me_qiime(filepath)
```
Here I had to use a relative file path so that this example works on all systems that have phyloseq installed. In practice, your file path will look like this (if you've downloaded the data ahead of time):
```{r example-path-local, eval=FALSE}
filepath = "~/Downloads/study_1457_split_library_seqs_and_mapping.zip"
kostic = microbio_me_qiime(filepath)
```
Or like this (if you're accessing data directly from the microbio.me/qiime server directly):
```{r example-path-remote, eval=FALSE}
kostic = microbio_me_qiime(1457)
```
# Convert to DESeq2's DESeqDataSet class
In this example I'm using the major sample covariate, `DIAGNOSIS`, as the study design factor. The focus of this study was to compare the microbiomes of pairs of healthy and cancerous tissues, so this makes sense. Your study could have a more complex or nested design, and you should think carefully about the study design formula, because this is critical to the test results and their meaning. You might even need to define a new factor if none of the variables in your current table appropriately represent your study's design. See [the DESeq2 home page](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) for more details.
Here is the summary of the data variable `kostic` that we are about to use, as well as the first few entries of the `DIAGNOSIS` factor.
```{r show-variables}
kostic
head(sample_data(kostic)$DIAGNOSIS, 10)
```
# DESeq2 conversion and call
First load DESeq2.
```{r deseq2, message=FALSE, warning=FALSE}
library("DESeq2"); packageVersion("DESeq2")
```
The following two lines actually do all the complicated DESeq2 work. The function `phyloseq_to_deseq2` converts your phyloseq-format microbiome data into a `DESeqDataSet` with dispersions estimated, using the experimental design formula, also shown (the `~DIAGNOSIS` term). The `DESeq` function does the rest of the testing, in this case with default testing framework, but you can actually use alternatives.
First remove the 5 samples that had no `DIAGNOSIS` attribute assigned.
These introduce a spurious third design class
that is actually a rare artifact in the dataset.
Also remove samples with less than `500` reads (counts).
Note that this kind of data cleanup
is useful, necessary, and should be well-documented
because it can also be dangerous to alter or omit data
without clear documentation.
In this case I actually explored the data first,
and am omitting some of the details
(and explanatory plots) here for clarity.
```{r rm-bad-samples}
kostic <- subset_samples(kostic, DIAGNOSIS != "None")
kostic <- prune_samples(sample_sums(kostic) > 500, kostic)
kostic
```
```{r run-deseq2}
diagdds = phyloseq_to_deseq2(kostic, ~ DIAGNOSIS)
# calculate geometric means prior to estimate size factors
gm_mean = function(x, na.rm=TRUE){
exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
}
geoMeans = apply(counts(diagdds), 1, gm_mean)
diagdds = estimateSizeFactors(diagdds, geoMeans = geoMeans)
diagdds = DESeq(diagdds, fitType="local")
```
Note: The default multiple-inference correction is Benjamini-Hochberg, and occurs within the `DESeq` function.
# Investigate test results table
The following `results` function call creates a table of the results of the tests. Very fast. The hard work was already stored with the rest of the DESeq2-related data in our latest version of the `diagdds` object (see above). I then order by the adjusted p-value, removing the entries with an `NA` value. The rest of this example is just formatting the results table with taxonomic information for nice(ish) display in the HTML output.
```{r grab-results-process-table}
res = results(diagdds)
res = res[order(res$padj, na.last=NA), ]
alpha = 0.01
sigtab = res[(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(kostic)[rownames(sigtab), ], "matrix"))
head(sigtab)
```
Let's look at just the OTUs that were significantly enriched in the carcinoma tissue. First, cleaning up the table a little for legibility.
```{r table-prelim}
posigtab = sigtab[sigtab[, "log2FoldChange"] > 0, ]
posigtab = posigtab[, c("baseMean", "log2FoldChange", "lfcSE", "padj", "Phylum", "Class", "Family", "Genus")]
```
```{r make-markdown-table, echo=FALSE, results='asis'}
# Make a markdown table
posigtab = data.frame(OTU=rownames(posigtab), posigtab)
cat(paste(colnames(posigtab), collapse=" | "), fill=TRUE)
cat(paste(rep("---", times=ncol(posigtab)), collapse=" | "), fill=TRUE)
dummy = apply(posigtab, 1, function(x){
cat(paste(x, collapse=" | "), fill=TRUE)
})
```
As expected from the original study abstract and title, a *Fusobacterium* OTU was among the most-significantly differentially abundant between the cancerous and healthy samples.
# Plot Results
Here is a bar plot showing the log2-fold-change, showing Genus and Phylum. Uses some ggplot2 commands.
```{r bar-plot}
library("ggplot2")
theme_set(theme_bw())
sigtabgen = subset(sigtab, !is.na(Genus))
# Phylum order
x = tapply(sigtabgen$log2FoldChange, sigtabgen$Phylum, function(x) max(x))
x = sort(x, TRUE)
sigtabgen$Phylum = factor(as.character(sigtabgen$Phylum), levels=names(x))
# Genus order
x = tapply(sigtabgen$log2FoldChange, sigtabgen$Genus, function(x) max(x))
x = sort(x, TRUE)
sigtabgen$Genus = factor(as.character(sigtabgen$Genus), levels=names(x))
ggplot(sigtabgen, aes(y=Genus, x=log2FoldChange, color=Phylum)) +
geom_vline(xintercept = 0.0, color = "gray", size = 0.5) +
geom_point(size=6) +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))
```
|