# Analyzing Samples

## Qiita now uses QIIME2 plugins for analysis.

Thanks to this, the analysis panel has a new layout and the following new features:

- Alpha Diversity (including statistics calculations)
- Beta Diversity (including statistics calculations)
- Principal Coordinate Analysis (PCoA), including ordination results and EMPeror plots
- Rarefaction
- Filter Samples
- Taxa Summary

## Creating A New Analysis

- **Create New Analysis Page**
  - **Filter results by column data (Title, Abstract, PI, etc.)**: Searches for studies whose title, abstract, PI, etc. match the text you entered
  - **Filter study by Study Tags**: Searches for studies with the tag you searched for
  - **Title**: Brings you to the Study Information Page of that study
  - **Green Expand for Analysis Button**: Reveals the studies done on this data that can be used for further analysis
  - **Per Artifact Button**: Reveals the names of the artifacts, the number of samples in the prep info, and the files
  - **Add**: Adds data to be analyzed
    - More than one can be added at once to run a large meta-analysis
- **Create New Analysis**: Creates the analysis using the data that has been added
  - **Analysis Name** (required): Name for the analysis that will be done
  - **Description** (optional): Description for the analysis that will be done

## Single vs. Meta Analysis

- **Single analysis**: One study chosen to analyze
- **Meta-analysis**: Multiple studies chosen to analyze
  - *You can only merge like data*

## Processing Network Page: Commands

### Rarefying Features

- **Rarefy features**: Subsample frequencies from all samples without replacement so that the sum of frequencies in each sample is equal to the sampling depth.
  - **BIOM table** (required): Feature table containing the samples for which features should be rarefied
  - **Parameter set**: Parameters at which the rarefaction is run
  - **Sampling depth** (required): Total frequency that each sample should be rarefied to. Samples whose sum of frequencies is less than the sampling depth will not be included in the resulting table

Note that rarefaction has some advantages for beta-diversity analyses [11], but can have undesirable properties in tests of differential abundance [12]. To analyze your data with alternative normalization strategies, you can easily download the raw biom tables (see Downloading From Qiita) and load them into an analysis pipeline such as Phyloseq.
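As a rough illustration of what rarefying does, the sketch below subsamples each sample without replacement to a fixed depth and drops samples that fall short of it. The table layout (a dictionary of sample IDs to feature counts) and the function names `rarefy_sample` and `rarefy_table` are assumptions made for this example; this is not the code Qiita or QIIME2 runs.

```python
import random
from typing import Dict, Optional


def rarefy_sample(counts: Dict[str, int], depth: int,
                  rng: random.Random) -> Optional[Dict[str, int]]:
    """Subsample `depth` observations from one sample without replacement.

    Returns None when the sample has fewer than `depth` observations,
    mirroring the rule that such samples are left out of the result.
    """
    if sum(counts.values()) < depth:
        return None
    # One list entry per observed count, so drawing without replacement
    # can never take more of a feature than the sample contains.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rarefied: Dict[str, int] = {}
    for feature in rng.sample(pool, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def rarefy_table(table: Dict[str, Dict[str, int]], depth: int,
                 seed: int = 0) -> Dict[str, Dict[str, int]]:
    """Rarefy every sample in `table` (sample ID -> feature counts)."""
    rng = random.Random(seed)
    result = {}
    for sample_id, counts in table.items():
        rarefied = rarefy_sample(counts, depth, rng)
        if rarefied is not None:
            result[sample_id] = rarefied
    return result


# Hypothetical two-sample table; sample_b has only 8 observations.
table = {
    "sample_a": {"otu1": 50, "otu2": 30, "otu3": 20},
    "sample_b": {"otu1": 5, "otu2": 3},
}
rarefied = rarefy_table(table, depth=40)
```

With a sampling depth of 40, `sample_a` is reduced to exactly 40 observations and `sample_b` is excluded, which is why choosing the depth is a trade-off between keeping samples and keeping reads.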

### Filtering Samples by Metadata

- **Filter samples by metadata**: Filters samples from an OTU table on the basis of the number of observations in that sample, or on the basis of sample metadata
  - **BIOM table** (required): Feature table containing the samples for which features should be filtered
  - **Maximum feature frequency across samples** (optional): Maximum total frequency that a feature can have to be retained
  - **Maximum features per sample** (optional): Maximum number of features that a sample can have to be retained
  - **Minimum feature frequency across samples** (optional): Minimum total frequency that a feature must have to be retained
  - **Minimum features per sample** (optional): Minimum number of features that a sample must have to be retained
  - **SQLite WHERE-clause** (optional): Metadata condition that a sample must meet to be kept
    - If you want to filter your samples by body_site and keep only the tongue samples, fill the clause this way:
      `body_site = 'UBERON:tongue'`
    - If you want to filter your samples by body_site and remove only the tongue samples, fill the clause this way:
      `body_site != 'UBERON:tongue'`

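To see how a WHERE clause selects samples, the sketch below loads a small, made-up metadata table into an in-memory SQLite database and applies the two clauses shown above. The table name `metadata`, the sample IDs, and the helper `filter_sample_ids` are assumptions for illustration only; Qiita applies the clause to your sample information for you.

```python
import sqlite3
from typing import List, Tuple


def filter_sample_ids(metadata: List[Tuple[str, str]],
                      where_clause: str) -> List[str]:
    """Return the sample IDs whose metadata satisfy `where_clause`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE metadata (sample_id TEXT, body_site TEXT)")
        conn.executemany("INSERT INTO metadata VALUES (?, ?)", metadata)
        # The clause is pasted in verbatim, as it would be typed into the form.
        query = ("SELECT sample_id FROM metadata "
                 f"WHERE {where_clause} ORDER BY sample_id")
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


# Hypothetical sample information with a body_site column.
metadata = [
    ("S1", "UBERON:tongue"),
    ("S2", "UBERON:feces"),
    ("S3", "UBERON:tongue"),
]
kept_tongue = filter_sample_ids(metadata, "body_site = 'UBERON:tongue'")
kept_other = filter_sample_ids(metadata, "body_site != 'UBERON:tongue'")
```

Here `kept_tongue` holds the samples that survive the first clause and `kept_other` those that survive the second, so the clause always describes the samples that remain in the table.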

### Summarizing Taxa

- **Summarize Taxa**: Creates a bar plot of the taxa within the analysis
  - *Can only be performed with closed-reference data*
  - **BIOM table** (required): Feature table containing the samples to visualize at various taxonomic levels

### Calculating Alpha Diversity

- **Calculate alpha diversity** [13]: Measures the diversity within a sample
  - **BIOM table** (required): Feature table containing the samples for which alpha diversity should be computed
  - **Diversity metric** (required): Alpha diversity metric to be run
    - **Abundance-based Coverage Estimator (ACE) metric** [14]: Calculates the ACE metric
      - Estimates species richness using a correction factor
    - **Berger-Parker Dominance Index** [15]: Calculates the Berger-Parker dominance index
      - Relative richness of the abundant species
    - **Brillouin’s index** [16]: Calculates Brillouin’s index
      - Measures the diversity of the species present
      - Use when randomness can’t be guaranteed
    - **Chao1 index** [14]: Calculates the Chao1 index
      - Estimates diversity from abundance data
      - Estimates the number of rare taxa missed from undersampling
    - **Dominance measure**: Calculates the dominance measure
      - How equally the taxa are presented
    - **Effective Number of Species (ENS)/Probability of intra- or interspecific encounter (PIE) metric** [17]: Calculates the ENS/PIE metric
      - Shows how the absolute number of species, the relative abundances of species, and their intraspecific clustering affect differences in biodiversity among communities
    - **Faith’s phylogenetic diversity** [18]: Calculates Faith’s phylogenetic diversity
      - Measure of biodiversity that incorporates phylogenetic difference between species
      - Sum of the lengths of branches
    - **Fisher’s index** [19]: Calculates Fisher’s index
      - Relationship between the number of species and the abundance of each species
    - **Gini index** [20]: Calculates the Gini index
      - Measures species abundance
      - Assumes that the sampling is accurate and that additional data would fall on linear gradients between the values of the given data
    - **Good’s coverage of counts** [21]: Calculates Good’s coverage of counts
      - Estimates the percent of an entire species that is represented in a sample
    - **Heip’s evenness measure** [22]: Calculates Heip’s evenness measure
      - Removes dependency on species number
    - **Lladser’s point estimate** [23]: Calculates Lladser’s point estimate
      - Estimates how much of the environment contains unsampled taxa
      - Best estimate on a complete sample
    - **Margalef’s richness index** [24]: Calculates Margalef’s richness index
      - Measures species richness in a given area or community
    - **McIntosh dominance index D** [25]: Calculates the McIntosh dominance index D
      - Affected by the variation in dominant taxa and less affected by the variation in less abundant or rare taxa
    - **McIntosh evenness index E** [22]: Calculates McIntosh’s evenness measure E
      - How evenly abundant taxa are
    - **Menhinick’s richness index** [24]: Calculates Menhinick’s richness index
      - The ratio of the number of taxa to the square root of the sample size
    - **Michaelis-Menten fit to rarefaction curve of observed OTUs** [26]: Calculates the Michaelis-Menten fit to the rarefaction curve of observed OTUs
      - Estimated richness of species pools
    - **Number of distinct features** [27]: Calculates the number of distinct OTUs
    - **Number of double occurrences**: Calculates the number of double occurrence OTUs (doubletons)
      - OTUs that only occur twice
    - **Number of single occurrences**: Calculates the number of single occurrence OTUs (singletons)
      - OTUs that appear only once in a given sample
    - **Pielou’s evenness** [28]: Calculates Pielou’s evenness
      - Measure of relative evenness of species richness
    - **Robbins’ estimator** [29]: Calculates Robbins’ estimator
      - Probability of unobserved outcomes
    - **Shannon’s index** [30]: Calculates Shannon’s index
      - Calculates richness and diversity using a natural logarithm
      - Accounts for both abundance and evenness of the taxa present
    - **Simpson evenness measure E** [31]: Calculates Simpson’s evenness measure E
      - Diversity that accounts for the number of organisms and the number of species
    - **Simpson’s index** [31]: Calculates Simpson’s index
      - Measures the relative abundance of the different species making up the sample richness
    - **Strong’s dominance index (Dw)** [32]: Calculates Strong’s dominance index
      - Measures species abundance unevenness
  - **Phylogenetic tree** (required for Faith PD): Phylogenetic tree to be used with alpha analyses (only include when necessary)
    - Currently the only tree that can be used is the GreenGenes 97% OTU based phylogenetic tree
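A few of the non-phylogenetic metrics above are simple enough to compute by hand, which can help when interpreting the results. The sketch below uses textbook formulas: Shannon with a natural logarithm (as described above), Simpson as one minus the sum of squared proportions, Pielou's evenness as Shannon divided by the log of the number of observed features, and the bias-corrected form of Chao1. The function names are made up for this example, and the plugin may use different conventions (for instance a different logarithm base), so treat the numbers as illustrative.

```python
import math
from typing import Sequence


def observed_features(counts: Sequence[int]) -> int:
    """Number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: Sequence[int]) -> float:
    """Shannon index, -sum(p * ln p), using the natural logarithm."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts: Sequence[int]) -> float:
    """Simpson index, 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pielou_evenness(counts: Sequence[int]) -> float:
    """Shannon index divided by its maximum, ln(observed features)."""
    s = observed_features(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_features(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


even_sample = [10, 10, 10, 10]      # four equally abundant features
uneven_sample = [5, 1, 1, 2, 2]     # two singletons, two doubletons
```

For `even_sample` the Shannon index equals ln(4) and the evenness is 1, the maximum for four features; for `uneven_sample` the singletons push the Chao1 estimate above the five features actually observed.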

### Calculating Beta Diversity

- **Calculate beta diversity** [13]: Measures the diversity between samples
  - **BIOM table** (required): Feature table containing the samples for which beta diversity should be computed
  - **Adjust variance** [33] (phylogenetic only): Performs variance adjustment
    - Weighs distances based on the proportion of the relative abundance represented between the samples at a given node under evaluation
  - **Alpha value** (Generalized UniFrac only): Controls importance of sample proportions
    - 1.0 is weighted normalized UniFrac. 0.0 is close to unweighted UniFrac, but only if the samples are dichotomized.
  - **Bypass tips** (phylogenetic only): In a bifurcating tree, the tips make up about 50% of the nodes in a tree. By ignoring them, specificity can be traded for reduced compute time. This has the effect of collapsing the phylogeny, and is analogous (in concept) to moving from 99% to 97% OTUs
  - **Diversity metric** (required): Beta diversity metric to be run
    - **Bray-Curtis dissimilarity** [34]: Calculates Bray-Curtis dissimilarity
      - Fraction of overabundant counts
    - **Canberra distance** [35]: Calculates Canberra distance
      - Overabundance on a feature-by-feature basis
    - **Chebyshev distance** [36]: Calculates Chebyshev distance
      - Maximum distance between two samples
    - **City-block distance** [37]: Calculates City-block distance
      - Similar to the Euclidean distance but the effect of a large difference in a single dimension is reduced
    - **Correlation coefficient** [38]: Measures the correlation coefficient
      - Measure of strength and direction of linear relationship between samples
    - **Cosine Similarity** [39]: Measures cosine similarity
      - Ratio of the amount of common species in a sample to the mean of the two samples
    - **Dice measures** [40]: Calculates the Dice measure
      - Statistic used for comparing the similarity of two samples
      - Only counts true positives once
    - **Euclidean distance** [41]: Measures Euclidean distance
      - Species-by-species distance matrix
    - **Generalized UniFrac** [42]: Measures Generalized UniFrac
      - Detects a wider range of biological changes compared to unweighted and weighted UniFrac
    - **Hamming distance** [43]: Measures Hamming distance
      - Minimum number of substitutions required to change one group to the other
    - **Jaccard similarity index** [44]: Calculates the Jaccard similarity index
      - Fraction of unique features, regardless of abundance
    - **Kulczynski dissimilarity index** [45]: Measures the Kulczynski dissimilarity index
      - Describes the dissimilarity between two samples
    - **Matching components** [46]: Measures matching components
      - Compares indices under all possible situations
    - **Rogers-Tanimoto distance** [47]: Measures Rogers-Tanimoto distance
      - Allows the possibility of two samples, which are quite different from each other, to both be similar to a third
    - **Russell-Rao coefficient** [48]: Calculates Russell-Rao coefficients
      - Equal weight is given to matches and non-matches
    - **Sokal-Michener coefficient** [49]: Measures the Sokal-Michener coefficient
      - Proportion of matches between samples
    - **Sokal-Sneath Index** [50]: Calculates the Sokal-Sneath index
      - Measure of species turnover
    - **Species-by-species Euclidean** [41]: Measures species-by-species Euclidean distance
      - Standardized Euclidean distance between two groups
      - Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation
    - **Squared Euclidean** [41]: Measures squared Euclidean distance
      - Places progressively greater weight on samples that are farther apart
    - **Unweighted UniFrac** [51]: Measures unweighted UniFrac
      - Measures the fraction of unique branch length
    - **Weighted Minkowski metric** [52]: Measures the weighted Minkowski metric
      - Allows the use of the k-means-type paradigm to cluster large data sets
    - **Weighted normalized UniFrac** [53]: Measures weighted normalized UniFrac
      - Takes into account abundance
      - Normalization adjusts for varying root-to-tip distances.
    - **Weighted unnormalized UniFrac** [53]: Measures weighted unnormalized UniFrac
      - Takes into account abundance
      - *Doesn’t correct for unequal sampling effort or different evolutionary rates between taxa*
    - **Yule index** [19]: Measures the Yule index
      - Measures biodiversity
      - Determined by the diversity of species and the proportions between the abundance of those species.
  - **Number of jobs**: Number of workers to use
  - **Phylogenetic tree** (required for Weighted Minkowski metric and all UniFrac metrics): Phylogenetic tree to be used with beta analyses (only include when necessary)
    - Currently the only tree that can be used is the GreenGenes 97% OTU based phylogenetic tree
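To make the difference between an abundance-based and a presence/absence metric concrete, the sketch below computes Bray-Curtis dissimilarity and the Jaccard distance for pairs of samples and assembles them into a distance matrix. It uses the standard textbook definitions; the function names and the tiny feature table are made up for this example and are not the plugin's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of absolute count differences divided by the total count."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def jaccard_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - shared features / all observed features, ignoring abundance."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    if not union:
        return 0.0
    return 1.0 - len(present_u & present_v) / len(union)


def distance_matrix(table: Dict[str, Sequence[float]],
                    metric: Callable[[Sequence[float], Sequence[float]], float]
                    ) -> Tuple[List[str], List[List[float]]]:
    """Pairwise distances between all samples, in sorted sample-ID order."""
    ids = sorted(table)
    return ids, [[metric(table[i], table[j]) for j in ids] for i in ids]


# Hypothetical feature table: each sample lists counts for three features.
table = {"S1": [1, 0, 3], "S2": [1, 2, 0], "S3": [0, 0, 4]}
ids, bc_matrix = distance_matrix(table, bray_curtis)
```

For `S1` and `S2` the Bray-Curtis dissimilarity is 5/7 while the Jaccard distance is 2/3, since Jaccard only asks which features are present. The resulting matrix is the kind of input the PCoA, group significance, and correlation commands consume.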

### Calculating Alpha Correlation

- **Calculate alpha correlation** [54]: Determines if the numeric sample metadata category is correlated with alpha diversity
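The results for this command report a Spearman or Pearson correlation. As a minimal sketch of the Spearman case, the code below ranks both variables (ties receive their average rank) and computes the Pearson correlation of the ranks. The function names and the example values are assumptions for illustration, and the p-value calculation is left out.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values: a numeric metadata column and Shannon index.
ph = [5.0, 5.5, 6.0, 7.5]
shannon_index = [1.2, 1.9, 2.0, 3.4]
rho = spearman(ph, shannon_index)
```

Because Spearman works on ranks, any strictly increasing relationship gives a rho of 1, whether or not it is linear; that makes it a common choice when the metadata values are skewed.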

### Performing Principal Coordinate Analysis

### Calculating Beta Group Significance

- **Calculate beta group significance**: Determines whether groups of samples are significantly different from one another using a permutation-based statistical test
  - **Distance matrix** (required): Matrix of distances between pairs of samples
  - **Comparison Type** (required): Whether to perform pairwise tests between all pairs of groups in addition to the test across all groups
  - **Metadata category** (required): Category from metadata file or artifact viewable as metadata
  - **Method** (required): Group significance test being applied
    - **Anosim** [59]: Describes the strength and significance that a category has in determining the distances between points and can accept either categorical or continuous variables in the metadata mapping file
    - **Permanova** [60]: Describes the strength and significance that a category has in determining the distances between points and can accept categorical variables
  - **Number of permutations** (required): Number of permutations to be run when computing p-values
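The role of the number of permutations is easiest to see in code. The sketch below implements a bare-bones PERMANOVA-style test: it computes a pseudo-F statistic from the distance matrix and group labels, then shuffles the labels repeatedly and reports how often the shuffled statistic is at least as large as the observed one. It follows the usual textbook definition of pseudo-F; the function names and the toy data are assumptions, and the plugin's implementation may differ in detail.

```python
import random
from typing import List, Sequence, Tuple


def pseudo_f(dm: Sequence[Sequence[float]], groups: Sequence[str]) -> float:
    """Among-group over within-group variation, from squared distances."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dm[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i, g in enumerate(groups) if g == label]
        pair_sum = sum(dm[i][j] ** 2
                       for pos, i in enumerate(members)
                       for j in members[pos + 1:])
        ss_within += pair_sum / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(dm: Sequence[Sequence[float]], groups: Sequence[str],
              permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (pseudo-F, p-value) using label permutations."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, groups)
    shuffled: List[str] = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Toy data: eight samples on a line, forming two well-separated groups.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
dm = [[abs(a - b) for b in positions] for a in positions]
groups = ["a"] * 4 + ["b"] * 4
f_stat, p_value = permanova(dm, groups)
```

The smallest p-value this procedure can report is 1 / (permutations + 1), so 999 permutations cannot resolve p-values below 0.001; more permutations give finer resolution at the cost of run time.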

### Calculating Beta Correlation

- **Calculate beta correlation**: Identifies a correlation between the distance matrix and a numeric sample metadata category
  - **Distance-matrix** (required): Matrix of distances between pairs of samples
  - **Correlation method** (required): Correlation test being applied
  - **Metadata-category** (required): Category from metadata file or artifact viewable as metadata
  - **Number of permutations** (required): Number of permutations to be run when computing p-values
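One common way to correlate a distance matrix with a numeric metadata category is a Mantel-style test: turn the metadata column into its own distance matrix (absolute differences between samples), correlate the two matrices entry by entry, and obtain a p-value by permuting sample order. The sketch below assumes that approach and uses Pearson correlation; whether it matches the plugin exactly is an assumption, and the function names and data are made up for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def upper_triangle(dm: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(dm)
    return [dm[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def mantel(dm_x: Sequence[Sequence[float]], dm_y: Sequence[Sequence[float]],
           permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Two-sided Mantel-style test; returns (correlation, p-value)."""
    rng = random.Random(seed)
    n = len(dm_x)
    flat_y = upper_triangle(dm_y)
    observed = pearson(upper_triangle(dm_x), flat_y)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of dm_x together by reordering samples.
        rng.shuffle(order)
        permuted = [[dm_x[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(upper_triangle(permuted), flat_y)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data: community distances that track a pH gradient exactly.
community_position = [0, 1, 2, 3, 4, 5]
ph = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
beta_dm = [[abs(a - b) for b in community_position] for a in community_position]
ph_dm = [[abs(a - b) for b in ph] for a in ph]
r, p_value = mantel(beta_dm, ph_dm)
```

Whole samples are permuted rather than individual matrix entries because the entries of a distance matrix are not independent of one another.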

## Processing Network Page: Results

### Taxa Bar Plot

- **Taxonomic Level**: How specifically the taxa will be displayed
  - 1- Kingdom, 2- Phylum, 3- Class, 4- Order, 5- Family, 6- Genus, 7- Species
- **Color Palette**: Changes the coloring of your taxa bar plot
  - **Discrete**: Each taxon is a different color
  - **Continuous**: Each taxon is a different shade of one color
- **Sort Sample By**: Sorts data by sample metadata or taxonomic abundance, in either ascending or descending order

### Alpha Diversity Box Plots and Statistics

- **Boxplot**: Shows how different measures of alpha diversity correlate with different metadata categories
- **Category**: Choose the metadata category you would like to analyze
- **Kruskal-Wallis** [61]: Result of Kruskal-Wallis tests
  - Says if the differences are statistically significant

### Alpha Correlation Box Plots and Statistics

- **Boxplot**: Shows how different measures of alpha diversity correlate with different metadata categories
  - Gives the Spearman or Pearson result (rho and p-value)

### Principal Coordinate Analysis Plot

- **Emperor Plot**: Visualization of similarities/dissimilarities between samples
  - **Color**: Choose colors for each group
    - **Color Category**: Colors each sample according to the chosen metadata category
  - **Visibility**: Allows for making certain samples invisible
    - *Does not remove them from the analysis*
    - Must perform filtering to do that
  - **Opacity**: Change the transparency of a given category
  - **Scale**: Change the size of a given category
  - **Shape**: Assigns each sample a shape according to the chosen metadata category
  - **Axes**: Change the position of the axis as well as the color of the graph
  - **Animations**: Traces the samples sorted by a metadata category
    - *Requires a gradient column (the order in which samples are connected together, must be numeric) and a trajectory column (the way in which samples are grouped together) within the sample information file*
    - *Works best for time series*

### Beta Group Significance Box Plots and Statistics

- **Boxplot**: Shows how different measures of beta diversity correlate with different metadata categories
  - Gives the Permanova or Anosim result (pseudo-F and p-value)

### Beta Correlation

- Gives the Spearman or Pearson result (rho and p-value)
- Gives a scatterplot with the distance matrix on the x-axis and the variable being tested on the y-axis