dms_summarizealignments

Overview

dms_summarizealignments is a program included with the dms_tools package. It can be used to make plots that summarize the results after examining several samples with any of the following programs:

After you install dms_tools, this program will be available to run at the command line.

Command-line usage

Makes plots that summarize alignments for one or more samples. Designed to be run after you have used another program to make your alignments, and now you want to visualize the results. This script is part of dms_tools (version 1.1.20) written by the Bloom Lab (see https://github.com/jbloomlab/dms_tools/graphs/contributors for all contributors). Detailed documentation is at http://jbloomlab.github.io/dms_tools/

usage: dms_summarizealignments [-h] [--chartype {codon}]
                               [--maxmutcounts MAXMUTCOUNTS]
                               [--maxperbarcode MAXPERBARCODE]
                               [--writemutfreqs] [--groupbyfirst] [-v]
                               outprefix
                               {barcodedsubamplicons,subassemble,matchsubassembledbarcodes}
                               alignments [alignments ...]

Positional Arguments

outprefix

Prefix for the output PDF plot files. Any existing files with these names are overwritten.

See Output files for a detailed description of the created files and examples.

alignment_type

Possible choices: barcodedsubamplicons, subassemble, matchsubassembledbarcodes

The type of data being summarized. Use ‘barcodedsubamplicons’ if output is from ‘dms_barcodedsubamplicons’, ‘subassemble’ if output is from ‘dms_subassemble’, ‘matchsubassembledbarcodes’ if output is from ‘dms_matchsubassembledbarcodes’.

alignments

This argument is repeated to specify each alignment that is being summarized. Each repetition is two comma-delimited strings (no spaces): ALIGNPREFIX,NAME. ALIGNPREFIX is the value of ‘–outprefix’ used when calling the alignment program and NAME is the name assigned to that sample in the summaries. We expect to find all of the alignment output files with ALIGNPREFIX; these are the input data.

ALIGNPREFIX should be the value of –outprefix used when running dms_barcodedsubamplicons, dms_subassemble, or dms_matchsubassembledbarcodes (depending on the value of alignment_type). This program expects to find the files that are created by running that program with –outprefix as ALIGNPREFIX. NAME is simply the name given to each sample in the Output files.

Named Arguments

--chartype

Possible choices: codon

Character type used for the alignments / mutation counting.

Default: “codon”

--maxmutcounts

Maximum x-value for cumulative counts plots plots.

Default: 20

--maxperbarcode
 

In ‘barcodes.pdf’ plot, group all barcodes with >= this many reads.

Default: 3

--writemutfreqs
 

Write a file ‘mutfreqs.txt’ that gives the numerical values plotted in ‘mutfreqs.pdf’?

Default: False

--groupbyfirst

Make separate ‘mutdepth’ and ‘cumulfracs’ plots for each name prefix (first word before underscore).

Default: False

This option currently only applies when alignment_type is matchsubassembledbarcodes`. In this case, for the ``allmutdepth.pdf, singlemutdepth.pdf, allcumulfracs.pdf, and singlecumulfracs.pdf Output files, a separate output file is made by grouping samples with the same first word (prefix) in the name, where the first word is separated from the others by an underscore (_). This first name is then added directly before the plot extension in the file name.

-v, --version show program’s version number and exit

Example usage

Imagine that you have six samples, DNA, mutDNA, virus-p1, mutvirus-p1, virus-p2, and mutvirus-p2, and furthermore imagine that for each sample you ran dms_barcodedsubamplicons with its --outprefix option to values of DNA/DNA_, mutDNA/mutDNA_, etc. You could then summarize the results with the following command:

dms_summarizealignments alignmentsummary_ barcodedsubamplicons DNA/DNA_,DNA mutDNA/mutDNA_,mutDNA virus-p1/virus-p1_,virus-p1 mutvirus-p1/mutvirus-p1_,mutvirus-p1 virus-p2/virus-p2_,virus-p2 mutvirus-p2/mutvirus-p2_,mutvirus-p2

Because this command uses outprefix of alignmentsummary_, the Output files will have names like alignmentsummary_mutfreqs.pdf, alignmentsummary_mutcounts_all.pdf, etc. The alignments option specifies the prefix for each of the six samples, and then gives them the same names as above (DNA, mutDNA, etc).

Alternatively, if you have subassembled samples Lib1, Lib2, Lib3, and WT with dms_subassemble with --outprefix as subassembly/Lib1, subassembly/Lib2, etc then you could summarize the results with:

dms_summarizealignments subassemblysummary_ subassemble ./subassembly/Lib1,Lib1 ./subassembly/Lib2,Lib2 ./subassembly/Lib3,Lib3 ./subassembly/WT,WT

Because the command uses outprefix of subassemblysummary_, the Output files will have names like subassemblysummary_reads.pdf, subassemblysummary_depth.pdf, and subassemblysummary_nsubassembled.pdf.

Alternatively, if you have matched barcodes for samples after subassembly, you could summarize the results with:

dms_summarizealignments matchbarcodesummary_ matchsubassembledbarcodes ./barcodes//Lib1_Input,Lib1_Input ./barcodes//Lib1_0-cip,Lib1_0-cip ./barcodes//Lib1_4-cip,Lib1_4-cip ./barcodes//Lib1_10-cip,Lib1_10-cip ./barcodes//Lib1_15-cip,Lib1_15-cip ./barcodes//Lib2_Input,Lib2_Input ./barcodes//Lib2_0-cip,Lib2_0-cip ./barcodes//Lib2_4-cip,Lib2_4-cip ./barcodes//Lib2_10-cip,Lib2_10-cip ./barcodes//Lib2_15-cip,Lib2_15-cip ./barcodes//Lib3_Input,Lib3_Input ./barcodes//Lib3_0-cip,Lib3_0-cip ./barcodes//Lib3_4-cip,Lib3_4-cip ./barcodes//Lib3_10-cip,Lib3_10-cip ./barcodes//Lib3_15-cip,Lib3_15-cip ./barcodes//WT_Input,WT_Input ./barcodes//WT_0-cip,WT_0-cip ./barcodes//WT_4-cip,WT_4-cip ./barcodes//WT_10-cip,WT_10-cip ./barcodes//WT_15-cip,WT_15-cip --groupbyfirst

Because the command uses outprefix of matchbarcodesummary, the Output files will have names like matchbarcodesummary_reads.pdf, etc.

Output files

For alignment_type of barcodedsubamplicons

Here are the output files, with the names that would be produced if running dms_summarizealignments as in the Example usage for alignment_type of barcodedsubamplicons:

reads.pdf

This plot shows the number of read pairs that failed the Illumina chastity filter, were filtered by dms_barcodedsubamplicons as being of low quality, or were successfully processed and assigned to a barcode.

alignmentsummary_reads.pdf

barcodes.pdf

For all barcodes for which read were assigned, this plot shows the number barcodes with one, two, or at least three read pairs. For each category, it also indicates whether it was possible to build a consensus and align it to the reference sequence, whether the consensus was not alignable to the reference sequence, or whether the barcode simply had to be discarded due to insufficient matching reads.

alignmentsummary_barcodes.pdf

mutfreqs.pdf

This plot shows the average per-codon mutation rate across the gene as determined from the aligned barcodes. Each bar is split: the left half categories mutations as to whether they are synonymous, nonsynonymous, or to stop codons; the right categorizes mutations by the number of nucleotide changes to that codon.

alignmentsummary_mutfreqs.pdf

If --writemutfreqs is used, a text file with the suffix mutfreqs.txt is created that has the raw data in this plot.

depth.pdf

This plot shows the per-codon sequencing depth as a function of the codon position in the primary sequence.

alignmentsummary_depth.pdf

mutdepth.pdf

This plot shows the per-codon mutation rate as a function of the codon position in the primary sequence.

alignmentsummary_mutdepth.pdf

mutcounts_all.pdf

This plot shows the number of times each of the possible codon mutations and each of the possible synonymous codon mutations was observed. For each number of counts, it indicates the fraction of all of these possible mutations that were observed at least that many times.

alignmentsummary_mutcounts_all.pdf

mutcounts_multi_nt.pdf

This plot shows the number of times each codon mutation was observed only for mutations that involve multi-nucleotide changes to the codon. This plot might be preferable to the one above in some cases because single-nucleotide codon changes can arise due to errors, but multi-nucleotide changes typically do not.

alignmentsummary_mutcounts_multi_nt.pdf

singlemuttypes.pdf

This plot shows the frequency of different types of nucleotide mutations among only codons with single-nucleotide changes. This can be useful for assessing if oxidative damage is affecting your sample by inducing specific mutations. Note that this file shows a different set of samples than for the others above.

alignmentsummary_singlemutttypes.pdf

For alignment_type of subassemble

Here are the output files, with the names that would be produced if running dms_summarizealignments as in the Example usage for alignment_type of subassemble:

reads.pdf

The number of reads and whether or not they were part of a successful subassembly.

subassemblysummary_reads.pdf

depth.pdf

The number of reads at each position, regardless of whether or not they were part of a successful subassembly.

subassemblysummary_depth.pdf

nsubassembled.pdf

The number of subassembled barcodes and how many mutations they have.

subassemblysummary_nsubassembled.pdf

For alignment_type of matchsubassembledbarcodes

Here are the output files, with the names that would be produced if running dms_summarizealignments as in the Example usage for alignment_type of matchsubassembledbarcodes:

reads.pdf

The number of read pairs for each sample, and their fate (only those indicated as nretained are actually successfully matched to a subassembled barcode).

matchbarcodesummary_reads.pdf

muts_per_matched_read.pdf

For each read pair that matches a barcode (there are nretained such reads as indicated in the previous plot), this shows that number that match to a variant with the indicated number of mutations.

matchbarcodesummary_muts_per_matched_read.pdf

allmutfreqs.pdf

This plot shows the average per-site mutation rate averaged over all sites and variants (unmutated, singly mutated, multiply mutated).

matchbarcodesummary_allmutfreqs.pdf

If --writemutfreqs is used, a text file with the suffix allmutfreqs.txt is created that has the raw data in this plot.

singlemutfreqs.pdf

This plot shows the fraction of variants that have a mutation of the indicated type only among the unmutated and singly mutated variants.

matchbarcodesummary_singlemutfreqs.pdf

If --writemutfreqs is used, a text file with the suffix singlemutfreqs.txt is created that has the raw data in this plot.

allmutdepth.pdf

This plot shows the average per-site mutation rate as a function of primary sequence for all variants (unmutated, singly mutated, multiply mutated). Note that since we have used --groupbyfirst, the Example usage above would create four plots. Here we show the first, allmutdepth_Lib1.pdf.

matchbarcodesummary_allmutdepth_Lib1.pdf

singlemutdepth.pdf

This plot shows the average per-site mutation rate at which each site has a singly mutated variant versus the number of unmutated variants. Note that since we have used --groupbyfirst, the Example usage above would create four plots. Here we show the first, singlemutdepth_Lib1.pdf.

matchbarcodesummary_singlemutdepth_Lib1.pdf

singlemutdepth and allmutdepth plots for each sample

A plot is also made for each sample that shows the mutation depth divided into nonsynonymous, synonymous, and stop-codon mutations. These plots have names like matchbarcodesummary_allmutdepth_Lib1_Input.pdf. Note that the synonymous and stop-codon mutations frequencies are multiplied by a scale factor as indicated in the legend so that they are on a comparable scale to the nonsynonymous mutation frequencies.

Plots like this are made for all variants and just for single mutant variants.

matchbarcodesummary_allmutdepth_Lib1_Input.pdf

allcumulcounts.pdf

The number of times each mutation is observed among all variants (singly and multiply mutated). Since we used the --groupbyfirst, the Example usage above would create four plots. here is the first, allcumulcounts_Lib1.pdf.

matchbarcodesummary_allcumulcounts_Lib1.pdf

singlecumulcounts.pdf

The number of times each mutation is aboserved among singly mutated variants. Since we used the --groupbyfirst, the Example usage above would create four plots. here is the first, singlecumulcounts_Lib1.pdf.

matchbarcodesummary_singlecumulcounts_Lib1.pdf