The dms2_batch_bcsubamp program processes FASTQ files generated by Barcoded-subamplicon sequencing to count the frequencies of mutations at each site for a set of samples, and then summarize the results.

The dms2_batch_bcsubamp program simply runs dms2_bcsubamp for each sample listed in a batch file specified by --batchfile. Specifically, as described in Command-line usage, you can specify a few sample-specific arguments in the --batchfile. All other arguments are specified using the normal option syntax (e.g., --bclen BCLEN) and are shared between all samples specified in --batchfile. The result is the output for each individual run of dms2_bcsubamp plus the summary plots described in Output files.

The Doud2016 example illustrates the usage of dms2_batch_bcsubamp on a real dataset.

Because dms2_batch_bcsubamp simply runs dms2_bcsubamp on each sample specified by the --batchfile argument described below, the dms2_bcsubamp Algorithm (for assembling and aligning subamplicons) and the dms2_bcsubamp Command-line usage give details that help in understanding many of the arguments in the dms2_batch_bcsubamp Command-line usage below.

Command-line usage

Perform many runs of dms2_bcsubamp and plot results. Part of dms_tools2 (version 2.6.6) written by the Bloom Lab.

usage: dms2_batch_bcsubamp [-h] [--outdir OUTDIR] [--ncpus NCPUS]
                           [--use_existing {yes,no}] [-v] --refseq REFSEQ
                           --alignspecs ALIGNSPECS [ALIGNSPECS ...]
                           [--bclen BCLEN] [--fastqdir FASTQDIR]
                           [--R2 R2 [R2 ...]] [--R1trim R1TRIM [R1TRIM ...]]
                           [--R2trim R2TRIM [R2TRIM ...]] [--bclen2 BCLEN2]
                           [--chartype {codon}] [--maxmuts MAXMUTS]
                           [--minq MINQ] [--minreads MINREADS]
                           [--minfraccall MINFRACCALL] [--minconcur MINCONCUR]
                           [--sitemask SITEMASK] [--purgeread PURGEREAD]
                           [--purgebc PURGEBC] [--bcinfo] [--bcinfo_csv]
                           --batchfile BATCHFILE --summaryprefix SUMMARYPREFIX

Named Arguments


--outdir

Output files to this directory (create if needed).


--ncpus

Number of CPUs to use; -1 means all available.

Default: -1

Multiple runs of dms2_bcsubamp can be performed in parallel on the different samples specified by --batchfile. This argument determines how many CPUs are used if running multiple jobs.
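As a minimal sketch of how --ncpus could translate into the number of simultaneous dms2_bcsubamp runs, the hypothetical helper `resolve_ncpus` below treats -1 as all CPUs reported by `os.cpu_count()`. Capping the count at the number of samples is an assumption of this sketch, not documented behavior.

```python
import os


def resolve_ncpus(ncpus: int, nsamples: int) -> int:
    """Return how many samples to process at once (hypothetical helper).

    -1 means "all available CPUs"; capping at the number of samples is an
    assumption of this sketch, not documented dms2_batch_bcsubamp behavior.
    """
    if ncpus == -1:
        available = os.cpu_count() or 1  # cpu_count() can return None
    elif ncpus >= 1:
        available = ncpus
    else:
        raise ValueError(f"--ncpus must be -1 or a positive integer, got {ncpus}")
    # Never run more jobs than there are samples, but always run at least one.
    return max(1, min(available, nsamples))
```

Because each dms2_bcsubamp run can use substantial memory, a smaller value lowers peak memory at the cost of a longer total run time.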


--use_existing

Possible choices: yes, no

If files with names of expected output already exist, do not re-run.

Default: “no”

-v, --version

show program’s version number and exit


--refseq

Align subamplicons to the gene in this FASTA file.


--alignspecs

Subamplicon alignment positions as ‘REFSEQSTART,REFSEQEND,R1START,R2START’, one specification per subamplicon. REFSEQSTART is the nucleotide (1, 2, … numbering) in --refseq where nucleotide R1START of R1 aligns. REFSEQEND is the nucleotide in --refseq where nucleotide R2START of R2 aligns.
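To make the four fields concrete, here is a minimal parsing sketch. The names `AlignSpec`, `parse_alignspecs`, and `covered_refseq_length` are hypothetical, the sanity checks are assumptions of this sketch, and the numbers in the usage line are made up rather than taken from a real dataset.

```python
from typing import List, NamedTuple


class AlignSpec(NamedTuple):
    refseqstart: int  # nt in --refseq (1-based) where nt R1START of R1 aligns
    refseqend: int    # nt in --refseq (1-based) where nt R2START of R2 aligns
    r1start: int      # nt of R1 (1-based) that aligns to REFSEQSTART
    r2start: int      # nt of R2 (1-based) that aligns to REFSEQEND


def parse_alignspecs(specs: List[str]) -> List[AlignSpec]:
    """Parse one 'REFSEQSTART,REFSEQEND,R1START,R2START' string per subamplicon."""
    parsed = []
    for spec in specs:
        fields = spec.split(",")
        if len(fields) != 4:
            raise ValueError(f"expected 4 comma-separated integers: {spec!r}")
        aspec = AlignSpec(*(int(f) for f in fields))
        if min(aspec) < 1:
            raise ValueError(f"positions use 1, 2, ... numbering: {spec!r}")
        if aspec.refseqend < aspec.refseqstart:
            raise ValueError(f"REFSEQEND precedes REFSEQSTART: {spec!r}")
        parsed.append(aspec)
    return parsed


def covered_refseq_length(aspec: AlignSpec) -> int:
    """Number of --refseq nucleotides spanned by the subamplicon (inclusive)."""
    return aspec.refseqend - aspec.refseqstart + 1
```

For example, `parse_alignspecs(["1,285,36,37", "286,570,31,32"])` would describe two hypothetical adjacent subamplicons.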


--bclen

Length of the NNN… barcode at the start of each read. Assumed to be the same for R1 and R2; use --bclen2 if this is not the case.

Default: 8


--fastqdir

The R1 and R2 files are in this directory.


--R2

Read 2 (R2) FASTQ files are assumed to have the same names as R1 but with ‘_R1’ replaced by ‘_R2’. If that is not the case, provide the names here.
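The default R2 naming rule amounts to a string substitution, sketched below with a hypothetical helper. Refusing names where ‘_R1’ is absent or appears more than once is an assumption of this sketch. In that case it asks you to pass --R2 explicitly.

```python
from typing import List


def default_r2_names(r1_files: List[str]) -> List[str]:
    """Guess R2 file names by replacing '_R1' with '_R2' (hypothetical helper).

    Rejecting names where '_R1' is missing or ambiguous is an assumption of
    this sketch; pass --R2 explicitly in those cases.
    """
    r2_files = []
    for r1 in r1_files:
        if r1.count("_R1") != 1:
            raise ValueError(f"cannot infer R2 name from {r1!r}; use --R2")
        r2_files.append(r1.replace("_R1", "_R2"))
    return r2_files
```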


--R1trim

Trim R1 from the 3’ end to this length. Give one value for all reads, or one value for each subamplicon in --alignspecs.


--R2trim

Like --R1trim, but for R2.
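The "one value or one per subamplicon" rule for --R1trim and --R2trim can be sketched as follows. Both helpers are hypothetical, and leaving reads that are already shorter than the trim length unchanged is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple


def expand_trim(trim: Sequence[int], nsubamplicons: int) -> List[int]:
    """One value applies to every subamplicon; otherwise need one per subamplicon."""
    if len(trim) == 1:
        return [trim[0]] * nsubamplicons
    if len(trim) == nsubamplicons:
        return list(trim)
    raise ValueError(f"got {len(trim)} trim values for {nsubamplicons} subamplicons")


def trim_from_3prime(seq: str, qual: str, length: int) -> Tuple[str, str]:
    """Keep the first `length` nucleotides and Q scores; shorter reads are unchanged."""
    return seq[:length], qual[:length]
```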


--bclen2

If R1 and R2 have barcodes of different lengths, use --bclen for the R1 length and --bclen2 for the R2 length.
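The hypothetical helper below shows how --bclen and --bclen2 split the barcode from the rest of each read. Joining the R1 and R2 barcodes into one combined barcode is an assumption of this sketch, not a statement of how dms2_bcsubamp stores barcodes.

```python
from typing import Optional, Tuple


def split_barcodes(r1: str, r2: str, bclen: int,
                   bclen2: Optional[int] = None) -> Tuple[str, str, str]:
    """Strip the NNN... barcodes from the start of R1 and R2.

    Returns (barcode, r1_rest, r2_rest). Concatenating the two barcodes is an
    assumption of this sketch.
    """
    if bclen2 is None:
        bclen2 = bclen  # --bclen applies to both reads unless --bclen2 is given
    barcode = r1[:bclen] + r2[:bclen2]
    return barcode, r1[bclen:], r2[bclen2:]
```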


--chartype

Possible choices: codon

Character type for which we count mutations.

Default: “codon”


--maxmuts

Maximum allowed mismatches in the alignment of a subamplicon; mismatches are counted in terms of the character type given by --chartype.

Default: 4
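With --chartype codon, mismatches are counted per codon rather than per nucleotide. The sketch below illustrates that counting for an already aligned, in-frame subamplicon. Skipping codons that contain an uncalled ‘N’ is an assumption of this sketch, and both function names are hypothetical.

```python
def count_codon_mismatches(subamplicon: str, refseq: str) -> int:
    """Count codons in an aligned subamplicon that differ from the reference.

    Skipping codons that contain an uncalled 'N' is an assumption of this sketch.
    """
    if len(subamplicon) != len(refseq) or len(refseq) % 3 != 0:
        raise ValueError("expected equal-length, in-frame sequences")
    mismatches = 0
    for i in range(0, len(refseq), 3):
        codon = subamplicon[i:i + 3]
        if "N" in codon:
            continue  # not fully called, so not counted either way
        if codon != refseq[i:i + 3]:
            mismatches += 1
    return mismatches


def passes_maxmuts(subamplicon: str, refseq: str, maxmuts: int = 4) -> bool:
    """True if the subamplicon has at most `maxmuts` codon mismatches."""
    return count_codon_mismatches(subamplicon, refseq) <= maxmuts
```

Under this scheme, a codon with two nucleotide changes still counts as a single mismatch.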


--minq

Only call nucleotides with Q score >= this.

Default: 15


--minreads

Require this many reads in a barcode to agree to call the consensus nucleotide identity.

Default: 2


--minfraccall

Retain only barcodes where the trimmed consensus sequence for each read has >= this fraction of sites called.

Default: 0.95


--minconcur

Only call the consensus identity for a barcode when >= this fraction of reads concur.

Default: 0.75
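The following sketch shows how --minq, --minreads, --minconcur, and --minfraccall could interact when building the consensus for one read direction of one barcode. It is an illustration under stated assumptions, not the dms2_bcsubamp implementation. Q scores are assumed to be Phred+33, reads are assumed to be already trimmed to equal length, and the --minconcur fraction is taken over reads with a called nucleotide at that site. All function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def mask_low_quality(seq: str, qual: str, minq: int = 15, offset: int = 33) -> str:
    """Replace nucleotides with Q score < minq by 'N' (Phred+33 assumed)."""
    return "".join(nt if ord(q) - offset >= minq else "N" for nt, q in zip(seq, qual))


def barcode_consensus(reads: Sequence[str], quals: Sequence[str], minq: int = 15,
                      minreads: int = 2, minconcur: float = 0.75) -> str:
    """Consensus of equal-length reads sharing one barcode; 'N' = not called."""
    masked = [mask_low_quality(s, q, minq) for s, q in zip(reads, quals)]
    if len({len(s) for s in masked}) > 1:
        raise ValueError("this sketch assumes trimmed reads of equal length")
    consensus = []
    for column in zip(*masked):
        called = [nt for nt in column if nt != "N"]
        if not called:
            consensus.append("N")
            continue
        nt, count = Counter(called).most_common(1)[0]
        # Need enough agreeing reads AND a high enough fraction agreeing.
        if count >= minreads and count / len(called) >= minconcur:
            consensus.append(nt)
        else:
            consensus.append("N")
    return "".join(consensus)


def frac_called(consensus: str) -> float:
    """Fraction of consensus sites called; compare against --minfraccall."""
    return sum(nt != "N" for nt in consensus) / len(consensus) if consensus else 0.0
```

For example, three reads `ACGT`, `ACGA`, `ACGT` with high Q scores give the consensus `ACGN`, because only 2 of 3 reads agree at the last site. That consensus has 0.75 of its sites called, so it would fall below the default --minfraccall of 0.95.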


--sitemask

Use to only consider mutations at a subset of sites. Should be a CSV file with a column named site listing all sites to include.
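A minimal sketch of reading a --sitemask CSV and keeping only the listed sites follows. The per-site counts dictionary and both helper names are hypothetical, and treating extra columns as ignorable is an assumption.

```python
import csv
import io
from typing import Dict, Set, TextIO


def read_sitemask(handle: TextIO) -> Set[int]:
    """Return the sites listed in the 'site' column of a --sitemask CSV."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or "site" not in reader.fieldnames:
        raise ValueError("sitemask CSV needs a column named 'site'")
    return {int(row["site"]) for row in reader}


def apply_sitemask(counts: Dict[int, int], sites: Set[int]) -> Dict[int, int]:
    """Keep only per-site values (here, hypothetical mutation counts) at listed sites."""
    return {site: n for site, n in counts.items() if site in sites}


if __name__ == "__main__":
    sites = read_sitemask(io.StringIO("site\n1\n2\n5\n"))
    print(apply_sitemask({1: 3, 2: 0, 3: 7}, sites))  # {1: 3, 2: 0}
```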


--purgeread

Randomly purge read pairs with this probability to subsample data.

Default: 0


--purgebc

Randomly purge barcodes with this probability to subsample data.

Default: 0
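Both purge options describe independent random removal with a fixed probability. The generic sketch below could be applied to read pairs (as with --purgeread) or to groups of reads sharing a barcode (as with --purgebc). The function name, the explicit seeded random generator, and the order in which the two purges would be applied are assumptions of this sketch.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def purge(items: Iterable[T], prob: float, rng: random.Random) -> Iterator[T]:
    """Independently drop each item with probability `prob` (0 keeps everything)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"purge probability must be in [0, 1], got {prob}")
    for item in items:
        # rng.random() is uniform on [0, 1), so each item is kept with probability 1 - prob.
        if rng.random() >= prob:
            yield item
```

Because this is a generator, the probability check runs when iteration starts rather than at call time.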


--bcinfo

Create a file with suffix ‘bcinfo.txt.gz’ with info about each barcode.

Default: False


--bcinfo_csv

Store the ‘bcinfo’ file as a CSV with the suffix ‘bcinfo.csv.gz’. Only has an effect if --bcinfo is used.

Default: False


--batchfile

CSV file specifying each dms2_bcsubamp run. Must have these columns: name, R1. Can optionally have columns R1trim and R2trim, with spaces delimiting the subamplicon-specific trimming values. If R1trim / R2trim are in the batch file, do not also give values for --R1trim and --R2trim. Other columns are ignored, so other dms2_bcsubamp arguments should be passed as separate command-line arguments rather than in --batchfile.
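The batch-file rules above can be checked before launching any runs. The sketch below parses a batch file into per-sample dictionaries. `read_batchfile` is a hypothetical helper, and its requirement that sample names be unique is an assumption of this sketch. The R1 column is kept as a plain string because the format of multiple R1 files per row is not specified here.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence, TextIO


def read_batchfile(handle: TextIO,
                   cli_r1trim: Optional[Sequence[int]] = None,
                   cli_r2trim: Optional[Sequence[int]] = None) -> List[Dict]:
    """Parse a --batchfile CSV into one dict of per-sample arguments per row."""
    reader = csv.DictReader(handle)
    columns = set(reader.fieldnames or [])
    missing = {"name", "R1"} - columns
    if missing:
        raise ValueError(f"batch file lacks required columns: {sorted(missing)}")
    trim_cols = [c for c in ("R1trim", "R2trim") if c in columns]
    if trim_cols and (cli_r1trim or cli_r2trim):
        raise ValueError("give R1trim/R2trim in the batch file OR on the command line")
    samples = []
    for row in reader:
        sample = {"name": row["name"], "R1": row["R1"]}  # other columns are ignored
        for col in trim_cols:
            if row[col] and row[col].strip():
                # Spaces delimit subamplicon-specific trimming values.
                sample[col] = [int(x) for x in row[col].split()]
        samples.append(sample)
    names = [s["name"] for s in samples]
    if len(names) != len(set(names)):  # assumption of this sketch: unique names
        raise ValueError("sample names in the batch file must be unique")
    return samples


if __name__ == "__main__":
    text = "name,R1,R1trim,R2trim\nlib1,lib1_R1.fastq.gz,200 170,170 170\n"
    print(read_batchfile(io.StringIO(text)))
```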


--summaryprefix

Prefix of output summary plots.

As detailed in Output files below, dms2_batch_bcsubamp creates a variety of plots summarizing the output. These files are in the directory specified by --outdir and have the prefix specified here. This prefix should only contain letters, numbers, dashes, and spaces. Underscores are not allowed because they are a LaTeX special character.
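A pre-flight check for the prefix rule can be written as a single regular expression. The helper below is hypothetical and restricts "letters" to ASCII letters, which is an assumption of this sketch.

```python
import re

# ASCII letters, numbers, dashes, and spaces only; underscores are special in LaTeX.
_ALLOWED_PREFIX = re.compile(r"[A-Za-z0-9 -]+")


def check_summaryprefix(prefix: str) -> str:
    """Raise ValueError if `prefix` has characters outside the allowed set."""
    if not _ALLOWED_PREFIX.fullmatch(prefix):
        raise ValueError(f"invalid --summaryprefix {prefix!r}: use only "
                         "letters, numbers, dashes, and spaces")
    return prefix
```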

Output files

Running dms2_batch_bcsubamp produces a variety of output files, all of which will be found in the directory specified by --outdir.

Results for each sample

The program dms2_bcsubamp is run on each sample specified by --batchfile, so all of the dms2_bcsubamp Output files are created for each sample.

Summary files

Files (mostly plots) are created that summarize the output for all samples specified by --batchfile. These files have the prefix specified by --summaryprefix. For instance, if you run dms2_batch_bcsubamp with the arguments --outdir results --summaryprefix summary, then these files will have the prefix ./results/summary. They will have the suffixes listed below:

  • .log: a text file that logs the progress of the program.

  • _readstats.pdf: plot of read statistics for each sample.

  • _bcstats.pdf: plot of barcode statistics for each sample.

  • _readsperbc.pdf: plot of the distribution of the number of reads per barcode for each sample.

  • _depth.pdf: plot of the number of counts called at each site for each sample.

  • _mutfreq.pdf: plot of the mutation frequency at each site for each sample.

  • _codonmuttypes.pdf: plot of the average frequency of different types of codon mutations.

  • _codonmuttypes.csv: numerical data plotted in _codonmuttypes.pdf.

  • _codnntchanges.pdf: plot of the average frequency of codon mutations with different numbers of nucleotide changes.

  • _singlentchanges.pdf: plot of the frequencies of different types of nucleotide mutations among codons with just one nucleotide change.

  • _cumulmutcounts.pdf: plot of the fraction of mutations that occur ≤ a given number of times.
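To check whether a finished run produced every summary file, you can build the expected paths from --outdir and --summaryprefix. This is a minimal sketch with hypothetical helper names. The suffixes are copied verbatim from the list above.

```python
import os
from typing import List

# Suffixes copied verbatim from the list of summary files above.
SUMMARY_SUFFIXES = [
    ".log", "_readstats.pdf", "_bcstats.pdf", "_readsperbc.pdf", "_depth.pdf",
    "_mutfreq.pdf", "_codonmuttypes.pdf", "_codonmuttypes.csv",
    "_codnntchanges.pdf", "_singlentchanges.pdf", "_cumulmutcounts.pdf",
]


def summary_files(outdir: str, summaryprefix: str) -> List[str]:
    """Expected summary file paths for a given --outdir and --summaryprefix."""
    return [os.path.join(outdir, summaryprefix + s) for s in SUMMARY_SUFFIXES]


def missing_summary_files(outdir: str, summaryprefix: str) -> List[str]:
    """Expected summary files that do not (yet) exist on disk."""
    return [f for f in summary_files(outdir, summaryprefix) if not os.path.isfile(f)]
```

With `--outdir results --summaryprefix summary`, `summary_files("results", "summary")` starts with `results/summary.log` on POSIX systems.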

Examples and more detailed explanations of these plots can be found in the Doud2016 example.

Memory usage

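As described in the Memory usage section for dms2_bcsubamp, each run of that program can consume substantial memory. Running it on several samples in parallel with dms2_batch_bcsubamp therefore consumes even more, because memory use grows with the number of simultaneous runs. If memory is limited, use --ncpus to reduce how many samples are processed at once.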