dms2_batch_prefs
Overview
The dms2_batch_prefs program processes files giving the observed counts of characters pre- and post-selection to estimate amino-acid preferences.
The dms2_batch_prefs program simply runs dms2_prefs for each sample listed in a batch file specified by --batchfile.
Specifically, as described in Command-line usage, you can specify a few sample-specific arguments in the --batchfile.
All other arguments are specified using the normal option syntax (e.g., --indir INDIR) and are shared between all samples specified in --batchfile.
The result is the output for each individual run of dms2_prefs plus the summary plots described in Output files.
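To make the per-sample mapping concrete, here is a minimal Python sketch of how rows of a batch file could be turned into dms2_prefs command lines. It is an illustration, not the program's actual implementation: the helper dms2_prefs_commands, the sample names, and the file values are hypothetical. It assumes only what this page states, namely that name, pre, post, err, errpre, and errpost are passed to dms2_prefs as the same-named options, while shared options such as --indir are appended to every run.

```python
import csv
import io

# Hypothetical batch file; the sample names and values are illustrative only.
BATCH_CSV = """name,pre,post,err
replicate-1,mutDNA-1,mutvirus-1,wtDNA
replicate-2,mutDNA-2,mutvirus-2,wtDNA
"""

# Columns the batch file may supply for each sample; all others are ignored.
SAMPLE_COLUMNS = ("name", "pre", "post", "err", "errpre", "errpost")


def dms2_prefs_commands(batch_text, shared_args):
    """Return one dms2_prefs argument list per row of the batch file.

    Sample-specific values come from the CSV; `shared_args` (e.g. --indir)
    are appended unchanged to every command.
    """
    commands = []
    for row in csv.DictReader(io.StringIO(batch_text)):
        cmd = ["dms2_prefs"]
        for col in SAMPLE_COLUMNS:
            value = (row.get(col) or "").strip()
            if value:  # absent or empty error-control columns are skipped
                cmd += ["--" + col, value]
        commands.append(cmd + list(shared_args))
    return commands


for cmd in dms2_prefs_commands(BATCH_CSV, ["--indir", "counts", "--outdir", "results"]):
    print(" ".join(cmd))
```

Columns outside SAMPLE_COLUMNS are never read, mirroring the rule that other columns in --batchfile are ignored.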
The Doud2016 example illustrates the usage of dms2_batch_prefs on a real dataset.
Because dms2_batch_prefs simply runs dms2_prefs on each sample specified by the --batchfile argument described below, see the dms2_prefs Command-line usage for details that help explain many of the arguments in the dms2_batch_prefs Command-line usage below.
Command-line usage
Perform many runs of dms2_prefs and summarize results. Part of dms_tools2 (version 2.6.6) written by the Bloom Lab.
usage: dms2_batch_prefs [-h] [--outdir OUTDIR] [--ncpus NCPUS]
                        [--use_existing {yes,no}] [-v]
                        [--method {ratio,bayesian}] [--indir INDIR]
                        [--chartype {codon_to_aa}] [--excludestop {yes,no}]
                        [--conc Cprefs Cmut Cerr] [--pseudocount PSEUDOCOUNT]
                        --batchfile BATCHFILE --summaryprefix SUMMARYPREFIX
                        [--no_corr] [--no_avg]
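If you drive the program from a Python script, one way to assemble the command line is sketched below. It uses only flags shown in the usage message above; the helper names batch_prefs_args and run_batch_prefs and the values batch.csv, summary, results, and counts are hypothetical placeholders.

```python
import shlex
import shutil
import subprocess


def batch_prefs_args(batchfile, summaryprefix, outdir, indir,
                     ncpus=-1, method="bayesian"):
    """Build a dms2_batch_prefs argument list from flags in the usage message."""
    return [
        "dms2_batch_prefs",
        "--batchfile", batchfile,
        "--summaryprefix", summaryprefix,
        "--outdir", outdir,
        "--indir", indir,
        "--ncpus", str(ncpus),
        "--method", method,
    ]


def run_batch_prefs(args):
    """Run the command if dms_tools2 is installed; raise if it fails."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} is not on PATH")
    subprocess.run(args, check=True)


args = batch_prefs_args("batch.csv", "summary", "results", "counts")
print(shlex.join(args))  # shell-quoted form of the command line
```

Passing a list to subprocess.run avoids shell quoting problems, which matters because a --summaryprefix may contain spaces.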
Named Arguments
- --outdir
Output files to this directory (create if needed).
- --ncpus
Number of CPUs to use, -1 is all available.
Default: -1
- --use_existing
Possible choices: yes, no
If files with names of expected output already exist, do not re-run.
Default: “no”
- -v, --version
show program’s version number and exit
- --method
Possible choices: ratio, bayesian
Method to estimate preferences: normalized enrichment ratios or Bayesian inference.
Default: “bayesian”
- --indir
Input counts files in this directory.
- --chartype
Possible choices: codon_to_aa
Characters for which preferences are estimated. codon_to_aa = amino acids from codon counts.
Default: “codon_to_aa”
- --excludestop
Possible choices: yes, no
Exclude stop codons as a possible amino acid?
Default: “yes”
- --conc
Concentration parameters for priors for --method bayesian. Priors are over preferences, mutagenesis rate, and error rate(s).
Default: [1, 1, 1]
- --pseudocount
Pseudocount used with --method ratio.
Default: 1
- --batchfile
CSV file specifying each dms2_prefs run. Must have these columns: name, pre, post. Can also have these columns: err, or errpre and errpost. Other columns are ignored, so other dms2_prefs arguments should be passed as separate command-line arguments rather than in --batchfile.
Each of the arguments name, pre, and post gives the value of the same parameter as passed to dms2_prefs.
If you are running with no error-control counts, then do not specify --err or --errpre/--errpost.
If you are running with a single error control for both the pre- and post-selection counts, then specify this counts file with --err.
If you have different controls for the pre- and post-selection counts, specify them separately with --errpre and --errpost.
- --summaryprefix
Prefix of output summary files and plots.
As detailed in Output files below, dms2_batch_prefs creates a variety of plots summarizing the output. These files are in the directory specified by --outdir and have the prefix specified here. This name should only contain letters, numbers, dashes, and spaces. Underscores are not allowed because they are a LaTeX special character.
- --no_corr
Do not create correlation plot.
Default: False
- --no_avg
Do not create average prefs CSV.
Default: False
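Before a long run, it can help to check the batch file and prefix against the rules listed above. This minimal sketch encodes them as we read them: required name, pre, and post columns; either err or the errpre/errpost pair (requiring errpre and errpost together is our interpretation); and a --summaryprefix made only of ASCII letters, digits, dashes, and spaces. The function names are hypothetical, and dms2_batch_prefs itself may validate differently.

```python
import csv
import io
import re

REQUIRED = {"name", "pre", "post"}
# ASCII letters, digits, dashes, and spaces only (no underscores).
PREFIX_PATTERN = re.compile(r"[A-Za-z0-9\- ]+")


def check_batchfile(batch_text):
    """Raise ValueError if the batch CSV breaks the column rules; return sample names."""
    reader = csv.DictReader(io.StringIO(batch_text))
    columns = set(reader.fieldnames or [])
    missing = REQUIRED - columns
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    err_pair = columns & {"errpre", "errpost"}
    if "err" in columns and err_pair:
        raise ValueError("use either err or errpre/errpost, not both")
    if len(err_pair) == 1:  # assumption: errpre and errpost come as a pair
        raise ValueError("errpre and errpost must be given together")
    return [row["name"] for row in reader]


def check_summaryprefix(prefix):
    """Raise ValueError unless prefix uses only letters, digits, dashes, spaces."""
    if not PREFIX_PATTERN.fullmatch(prefix):
        raise ValueError(f"invalid --summaryprefix: {prefix!r}")
    return prefix
```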
Output files
Running dms2_batch_prefs produces output files in the directory specified by --outdir.
Results for each sample
The program dms2_prefs is run on each sample specified by --batchfile, so each sample gets all of the output files described in the dms2_prefs Output files section.
Average preferences
A file is created that holds the preferences averaged across all samples in --batchfile.
This file has the prefix specified by --summaryprefix.
For instance, if you run dms2_batch_prefs with the arguments --outdir results --summaryprefix summary, then the file will be ./results/summary_avgprefs.csv.
It has the same format as the preferences files created by dms2_prefs.
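The averaging step can be illustrated with the following sketch. It assumes, without confirmation from this page, that each preferences CSV has a site column followed by one column per character and that all samples list the same sites in the same order. The plain arithmetic mean per site and character, and the helper names average_prefs and write_prefs, are illustrative choices that may not match every detail of how dms2_batch_prefs builds summary_avgprefs.csv.

```python
import csv
import io


def average_prefs(prefs_texts):
    """Average per-site preferences across samples.

    Assumes (hypothetically) each prefs CSV has a `site` column plus one
    column per character, with identical sites and characters in every sample.
    """
    tables = [list(csv.DictReader(io.StringIO(text))) for text in prefs_texts]
    if len({len(t) for t in tables}) != 1:
        raise ValueError("samples have different numbers of sites")
    chars = [c for c in tables[0][0] if c != "site"]
    averaged = []
    for rows in zip(*tables):  # the same site across all samples
        site = rows[0]["site"]
        if any(r["site"] != site for r in rows):
            raise ValueError("samples do not list sites in the same order")
        avg = {"site": site}
        for c in chars:
            avg[c] = sum(float(r[c]) for r in rows) / len(rows)
        averaged.append(avg)
    return averaged


def write_prefs(rows, handle):
    """Write averaged rows using the same column layout as the inputs."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```

Because each input row sums to one across characters, the per-character means at a site also sum to one, so the averaged file is still a valid set of preferences.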
Correlation plot
A plot is created that summarizes the correlation between the preferences for each sample in --batchfile.
This plot has the prefix specified by --summaryprefix.
For instance, if you run dms2_batch_prefs with the arguments --outdir results --summaryprefix summary, then the plot will be ./results/summary_prefscorr.pdf.
An example of this plot is in the Doud2016 example.
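As a rough illustration of what the correlation plot compares, this sketch computes a Pearson correlation for every pair of samples over all of their (site, character) preferences. The choice of Pearson r, the dictionary layout, and the function names pearson and pairwise_correlations are assumptions; the actual plot's statistic and layout may differ, and the plotting itself is omitted.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_correlations(prefs_by_sample):
    """Map each pair of sample names to the Pearson r of their preferences.

    `prefs_by_sample` maps a sample name to {(site, char): preference};
    all samples are assumed to cover the same (site, char) keys.
    """
    result = {}
    for a, b in itertools.combinations(sorted(prefs_by_sample), 2):
        keys = sorted(prefs_by_sample[a])  # fixed order for both samples
        x = [prefs_by_sample[a][k] for k in keys]
        y = [prefs_by_sample[b][k] for k in keys]
        result[(a, b)] = pearson(x, y)
    return result
```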
Program run time
As described in the Program run time section for dms2_prefs, each run of that program can take a while.
Running it multiple times with dms2_batch_prefs therefore takes even longer.
The run time can be reduced by specifying more CPUs with --ncpus.
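Because the samples in --batchfile are independent, they can in principle be processed concurrently. The sketch below shows that idea with a thread pool that runs external commands, treating -1 as all available CPUs, as --ncpus does. How dms2_batch_prefs actually distributes work across CPUs is not described here, so resolve_ncpus and run_all are hypothetical helpers, and the demo commands are stand-ins for real dms2_prefs runs.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def resolve_ncpus(ncpus):
    """Interpret an ncpus value: -1 means all available CPUs."""
    if ncpus == -1:
        return os.cpu_count() or 1
    if ncpus < 1:
        raise ValueError("ncpus must be -1 or a positive integer")
    return ncpus


def run_all(commands, ncpus=-1):
    """Run independent per-sample commands, at most `ncpus` at a time.

    Threads are enough here: each worker just waits on an external process.
    Returns exit codes in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=resolve_ncpus(ncpus)) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


# Stand-in commands; real use would pass dms2_prefs argument lists instead.
demo = [[sys.executable, "-c", f"print('sample {i} done')"] for i in range(3)]
print(run_all(demo, ncpus=2))
```

Threads rather than processes are used because each worker spends its time waiting on a child process, so Python's global interpreter lock is not a bottleneck; the child processes themselves run on separate CPUs.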