dms2_batch_fracsurvive
Overview
The dms2_batch_fracsurvive program can be used to estimate the fraction surviving for each mutation following a stringent selection such as antibody treatment.
The dms2_batch_fracsurvive program runs dms2_fracsurvive for each sample listed in a batch file specified by --batchfile.
Specifically, as described in Command-line usage, you can specify a few sample-specific arguments in the --batchfile.
All other arguments are specified using the normal option syntax (e.g., --indir INDIR) and are shared between all samples specified in --batchfile.
The result is the output for each individual run of dms2_fracsurvive plus the summary plots described in Output files.
Because dms2_batch_fracsurvive simply runs dms2_fracsurvive on each sample specified by the --batchfile argument described below, see the dms2_fracsurvive Command-line usage for details that are helpful for understanding some of the arguments in the dms2_batch_fracsurvive Command-line usage below.
Command-line usage
Perform runs of dms2_fracsurvive
and summarize results. Part of dms_tools2 (version 2.6.6) written by the Bloom Lab.
usage: dms2_batch_fracsurvive [-h] [--outdir OUTDIR] [--ncpus NCPUS]
[--use_existing {yes,no}] [-v] [--indir INDIR]
[--chartype {codon_to_aa}] [--aboveavg {yes,no}]
[--excludestop {yes,no}]
[--pseudocount PSEUDOCOUNT]
[--mincount MINCOUNT] --batchfile BATCHFILE
--summaryprefix SUMMARYPREFIX
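As a rough illustration of how these arguments fit together, the Python sketch below writes a batch file and assembles a command line. The sample names, directory names, and libfracsurvive values are placeholders invented for this example; only the flags and column names come from the usage text. The helper names write_batchfile and build_command are hypothetical, and the command is assembled but not executed.

```python
import csv
import os
import shlex
import tempfile

# Placeholder rows: the names and libfracsurvive values are made up.
rows = [
    {"name": "replicate-1", "sel": "sel_1_1", "mock": "mock_1", "libfracsurvive": "0.01"},
    {"name": "replicate-2", "sel": "sel_1_2", "mock": "mock_2", "libfracsurvive": "0.02"},
]


def write_batchfile(path, rows):
    """Write the rows to a CSV batch file, with a header line of column names."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def build_command(batchfile, summaryprefix, indir, outdir):
    """Assemble the argument list; options outside the batch file are shared by all samples."""
    return [
        "dms2_batch_fracsurvive",
        "--batchfile", batchfile,
        "--summaryprefix", summaryprefix,
        "--indir", indir,
        "--outdir", outdir,
        "--use_existing", "yes",
    ]


# Write the batch file to a temporary directory (an arbitrary choice for this sketch).
batchfile = os.path.join(tempfile.mkdtemp(), "batch.csv")
write_batchfile(batchfile, rows)
command = build_command(batchfile, "summary", "./counts", "./results")
print(shlex.join(command))
```

If dms_tools2 is installed, the resulting list could be handed to subprocess.run; that step is left out here so the sketch runs without the package.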
Named Arguments
- --outdir
Output files to this directory (create if needed).
- --ncpus
Number of CPUs to use, -1 is all available.
Default: -1
- --use_existing
Possible choices: yes, no
If files with names of expected output already exist, do not re-run.
Default: “no”
- -v, --version
show program’s version number and exit
- --indir
Input counts files in this directory.
- --chartype
Possible choices: codon_to_aa
Characters for which fraction surviving selection is estimated. codon_to_aa = amino acids from codon counts.
Default: “codon_to_aa”
- --aboveavg
Possible choices: yes, no
Report fracsurvive above the library average rather than direct fracsurvive values.
Default: “no”
- --excludestop
Possible choices: yes, no
Exclude stop codons as a possible amino acid?
Default: “yes”
- --pseudocount
Pseudocount added to each count for sample with smaller depth; pseudocount for other sample scaled by relative depth.
Default: 5
- --mincount
Report as NaN the fracsurvive of mutations for which both selected and mock-selected samples have < this many counts.
Default: 0
- --batchfile
CSV file specifying each dms2_fracsurvive run. Must have these columns: name, sel, mock, libfracsurvive. Can also have these columns: err, group, grouplabel. If group is used, samples are grouped in summary plots labeled by group, or by grouplabel if it is specified. Other columns are ignored, so other dms2_fracsurvive args should be passed as separate command line args rather than in --batchfile.
Each of the arguments name, sel, mock, libfracsurvive, and optionally err gives the value of the same parameter passed to dms2_fracsurvive. If group is being used, then the group is pre-pended to name for that sample. In addition, group is used to organize output for similar runs that should be grouped when calculating means / medians and plotting.
If you are running with no error-control counts, then do not specify --err.
- --summaryprefix
Prefix of output summary files and plots.
As detailed in Output files below, dms2_batch_fracsurvive creates a variety of plots summarizing the output. These files are in the directory specified by --outdir, and have the prefix specified here. This name should only contain letters, numbers, dashes, and spaces. Underscores are not allowed as they are a LaTeX special character.
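The naming rule for --summaryprefix can be checked before launching a long batch run. The sketch below is one way to do that in Python; the function name is_valid_summaryprefix is hypothetical, and this is a pre-check written from the rule stated above, not the program's own validation code.

```python
import re

# Letters, numbers, dashes, and spaces only; underscores are rejected.
_ALLOWED_PREFIX = re.compile(r"[A-Za-z0-9\- ]+")


def is_valid_summaryprefix(prefix):
    """Return True if the whole prefix uses only the permitted characters."""
    return _ALLOWED_PREFIX.fullmatch(prefix) is not None


for candidate in ("summary", "antibody-1 summary", "my_summary"):
    print(candidate, is_valid_summaryprefix(candidate))
```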
Output files
Running dms2_batch_fracsurvive
produces output files in the directory specified by --outdir
.
Results for each sample
The program dms2_fracsurvive is run on each sample specified by --batchfile
, so you will create all of the dms2_fracsurvive Output files.
If you are using the group entry in --batchfile
, then for each sample we create a name by pre-pending the group to the name.
For instance, if --batchfile
is:
group,name,sel,mock
antibody-1,replicate-1,sel_1_1,mock_1
antibody-1,replicate-2,sel_1_2,mock_2
antibody-2,replicate-1,sel_2_1,mock_1
antibody-2,replicate-2,sel_2_2,mock_2
then the output files for the individual samples will have prefixes like antibody-1-replicate-1_*
, antibody-1-replicate-2_*
, etc.
On the other hand, if --batchfile
does not specify groups, then the name for each sample is just given by the name column.
So if --batchfile
is:
name,sel,mock
replicate-1,sel_1_1,mock_1
replicate-2,sel_1_2,mock_2
then the output files will have prefixes like replicate-1_*
, replicate-2_*
.
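The naming rule above can be sketched in a few lines of Python. The function name sample_prefix is hypothetical, and the dash used to join group and name is inferred from the example prefixes shown here rather than taken from the program's source.

```python
import csv
import io

# The first example batch file from this section.
BATCH_TEXT = """group,name,sel,mock
antibody-1,replicate-1,sel_1_1,mock_1
antibody-1,replicate-2,sel_1_2,mock_2
antibody-2,replicate-1,sel_2_1,mock_1
antibody-2,replicate-2,sel_2_2,mock_2
"""


def sample_prefix(row):
    """Return group-name if the row has a group, otherwise just the name."""
    group = row.get("group")
    return f"{group}-{row['name']}" if group else row["name"]


prefixes = [sample_prefix(row) for row in csv.DictReader(io.StringIO(BATCH_TEXT))]
print(prefixes)
```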
Mean and median fraction surviving
The program computes the mean and median fraction surviving values for each group (if there are groups), or for all samples. Note that the means and medians are computed on the mutation fraction surviving values, and then the site values are computed from these mean / median mutation values. The files are in the same format as those created by dms2_fracsurvive.
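The order of operations matters here: the statistic is taken across samples for each mutation first, and the site values are derived afterwards. The Python sketch below illustrates that order with made-up numbers. The data, the summarize function, and the dictionary layout are all assumptions for illustration; how the program treats missing values or wildtype characters is not covered.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical mutation-level values for two replicates in one group,
# keyed by (site, mutation).
replicates = [
    {(1, "A"): 0.10, (1, "C"): 0.30, (2, "A"): 0.02},
    {(1, "A"): 0.20, (1, "C"): 0.50, (2, "A"): 0.04},
]


def summarize(replicates, stat):
    """Apply stat across replicates per mutation, then derive site values."""
    # Step 1: combine replicates at the mutation level.
    mut = {key: stat([rep[key] for rep in replicates]) for key in replicates[0]}
    # Step 2: compute site values from the combined mutation values.
    by_site = defaultdict(list)
    for (site, _mutation), value in mut.items():
        by_site[site].append(value)
    site = {
        s: {"avgfracsurvive": mean(values), "maxfracsurvive": max(values)}
        for s, values in by_site.items()
    }
    return mut, site


mean_mut, mean_site = summarize(replicates, mean)
median_mut, median_site = summarize(replicates, median)
print(mean_mut)
print(mean_site)
```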
For instance, for the first example --batchfile
in the section above (the one with a group column), we would get the following files if we used --summaryprefix summary
:
summary_antibody-1-meanmutfracsurvive.csv
summary_antibody-2-meanmutfracsurvive.csv
summary_antibody-1-medianmutfracsurvive.csv
summary_antibody-2-medianmutfracsurvive.csv
summary_antibody-1-meansitefracsurvive.csv
summary_antibody-2-meansitefracsurvive.csv
summary_antibody-1-mediansitefracsurvive.csv
summary_antibody-2-mediansitefracsurvive.csv
For the second example --batchfile
(the one without a group column), we would get the following files:
summary_meanmutfracsurvive.csv
summary_medianmutfracsurvive.csv
summary_meansitefracsurvive.csv
summary_mediansitefracsurvive.csv
It is often useful to visualize the mean or median mutfracsurvive files with dms2_logoplot.
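When post-processing results in a script, it can help to generate the expected summary file names rather than typing them out. The Python sketch below reproduces the two listings above; the function name summary_csv_names is hypothetical, and the naming pattern is inferred from those listings only.

```python
def summary_csv_names(summaryprefix, groups=None):
    """List the mean / median summary CSV names, per group if groups are given."""
    names = []
    for level in ("mut", "site"):
        for stat in ("mean", "median"):
            for group in (groups or [None]):
                # With groups, the group name and a dash follow the prefix.
                middle = f"{group}-" if group else ""
                names.append(f"{summaryprefix}_{middle}{stat}{level}fracsurvive.csv")
    return names


grouped = summary_csv_names("summary", ["antibody-1", "antibody-2"])
ungrouped = summary_csv_names("summary")
print("\n".join(grouped + ungrouped))
```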
Correlation plots
Scatter plots are created that show the correlations among samples within the same group, or among all samples if there are not any groups.
Separate plots are made for the mutfracsurvive, the avgfracsurvive (averaged across all mutations at each site), and the maximum mutfracsurvive at each site. The names will have the form:
summary_antibody-1-mutfracsurvivecorr.pdf
summary_antibody-1-avgfracsurvivecorr.pdf
summary_antibody-1-maxfracsurvivecorr.pdf
Fracsurvive plots
Plots are made that show the site average and maximum fraction surviving as a function of the primary sequence.
These plots show the mean and median values for each group, and are faceted by group (if there are groups).
If you run with --summaryprefix summary
, then the plots will be:
- avgfracsurvive: files summary_meanavgfracsurvive.pdf and summary_medianavgfracsurvive.pdf show the average mutfracsurvive across all mutations for each site.
- maxfracsurvive: files summary_meanmaxfracsurvive.pdf and summary_medianmaxfracsurvive.pdf show the maximum mutfracsurvive across all mutations for each site.