# dms2_batch_prefs

## Overview

The dms2_batch_prefs program processes files giving the observed counts of characters pre- and post-selection to estimate amino-acid preferences.

The dms2_batch_prefs program simply runs dms2_prefs for each sample listed in a batch file specified by --batchfile. Specifically, as described in Command-line usage, you can specify a few sample-specific arguments in the --batchfile. All other arguments are specified using the normal option syntax (e.g., --indir INDIR) and are shared between all samples specified in --batchfile. The result is the output for each individual run of dms2_prefs plus the summary plots described in Output files.

The Doud2016 example illustrates the usage of dms2_batch_prefs on a real dataset.

Because dms2_batch_prefs simply runs dms2_prefs on each sample specified by the --batchfile argument described below, the dms2_prefs Command-line usage is helpful for understanding many of the arguments in the dms2_batch_prefs Command-line usage below.

## Command-line usage

Perform many runs of dms2_prefs and summarize results. Part of dms_tools2 (version 2.6.6) written by the Bloom Lab.

    usage: dms2_batch_prefs [-h] [--outdir OUTDIR] [--ncpus NCPUS]
                            [--use_existing {yes,no}] [-v]
                            [--method {ratio,bayesian}] [--indir INDIR]
                            [--chartype {codon_to_aa}] [--excludestop {yes,no}]
                            [--conc Cprefs Cmut Cerr] [--pseudocount PSEUDOCOUNT]
                            --batchfile BATCHFILE --summaryprefix SUMMARYPREFIX
                            [--no_corr] [--no_avg]


### Named Arguments

--outdir

Output files to this directory (create if needed).

--ncpus

Number of CPUs to use, -1 is all available.

Default: -1

--use_existing

Possible choices: yes, no

If files with names of expected output already exist, do not re-run.

Default: “no”

-v, --version

show program’s version number and exit

--method

Possible choices: ratio, bayesian

Method to estimate preferences: normalized enrichment ratios or Bayesian inference.

Default: “bayesian”

--indir

Input counts files in this directory.

--chartype

Possible choices: codon_to_aa

Characters for which preferences are estimated. codon_to_aa = amino acids from codon counts.

Default: “codon_to_aa”

--excludestop

Possible choices: yes, no

Exclude stop codons as a possible amino acid?

Default: “yes”

--conc

Concentration parameters for priors for --method bayesian. Priors are over preferences, mutagenesis rate, and error rate(s).

Default: [1, 1, 1]

--pseudocount

Pseudocount used with --method ratio.

Default: 1
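To make the idea of a normalized enrichment ratio with a pseudocount concrete, here is a deliberately simplified sketch for a single site. It is an illustration only: the function name `ratio_prefs`, the counts, and the exact way the pseudocount enters are assumptions, and the formula actually used by dms2_prefs (described in its own documentation) may differ in detail.

```python
def ratio_prefs(pre, post, wildtype, pseudocount=1.0):
    """Simplified normalized enrichment ratios at one site.

    pre, post: dicts mapping each amino acid to its pre- / post-selection count.
    wildtype: the wildtype amino acid at this site.
    This is an illustrative assumption, not the exact dms2_prefs formula.
    """
    enrichment = {}
    for aa in pre:
        # Frequency of each amino acid relative to wildtype, with a pseudocount
        # added so that zero counts do not produce division by zero.
        post_ratio = (post[aa] + pseudocount) / (post[wildtype] + pseudocount)
        pre_ratio = (pre[aa] + pseudocount) / (pre[wildtype] + pseudocount)
        enrichment[aa] = post_ratio / pre_ratio
    # Normalize so the preferences at the site sum to one.
    total = sum(enrichment.values())
    return {aa: value / total for aa, value in enrichment.items()}


# Hypothetical counts for a site with wildtype "A".
pre_counts = {"A": 1000, "C": 10, "D": 10}
post_counts = {"A": 1000, "C": 20, "D": 1}
prefs = ratio_prefs(pre_counts, post_counts, wildtype="A")
print(prefs)
```

In this toy example "C" is enriched after selection and "D" is depleted, so the preference for "C" comes out above that for the wildtype and the preference for "D" below it.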

--batchfile

CSV file specifying each dms2_prefs run. Must have these columns: name, pre, post. Can also have these columns: err or errpre and errpost. Other columns are ignored, so other dms2_prefs args should be passed as separate command line args rather than in --batchfile.

Each of the arguments name, pre, and post gives the value of the same parameter as passed to dms2_prefs.

If you are running with no error-control counts, then omit the err, errpre, and errpost columns.

If you are running with a single error control for both the pre- and post-selection counts, then specify this counts file in the err column.

If you have different controls for the pre- and post-selection counts, specify them separately in the errpre and errpost columns.
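As a rough illustration of the batch file layout, the sketch below writes a two-sample CSV with the required name, pre, and post columns plus a single err column. The sample names and counts-file names are hypothetical placeholders, and `write_batchfile` is a helper invented for this example rather than part of dms_tools2.

```python
import csv
import io

REQUIRED_COLUMNS = ["name", "pre", "post"]


def write_batchfile(rows, handle):
    """Write a list of dicts as a batch CSV to an open text handle."""
    fieldnames = list(rows[0].keys())
    missing = [col for col in REQUIRED_COLUMNS if col not in fieldnames]
    if missing:
        raise ValueError(f"batch file is missing required columns: {missing}")
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical samples sharing one error-control counts file.
rows = [
    {"name": "replicate-1", "pre": "mutDNA-1", "post": "mutvirus-1", "err": "wtDNA"},
    {"name": "replicate-2", "pre": "mutDNA-2", "post": "mutvirus-2", "err": "wtDNA"},
]

buffer = io.StringIO()
write_batchfile(rows, buffer)
batch_text = buffer.getvalue()
print(batch_text)
```

To write to disk instead, pass a file opened with `open("batch.csv", "w", newline="")` in place of the `StringIO` buffer, then give that path to --batchfile.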

--summaryprefix

Prefix of output summary files and plots.

As detailed in Output files below, dms2_batch_prefs creates a variety of plots summarizing the output. These files are in the directory specified by --outdir, and have the prefix specified here. This name should only contain letters, numbers, dashes, and spaces. Underscores are not allowed as they are a LaTeX special character.
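If you generate prefixes programmatically, a small check like the one below can catch disallowed characters before launching a long run. This is a sketch of the rule as stated above (letters, numbers, dashes, and spaces only); `is_valid_summaryprefix` is a hypothetical helper, and the program's own validation may behave differently.

```python
import re

# Letters, digits, dashes, and spaces only; underscores are rejected.
_ALLOWED_PREFIX = re.compile(r"[A-Za-z0-9\- ]+")


def is_valid_summaryprefix(prefix):
    """Return True if prefix uses only the characters the rule above permits."""
    return _ALLOWED_PREFIX.fullmatch(prefix) is not None


for candidate in ["summary", "run-2 final", "my_summary"]:
    print(candidate, is_valid_summaryprefix(candidate))
```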

--no_corr

Do not create correlation plot.

Default: False

--no_avg

Do not create average prefs CSV.

Default: False
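The arguments above can be assembled into a command line from a script. The sketch below only builds and prints the command; the file name `batch.csv`, the directory `results`, and the helper `build_command` are assumptions made for illustration.

```python
import shlex


def build_command(batchfile, summaryprefix, outdir=None, indir=None,
                  ncpus=None, method=None):
    """Build an argument list for dms2_batch_prefs from the documented options."""
    cmd = ["dms2_batch_prefs",
           "--batchfile", batchfile,
           "--summaryprefix", summaryprefix]
    optional = {
        "--outdir": outdir,
        "--indir": indir,
        "--ncpus": ncpus,
        "--method": method,
    }
    # Only include options that were explicitly set, so defaults still apply.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_command("batch.csv", "summary", outdir="results", ncpus=4)
print(shlex.join(cmd))
```

With dms_tools2 installed, the resulting list could be passed to `subprocess.run(cmd, check=True)` to launch the run.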

## Output files

Running dms2_batch_prefs produces output files in the directory specified by --outdir.

### Results for each sample

The program dms2_prefs is run on each sample specified by --batchfile, so you will create all of the dms2_prefs Output files.

### Average preferences

A file is created that holds the preferences averaged across all samples in --batchfile. This file has the prefix specified by --summaryprefix. For instance, if you run dms2_batch_prefs with the arguments --outdir results --summaryprefix summary then the file will be ./results/summary_avgprefs.csv. It has the same format as the preferences files created by dms2_prefs.

### Correlation plot

A plot is created that summarizes the correlation between the preferences for each sample in --batchfile. This plot has the prefix specified by --summaryprefix. For instance, if you run dms2_batch_prefs with the arguments --outdir results --summaryprefix summary then the plot will be ./results/summary_prefscorr.pdf. An example of this plot is in the Doud2016 example.

### Log file

A log file is created that summarizes the output. For instance, if you run dms2_batch_prefs with the arguments --outdir results --summaryprefix summary then the log will be ./results/summary.log.
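The naming pattern for the summary files can be captured in a few lines, which is handy when a downstream script needs to locate them. This is a sketch based only on the examples given in this section; `summary_output_paths` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def summary_output_paths(outdir, summaryprefix):
    """Return the expected summary file paths for a given --outdir and --summaryprefix."""
    base = PurePosixPath(outdir)
    return {
        "avgprefs": str(base / f"{summaryprefix}_avgprefs.csv"),
        "prefscorr": str(base / f"{summaryprefix}_prefscorr.pdf"),
        "log": str(base / f"{summaryprefix}.log"),
    }


paths = summary_output_paths("results", "summary")
for kind, path in paths.items():
    print(kind, path)
```

Note that the average preferences file is not created when --no_avg is given, and the correlation plot is not created when --no_corr is given.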

## Program run time

As described in the Program run time section for dms2_prefs, each run of that program can take a while. Running it on multiple samples with dms2_batch_prefs therefore takes longer still. The time can be reduced by specifying more CPUs to use with --ncpus.