The dms2_prefs program processes files giving the number of observed counts of characters pre- and post-selection to estimate amino-acid preferences.

If you have multiple replicates, you should probably use the dms2_batch_prefs program rather than running dms2_prefs directly.

Command-line usage

Estimate preferences from mutation counts. Part of dms_tools2 (version 2.6.6) written by the Bloom Lab.

usage: dms2_prefs [-h] [--outdir OUTDIR] [--ncpus NCPUS]
                  [--use_existing {yes,no}] [-v] [--method {ratio,bayesian}]
                  [--indir INDIR] [--chartype {codon_to_aa}]
                  [--excludestop {yes,no}] [--conc Cprefs Cmut Cerr]
                  [--pseudocount PSEUDOCOUNT] --pre PRE --post POST --name
                  NAME [--err ERRPRE ERRPOST]

Named Arguments


--outdir

Output files to this directory (create if needed).


--ncpus

Number of CPUs to use, -1 is all available.

Default: -1


--use_existing

Possible choices: yes, no

If files with names of expected output already exist, do not re-run.

Default: “no”

-v, --version

show program’s version number and exit


--method

Possible choices: ratio, bayesian

Method to estimate preferences: normalized enrichment ratios or Bayesian inference.

Default: “bayesian”


--indir

Input counts files in this directory.

This option can be useful if the counts files are found in a common directory. Instead of repeatedly listing that directory name, you can just provide it here.


--chartype

Possible choices: codon_to_aa

Characters for which preferences are estimated. codon_to_aa = amino acids from codon counts.

Default: “codon_to_aa”


--excludestop

Possible choices: yes, no

Exclude stop codons as a possible amino acid?

Default: “yes”


--conc

Concentration parameters for priors for --method bayesian. Priors are over preferences, mutagenesis rate, and error rate(s).

Default: [1, 1, 1]


--pseudocount

Pseudocount used with --method ratio.

Default: 1
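To make the role of the pseudocount in --method ratio concrete, here is a minimal sketch of normalized enrichment ratios at a single site. It is a simplified illustration, not the exact calculation performed by dms2_prefs: it ignores error controls and any depth-dependent scaling of the pseudocount, and the function name and the example counts are made up for this sketch.

```python
def ratio_prefs(pre, post, wildtype, pseudocount=1.0):
    """Simplified normalized enrichment ratios for one site.

    `pre` and `post` map each character to its count before and after
    selection. This is an illustrative assumption about the calculation,
    not the dms2_prefs implementation.
    """
    def enrichment(char):
        # Frequency of `char` relative to wildtype, with the pseudocount
        # added to every count so that zero counts do not divide by zero.
        post_ratio = (post[char] + pseudocount) / (post[wildtype] + pseudocount)
        pre_ratio = (pre[char] + pseudocount) / (pre[wildtype] + pseudocount)
        return post_ratio / pre_ratio

    phi = {char: enrichment(char) for char in pre}
    total = sum(phi.values())
    # Normalize so the preferences at the site sum to one.
    return {char: value / total for char, value in phi.items()}


# Made-up counts for a three-character alphabet with wildtype "A".
pre_counts = {"A": 900, "C": 50, "D": 50}
post_counts = {"A": 950, "C": 45, "D": 5}
prefs = ratio_prefs(pre_counts, post_counts, wildtype="A")
for char, pref in sorted(prefs.items()):
    print(char, round(pref, 3))
```

A larger pseudocount pulls the ratios for rarely observed characters toward one, which damps noise from low counts at the cost of shrinking real differences.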


--pre

Pre-selection counts file or prefix used when creating this file.

The counts files have the format of the files created by programs such as dms2_bcsubamp. Specifically, they must have the following columns: ‘site’, ‘wildtype’, and then a column for each possible character (e.g., codon).
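Before running the program it can help to confirm that a counts file has the required columns. The sketch below assumes the file is comma-separated and that the characters are the 64 DNA codons; both are assumptions made for illustration, and the helper name is hypothetical.

```python
import csv
import io
import itertools

# Assumption: for codon counts, the character columns are the 64 DNA codons.
CODONS = ["".join(bases) for bases in itertools.product("ACGT", repeat=3)]


def missing_count_columns(handle):
    """Return the required columns that are absent from the header row."""
    header = next(csv.reader(handle))
    required = ["site", "wildtype"] + CODONS
    return [column for column in required if column not in header]


# A tiny in-memory file with one site, used only to exercise the check.
header_line = ",".join(["site", "wildtype"] + CODONS)
data_line = ",".join(["1", "ATG"] + ["0"] * len(CODONS))
example = io.StringIO(header_line + "\n" + data_line + "\n")

missing = missing_count_columns(example)
print("missing columns:", missing)
```

To check a real file, open it and pass the file handle to the helper in place of the in-memory example.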


--post

Like --pre but for post-selection counts.


--name

Name used for output files.

The output files will have a prefix equal to the name specified here. This name should only contain letters, numbers, dashes, and spaces. Underscores are not allowed as they are a LaTeX special character.
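The naming rule can be checked before launching a run. The following sketch is one reading of the rule as stated (ASCII letters, digits, dashes, and spaces only); it is not the check that dms2_prefs itself performs, and the function name is hypothetical.

```python
import re

# Letters, numbers, dashes, and spaces; at least one character.
_VALID_NAME = re.compile(r"[A-Za-z0-9\- ]+")


def is_valid_name(name):
    """Return True if `name` uses only the characters allowed for --name."""
    return _VALID_NAME.fullmatch(name) is not None


for candidate in ["replicate-1", "replicate 1", "replicate_1"]:
    print(repr(candidate), is_valid_name(candidate))
```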


--err

Like --pre but for counts for the error control(s) for --pre and --post. Specify the same file twice to use the same control for both.
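As an illustration of how these arguments fit together, the sketch below assembles a dms2_prefs command line in Python using only flags from the usage text above. The helper function and the file names (mutDNA, mutvirus, errDNA, errvirus) are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_dms2_prefs_command(pre, post, name, outdir=None, method=None,
                             err=None, ncpus=None):
    """Assemble an argument list for dms2_prefs (hypothetical helper)."""
    # --pre, --post, and --name are the required arguments.
    cmd = ["dms2_prefs", "--pre", pre, "--post", post, "--name", name]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if method is not None:
        if method not in ("ratio", "bayesian"):
            raise ValueError("method must be 'ratio' or 'bayesian'")
        cmd += ["--method", method]
    if err is not None:
        # --err takes two values: the pre- and post-selection controls.
        errpre, errpost = err
        cmd += ["--err", errpre, errpost]
    if ncpus is not None:
        cmd += ["--ncpus", str(ncpus)]
    return cmd


cmd = build_dms2_prefs_command(
    pre="mutDNA", post="mutvirus", name="replicate-1",
    outdir="results", method="ratio", err=("errDNA", "errvirus"),
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run if dms_tools2 is installed; building it as a list avoids shell-quoting problems when a name contains spaces.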

Output files

The output files all have the prefix specified by --outdir and --name. For instance, if you use --outdir results --name replicate-1, then the output files will have the prefix ./results/replicate-1 and the suffixes described below.
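The prefix rule can be expressed in a few lines of Python. This sketch builds the expected paths from --outdir and --name using the .log and _prefs.csv suffixes listed in this section; the function name is hypothetical, and checking whether the files exist mirrors the idea behind --use_existing without claiming to reproduce the program's logic.

```python
import os


def expected_outputs(outdir, name):
    """Return the expected output paths for a given --outdir and --name."""
    prefix = os.path.join(outdir, name)
    return {
        "log": prefix + ".log",
        "prefs": prefix + "_prefs.csv",
    }


paths = expected_outputs("results", "replicate-1")
for label, path in paths.items():
    print(label, path, "exists" if os.path.isfile(path) else "not found")
```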

Here are the specific output files:

Log file

This file has the suffix .log. It is a text file that logs the progress of the program.

Preferences file

This file has the suffix _prefs.csv. It gives the estimated preference for each character at each site.


Program run time

If you run dms2_prefs with --method ratio then it will run very quickly.

If you run it with --method bayesian then the runtime will be longer due to the MCMC. Exactly how long depends on whether you are using error controls for the counts (the --err option). If you use different files for the pre- and post-selection error controls and are using --chartype codon_to_aa, then the program will typically take about 4 or 5 hours if you give it 4 CPUs. If you give it more CPUs, or use the same (or no) error control for pre- and post-selection, then it will be faster.