dms2_fracsurvive

Overview

The dms2_fracsurvive program estimates the fraction surviving for each mutation. As input, it takes files giving the observed counts of characters in a selected and a mock-selected condition, along with a measurement of the overall fraction of the library that survives the selection.

If you have multiple related replicates or samples (or even if you have just one), you should probably use the dms2_batch_fracsurvive program rather than running dms2_fracsurvive directly. This is because dms2_batch_fracsurvive runs dms2_fracsurvive, but then also makes some nice summary plots.

Command-line usage

Estimate fraction surviving for each mutation. Part of dms_tools2 (version 2.6.6) written by the Bloom Lab.

usage: dms2_fracsurvive [-h] [--outdir OUTDIR] [--ncpus NCPUS]
                        [--use_existing {yes,no}] [-v] [--indir INDIR]
                        [--chartype {codon_to_aa}] [--aboveavg {yes,no}]
                        [--excludestop {yes,no}] [--pseudocount PSEUDOCOUNT]
                        [--mincount MINCOUNT] --name NAME --sel SEL --mock
                        MOCK --libfracsurvive LIBFRACSURVIVE [--err ERR]
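
As an illustration of the usage line above, here is a minimal sketch that assembles a dms2_fracsurvive command from Python. The flags come from the usage line; the helper name `build_fracsurvive_command` and the sample and directory names are hypothetical, chosen only for this example.

```python
import shlex


def build_fracsurvive_command(name, sel, mock, libfracsurvive,
                              outdir=None, indir=None, err=None):
    """Assemble an argument list for dms2_fracsurvive (hypothetical helper).

    The four required arguments mirror the usage line; the optional ones
    are added only when a value is given.
    """
    if not 0 <= libfracsurvive <= 1:
        raise ValueError("--libfracsurvive should be between 0 and 1")
    cmd = ["dms2_fracsurvive",
           "--name", name,
           "--sel", sel,
           "--mock", mock,
           "--libfracsurvive", str(libfracsurvive)]
    for flag, value in (("--outdir", outdir), ("--indir", indir), ("--err", err)):
        if value is not None:
            cmd += [flag, value]
    return cmd


# Hypothetical sample names and directories, for illustration only.
cmd = build_fracsurvive_command(
    name="replicate-1",
    sel="selected-1",
    mock="mock-1",
    libfracsurvive=0.01,
    outdir="results",
    indir="counts",
)
print(shlex.join(cmd))
# To actually run the program: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if a name contains spaces, which --name permits.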

Named Arguments

--outdir

Output files to this directory (create if needed).

--ncpus

Number of CPUs to use, -1 is all available.

Default: -1

--use_existing

Possible choices: yes, no

If files with names of expected output already exist, do not re-run.

Default: “no”

-v, --version

show program’s version number and exit

--indir

Input counts files in this directory.

This option can be useful if the counts files are found in a common directory. Instead of repeatedly listing that directory name, you can just provide it here.

--chartype

Possible choices: codon_to_aa

Characters for which fraction surviving selection is estimated. codon_to_aa = amino acids from codon counts.

Default: “codon_to_aa”

--aboveavg

Possible choices: yes, no

Report fracsurvive above the library average rather than direct fracsurvive values.

Default: “no”

--excludestop

Possible choices: yes, no

Exclude stop codons as a possible amino acid?

Default: “yes”

--pseudocount

Pseudocount added to each count for sample with smaller depth; pseudocount for other sample scaled by relative depth.

Default: 5
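
The scaling described for --pseudocount can be sketched roughly as follows. This is one reading of the help text, not the program's actual code: the function name `scaled_pseudocounts` is hypothetical, and whether "depth" is measured per site or over the whole sample is an assumption left to the caller.

```python
def scaled_pseudocounts(depth_sel, depth_mock, pseudocount=5.0):
    """Return (pseudocount_sel, pseudocount_mock).

    Assumption: the shallower sample gets the pseudocount as given, and
    the deeper sample gets it multiplied by the ratio of the two depths.
    """
    if depth_sel <= 0 or depth_mock <= 0:
        raise ValueError("depths must be positive")
    if depth_sel <= depth_mock:
        return pseudocount, pseudocount * depth_mock / depth_sel
    return pseudocount * depth_sel / depth_mock, pseudocount


# The mock sample is twice as deep, so its pseudocount is doubled.
print(scaled_pseudocounts(depth_sel=1000, depth_mock=2000))
```

Scaling in this way keeps the pseudocount at the same proportion of the total depth in both samples, so it does not by itself shift the ratio between them.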

--mincount

Report as NaN the fracsurvive of mutations for which both selected and mock-selected samples have < this many counts.

Default: 0
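
The --mincount rule can be illustrated with a small sketch. The function name `apply_mincount` is hypothetical; the condition follows the help text, which masks a value only when both samples fall below the threshold.

```python
import math


def apply_mincount(fracsurvive, n_sel, n_mock, mincount=0):
    """Return NaN when BOTH counts are below mincount, else the value."""
    if n_sel < mincount and n_mock < mincount:
        return math.nan
    return fracsurvive


print(apply_mincount(0.5, n_sel=2, n_mock=3, mincount=10))   # both counts low
print(apply_mincount(0.5, n_sel=2, n_mock=30, mincount=10))  # only one count low
```

With the default of 0, no count can be below the threshold, so nothing is masked.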

--name

Name used for output files.

The output files will have a prefix equal to the name specified here. This name should only contain letters, numbers, dashes, and spaces. Underscores are not allowed as they are a LaTeX special character.

--sel

Post-selection counts file or prefix used when creating this file.

The counts files have the format of the files created by programs such as dms2_bcsubamp. Specifically, they must have the following columns: ‘site’, ‘wildtype’, and then a column for each possible character (e.g., codon).

--mock

Like --sel, but for mock-selection counts.

--libfracsurvive

Overall fraction of total library surviving selection versus mock condition. Should be between 0 and 1.

--err

Like --sel but for error-control to correct mutation counts.

Output files

The output files all have the prefix specified by --outdir and --name. For instance, if you use --outdir results --name replicate-1, then the output files will have the prefix ./results/replicate-1 and the suffixes described below.
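
As a sketch of how the prefix and suffixes combine, the following builds the expected output paths for a given --outdir and --name. The helper name `expected_outputs` is hypothetical; the suffixes are the ones described in this section, and the name check reflects the restriction stated for --name.

```python
import os
import re


def expected_outputs(outdir, name):
    """Map each output type to the path implied by --outdir and --name."""
    # --name may contain only letters, numbers, dashes, and spaces.
    if not re.fullmatch(r"[A-Za-z0-9\- ]+", name):
        raise ValueError(f"invalid --name: {name!r}")
    prefix = os.path.join(outdir, name)
    return {
        "log": prefix + ".log",
        "mutfracsurvive": prefix + "_mutfracsurvive.csv",
        "sitefracsurvive": prefix + "_sitefracsurvive.csv",
    }


for kind, path in expected_outputs("results", "replicate-1").items():
    print(kind, path)
```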

Here are the specific output files:

Log file

This file has the suffix .log. It is a text file that logs the progress of the program.

Mutation fraction surviving file

This file has the suffix _mutfracsurvive.csv. It gives the fraction surviving for each mutation at each site, which is the \(F_{r,x}\) value defined in Equation (30) of the Fraction surviving section. Note that the quantity is calculated for the wildtype as well as the mutant characters at each site. Note also that if you are using --aboveavg yes, then these are the fraction surviving above the library average, denoted as \(F_{r,x}^{\rm{aboveavg}}\) in Equation (31) of the Fraction surviving section. If --mincount is greater than zero, the fraction surviving may be undefined for some mutations due to low counts, and any such undefined values are shown as NaN.

Here are the first and last few lines of a _mutfracsurvive.csv file:

site,wildtype,mutation,mutfracsurvive
156,G,S,0.8189280293912643
146,N,D,0.626080490632122
157,K,S,0.5933429890043687
158,S,A,0.5723610875357631
...
540,L,C,0.0016521655105078295
175,P,G,0.0013556328151850907
545,S,T,0.001310686952133144
490,E,D,0.001016678753248357

Note that the file is sorted from largest to smallest fraction surviving.
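
A file in this format can be read with the Python standard library, as in the minimal sketch below. The function name `read_mutfracsurvive` is hypothetical, the sample text reuses the first lines shown above, and treating an empty field as NaN is an assumption added for robustness.

```python
import csv
import io
import math

# First few lines of the example file shown above.
SAMPLE = """site,wildtype,mutation,mutfracsurvive
156,G,S,0.8189280293912643
146,N,D,0.626080490632122
157,K,S,0.5933429890043687
158,S,A,0.5723610875357631
"""


def read_mutfracsurvive(handle):
    """Parse a _mutfracsurvive.csv file into a list of dicts."""
    rows = []
    for rec in csv.DictReader(handle):
        text = rec["mutfracsurvive"].strip()
        # float("NaN") parses to nan; an empty field is also treated
        # as NaN here (an assumption, not documented behavior).
        value = float(text) if text else math.nan
        rows.append({
            "site": rec["site"],  # kept as text; not assumed to be an integer
            "wildtype": rec["wildtype"],
            "mutation": rec["mutation"],
            "mutfracsurvive": value,
        })
    return rows


rows = read_mutfracsurvive(io.StringIO(SAMPLE))
for row in rows:
    print(row["wildtype"] + row["site"] + row["mutation"], row["mutfracsurvive"])
```

For a real file, pass an open file handle (opened with newline="") in place of the io.StringIO object.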

Site fraction surviving file

This file has the suffix _sitefracsurvive.csv. It gives several measures that summarize the fraction surviving at each site. All values in the _sitefracsurvive.csv file can be calculated from the values in the _mutfracsurvive.csv file, but the program outputs both files to make things simpler for the user.

Specifically, it gives the following quantities:

  • avgfracsurvive is the average of the mutation fraction surviving values. If any of the mutation fraction surviving values are NaN (which can happen if you use --mincount), they are not included in this average.

  • maxfracsurvive is the maximum mutation fraction surviving taken over all non-wildtype characters for each site.

Here are the first and last lines of a _sitefracsurvive.csv file:

site,avgfracsurvive,maxfracsurvive
153,0.2841412228950011,0.54440652255242
157,0.25281575693873276,0.5933429890043687
136,0.16447395413487326,0.33671209426557835
156,0.13827268994538547,0.8189280293912643
...
210,0.013137991003787609,0.022298531591000946
170,0.011505865316217256,0.029289469175867944
176,0.010814303017871948,0.02157678717020969
175,0.008361156202363286,0.027496593984557578

If all mutations at a site have a mutation fraction surviving of NaN (which can be the case if --mincount is > 0), then the site values are reported as NaN.
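
The site-level quantities can be recomputed from mutation-level values roughly as sketched below. This is not the program's own code: the function name `site_summaries` and the toy values are hypothetical, and excluding the wildtype character from the average is an assumption (the text above states this exclusion only for the maximum).

```python
import math
from collections import defaultdict


def site_summaries(mut_rows):
    """Return {site: (avgfracsurvive, maxfracsurvive)} from mutation rows."""
    by_site = defaultdict(list)
    for row in mut_rows:
        values = by_site[row["site"]]  # registers the site even if all values are NaN
        if row["mutation"] == row["wildtype"]:
            continue  # assumption: wildtype is excluded from both summaries
        if not math.isnan(row["mutfracsurvive"]):
            values.append(row["mutfracsurvive"])  # NaN values are skipped
    summaries = {}
    for site, values in by_site.items():
        if values:
            summaries[site] = (sum(values) / len(values), max(values))
        else:
            # Every mutation at this site was NaN.
            summaries[site] = (math.nan, math.nan)
    return summaries


# Toy values for illustration only; they are not real program output.
toy_rows = [
    {"site": "1", "wildtype": "A", "mutation": "A", "mutfracsurvive": 1.0},
    {"site": "1", "wildtype": "A", "mutation": "C", "mutfracsurvive": 0.2},
    {"site": "1", "wildtype": "A", "mutation": "D", "mutfracsurvive": 0.4},
    {"site": "1", "wildtype": "A", "mutation": "E", "mutfracsurvive": math.nan},
    {"site": "2", "wildtype": "G", "mutation": "S", "mutfracsurvive": math.nan},
]
for site, (avg, mx) in site_summaries(toy_rows).items():
    print(site, avg, mx)
```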