dms_merge

Overview

dms_merge is a program included with the dms_tools package. It merges preferences, differential selection values, or differential preferences in different files by averaging them or adding / subtracting them. It also sums the counts from count files together. See Examples for illustrations of how you might do this.

After you install dms_tools, this program will be available to run at the command line.

Command-line usage

Merge preferences, differential selection, or differential preferences by averaging or adding / subtracting the values in multiple files. Alternatively, sum counts by adding the counts for the characters at each site across multiple files. All files must specify the same character type, which can be nucleotide, codon, or amino acid (see “--excludestop” if using amino acids). This script is part of dms_tools (version 1.1.20) written by the Bloom Lab (see https://github.com/jbloomlab/dms_tools/graphs/contributors for all contributors). Detailed documentation is at http://jbloomlab.github.io/dms_tools/

usage: dms_merge [-h] [--excludestop] [--minus MINUS [MINUS ...]]
                 [--sitediffselfile SITEDIFFSELFILE] [-v]
                 [--stringencyparameter STRINGENCYPARAMETER] [--normalize]
                 [--chartype {DNA,codon,aa}]
                 outfile {average,median,sum,rescale} infiles [infiles ...]

Positional Arguments

outfile

Output file created with the merged values; an existing file with this name is removed first. If the “infiles” do not all have the same wildtype residue at a site, then the wildtype is indicated as “?” in “outfile”.
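The wildtype rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not code taken from dms_tools; the function name merged_wildtype is hypothetical.

```python
def merged_wildtype(wildtypes):
    """Return the wildtype shared by all input files at one site.

    `wildtypes` holds the wildtype character reported by each infile.
    If the files disagree, "?" is returned, as described for outfile.
    (Hypothetical helper for illustration; not part of dms_tools.)
    """
    distinct = set(wildtypes)
    return distinct.pop() if len(distinct) == 1 else "?"


print(merged_wildtype(["A", "A", "A"]))  # prints A
print(merged_wildtype(["A", "G"]))       # prints ?
```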

For merge_method of “average”, this file will be of the same type as infiles (preferences, mutation-level differential selection values, or differential preferences). When merging differential selection, any values of NaN are ignored.

For merge_method of “median”, infiles and outfile must be in the format of the *mutdiffsel.txt files created by dms_diffselection.

For merge_method of “sum”, this file may be either preferences, differential preferences, or counts:

If any of the files in infiles and --minus give preferences and the sum at each site is one, then outfile gives preferences.

If any of the files in infiles and --minus give preferences and the sum at each site is zero, then outfile gives differential preferences.

If all of the files in infiles and --minus are differential preferences, the outfile gives differential preferences.

If all of the files in infiles are counts, the outfile gives counts.

For merge_method of “rescale”, this created file will give preferences (after re-scaling).

If outfile gives preferences, it will be in the preferences_file format. Note that this file will not have any columns giving 95% credible intervals, as these cannot be calculated when merging files.

If outfile gives differential preferences, it will be in the diffpreferences_file format. Note that this file will not have any columns giving posterior probabilities, as these cannot be calculated when merging files.

If outfile gives mutation-level differential selection values, it will be in the format of the *mutdiffsel.txt file created by dms_diffselection. To create a corresponding *sitediffsel.txt file, see the --sitediffselfile option.

If outfile gives counts, it will be in the dms_counts format.

merge_method

Possible choices: average, median, sum, rescale

How to merge. If “average”, all “infiles” must specify preferences, differential selection on mutations (*mutdiffsel.txt files), or differential preferences; these are averaged by taking the mean and written to “outfile”. If “median”, all “infiles” must specify differential selection on mutations (*mutdiffsel.txt files); these are averaged by taking the median. If “sum”, “infiles” can all be count files, can all be differential preferences, or can be a combination of preferences and differential preferences that (along with any additional files specified by “--minus”) sum at each site either to one (a total preference) or to zero (a total differential preference). If “rescale”, only one infile can be specified and it must give preferences; these preferences are then rescaled using the value provided by “--stringencyparameter”.

infiles

Files to average or sum. Must all have the same sites and character type, but do not need to have the same wildtype residue at each site.

Note that for differential selection, the files can only be the *mutdiffsel.txt files created by dms_diffselection, not the *sitediffsel.txt files. However, you can create a *sitediffsel.txt file from the merged *mutdiffsel.txt files using the --sitediffselfile option.

Named Arguments

--excludestop

When using amino acids, remove stop codons (denoted by “*”). Stop codons are removed only if this argument is specified. If this option is used, any files with stop codons have these codons removed (re-normalizing preferences to sum to one, and differential preferences to sum to zero) before the merge.

Default: False
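As a rough illustration of what removing stop codons means for preferences, the sketch below drops “*” from one site and rescales the remaining values so they again sum to one. It assumes a simple proportional rescaling and covers preferences only; the function name drop_stop and the dictionary layout are hypothetical, and the way dms_merge re-normalizes differential preferences is not shown here.

```python
def drop_stop(site_prefs, stop="*"):
    """Remove the stop character from one site's preferences.

    `site_prefs` maps each amino-acid character to its preference.
    The remaining preferences are divided by their total so that they
    sum to one again (an assumed, proportional re-normalization).
    """
    kept = {aa: p for aa, p in site_prefs.items() if aa != stop}
    total = sum(kept.values())
    return {aa: p / total for aa, p in kept.items()}


# Toy site with three characters, one of them a stop codon.
site = {"A": 0.5, "C": 0.3, "*": 0.2}
print(drop_stop(site))
```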

--minus

Files to subtract when summing. Can only be used if “merge_method” is “sum” and if files are either preferences or differential preferences.

Files should be in the formats of a preferences_file or a diffpreferences_file. Currently, this option cannot be used with counts files.

--sitediffselfile

If merging differential selection, ‘infiles’ and ‘outfile’ are ‘*mutdiffsel.txt’ files. If you also want to create a ‘*sitediffsel.txt’ file for the merged ‘*mutdiffsel.txt’ file, specify the name of that file here.

After merging the mutation-level differential selection values in *mutdiffsel.txt infiles, you can create a new *sitediffsel.txt file (in the format of those created by dms_diffselection) with this option.

-v, --version

Show program’s version number and exit.

--stringencyparameter

Stringency parameter used to rescale preferences. Can only be used if “merge_method” is “rescale”.

--normalize

Whether to normalize the counts at each site to the minimum number of counts observed at that site across all provided count files, before summing the counts. Can only be used if “merge_method” is “sum”.

Default: False

--chartype

Possible choices: DNA, codon, aa

Characters for which counts are summed: “DNA” = counts for DNA; “codon” = counts for codons; “aa” = counts for amino acids (possibly including stop codons, see “--excludestop”). This option only needs to be specified when summing count files.

Default: “codon”

Examples

Rescaling preferences by a stringency parameter

Use:

dms_merge rescaledprefs.txt rescale prefs.txt --stringencyparameter 2.5

to rescale the preferences in prefs.txt by a stringency parameter of \(\beta = 2.5\). The created file rescaledprefs.txt will be in the Preferences file format.
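The text above does not spell out the rescaling formula. A common convention for a stringency parameter is to raise each preference to the power \(\beta\) and then re-normalize so the values at a site sum to one; the sketch below assumes that convention. The function name rescale_prefs and the dictionary layout are hypothetical and are not taken from dms_tools.

```python
def rescale_prefs(site_prefs, beta):
    """Rescale one site's preferences by a stringency parameter.

    Assumption: each preference is raised to the power `beta` and the
    results are divided by their total so that they sum to one.
    """
    powered = {char: p ** beta for char, p in site_prefs.items()}
    total = sum(powered.values())
    return {char: value / total for char, value in powered.items()}


# Toy site: beta > 1 sharpens the distribution toward the preferred character.
site = {"A": 0.5, "C": 0.25, "G": 0.25}
print(rescale_prefs(site, 2.5))
```

Under this assumption, \(\beta = 1\) leaves the preferences unchanged, \(\beta > 1\) exaggerates differences between characters, and \(\beta < 1\) flattens them.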

Averaging differential selection

Use:

dms_merge avgmutdiffsel.txt average mutdiffsel1.txt mutdiffsel2.txt

to average the two mutation differential selection files to create avgmutdiffsel.txt in the format of a *mutdiffsel.txt file from dms_diffselection. If you also want to create a *sitediffsel.txt file corresponding to avgmutdiffsel.txt, use:

dms_merge avgmutdiffsel.txt average mutdiffsel1.txt mutdiffsel2.txt --sitediffselfile avgsitediffsel.txt

to create avgsitediffsel.txt with the site differential selection values in the format created by dms_diffselection.
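For a single mutation, merging amounts to combining one value per input file. The sketch below shows a mean and a median that skip NaN entries. The NaN handling for “average” follows the description of outfile; applying the same handling to “median”, and returning NaN when every input is NaN, are assumptions. The function name merge_diffsel is hypothetical.

```python
import math
import statistics


def merge_diffsel(values, method="average"):
    """Merge one mutation's differential selection across files.

    `values` holds the value from each infile. NaN entries are skipped.
    Returns NaN if no usable value remains (an assumed convention).
    """
    usable = [v for v in values if not math.isnan(v)]
    if not usable:
        return float("nan")
    if method == "average":
        return statistics.mean(usable)
    if method == "median":
        return statistics.median(usable)
    raise ValueError("method must be 'average' or 'median'")


print(merge_diffsel([1.0, float("nan"), 3.0]))          # mean of 1.0 and 3.0
print(merge_diffsel([1.0, 2.0, 10.0], method="median"))  # middle value
```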

Averaging preferences

Use:

dms_merge avgprefs.txt average prefs1.txt prefs2.txt

The created file avgprefs.txt will have the Preferences file format.

Adding and subtracting preferences

The command:

dms_merge summedprefs.txt sum prefs1.txt prefs2.txt --minus prefs3.txt

would create an output file summedprefs.txt in the Preferences file format.

The command:

dms_merge summedprefs.txt sum prefs1.txt prefs2.txt

is invalid since it does not create preferences that sum to one at each site (instead they sum to two).

The command:

dms_merge diffs.txt sum prefs1.txt prefs2.txt --minus prefs3.txt prefs4.txt

would create a file diffs.txt in the Differential preferences file format since the total for each site sums to zero.
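The three commands above differ only in what the values at each site add up to. The following sketch applies that rule to one site: it adds the values from the files to sum, subtracts the values from the files given to --minus, and classifies the result by its total. The function name classify_sum, the dictionary layout, and the tolerance tol are hypothetical choices for illustration.

```python
def classify_sum(plus, minus=(), tol=1e-6):
    """Sum one site's values across files and report the output type.

    `plus` and `minus` are sequences of dicts mapping character to value
    at the same site (files to add and files to subtract).
    A total of one means preferences; a total of zero means
    differential preferences; anything else is treated as invalid.
    """
    merged = {
        char: sum(f[char] for f in plus) - sum(f[char] for f in minus)
        for char in plus[0]
    }
    total = sum(merged.values())
    if abs(total - 1.0) < tol:
        return "preferences", merged
    if abs(total) < tol:
        return "differential preferences", merged
    raise ValueError("site total is %g, expected 1 or 0" % total)


p1 = {"A": 0.6, "C": 0.4}
p2 = {"A": 0.5, "C": 0.5}
p3 = {"A": 0.7, "C": 0.3}
p4 = {"A": 0.2, "C": 0.8}
print(classify_sum([p1, p2], [p3])[0])      # two added, one subtracted
print(classify_sum([p1, p2], [p3, p4])[0])  # two added, two subtracted
```

Calling classify_sum([p1, p2]) with no subtracted files raises ValueError, mirroring the invalid command whose values sum to two at each site.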

Adding counts

The command:

dms_merge summedcounts.txt sum counts1.txt counts2.txt --chartype codon

sums counts from the two provided counts files and writes them to the file summedcounts.txt in the Deep mutational scanning counts file format.

Adding counts with normalization

The command:

dms_merge summedcounts.txt sum counts1.txt counts2.txt --chartype codon --normalize

normalizes counts at each site to the minimum number of total counts observed at that site in counts1.txt and counts2.txt. The normalized counts are then summed and written to the file summedcounts.txt in the Deep mutational scanning counts file format.
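A minimal sketch of summing with and without normalization is given below. It assumes that each file's counts at a site are scaled by the ratio of the smallest site total to that file's site total, and that the scaled values are left as floating-point numbers; whether dms_merge rounds them is not stated above. The function name sum_counts and the nested-dictionary layout are hypothetical.

```python
def sum_counts(files, normalize=False):
    """Sum counts across files, optionally normalizing each site first.

    `files` is a list of dicts of the form {site: {char: count}} that
    share the same sites and characters. With normalize=True, each
    file's counts at a site are scaled down to the minimum total count
    seen at that site across files (assumed proportional scaling).
    """
    merged = {}
    for site in files[0]:
        totals = [sum(f[site].values()) for f in files]
        smallest = min(totals)
        merged[site] = {}
        for char in files[0][site]:
            value = 0
            for f, total in zip(files, totals):
                scale = smallest / total if (normalize and total) else 1
                value += f[site][char] * scale
            merged[site][char] = value
    return merged


counts1 = {1: {"A": 90, "C": 10}}  # 100 total counts at site 1
counts2 = {1: {"A": 30, "C": 20}}  # 50 total counts at site 1
print(sum_counts([counts1, counts2]))
print(sum_counts([counts1, counts2], normalize=True))
```

In this toy case normalization halves the counts from counts1 before summing, so both files contribute equally to the merged site.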