dms_merge
Overview
dms_merge is a program included with the dms_tools package. It merges preferences, differential selection values, or differential preferences from different files by averaging, adding, or subtracting them. It can also sum the counts from multiple count files. See Examples for illustrations of how you might do this.
After you install dms_tools, this program will be available to run at the command line.
Command-line usage
Merge preferences, differential selection, or differential preferences by averaging or adding / subtracting the values in multiple files. Alternatively, sum counts by adding the counts for the characters at each site across multiple files. All files must specify the same character type, which can be nucleotide, codon, or amino acid (see “--excludestop” if using amino acids). This script is part of dms_tools (version 1.1.20) written by the Bloom Lab (see https://github.com/jbloomlab/dms_tools/graphs/contributors for all contributors). Detailed documentation is at http://jbloomlab.github.io/dms_tools/
usage: dms_merge [-h] [--excludestop] [--minus MINUS [MINUS ...]]
[--sitediffselfile SITEDIFFSELFILE] [-v]
[--stringencyparameter STRINGENCYPARAMETER] [--normalize]
[--chartype {DNA,codon,aa}]
outfile {average,median,sum,rescale} infiles [infiles ...]
Positional Arguments
outfile | Created output file with the merged values; an existing file with this name is removed first. If the “infiles” do not all have the same wildtype residue at a site, the wildtype is indicated as “?” in “outfile”. For a merge_method of “average”, this file is of the same type as infiles (preferences, mutation-level differential selection values, or differential preferences). When merging differential selection, any NaN values are ignored. For a merge_method of “median”, infiles and outfile must be in the format of the *mutdiffsel.txt files created by dms_diffselection. For a merge_method of “sum”, this file may contain preferences, differential preferences, or counts: if any of the files in infiles and --minus give preferences and the sum at each site is one, outfile gives preferences; if any of the files in infiles and --minus give preferences and the sum at each site is zero, outfile gives differential preferences; if all of the files in infiles and --minus are differential preferences, outfile gives differential preferences; if all of the files in infiles are counts, outfile gives counts. For a merge_method of “rescale”, this file gives the preferences after rescaling. If outfile gives preferences, it is in the preferences_file format, without any columns giving 95% credible intervals, as these cannot be calculated when merging files. If outfile gives differential preferences, it is in the diffpreferences_file format, without any columns giving posterior probabilities, as these cannot be calculated when merging files. If outfile gives mutation-level differential selection values, it is in the format of the *mutdiffsel.txt file created by dms_diffselection; to create a corresponding *sitediffsel.txt file, see the --sitediffselfile option. If outfile gives counts, it is in the dms_counts format. |
merge_method | Possible choices: average, median, sum, rescale. How to merge: If “average”, then all “infiles” must specify preferences, differential selection on mutations (mutdiffsel.txt file), or differential preferences; these are then averaged (by taking the mean) in “outfile”. If “median”, then all “infiles” must specify differential selection on mutations (mutdiffsel.txt file); these are then merged by taking the median. If “sum”, then “infiles” can either all be count files, all be differential preferences, or be a combination of preferences and differential preferences that (along with any additional files specified by “--minus”) sum to a total preference of one or a total differential preference of zero at each site. If “rescale”, then only one infile can be specified and it must give preferences; these preferences are then rescaled by the value provided by “--stringencyparameter”. |
infiles | Files to average or sum. Must all have the same sites and character type, but do not need to have the same wildtype residue at each site. Note that for differential selection, the files can only be the *mutdiffsel.txt files created by dms_diffselection, not the *sitediffsel.txt files. However, you can create a *sitediffsel.txt file from the merged *mutdiffsel.txt files using the --sitediffselfile option. |
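The wildtype rule described for outfile (a shared wildtype is kept, a disagreement becomes “?”) can be illustrated with a minimal sketch. The function name merged_wildtype is hypothetical and is not part of dms_tools; this only mirrors the documented behavior and is not the program's actual code.

```python
def merged_wildtype(wildtypes):
    """Return the wildtype shared by all input files at one site, or '?'.

    `wildtypes` holds the wildtype character reported for the same site
    in each input file. This helper is hypothetical (not a dms_tools API).
    """
    unique = set(wildtypes)
    # Exactly one distinct value means every file agrees on the wildtype.
    return unique.pop() if len(unique) == 1 else "?"


print(merged_wildtype(["A", "A", "A"]))
print(merged_wildtype(["A", "G"]))
```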
Named Arguments
--excludestop | If we are using amino acids, do we remove stop codons (denoted by “*”)? We only remove stop codons if this argument is specified. If this option is used, then any files with stop codons have these codons removed (re-normalizing preferences to sum to one, and differential preferences to sum to zero) before the merge. Default: False |
--minus | Files to subtract when summing. Can only be used if “merge_method” is “sum” and if files are either preferences or differential preferences. Files should be in the formats of a preferences_file or a diffpreferences_file. Currently, this option cannot be used with counts files. |
--sitediffselfile | If merging differential selection, “infiles” and “outfile” are *mutdiffsel.txt files. If you also want to create a *sitediffsel.txt file (in the format of those created by dms_diffselection) for the merged *mutdiffsel.txt file, specify the name of that file here. |
-v, --version | show program’s version number and exit |
--stringencyparameter | Stringency parameter used to rescale preferences. Can only be used if “merge_method” is “rescale”. |
--normalize | Whether to normalize the counts at each site to the minimum number of counts observed at that site across all provided count files, before summing the counts. Can only be used if “merge_method” is “sum”. Default: False |
--chartype | Possible choices: DNA, codon, aa. Characters for which counts are summed: “DNA” = counts for DNA; “codon” = counts for codons; “aa” = counts for amino acids (possibly including stop codons, see “--excludestop”). This option only needs to be specified when summing count files. Default: “codon” |
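The re-normalization described for --excludestop can be sketched for the preferences case as follows. This is an illustration only: the function name exclude_stop is hypothetical, and dividing by the remaining total is an assumption about how the preferences are made to sum to one again. The zero-sum re-normalization of differential preferences is not covered here because the text above does not say how it is done.

```python
def exclude_stop(site_prefs):
    """Drop the stop character '*' from one site's preferences and
    re-normalize the remaining values so that they sum to one.

    `site_prefs` maps each amino-acid character to its preference.
    Hypothetical helper; assumes simple division by the remaining total.
    """
    kept = {aa: p for aa, p in site_prefs.items() if aa != "*"}
    total = sum(kept.values())
    return {aa: p / total for aa, p in kept.items()}


site = {"A": 0.5, "C": 0.25, "*": 0.25}
print(exclude_stop(site))
```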
Examples
Rescaling preferences by a stringency parameter
Use:
dms_merge rescaledprefs.txt rescale prefs.txt --stringencyparameter 2.5
to rescale the preferences in prefs.txt by a stringency parameter of \(\beta = 2.5\). The created file rescaledprefs.txt will be in the Preferences file format.
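As a rough illustration of what rescaling by a stringency parameter means, the sketch below raises each preference at a site to the power \(\beta\) and re-normalizes so the site sums to one. That formula is an assumption based on the usual convention for stringency parameters; this page does not spell it out. The function name rescale_preferences is hypothetical.

```python
def rescale_preferences(site_prefs, beta):
    """Rescale one site's preferences by a stringency parameter `beta`.

    Assumed formula (not stated on this page): each preference is raised
    to the power beta, then the values are re-normalized to sum to one.
    Hypothetical helper, not a dms_tools function.
    """
    powered = {char: p ** beta for char, p in site_prefs.items()}
    total = sum(powered.values())
    return {char: value / total for char, value in powered.items()}


site = {"A": 0.5, "C": 0.25, "G": 0.25}
# beta > 1 sharpens the distribution toward the most preferred character.
print(rescale_preferences(site, 2.5))
```

Under this assumed formula, \(\beta = 1\) leaves the preferences unchanged, values above one exaggerate differences between characters, and values below one flatten them.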
Averaging differential selection
Use:
dms_merge avgmutdiffsel.txt average mutdiffsel1.txt mutdiffsel2.txt
to average the two mutation differential selection files to create avgmutdiffsel.txt in the format of a *mutdiffsel.txt file from dms_diffselection. If you also want to create a *sitediffsel.txt file corresponding to avgmutdiffsel.txt, use:
dms_merge avgmutdiffsel.txt average mutdiffsel1.txt mutdiffsel2.txt --sitediffselfile avgsitediffsel.txt
to create avgsitediffsel.txt with the site differential selection values in the format created by dms_diffselection.
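The sketch below illustrates the two steps for a single mutation and a single site: merging values across files while ignoring NaN, and summarizing the merged mutation-level values per site. The function names are hypothetical. The site-level summary (absolute, positive, and negative totals) is an assumption about what a *sitediffsel.txt file reports, and applying the NaN rule to the median as well as the mean is also an assumption.

```python
import math
import statistics


def merge_diffsel(values, method="average"):
    """Merge one mutation's differential selection across files.

    NaN entries are ignored; if every entry is NaN the result is NaN.
    Hypothetical helper, not a dms_tools function.
    """
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return math.nan
    if method == "average":
        return sum(finite) / len(finite)
    if method == "median":
        return statistics.median(finite)
    raise ValueError("method must be 'average' or 'median'")


def site_summary(mut_diffsel):
    """Summarize merged mutation-level values at one site.

    Assumed summary columns: total absolute, total positive, and total
    negative differential selection, skipping NaN values.
    """
    finite = [v for v in mut_diffsel.values() if not math.isnan(v)]
    return {
        "abs_diffsel": sum(abs(v) for v in finite),
        "positive_diffsel": sum(v for v in finite if v > 0),
        "negative_diffsel": sum(v for v in finite if v < 0),
    }


print(merge_diffsel([1.0, math.nan, 3.0]))
print(site_summary({"A": 1.5, "C": -0.5, "D": math.nan}))
```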
Averaging preferences
Use:
dms_merge avgprefs.txt average prefs1.txt prefs2.txt
The created file avgprefs.txt will have the Preferences file format.
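Conceptually, averaging takes the mean of each character's preference at each site across the input files. A minimal sketch on in-memory dictionaries follows; the function name average_preferences and the nested-dictionary layout are illustrative assumptions, and reading or writing the Preferences file format is not shown.

```python
def average_preferences(files):
    """Average preferences site by site across several inputs.

    Each element of `files` maps site -> {character: preference}.
    All inputs must cover the same sites. Hypothetical helper with an
    assumed in-memory layout, not a dms_tools function.
    """
    sites = set(files[0])
    if any(set(f) != sites for f in files):
        raise ValueError("all files must cover the same sites")
    averaged = {}
    for site in sorted(sites):
        chars = files[0][site].keys()
        averaged[site] = {
            c: sum(f[site][c] for f in files) / len(files) for c in chars
        }
    return averaged


prefs1 = {1: {"A": 0.75, "C": 0.25}}
prefs2 = {1: {"A": 0.25, "C": 0.75}}
print(average_preferences([prefs1, prefs2]))
```

Because each input sums to one at every site, the mean also sums to one, so the result is still a valid set of preferences.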
Adding and subtracting preferences
The command:
dms_merge summedprefs.txt sum prefs1.txt prefs2.txt --minus prefs3.txt
would create an output file summedprefs.txt in the Preferences file format.
The command:
dms_merge summedprefs.txt sum prefs1.txt prefs2.txt
is invalid since it does not create preferences that sum to one at each site (instead they sum to two).
The command:
dms_merge diffs.txt sum prefs1.txt prefs2.txt --minus prefs3.txt prefs4.txt
would create a file diffs.txt in the Differential preferences file format since the total for each site sums to zero.
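The three commands above follow one bookkeeping rule: each preferences file contributes a per-site total of one (or minus one when passed to --minus), and each differential preferences file contributes zero. The sketch below encodes that rule; the function name sum_output_type is hypothetical and this is not the program's own validation code.

```python
def sum_output_type(n_prefs_added, n_prefs_subtracted):
    """Infer the output type of a 'sum' merge from per-site totals.

    Counts only preferences files; differential preferences files add
    zero to the total and so do not change the result.
    Hypothetical helper, not a dms_tools function.
    """
    net = n_prefs_added - n_prefs_subtracted
    if net == 1:
        return "preferences"
    if net == 0:
        return "differential preferences"
    raise ValueError(f"per-site total of {net} is neither one nor zero")


print(sum_output_type(2, 1))  # prefs1 + prefs2 - prefs3
print(sum_output_type(2, 2))  # prefs1 + prefs2 - prefs3 - prefs4
```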
Adding counts
The command:
dms_merge summedcounts.txt sum counts1.txt counts2.txt --chartype codon
sums counts from the two provided counts files and writes them to the file summedcounts.txt in the Deep mutational scanning counts file format.
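Summing counts amounts to adding, for each site, the count of every character across the input files. A minimal sketch on in-memory dictionaries is shown below; the function name sum_counts and the nested-dictionary layout are illustrative assumptions, and parsing the counts file format is not shown.

```python
from collections import Counter


def sum_counts(files):
    """Add per-site character counts across several inputs.

    Each element of `files` maps site -> {character: count}.
    Hypothetical helper with an assumed in-memory layout.
    """
    merged = {}
    for counts in files:
        for site, chars in counts.items():
            # Counter.update adds counts rather than replacing them.
            merged.setdefault(site, Counter()).update(chars)
    return merged


counts1 = {1: {"ATG": 10, "CTG": 2}}
counts2 = {1: {"ATG": 5, "CTG": 1}}
print(sum_counts([counts1, counts2]))
```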
Adding counts with normalization
The command:
dms_merge summedcounts.txt sum counts1.txt counts2.txt --chartype codon --normalize
normalizes counts at each site to the minimum number of total counts observed at that site in counts1.txt and counts2.txt. The normalized counts are then summed and written to the file summedcounts.txt in the Deep mutational scanning counts file format.
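One way to read the normalization step is sketched below: at each site, every file's counts are scaled so that the file's total at that site equals the smallest total seen across the files, and the scaled counts are then added. The function name normalize_and_sum is hypothetical, and the details are assumptions; in particular, this sketch keeps fractional values, whereas the page does not say whether the program rounds.

```python
def normalize_and_sum(files):
    """Scale each input's counts at a site to the minimum per-site total
    across inputs, then add the scaled counts.

    Each element of `files` maps site -> {character: count}.
    Hypothetical helper; rounding behavior is not modeled.
    """
    merged = {}
    for site in files[0]:
        totals = [sum(f[site].values()) for f in files]
        min_total = min(totals)
        site_sum = {}
        for f, total in zip(files, totals):
            for char, n in f[site].items():
                # A file with zero total at this site contributes nothing.
                scaled = n * min_total / total if total else 0.0
                site_sum[char] = site_sum.get(char, 0.0) + scaled
        merged[site] = site_sum
    return merged


counts1 = {1: {"ATG": 80, "CTG": 20}}  # total of 100 at site 1
counts2 = {1: {"ATG": 40, "CTG": 10}}  # total of 50 at site 1
print(normalize_and_sum([counts1, counts2]))
```

In this example the first input is scaled down by half before summing, so both inputs contribute equally at the site despite their different sequencing depths.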