dms_logoplot¶
Overview¶
dms_logoplot is a program included with the dms_tools package. It uses weblogo (via the weblogolib Python API) to make logo plots that visually display preferences, differential preferences, or differential selection.
After you install dms_tools, this program will be available to run at the command line.
Command-line usage¶
Make a logo plot visually displaying the preferences or differential preferences. Utilizes weblogo (https://code.google.com/p/weblogo/). This script is part of dms_tools (version 1.1.20) written by the Bloom Lab (see https://github.com/jbloomlab/dms_tools/graphs/contributors for all contributors). Detailed documentation is at http://jbloomlab.github.io/dms_tools/
usage: dms_logoplot [-h] [--nperline NPERLINE] [--numberevery NUMBEREVERY]
                    [--restrictdiffsel {None,positive,negative}]
                    [--diffselheight [DIFFSELHEIGHT [DIFFSELHEIGHT ...]]]
                    [--nosepline] [--diffprefheight DIFFPREFHEIGHT]
                    [--excludestop] [--colormap COLORMAP]
                    [--letterheight LETTERHEIGHT]
                    [--overlay1 FILE SHORTNAME LONGNAME]
                    [--overlay2 FILE SHORTNAME LONGNAME]
                    [--overlay3 FILE SHORTNAME LONGNAME]
                    [--stringencyparameter STRINGENCYPARAMETER]
                    [--mapmetric {kd,mw,charge,functionalgroup}]
                    [--overlay_cmap OVERLAY_CMAP] [-v]
                    infile logoplot
Positional Arguments¶
infile | Existing file giving preferences, differential preferences, or differential selection. The program auto-detects which type of data is present. The values can be for DNA or amino acids (stop codons allowed, indicated by “*”). Should be in the format of a preferences_file, a diffpreferences_file, or one of the *mutdiffsel.txt files created by dms_diffselection. |
logoplot | Name of created file containing the logo plot, must end in the extension “.pdf”. Overwritten if it already exists. See Examples for images of the types of plots that are created. |
Named Arguments¶
--nperline | Put this many sites per line of the logo plot. Default: 60 |
--numberevery | Number x-axis ticks for sites at this interval. Default: 10 |
--restrictdiffsel | Possible choices: None, positive, negative. Specify ‘positive’ or ‘negative’ to restrict plotted differential selection to positive or negative selection. Only meaningful if ‘infile’ gives differential selection. Default: “None” |
--diffselheight | Only meaningful if ‘infile’ gives differential selection. List other differential selection files, and y-axis limits will be set to the max and min of ‘infile’ and all files listed here. This is useful if you want to make multiple differential selection plots with the same y-axis limits. If you use this option, specify a list of files in the format of the *mutdiffsel.txt files created by dms_diffselection. |
--nosepline | Do not plot a black center line separating positive and negative values of differential selection or preferences. Default: False |
--diffprefheight | This option is only meaningful if “infile” gives differential preferences. In that case, it gives the height of logo stacks (extends from minus this to plus this). Cannot be smaller than the maximum total differential preference range. Default: 1.0 |
--excludestop | When using amino acids, remove stop codons (denoted by “*”). Stop codons are removed only if this argument is specified. If this option is used, data for stop codons is removed by re-normalizing preferences to sum to one, and differential preferences to sum to zero. Default: False |
--colormap | Colormap for amino-acid hydrophobicity or molecular weight. Must specify a valid matplotlib colormap name. Default: “jet” |
--letterheight | Relative parameter indicating the height of the letter stacks in the logo plots. Default: 1 |
--overlay1 | Specify an overlay bar above each line of the logo plot to illustrate a per-residue property such as relative solvent accessibility or secondary structure. Requires three arguments: FILE SHORTNAME LONGNAME. FILE is the name of an existing file. Except for comment lines beginning with “#”, each line should have two whitespace-delimited columns (additional columns are allowed but ignored). The first column gives the site number (matching that in “infile”) and the second column gives the property for this site; properties must either all be non-whitespace strings giving a discrete category (such as secondary structure), or all be numbers (such as relative solvent accessibility). All listed sites must be in “infile”, but not all sites in “infile” must be in FILE; missing sites are assumed to lack a known value for the property and are shown in white. SHORTNAME is a short (3-5 character) name of the property, such as “RSA” for “relative solvent accessibility”. LONGNAME is a longer name (such as “relative solvent accessibility”), or the same as SHORTNAME if you do not have a separate long name. For instance, a file specifying relative solvent accessibility for residues 2, 3, 4, and 6 would contain the following, one entry per line: “#SITE RSA”, “2 0.3”, “3 0.56”, “4 0.02”, “6 0.72”. In the created logo plot, residue 5 (assuming it exists in infile) will not have any relative solvent accessibility shown. A file specifying secondary structure would contain lines such as: “#SITE SS”, “2 helix”, “3 helix”, “4 coil”, “5 sheet”. If that file were named secondary_structures.txt, you would use the option as --overlay1 secondary_structures.txt SS “secondary structure”. |
--overlay2 | Specify a second overlay bar. Arguments have the same meaning as for “overlay1”. |
--overlay3 | Specify a third overlay bar. Arguments have the same meaning as for “overlay1”. |
--stringencyparameter | Scale preferences by this stringency parameter; only valid when ‘infile’ specifies preferences. Use this option if you have fit a stringency parameter and want to rescale the visualization of the preferences using this parameter. The preferences are rescaled so that each preference is proportional to \(\pi_{r,a}^{\beta}\), where \(\beta\) is the stringency parameter, so values > 1 increase the weight of the preferences and values < 1 flatten them. |
--mapmetric | Possible choices: kd, mw, charge, functionalgroup. Specify the amino-acid metric used to map colors to amino acids in the logo plot. ‘kd’ uses the Kyte-Doolittle hydrophobicity scale, ‘mw’ uses molecular weight, ‘functionalgroup’ divides the amino acids into seven functional groups, and ‘charge’ uses charge at neutral pH. When using ‘kd’ or ‘mw’, ‘colormap’ is used to map colors to the metric; when using ‘charge’, a black/red/blue color mapping is used for neutral/positive/negative; similarly ‘functionalgroup’ has its own colormap. Default: “kd” |
--overlay_cmap | Specify color map for overlay bars. Should be name of valid matplotlib colormap such as ‘jet’ or ‘OrRd’ (http://matplotlib.org/users/colormaps.html) |
-v, --version | show program’s version number and exit |
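The rescaling described for --stringencyparameter can be illustrated with a short Python sketch. This is not the code that dms_logoplot runs; the function name rescale_preferences and the example values are hypothetical, and the sketch assumes that the rescaled preferences at a site are renormalized to sum to one.

```python
def rescale_preferences(prefs, beta):
    """Return preferences proportional to pi**beta, renormalized to sum to one.

    prefs maps each character at one site to its preference (hypothetical input);
    beta plays the role of the stringency parameter.
    """
    powered = {aa: pi ** beta for aa, pi in prefs.items()}
    total = sum(powered.values())
    return {aa: value / total for aa, value in powered.items()}


# Made-up preferences for a single site with three characters.
site_prefs = {"A": 0.5, "C": 0.3, "D": 0.2}
sharpened = rescale_preferences(site_prefs, 2.0)  # beta > 1 favors the top character
flattened = rescale_preferences(site_prefs, 0.5)  # beta < 1 evens out the preferences
```

With these made-up values, a parameter above one raises the share of the most preferred character and a parameter below one lowers it.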
Examples¶
Preferences logo plot¶
Imagine that you have inferred site-specific preferences into preferences.txt, which has the format of a Preferences file. To display these preferences using a logo plot, run:
dms_logoplot preferences.txt prefs_logoplot.pdf --nperline 81 --excludestop
This will create the file prefs_logoplot.pdf, which will look something like the image below. Note that this logo plot does not include stop codons; if they were shown, they would be black * characters. In the logo plot, the height of each letter is proportional to the preference for that amino acid. The letter heights sum to one, since \(\sum_a \pi_{r,a} = 1\).
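The effect of --excludestop on preferences can be sketched in a few lines of Python. This is an illustration, not the dms_tools implementation: the function name drop_stop and the example values are hypothetical, and the sketch assumes that re-normalizing means dividing the remaining preferences by their sum. Differential preferences, which are re-normalized to sum to zero, are not covered here.

```python
def drop_stop(prefs, stop="*"):
    """Remove the stop codon from one site's preferences and renormalize to one."""
    kept = {aa: pi for aa, pi in prefs.items() if aa != stop}
    total = sum(kept.values())
    return {aa: pi / total for aa, pi in kept.items()}


# Made-up preferences for a single site, including a stop codon.
site_prefs = {"A": 0.6, "C": 0.3, "*": 0.1}
no_stop = drop_stop(site_prefs)
```

After this step the remaining letter heights again sum to one, and the ratio between any two amino acids is unchanged.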
Differential selection logo plot¶
Imagine that you have inferred differential selection into mutdiffsel.txt using dms_diffselection.
You can visualize the differential selection (the \(s_{r,x}\) values described in dms_diffselection) using:
dms_logoplot mutdiffsel.txt diffsel_logoplot.pdf --nperline 115 --diffselheight mutdiffsel.txt mutdiffsel2.txt mutdiffsel3.txt
The --diffselheight option is useful if you are making several such plots and want them to share a common y-axis.
Any mutations that have missing differential selection values (specified as NaN in the mutdiffsel.txt file) are treated as having a differential selection of zero in the plotting.
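The idea of a shared y-axis and of treating NaN as zero can be sketched in Python as follows. This is a sketch under stated assumptions, not the program's own code: the function names and the in-memory data layout (site mapped to a list of differential selection values) are hypothetical, and it assumes that the limits are taken from the summed positive and summed negative values at each site, as in a stacked logo.

```python
import math


def stack_limits(diffsel_by_site):
    """Return (lowest, highest) stack heights over all sites; NaN counts as zero."""
    lowest, highest = 0.0, 0.0
    for values in diffsel_by_site.values():
        clean = [0.0 if math.isnan(v) else v for v in values]
        highest = max(highest, sum(v for v in clean if v > 0))
        lowest = min(lowest, sum(v for v in clean if v < 0))
    return lowest, highest


def shared_limits(datasets):
    """Return y-axis limits that cover every data set in the list."""
    limits = [stack_limits(d) for d in datasets]
    return min(lo for lo, _ in limits), max(hi for _, hi in limits)


# Two made-up data sets: site -> differential selection of each mutation.
plot1 = {1: [1.5, -0.5, float("nan")], 2: [0.25, 0.25]}
plot2 = {1: [-2.0, 0.5], 2: [3.0, float("nan")]}
ymin, ymax = shared_limits([plot1, plot2])
```

Passing the same list of data sets for every plot gives each plot identical limits, which is the purpose of listing the same files after --diffselheight in each command.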
Here is an example of a created plot:
Differential preferences logo plot¶
Imagine that you have inferred differential preferences into diffprefs.txt, which has the format of a Differential preferences file. To display these differential preferences, run:
dms_logoplot diffprefs.txt diffprefs_logoplot.pdf --nperline 81 --excludestop --diffprefheight 0.5
This will create the file diffprefs_logoplot.pdf, which will look something like the image below. The heights of the letters above or below the center line are proportional to the differential preference for or against each amino acid. The overall negative and positive heights are equal since \(0 = \sum_a\Delta\pi_{r,a}\). For instance, in the plot below there is a strong differential preference for M at site 182, and a strong differential preference for V at position 90. Note that in the command above we used --diffprefheight 0.5. This is because if we did not rescale the y-axis (the maximum height of the letter stacks) down from its default of 1.0, we would use only part of the dynamic range, since the maximal differential preference is \(< 0.5\).
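One way to pick a value for --diffprefheight is to look at the tallest stack in the data. The Python sketch below is illustrative only: the function name min_diffprefheight and the example values are hypothetical, and it assumes that the stack height at a site is the sum of its positive differential preferences, which equals the magnitude of the negative sum because the values sum to zero.

```python
def min_diffprefheight(diffprefs_by_site):
    """Return the tallest positive stack over all sites (assumed lower bound)."""
    return max(
        sum(v for v in diffprefs.values() if v > 0)
        for diffprefs in diffprefs_by_site.values()
    )


# Made-up differential preferences for two sites; each site sums to zero.
diffprefs = {
    1: {"A": 0.125, "L": 0.125, "V": -0.25},
    2: {"M": 0.375, "I": -0.25, "T": -0.125},
}
tallest = min_diffprefheight(diffprefs)
```

For these made-up values the tallest stack is 0.375, so a height of 0.5 would fit the data while using more of the y-axis than the default of 1.0.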
Plot with an overlay¶
Now let’s add an overlay to the Preferences logo plot. We create files giving the secondary structure and the relative solvent accessibility. These files are SSs.txt, which has the first few lines as follows:
#SITE SS
18 loop
19 strand
20 strand
21 strand
22 strand
23 strand
24 strand
25 strand
26 loop
and RSAs.txt, which has the first few lines as follows:
#SITE RSA
18 0.170984455959
19 0.168604651163
20 0.0
21 0.0778443113772
22 0.00507614213198
23 0.0
24 0.0
25 0.0803571428571
26 0.0077519379845
We now use the command:
dms_logoplot preferences.txt prefs_logoplot.pdf --nperline 81 --excludestop --overlay1 RSAs.txt RSA "relative solvent accessibility" --overlay2 SSs.txt SS "secondary structure"
to create the following image, which displays overlay bars with the secondary structure and solvent accessibility:
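The overlay file format used by SSs.txt and RSAs.txt is simple enough to check with a few lines of Python before running the program. The parser below is a hypothetical helper, not part of dms_tools: it skips comment lines beginning with “#”, reads the first two whitespace-delimited columns, ignores any extra columns, and returns numbers only if every property parses as a number.

```python
def parse_overlay(lines):
    """Parse overlay lines into {site: property}.

    Values are returned as floats if all of them are numeric, otherwise as strings.
    Sites are kept as strings. A non-comment line with fewer than two columns
    raises ValueError.
    """
    props = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        site, value = line.split()[:2]  # additional columns are ignored
        props[site] = value
    try:
        return {site: float(value) for site, value in props.items()}
    except ValueError:
        return props  # at least one value is not a number: treat as categories


ss = parse_overlay(["#SITE SS", "18 loop", "19 strand"])
rsa = parse_overlay(["#SITE RSA", "18 0.170984455959", "20 0.0 extra-column"])
```

To check a real file, pass an open file handle, for example parse_overlay(open("RSAs.txt")), since iterating over a file yields its lines.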