phydmslib.weblogo module
Module for making sequence logos with the weblogolib package distributed with weblogo.
This module interfaces with the weblogolib API, and so is only known to work with weblogolib versions 3.4 and 3.5.
Written by Jesse Bloom and Mike Doud
phydmslib.weblogo.ChargeColorMapping(maptype='jet', reverse=False)
Maps amino-acid charge at neutral pH to colors. The keyword arguments maptype and reverse are currently ignored; they are accepted only for consistency with KyteDoolittleColorMapping and MWColorMapping.
phydmslib.weblogo.FunctionalGroupColorMapping(maptype='jet', reverse=False)
Maps amino-acid functional groups to colors. The keyword arguments maptype and reverse are currently ignored; they are accepted only for consistency with the other mapping functions, which are all called with these arguments.
phydmslib.weblogo.KyteDoolittleColorMapping(maptype='jet', reverse=True)
Maps amino-acid hydrophobicities to colors.
Uses the Kyte-Doolittle hydrophobicity scale defined by:
J. Kyte & R. F. Doolittle: "A simple method for displaying the hydropathic character of a protein." J Mol Biol, 157, 105-132
More positive values indicate higher hydrophobicity, while more negative values indicate lower hydrophobicity.
The returned variable is the 3-tuple (cmap, mapping_d, mapper):
cmap is a pylab LinearSegmentedColormap object.
mapping_d is a dictionary keyed by the one-letter amino-acid codes. The values are the colors in CSS2 format (e.g. #FF0000 for red) for that amino acid. The value for a stop codon (denoted by a * character) is black (#000000).
mapper is the actual pylab.cm.ScalarMappable object.
The optional argument maptype should specify a valid pylab color map. The optional argument reverse sets up the color map so that the most hydrophobic residue comes first (on the Kyte-Doolittle scale the most hydrophobic residue comes last, as it has the largest value). This option is True by default because it seems more intuitive to color charged residues red and hydrophobic ones blue.
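The following is a minimal sketch of how a mapping with this return shape could be built with matplotlib (the library behind the pylab color maps). It is not the phydmslib implementation: the helper name property_color_mapping is hypothetical, it assumes matplotlib 3.5 or later for the matplotlib.colormaps registry, and it implements reverse by reversing the colormap.

```python
import matplotlib
from matplotlib import cm, colors

# Kyte-Doolittle hydropathy values for the 20 amino acids (J Mol Biol, 157, 105-132).
KYTE_DOOLITTLE = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}


def property_color_mapping(prop_d, maptype='jet', reverse=True):
    """Hypothetical helper: map a numeric amino-acid property to CSS2 hex colors.

    Returns (cmap, mapping_d, mapper), the same shape as the documented
    return value of KyteDoolittleColorMapping.
    """
    cmap = matplotlib.colormaps[maptype]  # assumes matplotlib >= 3.5
    if reverse:
        # Reverse the map so the largest value (most hydrophobic on the
        # Kyte-Doolittle scale) gets the low-end color of the original map,
        # i.e. dark blue for 'jet', while the most negative values get red.
        cmap = cmap.reversed()
    norm = colors.Normalize(vmin=min(prop_d.values()), vmax=max(prop_d.values()))
    mapper = cm.ScalarMappable(norm=norm, cmap=cmap)
    # rgb2hex drops the alpha channel and returns a '#rrggbb' string.
    mapping_d = {aa: colors.rgb2hex(mapper.to_rgba(v)) for aa, v in prop_d.items()}
    mapping_d['*'] = '#000000'  # stop codons are black
    return cmap, mapping_d, mapper


cmap, mapping_d, mapper = property_color_mapping(KYTE_DOOLITTLE)
```

With the defaults, charged residues such as R land at the red end and hydrophobic residues such as I at the blue end. Since MWColorMapping is described as otherwise identical, the same helper could in principle be applied to a dictionary of molecular weights.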
phydmslib.weblogo.LogoOverlay(sites, overlayfile, overlay, nperline, sitewidth, rmargin, logoheight, barheight, barspacing, fix_limits={}, fixlongname=False, overlay_cmap=None, underlay=False, scalebar=False)
Makes an overlay for LogoPlot.
This function creates colored overlay bars showing up to two properties. The trick is to make the bars exactly the right size so that they align with the logo plot they overlay.
CALLING VARIABLES:
sites : same as the variable of this name used by LogoPlot.
overlayfile is a string giving the name of the created PDF file containing the overlay. It must end in the extension .pdf.
overlay : same as the variable of this name used by LogoPlot.
nperline : same as the variable of this name used by LogoPlot.
sitewidth is the width of each site in points.
rmargin is the right margin in points.
logoheight is the total height of each logo row in points.
barheight is the total height of each bar in points.
barspacing is the vertical spacing between bars in points.
fix_limits has the same meaning as in LogoPlot.
fixlongname has the same meaning as in LogoPlot.
overlay_cmap has the same meaning as in LogoPlot.
underlay is a bool. If True, make an underlay rather than an overlay.
scalebar : if not False, a 2-tuple (scalebarheight, scalebarlabel), where scalebarheight is in points.
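The alignment requirement comes down to simple arithmetic: if both the logo and the overlay split sites into rows of nperline sites, each sitewidth points wide, then a site's row and horizontal offset follow from its index. The sketch below is hypothetical (overlay_row_layout is not part of phydmslib); it assumes a full row is nperline * sitewidth + rmargin points wide and ignores any space the real function may reserve for labels.

```python
import math


def overlay_row_layout(sites, nperline, sitewidth, rmargin):
    """Hypothetical sketch: place each site in a row and compute its x offset in points.

    Assumes each row holds nperline sites of width sitewidth followed by a
    right margin of rmargin points.
    """
    nrows = math.ceil(len(sites) / nperline)
    row_width = nperline * sitewidth + rmargin  # assumed width of one full row
    layout = {}
    for i, site in enumerate(sites):
        row, col = divmod(i, nperline)  # row index and position within the row
        layout[site] = (row, col * sitewidth)
    return nrows, row_width, layout


sites = [str(r) for r in range(1, 101)]
nrows, row_width, layout = overlay_row_layout(sites, nperline=40, sitewidth=10, rmargin=20)
```

Here 100 sites at 40 per line need three rows, and site '41' starts the second row at offset 0. Any overlay bar for a site must use the same (row, offset) as the logo stack for that site, which is why LogoOverlay takes the same sites and nperline as LogoPlot.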
phydmslib.weblogo.LogoPlot(sites, datatype, data, plotfile, nperline, numberevery=10, allowunsorted=False, ydatamax=1.01, overlay=None, fix_limits={}, fixlongname=False, overlay_cmap=None, ylimits=None, relativestackheight=1, custom_cmap='jet', map_metric='kd', noseparator=False, underlay=False, scalebar=False)
Create a sequence logo showing amino-acid or nucleotide preferences.
For preferences, the height of each letter equals the preference of that site for that amino acid or nucleotide.
Note that stop codons may or may not be included in the logo, depending on whether they are present in data.
CALLING VARIABLES:
sites is a list of all of the sites included in the logo, given as strings, not integers. They must be in natural sort order, or an error is raised unless allowunsorted is True. The sites appear in the plot in the same order as in sites.
datatype should be one of the following strings:
‘prefs’ for preferences
‘diffprefs’ for differential preferences
‘diffsel’ for differential selection
data is a dictionary with a key for every entry in sites. For every site r in sites, data[r][x] is the value for character x. Preferences must sum to one at each site; differential preferences must sum to zero. All sites must have the same set of characters, which must be the set of nucleotides or amino acids, with or without stop codons.
plotfile is a string giving the name of the created PDF file of the logo plot. It must end in the extension .pdf.
nperline is the number of sites per line. Often 40 to 80 are good values.
numberevery specifies how frequently the sites are labeled on the x-axis.
allowunsorted : if True, the entries in sites may be unsorted, in which case the logo plot will not show the sites in sorted order.
ydatamax : meaningful only if datatype is ‘diffprefs’. In this case, it gives the maximum that the logo stacks extend in the positive and negative directions. Cannot be smaller than the maximum extent of the differential preferences.
ylimits : mandatory if datatype is ‘diffsel’, and meaningless otherwise. It is (ymin, ymax), where ymax > 0 > ymin, and gives the extent of the data in the positive and negative directions. It must encompass the actual maximum and minimum of the data.
overlay : make overlay bars that indicate other properties of the sites. If set to something other than None, it should be a list giving one to three properties. Each property is a tuple (prop_d, shortname, longname), where:
prop_d is a dictionary keyed by site numbers that are in sites. For each r in sites, prop_d[r] gives the value of the property; if prop_d has no entry for r, the property is undefined and is colored white. Properties can be either:
continuous: in this case, all of the values should be numbers.
discrete : in this case, all of the values should be strings. In practice, the plot becomes a mess if there are more than a few discrete categories (distinct strings).
shortname : short name for the property; will not format well if more than 4 or 5 characters.
longname : longer name for property used on axes label. Can be the same as shortname if you don’t need a different long name.
In the special case where both shortname and longname are the string ‘wildtype’, rather than drawing an overlay bar we write the one-character wildtype identity given in prop_d for each site.
fix_limits is only meaningful if overlay is being used. In this case, for any shortname in overlay that also keys an entry in fix_limits, we use fix_limits[shortname] to set the limits for shortname. Specifically, fix_limits[shortname] should be the 2-tuple (ticks, ticknames), where ticks is a list of tick locations (numbers) and ticknames is a list of the corresponding tick labels.
If fixlongname is True, then we use the longname in overlay exactly as written; otherwise we add a parenthesis indicating the shortname for which this longname stands.
overlay_cmap can be the name of a valid matplotlib.colors.Colormap, such as the string ‘jet’ or ‘bwr’. Otherwise, it can be None, in which case a (hopefully) good choice is made for you.
custom_cmap can be the name of a valid matplotlib.colors.Colormap, which is used to color the amino-acid one-letter codes in the logo plot by map_metric when map_metric is ‘kd’ or ‘mw’. If map_metric is ‘singlecolor’, custom_cmap should instead be a string giving the color to plot.
relativestackheight indicates how high the letter stack is relative to the default. The default is multiplied by this number, so make it > 1 for a higher letter stack.
map_metric specifies the amino-acid property metric used to map colors to amino-acid letters. Valid options are ‘kd’ (Kyte-Doolittle hydrophobicity scale, default), ‘mw’ (molecular weight), ‘functionalgroup’ (functional groups: small, nucleophilic, hydrophobic, aromatic, basic, acidic, and amide), ‘charge’ (charge at neutral pH), and ‘singlecolor’. If ‘charge’ is used, custom_cmap is no longer meaningful, since ‘charge’ uses its own blue/black/red color mapping. Similarly, ‘functionalgroup’ uses its own color mapping.
noseparator is only meaningful if datatype is ‘diffsel’ or ‘diffprefs’. If it is set to True, we do not draw a black horizontal line to separate positive and negative values.
underlay : if True, make an underlay rather than an overlay.
scalebar : show a scale bar. If False, no scale bar is shown. Otherwise, it should be a 2-tuple (scalebarlen, scalebarlabel). Currently this only works when datatype is ‘diffsel’.
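Several of these requirements are easy to violate, so a caller might check the inputs before calling LogoPlot. The sketch below is hypothetical: check_logo_inputs and natural_sort_key are not part of phydmslib, the tolerance tol is an assumed value, and this natural-sort key (splitting labels into digit and non-digit runs) is one common interpretation that may differ from the one LogoPlot applies. It also shows the shape of the overlay and fix_limits arguments for a made-up property.

```python
import re


def natural_sort_key(site):
    """Split a label such as '12a' into ['', 12, 'a'] so digit runs compare as numbers."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r'(\d+)', site)]


def check_logo_inputs(sites, datatype, data, allowunsorted=False, ylimits=None, tol=1e-3):
    """Hypothetical pre-flight check of the documented LogoPlot requirements.

    tol is an assumed tolerance for the sum checks, not a phydmslib value.
    Returns the set of characters shared by all sites.
    """
    if not all(isinstance(r, str) for r in sites):
        raise ValueError("sites must be strings, not integers")
    if not allowunsorted and list(sites) != sorted(sites, key=natural_sort_key):
        raise ValueError("sites are not in natural sort order")
    if datatype == 'diffsel':
        # ylimits is mandatory for diffsel and must satisfy ymax > 0 > ymin.
        if ylimits is None or not (ylimits[0] < 0 < ylimits[1]):
            raise ValueError("diffsel requires ylimits = (ymin, ymax) with ymax > 0 > ymin")
    chars = set(data[sites[0]])
    # Preferences sum to one per site; differential preferences sum to zero.
    target = {'prefs': 1.0, 'diffprefs': 0.0}.get(datatype)
    for r in sites:
        if set(data[r]) != chars:
            raise ValueError(f"site {r} has a different set of characters")
        if target is not None and abs(sum(data[r].values()) - target) > tol:
            raise ValueError(f"values at site {r} do not sum to {target}")
    return chars


# Example inputs for datatype='prefs' with one continuous overlay property.
sites = ['1', '2', '10']
data = {r: {'A': 0.4, 'C': 0.3, 'G': 0.2, 'T': 0.1} for r in sites}
chars = check_logo_inputs(sites, 'prefs', data)
# 'RSA' is a made-up property; site '10' has no entry, so it would be drawn white.
overlay = [({'1': 0.2, '2': 0.9}, 'RSA', 'relative solvent accessibility')]
fix_limits = {'RSA': ([0, 0.5, 1], ['0', '0.5', '1'])}
```

Note that ['1', '2', '10'] is in natural order even though a plain string sort would put '10' before '2', which is why the key splits out digit runs.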
phydmslib.weblogo.MWColorMapping(maptype='jet', reverse=True)
Maps amino-acid molecular weights to colors. Otherwise, this function is identical to KyteDoolittleColorMapping.
phydmslib.weblogo.SingleColorMapping(maptype='#999999')
Maps all amino acids to the single color given by maptype.