Formatting CodonVariantTable plots

A CodonVariantTable generates informative plots about a deep mutational scanning experiment. Here are some tips on how to format those plots.

Setup for notebook

Import Python modules / packages:

[1]:
import random
import tempfile
import warnings

from IPython.display import display, Image

import numpy

from plotnine import *

import dms_variants.codonvarianttable
import dms_variants.plotnine_themes
import dms_variants.simulate
from dms_variants.constants import CBPALETTE, CODONS_NOSTOP

Hide warnings that clutter output:

[2]:
warnings.simplefilter("ignore")

Simulate a CodonVariantTable

We simulate a CodonVariantTable to demonstrate the plot formatting. First, set the parameters that define the simulated data:

[3]:
seed = 1  # random number seed
genelength = 40  # gene length in codons
libs = ["lib_1", "lib_2"]  # distinct libraries of gene
variants_per_lib = 500 * genelength  # variants per library
avgmuts = 2.0  # average codon mutations per variant
bclen = 16  # length of nucleotide barcode for each variant
variant_error_rate = 0.01  # rate at which variant sequence mis-called
avgdepth_per_variant = 200  # average per-variant sequencing depth
lib_uniformity = 5  # uniformity of library pre-selection
noise = 0.02  # random noise in selections
bottlenecks = {  # bottlenecks from pre- to post-selection
    "tight_bottle": variants_per_lib * 5,
    "loose_bottle": variants_per_lib * 100,
}
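The parameters above imply a few derived sizes that are useful to keep in mind when reading the plots. This quick arithmetic check restates them. The expected total count is the mean of the value passed to simulateSampleCounts below, since that value is variants_per_lib times a single Poisson draw with mean avgdepth_per_variant:

```python
genelength = 40
variants_per_lib = 500 * genelength  # variants per library
avgdepth_per_variant = 200
bottlenecks = {
    "tight_bottle": variants_per_lib * 5,
    "loose_bottle": variants_per_lib * 100,
}

# Mean sequencing depth per sample (the Poisson draw has mean avgdepth_per_variant)
expected_total_count = variants_per_lib * avgdepth_per_variant

# Bottleneck size relative to library size
bottleneck_per_variant = {
    name: size / variants_per_lib for name, size in bottlenecks.items()
}

print(variants_per_lib)        # 20000
print(expected_total_count)    # 4000000
print(bottlenecks)             # {'tight_bottle': 100000, 'loose_bottle': 2000000}
print(bottleneck_per_variant)  # {'tight_bottle': 5.0, 'loose_bottle': 100.0}
```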

Seed the random number generators for reproducible output:

[4]:
random.seed(seed)
numpy.random.seed(seed)  # numpy.random.poisson below draws from NumPy's own global generator
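Python's random module and NumPy each keep their own global generator, so random.seed alone does not make draws such as numpy.random.poisson (used below to set each sample's total count) reproducible. A minimal sketch of seeding both, with NumPy's Generator API (numpy.random.default_rng) shown as an alternative that avoids global state; the helper name seeded_draws is hypothetical:

```python
import random

import numpy


def seeded_draws(seed, n=3, lam=200):
    """Seed both global generators, then draw once from Python and n times from NumPy."""
    random.seed(seed)        # controls random.choices, random.random, ...
    numpy.random.seed(seed)  # controls numpy.random.poisson, ...
    python_draw = random.random()
    numpy_draws = [int(x) for x in numpy.random.poisson(lam, size=n)]
    return python_draw, numpy_draws


# Re-seeding reproduces both streams exactly
assert seeded_draws(1) == seeded_draws(1)

# Alternative: a dedicated Generator object with its own state,
# unaffected by numpy.random.seed or by other code drawing random numbers
rng = numpy.random.default_rng(1)
depths = rng.poisson(200, size=3)
```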

Simulate a wildtype gene sequence:

[5]:
geneseq = "".join(random.choices(CODONS_NOSTOP, k=genelength))
print(f"Wildtype gene of {genelength} codons:\n{geneseq}")
Wildtype gene of 40 codons:
AGATCCGTGATTCTGCGTGCTTACACCAACTCACGGGTGAAACGTGTAATCTTATGCAACAACGACTTACCTATCCGCAACATCCGGCTGATGATGATCCTACACAACTCCGACGCTAGT
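The wildtype sequence is just genelength codons drawn uniformly, with replacement, from the non-stop codons. A stdlib-only sketch of the same idea, assuming CODONS_NOSTOP holds the 61 sense codons of the standard genetic code. The codon ordering here may differ from the package's, so the drawn sequence is not guaranteed to match the one printed above:

```python
import itertools
import random

# Assumption: CODONS_NOSTOP = all 64 codons minus the stop codons TAA, TAG, TGA
STOP_CODONS = {"TAA", "TAG", "TGA"}
all_codons = ["".join(nts) for nts in itertools.product("ACGT", repeat=3)]
codons_nostop = [c for c in all_codons if c not in STOP_CODONS]

random.seed(1)
genelength = 40
geneseq = "".join(random.choices(codons_nostop, k=genelength))

# Split the sequence back into codons to check it
codons = [geneseq[i : i + 3] for i in range(0, len(geneseq), 3)]
print(len(codons_nostop), len(geneseq), len(codons))  # 61 120 40
```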

Generate a CodonVariantTable using the simulate_CodonVariantTable function:

[6]:
variants = dms_variants.simulate.simulate_CodonVariantTable(
    geneseq=geneseq,
    bclen=bclen,
    library_specs={
        lib: {"avgmuts": avgmuts, "nvariants": variants_per_lib} for lib in libs
    },
    seed=seed,
)

Next, simulate counts for the samples. To do this, we need a “phenotype” function that gives how strongly each variant is enriched by selection. We define this function using a SigmoidPhenotypeSimulator:

[7]:
phenosimulator = dms_variants.simulate.SigmoidPhenotypeSimulator(geneseq, seed=seed)
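For intuition about what a sigmoid phenotype function does, here is a generic sketch of a global-epistasis-style model: mutation effects add on a latent scale, and a sigmoid maps the latent value to an observed phenotype, reported here as enrichment relative to wildtype. The wildtype latent value and the effect sizes are made-up numbers, and this is not necessarily the exact formula SigmoidPhenotypeSimulator uses:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def observed_enrichment(mut_effects, wt_latent=2.0):
    """Enrichment of a variant relative to wildtype under a sigmoid model.

    mut_effects: hypothetical latent-scale effects of the variant's mutations.
    wt_latent: hypothetical latent phenotype of the wildtype.
    """
    latent = wt_latent + sum(mut_effects)  # effects are additive on the latent scale
    return sigmoid(latent) / sigmoid(wt_latent)


print(observed_enrichment([]))            # wildtype: 1.0
print(observed_enrichment([-1.0]))        # one deleterious mutation
print(observed_enrichment([-1.0, -1.0]))  # two: not the square of the single-mutant value
```

Because the sigmoid is nonlinear, the combined effect of two mutations on the observed scale is not simply the product of their individual effects, even though they add on the latent scale.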

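We then pass the simulator's observedEnrichment method as the phenotype function to simulate counts for a pre-selection sample and two post-selection samples (one per bottleneck):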

[8]:
counts = dms_variants.simulate.simulateSampleCounts(
    variants=variants,
    phenotype_func=phenosimulator.observedEnrichment,
    variant_error_rate=variant_error_rate,
    pre_sample={
        "total_count": variants_per_lib * numpy.random.poisson(avgdepth_per_variant),
        "uniformity": lib_uniformity,
    },
    pre_sample_name="pre-selection",
    post_samples={
        name: {
            "noise": noise,
            "total_count": variants_per_lib
            * numpy.random.poisson(avgdepth_per_variant),
            "bottleneck": bottle,
        }
        for name, bottle in bottlenecks.items()
    },
    seed=seed,
)

Add these counts to the CodonVariantTable:

[9]:
variants.add_sample_counts_df(counts)
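Roughly speaking, the bottleneck values limit how many variant copies make it from the pre-selection library into each post-selection sample. A minimal stdlib sketch of why a tighter bottleneck adds noise, assuming a bottleneck can be modeled as drawing a fixed number of copies with replacement in proportion to pre-selection frequency (an illustration only, not the code inside simulateSampleCounts; the helper names are hypothetical):

```python
import collections
import random
import statistics


def apply_bottleneck(pre_freqs, bottleneck, seed=1):
    """Draw `bottleneck` variant copies, with replacement, in proportion to
    pre-selection frequency. Returns a Counter of copies per variant."""
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    names = list(pre_freqs)
    weights = [pre_freqs[name] for name in names]
    return collections.Counter(rng.choices(names, weights=weights, k=bottleneck))


def relative_spread(counts, names):
    """Coefficient of variation of per-variant counts (a Counter returns 0 if absent)."""
    values = [counts[name] for name in names]
    return statistics.pstdev(values) / statistics.mean(values)


# 1,000 hypothetical variants, all equally frequent before selection
pre_freqs = {f"variant_{i}": 1.0 for i in range(1000)}
tight = apply_bottleneck(pre_freqs, bottleneck=5 * len(pre_freqs))    # ~5 copies each
loose = apply_bottleneck(pre_freqs, bottleneck=100 * len(pre_freqs))  # ~100 copies each

for label, counts in [("tight", tight), ("loose", loose)]:
    lost = sum(1 for name in pre_freqs if counts[name] == 0)
    print(label, "relative spread:", round(relative_spread(counts, pre_freqs), 2),
          "variants lost:", lost)
```

With only about five copies per variant, sampling alone makes equally fit variants differ widely in count (and a few vanish), which is the kind of noise the tight_bottle sample is meant to mimic.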

Now we’ve completed the simulation of the CodonVariantTable.

Formatting plots

The plotting methods of a CodonVariantTable all return plotnine ggplot objects, so you can change their appearance by setting a plotnine theme or by adding theme elements to an individual plot.

First make a plot using the default plotnine theme:

[10]:
# NBVAL_IGNORE_OUTPUT

p = variants.plotNumCodonMutsByType("all", samples=None)
_ = p.draw(show=True)

The dms_variants package defines a gray-grid plotnine theme, theme_graygrid, in dms_variants.plotnine_themes that works especially well for these plots. Here we set that theme as the default with theme_set and then re-draw the above plot:

[11]:
_ = theme_set(dms_variants.plotnine_themes.theme_graygrid())
[12]:
# NBVAL_IGNORE_OUTPUT

p = variants.plotNumCodonMutsByType("all", samples=None)
_ = p.draw(show=True)

The plot looks even cleaner if we get rid of the vertical grid lines:

[13]:
# NBVAL_IGNORE_OUTPUT

p = p + theme(panel_grid_major_x=element_blank())  # no vertical grid lines
_ = p.draw(show=True)

The plotnine package also defines many other themes, such as theme_bw:

[14]:
# NBVAL_IGNORE_OUTPUT

theme_set(theme_bw())
p = variants.plotNumCodonMutsByType("all", samples=None)
_ = p.draw(show=True)

Or something sillier, such as theme_xkcd:

[15]:
# NBVAL_IGNORE_OUTPUT

theme_set(theme_xkcd())
p = variants.plotNumCodonMutsByType(
    "all", samples=None, heightscale=1.2, widthscale=1.2
)
_ = p.draw(show=True)

Note how the above call also used the heightscale and widthscale options (which exist for all plotting methods of a CodonVariantTable) to make the plot larger.

You can also change the orientation of the facets with the orientation parameter and rename samples with sample_rename:

[16]:
# NBVAL_IGNORE_OUTPUT

theme_set(dms_variants.plotnine_themes.theme_graygrid())  # restore gray-grid theme

p = variants.plotNumCodonMutsByType(
    "all",
    samples="all",
    orientation="v",
    heightscale=1.2,
    sample_rename={"loose_bottle": "loose bottle", "tight_bottle": "narrow bottle"},
)
p = p + theme(panel_grid_major_x=element_blank())  # no vertical grid lines
_ = p.draw(show=True)

You can also show only some of the facets. For instance, pass the list of individual libraries (variants.libraries) to show them without the merge of all libraries:

[17]:
# NBVAL_IGNORE_OUTPUT

p = variants.plotNumCodonMutsByType(
    "all",
    samples="all",
    libraries=variants.libraries,
    sample_rename={"loose_bottle": "loose bottle", "tight_bottle": "narrow bottle"},
)
p = p + theme(panel_grid_major_x=element_blank())  # no vertical grid lines
_ = p.draw(show=True)

Note that if you specify just one library, by default the library name is not shown in the facet titles:

[18]:
# NBVAL_IGNORE_OUTPUT

p = variants.plotNumCodonMutsByType(
    "all",
    samples="all",
    libraries=["lib_1"],
    sample_rename={"loose_bottle": "loose bottle", "tight_bottle": "narrow bottle"},
)
p = p + theme(panel_grid_major_x=element_blank())  # no vertical grid lines
_ = p.draw(show=True)

You can change this behavior by setting the one_lib_facet parameter to True:

[19]:
# NBVAL_IGNORE_OUTPUT

p = variants.plotNumCodonMutsByType(
    "all",
    samples="all",
    libraries=["lib_1"],
    sample_rename={"loose_bottle": "loose bottle", "tight_bottle": "narrow bottle"},
    one_lib_facet=True,
)
p = p + theme(panel_grid_major_x=element_blank())  # no vertical grid lines
_ = p.draw(show=True)

Or show only the merge of all libraries with libraries="all_only":

[20]:
# NBVAL_IGNORE_OUTPUT

p = variants.plotNumCodonMutsByType(
    "all",
    samples="all",
    libraries="all_only",
    sample_rename={"loose_bottle": "loose bottle", "tight_bottle": "narrow bottle"},
    orientation="v",
)
p = p + theme(panel_grid_major_x=element_blank())  # no vertical grid lines
_ = p.draw(show=True)
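The libraries and one_lib_facet options used in the last few cells can be summarized as two small rules. The sketch below is a hypothetical restatement of the behavior shown above, not dms_variants code; in particular, the label used for the merged library ("all libraries" here) is an assumption:

```python
ALL_LIBS_LABEL = "all libraries"  # hypothetical label for the merge of all libraries


def resolve_libraries(libraries, available):
    """Hypothetical helper: turn a `libraries` argument into facet labels.

    libraries: "all" (each library plus their merge), "all_only" (just the
        merge), or an explicit list of library names.
    available: the library names in the table, e.g. variants.libraries.
    """
    if libraries == "all":
        return list(available) + [ALL_LIBS_LABEL]
    if libraries == "all_only":
        return [ALL_LIBS_LABEL]
    if isinstance(libraries, str):
        raise ValueError(f"invalid libraries value: {libraries!r}")
    unknown = set(libraries) - set(available)
    if unknown:
        raise ValueError(f"unknown libraries: {sorted(unknown)}")
    return list(libraries)


def show_library_in_title(facet_libraries, one_lib_facet=False):
    """Library names appear in facet titles unless there is only one
    library and one_lib_facet is False."""
    return len(facet_libraries) > 1 or one_lib_facet


libs = ["lib_1", "lib_2"]
print(resolve_libraries("all", libs))       # ['lib_1', 'lib_2', 'all libraries']
print(resolve_libraries(["lib_1"], libs))   # ['lib_1']
print(show_library_in_title(["lib_1"]))     # False
```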

You can also show only some of the samples, here using the plotNumMutsHistogram method:

[21]:
# NBVAL_IGNORE_OUTPUT

p = variants.plotNumMutsHistogram(
    mut_type="codon",
    samples=["pre-selection", "tight_bottle"],
    orientation="v",
    sample_rename={"loose_bottle": "loose bottle", "tight_bottle": "narrow bottle"},
)
p = p + theme(panel_grid_major_x=element_blank())  # no vertical grid lines
_ = p.draw(show=True)
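sample_rename only changes the names shown on the plot. A minimal sketch of the renaming rule as it appears to work in the example above, assuming samples absent from the mapping (such as pre-selection here) keep their original names and unused keys (such as loose_bottle here) are simply ignored. The helper name rename_samples is hypothetical:

```python
def rename_samples(samples, sample_rename=None):
    """Return display names for `samples` (hypothetical helper).

    Names found in sample_rename are replaced; all others pass through unchanged.
    """
    sample_rename = sample_rename or {}
    return [sample_rename.get(s, s) for s in samples]


display_names = rename_samples(
    ["pre-selection", "tight_bottle"],
    {"loose_bottle": "loose bottle", "tight_bottle": "narrow bottle"},
)
print(display_names)  # ['pre-selection', 'narrow bottle']
```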

You can also save the plots to image files using their save method. Here we show how to do this, saving the plot as a PNG to a temporary file and then displaying that PNG:

[22]:
with tempfile.NamedTemporaryFile(suffix=".png") as f:
    p.save(f.name)
    display(Image(f.name))
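One caveat: on Windows, a NamedTemporaryFile that is still open generally cannot be reopened by name, so p.save(f.name) may fail there. A more portable sketch writes into a temporary directory and returns the PNG bytes, which IPython's Image accepts through its data argument, e.g. display(Image(data=save_plot_to_png_bytes(p))). The helper name is hypothetical, and it works with any object that has a save(path) method, such as a plotnine ggplot:

```python
import os
import tempfile


def save_plot_to_png_bytes(plot, filename="plot.png"):
    """Save `plot` inside a temporary directory and return the file's bytes.

    `plot` is any object with a save(path) method, e.g. a plotnine ggplot,
    which picks the image format from the file extension. The directory and
    file are deleted when the function returns.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, filename)
        plot.save(path)
        with open(path, "rb") as fh:
            return fh.read()
```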