Compute probabilities of escape

In some experiments that involve antibody selections, it is possible to spike in a “neutralization standard”, which is a set of variants known not to be affected by the antibody. In such cases, it is then possible to compute the probability of escape of each variant, which is just its change in frequency relative to the standard. For instance, if such experiments are done at enough concentrations, it is even possible to reconstruct a conventional neutralization curve.

This notebook illustrates how to use dms_variants to compute these probabilities of escape:

First, import Python modules:

[1]:
import io
import textwrap

import altair as alt

import dms_variants.codonvarianttable
import dms_variants.utils

import numpy

import pandas as pd

Compute escape probabilities

Read in the CodonVariantTable. These data correspond to snippets of the variant counts from a real experiment on the SARS-CoV-2 spike:

[2]:
with open("spike.txt") as f:
    spike_seq = f.read().strip()

variants = dms_variants.codonvarianttable.CodonVariantTable.from_variant_count_df(
    variant_count_df_file="prob_escape_codon_variant_table.csv",
    primary_target="spike",
    geneseq=spike_seq,
    allowgaps=True,
)

Set up a data frame giving the antibody / no-antibody sample pairings for each selection:

[3]:
selections_df = pd.read_csv(
    io.StringIO(
        textwrap.dedent(
            """\
    library,antibody_sample,no-antibody_sample,antibody_concentration
    lib1,thaw-1_REGN10933_0.037_1,thaw-1_no-antibody_control_1,0.037
    lib1,thaw-1_REGN10933_0.15_1,thaw-1_no-antibody_control_1,0.15
    lib1,thaw-1_REGN10933_0.15_2,thaw-1_no-antibody_control_2,0.15
    lib1,thaw-1_REGN10933_0.59_1,thaw-1_no-antibody_control_1,0.59
    lib1,thaw-1_REGN10933_0.59_2,thaw-1_no-antibody_control_2,0.59
    lib1,thaw-2_279C_0.00088_1,thaw-2_no-antibody_control_1,0.00088
    lib1,thaw-2_279C_0.00088_2,thaw-2_no-antibody_control_2,0.00088
    lib1,thaw-2_279C_0.0035_1,thaw-2_no-antibody_control_1,0.0035
    lib1,thaw-2_279C_0.0035_2,thaw-2_no-antibody_control_2,0.0035
    lib1,thaw-2_279C_0.014_1,thaw-2_no-antibody_control_1,0.014
    lib1,thaw-2_279C_0.014_2,thaw-2_no-antibody_control_2,0.014
    lib2,thaw-1_REGN10933_0.037_1,thaw-1_no-antibody_control_1,0.037
    lib2,thaw-1_REGN10933_0.037_2,thaw-1_no-antibody_control_2,0.037
    lib2,thaw-1_REGN10933_0.15_1,thaw-1_no-antibody_control_1,0.15
    lib2,thaw-1_REGN10933_0.15_2,thaw-1_no-antibody_control_2,0.15
    lib2,thaw-1_REGN10933_0.59_1,thaw-1_no-antibody_control_1,0.59
    lib2,thaw-1_REGN10933_0.59_2,thaw-1_no-antibody_control_2,0.59
    """
        )
    )
)

Now run CodonVariantTable.prob_escape using the above data frame to define which samples to pair in the comparisons:

[4]:
prob_escape, neut_standard_fracs, neutralization = variants.prob_escape(
    selections_df=selections_df,
)

Fraction of reads that are neutralization standard

First look at the returned neut_standard_fracs data frame, which gives the fraction of all reads corresponding to the neutralization standard for all samples:

[5]:
neut_standard_fracs.round(4)
[5]:
library antibody_sample no-antibody_sample antibody_concentration antibody_count antibody_frac no-antibody_count no-antibody_frac
0 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 0.0370 11107 0.0210 16475 0.0098
1 lib1 thaw-1_REGN10933_0.15_1 thaw-1_no-antibody_control_1 0.1500 18045 0.0220 16475 0.0098
2 lib1 thaw-1_REGN10933_0.15_2 thaw-1_no-antibody_control_2 0.1500 14283 0.0371 11343 0.0103
3 lib1 thaw-1_REGN10933_0.59_1 thaw-1_no-antibody_control_1 0.5900 26027 0.0636 16475 0.0098
4 lib1 thaw-1_REGN10933_0.59_2 thaw-1_no-antibody_control_2 0.5900 23240 0.0510 11343 0.0103
5 lib1 thaw-2_279C_0.00088_1 thaw-2_no-antibody_control_1 0.0009 8231 0.0157 11134 0.0157
6 lib1 thaw-2_279C_0.00088_2 thaw-2_no-antibody_control_2 0.0009 8763 0.0177 9002 0.0147
7 lib1 thaw-2_279C_0.0035_1 thaw-2_no-antibody_control_1 0.0035 9298 0.0200 11134 0.0157
8 lib1 thaw-2_279C_0.0035_2 thaw-2_no-antibody_control_2 0.0035 9394 0.0231 9002 0.0147
9 lib1 thaw-2_279C_0.014_1 thaw-2_no-antibody_control_1 0.0140 67940 0.1408 11134 0.0157
10 lib1 thaw-2_279C_0.014_2 thaw-2_no-antibody_control_2 0.0140 60115 0.1545 9002 0.0147
11 lib2 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 0.0370 14003 0.0152 11144 0.0103
12 lib2 thaw-1_REGN10933_0.037_2 thaw-1_no-antibody_control_2 0.0370 12780 0.0233 10877 0.0088
13 lib2 thaw-1_REGN10933_0.15_1 thaw-1_no-antibody_control_1 0.1500 22879 0.0271 11144 0.0103
14 lib2 thaw-1_REGN10933_0.15_2 thaw-1_no-antibody_control_2 0.1500 20040 0.0263 10877 0.0088
15 lib2 thaw-1_REGN10933_0.59_1 thaw-1_no-antibody_control_1 0.5900 57699 0.0760 11144 0.0103
16 lib2 thaw-1_REGN10933_0.59_2 thaw-1_no-antibody_control_2 0.5900 52409 0.0671 10877 0.0088

Now plot the fraction of all reads corresponding to the neutralization standard for all samples, making an interactive altair plot where you can mouse over points for details and select which libraries to show with the drop down at the bottom. Note the plot uses a symlog scale:

[6]:
# make tidy version of neut_standard_fracs
melt_cols = ["antibody_frac", "no-antibody_frac"]
neut_standard_fracs_tidy = neut_standard_fracs.melt(
    id_vars=[c for c in neut_standard_fracs.columns if c not in melt_cols],
    value_vars=melt_cols,
    value_name="neut_standard_frac",
    var_name="sample_type",
).assign(
    sample_type=lambda x: x["sample_type"].str.replace("_frac", ""),
    library_sample=lambda x: x["library"] + " " + x["antibody_sample"],
)
[7]:
# NBVAL_IGNORE_OUTPUT

# set up selections over other columns of interest, in this case just library:
selections = [
    alt.selection_point(
        fields=[col],
        bind=alt.binding_select(
            options=[None] + neut_standard_fracs_tidy[col].unique().tolist(),
            labels=["all"] + [str(x) for x in neut_standard_fracs_tidy[col].unique()],
            name=col,
        ),
    )
    for col in ["library"]
]

neut_standard_fracs_chart = (
    alt.Chart(neut_standard_fracs_tidy)
    .encode(
        x=alt.X(
            "neut_standard_frac",
            title="neutralization standard fraction",
            scale=alt.Scale(type="symlog", constant=0.02, domainMax=1),
        ),
        y=alt.Y("library_sample", title=None),
        color="sample_type",
        shape="sample_type",
        tooltip=[
            alt.Tooltip(c, format=".2g") if c == "neut_standard_frac" else c
            for c in neut_standard_fracs_tidy.columns
        ],
    )
    .mark_point()
    .properties(width=275, height=alt.Step(14))
    .add_params(*selections)
    .configure_axis(labelLimit=500)
)
for selection in selections:
    neut_standard_fracs_chart = neut_standard_fracs_chart.transform_filter(selection)

neut_standard_fracs_chart
[7]:

Neutralization averaged over all variants with given number of mutations

Now look at neutralization, which gives the extent of neutralization averages over all variants with each number of amino-acid mutations:

[8]:
neutralization.round(3)
[8]:
library antibody_sample no-antibody_sample n_aa_substitutions antibody_neut_standard_count no-antibody_neut_standard_count antibody_count no-antibody_count prob_escape_uncensored prob_escape
0 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 0 11107 16475 42343 140698 0.446 0.446
1 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 1 11107 16475 102466 340357 0.447 0.447
2 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 2 11107 16475 124509 404439 0.457 0.457
3 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 3 11107 16475 106742 342394 0.462 0.462
4 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 4 11107 16475 141667 442459 0.475 0.475
... ... ... ... ... ... ... ... ... ... ...
80 lib2 thaw-1_REGN10933_0.59_2 thaw-1_no-antibody_control_2 0 52409 10877 70691 174569 0.084 0.084
81 lib2 thaw-1_REGN10933_0.59_2 thaw-1_no-antibody_control_2 1 52409 10877 179056 361718 0.103 0.103
82 lib2 thaw-1_REGN10933_0.59_2 thaw-1_no-antibody_control_2 2 52409 10877 201992 331201 0.127 0.127
83 lib2 thaw-1_REGN10933_0.59_2 thaw-1_no-antibody_control_2 3 52409 10877 141530 219133 0.134 0.134
84 lib2 thaw-1_REGN10933_0.59_2 thaw-1_no-antibody_control_2 4 52409 10877 135208 136249 0.206 0.206

85 rows × 10 columns

Now plot neutralization. First, get the neutralization data ready to plot by melting, and doing a few other transformations:

[9]:
cols_to_melt = ["prob_escape", "prob_escape_uncensored"]
neutralization_to_plot = (
    neutralization.melt(
        id_vars=[c for c in neutralization.columns if c not in cols_to_melt],
        var_name="censored",
        value_name="fraction not neutralized",
    )
    .assign(
        library_sample=lambda x: x["library"] + " " + x["antibody_sample"],
        censored=lambda x: x["censored"] == "prob_escape",
        n_aa_substitutions=lambda x: (
            x["n_aa_substitutions"].map(
                lambda n: f">{n - 1}" if n == x["n_aa_substitutions"].max() else str(n)
            )
        ),
    )
    .rename(
        columns={
            c: c.replace("_", " ")
            for c in neutralization.columns
            if c.endswith("count")
        }
    )
    .merge(selections_df, validate="many_to_one")
)

Now make the plot. You can mouseover points for details, click on the legend to get values for a specific number of amino-acid substitutions, and use the drop-down to show censored (to [0, 1]) values or raw values, and subset which data to show:

[10]:
# NBVAL_SKIP

censored_selection = alt.selection_point(
    fields=["censored"],
    value=[{"censored": True}],
    bind=alt.binding_select(
        options=[True, False],
        name="censored to be between 0 and 1",
    ),
)

n_aa_substitutions_selection = alt.selection_point(
    fields=["n_aa_substitutions"],
    bind="legend",
)

neutralization_chart = (
    alt.Chart(neutralization_to_plot)
    .encode(
        x=alt.X("fraction not neutralized"),
        y=alt.Y("library_sample", title=None),
        color="n_aa_substitutions:N",
        opacity=alt.condition(
            n_aa_substitutions_selection, alt.value(0.7), alt.value(0)
        ),
        tooltip=[
            (
                alt.Tooltip(c, format=".3g")
                if c.endswith("_count")
                or c.startswith("fraction")
                or c == "antibody_concentration"
                else c
            )
            for c in neutralization_to_plot.columns
            if c not in ["library_sample"]
        ],
    )
    .mark_point(filled=True, size=50)
    .properties(width=275, height=alt.Step(14))
    .add_params(censored_selection, n_aa_substitutions_selection, *selections)
    .transform_filter(censored_selection)
    .configure_axis(labelLimit=500)
)
for selection in selections:
    neutralization_chart = neutralization_chart.transform_filter(selection)

neutralization_chart
[10]:

Variant-level probabilities of escape

Now we examine the variant-level probabilities of escape.

[11]:
prob_escape.round(3).head()
[11]:
library antibody_sample no-antibody_sample codon_substitutions n_codon_substitutions aa_substitutions n_aa_substitutions barcode prob_escape prob_escape_uncensored antibody_count no-antibody_count antibody_neut_standard_count no-antibody_neut_standard_count
0 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 AAG180AGG CAT653CAG 2 K180R H653Q 2 AACTCAAAATCCTTAG 0.510 0.510 1794 5218 11107 16475
1 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 CAC146CAG TCC151--- GCT417GCA AGC475ATC ACC111... 6 H146Q S151- S475I I1208T 4 AAGGTCCCGGAGTAAC 0.628 0.628 1733 4094 11107 16475
2 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 AGA450AAG TTG580AAC AGC744AGT CAG955AAG 4 R450K L580N Q955K 3 AAAGCTGGAGTACGTA 0.593 0.593 1428 3574 11107 16475
3 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 AGC98CAC GTG306ATC AGC475ATC GAT1082AAT CTG119... 5 S98H V306I S475I D1082N L1191M 5 AACCCAGCACCGTTGA 0.669 0.669 1350 2994 11107 16475
4 lib1 thaw-1_REGN10933_0.037_1 thaw-1_no-antibody_control_1 0 0 AATCACAACCTTTACC 0.478 0.478 1318 4087 11107 16475

First, we filter on pre-selection counts (counts in the no-antibody condition) and pre-selection fraction (fraction in no-antibody condition among all variants). This enables us to calculate a minimum threshold for each sample, which is the larger of this counts or fraction threshold for that sample:

[12]:
min_no_antibody_counts = 20  # pre-selection counts must be >= this
min_no_antibody_frac = 0.00001  # pre-selection counts must >= this fraction of total

min_no_antibody_count_threshold_df = (
    prob_escape.groupby(["library", "no-antibody_sample"], as_index=False)
    .aggregate(total_no_antibody_count=pd.NamedAgg("no-antibody_count", "sum"))
    .assign(
        threshold=lambda x: (x["total_no_antibody_count"] * min_no_antibody_frac)
        .clip(lower=min_no_antibody_counts)
        .round()
        .astype(int),
    )
)

min_no_antibody_count_threshold_df
[12]:
library no-antibody_sample total_no_antibody_count threshold
0 lib1 thaw-1_no-antibody_control_1 5011041 50
1 lib1 thaw-1_no-antibody_control_2 2186016 22
2 lib1 thaw-2_no-antibody_control_1 2096481 21
3 lib1 thaw-2_no-antibody_control_2 1812264 20
4 lib2 thaw-1_no-antibody_control_1 3199962 32
5 lib2 thaw-1_no-antibody_control_2 3668610 37

Now draw a box plot on the pre-selection (no-antibody) counts with black line at median, boxes spanning 25th to 75th percentile, whiskers spanning minimum to maximum, and a red line indicating the threshold. You can mouseover bars for details:

[13]:
no_antibody_count_boxplot_df = (
    prob_escape.groupby(
        ["library", "antibody_sample", "no-antibody_sample"], as_index=False
    )
    .aggregate(
        median=pd.NamedAgg("no-antibody_count", "median"),
        percentile_25=pd.NamedAgg("no-antibody_count", lambda s: s.quantile(0.25)),
        percentile_75=pd.NamedAgg("no-antibody_count", lambda s: s.quantile(0.75)),
        min=pd.NamedAgg("no-antibody_count", "min"),
        max=pd.NamedAgg("no-antibody_count", "max"),
    )
    .merge(
        min_no_antibody_count_threshold_df.drop(columns="total_no_antibody_count"),
        validate="many_to_one",
    )
    .assign(library_sample=lambda x: x["library"] + " " + x["antibody_sample"])
)
assert (
    len(no_antibody_count_boxplot_df)
    == no_antibody_count_boxplot_df["library_sample"].nunique()
)
[14]:
# NBVAL_SKIP

no_antibody_count_base = alt.Chart(no_antibody_count_boxplot_df).encode(
    y=alt.Y("library_sample", title=None),
    tooltip=[
        c if (c.endswith("sample") or c == "library") else alt.Tooltip(c, format=".3g")
        for c in no_antibody_count_boxplot_df.columns
        if c != "library_sample"
    ],
)

no_antibody_count_quartile_bars = no_antibody_count_base.encode(
    alt.X(
        "percentile_25",
        scale=alt.Scale(type="symlog", constant=20),
        title="counts for variant",
    ),
    alt.X2("percentile_75"),
).mark_bar(color="blue")

no_antibody_count_range_lines = no_antibody_count_base.encode(
    alt.X("min"),
    alt.X2("max"),
).mark_rule(color="blue", opacity=0.5)

no_antibody_count_median_lines = no_antibody_count_base.encode(
    alt.X("median"), alt.X2("median")
).mark_bar(xOffset=1, x2Offset=-1, color="black")

no_antibody_count_threshold = no_antibody_count_base.encode(
    alt.X("threshold"), alt.X2("threshold")
).mark_bar(xOffset=1, x2Offset=-1, color="red")

no_antibody_count_chart = (
    no_antibody_count_quartile_bars
    + no_antibody_count_range_lines
    + no_antibody_count_median_lines
    + no_antibody_count_threshold
).properties(width=350, height=alt.Step(14))

for s in selections:
    no_antibody_count_chart = no_antibody_count_chart.add_params(s).transform_filter(s)

no_antibody_count_chart
[14]:

Now get just prob escape measurements for variants that exceed pre-selection (no-antibody) count threshold:

[15]:
prob_escape_filtered = prob_escape.merge(
    min_no_antibody_count_threshold_df, validate="many_to_one"
).query("`no-antibody_count` >= threshold")

Plot distribution of prob escape values across non-filtered variants. The boxes span 25th to 75th percentile, the black vertical line is at the median, and the thin lines extend from the min to max:

[16]:
prob_escape_boxplot_df = (
    prob_escape_filtered.melt(
        id_vars=["library", "antibody_sample", "no-antibody_sample"],
        value_vars=["prob_escape", "prob_escape_uncensored"],
        var_name="censored",
        value_name="probability escape",
    )
    .groupby(
        ["library", "antibody_sample", "no-antibody_sample", "censored"],
        as_index=False,
    )
    .aggregate(
        median=pd.NamedAgg("probability escape", "median"),
        percentile_25=pd.NamedAgg("probability escape", lambda s: s.quantile(0.25)),
        percentile_75=pd.NamedAgg("probability escape", lambda s: s.quantile(0.75)),
        min=pd.NamedAgg("probability escape", "min"),
        max=pd.NamedAgg("probability escape", "max"),
    )
    .assign(
        library_sample=lambda x: x["library"] + " " + x["antibody_sample"],
        censored=lambda x: x["censored"] == "prob_escape",
    )
)
assert (
    len(prob_escape_boxplot_df)
    == 2 * prob_escape_boxplot_df["library_sample"].nunique()
)
[17]:
# NBVAL_SKIP

prob_escape_base = alt.Chart(prob_escape_boxplot_df).encode(
    y=alt.Y("library_sample", title=None),
    tooltip=[
        c if (c.endswith("sample") or c == "library") else alt.Tooltip(c, format=".3g")
        for c in prob_escape_boxplot_df.columns
        if c != "library_sample"
    ],
)

prob_escape_quartile_bars = prob_escape_base.encode(
    alt.X(
        "percentile_25",
        scale=alt.Scale(type="symlog", constant=20),
        title="probability escape",
    ),
    alt.X2("percentile_75"),
).mark_bar(color="blue")

prob_escape_range_lines = prob_escape_base.encode(
    alt.X("min"),
    alt.X2("max"),
).mark_rule(color="blue", opacity=0.5)

prob_escape_median_lines = prob_escape_base.encode(
    alt.X("median"), alt.X2("median")
).mark_bar(xOffset=1, x2Offset=-1, color="black")

prob_escape_chart = (
    prob_escape_quartile_bars + prob_escape_range_lines + prob_escape_median_lines
).properties(width=350, height=alt.Step(14))

for s in selections + [censored_selection]:
    prob_escape_chart = prob_escape_chart.add_params(s).transform_filter(s)

prob_escape_chart
[17]:

Correlations in variant-level escape probabilitites

Analyze correlations between escape probabilities of different variants.

[18]:
assert len(
    prob_escape_filtered.groupby(["library", "antibody_sample", "no-antibody_sample"])
) == len(prob_escape_filtered.groupby(["library", "antibody_sample"]))

corrs = (
    dms_variants.utils.tidy_to_corr(
        df=prob_escape_filtered,
        sample_col="antibody_sample",
        label_col="barcode",
        value_col="prob_escape",
        group_cols=["library"],
    )
    .assign(r2=lambda x: x["correlation"] ** 2)
    .drop(columns="correlation")
)

# add other properties
suffixes = ["_1", "_2"]
for suffix in suffixes:
    corrs = corrs.merge(
        selections_df,
        left_on=["library", f"antibody_sample{suffix}"],
        right_on=["library", "antibody_sample"],
        validate="many_to_one",
        suffixes=suffixes,
    ).drop(columns="antibody_sample")

corrs.round(3).head()
[18]:
library antibody_sample_1 antibody_sample_2 r2 no-antibody_sample_1 antibody_concentration_1 no-antibody_sample_2 antibody_concentration_2
0 lib1 thaw-1_REGN10933_0.037_1 thaw-1_REGN10933_0.037_1 1.000 thaw-1_no-antibody_control_1 0.037 thaw-1_no-antibody_control_1 0.037
1 lib1 thaw-1_REGN10933_0.15_1 thaw-1_REGN10933_0.037_1 0.100 thaw-1_no-antibody_control_1 0.150 thaw-1_no-antibody_control_1 0.037
2 lib1 thaw-1_REGN10933_0.15_2 thaw-1_REGN10933_0.037_1 0.001 thaw-1_no-antibody_control_2 0.150 thaw-1_no-antibody_control_1 0.037
3 lib1 thaw-1_REGN10933_0.59_1 thaw-1_REGN10933_0.037_1 0.020 thaw-1_no-antibody_control_1 0.590 thaw-1_no-antibody_control_1 0.037
4 lib1 thaw-1_REGN10933_0.59_2 thaw-1_REGN10933_0.037_1 0.000 thaw-1_no-antibody_control_2 0.590 thaw-1_no-antibody_control_1 0.037
[19]:
# NBVAL_SKIP

for library, library_corr in corrs.groupby("library"):
    corr_chart = (
        alt.Chart(library_corr)
        .encode(
            alt.X("antibody_sample_1", title=None),
            alt.Y("antibody_sample_2", title=None),
            color=alt.Color("r2", scale=alt.Scale(zero=True)),
            tooltip=[
                (
                    alt.Tooltip(c, format=".3g")
                    if c == "r2" or c.startswith("antibody_concentration")
                    else c
                )
                for c in library_corr.columns
            ],
        )
        .mark_rect(stroke="black")
        .properties(width=alt.Step(15), height=alt.Step(15), title=library)
        .configure_axis(labelLimit=500)
    )
    display(corr_chart)