Compute probabilities of escape¶
In some experiments that involve antibody selections, it is possible to spike in a “neutralization standard”: a set of variants known not to be affected by the antibody. It is then possible to compute the probability of escape of each variant, which is just its change in frequency relative to the standard upon antibody selection. If such experiments are done at enough antibody concentrations, it is even possible to reconstruct a conventional neutralization curve.
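Concretely, for each variant the probability of escape is estimated from the sequencing counts of the variant and of the neutralization standard in the antibody-selected and no-antibody samples: the variant's frequency relative to the standard with antibody, divided by its frequency relative to the standard without antibody. The snippet below is just a sketch of this arithmetic with made-up counts, not the package's actual implementation:
# sketch of the arithmetic with hypothetical counts (not real data or package code)
variant_with_antibody, standard_with_antibody = 150, 10000
variant_no_antibody, standard_no_antibody = 5000, 15000
prob_escape_uncensored = (variant_with_antibody / standard_with_antibody) / (
    variant_no_antibody / standard_no_antibody
)
prob_escape = min(1, max(0, prob_escape_uncensored))  # censored to [0, 1]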
This notebook illustrates how to use dms_variants to compute these probabilities of escape.
First, import Python modules:
[1]:
import io
import textwrap
import altair as alt
import dms_variants.codonvarianttable
import dms_variants.utils
import numpy
import pandas as pd
Compute escape probabilities¶
Read in the CodonVariantTable. These data correspond to snippets of the variant counts from a real experiment on the SARS-CoV-2 spike:
[2]:
with open("spike.txt") as f:
spike_seq = f.read().strip()
variants = dms_variants.codonvarianttable.CodonVariantTable.from_variant_count_df(
variant_count_df_file="prob_escape_codon_variant_table.csv",
primary_target="spike",
geneseq=spike_seq,
allowgaps=True,
)
Set up a data frame giving the antibody / no-antibody sample pairings for each selection:
[3]:
selections_df = pd.read_csv(
    io.StringIO(
        textwrap.dedent(
            """\
            library,antibody_sample,no-antibody_sample,antibody_concentration
            lib1,thaw-1_REGN10933_0.037_1,thaw-1_no-antibody_control_1,0.037
            lib1,thaw-1_REGN10933_0.15_1,thaw-1_no-antibody_control_1,0.15
            lib1,thaw-1_REGN10933_0.15_2,thaw-1_no-antibody_control_2,0.15
            lib1,thaw-1_REGN10933_0.59_1,thaw-1_no-antibody_control_1,0.59
            lib1,thaw-1_REGN10933_0.59_2,thaw-1_no-antibody_control_2,0.59
            lib1,thaw-2_279C_0.00088_1,thaw-2_no-antibody_control_1,0.00088
            lib1,thaw-2_279C_0.00088_2,thaw-2_no-antibody_control_2,0.00088
            lib1,thaw-2_279C_0.0035_1,thaw-2_no-antibody_control_1,0.0035
            lib1,thaw-2_279C_0.0035_2,thaw-2_no-antibody_control_2,0.0035
            lib1,thaw-2_279C_0.014_1,thaw-2_no-antibody_control_1,0.014
            lib1,thaw-2_279C_0.014_2,thaw-2_no-antibody_control_2,0.014
            lib2,thaw-1_REGN10933_0.037_1,thaw-1_no-antibody_control_1,0.037
            lib2,thaw-1_REGN10933_0.037_2,thaw-1_no-antibody_control_2,0.037
            lib2,thaw-1_REGN10933_0.15_1,thaw-1_no-antibody_control_1,0.15
            lib2,thaw-1_REGN10933_0.15_2,thaw-1_no-antibody_control_2,0.15
            lib2,thaw-1_REGN10933_0.59_1,thaw-1_no-antibody_control_1,0.59
            lib2,thaw-1_REGN10933_0.59_2,thaw-1_no-antibody_control_2,0.59
            """
        )
    )
)
Now run CodonVariantTable.prob_escape, using the above data frame to define which samples to pair in the comparisons:
[4]:
prob_escape, neut_standard_fracs, neutralization = variants.prob_escape(
    selections_df=selections_df,
)
Fraction of reads that are neutralization standard¶
First look at the returned neut_standard_fracs data frame, which gives the fraction of all reads corresponding to the neutralization standard for all samples:
[5]:
neut_standard_fracs.round(4)
[5]:
| | library | antibody_sample | no-antibody_sample | antibody_concentration | antibody_count | antibody_frac | no-antibody_count | no-antibody_frac |
|---|---|---|---|---|---|---|---|---|
0 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | 0.0370 | 11107 | 0.0210 | 16475 | 0.0098 |
1 | lib1 | thaw-1_REGN10933_0.15_1 | thaw-1_no-antibody_control_1 | 0.1500 | 18045 | 0.0220 | 16475 | 0.0098 |
2 | lib1 | thaw-1_REGN10933_0.15_2 | thaw-1_no-antibody_control_2 | 0.1500 | 14283 | 0.0371 | 11343 | 0.0103 |
3 | lib1 | thaw-1_REGN10933_0.59_1 | thaw-1_no-antibody_control_1 | 0.5900 | 26027 | 0.0636 | 16475 | 0.0098 |
4 | lib1 | thaw-1_REGN10933_0.59_2 | thaw-1_no-antibody_control_2 | 0.5900 | 23240 | 0.0510 | 11343 | 0.0103 |
5 | lib1 | thaw-2_279C_0.00088_1 | thaw-2_no-antibody_control_1 | 0.0009 | 8231 | 0.0157 | 11134 | 0.0157 |
6 | lib1 | thaw-2_279C_0.00088_2 | thaw-2_no-antibody_control_2 | 0.0009 | 8763 | 0.0177 | 9002 | 0.0147 |
7 | lib1 | thaw-2_279C_0.0035_1 | thaw-2_no-antibody_control_1 | 0.0035 | 9298 | 0.0200 | 11134 | 0.0157 |
8 | lib1 | thaw-2_279C_0.0035_2 | thaw-2_no-antibody_control_2 | 0.0035 | 9394 | 0.0231 | 9002 | 0.0147 |
9 | lib1 | thaw-2_279C_0.014_1 | thaw-2_no-antibody_control_1 | 0.0140 | 67940 | 0.1408 | 11134 | 0.0157 |
10 | lib1 | thaw-2_279C_0.014_2 | thaw-2_no-antibody_control_2 | 0.0140 | 60115 | 0.1545 | 9002 | 0.0147 |
11 | lib2 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | 0.0370 | 14003 | 0.0152 | 11144 | 0.0103 |
12 | lib2 | thaw-1_REGN10933_0.037_2 | thaw-1_no-antibody_control_2 | 0.0370 | 12780 | 0.0233 | 10877 | 0.0088 |
13 | lib2 | thaw-1_REGN10933_0.15_1 | thaw-1_no-antibody_control_1 | 0.1500 | 22879 | 0.0271 | 11144 | 0.0103 |
14 | lib2 | thaw-1_REGN10933_0.15_2 | thaw-1_no-antibody_control_2 | 0.1500 | 20040 | 0.0263 | 10877 | 0.0088 |
15 | lib2 | thaw-1_REGN10933_0.59_1 | thaw-1_no-antibody_control_1 | 0.5900 | 57699 | 0.0760 | 11144 | 0.0103 |
16 | lib2 | thaw-1_REGN10933_0.59_2 | thaw-1_no-antibody_control_2 | 0.5900 | 52409 | 0.0671 | 10877 | 0.0088 |
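Each of these fractions is just the number of neutralization-standard reads divided by the total reads in that sample. As a rough sketch (the data frame and column names here are assumptions for illustration, not the package's internals), the computation amounts to something like:
# rough sketch; assumes a tidy per-barcode counts data frame `counts` with
# columns "target", "library", "sample", and "count" (hypothetical layout)
neut_standard_frac_sketch = counts.groupby(["library", "sample"]).apply(
    lambda g: g.loc[g["target"] != "spike", "count"].sum() / g["count"].sum()
)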
Now plot the fraction of all reads corresponding to the neutralization standard for all samples, making an interactive altair plot where you can mouse over points for details and select which libraries to show with the drop-down at the bottom. Note the plot uses a symlog scale:
[6]:
# make tidy version of neut_standard_fracs
melt_cols = ["antibody_frac", "no-antibody_frac"]
neut_standard_fracs_tidy = neut_standard_fracs.melt(
    id_vars=[c for c in neut_standard_fracs.columns if c not in melt_cols],
    value_vars=melt_cols,
    value_name="neut_standard_frac",
    var_name="sample_type",
).assign(
    sample_type=lambda x: x["sample_type"].str.replace("_frac", ""),
    library_sample=lambda x: x["library"] + " " + x["antibody_sample"],
)
[7]:
# NBVAL_IGNORE_OUTPUT
# set up selections over other columns of interest, in this case just library:
selections = [
    alt.selection_point(
        fields=[col],
        bind=alt.binding_select(
            options=[None] + neut_standard_fracs_tidy[col].unique().tolist(),
            labels=["all"] + [str(x) for x in neut_standard_fracs_tidy[col].unique()],
            name=col,
        ),
    )
    for col in ["library"]
]
neut_standard_fracs_chart = (
    alt.Chart(neut_standard_fracs_tidy)
    .encode(
        x=alt.X(
            "neut_standard_frac",
            title="neutralization standard fraction",
            scale=alt.Scale(type="symlog", constant=0.02, domainMax=1),
        ),
        y=alt.Y("library_sample", title=None),
        color="sample_type",
        shape="sample_type",
        tooltip=[
            alt.Tooltip(c, format=".2g") if c == "neut_standard_frac" else c
            for c in neut_standard_fracs_tidy.columns
        ],
    )
    .mark_point()
    .properties(width=275, height=alt.Step(14))
    .add_params(*selections)
    .configure_axis(labelLimit=500)
)
for selection in selections:
    neut_standard_fracs_chart = neut_standard_fracs_chart.transform_filter(selection)
neut_standard_fracs_chart
[7]:
Neutralization averaged over all variants with a given number of mutations¶
Now look at neutralization, which gives the extent of neutralization averaged over all variants with each number of amino-acid mutations:
[8]:
neutralization.round(3)
[8]:
| | library | antibody_sample | no-antibody_sample | n_aa_substitutions | antibody_neut_standard_count | no-antibody_neut_standard_count | antibody_count | no-antibody_count | prob_escape_uncensored | prob_escape |
|---|---|---|---|---|---|---|---|---|---|---|
0 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | 0 | 11107 | 16475 | 42343 | 140698 | 0.446 | 0.446 |
1 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | 1 | 11107 | 16475 | 102466 | 340357 | 0.447 | 0.447 |
2 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | 2 | 11107 | 16475 | 124509 | 404439 | 0.457 | 0.457 |
3 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | 3 | 11107 | 16475 | 106742 | 342394 | 0.462 | 0.462 |
4 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | 4 | 11107 | 16475 | 141667 | 442459 | 0.475 | 0.475 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
80 | lib2 | thaw-1_REGN10933_0.59_2 | thaw-1_no-antibody_control_2 | 0 | 52409 | 10877 | 70691 | 174569 | 0.084 | 0.084 |
81 | lib2 | thaw-1_REGN10933_0.59_2 | thaw-1_no-antibody_control_2 | 1 | 52409 | 10877 | 179056 | 361718 | 0.103 | 0.103 |
82 | lib2 | thaw-1_REGN10933_0.59_2 | thaw-1_no-antibody_control_2 | 2 | 52409 | 10877 | 201992 | 331201 | 0.127 | 0.127 |
83 | lib2 | thaw-1_REGN10933_0.59_2 | thaw-1_no-antibody_control_2 | 3 | 52409 | 10877 | 141530 | 219133 | 0.134 | 0.134 |
84 | lib2 | thaw-1_REGN10933_0.59_2 | thaw-1_no-antibody_control_2 | 4 | 52409 | 10877 | 135208 | 136249 | 0.206 | 0.206 |
85 rows × 10 columns
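As a sanity check, the values in the first row can be reproduced by hand from the counts shown above: the summed counts of variants with 0 amino-acid substitutions relative to the neutralization standard in the antibody sample, divided by the same ratio in the no-antibody sample:
# re-derive prob_escape for the first row of the table above
with_antibody = 42343 / 11107  # antibody_count / antibody_neut_standard_count
no_antibody = 140698 / 16475  # no-antibody_count / no-antibody_neut_standard_count
round(with_antibody / no_antibody, 3)  # 0.446, matching the table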
Now plot neutralization. First, get the neutralization data ready to plot by melting it and doing a few other transformations:
[9]:
cols_to_melt = ["prob_escape", "prob_escape_uncensored"]
neutralization_to_plot = (
    neutralization.melt(
        id_vars=[c for c in neutralization.columns if c not in cols_to_melt],
        var_name="censored",
        value_name="fraction not neutralized",
    )
    .assign(
        library_sample=lambda x: x["library"] + " " + x["antibody_sample"],
        censored=lambda x: x["censored"] == "prob_escape",
        n_aa_substitutions=lambda x: (
            x["n_aa_substitutions"].map(
                lambda n: f">{n - 1}" if n == x["n_aa_substitutions"].max() else str(n)
            )
        ),
    )
    .rename(
        columns={
            c: c.replace("_", " ")
            for c in neutralization.columns
            if c.endswith("count")
        }
    )
    .merge(selections_df, validate="many_to_one")
)
Now make the plot. You can mouse over points for details, click on the legend to show values for a specific number of amino-acid substitutions, and use the drop-downs to choose whether values are censored to [0, 1] or shown raw and to subset which data to show:
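Here “censored” just means the probability of escape is restricted to the interval [0, 1]; conceptually (a sketch of the relationship, not the package's internal code) the censored column is the uncensored one clipped to that interval:
# sketch: censored values are the uncensored ones clipped to the unit interval
neutralization["prob_escape_uncensored"].clip(lower=0, upper=1)  # ~ "prob_escape"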
[10]:
# NBVAL_SKIP
censored_selection = alt.selection_point(
fields=["censored"],
value=[{"censored": True}],
bind=alt.binding_select(
options=[True, False],
name="censored to be between 0 and 1",
),
)
n_aa_substitutions_selection = alt.selection_point(
fields=["n_aa_substitutions"],
bind="legend",
)
neutralization_chart = (
alt.Chart(neutralization_to_plot)
.encode(
x=alt.X("fraction not neutralized"),
y=alt.Y("library_sample", title=None),
color="n_aa_substitutions:N",
opacity=alt.condition(
n_aa_substitutions_selection, alt.value(0.7), alt.value(0)
),
tooltip=[
(
alt.Tooltip(c, format=".3g")
if c.endswith("_count")
or c.startswith("fraction")
or c == "antibody_concentration"
else c
)
for c in neutralization_to_plot.columns
if c not in ["library_sample"]
],
)
.mark_point(filled=True, size=50)
.properties(width=275, height=alt.Step(14))
.add_params(censored_selection, n_aa_substitutions_selection, *selections)
.transform_filter(censored_selection)
.configure_axis(labelLimit=500)
)
for selection in selections:
neutralization_chart = neutralization_chart.transform_filter(selection)
neutralization_chart
[10]:
Variant-level probabilities of escape¶
Now we examine the variant-level probabilities of escape.
[11]:
prob_escape.round(3).head()
[11]:
| | library | antibody_sample | no-antibody_sample | codon_substitutions | n_codon_substitutions | aa_substitutions | n_aa_substitutions | barcode | prob_escape | prob_escape_uncensored | antibody_count | no-antibody_count | antibody_neut_standard_count | no-antibody_neut_standard_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | AAG180AGG CAT653CAG | 2 | K180R H653Q | 2 | AACTCAAAATCCTTAG | 0.510 | 0.510 | 1794 | 5218 | 11107 | 16475 |
1 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | CAC146CAG TCC151--- GCT417GCA AGC475ATC ACC111... | 6 | H146Q S151- S475I I1208T | 4 | AAGGTCCCGGAGTAAC | 0.628 | 0.628 | 1733 | 4094 | 11107 | 16475 |
2 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | AGA450AAG TTG580AAC AGC744AGT CAG955AAG | 4 | R450K L580N Q955K | 3 | AAAGCTGGAGTACGTA | 0.593 | 0.593 | 1428 | 3574 | 11107 | 16475 |
3 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | AGC98CAC GTG306ATC AGC475ATC GAT1082AAT CTG119... | 5 | S98H V306I S475I D1082N L1191M | 5 | AACCCAGCACCGTTGA | 0.669 | 0.669 | 1350 | 2994 | 11107 | 16475 |
4 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_no-antibody_control_1 | | 0 | | 0 | AATCACAACCTTTACC | 0.478 | 0.478 | 1318 | 4087 | 11107 | 16475 |
First, we filter on the pre-selection counts (counts in the no-antibody condition) and the pre-selection fraction (the variant's fraction of all counts in the no-antibody condition). These are combined into a minimum count threshold for each sample, which is the larger of the absolute count cutoff and the fraction-based cutoff for that sample:
[12]:
min_no_antibody_counts = 20 # pre-selection counts must be >= this
min_no_antibody_frac = 0.00001  # pre-selection counts must be >= this fraction of total
min_no_antibody_count_threshold_df = (
prob_escape.groupby(["library", "no-antibody_sample"], as_index=False)
.aggregate(total_no_antibody_count=pd.NamedAgg("no-antibody_count", "sum"))
.assign(
threshold=lambda x: (x["total_no_antibody_count"] * min_no_antibody_frac)
.clip(lower=min_no_antibody_counts)
.round()
.astype(int),
)
)
min_no_antibody_count_threshold_df
[12]:
| | library | no-antibody_sample | total_no_antibody_count | threshold |
|---|---|---|---|---|
0 | lib1 | thaw-1_no-antibody_control_1 | 5011041 | 50 |
1 | lib1 | thaw-1_no-antibody_control_2 | 2186016 | 22 |
2 | lib1 | thaw-2_no-antibody_control_1 | 2096481 | 21 |
3 | lib1 | thaw-2_no-antibody_control_2 | 1812264 | 20 |
4 | lib2 | thaw-1_no-antibody_control_1 | 3199962 | 32 |
5 | lib2 | thaw-1_no-antibody_control_2 | 3668610 | 37 |
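For example, for the first and fourth rows above: 0.00001 × 5011041 ≈ 50, which exceeds the count floor of 20, so the threshold is 50; whereas 0.00001 × 1812264 ≈ 18, so the floor of 20 applies instead:
# reproduce the thresholds for two of the rows above (same logic as the cell above)
round(max(5011041 * min_no_antibody_frac, min_no_antibody_counts))  # 50
round(max(1812264 * min_no_antibody_frac, min_no_antibody_counts))  # 20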
Now draw a box plot of the pre-selection (no-antibody) counts, with a black line at the median, boxes spanning the 25th to 75th percentiles, whiskers spanning the minimum to maximum, and a red line indicating the threshold. You can mouse over the bars for details:
[13]:
no_antibody_count_boxplot_df = (
    prob_escape.groupby(
        ["library", "antibody_sample", "no-antibody_sample"], as_index=False
    )
    .aggregate(
        median=pd.NamedAgg("no-antibody_count", "median"),
        percentile_25=pd.NamedAgg("no-antibody_count", lambda s: s.quantile(0.25)),
        percentile_75=pd.NamedAgg("no-antibody_count", lambda s: s.quantile(0.75)),
        min=pd.NamedAgg("no-antibody_count", "min"),
        max=pd.NamedAgg("no-antibody_count", "max"),
    )
    .merge(
        min_no_antibody_count_threshold_df.drop(columns="total_no_antibody_count"),
        validate="many_to_one",
    )
    .assign(library_sample=lambda x: x["library"] + " " + x["antibody_sample"])
)
assert (
    len(no_antibody_count_boxplot_df)
    == no_antibody_count_boxplot_df["library_sample"].nunique()
)
[14]:
# NBVAL_SKIP
no_antibody_count_base = alt.Chart(no_antibody_count_boxplot_df).encode(
y=alt.Y("library_sample", title=None),
tooltip=[
c if (c.endswith("sample") or c == "library") else alt.Tooltip(c, format=".3g")
for c in no_antibody_count_boxplot_df.columns
if c != "library_sample"
],
)
no_antibody_count_quartile_bars = no_antibody_count_base.encode(
alt.X(
"percentile_25",
scale=alt.Scale(type="symlog", constant=20),
title="counts for variant",
),
alt.X2("percentile_75"),
).mark_bar(color="blue")
no_antibody_count_range_lines = no_antibody_count_base.encode(
alt.X("min"),
alt.X2("max"),
).mark_rule(color="blue", opacity=0.5)
no_antibody_count_median_lines = no_antibody_count_base.encode(
alt.X("median"), alt.X2("median")
).mark_bar(xOffset=1, x2Offset=-1, color="black")
no_antibody_count_threshold = no_antibody_count_base.encode(
alt.X("threshold"), alt.X2("threshold")
).mark_bar(xOffset=1, x2Offset=-1, color="red")
no_antibody_count_chart = (
no_antibody_count_quartile_bars
+ no_antibody_count_range_lines
+ no_antibody_count_median_lines
+ no_antibody_count_threshold
).properties(width=350, height=alt.Step(14))
for s in selections:
no_antibody_count_chart = no_antibody_count_chart.add_params(s).transform_filter(s)
no_antibody_count_chart
[14]:
Now keep just the probability-of-escape measurements for variants that meet the pre-selection (no-antibody) count threshold:
[15]:
prob_escape_filtered = prob_escape.merge(
    min_no_antibody_count_threshold_df, validate="many_to_one"
).query("`no-antibody_count` >= threshold")
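If you want to see how many variants survive this filter in each selection, a quick check (hypothetical, not part of the original analysis) is:
# hypothetical check: number of retained variants per selection
prob_escape_filtered.groupby(["library", "antibody_sample"]).size()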
Plot the distribution of probability-of-escape values across the variants retained by this filter. The boxes span the 25th to 75th percentiles, the black vertical line is at the median, and the thin lines extend from the minimum to the maximum:
[16]:
prob_escape_boxplot_df = (
    prob_escape_filtered.melt(
        id_vars=["library", "antibody_sample", "no-antibody_sample"],
        value_vars=["prob_escape", "prob_escape_uncensored"],
        var_name="censored",
        value_name="probability escape",
    )
    .groupby(
        ["library", "antibody_sample", "no-antibody_sample", "censored"],
        as_index=False,
    )
    .aggregate(
        median=pd.NamedAgg("probability escape", "median"),
        percentile_25=pd.NamedAgg("probability escape", lambda s: s.quantile(0.25)),
        percentile_75=pd.NamedAgg("probability escape", lambda s: s.quantile(0.75)),
        min=pd.NamedAgg("probability escape", "min"),
        max=pd.NamedAgg("probability escape", "max"),
    )
    .assign(
        library_sample=lambda x: x["library"] + " " + x["antibody_sample"],
        censored=lambda x: x["censored"] == "prob_escape",
    )
)
assert (
    len(prob_escape_boxplot_df)
    == 2 * prob_escape_boxplot_df["library_sample"].nunique()
)
[17]:
# NBVAL_SKIP
prob_escape_base = alt.Chart(prob_escape_boxplot_df).encode(
y=alt.Y("library_sample", title=None),
tooltip=[
c if (c.endswith("sample") or c == "library") else alt.Tooltip(c, format=".3g")
for c in prob_escape_boxplot_df.columns
if c != "library_sample"
],
)
prob_escape_quartile_bars = prob_escape_base.encode(
alt.X(
"percentile_25",
scale=alt.Scale(type="symlog", constant=20),
title="probability escape",
),
alt.X2("percentile_75"),
).mark_bar(color="blue")
prob_escape_range_lines = prob_escape_base.encode(
alt.X("min"),
alt.X2("max"),
).mark_rule(color="blue", opacity=0.5)
prob_escape_median_lines = prob_escape_base.encode(
alt.X("median"), alt.X2("median")
).mark_bar(xOffset=1, x2Offset=-1, color="black")
prob_escape_chart = (
prob_escape_quartile_bars + prob_escape_range_lines + prob_escape_median_lines
).properties(width=350, height=alt.Step(14))
for s in selections + [censored_selection]:
prob_escape_chart = prob_escape_chart.add_params(s).transform_filter(s)
prob_escape_chart
[17]:
Correlations in variant-level escape probabilities¶
Analyze how well the variant-level escape probabilities correlate between different antibody samples.
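The tidy_to_corr call below computes, within each library, pairwise correlations between antibody samples across barcodes, which we then square to get R². A rough pandas equivalent of the idea (a sketch, not the function's exact implementation) is:
# rough sketch of the idea behind tidy_to_corr (not its exact implementation):
# pivot to a barcode-by-sample matrix for one library, then correlate the columns
corr_sketch = (
    prob_escape_filtered.query("library == 'lib1'")
    .pivot_table(index="barcode", columns="antibody_sample", values="prob_escape")
    .corr()  # pairwise correlations between antibody samples
)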
[18]:
assert len(
    prob_escape_filtered.groupby(["library", "antibody_sample", "no-antibody_sample"])
) == len(prob_escape_filtered.groupby(["library", "antibody_sample"]))
corrs = (
    dms_variants.utils.tidy_to_corr(
        df=prob_escape_filtered,
        sample_col="antibody_sample",
        label_col="barcode",
        value_col="prob_escape",
        group_cols=["library"],
    )
    .assign(r2=lambda x: x["correlation"] ** 2)
    .drop(columns="correlation")
)
# add other properties
suffixes = ["_1", "_2"]
for suffix in suffixes:
    corrs = corrs.merge(
        selections_df,
        left_on=["library", f"antibody_sample{suffix}"],
        right_on=["library", "antibody_sample"],
        validate="many_to_one",
        suffixes=suffixes,
    ).drop(columns="antibody_sample")
corrs.round(3).head()
[18]:
| | library | antibody_sample_1 | antibody_sample_2 | r2 | no-antibody_sample_1 | antibody_concentration_1 | no-antibody_sample_2 | antibody_concentration_2 |
|---|---|---|---|---|---|---|---|---|
0 | lib1 | thaw-1_REGN10933_0.037_1 | thaw-1_REGN10933_0.037_1 | 1.000 | thaw-1_no-antibody_control_1 | 0.037 | thaw-1_no-antibody_control_1 | 0.037 |
1 | lib1 | thaw-1_REGN10933_0.15_1 | thaw-1_REGN10933_0.037_1 | 0.100 | thaw-1_no-antibody_control_1 | 0.150 | thaw-1_no-antibody_control_1 | 0.037 |
2 | lib1 | thaw-1_REGN10933_0.15_2 | thaw-1_REGN10933_0.037_1 | 0.001 | thaw-1_no-antibody_control_2 | 0.150 | thaw-1_no-antibody_control_1 | 0.037 |
3 | lib1 | thaw-1_REGN10933_0.59_1 | thaw-1_REGN10933_0.037_1 | 0.020 | thaw-1_no-antibody_control_1 | 0.590 | thaw-1_no-antibody_control_1 | 0.037 |
4 | lib1 | thaw-1_REGN10933_0.59_2 | thaw-1_REGN10933_0.037_1 | 0.000 | thaw-1_no-antibody_control_2 | 0.590 | thaw-1_no-antibody_control_1 | 0.037 |
[19]:
# NBVAL_SKIP
for library, library_corr in corrs.groupby("library"):
    corr_chart = (
        alt.Chart(library_corr)
        .encode(
            alt.X("antibody_sample_1", title=None),
            alt.Y("antibody_sample_2", title=None),
            color=alt.Color("r2", scale=alt.Scale(zero=True)),
            tooltip=[
                (
                    alt.Tooltip(c, format=".3g")
                    if c == "r2" or c.startswith("antibody_concentration")
                    else c
                )
                for c in library_corr.columns
            ],
        )
        .mark_rect(stroke="black")
        .properties(width=alt.Step(15), height=alt.Step(15), title=library)
        .configure_axis(labelLimit=500)
    )
    display(corr_chart)