Real antibody cocktail¶

In this notebook we fit real data from a deep mutational scanning experiment that used a cocktail containing two monoclonal antibodies. This mimics a polyclonal serum with a pre-defined composition.

The monoclonal antibodies in question are 1C04 and 5G04, and the deep mutational scanning is against the Influenza A/HongKong/45/2019 (H3N2) hemagglutinin, using a library designed to contain only mutations that are thought to be well tolerated.

First, import the Python modules:

[1]:

import requests
import tempfile
import time

import altair as alt

import polyclonal
import polyclonal.pdb_utils

import pandas as pd

Get the data to fit¶

Now we read the deep mutational scanning measurements, which quantify the “probability of escape” (fraction not neutralized) for each variant. For a description of key columns, see here.

[2]:

prob_escape = pd.read_csv(
    "libA_220810_1_1C04-5G04_1_prob_escape.csv", keep_default_na=False, na_values="nan"
).query(
    "`no-antibody_count` >= no_antibody_count_threshold"
)  # filter for those with sufficient no-antibody counts
assert prob_escape.notnull().all().all()
prob_escape.head()

[2]:

	library	antibody_sample	no-antibody_sample	aa_substitutions_sequential	n_aa_substitutions	barcode	prob_escape	prob_escape_uncensored	antibody_count	no-antibody_count	antibody_neut_standard_count	no-antibody_neut_standard_count	total_no_antibody_count	no_antibody_count_threshold	aa_substitutions_reference	antibody	antibody_concentration
0	libA	220810_1_antibody_1C04-5G04_3.65_1	220810_1_no-antibody_control_1	K297I	1	ATAACACAAAAAAGTA	0.0017	0.0017	78972	246344	8599550	44895	10428350	15	K278I	1C04-5G04	3.65
1	libA	220810_1_antibody_1C04-5G04_3.65_1	220810_1_no-antibody_control_1	R111S V366M R402S	3	TATCTACCTAACGAAA	0.0047	0.0047	70662	78014	8599550	44895	10428350	15	R92S V347M R383S	1C04-5G04	3.65
2	libA	220810_1_antibody_1C04-5G04_3.65_1	220810_1_no-antibody_control_1	A125M P246H I393A G398Q F411Y	5	CTTTCAATTATGAGAC	0.0370	0.0370	57908	8163	8599550	44895	10428350	15	A106M P227H I374A G379Q F392Y	1C04-5G04	3.65
3	libA	220810_1_antibody_1C04-5G04_3.65_1	220810_1_no-antibody_control_1	Y113M S143N S164N I307M I393Y E468Q	6	TGTATTAGCATTTTGA	0.0074	0.0074	37740	26593	8599550	44895	10428350	15	Y94M S124N S145N I288M I374Y E449Q	1C04-5G04	3.65
4	libA	220810_1_antibody_1C04-5G04_3.65_1	220810_1_no-antibody_control_1	G237H P246H V366M	3	CCAAGGAGCACGAAAA	0.0218	0.0218	26699	6407	8599550	44895	10428350	15	G218H P227H V347M	1C04-5G04	3.65

Display the number of variants per concentration.

[3]:

display(
    prob_escape.groupby("antibody_concentration").aggregate(
        n_variants=pd.NamedAgg("barcode", "nunique")
    )
)

	n_variants
antibody_concentration
0.114	31001
0.228	31001
0.456	31001
0.913	31001
1.825	31001
3.650	31001

Plot mean probability of escape across all variants with the indicated number of mutations. Note that this plot weights each variant the same in the means regardless of how many barcode counts it has. We plot means for both censored (set to between 0 and 1) and uncensored probabilities of escape. Also, note it uses a symlog scale for the y-axis. Mouseover points for values:

[4]:

# NBVAL_IGNORE_OUTPUT
max_aa_subs = 4  # group if >= this many substitutions

mean_prob_escape = (
    prob_escape.assign(
        n_subs=lambda x: (
            x["aa_substitutions_sequential"]
            .str.split()
            .map(len)
            .clip(upper=max_aa_subs)
            .map(lambda n: str(n) if n < max_aa_subs else f">{max_aa_subs - 1}")
        )
    )
    .groupby(["antibody_concentration", "n_subs"], as_index=False)
    .aggregate({"prob_escape": "mean", "prob_escape_uncensored": "mean"})
    .rename(
        columns={
            "prob_escape": "censored to [0, 1]",
            "prob_escape_uncensored": "not censored",
        }
    )
    .melt(
        id_vars=["antibody_concentration", "n_subs"],
        var_name="censored",
        value_name="probability escape",
    )
)

mean_prob_escape_chart = (
    alt.Chart(mean_prob_escape)
    .encode(
        x=alt.X("antibody_concentration"),
        y=alt.Y(
            "probability escape",
            scale=alt.Scale(type="symlog", constant=0.05),
        ),
        column=alt.Column("censored", title=None),
        color=alt.Color("n_subs", title="n substitutions"),
        tooltip=[
            alt.Tooltip(c, format=".3g") if mean_prob_escape[c].dtype == float else c
            for c in mean_prob_escape.columns
        ],
    )
    .mark_line(point=True, size=0.5)
    .properties(width=200, height=125)
    .configure_axis(grid=False)
)

mean_prob_escape_chart

[4]:

Now fit the model and see if we get the two antibodies. We use a higher regularization on the epitope uniqueness and escape weights to get a clean profile with two distinct antibodies:

[5]:

# NBVAL_IGNORE_OUTPUT
model = polyclonal.Polyclonal(
    n_epitopes=2,
    data_to_fit=prob_escape.rename(
        columns={
            "antibody_concentration": "concentration",
            "aa_substitutions_reference": "aa_substitutions",
        }
    ).query("concentration > 1"),
    alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)

# fit model
opt_res = model.fit(
    logfreq=200,
    reg_uniqueness2_weight=1,
    fix_hill_coefficient=True,
    fix_non_neutralized_frac=True,
)

display(model.curves_plot())

display(model.mut_escape_plot())

#
# Fitting site-level model.
# Starting optimization of 1026 parameters at Tue Apr  4 16:01:14 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity reg_hill_coefficient reg_non_neutralized_frac
           0    0.050597      268.74      263.33           0           0           0              0               0       5.4083                    0                        0
         164      8.0089       129.1      120.75      2.1217           0           0              0        0.059982       6.1635                    0                        0
# Successfully finished at Tue Apr  4 16:01:22 2023.
#
# Fitting model.
# Starting optimization of 6736 parameters at Tue Apr  4 16:01:23 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity reg_hill_coefficient reg_non_neutralized_frac
           0    0.052449      185.27      156.74      17.325  5.2769e-32           0              0          5.0411       6.1635                    0                        0
         200      11.613       163.5      146.27      9.9647      0.3878           0              0          0.3949       6.4813                    0                        0
         208      11.975      163.49      146.27      9.9648     0.38725           0              0         0.39229       6.4801                    0                        0
# Successfully finished at Tue Apr  4 16:01:35 2023.

We have successfully deconvolved the cocktail into the two component antibodies.

Real antibody cocktail¶

Get the data to fit¶

polyclonal

Navigation

Related Topics