Real antibody cocktail

In this notebook we fit real data from a deep mutational scanning experiment that used a cocktail containing two monoclonal antibodies. This mimics a polyclonal serum with a pre-defined composition.

The monoclonal antibodies in question are 1C04 and 5G04, and the deep mutational scanning is against the Influenza A/HongKong/45/2019 (H3N2) hemagglutinin, using a library designed to contain only mutations that are thought to be well tolerated.

First, import the Python modules:

[1]:
import requests
import tempfile
import time

import altair as alt

import polyclonal
import polyclonal.pdb_utils

import pandas as pd

Get the data to fit

Now we read the deep mutational scanning measurements, which quantify the “probability of escape” (fraction not neutralized) for each variant. For a description of key columns, see here.

[2]:
prob_escape = pd.read_csv(
    "libA_220810_1_1C04-5G04_1_prob_escape.csv", keep_default_na=False, na_values="nan"
).query(
    "`no-antibody_count` >= no_antibody_count_threshold"
)  # filter for those with sufficient no-antibody counts
assert prob_escape.notnull().all().all()
prob_escape.head()
[2]:
library antibody_sample no-antibody_sample aa_substitutions_sequential n_aa_substitutions barcode prob_escape prob_escape_uncensored antibody_count no-antibody_count antibody_neut_standard_count no-antibody_neut_standard_count total_no_antibody_count no_antibody_count_threshold aa_substitutions_reference antibody antibody_concentration
0 libA 220810_1_antibody_1C04-5G04_3.65_1 220810_1_no-antibody_control_1 K297I 1 ATAACACAAAAAAGTA 0.0017 0.0017 78972 246344 8599550 44895 10428350 15 K278I 1C04-5G04 3.65
1 libA 220810_1_antibody_1C04-5G04_3.65_1 220810_1_no-antibody_control_1 R111S V366M R402S 3 TATCTACCTAACGAAA 0.0047 0.0047 70662 78014 8599550 44895 10428350 15 R92S V347M R383S 1C04-5G04 3.65
2 libA 220810_1_antibody_1C04-5G04_3.65_1 220810_1_no-antibody_control_1 A125M P246H I393A G398Q F411Y 5 CTTTCAATTATGAGAC 0.0370 0.0370 57908 8163 8599550 44895 10428350 15 A106M P227H I374A G379Q F392Y 1C04-5G04 3.65
3 libA 220810_1_antibody_1C04-5G04_3.65_1 220810_1_no-antibody_control_1 Y113M S143N S164N I307M I393Y E468Q 6 TGTATTAGCATTTTGA 0.0074 0.0074 37740 26593 8599550 44895 10428350 15 Y94M S124N S145N I288M I374Y E449Q 1C04-5G04 3.65
4 libA 220810_1_antibody_1C04-5G04_3.65_1 220810_1_no-antibody_control_1 G237H P246H V366M 3 CCAAGGAGCACGAAAA 0.0218 0.0218 26699 6407 8599550 44895 10428350 15 G218H P227H V347M 1C04-5G04 3.65

Display the number of variants per concentration.

[3]:
display(
    prob_escape.groupby("antibody_concentration").aggregate(
        n_variants=pd.NamedAgg("barcode", "nunique")
    )
)
n_variants
antibody_concentration
0.114 31001
0.228 31001
0.456 31001
0.913 31001
1.825 31001
3.650 31001

Plot mean probability of escape across all variants with the indicated number of mutations. Note that this plot weights each variant the same in the means regardless of how many barcode counts it has. We plot means for both censored (set to between 0 and 1) and uncensored probabilities of escape. Also, note it uses a symlog scale for the y-axis. Mouseover points for values:

[4]:
# NBVAL_IGNORE_OUTPUT
max_aa_subs = 4  # group if >= this many substitutions

mean_prob_escape = (
    prob_escape.assign(
        n_subs=lambda x: (
            x["aa_substitutions_sequential"]
            .str.split()
            .map(len)
            .clip(upper=max_aa_subs)
            .map(lambda n: str(n) if n < max_aa_subs else f">{max_aa_subs - 1}")
        )
    )
    .groupby(["antibody_concentration", "n_subs"], as_index=False)
    .aggregate({"prob_escape": "mean", "prob_escape_uncensored": "mean"})
    .rename(
        columns={
            "prob_escape": "censored to [0, 1]",
            "prob_escape_uncensored": "not censored",
        }
    )
    .melt(
        id_vars=["antibody_concentration", "n_subs"],
        var_name="censored",
        value_name="probability escape",
    )
)

mean_prob_escape_chart = (
    alt.Chart(mean_prob_escape)
    .encode(
        x=alt.X("antibody_concentration"),
        y=alt.Y(
            "probability escape",
            scale=alt.Scale(type="symlog", constant=0.05),
        ),
        column=alt.Column("censored", title=None),
        color=alt.Color("n_subs", title="n substitutions"),
        tooltip=[
            alt.Tooltip(c, format=".3g") if mean_prob_escape[c].dtype == float else c
            for c in mean_prob_escape.columns
        ],
    )
    .mark_line(point=True, size=0.5)
    .properties(width=200, height=125)
    .configure_axis(grid=False)
)

mean_prob_escape_chart
[4]:

Now fit the model and see if we get the two antibodies. We use a higher regularization on the epitope uniqueness and escape weights to get a clean profile with two distinct antibodies:

[5]:
# NBVAL_IGNORE_OUTPUT
model = polyclonal.Polyclonal(
    n_epitopes=2,
    data_to_fit=prob_escape.rename(
        columns={
            "antibody_concentration": "concentration",
            "aa_substitutions_reference": "aa_substitutions",
        }
    ).query("concentration > 1"),
    alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)

# fit model
opt_res = model.fit(
    logfreq=200,
    reg_uniqueness2_weight=1,
    fix_hill_coefficient=True,
    fix_non_neutralized_frac=True,
)

display(model.curves_plot())

display(model.mut_escape_plot())
#
# Fitting site-level model.
# Starting optimization of 1026 parameters at Tue Apr  4 16:01:14 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity reg_hill_coefficient reg_non_neutralized_frac
           0    0.050597      268.74      263.33           0           0           0              0               0       5.4083                    0                        0
         164      8.0089       129.1      120.75      2.1217           0           0              0        0.059982       6.1635                    0                        0
# Successfully finished at Tue Apr  4 16:01:22 2023.
#
# Fitting model.
# Starting optimization of 6736 parameters at Tue Apr  4 16:01:23 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity reg_hill_coefficient reg_non_neutralized_frac
           0    0.052449      185.27      156.74      17.325  5.2769e-32           0              0          5.0411       6.1635                    0                        0
         200      11.613       163.5      146.27      9.9647      0.3878           0              0          0.3949       6.4813                    0                        0
         208      11.975      163.49      146.27      9.9648     0.38725           0              0         0.39229       6.4801                    0                        0
# Successfully finished at Tue Apr  4 16:01:35 2023.

We have successfully deconvolved the cocktail into the two component antibodies.