Real antibody cocktail¶
In this notebook we fit real data from a deep mutational scanning experiment that used a cocktail containing two monoclonal antibodies. This mimics a polyclonal serum with a pre-defined composition.
The monoclonal antibodies in question are 1C04 and 5G04, and the deep mutational scanning is against the Influenza A/HongKong/45/2019 (H3N2) hemagglutinin, using a library designed to contain only mutations that are thought to be well tolerated.
First, import the Python modules:
[1]:
import requests
import tempfile
import time
import altair as alt
import polyclonal
import polyclonal.pdb_utils
import pandas as pd
Get the data to fit¶
Now we read the deep mutational scanning measurements, which quantify the “probability of escape” (fraction not neutralized) for each variant. For a description of key columns, see here.
[2]:
prob_escape = pd.read_csv(
"libA_220810_1_1C04-5G04_1_prob_escape.csv", keep_default_na=False, na_values="nan"
).query(
"`no-antibody_count` >= no_antibody_count_threshold"
) # filter for those with sufficient no-antibody counts
assert prob_escape.notnull().all().all()
prob_escape.head()
[2]:
library | antibody_sample | no-antibody_sample | aa_substitutions_sequential | n_aa_substitutions | barcode | prob_escape | prob_escape_uncensored | antibody_count | no-antibody_count | antibody_neut_standard_count | no-antibody_neut_standard_count | total_no_antibody_count | no_antibody_count_threshold | aa_substitutions_reference | antibody | antibody_concentration | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | libA | 220810_1_antibody_1C04-5G04_3.65_1 | 220810_1_no-antibody_control_1 | K297I | 1 | ATAACACAAAAAAGTA | 0.0017 | 0.0017 | 78972 | 246344 | 8599550 | 44895 | 10428350 | 15 | K278I | 1C04-5G04 | 3.65 |
1 | libA | 220810_1_antibody_1C04-5G04_3.65_1 | 220810_1_no-antibody_control_1 | R111S V366M R402S | 3 | TATCTACCTAACGAAA | 0.0047 | 0.0047 | 70662 | 78014 | 8599550 | 44895 | 10428350 | 15 | R92S V347M R383S | 1C04-5G04 | 3.65 |
2 | libA | 220810_1_antibody_1C04-5G04_3.65_1 | 220810_1_no-antibody_control_1 | A125M P246H I393A G398Q F411Y | 5 | CTTTCAATTATGAGAC | 0.0370 | 0.0370 | 57908 | 8163 | 8599550 | 44895 | 10428350 | 15 | A106M P227H I374A G379Q F392Y | 1C04-5G04 | 3.65 |
3 | libA | 220810_1_antibody_1C04-5G04_3.65_1 | 220810_1_no-antibody_control_1 | Y113M S143N S164N I307M I393Y E468Q | 6 | TGTATTAGCATTTTGA | 0.0074 | 0.0074 | 37740 | 26593 | 8599550 | 44895 | 10428350 | 15 | Y94M S124N S145N I288M I374Y E449Q | 1C04-5G04 | 3.65 |
4 | libA | 220810_1_antibody_1C04-5G04_3.65_1 | 220810_1_no-antibody_control_1 | G237H P246H V366M | 3 | CCAAGGAGCACGAAAA | 0.0218 | 0.0218 | 26699 | 6407 | 8599550 | 44895 | 10428350 | 15 | G218H P227H V347M | 1C04-5G04 | 3.65 |
Display the number of variants per concentration.
[3]:
display(
prob_escape.groupby("antibody_concentration").aggregate(
n_variants=pd.NamedAgg("barcode", "nunique")
)
)
n_variants | |
---|---|
antibody_concentration | |
0.114 | 31001 |
0.228 | 31001 |
0.456 | 31001 |
0.913 | 31001 |
1.825 | 31001 |
3.650 | 31001 |
Plot mean probability of escape across all variants with the indicated number of mutations. Note that this plot weights each variant the same in the means regardless of how many barcode counts it has. We plot means for both censored (set to between 0 and 1) and uncensored probabilities of escape. Also, note it uses a symlog scale for the y-axis. Mouseover points for values:
[4]:
# NBVAL_IGNORE_OUTPUT
max_aa_subs = 4 # group if >= this many substitutions
mean_prob_escape = (
prob_escape.assign(
n_subs=lambda x: (
x["aa_substitutions_sequential"]
.str.split()
.map(len)
.clip(upper=max_aa_subs)
.map(lambda n: str(n) if n < max_aa_subs else f">{max_aa_subs - 1}")
)
)
.groupby(["antibody_concentration", "n_subs"], as_index=False)
.aggregate({"prob_escape": "mean", "prob_escape_uncensored": "mean"})
.rename(
columns={
"prob_escape": "censored to [0, 1]",
"prob_escape_uncensored": "not censored",
}
)
.melt(
id_vars=["antibody_concentration", "n_subs"],
var_name="censored",
value_name="probability escape",
)
)
mean_prob_escape_chart = (
alt.Chart(mean_prob_escape)
.encode(
x=alt.X("antibody_concentration"),
y=alt.Y(
"probability escape",
scale=alt.Scale(type="symlog", constant=0.05),
),
column=alt.Column("censored", title=None),
color=alt.Color("n_subs", title="n substitutions"),
tooltip=[
alt.Tooltip(c, format=".3g") if mean_prob_escape[c].dtype == float else c
for c in mean_prob_escape.columns
],
)
.mark_line(point=True, size=0.5)
.properties(width=200, height=125)
.configure_axis(grid=False)
)
mean_prob_escape_chart
[4]:
Now fit the model and see if we get the two antibodies. We use a higher regularization on the epitope uniqueness and escape weights to get a clean profile with two distinct antibodies:
[5]:
# NBVAL_IGNORE_OUTPUT
model = polyclonal.Polyclonal(
n_epitopes=2,
data_to_fit=prob_escape.rename(
columns={
"antibody_concentration": "concentration",
"aa_substitutions_reference": "aa_substitutions",
}
).query("concentration > 1"),
alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)
# fit model
opt_res = model.fit(
logfreq=200,
reg_uniqueness2_weight=1,
fix_hill_coefficient=True,
fix_non_neutralized_frac=True,
)
display(model.curves_plot())
display(model.mut_escape_plot())
#
# Fitting site-level model.
# Starting optimization of 1026 parameters at Tue Apr 4 16:01:14 2023.
step time_sec loss fit_loss reg_escape reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity reg_hill_coefficient reg_non_neutralized_frac
0 0.050597 268.74 263.33 0 0 0 0 0 5.4083 0 0
164 8.0089 129.1 120.75 2.1217 0 0 0 0.059982 6.1635 0 0
# Successfully finished at Tue Apr 4 16:01:22 2023.
#
# Fitting model.
# Starting optimization of 6736 parameters at Tue Apr 4 16:01:23 2023.
step time_sec loss fit_loss reg_escape reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity reg_hill_coefficient reg_non_neutralized_frac
0 0.052449 185.27 156.74 17.325 5.2769e-32 0 0 5.0411 6.1635 0 0
200 11.613 163.5 146.27 9.9647 0.3878 0 0 0.3949 6.4813 0 0
208 11.975 163.49 146.27 9.9648 0.38725 0 0 0.39229 6.4801 0 0
# Successfully finished at Tue Apr 4 16:01:35 2023.
We have successfully deconvolved the cocktail into the two component antibodies.