escapecalculator

Calculate escape after some mutations to the SARS-CoV-2 RBD.

This Python module, is available at https://github.com/jbloomlab/SARS2-RBD-escape-calc/blob/main/escapecalculator.py

It provides a programmatic method to perform the escape calculations implemented in the interactive calculator implemented at https://jbloomlab.github.io/SARS2-RBD-escape-calc

By default, it downloads the same data used by the interactive calculator and uses the same initial parameter settings as the interactive calculator. You can override these defaults by using different arguments when initializing the EscapeCalculator.

See https://github.com/jbloomlab/SARS2-RBD-escape-calc for all code and data. The calculator is written by Jesse Bloom, and the citation is this paper .

To use the calculator, open a Python session, import this module, and initialize an EscapeCalculator:

>>> calc = EscapeCalculator()

With no mutations, there is no escape (all neutralization retained):

>>> calc.binding_retained([])
1.0

But if you mutate some key antigenic sites, there will be a dramatic reduction in neutralization retained:

>>> calc.binding_retained([440, 505]).round(3)
0.545

If you have a whole set of sequences and have tabulated which sites are mutated, you can apply the calculator in bulk to the data frame. First, create such a data frame of sequences:

>>> import pandas as pd
>>> seqs = pd.DataFrame.from_records(
...     [("seq1", []), ("seq2", [498]), ("seq3", [440]), ("seq4", [440, 505])],
...     columns=["name", "mutated sites"],
... )
>>> seqs
   name mutated sites
0  seq1            []
1  seq2         [498]
2  seq3         [440]
3  seq4    [440, 505]

Now apply the calculator:

>>> seqs["neutralization retained"] = seqs["mutated sites"].map(calc.binding_retained)
>>> seqs.round(3)
   name mutated sites  neutralization retained
0  seq1            []                    1.000
1  seq2         [498]                    0.769
2  seq3         [440]                    0.790
3  seq4    [440, 505]                    0.545

You can also create new calculators that compute escape relative to different viruses, for instance BA.2:

>>> calc_ba2 = EscapeCalculator(virus="BA.2")

Now the escape will be different because different antibodies neutralize that virus:

>>> calc_ba2.binding_retained([440, 505]).round(3)
0.786
  1"""Calculate escape after some mutations to the SARS-CoV-2 RBD.
  2
  3This Python module, is available at
  4https://github.com/jbloomlab/SARS2-RBD-escape-calc/blob/main/escapecalculator.py
  5
  6It provides a programmatic method to perform the escape calculations implemented in
  7the interactive calculator implemented at
  8https://jbloomlab.github.io/SARS2-RBD-escape-calc
  9
 10By default, it downloads the same data used by the interactive calculator and uses the
 11same initial parameter settings as the interactive calculator. You can override these
 12defaults by using different arguments when initializing the :class:`EscapeCalculator`.
 13
 14See https://github.com/jbloomlab/SARS2-RBD-escape-calc for all code and data. The
 15calculator is written by Jesse Bloom, and the citation is
 16`this paper <https://doi.org/10.1093/ve/veac021>`_.
 17
 18To use the calculator, open a Python session, import this module, and initialize an
 19:class:`EscapeCalculator`:
 20
 21>>> calc = EscapeCalculator()
 22
 23With no mutations, there is no escape (all neutralization retained):
 24
 25>>> calc.binding_retained([])
 261.0
 27
 28But if you mutate some key antigenic sites, there will be a dramatic reduction in
 29neutralization retained:
 30
 31>>> calc.binding_retained([440, 505]).round(3)
 320.545
 33
 34If you have a whole set of sequences and have tabulated which sites are mutated,
 35you can apply the calculator in bulk to the data frame.
 36First, create such a data frame of sequences:
 37
 38>>> import pandas as pd
 39>>> seqs = pd.DataFrame.from_records(
 40...     [("seq1", []), ("seq2", [498]), ("seq3", [440]), ("seq4", [440, 505])],
 41...     columns=["name", "mutated sites"],
 42... )
 43>>> seqs
 44   name mutated sites
 450  seq1            []
 461  seq2         [498]
 472  seq3         [440]
 483  seq4    [440, 505]
 49
 50Now apply the calculator:
 51
 52>>> seqs["neutralization retained"] = seqs["mutated sites"].map(calc.binding_retained)
 53>>> seqs.round(3)
 54   name mutated sites  neutralization retained
 550  seq1            []                    1.000
 561  seq2         [498]                    0.769
 572  seq3         [440]                    0.790
 583  seq4    [440, 505]                    0.545
 59
 60You can also create new calculators that compute escape relative to different viruses,
 61for instance BA.2:
 62
 63>>> calc_ba2 = EscapeCalculator(virus="BA.2")
 64
 65Now the escape will be different because different antibodies neutralize that virus:
 66
 67>>> calc_ba2.binding_retained([440, 505]).round(3)
 680.786
 69
 70"""
 71
 72__docformat__ = 'numpy'
 73
 74
 75import requests
 76
 77import numpy
 78
 79import pandas as pd
 80
 81import yaml
 82
 83
 84class EscapeCalculator:
 85    """Calculates residual polyclonal antibody binding after some mutations.
 86
 87    The calculator is implemented interactively at
 88    `https://jbloomlab.github.io/SARS2-RBD-escape-calc <https://jbloomlab.github.io/SARS2-RBD-escape-calc/>`_
 89
 90    By default, this command-line calculator will exactly implement the calculations of the
 91    interactive calculator with default settings. You can pass parameters to override
 92    the default settings.
 93
 94    Parameters
 95    ----------
 96    escape : str
 97        Path or URL of CSV containing the escape data.
 98    antibody_ic50s : str
 99        Path or URL of CSV containing the antibody IC50s.
100    antibody_binding : str
101        Path or URL of CSV containing the antibody binding.
102    antibody_sources : str
103        Path or URL of CSV containing the antibody sources.
104    antibody_reweighting : str
105        Path or URL of CSV containing the antibody reweightings.
106    config : str
107        Path or URL of YAML file containing initial settings.
108    mut_escape_strength : None or float
109        If not `None`, override default `init_mutation_escape_strength` in `config`.
110    weight_by_neg_log_ic50 : None or bool
111        If not `None`, override default `init_weight_by_neg_log_IC50` in `config`.
112    study : None or str
113        If not `None`, override default `init_study` in `config`.
114    binds : None or str
115        If not `None`, override default `init_binds` in `config`.
116    virus : None or str
117        If not `None`, override default `init_virus` in `config`.
118    sources : None or dict
119        If not `None`, override default `init_sources` in `config`.
120    reweight : None or bool
121        If not `None`, override default `init_reweight` in `config`.
122
123    Example
124    -------
125
126    Initialize the calculator:
127
128    >>> calc = EscapeCalculator()
129
130    Calculate escape at sites after some mutations:
131
132    >>> sites_of_interest = [403, 440, 484, 498, 505, 510]
133    >>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3)
134         site  original_escape  retained_escape
135    69    403            0.105            0.030
136    101   440            0.162            0.018
137    143   484            0.043            0.032
138    156   498            0.169            0.076
139    163   505            0.208            0.022
140    167   510            0.001            0.001
141
142    Calculate overall neutralization retained after no mutations or some mutations:
143
144    >>> calc.binding_retained([])
145    1.0
146    >>> calc.binding_retained([440, 505]).round(3)
147    0.545
148
149    Now repeat tests with some non-default options:
150
151    >>> calc2 = EscapeCalculator(
152    ...     mut_escape_strength=1,
153    ...     weight_by_neg_log_ic50=False,
154    ...     study="Cao et al, 2022, Nature",
155    ...     binds="Wuhan-Hu-1",
156    ...     virus="D614G",
157    ...     sources={"include_exclude": "include", "sources": ["WT convalescents"]},
158    ... )
159
160    >>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3)
161         site  original_escape  retained_escape
162    62    403            0.006            0.006
163    84    440            0.008            0.007
164    123   484            0.212            0.029
165    134   498            0.009            0.008
166    141   505            0.002            0.002
167    144   510            0.000            0.000
168
169    >>> calc2.binding_retained([484]).round(3)
170    0.788
171
172    """
173    def __init__(self,
174        escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv",
175        antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv",
176        antibody_binding="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_binding.csv",
177        antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv",
178        antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv",
179        config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml",
180        *,
181        mut_escape_strength=None,
182        weight_by_neg_log_ic50=None,
183        study=None,
184        binds=None,
185        virus=None,
186        sources=None,
187        reweight=None,
188    ):
189        """See main class docstring."""
190        # read input data 
191        self.escape = pd.read_csv(escape)
192        assert set(self.escape.columns) == {"antibody", "site", "escape"}
193        assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"]))
194        assert self.escape.notnull().all().all()
195        antibodies = set(self.escape["antibody"])
196
197        self.antibody_ic50s = pd.read_csv(antibody_ic50s)
198        assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"}
199        assert (
200            len(self.antibody_ic50s)
201            == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"]))
202        )
203        assert self.antibody_ic50s["IC50"].max() == 10
204        assert antibodies == set(self.antibody_ic50s["antibody"])
205
206        self.antibody_binding = pd.read_csv(antibody_binding)
207        assert set(self.antibody_binding.columns) == {"antibody", "binds"}
208        assert (
209            len(self.antibody_binding)
210            == len(self.antibody_binding.groupby(["antibody", "binds"]))
211        )
212        assert antibodies == set(self.antibody_binding["antibody"])
213
214        self.antibody_sources = pd.read_csv(antibody_sources)
215        assert set(self.antibody_sources.columns) == {"antibody", "source", "study"}
216        assert (
217            len(self.antibody_sources)
218            == len(self.antibody_sources.groupby(["antibody", "source", "study"]))
219            == len(antibodies)
220        )
221        assert antibodies == set(self.antibody_sources["antibody"])
222
223        self.antibody_reweighting = pd.read_csv(antibody_reweighting)
224        assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"}
225        assert antibodies.issuperset(self.antibody_reweighting["antibody"])
226
227        self.data = (
228            self.escape
229            .merge(self.antibody_ic50s, on="antibody")
230            .merge(self.antibody_binding, on="antibody")
231            .merge(self.antibody_sources, on="antibody")
232            .merge(self.antibody_reweighting, on="antibody", how="left")
233            .assign(reweight=lambda x: x["reweight"].fillna(1))
234        )
235        assert self.data.notnull().all().all()
236
237        # get initial config
238        if config.startswith("http"):
239            response = requests.get(config, allow_redirects=True)
240            config_text = response.content.decode("utf-8")
241        else:
242            with open(config) as f:
243                config_text = f.read()
244        config = yaml.safe_load(config_text)
245        self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1))
246        studies = config["studies"]
247        studies_rev = {value: key for (key, value) in studies.items()}
248        assert set(self.data["study"]) == set(studies)
249
250        if mut_escape_strength is None:
251            self.mut_escape_strength = config["init_mutation_escape_strength"]
252        else:
253            self.mut_escape_strength = mut_escape_strength
254
255        if weight_by_neg_log_ic50 is None:
256            self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"]
257        else:
258            self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50
259        assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50
260
261        if study is None:
262            self.study = config["init_study"]
263        else:
264            if study == "any" or study in studies:
265                self.study = study
266            elif study in studies_rev:
267                self.study = studies_rev[study]
268            else:
269                raise ValueError(f"invalid {study=}")
270
271        if binds is None:
272            self.binds = config["init_binds"]
273        else:
274            self.binds = binds
275
276        if virus is None:
277            self.virus = config["init_virus"]
278        else:
279            self.virus = virus
280
281        if sources is None:
282            sources = config["init_sources"]
283        include_exclude = sources["include_exclude"]
284        sources_list = sources["sources"]
285        self.sources = set(self.data["source"])
286        assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}"
287        if include_exclude == "exclude":
288            self.sources = self.sources - set(sources_list)
289        elif include_exclude == "include":
290            self.sources = set(sources_list)
291        else:
292            raise ValueError(f"invalid {include_exclude=} in {sources=}")
293
294        if reweight is None:
295            self.reweight = config["init_reweight"]
296        else:
297            self.reweight = reweight
298        assert isinstance(self.reweight, bool), self.reweight
299
300        # filter data
301        if self.study != "any":
302            assert self.study in set(self.data["study"])
303            self.data = self.data.query("study == @self.study").drop(columns="study")
304        else:
305            self.data = self.data.drop(columns="study")
306
307        if self.binds != "any":
308            assert self.binds in set(self.data["binds"])
309            self.data = self.data.query("binds == @self.binds").drop(columns="binds")
310        else:
311            self.data = self.data.drop(columns="binds").drop_duplicates()
312
313        assert self.virus in set(self.data["virus"])
314        self.data = self.data.query("virus == @self.virus").drop(columns="virus")
315
316        assert self.sources.issubset(self.data["source"])
317        self.data = self.data.query("source in @self.sources").drop(columns="source")
318
319        assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"}
320        assert len(self.data) == len(self.data.drop_duplicates())
321        self.data = (
322            self.data
323            .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10))
324            .drop(columns="IC50")
325        )
326
327        max_escape_per_antibody = (
328            self.data
329            .groupby("antibody")
330            .aggregate(max_escape=pd.NamedAgg("escape", "max"))
331        )
332        assert (max_escape_per_antibody["max_escape"] == 1).all()
333
334    def escape_per_site(self, mutated_sites):
335        """Escape at each site after mutating indicated sites.
336
337        Parameters
338        ----------
339        mutated_sites : array-like of integers
340            List of mutated sites.
341
342        Returns
343        -------
344        pandas.DataFrame
345            For each site, gives the original escape and the escape
346            retained after mutations.
347
348        """
349        mutated_sites = set(mutated_sites)
350        if not mutated_sites.issubset(self.sites):
351            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
352        df = (
353            self.data
354            .assign(
355                mutated=lambda x: x['site'].isin(mutated_sites).astype(int),
356                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
357            )
358            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
359            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
360            .assign(
361                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
362                    self.mut_escape_strength
363                ),
364                weight=lambda x: (
365                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
366                    * (x["reweight"] if self.reweight else 1)
367                ),
368            )
369            [["antibody", "antibody_bind_retain", "weight"]]
370            .merge(self.data[["antibody", "site", "escape"]])
371            .assign(
372                escape=lambda x: x["escape"] * x["weight"],
373                retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"],
374            )
375            .groupby("site")
376            .aggregate(
377                original_escape=pd.NamedAgg("escape", "sum"),
378                retained_escape=pd.NamedAgg("retained_escape", "sum"),
379            )
380        ) / self.data["antibody"].nunique()
381        return df.reset_index()
382
383    def binding_retained(self, mutated_sites):
384        """Fraction binding or neutralization retained after mutating indicated sites.
385
386        Parameters
387        ----------
388        mutated_sites : array-like of integers
389            List of mutated sites, must all be in :attr:`BindingCalculator.sites`.
390
391        Returns
392        -------
393        float
394            The fraction binding retained after these mutations.
395
396        """
397        mutated_sites = set(mutated_sites)
398        if not mutated_sites.issubset(self.sites):
399            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
400        retained = (
401            self.data
402            .assign(
403                mutated=lambda x: x["site"].isin(mutated_sites).astype(int),
404                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
405            )
406            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
407            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
408            .assign(
409                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
410                    self.mut_escape_strength
411                ),
412                weight=lambda x: (
413                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
414                    * (x["reweight"] if self.reweight else 1)
415                ),
416                weighted_antibody_bind_retain=lambda x: (
417                    x["antibody_bind_retain"] * x["weight"]
418                ),
419            )
420            [["weight", "weighted_antibody_bind_retain"]]
421            .sum(axis=0)
422        )
423        return retained["weighted_antibody_bind_retain"] / retained["weight"]
424
425
426if __name__ == '__main__':
427    import doctest
428    doctest.testmod()
class EscapeCalculator:
 85class EscapeCalculator:
 86    """Calculates residual polyclonal antibody binding after some mutations.
 87
 88    The calculator is implemented interactively at
 89    `https://jbloomlab.github.io/SARS2-RBD-escape-calc <https://jbloomlab.github.io/SARS2-RBD-escape-calc/>`_
 90
 91    By default, this command-line calculator will exactly implement the calculations of the
 92    interactive calculator with default settings. You can pass parameters to override
 93    the default settings.
 94
 95    Parameters
 96    ----------
 97    escape : str
 98        Path or URL of CSV containing the escape data.
 99    antibody_ic50s : str
100        Path or URL of CSV containing the antibody IC50s.
101    antibody_binding : str
102        Path or URL of CSV containing the antibody binding.
103    antibody_sources : str
104        Path or URL of CSV containing the antibody sources.
105    antibody_reweighting : str
106        Path or URL of CSV containing the antibody reweightings.
107    config : str
108        Path or URL of YAML file containing initial settings.
109    mut_escape_strength : None or float
110        If not `None`, override default `init_mutation_escape_strength` in `config`.
111    weight_by_neg_log_ic50 : None or bool
112        If not `None`, override default `init_weight_by_neg_log_IC50` in `config`.
113    study : None or str
114        If not `None`, override default `init_study` in `config`.
115    binds : None or str
116        If not `None`, override default `init_binds` in `config`.
117    virus : None or str
118        If not `None`, override default `init_virus` in `config`.
119    sources : None or dict
120        If not `None`, override default `init_sources` in `config`.
121    reweight : None or bool
122        If not `None`, override default `init_reweight` in `config`.
123
124    Example
125    -------
126
127    Initialize the calculator:
128
129    >>> calc = EscapeCalculator()
130
131    Calculate escape at sites after some mutations:
132
133    >>> sites_of_interest = [403, 440, 484, 498, 505, 510]
134    >>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3)
135         site  original_escape  retained_escape
136    69    403            0.105            0.030
137    101   440            0.162            0.018
138    143   484            0.043            0.032
139    156   498            0.169            0.076
140    163   505            0.208            0.022
141    167   510            0.001            0.001
142
143    Calculate overall neutralization retained after no mutations or some mutations:
144
145    >>> calc.binding_retained([])
146    1.0
147    >>> calc.binding_retained([440, 505]).round(3)
148    0.545
149
150    Now repeat tests with some non-default options:
151
152    >>> calc2 = EscapeCalculator(
153    ...     mut_escape_strength=1,
154    ...     weight_by_neg_log_ic50=False,
155    ...     study="Cao et al, 2022, Nature",
156    ...     binds="Wuhan-Hu-1",
157    ...     virus="D614G",
158    ...     sources={"include_exclude": "include", "sources": ["WT convalescents"]},
159    ... )
160
161    >>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3)
162         site  original_escape  retained_escape
163    62    403            0.006            0.006
164    84    440            0.008            0.007
165    123   484            0.212            0.029
166    134   498            0.009            0.008
167    141   505            0.002            0.002
168    144   510            0.000            0.000
169
170    >>> calc2.binding_retained([484]).round(3)
171    0.788
172
173    """
174    def __init__(self,
175        escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv",
176        antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv",
177        antibody_binding="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_binding.csv",
178        antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv",
179        antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv",
180        config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml",
181        *,
182        mut_escape_strength=None,
183        weight_by_neg_log_ic50=None,
184        study=None,
185        binds=None,
186        virus=None,
187        sources=None,
188        reweight=None,
189    ):
190        """See main class docstring."""
191        # read input data 
192        self.escape = pd.read_csv(escape)
193        assert set(self.escape.columns) == {"antibody", "site", "escape"}
194        assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"]))
195        assert self.escape.notnull().all().all()
196        antibodies = set(self.escape["antibody"])
197
198        self.antibody_ic50s = pd.read_csv(antibody_ic50s)
199        assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"}
200        assert (
201            len(self.antibody_ic50s)
202            == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"]))
203        )
204        assert self.antibody_ic50s["IC50"].max() == 10
205        assert antibodies == set(self.antibody_ic50s["antibody"])
206
207        self.antibody_binding = pd.read_csv(antibody_binding)
208        assert set(self.antibody_binding.columns) == {"antibody", "binds"}
209        assert (
210            len(self.antibody_binding)
211            == len(self.antibody_binding.groupby(["antibody", "binds"]))
212        )
213        assert antibodies == set(self.antibody_binding["antibody"])
214
215        self.antibody_sources = pd.read_csv(antibody_sources)
216        assert set(self.antibody_sources.columns) == {"antibody", "source", "study"}
217        assert (
218            len(self.antibody_sources)
219            == len(self.antibody_sources.groupby(["antibody", "source", "study"]))
220            == len(antibodies)
221        )
222        assert antibodies == set(self.antibody_sources["antibody"])
223
224        self.antibody_reweighting = pd.read_csv(antibody_reweighting)
225        assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"}
226        assert antibodies.issuperset(self.antibody_reweighting["antibody"])
227
228        self.data = (
229            self.escape
230            .merge(self.antibody_ic50s, on="antibody")
231            .merge(self.antibody_binding, on="antibody")
232            .merge(self.antibody_sources, on="antibody")
233            .merge(self.antibody_reweighting, on="antibody", how="left")
234            .assign(reweight=lambda x: x["reweight"].fillna(1))
235        )
236        assert self.data.notnull().all().all()
237
238        # get initial config
239        if config.startswith("http"):
240            response = requests.get(config, allow_redirects=True)
241            config_text = response.content.decode("utf-8")
242        else:
243            with open(config) as f:
244                config_text = f.read()
245        config = yaml.safe_load(config_text)
246        self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1))
247        studies = config["studies"]
248        studies_rev = {value: key for (key, value) in studies.items()}
249        assert set(self.data["study"]) == set(studies)
250
251        if mut_escape_strength is None:
252            self.mut_escape_strength = config["init_mutation_escape_strength"]
253        else:
254            self.mut_escape_strength = mut_escape_strength
255
256        if weight_by_neg_log_ic50 is None:
257            self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"]
258        else:
259            self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50
260        assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50
261
262        if study is None:
263            self.study = config["init_study"]
264        else:
265            if study == "any" or study in studies:
266                self.study = study
267            elif study in studies_rev:
268                self.study = studies_rev[study]
269            else:
270                raise ValueError(f"invalid {study=}")
271
272        if binds is None:
273            self.binds = config["init_binds"]
274        else:
275            self.binds = binds
276
277        if virus is None:
278            self.virus = config["init_virus"]
279        else:
280            self.virus = virus
281
282        if sources is None:
283            sources = config["init_sources"]
284        include_exclude = sources["include_exclude"]
285        sources_list = sources["sources"]
286        self.sources = set(self.data["source"])
287        assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}"
288        if include_exclude == "exclude":
289            self.sources = self.sources - set(sources_list)
290        elif include_exclude == "include":
291            self.sources = set(sources_list)
292        else:
293            raise ValueError(f"invalid {include_exclude=} in {sources=}")
294
295        if reweight is None:
296            self.reweight = config["init_reweight"]
297        else:
298            self.reweight = reweight
299        assert isinstance(self.reweight, bool), self.reweight
300
301        # filter data
302        if self.study != "any":
303            assert self.study in set(self.data["study"])
304            self.data = self.data.query("study == @self.study").drop(columns="study")
305        else:
306            self.data = self.data.drop(columns="study")
307
308        if self.binds != "any":
309            assert self.binds in set(self.data["binds"])
310            self.data = self.data.query("binds == @self.binds").drop(columns="binds")
311        else:
312            self.data = self.data.drop(columns="binds").drop_duplicates()
313
314        assert self.virus in set(self.data["virus"])
315        self.data = self.data.query("virus == @self.virus").drop(columns="virus")
316
317        assert self.sources.issubset(self.data["source"])
318        self.data = self.data.query("source in @self.sources").drop(columns="source")
319
320        assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"}
321        assert len(self.data) == len(self.data.drop_duplicates())
322        self.data = (
323            self.data
324            .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10))
325            .drop(columns="IC50")
326        )
327
328        max_escape_per_antibody = (
329            self.data
330            .groupby("antibody")
331            .aggregate(max_escape=pd.NamedAgg("escape", "max"))
332        )
333        assert (max_escape_per_antibody["max_escape"] == 1).all()
334
335    def escape_per_site(self, mutated_sites):
336        """Escape at each site after mutating indicated sites.
337
338        Parameters
339        ----------
340        mutated_sites : array-like of integers
341            List of mutated sites.
342
343        Returns
344        -------
345        pandas.DataFrame
346            For each site, gives the original escape and the escape
347            retained after mutations.
348
349        """
350        mutated_sites = set(mutated_sites)
351        if not mutated_sites.issubset(self.sites):
352            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
353        df = (
354            self.data
355            .assign(
356                mutated=lambda x: x['site'].isin(mutated_sites).astype(int),
357                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
358            )
359            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
360            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
361            .assign(
362                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
363                    self.mut_escape_strength
364                ),
365                weight=lambda x: (
366                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
367                    * (x["reweight"] if self.reweight else 1)
368                ),
369            )
370            [["antibody", "antibody_bind_retain", "weight"]]
371            .merge(self.data[["antibody", "site", "escape"]])
372            .assign(
373                escape=lambda x: x["escape"] * x["weight"],
374                retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"],
375            )
376            .groupby("site")
377            .aggregate(
378                original_escape=pd.NamedAgg("escape", "sum"),
379                retained_escape=pd.NamedAgg("retained_escape", "sum"),
380            )
381        ) / self.data["antibody"].nunique()
382        return df.reset_index()
383
384    def binding_retained(self, mutated_sites):
385        """Fraction binding or neutralization retained after mutating indicated sites.
386
387        Parameters
388        ----------
389        mutated_sites : array-like of integers
390            List of mutated sites, must all be in :attr:`BindingCalculator.sites`.
391
392        Returns
393        -------
394        float
395            The fraction binding retained after these mutations.
396
397        """
398        mutated_sites = set(mutated_sites)
399        if not mutated_sites.issubset(self.sites):
400            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
401        retained = (
402            self.data
403            .assign(
404                mutated=lambda x: x["site"].isin(mutated_sites).astype(int),
405                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
406            )
407            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
408            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
409            .assign(
410                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
411                    self.mut_escape_strength
412                ),
413                weight=lambda x: (
414                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
415                    * (x["reweight"] if self.reweight else 1)
416                ),
417                weighted_antibody_bind_retain=lambda x: (
418                    x["antibody_bind_retain"] * x["weight"]
419                ),
420            )
421            [["weight", "weighted_antibody_bind_retain"]]
422            .sum(axis=0)
423        )
424        return retained["weighted_antibody_bind_retain"] / retained["weight"]

Calculates residual polyclonal antibody binding after some mutations.

The calculator is implemented interactively at https://jbloomlab.github.io/SARS2-RBD-escape-calc

By default, this command-line calculator will exactly implement the calculations of the interactive calculator with default settings. You can pass parameters to override the default settings.

Parameters
  • escape (str): Path or URL of CSV containing the escape data.
  • antibody_ic50s (str): Path or URL of CSV containing the antibody IC50s.
  • antibody_binding (str): Path or URL of CSV containing the antibody binding.
  • antibody_sources (str): Path or URL of CSV containing the antibody sources.
  • antibody_reweighting (str): Path or URL of CSV containing the antibody reweightings.
  • config (str): Path or URL of YAML file containing initial settings.
  • mut_escape_strength (None or float): If not None, override default init_mutation_escape_strength in config.
  • weight_by_neg_log_ic50 (None or bool): If not None, override default init_weight_by_neg_log_IC50 in config.
  • study (None or str): If not None, override default init_study in config.
  • binds (None or str): If not None, override default init_binds in config.
  • virus (None or str): If not None, override default init_virus in config.
  • sources (None or dict): If not None, override default init_sources in config.
  • reweight (None or bool): If not None, override default init_reweight in config.
Example

Initialize the calculator:

>>> calc = EscapeCalculator()

Calculate escape at sites after some mutations:

>>> sites_of_interest = [403, 440, 484, 498, 505, 510]
>>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3)
     site  original_escape  retained_escape
69    403            0.105            0.030
101   440            0.162            0.018
143   484            0.043            0.032
156   498            0.169            0.076
163   505            0.208            0.022
167   510            0.001            0.001

Calculate overall neutralization retained after no mutations or some mutations:

>>> calc.binding_retained([])
1.0
>>> calc.binding_retained([440, 505]).round(3)
0.545

Now repeat tests with some non-default options:

>>> calc2 = EscapeCalculator(
...     mut_escape_strength=1,
...     weight_by_neg_log_ic50=False,
...     study="Cao et al, 2022, Nature",
...     binds="Wuhan-Hu-1",
...     virus="D614G",
...     sources={"include_exclude": "include", "sources": ["WT convalescents"]},
... )
>>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3)
     site  original_escape  retained_escape
62    403            0.006            0.006
84    440            0.008            0.007
123   484            0.212            0.029
134   498            0.009            0.008
141   505            0.002            0.002
144   510            0.000            0.000
>>> calc2.binding_retained([484]).round(3)
0.788
EscapeCalculator( escape='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv', antibody_ic50s='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv', antibody_binding='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_binding.csv', antibody_sources='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv', antibody_reweighting='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv', config='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml', *, mut_escape_strength=None, weight_by_neg_log_ic50=None, study=None, binds=None, virus=None, sources=None, reweight=None)
174    def __init__(self,
175        escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv",
176        antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv",
177        antibody_binding="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_binding.csv",
178        antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv",
179        antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv",
180        config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml",
181        *,
182        mut_escape_strength=None,
183        weight_by_neg_log_ic50=None,
184        study=None,
185        binds=None,
186        virus=None,
187        sources=None,
188        reweight=None,
189    ):
190        """See main class docstring."""
191        # read input data 
192        self.escape = pd.read_csv(escape)
193        assert set(self.escape.columns) == {"antibody", "site", "escape"}
194        assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"]))
195        assert self.escape.notnull().all().all()
196        antibodies = set(self.escape["antibody"])
197
198        self.antibody_ic50s = pd.read_csv(antibody_ic50s)
199        assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"}
200        assert (
201            len(self.antibody_ic50s)
202            == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"]))
203        )
204        assert self.antibody_ic50s["IC50"].max() == 10
205        assert antibodies == set(self.antibody_ic50s["antibody"])
206
207        self.antibody_binding = pd.read_csv(antibody_binding)
208        assert set(self.antibody_binding.columns) == {"antibody", "binds"}
209        assert (
210            len(self.antibody_binding)
211            == len(self.antibody_binding.groupby(["antibody", "binds"]))
212        )
213        assert antibodies == set(self.antibody_binding["antibody"])
214
215        self.antibody_sources = pd.read_csv(antibody_sources)
216        assert set(self.antibody_sources.columns) == {"antibody", "source", "study"}
217        assert (
218            len(self.antibody_sources)
219            == len(self.antibody_sources.groupby(["antibody", "source", "study"]))
220            == len(antibodies)
221        )
222        assert antibodies == set(self.antibody_sources["antibody"])
223
224        self.antibody_reweighting = pd.read_csv(antibody_reweighting)
225        assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"}
226        assert antibodies.issuperset(self.antibody_reweighting["antibody"])
227
228        self.data = (
229            self.escape
230            .merge(self.antibody_ic50s, on="antibody")
231            .merge(self.antibody_binding, on="antibody")
232            .merge(self.antibody_sources, on="antibody")
233            .merge(self.antibody_reweighting, on="antibody", how="left")
234            .assign(reweight=lambda x: x["reweight"].fillna(1))
235        )
236        assert self.data.notnull().all().all()
237
238        # get initial config
239        if config.startswith("http"):
240            response = requests.get(config, allow_redirects=True)
241            config_text = response.content.decode("utf-8")
242        else:
243            with open(config) as f:
244                config_text = f.read()
245        config = yaml.safe_load(config_text)
246        self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1))
247        studies = config["studies"]
248        studies_rev = {value: key for (key, value) in studies.items()}
249        assert set(self.data["study"]) == set(studies)
250
251        if mut_escape_strength is None:
252            self.mut_escape_strength = config["init_mutation_escape_strength"]
253        else:
254            self.mut_escape_strength = mut_escape_strength
255
256        if weight_by_neg_log_ic50 is None:
257            self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"]
258        else:
259            self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50
260        assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50
261
262        if study is None:
263            self.study = config["init_study"]
264        else:
265            if study == "any" or study in studies:
266                self.study = study
267            elif study in studies_rev:
268                self.study = studies_rev[study]
269            else:
270                raise ValueError(f"invalid {study=}")
271
272        if binds is None:
273            self.binds = config["init_binds"]
274        else:
275            self.binds = binds
276
277        if virus is None:
278            self.virus = config["init_virus"]
279        else:
280            self.virus = virus
281
282        if sources is None:
283            sources = config["init_sources"]
284        include_exclude = sources["include_exclude"]
285        sources_list = sources["sources"]
286        self.sources = set(self.data["source"])
287        assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}"
288        if include_exclude == "exclude":
289            self.sources = self.sources - set(sources_list)
290        elif include_exclude == "include":
291            self.sources = set(sources_list)
292        else:
293            raise ValueError(f"invalid {include_exclude=} in {sources=}")
294
295        if reweight is None:
296            self.reweight = config["init_reweight"]
297        else:
298            self.reweight = reweight
299        assert isinstance(self.reweight, bool), self.reweight
300
301        # filter data
302        if self.study != "any":
303            assert self.study in set(self.data["study"])
304            self.data = self.data.query("study == @self.study").drop(columns="study")
305        else:
306            self.data = self.data.drop(columns="study")
307
308        if self.binds != "any":
309            assert self.binds in set(self.data["binds"])
310            self.data = self.data.query("binds == @self.binds").drop(columns="binds")
311        else:
312            self.data = self.data.drop(columns="binds").drop_duplicates()
313
314        assert self.virus in set(self.data["virus"])
315        self.data = self.data.query("virus == @self.virus").drop(columns="virus")
316
317        assert self.sources.issubset(self.data["source"])
318        self.data = self.data.query("source in @self.sources").drop(columns="source")
319
320        assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"}
321        assert len(self.data) == len(self.data.drop_duplicates())
322        self.data = (
323            self.data
324            .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10))
325            .drop(columns="IC50")
326        )
327
328        max_escape_per_antibody = (
329            self.data
330            .groupby("antibody")
331            .aggregate(max_escape=pd.NamedAgg("escape", "max"))
332        )
333        assert (max_escape_per_antibody["max_escape"] == 1).all()

See main class docstring.

def escape_per_site(self, mutated_sites):
335    def escape_per_site(self, mutated_sites):
336        """Escape at each site after mutating indicated sites.
337
338        Parameters
339        ----------
340        mutated_sites : array-like of integers
341            List of mutated sites.
342
343        Returns
344        -------
345        pandas.DataFrame
346            For each site, gives the original escape and the escape
347            retained after mutations.
348
349        """
350        mutated_sites = set(mutated_sites)
351        if not mutated_sites.issubset(self.sites):
352            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
353        df = (
354            self.data
355            .assign(
356                mutated=lambda x: x['site'].isin(mutated_sites).astype(int),
357                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
358            )
359            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
360            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
361            .assign(
362                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
363                    self.mut_escape_strength
364                ),
365                weight=lambda x: (
366                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
367                    * (x["reweight"] if self.reweight else 1)
368                ),
369            )
370            [["antibody", "antibody_bind_retain", "weight"]]
371            .merge(self.data[["antibody", "site", "escape"]])
372            .assign(
373                escape=lambda x: x["escape"] * x["weight"],
374                retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"],
375            )
376            .groupby("site")
377            .aggregate(
378                original_escape=pd.NamedAgg("escape", "sum"),
379                retained_escape=pd.NamedAgg("retained_escape", "sum"),
380            )
381        ) / self.data["antibody"].nunique()
382        return df.reset_index()

Escape at each site after mutating indicated sites.

Parameters
  • mutated_sites (array-like of integers): List of mutated sites.
Returns
  • pandas.DataFrame: For each site, gives the original escape and the escape retained after mutations.
def binding_retained(self, mutated_sites):
384    def binding_retained(self, mutated_sites):
385        """Fraction binding or neutralization retained after mutating indicated sites.
386
387        Parameters
388        ----------
389        mutated_sites : array-like of integers
390            List of mutated sites, must all be in :attr:`BindingCalculator.sites`.
391
392        Returns
393        -------
394        float
395            The fraction binding retained after these mutations.
396
397        """
398        mutated_sites = set(mutated_sites)
399        if not mutated_sites.issubset(self.sites):
400            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
401        retained = (
402            self.data
403            .assign(
404                mutated=lambda x: x["site"].isin(mutated_sites).astype(int),
405                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
406            )
407            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
408            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
409            .assign(
410                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
411                    self.mut_escape_strength
412                ),
413                weight=lambda x: (
414                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
415                    * (x["reweight"] if self.reweight else 1)
416                ),
417                weighted_antibody_bind_retain=lambda x: (
418                    x["antibody_bind_retain"] * x["weight"]
419                ),
420            )
421            [["weight", "weighted_antibody_bind_retain"]]
422            .sum(axis=0)
423        )
424        return retained["weighted_antibody_bind_retain"] / retained["weight"]

Fraction binding or neutralization retained after mutating indicated sites.

Parameters
  • mutated_sites (array-like of integers): List of mutated sites, must all be in BindingCalculator.sites.
Returns
  • float: The fraction binding retained after these mutations.