escapecalculator

Calculate escape after some mutations to the SARS-CoV-2 RBD.

This Python module, is available at https://github.com/jbloomlab/SARS2-RBD-escape-calc/blob/main/escapecalculator.py

It provides a programmatic method to perform the escape calculations implemented in the interactive calculator implemented at https://jbloomlab.github.io/SARS2-RBD-escape-calc

By default, it downloads the same data used by the interactive calculator and uses the same initial parameter settings as the interactive calculator. You can override these defaults by using different arguments when initializing the EscapeCalculator.

See https://github.com/jbloomlab/SARS2-RBD-escape-calc for all code and data. The calculator is written by Jesse Bloom, and the citation is this paper .

To use the calculator, open a Python session, import this module, and initialize an EscapeCalculator:

>>> calc = EscapeCalculator()

With no mutations, there is no escape (all neutralization retained):

>>> float(calc.binding_retained([]))
1.0

But if you mutate some key antigenic sites, there will be a dramatic reduction in neutralization retained:

>>> float(calc.binding_retained([440, 505]).round(3))
0.412

If you have a whole set of sequences and have tabulated which sites are mutated, you can apply the calculator in bulk to the data frame. First, create such a data frame of sequences:

>>> import pandas as pd
>>> seqs = pd.DataFrame.from_records(
...     [("seq1", []), ("seq2", [498]), ("seq3", [440]), ("seq4", [440, 505])],
...     columns=["name", "mutated sites"],
... )
>>> seqs
   name mutated sites
0  seq1            []
1  seq2         [498]
2  seq3         [440]
3  seq4    [440, 505]

Now apply the calculator:

>>> seqs["neutralization retained"] = seqs["mutated sites"].map(calc.binding_retained)
>>> seqs.round(3)
   name mutated sites  neutralization retained
0  seq1            []                    1.000
1  seq2         [498]                    0.818
2  seq3         [440]                    0.846
3  seq4    [440, 505]                    0.412

You can also create new calculators that compute escape relative to different viruses, for instance BA.2:

>>> calc_ba2 = EscapeCalculator(virus="BA.2")

Now the escape will be different because different antibodies neutralize that virus:

>>> float(calc_ba2.binding_retained([440, 505]).round(3))
0.59
  1"""Calculate escape after some mutations to the SARS-CoV-2 RBD.
  2
  3This Python module, is available at
  4https://github.com/jbloomlab/SARS2-RBD-escape-calc/blob/main/escapecalculator.py
  5
  6It provides a programmatic method to perform the escape calculations implemented in
  7the interactive calculator implemented at
  8https://jbloomlab.github.io/SARS2-RBD-escape-calc
  9
 10By default, it downloads the same data used by the interactive calculator and uses the
 11same initial parameter settings as the interactive calculator. You can override these
 12defaults by using different arguments when initializing the :class:`EscapeCalculator`.
 13
 14See https://github.com/jbloomlab/SARS2-RBD-escape-calc for all code and data. The
 15calculator is written by Jesse Bloom, and the citation is
 16`this paper <https://doi.org/10.1093/ve/veac021>`_.
 17
 18To use the calculator, open a Python session, import this module, and initialize an
 19:class:`EscapeCalculator`:
 20
 21>>> calc = EscapeCalculator()
 22
 23With no mutations, there is no escape (all neutralization retained):
 24
 25>>> float(calc.binding_retained([]))
 261.0
 27
 28But if you mutate some key antigenic sites, there will be a dramatic reduction in
 29neutralization retained:
 30
 31>>> float(calc.binding_retained([440, 505]).round(3))
 320.412
 33
 34If you have a whole set of sequences and have tabulated which sites are mutated,
 35you can apply the calculator in bulk to the data frame.
 36First, create such a data frame of sequences:
 37
 38>>> import pandas as pd
 39>>> seqs = pd.DataFrame.from_records(
 40...     [("seq1", []), ("seq2", [498]), ("seq3", [440]), ("seq4", [440, 505])],
 41...     columns=["name", "mutated sites"],
 42... )
 43>>> seqs
 44   name mutated sites
 450  seq1            []
 461  seq2         [498]
 472  seq3         [440]
 483  seq4    [440, 505]
 49
 50Now apply the calculator:
 51
 52>>> seqs["neutralization retained"] = seqs["mutated sites"].map(calc.binding_retained)
 53>>> seqs.round(3)
 54   name mutated sites  neutralization retained
 550  seq1            []                    1.000
 561  seq2         [498]                    0.818
 572  seq3         [440]                    0.846
 583  seq4    [440, 505]                    0.412
 59
 60You can also create new calculators that compute escape relative to different viruses,
 61for instance BA.2:
 62
 63>>> calc_ba2 = EscapeCalculator(virus="BA.2")
 64
 65Now the escape will be different because different antibodies neutralize that virus:
 66
 67>>> float(calc_ba2.binding_retained([440, 505]).round(3))
 680.59
 69
 70"""
 71
 72__docformat__ = 'numpy'
 73
 74
 75import requests
 76
 77import numpy
 78
 79import pandas as pd
 80
 81import yaml
 82
 83
 84class EscapeCalculator:
 85    """Calculates residual polyclonal antibody binding after some mutations.
 86
 87    The calculator is implemented interactively at
 88    `https://jbloomlab.github.io/SARS2-RBD-escape-calc <https://jbloomlab.github.io/SARS2-RBD-escape-calc/>`_
 89
 90    By default, this command-line calculator will exactly implement the calculations of the
 91    interactive calculator with default settings. You can pass parameters to override
 92    the default settings.
 93
 94    Parameters
 95    ----------
 96    escape : str
 97        Path or URL of CSV containing the escape data.
 98    antibody_ic50s : str
 99        Path or URL of CSV containing the antibody IC50s.
100    antibody_sources : str
101        Path or URL of CSV containing the antibody sources.
102    antibody_reweighting : str
103        Path or URL of CSV containing the antibody reweightings.
104    config : str
105        Path or URL of YAML file containing initial settings.
106    mut_escape_strength : None or float
107        If not `None`, override default `init_mutation_escape_strength` in `config`.
108    weight_by_neg_log_ic50 : None or bool
109        If not `None`, override default `init_weight_by_neg_log_IC50` in `config`.
110    study : None or str
111        If not `None`, override default `init_study` in `config`.
112    virus : None or str
113        If not `None`, override default `init_virus` in `config`.
114    sources : None or dict
115        If not `None`, override default `init_sources` in `config`.
116    reweight : None or bool
117        If not `None`, override default `init_reweight` in `config`.
118
119    Example
120    -------
121
122    Initialize the calculator:
123
124    >>> calc = EscapeCalculator()
125
126    Calculate escape at sites after some mutations:
127
128    >>> sites_of_interest = [403, 440, 484, 498, 505, 510]
129    >>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3)
130         site  original_escape  retained_escape
131    70    403            0.376            0.070
132    101   440            0.336            0.019
133    145   484            0.035            0.019
134    159   498            0.313            0.103
135    166   505            0.955            0.086
136    170   510            0.009            0.003
137
138    Calculate overall neutralization retained after no mutations or some mutations:
139
140    >>> assert calc.binding_retained([]) == 1.0
141    >>> float(calc.binding_retained([440, 505]).round(3))
142    0.412
143
144    Now repeat tests with some non-default options:
145
146    >>> calc2 = EscapeCalculator(
147    ...     mut_escape_strength=1,
148    ...     weight_by_neg_log_ic50=False,
149    ...     study="Cao et al, 2022, Nature",
150    ...     virus="D614G",
151    ...     sources={"include_exclude": "include", "sources": ["WT convalescents"]},
152    ... )
153
154    >>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3)
155         site  original_escape  retained_escape
156    62    403            0.006            0.006
157    84    440            0.008            0.007
158    123   484            0.215            0.029
159    134   498            0.009            0.008
160    141   505            0.002            0.002
161    144   510            0.000            0.000
162
163    >>> float(calc2.binding_retained([484]).round(3))
164    0.785
165
166    """
167    def __init__(self,
168        escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv",
169        antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv",
170        antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv",
171        antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv",
172        config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml",
173        *,
174        mut_escape_strength=None,
175        weight_by_neg_log_ic50=None,
176        study=None,
177        virus=None,
178        sources=None,
179        reweight=None,
180    ):
181        """See main class docstring."""
182        # read input data 
183        self.escape = pd.read_csv(escape)
184        assert set(self.escape.columns) == {"antibody", "site", "escape"}
185        assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"]))
186        assert self.escape.notnull().all().all()
187        antibodies = set(self.escape["antibody"])
188
189        self.antibody_ic50s = pd.read_csv(antibody_ic50s)
190        assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"}
191        assert (
192            len(self.antibody_ic50s)
193            == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"]))
194        )
195        assert self.antibody_ic50s["IC50"].max() == 10
196        assert antibodies == set(self.antibody_ic50s["antibody"])
197
198        self.antibody_sources = pd.read_csv(antibody_sources)
199        assert set(self.antibody_sources.columns) == {"antibody", "source", "study"}
200        assert (
201            len(self.antibody_sources)
202            == len(self.antibody_sources.groupby(["antibody", "source", "study"]))
203            == len(antibodies)
204        )
205        assert antibodies == set(self.antibody_sources["antibody"])
206
207        self.antibody_reweighting = pd.read_csv(antibody_reweighting)
208        assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"}
209        assert antibodies.issuperset(self.antibody_reweighting["antibody"])
210
211        self.data = (
212            self.escape
213            .merge(self.antibody_ic50s, on="antibody")
214            .merge(self.antibody_sources, on="antibody")
215            .merge(self.antibody_reweighting, on="antibody", how="left")
216            .assign(reweight=lambda x: x["reweight"].fillna(1))
217        )
218        assert self.data.notnull().all().all()
219
220        # get initial config
221        if config.startswith("http"):
222            response = requests.get(config, allow_redirects=True)
223            config_text = response.content.decode("utf-8")
224        else:
225            with open(config) as f:
226                config_text = f.read()
227        config = yaml.safe_load(config_text)
228        self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1))
229        studies = config["studies"]
230        studies_rev = {value: key for (key, value) in studies.items()}
231        assert set(self.data["study"]) == set(studies)
232
233        if mut_escape_strength is None:
234            self.mut_escape_strength = config["init_mutation_escape_strength"]
235        else:
236            self.mut_escape_strength = mut_escape_strength
237
238        if weight_by_neg_log_ic50 is None:
239            self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"]
240        else:
241            self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50
242        assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50
243
244        if study is None:
245            self.study = config["init_study"]
246        else:
247            if study == "any" or study in studies:
248                self.study = study
249            elif study in studies_rev:
250                self.study = studies_rev[study]
251            else:
252                raise ValueError(f"invalid {study=}")
253
254        if virus is None:
255            self.virus = config["init_virus"]
256        else:
257            self.virus = virus
258
259        if sources is None:
260            sources = config["init_sources"]
261        include_exclude = sources["include_exclude"]
262        sources_list = sources["sources"]
263        self.sources = set(self.data["source"])
264        assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}"
265        if include_exclude == "exclude":
266            self.sources = self.sources - set(sources_list)
267        elif include_exclude == "include":
268            self.sources = set(sources_list)
269        else:
270            raise ValueError(f"invalid {include_exclude=} in {sources=}")
271
272        if reweight is None:
273            self.reweight = config["init_reweight"]
274        else:
275            self.reweight = reweight
276        assert isinstance(self.reweight, bool), self.reweight
277
278        # filter data
279        if self.study != "any":
280            assert self.study in set(self.data["study"])
281            self.data = self.data.query("study == @self.study").drop(columns="study")
282        else:
283            self.data = self.data.drop(columns="study")
284
285        assert self.virus in set(self.data["virus"])
286        self.data = self.data.query("virus == @self.virus").drop(columns="virus")
287
288        self.data = self.data.query("source in @self.sources").drop(columns="source")
289
290        assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"}
291        assert len(self.data) == len(self.data.drop_duplicates())
292        self.data = (
293            self.data
294            .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10))
295            .drop(columns="IC50")
296        )
297
298        max_escape_per_antibody = (
299            self.data
300            .groupby("antibody")
301            .aggregate(max_escape=pd.NamedAgg("escape", "max"))
302        )
303        assert (max_escape_per_antibody["max_escape"] == 1).all()
304
305    def escape_per_site(self, mutated_sites):
306        """Escape at each site after mutating indicated sites.
307
308        Parameters
309        ----------
310        mutated_sites : array-like of integers
311            List of mutated sites.
312
313        Returns
314        -------
315        pandas.DataFrame
316            For each site, gives the original escape and the escape
317            retained after mutations.
318
319        """
320        mutated_sites = set(mutated_sites)
321        if not mutated_sites.issubset(self.sites):
322            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
323        df = (
324            self.data
325            .assign(
326                mutated=lambda x: x['site'].isin(mutated_sites).astype(int),
327                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
328            )
329            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
330            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
331            .assign(
332                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
333                    self.mut_escape_strength
334                ),
335                weight=lambda x: (
336                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
337                    * (x["reweight"] if self.reweight else 1)
338                ),
339            )
340            [["antibody", "antibody_bind_retain", "weight"]]
341            .merge(self.data[["antibody", "site", "escape"]])
342            .assign(
343                escape=lambda x: x["escape"] * x["weight"],
344                retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"],
345            )
346            .groupby("site")
347            .aggregate(
348                original_escape=pd.NamedAgg("escape", "sum"),
349                retained_escape=pd.NamedAgg("retained_escape", "sum"),
350            )
351        ) / self.data["antibody"].nunique()
352        return df.reset_index()
353
354    def binding_retained(self, mutated_sites):
355        """Fraction binding or neutralization retained after mutating indicated sites.
356
357        Parameters
358        ----------
359        mutated_sites : array-like of integers
360            List of mutated sites, must all be in :attr:`BindingCalculator.sites`.
361
362        Returns
363        -------
364        float
365            The fraction binding retained after these mutations.
366
367        """
368        mutated_sites = set(mutated_sites)
369        if not mutated_sites.issubset(self.sites):
370            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
371        retained = (
372            self.data
373            .assign(
374                mutated=lambda x: x["site"].isin(mutated_sites).astype(int),
375                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
376            )
377            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
378            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
379            .assign(
380                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
381                    self.mut_escape_strength
382                ),
383                weight=lambda x: (
384                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
385                    * (x["reweight"] if self.reweight else 1)
386                ),
387                weighted_antibody_bind_retain=lambda x: (
388                    x["antibody_bind_retain"] * x["weight"]
389                ),
390            )
391            [["weight", "weighted_antibody_bind_retain"]]
392            .sum(axis=0)
393        )
394        return retained["weighted_antibody_bind_retain"] / retained["weight"]
395
396
397if __name__ == '__main__':
398    import doctest
399    doctest.testmod()
class EscapeCalculator:
 85class EscapeCalculator:
 86    """Calculates residual polyclonal antibody binding after some mutations.
 87
 88    The calculator is implemented interactively at
 89    `https://jbloomlab.github.io/SARS2-RBD-escape-calc <https://jbloomlab.github.io/SARS2-RBD-escape-calc/>`_
 90
 91    By default, this command-line calculator will exactly implement the calculations of the
 92    interactive calculator with default settings. You can pass parameters to override
 93    the default settings.
 94
 95    Parameters
 96    ----------
 97    escape : str
 98        Path or URL of CSV containing the escape data.
 99    antibody_ic50s : str
100        Path or URL of CSV containing the antibody IC50s.
101    antibody_sources : str
102        Path or URL of CSV containing the antibody sources.
103    antibody_reweighting : str
104        Path or URL of CSV containing the antibody reweightings.
105    config : str
106        Path or URL of YAML file containing initial settings.
107    mut_escape_strength : None or float
108        If not `None`, override default `init_mutation_escape_strength` in `config`.
109    weight_by_neg_log_ic50 : None or bool
110        If not `None`, override default `init_weight_by_neg_log_IC50` in `config`.
111    study : None or str
112        If not `None`, override default `init_study` in `config`.
113    virus : None or str
114        If not `None`, override default `init_virus` in `config`.
115    sources : None or dict
116        If not `None`, override default `init_sources` in `config`.
117    reweight : None or bool
118        If not `None`, override default `init_reweight` in `config`.
119
120    Example
121    -------
122
123    Initialize the calculator:
124
125    >>> calc = EscapeCalculator()
126
127    Calculate escape at sites after some mutations:
128
129    >>> sites_of_interest = [403, 440, 484, 498, 505, 510]
130    >>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3)
131         site  original_escape  retained_escape
132    70    403            0.376            0.070
133    101   440            0.336            0.019
134    145   484            0.035            0.019
135    159   498            0.313            0.103
136    166   505            0.955            0.086
137    170   510            0.009            0.003
138
139    Calculate overall neutralization retained after no mutations or some mutations:
140
141    >>> assert calc.binding_retained([]) == 1.0
142    >>> float(calc.binding_retained([440, 505]).round(3))
143    0.412
144
145    Now repeat tests with some non-default options:
146
147    >>> calc2 = EscapeCalculator(
148    ...     mut_escape_strength=1,
149    ...     weight_by_neg_log_ic50=False,
150    ...     study="Cao et al, 2022, Nature",
151    ...     virus="D614G",
152    ...     sources={"include_exclude": "include", "sources": ["WT convalescents"]},
153    ... )
154
155    >>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3)
156         site  original_escape  retained_escape
157    62    403            0.006            0.006
158    84    440            0.008            0.007
159    123   484            0.215            0.029
160    134   498            0.009            0.008
161    141   505            0.002            0.002
162    144   510            0.000            0.000
163
164    >>> float(calc2.binding_retained([484]).round(3))
165    0.785
166
167    """
168    def __init__(self,
169        escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv",
170        antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv",
171        antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv",
172        antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv",
173        config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml",
174        *,
175        mut_escape_strength=None,
176        weight_by_neg_log_ic50=None,
177        study=None,
178        virus=None,
179        sources=None,
180        reweight=None,
181    ):
182        """See main class docstring."""
183        # read input data 
184        self.escape = pd.read_csv(escape)
185        assert set(self.escape.columns) == {"antibody", "site", "escape"}
186        assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"]))
187        assert self.escape.notnull().all().all()
188        antibodies = set(self.escape["antibody"])
189
190        self.antibody_ic50s = pd.read_csv(antibody_ic50s)
191        assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"}
192        assert (
193            len(self.antibody_ic50s)
194            == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"]))
195        )
196        assert self.antibody_ic50s["IC50"].max() == 10
197        assert antibodies == set(self.antibody_ic50s["antibody"])
198
199        self.antibody_sources = pd.read_csv(antibody_sources)
200        assert set(self.antibody_sources.columns) == {"antibody", "source", "study"}
201        assert (
202            len(self.antibody_sources)
203            == len(self.antibody_sources.groupby(["antibody", "source", "study"]))
204            == len(antibodies)
205        )
206        assert antibodies == set(self.antibody_sources["antibody"])
207
208        self.antibody_reweighting = pd.read_csv(antibody_reweighting)
209        assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"}
210        assert antibodies.issuperset(self.antibody_reweighting["antibody"])
211
212        self.data = (
213            self.escape
214            .merge(self.antibody_ic50s, on="antibody")
215            .merge(self.antibody_sources, on="antibody")
216            .merge(self.antibody_reweighting, on="antibody", how="left")
217            .assign(reweight=lambda x: x["reweight"].fillna(1))
218        )
219        assert self.data.notnull().all().all()
220
221        # get initial config
222        if config.startswith("http"):
223            response = requests.get(config, allow_redirects=True)
224            config_text = response.content.decode("utf-8")
225        else:
226            with open(config) as f:
227                config_text = f.read()
228        config = yaml.safe_load(config_text)
229        self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1))
230        studies = config["studies"]
231        studies_rev = {value: key for (key, value) in studies.items()}
232        assert set(self.data["study"]) == set(studies)
233
234        if mut_escape_strength is None:
235            self.mut_escape_strength = config["init_mutation_escape_strength"]
236        else:
237            self.mut_escape_strength = mut_escape_strength
238
239        if weight_by_neg_log_ic50 is None:
240            self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"]
241        else:
242            self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50
243        assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50
244
245        if study is None:
246            self.study = config["init_study"]
247        else:
248            if study == "any" or study in studies:
249                self.study = study
250            elif study in studies_rev:
251                self.study = studies_rev[study]
252            else:
253                raise ValueError(f"invalid {study=}")
254
255        if virus is None:
256            self.virus = config["init_virus"]
257        else:
258            self.virus = virus
259
260        if sources is None:
261            sources = config["init_sources"]
262        include_exclude = sources["include_exclude"]
263        sources_list = sources["sources"]
264        self.sources = set(self.data["source"])
265        assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}"
266        if include_exclude == "exclude":
267            self.sources = self.sources - set(sources_list)
268        elif include_exclude == "include":
269            self.sources = set(sources_list)
270        else:
271            raise ValueError(f"invalid {include_exclude=} in {sources=}")
272
273        if reweight is None:
274            self.reweight = config["init_reweight"]
275        else:
276            self.reweight = reweight
277        assert isinstance(self.reweight, bool), self.reweight
278
279        # filter data
280        if self.study != "any":
281            assert self.study in set(self.data["study"])
282            self.data = self.data.query("study == @self.study").drop(columns="study")
283        else:
284            self.data = self.data.drop(columns="study")
285
286        assert self.virus in set(self.data["virus"])
287        self.data = self.data.query("virus == @self.virus").drop(columns="virus")
288
289        self.data = self.data.query("source in @self.sources").drop(columns="source")
290
291        assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"}
292        assert len(self.data) == len(self.data.drop_duplicates())
293        self.data = (
294            self.data
295            .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10))
296            .drop(columns="IC50")
297        )
298
299        max_escape_per_antibody = (
300            self.data
301            .groupby("antibody")
302            .aggregate(max_escape=pd.NamedAgg("escape", "max"))
303        )
304        assert (max_escape_per_antibody["max_escape"] == 1).all()
305
306    def escape_per_site(self, mutated_sites):
307        """Escape at each site after mutating indicated sites.
308
309        Parameters
310        ----------
311        mutated_sites : array-like of integers
312            List of mutated sites.
313
314        Returns
315        -------
316        pandas.DataFrame
317            For each site, gives the original escape and the escape
318            retained after mutations.
319
320        """
321        mutated_sites = set(mutated_sites)
322        if not mutated_sites.issubset(self.sites):
323            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
324        df = (
325            self.data
326            .assign(
327                mutated=lambda x: x['site'].isin(mutated_sites).astype(int),
328                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
329            )
330            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
331            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
332            .assign(
333                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
334                    self.mut_escape_strength
335                ),
336                weight=lambda x: (
337                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
338                    * (x["reweight"] if self.reweight else 1)
339                ),
340            )
341            [["antibody", "antibody_bind_retain", "weight"]]
342            .merge(self.data[["antibody", "site", "escape"]])
343            .assign(
344                escape=lambda x: x["escape"] * x["weight"],
345                retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"],
346            )
347            .groupby("site")
348            .aggregate(
349                original_escape=pd.NamedAgg("escape", "sum"),
350                retained_escape=pd.NamedAgg("retained_escape", "sum"),
351            )
352        ) / self.data["antibody"].nunique()
353        return df.reset_index()
354
355    def binding_retained(self, mutated_sites):
356        """Fraction binding or neutralization retained after mutating indicated sites.
357
358        Parameters
359        ----------
360        mutated_sites : array-like of integers
361            List of mutated sites, must all be in :attr:`BindingCalculator.sites`.
362
363        Returns
364        -------
365        float
366            The fraction binding retained after these mutations.
367
368        """
369        mutated_sites = set(mutated_sites)
370        if not mutated_sites.issubset(self.sites):
371            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
372        retained = (
373            self.data
374            .assign(
375                mutated=lambda x: x["site"].isin(mutated_sites).astype(int),
376                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
377            )
378            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
379            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
380            .assign(
381                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
382                    self.mut_escape_strength
383                ),
384                weight=lambda x: (
385                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
386                    * (x["reweight"] if self.reweight else 1)
387                ),
388                weighted_antibody_bind_retain=lambda x: (
389                    x["antibody_bind_retain"] * x["weight"]
390                ),
391            )
392            [["weight", "weighted_antibody_bind_retain"]]
393            .sum(axis=0)
394        )
395        return retained["weighted_antibody_bind_retain"] / retained["weight"]

Calculates residual polyclonal antibody binding after some mutations.

The calculator is implemented interactively at https://jbloomlab.github.io/SARS2-RBD-escape-calc

By default, this command-line calculator will exactly implement the calculations of the interactive calculator with default settings. You can pass parameters to override the default settings.

Parameters
  • escape (str): Path or URL of CSV containing the escape data.
  • antibody_ic50s (str): Path or URL of CSV containing the antibody IC50s.
  • antibody_sources (str): Path or URL of CSV containing the antibody sources.
  • antibody_reweighting (str): Path or URL of CSV containing the antibody reweightings.
  • config (str): Path or URL of YAML file containing initial settings.
  • mut_escape_strength (None or float): If not None, override default init_mutation_escape_strength in config.
  • weight_by_neg_log_ic50 (None or bool): If not None, override default init_weight_by_neg_log_IC50 in config.
  • study (None or str): If not None, override default init_study in config.
  • virus (None or str): If not None, override default init_virus in config.
  • sources (None or dict): If not None, override default init_sources in config.
  • reweight (None or bool): If not None, override default init_reweight in config.
Example

Initialize the calculator:

>>> calc = EscapeCalculator()

Calculate escape at sites after some mutations:

>>> sites_of_interest = [403, 440, 484, 498, 505, 510]
>>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3)
     site  original_escape  retained_escape
70    403            0.376            0.070
101   440            0.336            0.019
145   484            0.035            0.019
159   498            0.313            0.103
166   505            0.955            0.086
170   510            0.009            0.003

Calculate overall neutralization retained after no mutations or some mutations:

>>> assert calc.binding_retained([]) == 1.0
>>> float(calc.binding_retained([440, 505]).round(3))
0.412

Now repeat tests with some non-default options:

>>> calc2 = EscapeCalculator(
...     mut_escape_strength=1,
...     weight_by_neg_log_ic50=False,
...     study="Cao et al, 2022, Nature",
...     virus="D614G",
...     sources={"include_exclude": "include", "sources": ["WT convalescents"]},
... )
>>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3)
     site  original_escape  retained_escape
62    403            0.006            0.006
84    440            0.008            0.007
123   484            0.215            0.029
134   498            0.009            0.008
141   505            0.002            0.002
144   510            0.000            0.000
>>> float(calc2.binding_retained([484]).round(3))
0.785
EscapeCalculator( escape='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv', antibody_ic50s='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv', antibody_sources='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv', antibody_reweighting='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv', config='https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml', *, mut_escape_strength=None, weight_by_neg_log_ic50=None, study=None, virus=None, sources=None, reweight=None)
168    def __init__(self,
169        escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv",
170        antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv",
171        antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv",
172        antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv",
173        config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml",
174        *,
175        mut_escape_strength=None,
176        weight_by_neg_log_ic50=None,
177        study=None,
178        virus=None,
179        sources=None,
180        reweight=None,
181    ):
182        """See main class docstring."""
183        # read input data 
184        self.escape = pd.read_csv(escape)
185        assert set(self.escape.columns) == {"antibody", "site", "escape"}
186        assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"]))
187        assert self.escape.notnull().all().all()
188        antibodies = set(self.escape["antibody"])
189
190        self.antibody_ic50s = pd.read_csv(antibody_ic50s)
191        assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"}
192        assert (
193            len(self.antibody_ic50s)
194            == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"]))
195        )
196        assert self.antibody_ic50s["IC50"].max() == 10
197        assert antibodies == set(self.antibody_ic50s["antibody"])
198
199        self.antibody_sources = pd.read_csv(antibody_sources)
200        assert set(self.antibody_sources.columns) == {"antibody", "source", "study"}
201        assert (
202            len(self.antibody_sources)
203            == len(self.antibody_sources.groupby(["antibody", "source", "study"]))
204            == len(antibodies)
205        )
206        assert antibodies == set(self.antibody_sources["antibody"])
207
208        self.antibody_reweighting = pd.read_csv(antibody_reweighting)
209        assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"}
210        assert antibodies.issuperset(self.antibody_reweighting["antibody"])
211
212        self.data = (
213            self.escape
214            .merge(self.antibody_ic50s, on="antibody")
215            .merge(self.antibody_sources, on="antibody")
216            .merge(self.antibody_reweighting, on="antibody", how="left")
217            .assign(reweight=lambda x: x["reweight"].fillna(1))
218        )
219        assert self.data.notnull().all().all()
220
221        # get initial config
222        if config.startswith("http"):
223            response = requests.get(config, allow_redirects=True)
224            config_text = response.content.decode("utf-8")
225        else:
226            with open(config) as f:
227                config_text = f.read()
228        config = yaml.safe_load(config_text)
229        self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1))
230        studies = config["studies"]
231        studies_rev = {value: key for (key, value) in studies.items()}
232        assert set(self.data["study"]) == set(studies)
233
234        if mut_escape_strength is None:
235            self.mut_escape_strength = config["init_mutation_escape_strength"]
236        else:
237            self.mut_escape_strength = mut_escape_strength
238
239        if weight_by_neg_log_ic50 is None:
240            self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"]
241        else:
242            self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50
243        assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50
244
245        if study is None:
246            self.study = config["init_study"]
247        else:
248            if study == "any" or study in studies:
249                self.study = study
250            elif study in studies_rev:
251                self.study = studies_rev[study]
252            else:
253                raise ValueError(f"invalid {study=}")
254
255        if virus is None:
256            self.virus = config["init_virus"]
257        else:
258            self.virus = virus
259
260        if sources is None:
261            sources = config["init_sources"]
262        include_exclude = sources["include_exclude"]
263        sources_list = sources["sources"]
264        self.sources = set(self.data["source"])
265        assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}"
266        if include_exclude == "exclude":
267            self.sources = self.sources - set(sources_list)
268        elif include_exclude == "include":
269            self.sources = set(sources_list)
270        else:
271            raise ValueError(f"invalid {include_exclude=} in {sources=}")
272
273        if reweight is None:
274            self.reweight = config["init_reweight"]
275        else:
276            self.reweight = reweight
277        assert isinstance(self.reweight, bool), self.reweight
278
279        # filter data
280        if self.study != "any":
281            assert self.study in set(self.data["study"])
282            self.data = self.data.query("study == @self.study").drop(columns="study")
283        else:
284            self.data = self.data.drop(columns="study")
285
286        assert self.virus in set(self.data["virus"])
287        self.data = self.data.query("virus == @self.virus").drop(columns="virus")
288
289        self.data = self.data.query("source in @self.sources").drop(columns="source")
290
291        assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"}
292        assert len(self.data) == len(self.data.drop_duplicates())
293        self.data = (
294            self.data
295            .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10))
296            .drop(columns="IC50")
297        )
298
299        max_escape_per_antibody = (
300            self.data
301            .groupby("antibody")
302            .aggregate(max_escape=pd.NamedAgg("escape", "max"))
303        )
304        assert (max_escape_per_antibody["max_escape"] == 1).all()

See main class docstring.

def escape_per_site(self, mutated_sites):
306    def escape_per_site(self, mutated_sites):
307        """Escape at each site after mutating indicated sites.
308
309        Parameters
310        ----------
311        mutated_sites : array-like of integers
312            List of mutated sites.
313
314        Returns
315        -------
316        pandas.DataFrame
317            For each site, gives the original escape and the escape
318            retained after mutations.
319
320        """
321        mutated_sites = set(mutated_sites)
322        if not mutated_sites.issubset(self.sites):
323            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
324        df = (
325            self.data
326            .assign(
327                mutated=lambda x: x['site'].isin(mutated_sites).astype(int),
328                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
329            )
330            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
331            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
332            .assign(
333                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
334                    self.mut_escape_strength
335                ),
336                weight=lambda x: (
337                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
338                    * (x["reweight"] if self.reweight else 1)
339                ),
340            )
341            [["antibody", "antibody_bind_retain", "weight"]]
342            .merge(self.data[["antibody", "site", "escape"]])
343            .assign(
344                escape=lambda x: x["escape"] * x["weight"],
345                retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"],
346            )
347            .groupby("site")
348            .aggregate(
349                original_escape=pd.NamedAgg("escape", "sum"),
350                retained_escape=pd.NamedAgg("retained_escape", "sum"),
351            )
352        ) / self.data["antibody"].nunique()
353        return df.reset_index()

Escape at each site after mutating indicated sites.

Parameters
  • mutated_sites (array-like of integers): List of mutated sites.
Returns
  • pandas.DataFrame: For each site, gives the original escape and the escape retained after mutations.
def binding_retained(self, mutated_sites):
355    def binding_retained(self, mutated_sites):
356        """Fraction binding or neutralization retained after mutating indicated sites.
357
358        Parameters
359        ----------
360        mutated_sites : array-like of integers
361            List of mutated sites, must all be in :attr:`BindingCalculator.sites`.
362
363        Returns
364        -------
365        float
366            The fraction binding retained after these mutations.
367
368        """
369        mutated_sites = set(mutated_sites)
370        if not mutated_sites.issubset(self.sites):
371            raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}")
372        retained = (
373            self.data
374            .assign(
375                mutated=lambda x: x["site"].isin(mutated_sites).astype(int),
376                site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"],
377            )
378            .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False)
379            .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod"))
380            .assign(
381                antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow(
382                    self.mut_escape_strength
383                ),
384                weight=lambda x: (
385                    (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1)
386                    * (x["reweight"] if self.reweight else 1)
387                ),
388                weighted_antibody_bind_retain=lambda x: (
389                    x["antibody_bind_retain"] * x["weight"]
390                ),
391            )
392            [["weight", "weighted_antibody_bind_retain"]]
393            .sum(axis=0)
394        )
395        return retained["weighted_antibody_bind_retain"] / retained["weight"]

Fraction binding or neutralization retained after mutating indicated sites.

Parameters
  • mutated_sites (array-like of integers): List of mutated sites, must all be in BindingCalculator.sites.
Returns
  • float: The fraction binding retained after these mutations.