escapecalculator
Calculate escape after some mutations to the SARS-CoV-2 RBD.
This Python module, is available at https://github.com/jbloomlab/SARS2-RBD-escape-calc/blob/main/escapecalculator.py
It provides a programmatic method to perform the escape calculations implemented in the interactive calculator implemented at https://jbloomlab.github.io/SARS2-RBD-escape-calc
By default, it downloads the same data used by the interactive calculator and uses the
same initial parameter settings as the interactive calculator. You can override these
defaults by using different arguments when initializing the EscapeCalculator
.
See https://github.com/jbloomlab/SARS2-RBD-escape-calc for all code and data. The calculator is written by Jesse Bloom, and the citation is this paper .
To use the calculator, open a Python session, import this module, and initialize an
EscapeCalculator
:
>>> calc = EscapeCalculator()
With no mutations, there is no escape (all neutralization retained):
>>> calc.binding_retained([])
1.0
But if you mutate some key antigenic sites, there will be a dramatic reduction in neutralization retained:
>>> calc.binding_retained([440, 505]).round(3)
0.545
If you have a whole set of sequences and have tabulated which sites are mutated, you can apply the calculator in bulk to the data frame. First, create such a data frame of sequences:
>>> import pandas as pd
>>> seqs = pd.DataFrame.from_records(
... [("seq1", []), ("seq2", [498]), ("seq3", [440]), ("seq4", [440, 505])],
... columns=["name", "mutated sites"],
... )
>>> seqs
name mutated sites
0 seq1 []
1 seq2 [498]
2 seq3 [440]
3 seq4 [440, 505]
Now apply the calculator:
>>> seqs["neutralization retained"] = seqs["mutated sites"].map(calc.binding_retained)
>>> seqs.round(3)
name mutated sites neutralization retained
0 seq1 [] 1.000
1 seq2 [498] 0.769
2 seq3 [440] 0.790
3 seq4 [440, 505] 0.545
You can also create new calculators that compute escape relative to different viruses, for instance BA.2:
>>> calc_ba2 = EscapeCalculator(virus="BA.2")
Now the escape will be different because different antibodies neutralize that virus:
>>> calc_ba2.binding_retained([440, 505]).round(3)
0.786
1"""Calculate escape after some mutations to the SARS-CoV-2 RBD. 2 3This Python module, is available at 4https://github.com/jbloomlab/SARS2-RBD-escape-calc/blob/main/escapecalculator.py 5 6It provides a programmatic method to perform the escape calculations implemented in 7the interactive calculator implemented at 8https://jbloomlab.github.io/SARS2-RBD-escape-calc 9 10By default, it downloads the same data used by the interactive calculator and uses the 11same initial parameter settings as the interactive calculator. You can override these 12defaults by using different arguments when initializing the :class:`EscapeCalculator`. 13 14See https://github.com/jbloomlab/SARS2-RBD-escape-calc for all code and data. The 15calculator is written by Jesse Bloom, and the citation is 16`this paper <https://doi.org/10.1093/ve/veac021>`_. 17 18To use the calculator, open a Python session, import this module, and initialize an 19:class:`EscapeCalculator`: 20 21>>> calc = EscapeCalculator() 22 23With no mutations, there is no escape (all neutralization retained): 24 25>>> calc.binding_retained([]) 261.0 27 28But if you mutate some key antigenic sites, there will be a dramatic reduction in 29neutralization retained: 30 31>>> calc.binding_retained([440, 505]).round(3) 320.545 33 34If you have a whole set of sequences and have tabulated which sites are mutated, 35you can apply the calculator in bulk to the data frame. 36First, create such a data frame of sequences: 37 38>>> import pandas as pd 39>>> seqs = pd.DataFrame.from_records( 40... [("seq1", []), ("seq2", [498]), ("seq3", [440]), ("seq4", [440, 505])], 41... columns=["name", "mutated sites"], 42... ) 43>>> seqs 44 name mutated sites 450 seq1 [] 461 seq2 [498] 472 seq3 [440] 483 seq4 [440, 505] 49 50Now apply the calculator: 51 52>>> seqs["neutralization retained"] = seqs["mutated sites"].map(calc.binding_retained) 53>>> seqs.round(3) 54 name mutated sites neutralization retained 550 seq1 [] 1.000 561 seq2 [498] 0.769 572 seq3 [440] 0.790 583 seq4 [440, 505] 0.545 59 60You can also create new calculators that compute escape relative to different viruses, 61for instance BA.2: 62 63>>> calc_ba2 = EscapeCalculator(virus="BA.2") 64 65Now the escape will be different because different antibodies neutralize that virus: 66 67>>> calc_ba2.binding_retained([440, 505]).round(3) 680.786 69 70""" 71 72__docformat__ = 'numpy' 73 74 75import requests 76 77import numpy 78 79import pandas as pd 80 81import yaml 82 83 84class EscapeCalculator: 85 """Calculates residual polyclonal antibody binding after some mutations. 86 87 The calculator is implemented interactively at 88 `https://jbloomlab.github.io/SARS2-RBD-escape-calc <https://jbloomlab.github.io/SARS2-RBD-escape-calc/>`_ 89 90 By default, this command-line calculator will exactly implement the calculations of the 91 interactive calculator with default settings. You can pass parameters to override 92 the default settings. 93 94 Parameters 95 ---------- 96 escape : str 97 Path or URL of CSV containing the escape data. 98 antibody_ic50s : str 99 Path or URL of CSV containing the antibody IC50s. 100 antibody_binding : str 101 Path or URL of CSV containing the antibody binding. 102 antibody_sources : str 103 Path or URL of CSV containing the antibody sources. 104 antibody_reweighting : str 105 Path or URL of CSV containing the antibody reweightings. 106 config : str 107 Path or URL of YAML file containing initial settings. 108 mut_escape_strength : None or float 109 If not `None`, override default `init_mutation_escape_strength` in `config`. 110 weight_by_neg_log_ic50 : None or bool 111 If not `None`, override default `init_weight_by_neg_log_IC50` in `config`. 112 study : None or str 113 If not `None`, override default `init_study` in `config`. 114 binds : None or str 115 If not `None`, override default `init_binds` in `config`. 116 virus : None or str 117 If not `None`, override default `init_virus` in `config`. 118 sources : None or dict 119 If not `None`, override default `init_sources` in `config`. 120 reweight : None or bool 121 If not `None`, override default `init_reweight` in `config`. 122 123 Example 124 ------- 125 126 Initialize the calculator: 127 128 >>> calc = EscapeCalculator() 129 130 Calculate escape at sites after some mutations: 131 132 >>> sites_of_interest = [403, 440, 484, 498, 505, 510] 133 >>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3) 134 site original_escape retained_escape 135 69 403 0.105 0.030 136 101 440 0.162 0.018 137 143 484 0.043 0.032 138 156 498 0.169 0.076 139 163 505 0.208 0.022 140 167 510 0.001 0.001 141 142 Calculate overall neutralization retained after no mutations or some mutations: 143 144 >>> calc.binding_retained([]) 145 1.0 146 >>> calc.binding_retained([440, 505]).round(3) 147 0.545 148 149 Now repeat tests with some non-default options: 150 151 >>> calc2 = EscapeCalculator( 152 ... mut_escape_strength=1, 153 ... weight_by_neg_log_ic50=False, 154 ... study="Cao et al, 2022, Nature", 155 ... binds="Wuhan-Hu-1", 156 ... virus="D614G", 157 ... sources={"include_exclude": "include", "sources": ["WT convalescents"]}, 158 ... ) 159 160 >>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3) 161 site original_escape retained_escape 162 62 403 0.006 0.006 163 84 440 0.008 0.007 164 123 484 0.212 0.029 165 134 498 0.009 0.008 166 141 505 0.002 0.002 167 144 510 0.000 0.000 168 169 >>> calc2.binding_retained([484]).round(3) 170 0.788 171 172 """ 173 def __init__(self, 174 escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv", 175 antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv", 176 antibody_binding="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_binding.csv", 177 antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv", 178 antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv", 179 config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml", 180 *, 181 mut_escape_strength=None, 182 weight_by_neg_log_ic50=None, 183 study=None, 184 binds=None, 185 virus=None, 186 sources=None, 187 reweight=None, 188 ): 189 """See main class docstring.""" 190 # read input data 191 self.escape = pd.read_csv(escape) 192 assert set(self.escape.columns) == {"antibody", "site", "escape"} 193 assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"])) 194 assert self.escape.notnull().all().all() 195 antibodies = set(self.escape["antibody"]) 196 197 self.antibody_ic50s = pd.read_csv(antibody_ic50s) 198 assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"} 199 assert ( 200 len(self.antibody_ic50s) 201 == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"])) 202 ) 203 assert self.antibody_ic50s["IC50"].max() == 10 204 assert antibodies == set(self.antibody_ic50s["antibody"]) 205 206 self.antibody_binding = pd.read_csv(antibody_binding) 207 assert set(self.antibody_binding.columns) == {"antibody", "binds"} 208 assert ( 209 len(self.antibody_binding) 210 == len(self.antibody_binding.groupby(["antibody", "binds"])) 211 ) 212 assert antibodies == set(self.antibody_binding["antibody"]) 213 214 self.antibody_sources = pd.read_csv(antibody_sources) 215 assert set(self.antibody_sources.columns) == {"antibody", "source", "study"} 216 assert ( 217 len(self.antibody_sources) 218 == len(self.antibody_sources.groupby(["antibody", "source", "study"])) 219 == len(antibodies) 220 ) 221 assert antibodies == set(self.antibody_sources["antibody"]) 222 223 self.antibody_reweighting = pd.read_csv(antibody_reweighting) 224 assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"} 225 assert antibodies.issuperset(self.antibody_reweighting["antibody"]) 226 227 self.data = ( 228 self.escape 229 .merge(self.antibody_ic50s, on="antibody") 230 .merge(self.antibody_binding, on="antibody") 231 .merge(self.antibody_sources, on="antibody") 232 .merge(self.antibody_reweighting, on="antibody", how="left") 233 .assign(reweight=lambda x: x["reweight"].fillna(1)) 234 ) 235 assert self.data.notnull().all().all() 236 237 # get initial config 238 if config.startswith("http"): 239 response = requests.get(config, allow_redirects=True) 240 config_text = response.content.decode("utf-8") 241 else: 242 with open(config) as f: 243 config_text = f.read() 244 config = yaml.safe_load(config_text) 245 self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1)) 246 studies = config["studies"] 247 studies_rev = {value: key for (key, value) in studies.items()} 248 assert set(self.data["study"]) == set(studies) 249 250 if mut_escape_strength is None: 251 self.mut_escape_strength = config["init_mutation_escape_strength"] 252 else: 253 self.mut_escape_strength = mut_escape_strength 254 255 if weight_by_neg_log_ic50 is None: 256 self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"] 257 else: 258 self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50 259 assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50 260 261 if study is None: 262 self.study = config["init_study"] 263 else: 264 if study == "any" or study in studies: 265 self.study = study 266 elif study in studies_rev: 267 self.study = studies_rev[study] 268 else: 269 raise ValueError(f"invalid {study=}") 270 271 if binds is None: 272 self.binds = config["init_binds"] 273 else: 274 self.binds = binds 275 276 if virus is None: 277 self.virus = config["init_virus"] 278 else: 279 self.virus = virus 280 281 if sources is None: 282 sources = config["init_sources"] 283 include_exclude = sources["include_exclude"] 284 sources_list = sources["sources"] 285 self.sources = set(self.data["source"]) 286 assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}" 287 if include_exclude == "exclude": 288 self.sources = self.sources - set(sources_list) 289 elif include_exclude == "include": 290 self.sources = set(sources_list) 291 else: 292 raise ValueError(f"invalid {include_exclude=} in {sources=}") 293 294 if reweight is None: 295 self.reweight = config["init_reweight"] 296 else: 297 self.reweight = reweight 298 assert isinstance(self.reweight, bool), self.reweight 299 300 # filter data 301 if self.study != "any": 302 assert self.study in set(self.data["study"]) 303 self.data = self.data.query("study == @self.study").drop(columns="study") 304 else: 305 self.data = self.data.drop(columns="study") 306 307 if self.binds != "any": 308 assert self.binds in set(self.data["binds"]) 309 self.data = self.data.query("binds == @self.binds").drop(columns="binds") 310 else: 311 self.data = self.data.drop(columns="binds").drop_duplicates() 312 313 assert self.virus in set(self.data["virus"]) 314 self.data = self.data.query("virus == @self.virus").drop(columns="virus") 315 316 assert self.sources.issubset(self.data["source"]) 317 self.data = self.data.query("source in @self.sources").drop(columns="source") 318 319 assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"} 320 assert len(self.data) == len(self.data.drop_duplicates()) 321 self.data = ( 322 self.data 323 .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10)) 324 .drop(columns="IC50") 325 ) 326 327 max_escape_per_antibody = ( 328 self.data 329 .groupby("antibody") 330 .aggregate(max_escape=pd.NamedAgg("escape", "max")) 331 ) 332 assert (max_escape_per_antibody["max_escape"] == 1).all() 333 334 def escape_per_site(self, mutated_sites): 335 """Escape at each site after mutating indicated sites. 336 337 Parameters 338 ---------- 339 mutated_sites : array-like of integers 340 List of mutated sites. 341 342 Returns 343 ------- 344 pandas.DataFrame 345 For each site, gives the original escape and the escape 346 retained after mutations. 347 348 """ 349 mutated_sites = set(mutated_sites) 350 if not mutated_sites.issubset(self.sites): 351 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 352 df = ( 353 self.data 354 .assign( 355 mutated=lambda x: x['site'].isin(mutated_sites).astype(int), 356 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 357 ) 358 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 359 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 360 .assign( 361 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 362 self.mut_escape_strength 363 ), 364 weight=lambda x: ( 365 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 366 * (x["reweight"] if self.reweight else 1) 367 ), 368 ) 369 [["antibody", "antibody_bind_retain", "weight"]] 370 .merge(self.data[["antibody", "site", "escape"]]) 371 .assign( 372 escape=lambda x: x["escape"] * x["weight"], 373 retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"], 374 ) 375 .groupby("site") 376 .aggregate( 377 original_escape=pd.NamedAgg("escape", "sum"), 378 retained_escape=pd.NamedAgg("retained_escape", "sum"), 379 ) 380 ) / self.data["antibody"].nunique() 381 return df.reset_index() 382 383 def binding_retained(self, mutated_sites): 384 """Fraction binding or neutralization retained after mutating indicated sites. 385 386 Parameters 387 ---------- 388 mutated_sites : array-like of integers 389 List of mutated sites, must all be in :attr:`BindingCalculator.sites`. 390 391 Returns 392 ------- 393 float 394 The fraction binding retained after these mutations. 395 396 """ 397 mutated_sites = set(mutated_sites) 398 if not mutated_sites.issubset(self.sites): 399 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 400 retained = ( 401 self.data 402 .assign( 403 mutated=lambda x: x["site"].isin(mutated_sites).astype(int), 404 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 405 ) 406 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 407 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 408 .assign( 409 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 410 self.mut_escape_strength 411 ), 412 weight=lambda x: ( 413 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 414 * (x["reweight"] if self.reweight else 1) 415 ), 416 weighted_antibody_bind_retain=lambda x: ( 417 x["antibody_bind_retain"] * x["weight"] 418 ), 419 ) 420 [["weight", "weighted_antibody_bind_retain"]] 421 .sum(axis=0) 422 ) 423 return retained["weighted_antibody_bind_retain"] / retained["weight"] 424 425 426if __name__ == '__main__': 427 import doctest 428 doctest.testmod()
85class EscapeCalculator: 86 """Calculates residual polyclonal antibody binding after some mutations. 87 88 The calculator is implemented interactively at 89 `https://jbloomlab.github.io/SARS2-RBD-escape-calc <https://jbloomlab.github.io/SARS2-RBD-escape-calc/>`_ 90 91 By default, this command-line calculator will exactly implement the calculations of the 92 interactive calculator with default settings. You can pass parameters to override 93 the default settings. 94 95 Parameters 96 ---------- 97 escape : str 98 Path or URL of CSV containing the escape data. 99 antibody_ic50s : str 100 Path or URL of CSV containing the antibody IC50s. 101 antibody_binding : str 102 Path or URL of CSV containing the antibody binding. 103 antibody_sources : str 104 Path or URL of CSV containing the antibody sources. 105 antibody_reweighting : str 106 Path or URL of CSV containing the antibody reweightings. 107 config : str 108 Path or URL of YAML file containing initial settings. 109 mut_escape_strength : None or float 110 If not `None`, override default `init_mutation_escape_strength` in `config`. 111 weight_by_neg_log_ic50 : None or bool 112 If not `None`, override default `init_weight_by_neg_log_IC50` in `config`. 113 study : None or str 114 If not `None`, override default `init_study` in `config`. 115 binds : None or str 116 If not `None`, override default `init_binds` in `config`. 117 virus : None or str 118 If not `None`, override default `init_virus` in `config`. 119 sources : None or dict 120 If not `None`, override default `init_sources` in `config`. 121 reweight : None or bool 122 If not `None`, override default `init_reweight` in `config`. 123 124 Example 125 ------- 126 127 Initialize the calculator: 128 129 >>> calc = EscapeCalculator() 130 131 Calculate escape at sites after some mutations: 132 133 >>> sites_of_interest = [403, 440, 484, 498, 505, 510] 134 >>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3) 135 site original_escape retained_escape 136 69 403 0.105 0.030 137 101 440 0.162 0.018 138 143 484 0.043 0.032 139 156 498 0.169 0.076 140 163 505 0.208 0.022 141 167 510 0.001 0.001 142 143 Calculate overall neutralization retained after no mutations or some mutations: 144 145 >>> calc.binding_retained([]) 146 1.0 147 >>> calc.binding_retained([440, 505]).round(3) 148 0.545 149 150 Now repeat tests with some non-default options: 151 152 >>> calc2 = EscapeCalculator( 153 ... mut_escape_strength=1, 154 ... weight_by_neg_log_ic50=False, 155 ... study="Cao et al, 2022, Nature", 156 ... binds="Wuhan-Hu-1", 157 ... virus="D614G", 158 ... sources={"include_exclude": "include", "sources": ["WT convalescents"]}, 159 ... ) 160 161 >>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3) 162 site original_escape retained_escape 163 62 403 0.006 0.006 164 84 440 0.008 0.007 165 123 484 0.212 0.029 166 134 498 0.009 0.008 167 141 505 0.002 0.002 168 144 510 0.000 0.000 169 170 >>> calc2.binding_retained([484]).round(3) 171 0.788 172 173 """ 174 def __init__(self, 175 escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv", 176 antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv", 177 antibody_binding="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_binding.csv", 178 antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv", 179 antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv", 180 config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml", 181 *, 182 mut_escape_strength=None, 183 weight_by_neg_log_ic50=None, 184 study=None, 185 binds=None, 186 virus=None, 187 sources=None, 188 reweight=None, 189 ): 190 """See main class docstring.""" 191 # read input data 192 self.escape = pd.read_csv(escape) 193 assert set(self.escape.columns) == {"antibody", "site", "escape"} 194 assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"])) 195 assert self.escape.notnull().all().all() 196 antibodies = set(self.escape["antibody"]) 197 198 self.antibody_ic50s = pd.read_csv(antibody_ic50s) 199 assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"} 200 assert ( 201 len(self.antibody_ic50s) 202 == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"])) 203 ) 204 assert self.antibody_ic50s["IC50"].max() == 10 205 assert antibodies == set(self.antibody_ic50s["antibody"]) 206 207 self.antibody_binding = pd.read_csv(antibody_binding) 208 assert set(self.antibody_binding.columns) == {"antibody", "binds"} 209 assert ( 210 len(self.antibody_binding) 211 == len(self.antibody_binding.groupby(["antibody", "binds"])) 212 ) 213 assert antibodies == set(self.antibody_binding["antibody"]) 214 215 self.antibody_sources = pd.read_csv(antibody_sources) 216 assert set(self.antibody_sources.columns) == {"antibody", "source", "study"} 217 assert ( 218 len(self.antibody_sources) 219 == len(self.antibody_sources.groupby(["antibody", "source", "study"])) 220 == len(antibodies) 221 ) 222 assert antibodies == set(self.antibody_sources["antibody"]) 223 224 self.antibody_reweighting = pd.read_csv(antibody_reweighting) 225 assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"} 226 assert antibodies.issuperset(self.antibody_reweighting["antibody"]) 227 228 self.data = ( 229 self.escape 230 .merge(self.antibody_ic50s, on="antibody") 231 .merge(self.antibody_binding, on="antibody") 232 .merge(self.antibody_sources, on="antibody") 233 .merge(self.antibody_reweighting, on="antibody", how="left") 234 .assign(reweight=lambda x: x["reweight"].fillna(1)) 235 ) 236 assert self.data.notnull().all().all() 237 238 # get initial config 239 if config.startswith("http"): 240 response = requests.get(config, allow_redirects=True) 241 config_text = response.content.decode("utf-8") 242 else: 243 with open(config) as f: 244 config_text = f.read() 245 config = yaml.safe_load(config_text) 246 self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1)) 247 studies = config["studies"] 248 studies_rev = {value: key for (key, value) in studies.items()} 249 assert set(self.data["study"]) == set(studies) 250 251 if mut_escape_strength is None: 252 self.mut_escape_strength = config["init_mutation_escape_strength"] 253 else: 254 self.mut_escape_strength = mut_escape_strength 255 256 if weight_by_neg_log_ic50 is None: 257 self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"] 258 else: 259 self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50 260 assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50 261 262 if study is None: 263 self.study = config["init_study"] 264 else: 265 if study == "any" or study in studies: 266 self.study = study 267 elif study in studies_rev: 268 self.study = studies_rev[study] 269 else: 270 raise ValueError(f"invalid {study=}") 271 272 if binds is None: 273 self.binds = config["init_binds"] 274 else: 275 self.binds = binds 276 277 if virus is None: 278 self.virus = config["init_virus"] 279 else: 280 self.virus = virus 281 282 if sources is None: 283 sources = config["init_sources"] 284 include_exclude = sources["include_exclude"] 285 sources_list = sources["sources"] 286 self.sources = set(self.data["source"]) 287 assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}" 288 if include_exclude == "exclude": 289 self.sources = self.sources - set(sources_list) 290 elif include_exclude == "include": 291 self.sources = set(sources_list) 292 else: 293 raise ValueError(f"invalid {include_exclude=} in {sources=}") 294 295 if reweight is None: 296 self.reweight = config["init_reweight"] 297 else: 298 self.reweight = reweight 299 assert isinstance(self.reweight, bool), self.reweight 300 301 # filter data 302 if self.study != "any": 303 assert self.study in set(self.data["study"]) 304 self.data = self.data.query("study == @self.study").drop(columns="study") 305 else: 306 self.data = self.data.drop(columns="study") 307 308 if self.binds != "any": 309 assert self.binds in set(self.data["binds"]) 310 self.data = self.data.query("binds == @self.binds").drop(columns="binds") 311 else: 312 self.data = self.data.drop(columns="binds").drop_duplicates() 313 314 assert self.virus in set(self.data["virus"]) 315 self.data = self.data.query("virus == @self.virus").drop(columns="virus") 316 317 assert self.sources.issubset(self.data["source"]) 318 self.data = self.data.query("source in @self.sources").drop(columns="source") 319 320 assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"} 321 assert len(self.data) == len(self.data.drop_duplicates()) 322 self.data = ( 323 self.data 324 .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10)) 325 .drop(columns="IC50") 326 ) 327 328 max_escape_per_antibody = ( 329 self.data 330 .groupby("antibody") 331 .aggregate(max_escape=pd.NamedAgg("escape", "max")) 332 ) 333 assert (max_escape_per_antibody["max_escape"] == 1).all() 334 335 def escape_per_site(self, mutated_sites): 336 """Escape at each site after mutating indicated sites. 337 338 Parameters 339 ---------- 340 mutated_sites : array-like of integers 341 List of mutated sites. 342 343 Returns 344 ------- 345 pandas.DataFrame 346 For each site, gives the original escape and the escape 347 retained after mutations. 348 349 """ 350 mutated_sites = set(mutated_sites) 351 if not mutated_sites.issubset(self.sites): 352 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 353 df = ( 354 self.data 355 .assign( 356 mutated=lambda x: x['site'].isin(mutated_sites).astype(int), 357 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 358 ) 359 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 360 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 361 .assign( 362 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 363 self.mut_escape_strength 364 ), 365 weight=lambda x: ( 366 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 367 * (x["reweight"] if self.reweight else 1) 368 ), 369 ) 370 [["antibody", "antibody_bind_retain", "weight"]] 371 .merge(self.data[["antibody", "site", "escape"]]) 372 .assign( 373 escape=lambda x: x["escape"] * x["weight"], 374 retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"], 375 ) 376 .groupby("site") 377 .aggregate( 378 original_escape=pd.NamedAgg("escape", "sum"), 379 retained_escape=pd.NamedAgg("retained_escape", "sum"), 380 ) 381 ) / self.data["antibody"].nunique() 382 return df.reset_index() 383 384 def binding_retained(self, mutated_sites): 385 """Fraction binding or neutralization retained after mutating indicated sites. 386 387 Parameters 388 ---------- 389 mutated_sites : array-like of integers 390 List of mutated sites, must all be in :attr:`BindingCalculator.sites`. 391 392 Returns 393 ------- 394 float 395 The fraction binding retained after these mutations. 396 397 """ 398 mutated_sites = set(mutated_sites) 399 if not mutated_sites.issubset(self.sites): 400 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 401 retained = ( 402 self.data 403 .assign( 404 mutated=lambda x: x["site"].isin(mutated_sites).astype(int), 405 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 406 ) 407 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 408 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 409 .assign( 410 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 411 self.mut_escape_strength 412 ), 413 weight=lambda x: ( 414 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 415 * (x["reweight"] if self.reweight else 1) 416 ), 417 weighted_antibody_bind_retain=lambda x: ( 418 x["antibody_bind_retain"] * x["weight"] 419 ), 420 ) 421 [["weight", "weighted_antibody_bind_retain"]] 422 .sum(axis=0) 423 ) 424 return retained["weighted_antibody_bind_retain"] / retained["weight"]
Calculates residual polyclonal antibody binding after some mutations.
The calculator is implemented interactively at https://jbloomlab.github.io/SARS2-RBD-escape-calc
By default, this command-line calculator will exactly implement the calculations of the interactive calculator with default settings. You can pass parameters to override the default settings.
Parameters
- escape (str): Path or URL of CSV containing the escape data.
- antibody_ic50s (str): Path or URL of CSV containing the antibody IC50s.
- antibody_binding (str): Path or URL of CSV containing the antibody binding.
- antibody_sources (str): Path or URL of CSV containing the antibody sources.
- antibody_reweighting (str): Path or URL of CSV containing the antibody reweightings.
- config (str): Path or URL of YAML file containing initial settings.
- mut_escape_strength (None or float):
If not
None
, override defaultinit_mutation_escape_strength
inconfig
. - weight_by_neg_log_ic50 (None or bool):
If not
None
, override defaultinit_weight_by_neg_log_IC50
inconfig
. - study (None or str):
If not
None
, override defaultinit_study
inconfig
. - binds (None or str):
If not
None
, override defaultinit_binds
inconfig
. - virus (None or str):
If not
None
, override defaultinit_virus
inconfig
. - sources (None or dict):
If not
None
, override defaultinit_sources
inconfig
. - reweight (None or bool):
If not
None
, override defaultinit_reweight
inconfig
.
Example
Initialize the calculator:
>>> calc = EscapeCalculator()
Calculate escape at sites after some mutations:
>>> sites_of_interest = [403, 440, 484, 498, 505, 510]
>>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3)
site original_escape retained_escape
69 403 0.105 0.030
101 440 0.162 0.018
143 484 0.043 0.032
156 498 0.169 0.076
163 505 0.208 0.022
167 510 0.001 0.001
Calculate overall neutralization retained after no mutations or some mutations:
>>> calc.binding_retained([])
1.0
>>> calc.binding_retained([440, 505]).round(3)
0.545
Now repeat tests with some non-default options:
>>> calc2 = EscapeCalculator(
... mut_escape_strength=1,
... weight_by_neg_log_ic50=False,
... study="Cao et al, 2022, Nature",
... binds="Wuhan-Hu-1",
... virus="D614G",
... sources={"include_exclude": "include", "sources": ["WT convalescents"]},
... )
>>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3)
site original_escape retained_escape
62 403 0.006 0.006
84 440 0.008 0.007
123 484 0.212 0.029
134 498 0.009 0.008
141 505 0.002 0.002
144 510 0.000 0.000
>>> calc2.binding_retained([484]).round(3)
0.788
174 def __init__(self, 175 escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv", 176 antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv", 177 antibody_binding="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_binding.csv", 178 antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv", 179 antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv", 180 config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml", 181 *, 182 mut_escape_strength=None, 183 weight_by_neg_log_ic50=None, 184 study=None, 185 binds=None, 186 virus=None, 187 sources=None, 188 reweight=None, 189 ): 190 """See main class docstring.""" 191 # read input data 192 self.escape = pd.read_csv(escape) 193 assert set(self.escape.columns) == {"antibody", "site", "escape"} 194 assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"])) 195 assert self.escape.notnull().all().all() 196 antibodies = set(self.escape["antibody"]) 197 198 self.antibody_ic50s = pd.read_csv(antibody_ic50s) 199 assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"} 200 assert ( 201 len(self.antibody_ic50s) 202 == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"])) 203 ) 204 assert self.antibody_ic50s["IC50"].max() == 10 205 assert antibodies == set(self.antibody_ic50s["antibody"]) 206 207 self.antibody_binding = pd.read_csv(antibody_binding) 208 assert set(self.antibody_binding.columns) == {"antibody", "binds"} 209 assert ( 210 len(self.antibody_binding) 211 == len(self.antibody_binding.groupby(["antibody", "binds"])) 212 ) 213 assert antibodies == set(self.antibody_binding["antibody"]) 214 215 self.antibody_sources = pd.read_csv(antibody_sources) 216 assert set(self.antibody_sources.columns) == {"antibody", "source", "study"} 217 assert ( 218 len(self.antibody_sources) 219 == len(self.antibody_sources.groupby(["antibody", "source", "study"])) 220 == len(antibodies) 221 ) 222 assert antibodies == set(self.antibody_sources["antibody"]) 223 224 self.antibody_reweighting = pd.read_csv(antibody_reweighting) 225 assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"} 226 assert antibodies.issuperset(self.antibody_reweighting["antibody"]) 227 228 self.data = ( 229 self.escape 230 .merge(self.antibody_ic50s, on="antibody") 231 .merge(self.antibody_binding, on="antibody") 232 .merge(self.antibody_sources, on="antibody") 233 .merge(self.antibody_reweighting, on="antibody", how="left") 234 .assign(reweight=lambda x: x["reweight"].fillna(1)) 235 ) 236 assert self.data.notnull().all().all() 237 238 # get initial config 239 if config.startswith("http"): 240 response = requests.get(config, allow_redirects=True) 241 config_text = response.content.decode("utf-8") 242 else: 243 with open(config) as f: 244 config_text = f.read() 245 config = yaml.safe_load(config_text) 246 self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1)) 247 studies = config["studies"] 248 studies_rev = {value: key for (key, value) in studies.items()} 249 assert set(self.data["study"]) == set(studies) 250 251 if mut_escape_strength is None: 252 self.mut_escape_strength = config["init_mutation_escape_strength"] 253 else: 254 self.mut_escape_strength = mut_escape_strength 255 256 if weight_by_neg_log_ic50 is None: 257 self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"] 258 else: 259 self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50 260 assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50 261 262 if study is None: 263 self.study = config["init_study"] 264 else: 265 if study == "any" or study in studies: 266 self.study = study 267 elif study in studies_rev: 268 self.study = studies_rev[study] 269 else: 270 raise ValueError(f"invalid {study=}") 271 272 if binds is None: 273 self.binds = config["init_binds"] 274 else: 275 self.binds = binds 276 277 if virus is None: 278 self.virus = config["init_virus"] 279 else: 280 self.virus = virus 281 282 if sources is None: 283 sources = config["init_sources"] 284 include_exclude = sources["include_exclude"] 285 sources_list = sources["sources"] 286 self.sources = set(self.data["source"]) 287 assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}" 288 if include_exclude == "exclude": 289 self.sources = self.sources - set(sources_list) 290 elif include_exclude == "include": 291 self.sources = set(sources_list) 292 else: 293 raise ValueError(f"invalid {include_exclude=} in {sources=}") 294 295 if reweight is None: 296 self.reweight = config["init_reweight"] 297 else: 298 self.reweight = reweight 299 assert isinstance(self.reweight, bool), self.reweight 300 301 # filter data 302 if self.study != "any": 303 assert self.study in set(self.data["study"]) 304 self.data = self.data.query("study == @self.study").drop(columns="study") 305 else: 306 self.data = self.data.drop(columns="study") 307 308 if self.binds != "any": 309 assert self.binds in set(self.data["binds"]) 310 self.data = self.data.query("binds == @self.binds").drop(columns="binds") 311 else: 312 self.data = self.data.drop(columns="binds").drop_duplicates() 313 314 assert self.virus in set(self.data["virus"]) 315 self.data = self.data.query("virus == @self.virus").drop(columns="virus") 316 317 assert self.sources.issubset(self.data["source"]) 318 self.data = self.data.query("source in @self.sources").drop(columns="source") 319 320 assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"} 321 assert len(self.data) == len(self.data.drop_duplicates()) 322 self.data = ( 323 self.data 324 .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10)) 325 .drop(columns="IC50") 326 ) 327 328 max_escape_per_antibody = ( 329 self.data 330 .groupby("antibody") 331 .aggregate(max_escape=pd.NamedAgg("escape", "max")) 332 ) 333 assert (max_escape_per_antibody["max_escape"] == 1).all()
See main class docstring.
335 def escape_per_site(self, mutated_sites): 336 """Escape at each site after mutating indicated sites. 337 338 Parameters 339 ---------- 340 mutated_sites : array-like of integers 341 List of mutated sites. 342 343 Returns 344 ------- 345 pandas.DataFrame 346 For each site, gives the original escape and the escape 347 retained after mutations. 348 349 """ 350 mutated_sites = set(mutated_sites) 351 if not mutated_sites.issubset(self.sites): 352 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 353 df = ( 354 self.data 355 .assign( 356 mutated=lambda x: x['site'].isin(mutated_sites).astype(int), 357 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 358 ) 359 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 360 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 361 .assign( 362 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 363 self.mut_escape_strength 364 ), 365 weight=lambda x: ( 366 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 367 * (x["reweight"] if self.reweight else 1) 368 ), 369 ) 370 [["antibody", "antibody_bind_retain", "weight"]] 371 .merge(self.data[["antibody", "site", "escape"]]) 372 .assign( 373 escape=lambda x: x["escape"] * x["weight"], 374 retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"], 375 ) 376 .groupby("site") 377 .aggregate( 378 original_escape=pd.NamedAgg("escape", "sum"), 379 retained_escape=pd.NamedAgg("retained_escape", "sum"), 380 ) 381 ) / self.data["antibody"].nunique() 382 return df.reset_index()
Escape at each site after mutating indicated sites.
Parameters
- mutated_sites (array-like of integers): List of mutated sites.
Returns
- pandas.DataFrame: For each site, gives the original escape and the escape retained after mutations.
384 def binding_retained(self, mutated_sites): 385 """Fraction binding or neutralization retained after mutating indicated sites. 386 387 Parameters 388 ---------- 389 mutated_sites : array-like of integers 390 List of mutated sites, must all be in :attr:`BindingCalculator.sites`. 391 392 Returns 393 ------- 394 float 395 The fraction binding retained after these mutations. 396 397 """ 398 mutated_sites = set(mutated_sites) 399 if not mutated_sites.issubset(self.sites): 400 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 401 retained = ( 402 self.data 403 .assign( 404 mutated=lambda x: x["site"].isin(mutated_sites).astype(int), 405 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 406 ) 407 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 408 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 409 .assign( 410 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 411 self.mut_escape_strength 412 ), 413 weight=lambda x: ( 414 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 415 * (x["reweight"] if self.reweight else 1) 416 ), 417 weighted_antibody_bind_retain=lambda x: ( 418 x["antibody_bind_retain"] * x["weight"] 419 ), 420 ) 421 [["weight", "weighted_antibody_bind_retain"]] 422 .sum(axis=0) 423 ) 424 return retained["weighted_antibody_bind_retain"] / retained["weight"]
Fraction binding or neutralization retained after mutating indicated sites.
Parameters
- mutated_sites (array-like of integers):
List of mutated sites, must all be in
BindingCalculator.sites
.
Returns
- float: The fraction binding retained after these mutations.