escapecalculator
Calculate escape after some mutations to the SARS-CoV-2 RBD.
This Python module, is available at https://github.com/jbloomlab/SARS2-RBD-escape-calc/blob/main/escapecalculator.py
It provides a programmatic method to perform the escape calculations implemented in the interactive calculator implemented at https://jbloomlab.github.io/SARS2-RBD-escape-calc
By default, it downloads the same data used by the interactive calculator and uses the
same initial parameter settings as the interactive calculator. You can override these
defaults by using different arguments when initializing the EscapeCalculator
.
See https://github.com/jbloomlab/SARS2-RBD-escape-calc for all code and data. The calculator is written by Jesse Bloom, and the citation is this paper .
To use the calculator, open a Python session, import this module, and initialize an
EscapeCalculator
:
>>> calc = EscapeCalculator()
With no mutations, there is no escape (all neutralization retained):
>>> float(calc.binding_retained([]))
1.0
But if you mutate some key antigenic sites, there will be a dramatic reduction in neutralization retained:
>>> float(calc.binding_retained([440, 505]).round(3))
0.412
If you have a whole set of sequences and have tabulated which sites are mutated, you can apply the calculator in bulk to the data frame. First, create such a data frame of sequences:
>>> import pandas as pd
>>> seqs = pd.DataFrame.from_records(
... [("seq1", []), ("seq2", [498]), ("seq3", [440]), ("seq4", [440, 505])],
... columns=["name", "mutated sites"],
... )
>>> seqs
name mutated sites
0 seq1 []
1 seq2 [498]
2 seq3 [440]
3 seq4 [440, 505]
Now apply the calculator:
>>> seqs["neutralization retained"] = seqs["mutated sites"].map(calc.binding_retained)
>>> seqs.round(3)
name mutated sites neutralization retained
0 seq1 [] 1.000
1 seq2 [498] 0.818
2 seq3 [440] 0.846
3 seq4 [440, 505] 0.412
You can also create new calculators that compute escape relative to different viruses, for instance BA.2:
>>> calc_ba2 = EscapeCalculator(virus="BA.2")
Now the escape will be different because different antibodies neutralize that virus:
>>> float(calc_ba2.binding_retained([440, 505]).round(3))
0.59
1"""Calculate escape after some mutations to the SARS-CoV-2 RBD. 2 3This Python module, is available at 4https://github.com/jbloomlab/SARS2-RBD-escape-calc/blob/main/escapecalculator.py 5 6It provides a programmatic method to perform the escape calculations implemented in 7the interactive calculator implemented at 8https://jbloomlab.github.io/SARS2-RBD-escape-calc 9 10By default, it downloads the same data used by the interactive calculator and uses the 11same initial parameter settings as the interactive calculator. You can override these 12defaults by using different arguments when initializing the :class:`EscapeCalculator`. 13 14See https://github.com/jbloomlab/SARS2-RBD-escape-calc for all code and data. The 15calculator is written by Jesse Bloom, and the citation is 16`this paper <https://doi.org/10.1093/ve/veac021>`_. 17 18To use the calculator, open a Python session, import this module, and initialize an 19:class:`EscapeCalculator`: 20 21>>> calc = EscapeCalculator() 22 23With no mutations, there is no escape (all neutralization retained): 24 25>>> float(calc.binding_retained([])) 261.0 27 28But if you mutate some key antigenic sites, there will be a dramatic reduction in 29neutralization retained: 30 31>>> float(calc.binding_retained([440, 505]).round(3)) 320.412 33 34If you have a whole set of sequences and have tabulated which sites are mutated, 35you can apply the calculator in bulk to the data frame. 36First, create such a data frame of sequences: 37 38>>> import pandas as pd 39>>> seqs = pd.DataFrame.from_records( 40... [("seq1", []), ("seq2", [498]), ("seq3", [440]), ("seq4", [440, 505])], 41... columns=["name", "mutated sites"], 42... ) 43>>> seqs 44 name mutated sites 450 seq1 [] 461 seq2 [498] 472 seq3 [440] 483 seq4 [440, 505] 49 50Now apply the calculator: 51 52>>> seqs["neutralization retained"] = seqs["mutated sites"].map(calc.binding_retained) 53>>> seqs.round(3) 54 name mutated sites neutralization retained 550 seq1 [] 1.000 561 seq2 [498] 0.818 572 seq3 [440] 0.846 583 seq4 [440, 505] 0.412 59 60You can also create new calculators that compute escape relative to different viruses, 61for instance BA.2: 62 63>>> calc_ba2 = EscapeCalculator(virus="BA.2") 64 65Now the escape will be different because different antibodies neutralize that virus: 66 67>>> float(calc_ba2.binding_retained([440, 505]).round(3)) 680.59 69 70""" 71 72__docformat__ = 'numpy' 73 74 75import requests 76 77import numpy 78 79import pandas as pd 80 81import yaml 82 83 84class EscapeCalculator: 85 """Calculates residual polyclonal antibody binding after some mutations. 86 87 The calculator is implemented interactively at 88 `https://jbloomlab.github.io/SARS2-RBD-escape-calc <https://jbloomlab.github.io/SARS2-RBD-escape-calc/>`_ 89 90 By default, this command-line calculator will exactly implement the calculations of the 91 interactive calculator with default settings. You can pass parameters to override 92 the default settings. 93 94 Parameters 95 ---------- 96 escape : str 97 Path or URL of CSV containing the escape data. 98 antibody_ic50s : str 99 Path or URL of CSV containing the antibody IC50s. 100 antibody_sources : str 101 Path or URL of CSV containing the antibody sources. 102 antibody_reweighting : str 103 Path or URL of CSV containing the antibody reweightings. 104 config : str 105 Path or URL of YAML file containing initial settings. 106 mut_escape_strength : None or float 107 If not `None`, override default `init_mutation_escape_strength` in `config`. 108 weight_by_neg_log_ic50 : None or bool 109 If not `None`, override default `init_weight_by_neg_log_IC50` in `config`. 110 study : None or str 111 If not `None`, override default `init_study` in `config`. 112 virus : None or str 113 If not `None`, override default `init_virus` in `config`. 114 sources : None or dict 115 If not `None`, override default `init_sources` in `config`. 116 reweight : None or bool 117 If not `None`, override default `init_reweight` in `config`. 118 119 Example 120 ------- 121 122 Initialize the calculator: 123 124 >>> calc = EscapeCalculator() 125 126 Calculate escape at sites after some mutations: 127 128 >>> sites_of_interest = [403, 440, 484, 498, 505, 510] 129 >>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3) 130 site original_escape retained_escape 131 70 403 0.376 0.070 132 101 440 0.336 0.019 133 145 484 0.035 0.019 134 159 498 0.313 0.103 135 166 505 0.955 0.086 136 170 510 0.009 0.003 137 138 Calculate overall neutralization retained after no mutations or some mutations: 139 140 >>> assert calc.binding_retained([]) == 1.0 141 >>> float(calc.binding_retained([440, 505]).round(3)) 142 0.412 143 144 Now repeat tests with some non-default options: 145 146 >>> calc2 = EscapeCalculator( 147 ... mut_escape_strength=1, 148 ... weight_by_neg_log_ic50=False, 149 ... study="Cao et al, 2022, Nature", 150 ... virus="D614G", 151 ... sources={"include_exclude": "include", "sources": ["WT convalescents"]}, 152 ... ) 153 154 >>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3) 155 site original_escape retained_escape 156 62 403 0.006 0.006 157 84 440 0.008 0.007 158 123 484 0.215 0.029 159 134 498 0.009 0.008 160 141 505 0.002 0.002 161 144 510 0.000 0.000 162 163 >>> float(calc2.binding_retained([484]).round(3)) 164 0.785 165 166 """ 167 def __init__(self, 168 escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv", 169 antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv", 170 antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv", 171 antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv", 172 config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml", 173 *, 174 mut_escape_strength=None, 175 weight_by_neg_log_ic50=None, 176 study=None, 177 virus=None, 178 sources=None, 179 reweight=None, 180 ): 181 """See main class docstring.""" 182 # read input data 183 self.escape = pd.read_csv(escape) 184 assert set(self.escape.columns) == {"antibody", "site", "escape"} 185 assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"])) 186 assert self.escape.notnull().all().all() 187 antibodies = set(self.escape["antibody"]) 188 189 self.antibody_ic50s = pd.read_csv(antibody_ic50s) 190 assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"} 191 assert ( 192 len(self.antibody_ic50s) 193 == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"])) 194 ) 195 assert self.antibody_ic50s["IC50"].max() == 10 196 assert antibodies == set(self.antibody_ic50s["antibody"]) 197 198 self.antibody_sources = pd.read_csv(antibody_sources) 199 assert set(self.antibody_sources.columns) == {"antibody", "source", "study"} 200 assert ( 201 len(self.antibody_sources) 202 == len(self.antibody_sources.groupby(["antibody", "source", "study"])) 203 == len(antibodies) 204 ) 205 assert antibodies == set(self.antibody_sources["antibody"]) 206 207 self.antibody_reweighting = pd.read_csv(antibody_reweighting) 208 assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"} 209 assert antibodies.issuperset(self.antibody_reweighting["antibody"]) 210 211 self.data = ( 212 self.escape 213 .merge(self.antibody_ic50s, on="antibody") 214 .merge(self.antibody_sources, on="antibody") 215 .merge(self.antibody_reweighting, on="antibody", how="left") 216 .assign(reweight=lambda x: x["reweight"].fillna(1)) 217 ) 218 assert self.data.notnull().all().all() 219 220 # get initial config 221 if config.startswith("http"): 222 response = requests.get(config, allow_redirects=True) 223 config_text = response.content.decode("utf-8") 224 else: 225 with open(config) as f: 226 config_text = f.read() 227 config = yaml.safe_load(config_text) 228 self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1)) 229 studies = config["studies"] 230 studies_rev = {value: key for (key, value) in studies.items()} 231 assert set(self.data["study"]) == set(studies) 232 233 if mut_escape_strength is None: 234 self.mut_escape_strength = config["init_mutation_escape_strength"] 235 else: 236 self.mut_escape_strength = mut_escape_strength 237 238 if weight_by_neg_log_ic50 is None: 239 self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"] 240 else: 241 self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50 242 assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50 243 244 if study is None: 245 self.study = config["init_study"] 246 else: 247 if study == "any" or study in studies: 248 self.study = study 249 elif study in studies_rev: 250 self.study = studies_rev[study] 251 else: 252 raise ValueError(f"invalid {study=}") 253 254 if virus is None: 255 self.virus = config["init_virus"] 256 else: 257 self.virus = virus 258 259 if sources is None: 260 sources = config["init_sources"] 261 include_exclude = sources["include_exclude"] 262 sources_list = sources["sources"] 263 self.sources = set(self.data["source"]) 264 assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}" 265 if include_exclude == "exclude": 266 self.sources = self.sources - set(sources_list) 267 elif include_exclude == "include": 268 self.sources = set(sources_list) 269 else: 270 raise ValueError(f"invalid {include_exclude=} in {sources=}") 271 272 if reweight is None: 273 self.reweight = config["init_reweight"] 274 else: 275 self.reweight = reweight 276 assert isinstance(self.reweight, bool), self.reweight 277 278 # filter data 279 if self.study != "any": 280 assert self.study in set(self.data["study"]) 281 self.data = self.data.query("study == @self.study").drop(columns="study") 282 else: 283 self.data = self.data.drop(columns="study") 284 285 assert self.virus in set(self.data["virus"]) 286 self.data = self.data.query("virus == @self.virus").drop(columns="virus") 287 288 self.data = self.data.query("source in @self.sources").drop(columns="source") 289 290 assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"} 291 assert len(self.data) == len(self.data.drop_duplicates()) 292 self.data = ( 293 self.data 294 .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10)) 295 .drop(columns="IC50") 296 ) 297 298 max_escape_per_antibody = ( 299 self.data 300 .groupby("antibody") 301 .aggregate(max_escape=pd.NamedAgg("escape", "max")) 302 ) 303 assert (max_escape_per_antibody["max_escape"] == 1).all() 304 305 def escape_per_site(self, mutated_sites): 306 """Escape at each site after mutating indicated sites. 307 308 Parameters 309 ---------- 310 mutated_sites : array-like of integers 311 List of mutated sites. 312 313 Returns 314 ------- 315 pandas.DataFrame 316 For each site, gives the original escape and the escape 317 retained after mutations. 318 319 """ 320 mutated_sites = set(mutated_sites) 321 if not mutated_sites.issubset(self.sites): 322 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 323 df = ( 324 self.data 325 .assign( 326 mutated=lambda x: x['site'].isin(mutated_sites).astype(int), 327 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 328 ) 329 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 330 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 331 .assign( 332 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 333 self.mut_escape_strength 334 ), 335 weight=lambda x: ( 336 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 337 * (x["reweight"] if self.reweight else 1) 338 ), 339 ) 340 [["antibody", "antibody_bind_retain", "weight"]] 341 .merge(self.data[["antibody", "site", "escape"]]) 342 .assign( 343 escape=lambda x: x["escape"] * x["weight"], 344 retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"], 345 ) 346 .groupby("site") 347 .aggregate( 348 original_escape=pd.NamedAgg("escape", "sum"), 349 retained_escape=pd.NamedAgg("retained_escape", "sum"), 350 ) 351 ) / self.data["antibody"].nunique() 352 return df.reset_index() 353 354 def binding_retained(self, mutated_sites): 355 """Fraction binding or neutralization retained after mutating indicated sites. 356 357 Parameters 358 ---------- 359 mutated_sites : array-like of integers 360 List of mutated sites, must all be in :attr:`BindingCalculator.sites`. 361 362 Returns 363 ------- 364 float 365 The fraction binding retained after these mutations. 366 367 """ 368 mutated_sites = set(mutated_sites) 369 if not mutated_sites.issubset(self.sites): 370 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 371 retained = ( 372 self.data 373 .assign( 374 mutated=lambda x: x["site"].isin(mutated_sites).astype(int), 375 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 376 ) 377 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 378 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 379 .assign( 380 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 381 self.mut_escape_strength 382 ), 383 weight=lambda x: ( 384 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 385 * (x["reweight"] if self.reweight else 1) 386 ), 387 weighted_antibody_bind_retain=lambda x: ( 388 x["antibody_bind_retain"] * x["weight"] 389 ), 390 ) 391 [["weight", "weighted_antibody_bind_retain"]] 392 .sum(axis=0) 393 ) 394 return retained["weighted_antibody_bind_retain"] / retained["weight"] 395 396 397if __name__ == '__main__': 398 import doctest 399 doctest.testmod()
85class EscapeCalculator: 86 """Calculates residual polyclonal antibody binding after some mutations. 87 88 The calculator is implemented interactively at 89 `https://jbloomlab.github.io/SARS2-RBD-escape-calc <https://jbloomlab.github.io/SARS2-RBD-escape-calc/>`_ 90 91 By default, this command-line calculator will exactly implement the calculations of the 92 interactive calculator with default settings. You can pass parameters to override 93 the default settings. 94 95 Parameters 96 ---------- 97 escape : str 98 Path or URL of CSV containing the escape data. 99 antibody_ic50s : str 100 Path or URL of CSV containing the antibody IC50s. 101 antibody_sources : str 102 Path or URL of CSV containing the antibody sources. 103 antibody_reweighting : str 104 Path or URL of CSV containing the antibody reweightings. 105 config : str 106 Path or URL of YAML file containing initial settings. 107 mut_escape_strength : None or float 108 If not `None`, override default `init_mutation_escape_strength` in `config`. 109 weight_by_neg_log_ic50 : None or bool 110 If not `None`, override default `init_weight_by_neg_log_IC50` in `config`. 111 study : None or str 112 If not `None`, override default `init_study` in `config`. 113 virus : None or str 114 If not `None`, override default `init_virus` in `config`. 115 sources : None or dict 116 If not `None`, override default `init_sources` in `config`. 117 reweight : None or bool 118 If not `None`, override default `init_reweight` in `config`. 119 120 Example 121 ------- 122 123 Initialize the calculator: 124 125 >>> calc = EscapeCalculator() 126 127 Calculate escape at sites after some mutations: 128 129 >>> sites_of_interest = [403, 440, 484, 498, 505, 510] 130 >>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3) 131 site original_escape retained_escape 132 70 403 0.376 0.070 133 101 440 0.336 0.019 134 145 484 0.035 0.019 135 159 498 0.313 0.103 136 166 505 0.955 0.086 137 170 510 0.009 0.003 138 139 Calculate overall neutralization retained after no mutations or some mutations: 140 141 >>> assert calc.binding_retained([]) == 1.0 142 >>> float(calc.binding_retained([440, 505]).round(3)) 143 0.412 144 145 Now repeat tests with some non-default options: 146 147 >>> calc2 = EscapeCalculator( 148 ... mut_escape_strength=1, 149 ... weight_by_neg_log_ic50=False, 150 ... study="Cao et al, 2022, Nature", 151 ... virus="D614G", 152 ... sources={"include_exclude": "include", "sources": ["WT convalescents"]}, 153 ... ) 154 155 >>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3) 156 site original_escape retained_escape 157 62 403 0.006 0.006 158 84 440 0.008 0.007 159 123 484 0.215 0.029 160 134 498 0.009 0.008 161 141 505 0.002 0.002 162 144 510 0.000 0.000 163 164 >>> float(calc2.binding_retained([484]).round(3)) 165 0.785 166 167 """ 168 def __init__(self, 169 escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv", 170 antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv", 171 antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv", 172 antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv", 173 config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml", 174 *, 175 mut_escape_strength=None, 176 weight_by_neg_log_ic50=None, 177 study=None, 178 virus=None, 179 sources=None, 180 reweight=None, 181 ): 182 """See main class docstring.""" 183 # read input data 184 self.escape = pd.read_csv(escape) 185 assert set(self.escape.columns) == {"antibody", "site", "escape"} 186 assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"])) 187 assert self.escape.notnull().all().all() 188 antibodies = set(self.escape["antibody"]) 189 190 self.antibody_ic50s = pd.read_csv(antibody_ic50s) 191 assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"} 192 assert ( 193 len(self.antibody_ic50s) 194 == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"])) 195 ) 196 assert self.antibody_ic50s["IC50"].max() == 10 197 assert antibodies == set(self.antibody_ic50s["antibody"]) 198 199 self.antibody_sources = pd.read_csv(antibody_sources) 200 assert set(self.antibody_sources.columns) == {"antibody", "source", "study"} 201 assert ( 202 len(self.antibody_sources) 203 == len(self.antibody_sources.groupby(["antibody", "source", "study"])) 204 == len(antibodies) 205 ) 206 assert antibodies == set(self.antibody_sources["antibody"]) 207 208 self.antibody_reweighting = pd.read_csv(antibody_reweighting) 209 assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"} 210 assert antibodies.issuperset(self.antibody_reweighting["antibody"]) 211 212 self.data = ( 213 self.escape 214 .merge(self.antibody_ic50s, on="antibody") 215 .merge(self.antibody_sources, on="antibody") 216 .merge(self.antibody_reweighting, on="antibody", how="left") 217 .assign(reweight=lambda x: x["reweight"].fillna(1)) 218 ) 219 assert self.data.notnull().all().all() 220 221 # get initial config 222 if config.startswith("http"): 223 response = requests.get(config, allow_redirects=True) 224 config_text = response.content.decode("utf-8") 225 else: 226 with open(config) as f: 227 config_text = f.read() 228 config = yaml.safe_load(config_text) 229 self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1)) 230 studies = config["studies"] 231 studies_rev = {value: key for (key, value) in studies.items()} 232 assert set(self.data["study"]) == set(studies) 233 234 if mut_escape_strength is None: 235 self.mut_escape_strength = config["init_mutation_escape_strength"] 236 else: 237 self.mut_escape_strength = mut_escape_strength 238 239 if weight_by_neg_log_ic50 is None: 240 self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"] 241 else: 242 self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50 243 assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50 244 245 if study is None: 246 self.study = config["init_study"] 247 else: 248 if study == "any" or study in studies: 249 self.study = study 250 elif study in studies_rev: 251 self.study = studies_rev[study] 252 else: 253 raise ValueError(f"invalid {study=}") 254 255 if virus is None: 256 self.virus = config["init_virus"] 257 else: 258 self.virus = virus 259 260 if sources is None: 261 sources = config["init_sources"] 262 include_exclude = sources["include_exclude"] 263 sources_list = sources["sources"] 264 self.sources = set(self.data["source"]) 265 assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}" 266 if include_exclude == "exclude": 267 self.sources = self.sources - set(sources_list) 268 elif include_exclude == "include": 269 self.sources = set(sources_list) 270 else: 271 raise ValueError(f"invalid {include_exclude=} in {sources=}") 272 273 if reweight is None: 274 self.reweight = config["init_reweight"] 275 else: 276 self.reweight = reweight 277 assert isinstance(self.reweight, bool), self.reweight 278 279 # filter data 280 if self.study != "any": 281 assert self.study in set(self.data["study"]) 282 self.data = self.data.query("study == @self.study").drop(columns="study") 283 else: 284 self.data = self.data.drop(columns="study") 285 286 assert self.virus in set(self.data["virus"]) 287 self.data = self.data.query("virus == @self.virus").drop(columns="virus") 288 289 self.data = self.data.query("source in @self.sources").drop(columns="source") 290 291 assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"} 292 assert len(self.data) == len(self.data.drop_duplicates()) 293 self.data = ( 294 self.data 295 .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10)) 296 .drop(columns="IC50") 297 ) 298 299 max_escape_per_antibody = ( 300 self.data 301 .groupby("antibody") 302 .aggregate(max_escape=pd.NamedAgg("escape", "max")) 303 ) 304 assert (max_escape_per_antibody["max_escape"] == 1).all() 305 306 def escape_per_site(self, mutated_sites): 307 """Escape at each site after mutating indicated sites. 308 309 Parameters 310 ---------- 311 mutated_sites : array-like of integers 312 List of mutated sites. 313 314 Returns 315 ------- 316 pandas.DataFrame 317 For each site, gives the original escape and the escape 318 retained after mutations. 319 320 """ 321 mutated_sites = set(mutated_sites) 322 if not mutated_sites.issubset(self.sites): 323 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 324 df = ( 325 self.data 326 .assign( 327 mutated=lambda x: x['site'].isin(mutated_sites).astype(int), 328 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 329 ) 330 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 331 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 332 .assign( 333 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 334 self.mut_escape_strength 335 ), 336 weight=lambda x: ( 337 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 338 * (x["reweight"] if self.reweight else 1) 339 ), 340 ) 341 [["antibody", "antibody_bind_retain", "weight"]] 342 .merge(self.data[["antibody", "site", "escape"]]) 343 .assign( 344 escape=lambda x: x["escape"] * x["weight"], 345 retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"], 346 ) 347 .groupby("site") 348 .aggregate( 349 original_escape=pd.NamedAgg("escape", "sum"), 350 retained_escape=pd.NamedAgg("retained_escape", "sum"), 351 ) 352 ) / self.data["antibody"].nunique() 353 return df.reset_index() 354 355 def binding_retained(self, mutated_sites): 356 """Fraction binding or neutralization retained after mutating indicated sites. 357 358 Parameters 359 ---------- 360 mutated_sites : array-like of integers 361 List of mutated sites, must all be in :attr:`BindingCalculator.sites`. 362 363 Returns 364 ------- 365 float 366 The fraction binding retained after these mutations. 367 368 """ 369 mutated_sites = set(mutated_sites) 370 if not mutated_sites.issubset(self.sites): 371 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 372 retained = ( 373 self.data 374 .assign( 375 mutated=lambda x: x["site"].isin(mutated_sites).astype(int), 376 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 377 ) 378 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 379 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 380 .assign( 381 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 382 self.mut_escape_strength 383 ), 384 weight=lambda x: ( 385 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 386 * (x["reweight"] if self.reweight else 1) 387 ), 388 weighted_antibody_bind_retain=lambda x: ( 389 x["antibody_bind_retain"] * x["weight"] 390 ), 391 ) 392 [["weight", "weighted_antibody_bind_retain"]] 393 .sum(axis=0) 394 ) 395 return retained["weighted_antibody_bind_retain"] / retained["weight"]
Calculates residual polyclonal antibody binding after some mutations.
The calculator is implemented interactively at https://jbloomlab.github.io/SARS2-RBD-escape-calc
By default, this command-line calculator will exactly implement the calculations of the interactive calculator with default settings. You can pass parameters to override the default settings.
Parameters
- escape (str): Path or URL of CSV containing the escape data.
- antibody_ic50s (str): Path or URL of CSV containing the antibody IC50s.
- antibody_sources (str): Path or URL of CSV containing the antibody sources.
- antibody_reweighting (str): Path or URL of CSV containing the antibody reweightings.
- config (str): Path or URL of YAML file containing initial settings.
- mut_escape_strength (None or float):
If not
None
, override defaultinit_mutation_escape_strength
inconfig
. - weight_by_neg_log_ic50 (None or bool):
If not
None
, override defaultinit_weight_by_neg_log_IC50
inconfig
. - study (None or str):
If not
None
, override defaultinit_study
inconfig
. - virus (None or str):
If not
None
, override defaultinit_virus
inconfig
. - sources (None or dict):
If not
None
, override defaultinit_sources
inconfig
. - reweight (None or bool):
If not
None
, override defaultinit_reweight
inconfig
.
Example
Initialize the calculator:
>>> calc = EscapeCalculator()
Calculate escape at sites after some mutations:
>>> sites_of_interest = [403, 440, 484, 498, 505, 510]
>>> calc.escape_per_site([440, 505]).query("site in @sites_of_interest").round(3)
site original_escape retained_escape
70 403 0.376 0.070
101 440 0.336 0.019
145 484 0.035 0.019
159 498 0.313 0.103
166 505 0.955 0.086
170 510 0.009 0.003
Calculate overall neutralization retained after no mutations or some mutations:
>>> assert calc.binding_retained([]) == 1.0
>>> float(calc.binding_retained([440, 505]).round(3))
0.412
Now repeat tests with some non-default options:
>>> calc2 = EscapeCalculator(
... mut_escape_strength=1,
... weight_by_neg_log_ic50=False,
... study="Cao et al, 2022, Nature",
... virus="D614G",
... sources={"include_exclude": "include", "sources": ["WT convalescents"]},
... )
>>> calc2.escape_per_site([484]).query("site in @sites_of_interest").round(3)
site original_escape retained_escape
62 403 0.006 0.006
84 440 0.008 0.007
123 484 0.215 0.029
134 498 0.009 0.008
141 505 0.002 0.002
144 510 0.000 0.000
>>> float(calc2.binding_retained([484]).round(3))
0.785
168 def __init__(self, 169 escape="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/escape.csv", 170 antibody_ic50s="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_IC50s.csv", 171 antibody_sources="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_sources.csv", 172 antibody_reweighting="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/results/antibody_reweighting.csv", 173 config="https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/main/config.yaml", 174 *, 175 mut_escape_strength=None, 176 weight_by_neg_log_ic50=None, 177 study=None, 178 virus=None, 179 sources=None, 180 reweight=None, 181 ): 182 """See main class docstring.""" 183 # read input data 184 self.escape = pd.read_csv(escape) 185 assert set(self.escape.columns) == {"antibody", "site", "escape"} 186 assert len(self.escape) == len(self.escape.groupby(["antibody", "site", "escape"])) 187 assert self.escape.notnull().all().all() 188 antibodies = set(self.escape["antibody"]) 189 190 self.antibody_ic50s = pd.read_csv(antibody_ic50s) 191 assert set(self.antibody_ic50s.columns) == {"antibody", "virus", "IC50"} 192 assert ( 193 len(self.antibody_ic50s) 194 == len(self.antibody_ic50s.groupby(["antibody", "virus", "IC50"])) 195 ) 196 assert self.antibody_ic50s["IC50"].max() == 10 197 assert antibodies == set(self.antibody_ic50s["antibody"]) 198 199 self.antibody_sources = pd.read_csv(antibody_sources) 200 assert set(self.antibody_sources.columns) == {"antibody", "source", "study"} 201 assert ( 202 len(self.antibody_sources) 203 == len(self.antibody_sources.groupby(["antibody", "source", "study"])) 204 == len(antibodies) 205 ) 206 assert antibodies == set(self.antibody_sources["antibody"]) 207 208 self.antibody_reweighting = pd.read_csv(antibody_reweighting) 209 assert set(self.antibody_reweighting.columns) == {"antibody", "reweight"} 210 assert antibodies.issuperset(self.antibody_reweighting["antibody"]) 211 212 self.data = ( 213 self.escape 214 .merge(self.antibody_ic50s, on="antibody") 215 .merge(self.antibody_sources, on="antibody") 216 .merge(self.antibody_reweighting, on="antibody", how="left") 217 .assign(reweight=lambda x: x["reweight"].fillna(1)) 218 ) 219 assert self.data.notnull().all().all() 220 221 # get initial config 222 if config.startswith("http"): 223 response = requests.get(config, allow_redirects=True) 224 config_text = response.content.decode("utf-8") 225 else: 226 with open(config) as f: 227 config_text = f.read() 228 config = yaml.safe_load(config_text) 229 self.sites = set(range(config["sites"]["start"], config["sites"]["end"] + 1)) 230 studies = config["studies"] 231 studies_rev = {value: key for (key, value) in studies.items()} 232 assert set(self.data["study"]) == set(studies) 233 234 if mut_escape_strength is None: 235 self.mut_escape_strength = config["init_mutation_escape_strength"] 236 else: 237 self.mut_escape_strength = mut_escape_strength 238 239 if weight_by_neg_log_ic50 is None: 240 self.weight_by_neg_log_ic50 = config["init_weight_by_neg_log_IC50"] 241 else: 242 self.weight_by_neg_log_ic50 = weight_by_neg_log_ic50 243 assert isinstance(self.weight_by_neg_log_ic50, bool), self.weight_by_neg_log_ic50 244 245 if study is None: 246 self.study = config["init_study"] 247 else: 248 if study == "any" or study in studies: 249 self.study = study 250 elif study in studies_rev: 251 self.study = studies_rev[study] 252 else: 253 raise ValueError(f"invalid {study=}") 254 255 if virus is None: 256 self.virus = config["init_virus"] 257 else: 258 self.virus = virus 259 260 if sources is None: 261 sources = config["init_sources"] 262 include_exclude = sources["include_exclude"] 263 sources_list = sources["sources"] 264 self.sources = set(self.data["source"]) 265 assert self.sources.issuperset(sources_list), f"{self.sources=}\n{sources_list=}" 266 if include_exclude == "exclude": 267 self.sources = self.sources - set(sources_list) 268 elif include_exclude == "include": 269 self.sources = set(sources_list) 270 else: 271 raise ValueError(f"invalid {include_exclude=} in {sources=}") 272 273 if reweight is None: 274 self.reweight = config["init_reweight"] 275 else: 276 self.reweight = reweight 277 assert isinstance(self.reweight, bool), self.reweight 278 279 # filter data 280 if self.study != "any": 281 assert self.study in set(self.data["study"]) 282 self.data = self.data.query("study == @self.study").drop(columns="study") 283 else: 284 self.data = self.data.drop(columns="study") 285 286 assert self.virus in set(self.data["virus"]) 287 self.data = self.data.query("virus == @self.virus").drop(columns="virus") 288 289 self.data = self.data.query("source in @self.sources").drop(columns="source") 290 291 assert set(self.data.columns) == {"antibody", "site", "escape", "IC50", "reweight"} 292 assert len(self.data) == len(self.data.drop_duplicates()) 293 self.data = ( 294 self.data 295 .assign(neg_log_ic50=lambda x: -numpy.log(x["IC50"] / 10)) 296 .drop(columns="IC50") 297 ) 298 299 max_escape_per_antibody = ( 300 self.data 301 .groupby("antibody") 302 .aggregate(max_escape=pd.NamedAgg("escape", "max")) 303 ) 304 assert (max_escape_per_antibody["max_escape"] == 1).all()
See main class docstring.
306 def escape_per_site(self, mutated_sites): 307 """Escape at each site after mutating indicated sites. 308 309 Parameters 310 ---------- 311 mutated_sites : array-like of integers 312 List of mutated sites. 313 314 Returns 315 ------- 316 pandas.DataFrame 317 For each site, gives the original escape and the escape 318 retained after mutations. 319 320 """ 321 mutated_sites = set(mutated_sites) 322 if not mutated_sites.issubset(self.sites): 323 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 324 df = ( 325 self.data 326 .assign( 327 mutated=lambda x: x['site'].isin(mutated_sites).astype(int), 328 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 329 ) 330 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 331 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 332 .assign( 333 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 334 self.mut_escape_strength 335 ), 336 weight=lambda x: ( 337 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 338 * (x["reweight"] if self.reweight else 1) 339 ), 340 ) 341 [["antibody", "antibody_bind_retain", "weight"]] 342 .merge(self.data[["antibody", "site", "escape"]]) 343 .assign( 344 escape=lambda x: x["escape"] * x["weight"], 345 retained_escape=lambda x: x["antibody_bind_retain"] * x["escape"], 346 ) 347 .groupby("site") 348 .aggregate( 349 original_escape=pd.NamedAgg("escape", "sum"), 350 retained_escape=pd.NamedAgg("retained_escape", "sum"), 351 ) 352 ) / self.data["antibody"].nunique() 353 return df.reset_index()
Escape at each site after mutating indicated sites.
Parameters
- mutated_sites (array-like of integers): List of mutated sites.
Returns
- pandas.DataFrame: For each site, gives the original escape and the escape retained after mutations.
355 def binding_retained(self, mutated_sites): 356 """Fraction binding or neutralization retained after mutating indicated sites. 357 358 Parameters 359 ---------- 360 mutated_sites : array-like of integers 361 List of mutated sites, must all be in :attr:`BindingCalculator.sites`. 362 363 Returns 364 ------- 365 float 366 The fraction binding retained after these mutations. 367 368 """ 369 mutated_sites = set(mutated_sites) 370 if not mutated_sites.issubset(self.sites): 371 raise ValueError(f"sites {mutated_sites - self.sites} not in {self.sites}") 372 retained = ( 373 self.data 374 .assign( 375 mutated=lambda x: x["site"].isin(mutated_sites).astype(int), 376 site_bind_retain=lambda x: 1 - x["escape"] * x["mutated"], 377 ) 378 .groupby(["antibody", "neg_log_ic50", "reweight"], as_index=False) 379 .aggregate(antibody_bind_retain=pd.NamedAgg("site_bind_retain", "prod")) 380 .assign( 381 antibody_bind_retain=lambda x: x["antibody_bind_retain"].pow( 382 self.mut_escape_strength 383 ), 384 weight=lambda x: ( 385 (x["neg_log_ic50"] if self.weight_by_neg_log_ic50 else 1) 386 * (x["reweight"] if self.reweight else 1) 387 ), 388 weighted_antibody_bind_retain=lambda x: ( 389 x["antibody_bind_retain"] * x["weight"] 390 ), 391 ) 392 [["weight", "weighted_antibody_bind_retain"]] 393 .sum(axis=0) 394 ) 395 return retained["weighted_antibody_bind_retain"] / retained["weight"]
Fraction binding or neutralization retained after mutating indicated sites.
Parameters
- mutated_sites (array-like of integers):
List of mutated sites, must all be in
BindingCalculator.sites
.
Returns
- float: The fraction binding retained after these mutations.