Details on fitting

Here we describe how the polyclonal package actually fits the models.

The basic idea is to minimize the difference between the predicted and measured variant-level escape probabilities, $p_v(c)$, using a loss function that is not overly sensitive to outliers. In addition, regularization encourages the parameters to obey biologically motivated constraints.

Implementation

The fitting is implemented in the Polyclonal.fit function, which allows adjustment of the weights for the regularization. By default the optimization uses the gradient-based L-BFGS-B method implemented in scipy.optimize.minimize and simply continues optimization until the minimization converges.

Some key details about the fitting are as follows:

  1. By default, the fitting first fits a “site-level” model in which all mutations are lumped together so there are just two characters (wildtype and mutant). The escape values from this initial site-level fit are then used to initialize the full mutation-level escape values, which are then further optimized. The idea is that first fitting a simpler model with fewer parameters helps get the parameters into a “reasonable” space before full-model optimization. This option is implemented via the fit_site_level_first parameter to Polyclonal.fit, and it is recommended to use this approach, as testing indicates it helps.

  2. By default, if you are using free-parameter Hill coefficients or non-neutralized fractions, a model with those parameters fixed is fit first via fit_fixed_first. That model is fit with stronger regularization on the activities (via fit_fixed_first_reg_activity_weight) to keep epitope activities from dropping too low to be picked up in the subsequent all-parameter optimization. When this approach is used, the site-level model is fit as part of this fixed model, not the later full model.

  3. By default, if there are multiple variants with the same mutations, they are treated as independent measurements that are fit separately. This can be changed to “collapse” them into a single variant that is given a weight proportional to the number of constituent variants when calculating the loss. This option is implemented via collapse_identical_variants during the initialization of a Polyclonal object. It speeds up fitting without (usually) substantially changing the fitting results. However, do not collapse identical variants if you are using bootstrapping.

Polyclonal.fit also allows you to adjust the weights on the regularizations. The defaults should be sensible, but you may want to try adjusting them. You can also adjust the $\delta$ values for the Pseudo-Huber loss / regularization terms (see below), although the defaults are probably adequate, as they are chosen to be L1-like over most of the range the values are expected to span.

Loss function

We use a scaled Pseudo-Huber loss function on the difference between the predicted and measured escape probabilities. Note that the Pseudo-Huber function is defined as $\hat{h}_\delta(x) = \delta^2 \left(\sqrt{1 + (x/\delta)^2} - 1\right)$, where $\delta$ is a parameter that sets where the loss transitions from quadratic (L2-like) to linear (L1-like) in $x$. Note that we will actually use a scaled Pseudo-Huber function, $h_\delta(x) = \hat{h}_\delta(x)/\delta$, so the slope of the loss is one in the linear range. The rationale for a Pseudo-Huber loss is robustness to outliers (it is L1-like for large residuals).

Specifically, let $r_v(c) = p_v(c) - y_{v,c}$ be the residual for the predicted escape probability of variant $v$ at concentration $c$, where $y_{v,c}$ denotes the measured value. Then the loss for variant $v$ at concentration $c$ is $L_{\delta_{\rm loss}}\left(r_v(c)\right) = h_{\delta_{\rm loss}}\left(r_v(c)\right)$, and the overall loss is:

$$L = \sum_{v,c} h_{\delta_{\rm loss}}\left(r_v(c)\right).$$
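To make the loss concrete, here is a minimal pure-Python sketch (the function names are illustrative, not the package's internals):

```python
import math

def pseudo_huber(delta, x):
    """Scaled Pseudo-Huber: h_delta(x) = delta * (sqrt(1 + (x/delta)**2) - 1).

    Quadratic (~ x**2 / (2 * delta)) for |x| << delta, and linear with
    slope approaching 1 for |x| >> delta.
    """
    return delta * (math.sqrt(1 + (x / delta) ** 2) - 1)

def loss(residuals, delta=0.1):
    """Total loss: sum of the scaled Pseudo-Huber over all residuals r_v(c)."""
    return sum(pseudo_huber(delta, r) for r in residuals)
```

For small residuals the loss behaves like $x^2/(2\delta)$, and for large residuals like $|x| - \delta$, which is what makes it robust to outliers.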

Regularization

We also regularize the mutation escape values ($\beta_{m,e}$) and the epitope activities ($a_{\rm wt,e}$) based on the following notions:

  1. Most mutations should not mediate escape.

  2. When a site is involved in escape for a given epitope, most mutations at a site will have similar-ish effects.

  3. Epitopes should be mostly unique: a site involved in escape should usually only mediate escape from a single epitope.

  4. Epitopes should be relatively spatially compact (requires structural information).

  5. Epitope activities should be small (or negative) except when clear evidence to the contrary.

Regularization of escape values

We regularize the escape values $\beta_{m,e}$ using a simple Pseudo-Huber function, so that

$$R_{\rm escape} = \lambda_{\rm escape} \sum_{m,e} h_{\delta_{\rm escape}}\left(\beta_{m,e}\right)$$

where $\lambda_{\rm escape}$ is the strength of the regularization and $\delta_{\rm escape}$ is the Pseudo-Huber delta parameter.

Regularization of spread of escape values at each site and epitope

We regularize the variance of the escape values at each site, so that

$$R_{\rm spread} = \lambda_{\rm spread} \sum_{e,i} \frac{1}{M_i} \sum_{m \in i} \left(\beta_{m,e} - \frac{1}{M_i} \sum_{m' \in i} \beta_{m',e}\right)^2$$

where $i$ ranges over all sites, $M_i$ is the number of mutations at site $i$, and $m \in i$ indicates all mutations at site $i$.
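A minimal sketch of this variance penalty (illustrative names and data layout, not the package's internals):

```python
def r_spread(site_betas, lam):
    """R_spread = lam * sum over (epitope, site) of the variance of the
    escape values beta_{m,e} among the mutations at that site.

    `site_betas` maps (epitope, site) -> list of beta values for the
    mutations at that site (an assumed layout for illustration).
    """
    total = 0.0
    for betas in site_betas.values():
        mean = sum(betas) / len(betas)
        total += sum((b - mean) ** 2 for b in betas) / len(betas)
    return lam * total
```

A site where all mutations have the same escape value contributes nothing, matching notion 2 above.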

Regularization of spatial spread of epitopes

To regularize the spatial spread of epitopes, we first define a differentiable measure of the average absolute value of escape at site $r$ for epitope $e$ as

$$s_{r,e} = \frac{1}{M_r} \sum_{m \in r} \left(\sqrt{\beta_{m,e}^2 + \epsilon} - \sqrt{\epsilon}\right)$$

where $\epsilon$ is a small number and $m$ ranges over all mutations at site $r$.

We then further assume that we have an experimental measurement $d_{r,r'}$ of the distance between each pair of residues $r$ and $r'$.

The regularization term is then:

$$R_{\rm spatial} = \frac{1}{2} \sum_e \sum_{r,r'} \left(\lambda_{\rm spatial,1} d_{r,r'} + \lambda_{\rm spatial,2} d_{r,r'}^2\right) s_{r,e}\, s_{r',e}$$

Note how this term has weights enabling regularization on both the distances and the squared distances. The factor of $\frac{1}{2}$ avoids double counting pairs; the diagonal elements are always zero since the self distances are zero.
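The site-level escape measure $s_{r,e}$ and the spatial penalty can be sketched as follows (illustrative Python; the dict-based data layout is an assumption for clarity):

```python
import math

def site_escape(betas, eps=1e-6):
    """Differentiable proxy s_{r,e} for the mean |beta| at a site:
    (1/M_r) * sum_m (sqrt(beta**2 + eps) - sqrt(eps))."""
    return sum(math.sqrt(b * b + eps) - math.sqrt(eps) for b in betas) / len(betas)

def r_spatial(s, d, lam1, lam2):
    """R_spatial = 0.5 * sum_e sum_{r,r'} (lam1*d + lam2*d**2) * s_{r,e} * s_{r',e}.

    `s` maps epitope -> {site: s_{r,e}}; `d` maps (site, site) -> distance.
    """
    total = 0.0
    for sites in s.values():
        for r, sr in sites.items():
            for r2, sr2 in sites.items():
                dist = d.get((r, r2), 0.0)
                total += (lam1 * dist + lam2 * dist ** 2) * sr * sr2
    return 0.5 * total
```

For small $\epsilon$, `site_escape` closely approximates the mean absolute escape at the site while remaining differentiable at $\beta = 0$.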

Regularization of epitope uniqueness

To ensure epitopes contain largely unique sites, we define the following term, which uses the differentiable average absolute value of escape at a site for an epitope, $s_{r,e}$, defined above:

$$R_{\rm uniqueness} = \frac{1}{2} \lambda_{\rm uniqueness} \sum_r \sum_{e \ne e'} s_{r,e}\, s_{r,e'}$$

where $e$ and $e'$ range over all pairs of epitopes, and $r$ ranges over all sites.
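A sketch of this penalty (illustrative names; `s` maps each epitope to its per-site $s_{r,e}$ values):

```python
def r_uniqueness(s, lam):
    """R_uniqueness = 0.5 * lam * sum_r sum_{e != e'} s_{r,e} * s_{r,e'}.

    `s` maps epitope -> {site: s_{r,e}} (an assumed layout for illustration).
    The penalty is nonzero only at sites with escape in more than one epitope.
    """
    epitopes = list(s)
    total = 0.0
    for e in epitopes:
        for e2 in epitopes:
            if e == e2:
                continue
            shared = set(s[e]) & set(s[e2])
            total += sum(s[e][r] * s[e2][r] for r in shared)
    return 0.5 * lam * total
```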

Regularization of epitope uniqueness2

This is a second regularization to ensure epitopes contain largely unique sites, but it operates on the product of squared escape at a site, and so more strongly penalizes very large escape at the same site while penalizing weak shared constraint less:

$$R_{\rm uniqueness2} = \lambda_{\rm uniqueness2} \sum_i \sum_{j > i} \sum_k \left(\sum_{m \in S_k} \beta_{m,i}^2\right) \left(\sum_{m \in S_k} \beta_{m,j}^2\right)$$

where $i$ and $j$ range over the $E$ epitopes, $k$ is the site index, and $m \in S_k$ denotes all mutations at site $k$.
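A sketch of this penalty (illustrative layout: `betas` maps each epitope to per-site lists of $\beta_{m,e}$ values):

```python
def r_uniqueness2(betas, lam):
    """R_uniqueness2 = lam * sum over epitope pairs i < j and sites k of
    (sum of beta**2 at site k for epitope i) * (same sum for epitope j).

    `betas` maps epitope -> {site: [beta values at that site]}.
    """
    epitopes = list(betas)
    total = 0.0
    for i, e1 in enumerate(epitopes):
        for e2 in epitopes[i + 1:]:
            for k in set(betas[e1]) & set(betas[e2]):
                sq1 = sum(b * b for b in betas[e1][k])
                sq2 = sum(b * b for b in betas[e2][k])
                total += sq1 * sq2
    return lam * total
```

Because each factor is a sum of squares, a site with large escape in two epitopes is penalized much more heavily than one with weak escape shared across epitopes.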

Regularization of epitope activities

We regularize the epitope activities $a_{\rm wt,e}$ to be close to zero using a Pseudo-Huber function. Note that the activities are confounded with the concentration scale, so for this regularization we first compute the geometric mean concentration $c_{\rm GM}$ and incorporate it so the strength of the regularization is about the same regardless of the concentrations. Specifically, the regularization is:

$$R_{\rm activity} = \lambda_{\rm activity} \sum_e h_{\delta_{\rm activity}}\left(a_{\rm wt,e} + \log c_{\rm GM}\right)$$

where $\lambda_{\rm activity}$ is the strength of the regularization and $\delta_{\rm activity}$ is the Pseudo-Huber delta parameter.
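A sketch showing how the geometric-mean concentration enters (illustrative names, re-using the scaled Pseudo-Huber function from above):

```python
import math

def pseudo_huber(delta, x):
    """Scaled Pseudo-Huber: delta * (sqrt(1 + (x/delta)**2) - 1)."""
    return delta * (math.sqrt(1 + (x / delta) ** 2) - 1)

def r_activity(activities, concentrations, lam, delta):
    """R_activity = lam * sum_e h_delta(a_{wt,e} + log(c_GM)), where c_GM is
    the geometric mean concentration; adding log(c_GM) makes the penalty
    roughly invariant to the overall concentration scale."""
    log_c_gm = sum(math.log(c) for c in concentrations) / len(concentrations)
    return lam * sum(pseudo_huber(delta, a + log_c_gm) for a in activities)
```

An activity of $-\log c_{\rm GM}$ incurs no penalty; activities far from that on the log-concentration scale are penalized roughly linearly.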

Regularization of Hill coefficients

We regularize the Hill coefficients $n_e$ to be close to one, using a penalty that essentially operates on the log of $n_e$ so the values are bounded by 0 and infinity:

$$R_{\rm Hill\,coefficient} = \begin{cases} \lambda_{\rm Hill\,coefficient} \left(n_e - 1\right)^2 & \text{if } n_e \ge 1 \\ \lambda_{\rm Hill\,coefficient} \left(1 - 1/n_e\right)^2 & \text{if } n_e < 1 \end{cases}$$

where $\lambda_{\rm Hill\,coefficient}$ is the strength of the regularization. Note that it is important that the Hill coefficients be fairly strongly regularized. For instance, a value of $\lambda_{\rm Hill\,coefficient} = 50$ creates a penalty of 200 when $n_e = 1/3$ or $n_e = 3$, a penalty of 50 when $n_e = 1/2$ or $n_e = 2$, and a penalty of 12.5 when $n_e = 1.5$ or $n_e = 1/1.5$. Note also that the minimization bounds $n_e > 0$.
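This piecewise penalty is easy to verify against the numbers quoted above (illustrative function, not the package's internals):

```python
def r_hill(n_e, lam=50.0):
    """Piecewise penalty on the Hill coefficient n_e, symmetric on a
    reciprocal scale so that n_e and 1/n_e are penalized equally."""
    if n_e >= 1:
        return lam * (n_e - 1) ** 2
    return lam * (1 - 1 / n_e) ** 2
```

Note the symmetry: `r_hill(2)` and `r_hill(0.5)` both give 50, and `r_hill(3)` and `r_hill(1/3)` both give 200.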

Regularization of non-neutralizable fraction

We regularize the non-neutralizable fraction $t_e$ to be close to zero, and also constrain it with bounds during the minimization. The regularization is:

$$R_{\rm non\,neutralized\,frac} = \lambda_{\rm non\,neutralized\,frac}\, t_e^2$$

where $\lambda_{\rm non\,neutralized\,frac}$ is the strength of the regularization. Note that the regularization should be fairly strong. For instance, a value of $\lambda_{\rm non\,neutralized\,frac} = 1000$ creates a penalty of 10 when $t_e = 0.1$ and a penalty of 62.5 when $t_e = 0.25$. Note also that the minimization bounds $0 \le t_e \le 0.5$.
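A tiny check of the quoted penalty values (illustrative function):

```python
def r_non_neutralized(t_e, lam=1000.0):
    """R = lam * t_e**2, pushing the non-neutralizable fraction toward zero."""
    return lam * t_e ** 2
```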

Gradients used for optimization

Here are the formulas used to calculate the gradients in the optimization.

Gradient of loss function

For the loss function, the gradients are as follows:

$$\frac{\partial L}{\partial \beta_{m,e}} = \sum_{v,c} \frac{r_v(c)}{h_{\delta_{\rm loss}}\left(r_v(c)\right) + \delta_{\rm loss}}\, p_v(c)\, \frac{n_e}{1 - t_e}\, \frac{\left[U_e(v,c) - t_e\right] \left[1 - U_e(v,c)\right]}{U_e(v,c)}\, b(v)_m$$

$$\frac{\partial L}{\partial a_{\rm wt,e}} = -\sum_{v,c} \frac{r_v(c)}{h_{\delta_{\rm loss}}\left(r_v(c)\right) + \delta_{\rm loss}}\, p_v(c)\, \frac{n_e}{1 - t_e}\, \frac{\left[U_e(v,c) - t_e\right] \left[1 - U_e(v,c)\right]}{U_e(v,c)}$$

$$\frac{\partial L}{\partial n_e} = \sum_{v,c} \frac{r_v(c)}{h_{\delta_{\rm loss}}\left(r_v(c)\right) + \delta_{\rm loss}}\, p_v(c)\, \frac{\phi_e(v) - \ln c}{1 - t_e}\, \frac{\left[1 - U_e(v,c)\right] \left[U_e(v,c) - t_e\right]}{U_e(v,c)}$$

$$\frac{\partial L}{\partial t_e} = \sum_{v,c} \frac{r_v(c)}{h_{\delta_{\rm loss}}\left(r_v(c)\right) + \delta_{\rm loss}}\, \frac{p_v(c)}{U_e(v,c)}\, \frac{1 - U_e(v,c)}{1 - t_e}$$

See below for how the sub-components that lead to these were calculated.

Calculating $\partial h_\delta(r)/\partial r$

We have

$$\frac{\partial h_\delta(r)}{\partial r} = \frac{\partial \left[\delta \left(\sqrt{1 + (r/\delta)^2} - 1\right)\right]}{\partial r} = \frac{\delta}{2\sqrt{1 + (r/\delta)^2}} \cdot \frac{2r}{\delta^2} = \frac{r}{h_\delta(r) + \delta}$$
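This closed form can be checked numerically against a finite difference (illustrative Python, with hypothetical function names):

```python
import math

def pseudo_huber(delta, x):
    """Scaled Pseudo-Huber: h_delta(x) = delta * (sqrt(1 + (x/delta)**2) - 1)."""
    return delta * (math.sqrt(1 + (x / delta) ** 2) - 1)

def pseudo_huber_grad(delta, x):
    """Closed-form derivative: dh/dx = x / (h_delta(x) + delta)."""
    return x / (pseudo_huber(delta, x) + delta)

def num_grad(f, x, h=1e-6):
    """Central finite difference, for checking the closed form."""
    return (f(x + h) - f(x - h)) / (2 * h)
```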

Calculating $\partial p_v(c)/\partial \beta_{m,e}$

First, note

$$\frac{\partial p_v(c)}{\partial \beta_{m,e}} = \frac{\partial U_e(v,c)}{\partial \beta_{m,e}} \cdot \frac{p_v(c)}{U_e(v,c)}.$$

Next, note

$$\frac{\partial U_e(v,c)}{\partial \beta_{m,e}} = \frac{\partial \phi_e(v)}{\partial \beta_{m,e}} \cdot n_e \left(1 - t_e\right) \frac{\left[c \exp\left(-\phi_e(v)\right)\right]^{n_e}}{\left[1 + \left[c \exp\left(-\phi_e(v)\right)\right]^{n_e}\right]^2} = \frac{\partial \phi_e(v)}{\partial \beta_{m,e}} \cdot \frac{n_e}{1 - t_e} \left[U_e(v,c) - t_e\right] \left[1 - U_e(v,c)\right]$$

where the last step uses the identity $\left[U_e(v,c) - t_e\right]\left[1 - U_e(v,c)\right] = \left(1 - t_e\right)^2 \frac{x}{(1 + x)^2}$ with $x = \left[c \exp\left(-\phi_e(v)\right)\right]^{n_e}$.

Finally, note

$$\frac{\partial \phi_e(v)}{\partial \beta_{m,e}} = b(v)_m.$$

Putting it all together, we have:

$$\frac{\partial p_v(c)}{\partial \beta_{m,e}} = p_v(c)\, \frac{n_e}{1 - t_e}\, \frac{\left[U_e(v,c) - t_e\right] \left[1 - U_e(v,c)\right]}{U_e(v,c)}\, b(v)_m.$$
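This closed form can be checked against a finite difference in $\phi_e(v)$ (a toy two-epitope model with illustrative values and function names; recall that $\beta_{m,e}$ enters only through $\phi_e(v)$ with coefficient $b(v)_m$):

```python
import math

def u_e(phi, c, n_e, t_e):
    """U_e(v,c) = t_e + (1 - t_e) / (1 + (c * exp(-phi))**n_e)."""
    return t_e + (1 - t_e) / (1 + (c * math.exp(-phi)) ** n_e)

def p_v(phis, c, n, t):
    """p_v(c) is the product of U_e(v,c) over all epitopes e."""
    prod = 1.0
    for phi, n_e, t_e in zip(phis, n, t):
        prod *= u_e(phi, c, n_e, t_e)
    return prod

def dp_dbeta(phis, c, n, t, e, b_m=1.0):
    """Closed form: p_v(c) * n_e/(1-t_e) * (U_e - t_e)(1 - U_e)/U_e * b(v)_m."""
    u = u_e(phis[e], c, n[e], t[e])
    return p_v(phis, c, n, t) * (n[e] / (1 - t[e])) * (u - t[e]) * (1 - u) / u * b_m
```

With $b(v)_m = 1$ this equals $\partial p_v(c)/\partial \phi_e(v)$, so it can be compared directly against a central difference in $\phi_e(v)$.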

Calculating $\partial p_v(c)/\partial a_{\rm wt,e}$

The only difference from above is the sign, since $\partial \phi_e(v)/\partial a_{\rm wt,e} = -1$, so:

$$\frac{\partial p_v(c)}{\partial a_{\rm wt,e}} = -p_v(c)\, \frac{n_e}{1 - t_e}\, \frac{\left[U_e(v,c) - t_e\right] \left[1 - U_e(v,c)\right]}{U_e(v,c)}.$$

Calculating $\partial U_e(v,c)/\partial n_e$

$$\frac{\partial U_e(v,c)}{\partial n_e} = \frac{\phi_e(v) - \ln c}{1 - t_e} \left[1 - U_e(v,c)\right] \left[U_e(v,c) - t_e\right].$$

Calculating $\partial U_e(v,c)/\partial t_e$

$$\frac{\partial U_e(v,c)}{\partial t_e} = \frac{1 - U_e(v,c)}{1 - t_e}.$$

Gradients of regularizations

Calculating $\partial R_{\rm escape}/\partial \beta_{m,e}$

$$\frac{\partial R_{\rm escape}}{\partial \beta_{m,e}} = \frac{\lambda_{\rm escape}\, \beta_{m,e}}{h_{\delta_{\rm escape}}\left(\beta_{m,e}\right) + \delta_{\rm escape}}$$

Calculating $\partial R_{\rm spread}/\partial \beta_{m,e}$

$$\frac{\partial R_{\rm spread}}{\partial \beta_{m,e}} = \frac{2 \lambda_{\rm spread}}{M_i} \left(\beta_{m,e} - \frac{1}{M_i} \sum_{m' \in i} \beta_{m',e}\right)$$

Calculating $\partial s_{r,e}/\partial \beta_{m,e}$

$$\frac{\partial s_{r,e}}{\partial \beta_{m,e'}} = \begin{cases} \frac{\beta_{m,e}}{M_r \sqrt{\beta_{m,e}^2 + \epsilon}} & \text{if } m \in r \text{ and } e = e' \\ 0 & \text{otherwise} \end{cases}$$

Calculating $\partial R_{\rm spatial}/\partial \beta_{m,e}$

$$\frac{\partial R_{\rm spatial}}{\partial \beta_{m,e}} = \frac{1}{2} \sum_{r,r'} \left(\lambda_{\rm spatial,1} d_{r,r'} + \lambda_{\rm spatial,2} d_{r,r'}^2\right) \left(\frac{\partial s_{r,e}}{\partial \beta_{m,e}} s_{r',e} + s_{r,e} \frac{\partial s_{r',e}}{\partial \beta_{m,e}}\right) = \sum_r \left(\lambda_{\rm spatial,1} d_{r,r_m} + \lambda_{\rm spatial,2} d_{r,r_m}^2\right) s_{r,e}\, \frac{\partial s_{r_m,e}}{\partial \beta_{m,e}}$$

where $r_m$ is the site of mutation $m$.

Calculating $\partial R_{\rm uniqueness}/\partial \beta_{m,e}$

$$\frac{\partial R_{\rm uniqueness}}{\partial \beta_{m,e}} = \frac{1}{2} \lambda_{\rm uniqueness} \sum_r \sum_{e' \ne e''} \left(\frac{\partial s_{r,e'}}{\partial \beta_{m,e}} s_{r,e''} + s_{r,e'} \frac{\partial s_{r,e''}}{\partial \beta_{m,e}}\right) = \lambda_{\rm uniqueness} \sum_{e' \ne e} \frac{\partial s_{r_m,e}}{\partial \beta_{m,e}}\, s_{r_m,e'}$$

Calculating $\partial R_{\rm uniqueness2}/\partial \beta_{m,e}$

$$\frac{\partial R_{\rm uniqueness2}}{\partial \beta_{m,e}} = 2 \lambda_{\rm uniqueness2}\, \beta_{m,e} \sum_{j \ne e} \sum_{m' \in S_{r_m}} \beta_{m',j}^2$$

Calculating $\partial R_{\rm activity}/\partial a_{\rm wt,e}$

$$\frac{\partial R_{\rm activity}}{\partial a_{\rm wt,e}} = \frac{\lambda_{\rm activity} \left(a_{\rm wt,e} + \log c_{\rm GM}\right)}{h_{\delta_{\rm activity}}\left(a_{\rm wt,e} + \log c_{\rm GM}\right) + \delta_{\rm activity}}$$

Calculating $\partial R_{\rm Hill\,coefficient}/\partial n_e$

$$\frac{\partial R_{\rm Hill\,coefficient}}{\partial n_e} = \begin{cases} 2 \lambda_{\rm Hill\,coefficient} \left(n_e - 1\right) & \text{if } n_e \ge 1 \\ \lambda_{\rm Hill\,coefficient} \left(2/n_e^2 - 2/n_e^3\right) & \text{if } n_e < 1 \end{cases}$$

Calculating $\partial R_{\rm non\,neutralized\,frac}/\partial t_e$

$$\frac{\partial R_{\rm non\,neutralized\,frac}}{\partial t_e} = 2 \lambda_{\rm non\,neutralized\,frac}\, t_e$$

Bootstrapping

For the bootstrapping implemented by PolyclonalBootstrap, we start with a single model pre-fit to all the data. We then draw bootstrap replicates of the data used to fit that model, by default sampling the same variants at each concentration (see the sample_by option of PolyclonalBootstrap). We then fit each of these bootstrapped models, starting from the initial values from the model pre-fit to all of the data. Finally, the fit parameters or predictions from the models are summarized. Note that mutations may be absent from some bootstrap replicates if they occur in only a few variants; this can be assessed by looking at the frac_bootstrap_replicates column in the output from PolyclonalBootstrap.
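The resampling scheme can be sketched in plain Python (a simplified illustration, not the actual PolyclonalBootstrap implementation):

```python
import random

def bootstrap_replicate(data, seed=None):
    """One bootstrap replicate: resample variants with replacement, keeping
    each sampled variant's measurements at *all* concentrations together
    (mirroring the default behavior of sampling the same variants at each
    concentration).

    `data` maps variant -> {concentration: measured escape probability}
    (an assumed layout for illustration). A variant can appear multiple
    times, and rare variants may be absent from some replicates.
    """
    rng = random.Random(seed)
    variants = list(data)
    return [(v, data[v]) for v in (rng.choice(variants) for _ in variants)]
```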
