Details on fitting

Here we describe how the polyclonal package actually fits the models.

The basic idea is to minimize the difference between the predicted and measured variant-level escape probabilities, \(p_v\left(c\right)\), using a loss function that is not overly sensitive to outliers. In addition, regularization terms encourage the parameters to conform to biologically motivated constraints.

Implementation

The fitting is implemented in the Polyclonal.fit method, which allows adjustment of the weights for the regularizations. By default the optimization uses the gradient-based L-BFGS-B method implemented in scipy.optimize.minimize and continues until the minimization converges.

Some key details about the fitting are as follows:

  1. By default, the fitting first fits a “site-level” model in which all mutations are lumped together so there are just two characters (wildtype and mutant). The escape values from this initial site-level fitting are then used to initialize the full mutation-level escape values, which are then further optimized. The idea is that first fitting a simpler model with fewer parameters helps get the parameters into a “reasonable” space before full model optimization. This option is implemented via the fit_site_level_first parameter to Polyclonal.fit, and it is recommended as testing indicates it helps.

  2. By default, if you are using free-parameter Hill coefficients or non-neutralized fractions, a model with those parameters fixed is fit first via fit_fixed_first. That model is fit with stronger regularization on the activities (via fit_fixed_first_reg_activity_weight) to keep epitope activities from dropping too low to be picked up in the subsequent all-parameter optimization. When this approach is used, the site-level model is fit for this initial fixed model rather than for the later full model.

  3. If there are multiple variants with the same mutations, they can be treated as independent measurements that are fit, or “collapsed” to a single variant that is given a weight proportional to the number of constituent variants when calculating the loss. Collapsing is controlled via collapse_identical_variants during the initialization of a Polyclonal object; it speeds up fitting without (usually) substantially changing the results. However, do not collapse identical variants if you are using bootstrapping.

Polyclonal.fit also allows you to adjust the weights on the regularizations. The defaults should be sensible, but you may want to try adjusting them. You can also adjust the \(\delta\) values for the Pseudo-Huber loss / regularization terms (see below), although the defaults are probably good since they are chosen to be L1-like over most of the range that values are expected to span.
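
As a concrete illustration, here is a minimal sketch of a typical fitting workflow. The options named above (collapse_identical_variants, fit_site_level_first, fit_fixed_first) come from this page; the other argument names (data_to_fit, n_epitopes, reg_escape_weight) and the data columns are assumptions for illustration, so check the package documentation for exact signatures.

```python
import pandas as pd

import polyclonal

# Variant-level escape measurements; the file and column layout are assumed here.
data_to_fit = pd.read_csv("escape_probs.csv")

model = polyclonal.Polyclonal(
    data_to_fit=data_to_fit,
    n_epitopes=2,  # assumed argument name for the number of epitopes
    # collapsing identical variants speeds fitting; avoid it when bootstrapping
    collapse_identical_variants="mean",
)

model.fit(
    fit_site_level_first=True,  # initialize from a site-level fit (default)
    fit_fixed_first=True,  # first fit with Hill coefficients / non-neutralized fractions fixed
    reg_escape_weight=0.05,  # assumed name for the escape-regularization weight
)
```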

Loss function

We use a scaled Pseudo-Huber loss function on the difference between the predicted and measured escape probabilities. The Pseudo-Huber function is defined as \(\hat{h}_{\delta}\left(x\right) = \delta^2 \left(\sqrt{1 + \left(x/\delta\right)^2} - 1\right)\) where \(\delta\) is a parameter that indicates when the loss transitions from being quadratic (L2-like) to linear (L1-like) in \(x\). We actually use a scaled Pseudo-Huber function, \(h_{\delta}\left(x\right) = \hat{h}_{\delta}\left(x\right)/\delta\), so the slope of the loss is one in the linear range. The rationale for a Pseudo-Huber loss is robustness to outliers (it is L1-like for large residuals).

Specifically, let \(r_v\left(c\right) = p_v\left(c\right) - y_{v,c}\) be the residual for the predicted escape probability of variant \(v\) at concentration \(c\), where \(y_{v,c}\) denotes the measured value. Then the loss for variant \(v\) at concentration \(c\) is \(L_{\delta_{\rm{loss}}}\left(r_v\left(c\right)\right) = h_{\delta_{\rm{loss}}}\left(r_v\left(c\right)\right)\), and the overall loss is:

\[L = \sum_{v,c} h_{\delta_{\rm{loss}}}\left(r_v\left(c\right)\right).\]
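
To make the shape of this loss concrete, here is a small NumPy sketch (illustrative code, not the package's internals) of the scaled Pseudo-Huber function; the numeric values are arbitrary.

```python
import numpy as np


def scaled_pseudo_huber(x, delta):
    """Scaled Pseudo-Huber h_delta(x) = delta * (sqrt(1 + (x/delta)**2) - 1)."""
    return delta * (np.sqrt(1 + (x / delta) ** 2) - 1)


delta = 0.1
residuals = np.array([0.01, 0.05, 0.5, 5.0])
print(scaled_pseudo_huber(residuals, delta))
# Small residuals behave like x**2 / (2 * delta) (L2-like);
# large residuals behave like abs(x) - delta (L1-like, unit slope).
```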

Regularization

We also regularize the mutation escape values (\(\beta_{m,e}\)) and the epitope activities (\(a_{\rm{wt}, e}\)) based on the notions:

  1. Most mutations should not mediate escape.

  2. When a site is involved in escape for a given epitope, most mutations at a site will have similar-ish effects.

  3. Epitopes should be mostly unique: a site involved in escape should usually only mediate escape from a single epitope.

  4. Epitopes should be relatively spatially compact (requires structural information).

  5. Epitope activities should be small (or negative) except when clear evidence to the contrary.

Regularization of escape values

We regularize the escape values \(\beta_{m,e}\) using a simple Pseudo-Huber function, so that

\[R_{\rm{escape}} = \lambda_{\rm{escape}} \sum_{m,e} h_{\delta_{\rm{escape}}}\left(\beta_{m,e}\right)\]

where \(\lambda_{\rm{escape}}\) is the strength of the regularization and \(\delta_{\rm{escape}}\) is the Pseudo-Huber delta parameter.

Regularization of spread of escape values at each site and epitope

We regularize the variance of the escape values at each site, so that

\[R_{\rm{spread}} = \lambda_{\rm{spread}} \sum_{e,i} \frac{1}{M_i}\sum_{m \in i}\left(\beta_{m,e} - \frac{1}{M_i} \sum_{m' \in i} \beta_{m',e}\right)^2\]

where \(i\) ranges over all sites, \(M_i\) is the number of mutations at site \(i\), and \(m \in i\) indicates all mutations at site \(i\).
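
For concreteness, here is a minimal NumPy sketch (not package code) of this penalty for one epitope; betas_by_site is a hypothetical mapping from site to the \(\beta_{m,e}\) values at that site.

```python
import numpy as np


def spread_penalty(betas_by_site, lam_spread):
    """R_spread for one epitope: within-site variance of escape values.

    betas_by_site: dict mapping site i to a 1D array of beta_{m,e} at that site.
    """
    return lam_spread * sum(
        np.mean((betas - betas.mean()) ** 2)  # (1/M_i) * sum of squared deviations
        for betas in betas_by_site.values()
    )
```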

Regularization of spatial spread of epitopes

To regularize the spatial spread of epitopes, we first define a differentiable measure of the average absolute value of escape at a site for an epitope \(e\) as

\[s_{r,e} = \sqrt{\frac{1}{M_r} \sum_{m \in r} \beta_{m,e}^2 + \epsilon} - \sqrt{\epsilon}\]

where \(\epsilon\) is a small number and \(m\) ranges over all mutations at site \(r\).

We further assume that we have an experimental measure \(d_{r,r'}\) of the distance between each pair of residues \(r\) and \(r'\).

The regularization term is then:

\[R_{\rm{spatial}} = \frac{1}{2}\sum_e \sum_r \sum_{r'} \left(\lambda_{\rm{spatial},1} d_{r,r'} + \lambda_{\rm{spatial},2} d_{r,r'}^2\right)s_{r,e} s_{r',e}\]

Note how this term has weights enabling regularization on both the distances and squared distances. The factor of \(\frac{1}{2}\) is to avoid double counting pairs, noting that the diagonal elements are always zero since the self distances are zero.
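
Here is a sketch of both quantities, assuming a precomputed symmetric distance matrix d with zero diagonal; the function names and the \(\epsilon\) value are illustrative, not the package's.

```python
import numpy as np


def site_escape(betas_at_site, eps=1e-4):
    """Differentiable average absolute escape s_{r,e} at one site."""
    return np.sqrt(np.mean(betas_at_site**2) + eps) - np.sqrt(eps)


def spatial_penalty(s, d, lam1, lam2):
    """R_spatial for one epitope.

    s: 1D array of s_{r,e} over sites; d: symmetric site-distance matrix
    with zero diagonal, so the r == r' terms contribute nothing.
    """
    weights = lam1 * d + lam2 * d**2
    return 0.5 * np.sum(weights * np.outer(s, s))
```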

Regularization of epitope uniqueness

To encourage epitopes to contain largely unique sites, we define the following term, which uses the differentiable average absolute value of escape at a site for an epitope, \(s_{r,e}\), defined above:

\[R_{\rm{uniqueness}} = \frac{1}{2}\lambda_{\rm{uniqueness}} \sum_r \sum_e \sum_{e' \ne e} s_{r,e} s_{r,e'}\]

where \(e\) and \(e'\) range over all pairs of epitopes, and \(r\) ranges over all sites.

Regularization of epitope uniqueness\(^2\)

This is a second regularization to ensure epitopes contain largely unique sites, but this one operates on products of summed squared escape values at a site, and so more strongly penalizes very large escape at the same site while penalizing weak shared constraint less:

\[R_{\rm{uniqueness}^2} = \lambda_{\rm{uniqueness}^2} \sum_{i} \sum_{j > i} \sum_{k} \left(\sum_{m' \in S_{k}} \beta_{m',i}^2\right)\left(\sum_{m' \in S_{k}} \beta_{m',j}^2\right)\]

where \(i\), \(j\) range over all epitopes \(E\), \(k\) is the site index, and \(m' \in S_{k}\) represents all mutations at site \(k\).
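
Both uniqueness penalties can be computed efficiently by noting that a sum over distinct pairs equals the square of the sum minus the sum of squares (halved for unordered pairs). A sketch with illustrative array shapes, not package code:

```python
import numpy as np


def uniqueness_penalty(s, lam):
    """R_uniqueness; s has shape (n_sites, n_epitopes) holding s_{r,e}."""
    # sum over ordered pairs e != e' of s_{r,e} * s_{r,e'} at each site
    return 0.5 * lam * np.sum(s.sum(axis=1) ** 2 - (s**2).sum(axis=1))


def uniqueness2_penalty(site_beta_sq, lam):
    """R_uniqueness^2; site_beta_sq[k, e] = sum of beta_{m,e}**2 over mutations at site k."""
    # sum over unordered epitope pairs i < j is half the ordered-pair sum
    return 0.5 * lam * np.sum(
        site_beta_sq.sum(axis=1) ** 2 - (site_beta_sq**2).sum(axis=1)
    )
```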

Regularization of epitope activities

We regularize the epitope activities \(a_{\rm{wt}, e}\) to be close to zero using a Pseudo-Huber function. Note that the activities are confounded with the concentration scale, so for this regularization we first compute the geometric mean concentration \(c_{GM}\) and incorporate it so the strength of the regularization is about the same regardless of the concentrations. Specifically, the regularization is:

\[R_{\rm{activity}} = \lambda_{\rm{activity}} h_{\delta_{\rm{activity}}}\left(a_{\rm{wt}, e} + \log c_{GM} \right)\]

where \(\lambda_{\rm{activity}}\) is the strength of the regularization and \(\delta_{\rm{activity}}\) is the Pseudo-Huber delta parameter.
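
As a hedged sketch of how such a concentration-scale correction can be computed (illustrative, not the package's internals):

```python
import numpy as np

concentrations = np.array([0.25, 1.0, 4.0])
c_gm = np.exp(np.mean(np.log(concentrations)))  # geometric mean; here exactly 1.0


def activity_penalty(a_wt_e, lam, delta, c_gm):
    """R_activity via the scaled Pseudo-Huber function defined earlier."""
    x = a_wt_e + np.log(c_gm)
    return lam * delta * (np.sqrt(1 + (x / delta) ** 2) - 1)
```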

Regularization of Hill coefficients

We regularize the Hill coefficients \(n_e\) to be close to one using a penalty that essentially operates on the log of \(n_e\), so the values are bounded by 0 and infinity:

\[\begin{split}R_{\rm{Hill coefficient}} = \begin{cases} \lambda_{\rm{Hill coefficient}} \left(n_e - 1\right)^2 & \rm{if\;} n_e \ge 1 \\ \lambda_{\rm{Hill coefficient}} \left(1 - 1/n_e\right)^2 & \rm{if\;} n_e < 1 \\ \end{cases}\end{split}\]

where \(\lambda_{\rm{Hill coefficient}}\) is the strength of the regularization. Note that it is important that the Hill coefficients be fairly strongly regularized. For instance, a value of \(\lambda_{\rm{Hill coefficient}} = 50\) creates a penalty of 200 when \(n_e = 1/3\) or \(n_e = 3\), a penalty of 50 when \(n_e = 1/2\) or \(n_e = 2\), and a penalty of 12.5 when \(n_e = 1.5\) or \(n_e = 1 / 1.5\). Note also that the minimization enforces the bound \(n_e > 0\).
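
A quick sketch verifying the example penalties quoted above (illustrative code, not the package's):

```python
def hill_penalty(n_e, lam=50.0):
    """R_Hill: penalty symmetric on the log scale around n_e = 1."""
    return lam * (n_e - 1) ** 2 if n_e >= 1 else lam * (1 - 1 / n_e) ** 2


for n_e in (3, 1 / 3, 2, 0.5, 1.5, 1 / 1.5):
    print(round(hill_penalty(n_e), 1))  # 200.0, 200.0, 50.0, 50.0, 12.5, 12.5
```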

Regularization of non-neutralizable fraction

We regularize the non-neutralizable fraction \(t_e\) to be close to zero, and during the minimization we also set bounds so it cannot go below zero or above 0.5. The regularization is:

\[R_{\rm{non-neutralized-frac}} = \lambda_{\rm{non-neutralized-frac}} t_e^2\]

where \(\lambda_{\rm{non-neutralized-frac}}\) is the strength of the regularization. Note that the regularization should be fairly strong. For instance, a value of \(\lambda_{\rm{non-neutralized-frac}} = 1000\) creates a penalty of 10 when \(t_e = 0.1\) and a penalty of 62.5 when \(t_e = 0.25\). Note also that the minimization bounds \(0 \le t_e \le 0.5\).

Gradients used for optimization

Here are the formulas used to calculate the gradients in the optimization.

Gradient of loss function

For the loss function, the gradients are as follows:

\[\frac{\partial L}{\partial \beta_{m,e}} = \sum_{v,c} \frac{r_v\left(c\right)}{h_{\delta}\left(r_v\left(c\right)\right) + \delta} p_v\left(c\right) \frac{n_e}{1 - t_e} \frac{\left[U_e\left(v, c\right) - t_e\right]\left[1 - U_e\left(v, c\right)\right]}{U_e\left(v, c\right)} b\left(v\right)_m\]
\[\frac{\partial L}{\partial a_{\rm{wt},e}} = -\sum_{v,c} \frac{r_v\left(c\right)}{h_{\delta}\left(r_v\left(c\right)\right) + \delta} p_v\left(c\right) \frac{n_e}{1 - t_e} \frac{\left[U_e\left(v, c\right) - t_e\right]\left[1 - U_e\left(v, c\right)\right]}{U_e\left(v, c\right)}\]
\[\frac{\partial L}{\partial n_e} = \sum_{v,c} \frac{r_v\left(c\right)}{h_{\delta}\left(r_v\left(c\right)\right) + \delta} p_v\left(c\right) \frac{\left(\phi_e\left(v\right) - \ln c\right)}{\left(1 - t_e\right)} \frac{\left[1 - U_e\left(v, c\right)\right]\left[U_e\left(v, c\right) - t_e\right]}{U_e\left(v, c\right)}.\]
\[\frac{\partial L}{\partial t_e} = \sum_{v,c} \frac{r_v\left(c\right)}{h_{\delta}\left(r_v\left(c\right)\right) + \delta} \left(\frac{p_v\left(c\right)}{U_e\left(v, c\right)}\right) \frac{1 - U_e\left(v, c\right)}{1 - t_e}\]

See below for how the sub-components that lead to these were calculated.

Calculating \(\frac{\partial \left[h_{\delta}\left(r\right)\right]}{\partial r}\)

We have

\[ \frac{\partial \left[h_{\delta}\left(r\right)\right]}{\partial r} = \delta \frac{\partial \left(\sqrt{1 + \left(r/\delta\right)^2} - 1\right)}{\partial r} = \frac{\delta}{2 \sqrt{1 + \left(r/\delta\right)^2}} \frac{2r}{\delta^2} = \frac{r}{h_{\delta}\left(r\right) + \delta}\]
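This identity is easy to sanity-check numerically with a central finite difference (illustrative sketch):

```python
import numpy as np

delta = 0.1


def h(r):
    """Scaled Pseudo-Huber function."""
    return delta * (np.sqrt(1 + (r / delta) ** 2) - 1)


def dh_dr(r):
    """Analytic gradient derived above: r / (h(r) + delta)."""
    return r / (h(r) + delta)


r, step = 0.7, 1e-6
numeric = (h(r + step) - h(r - step)) / (2 * step)
assert np.isclose(numeric, dh_dr(r))  # central difference matches the formula
```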

Calculating \(\frac{\partial p_v\left(c\right)}{\partial \beta_{m,e}}\)

First, note

\[\frac{\partial p_v\left(c\right)}{\partial \beta_{m,e}} = \frac{\partial U_e\left(v, c\right)}{\partial \beta_{m,e}} \frac{p_v\left(c\right)}{U_e\left(v, c\right)}.\]

Next, note

\[\frac{\partial U_e\left(v, c\right)}{\partial \beta_{m,e}} = \frac{\partial \phi_e\left(v\right)}{\partial \beta_{m,e}} \frac{n_e \left(1 - t_e\right) \left[c \exp\left(-\phi_e\left(v\right)\right)\right]^{n_e}}{\left[1 + \left[c \exp\left(-\phi_e\left(v\right)\right)\right]^{n_e}\right]^2} = \frac{\partial \phi_e\left(v\right)}{\partial \beta_{m,e}} \frac{n_e}{1 - t_e}\left[U_e\left(v, c\right) - t_e\right] \left[1 - U_e\left(v, c\right)\right]\]

where the last step uses the simplification \(\left[U_e\left(v, c\right) - t_e\right]\left[1 - U_e\left(v, c\right)\right] = \left(1 - t_e\right)^2 \frac{\left[c \exp\left(-\phi_e\left(v\right)\right)\right]^{n_e}}{\left[1 + \left[c \exp\left(-\phi_e\left(v\right)\right)\right]^{n_e}\right]^2}\), which follows from the definition of \(U_e\left(v, c\right)\).

Finally, note

\[\frac{\partial \phi_e\left(v\right)}{\partial \beta_{m,e}} = b\left(v\right)_m.\]

Putting it all together, we have:

\[\frac{\partial p_v\left(c\right)}{\partial \beta_{m,e}} = p_v\left(c\right) \frac{n_e}{1 - t_e} \frac{\left[U_e\left(v, c\right) - t_e\right]\left[1 - U_e\left(v, c\right)\right]}{U_e\left(v, c\right)} b\left(v\right)_m.\]

Calculating \(\frac{\partial p_v\left(c\right)}{\partial a_{\rm{wt},e}}\)

The only difference from above is the sign, so:

\[\frac{\partial p_v\left(c\right)}{\partial a_{\rm{wt},e}} = -p_v\left(c\right) \frac{n_e}{1 - t_e} \frac{\left[U_e\left(v, c\right) - t_e\right]\left[1 - U_e\left(v, c\right)\right]}{U_e\left(v, c\right)}.\]

Calculating \(\frac{\partial U_e\left(v, c\right)}{\partial n_e}\)

\[\frac{\partial U_e\left(v, c\right)}{\partial n_e} = \frac{\left(\phi_e\left(v\right) - \ln c\right)}{\left(1 - t_e\right)} \left[1 - U_e\left(v, c\right)\right]\left[U_e\left(v, c\right) - t_e\right].\]

Calculating \(\frac{\partial U_e\left(v, c\right)}{\partial t_e}\)

\[\frac{\partial U_e\left(v, c\right)}{\partial t_e} = \frac{1 - U_e\left(v, c\right)}{1 - t_e}.\]

Gradients of regularizations

Calculating \(\frac{\partial R_{\rm{escape}}}{\partial \beta_{m,e}}\)

\[\frac{\partial R_{\rm{escape}}}{\partial \beta_{m,e}} = \frac{\lambda_{\rm{escape}}\beta_{m,e}}{h_{\delta_{\rm{escape}}}\left(\beta_{m,e}\right) + \delta_{\rm{escape}}}\]

Calculating \(\frac{\partial R_{\rm{spread}}}{\partial \beta_{m,e}}\)

\[\frac{\partial R_{\rm{spread}}}{\partial \beta_{m,e}} = \frac{2\lambda_{\rm{spread}}}{M_i} \left(\beta_{m,e} - \frac{1}{M_i} \sum_{m' \in i} \beta_{m',e}\right)\]

Calculating \(\frac{\partial s_{r,e}}{\partial \beta_{m,e'}}\)

\[\begin{split}\frac{\partial s_{r,e}}{\partial \beta_{m,e'}} = \begin{cases} \frac{\beta_{m,e'}}{M_r \sqrt{\frac{1}{M_r}\sum_{m' \in r} \beta_{m',e'}^2 + \epsilon}} & \rm{if\,} m \in r \rm{\, and \,} e = e'\\ 0 & \rm{otherwise} \\ \end{cases}\end{split}\]

Calculating \(\frac{\partial R_{\rm{spatial}}}{\partial \beta_{m,e}}\)

\[\frac{\partial R_{\rm{spatial}}}{\partial \beta_{m,e}} = \frac{1}{2}\sum_r \sum_{r'} \left(\lambda_{\rm{spatial},1} d_{r,r'} + \lambda_{\rm{spatial},2} d_{r,r'}^2\right) \left( \frac{\partial s_{r,e}}{\partial \beta_{m,e}} s_{r',e} + s_{r,e} \frac{\partial s_{r',e}}{\partial \beta_{m,e}} \right) = \sum_r \left(\lambda_{\rm{spatial},1} d_{r,r_m} + \lambda_{\rm{spatial},2} d_{r,r_m}^2\right) s_{r,e} \frac{\partial s_{r_m,e}}{\partial \beta_{m,e}}\]

where \(r_m\) is the site of mutation \(m\).

Calculating \(\frac{\partial R_{\rm{uniqueness}}}{\partial \beta_{m,e}}\)

\[\frac{\partial R_{\rm{uniqueness}}}{\partial \beta_{m,e}} = \frac{1}{2}\lambda_{\rm{uniqueness}} \sum_r \sum_{e'} \sum_{e'' \ne e'} \left( \frac{\partial s_{r,e'}}{\partial \beta_{m,e}} s_{r,e''} + s_{r,e'} \frac{\partial s_{r,e''}}{\partial \beta_{m,e}} \right) = \lambda_{\rm{uniqueness}} \sum_{e' \ne e} \frac{\partial s_{r_m,e}}{\partial \beta_{m,e}} s_{r_m,e'}\]

Calculating \(\frac{\partial R_{\rm{uniqueness}^2}}{\partial \beta_{m,e}}\)

\[\frac{\partial{R_{\rm{uniqueness}^2}}}{\partial{\beta_{m,e}}} = 2\lambda_{\rm{uniqueness}^2} \beta_{m,e} \sum_{j \neq e} \sum_{m' \in S_k}\left(\beta_{m',j}^2\right)\]

where \(k\) is the site of mutation \(m\).

Calculating \(\frac{\partial R_{\rm{activity}}}{\partial a_{\rm{wt}, e}}\)

\[\frac{\partial R_{\rm{activity}}}{\partial a_{\rm{wt}, e}} = \frac{\lambda_{\rm{activity}}\left(a_{\rm{wt}, e} + \log c_{GM}\right)}{h_{\delta_{\rm{activity}}}\left(a_{\rm{wt}, e} + \log c_{GM}\right) + \delta_{\rm{activity}}}\]

Calculating \(\frac{\partial R_{\rm{Hill coefficient}}}{\partial n_e}\)

\[\begin{split}\frac{\partial R_{\rm{Hill coefficient}}}{\partial n_e} = \begin{cases} 2\lambda_{\rm{Hill coefficient}}\left(n_e - 1\right) & \rm{if\;} n_e \ge 1 \\ 2\lambda_{\rm{Hill coefficient}}\left(1/n_e^2 - 1/n_e^3\right) & \rm{if\;} n_e < 1 \\ \end{cases}\end{split}\]

Calculating \(\frac{\partial R_{\rm{non-neutralized-frac}}}{\partial t_e}\)

\[\frac{\partial R_{\rm{non-neutralized-frac}}}{\partial t_e} = 2 \lambda_{\rm{non-neutralized-frac}} t_e\]

Bootstrapping

For the bootstrapping implemented by PolyclonalBootstrap, we start with a single model pre-fit to all the data. We then draw bootstrap replicates of the data used to fit that model, by default sampling the same variants at each concentration (see the sample_by option of PolyclonalBootstrap). We then fit each of these bootstrapped models starting from the initial values from the pre-fit model on all of the data. Finally, the fit parameters or predictions from the models are summarized. Note that mutations may not be present in some bootstrap replicates if they occur in only a few variants; this can be assessed by looking at the frac_bootstrap_replicates column in the output from PolyclonalBootstrap.
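
A hedged usage sketch follows; PolyclonalBootstrap, sample_by, and frac_bootstrap_replicates are named on this page, while the other argument and method names are assumptions to illustrate the workflow, so consult the package documentation for exact signatures.

```python
import polyclonal

# `model` is a Polyclonal object already fit to all the data (the pre-fit model).
bootstrap = polyclonal.PolyclonalBootstrap(
    root_polyclonal=model,  # assumed argument name for the pre-fit model
    n_bootstrap_samples=100,  # assumed argument name for the number of replicates
    sample_by="barcode",  # controls how replicates are drawn; value is illustrative
)
bootstrap.fit_models()  # assumed method: fit each replicate from the pre-fit values

# Summaries of the fit models then report, e.g., frac_bootstrap_replicates
# for each mutation, indicating how often it appeared across replicates.
```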
