Fitness effects versus number of descendant sequences

Interactive plot of the correlation between fitness estimates of mutations and number of descendants. This plot is designed to help assess whether mutations with higher fitness tend to be associated with more descendant sequences.

Two measures of the number of descendants are provided. The first is the log ratio of the number of branches carrying the mutation that are internal on the tree (non-terminal) to the number that are terminal (leading to tip nodes). The second is the log number of tip-node descendants sharing all mutations on a branch. In both cases, larger values indicate that a mutation tends to have more descendants.
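As a rough illustration of the first measure, the branch counts could be tallied as in the sketch below. It assumes a toy tree representation (a dictionary of children per node, and a dictionary of mutations on the branch leading to each node); the function name, the data layout, and the mutation labels are hypothetical and are not taken from the actual pipeline.

```python
from collections import Counter


def count_branch_types(children, branch_mutations):
    """Count, per mutation, the non-terminal and terminal branches carrying it.

    children: dict mapping node -> list of child nodes (hypothetical layout;
        tips have an empty list or are absent from the dict).
    branch_mutations: dict mapping node -> list of mutations on the branch
        leading to that node (hypothetical layout).
    Returns (nonterminal_counts, terminal_counts) as Counters.
    """
    nonterminal = Counter()
    terminal = Counter()
    for node, mutations in branch_mutations.items():
        # A branch is terminal if the node it leads to has no children.
        is_tip = not children.get(node)
        target = terminal if is_tip else nonterminal
        for mutation in mutations:
            target[mutation] += 1
    return nonterminal, terminal
```

A mutation that appears mostly on internal branches has, by construction, been inherited by descendants, which is why a larger non-terminal count relative to the terminal count suggests more descendants.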

Each point on the plot represents a fitness estimate for a different amino-acid mutation. The Pearson correlation coefficient and the number of mutations being correlated are shown in the upper left of the scatter plot. The mutations are stratified by whether they are nonsynonymous, synonymous, or introduce a stop codon.
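The per-category correlation shown on the plot can be sketched as follows. This is a minimal illustration, not the code behind the plot: the record keys ("mutation_class", "fitness", "descendants") are hypothetical names chosen for the example.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined without variance
    return sxy / math.sqrt(sxx * syy)


def correlation_by_class(records):
    """Return {mutation_class: (pearson_r, n_mutations)}.

    Each record is a dict with hypothetical keys "mutation_class",
    "fitness", and "descendants".
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["mutation_class"]].append(
            (record["fitness"], record["descendants"])
        )
    result = {}
    for mutation_class, pairs in groups.items():
        fitness, descendants = zip(*pairs)
        result[mutation_class] = (pearson_r(fitness, descendants), len(pairs))
    return result
```

Reporting the number of mutations alongside the coefficient matters because a correlation computed from only a handful of points is much less informative.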

You can mouse over points for details.

The minimum actual count slider below the plot sets the total number of observed counts required for a mutation to be shown on the plot. The non-terminal to terminal ratio may be noisier at smaller values of this threshold. Note, however, that more deleterious mutations are also expected to have lower actual counts, so raising the threshold tends to exclude them.

The minimum expected count slider below the plot sets how many expected counts of an amino acid we require before making a fitness estimate. Larger values yield more accurate estimates but for fewer amino acids. Move the slider to the left to show estimates for more amino acids at lower confidence, and move it to the right to show estimates for fewer amino acids at higher confidence.
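The effect of the two count sliders can be thought of as a simple filter over the mutations. The sketch below assumes each mutation is a dict with hypothetical keys "actual_count" and "expected_count", and assumes the thresholds are inclusive; neither detail is confirmed by the plot itself.

```python
def filter_mutations(mutations, min_actual_count, min_expected_count):
    """Keep mutations meeting both count thresholds.

    mutations: list of dicts with hypothetical keys "actual_count" and
        "expected_count".
    Thresholds are treated as inclusive (an assumption).
    """
    return [
        m
        for m in mutations
        if m["actual_count"] >= min_actual_count
        and m["expected_count"] >= min_expected_count
    ]
```

Raising either threshold can only shrink the set of points shown, which is the trade-off between coverage and confidence described above.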

You can click/shift-click on specific genes in the legend below the plot to only show mutations for that gene.

The log ratio of the non-terminal to terminal counts is computed after adding a pseudocount of 0.5 to each count.
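A minimal sketch of this calculation is shown below. The pseudocount of 0.5 comes from the text; the use of the natural logarithm and the function name are assumptions, since the base of the logarithm is not stated here.

```python
import math

PSEUDOCOUNT = 0.5  # value stated in the text


def log_nonterminal_to_terminal_ratio(
    n_nonterminal, n_terminal, pseudocount=PSEUDOCOUNT
):
    """Log ratio of non-terminal to terminal counts with a pseudocount.

    The natural log is an assumption; the source does not state the base.
    """
    return math.log((n_nonterminal + pseudocount) / (n_terminal + pseudocount))
```

The pseudocount keeps the ratio finite when either count is zero, for example when a mutation is observed only on terminal branches.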

See Bloom and Neher (2023) for a paper describing the work.

See https://github.com/jbloomlab/SARS2-mut-fitness for full computer code and data.

See https://jbloomlab.github.io/SARS2-mut-fitness/ for links to all interactive plots.

This plot is for the public_2024-04-19 dataset.