Correlation of fitness estimates for different viral clades

Interactive plot of the correlation between fitness estimates made using different subsets of the sequence data. Each point represents a different amino-acid mutation, positioned by its fitness estimates from two different subsets. The Pearson correlation coefficient and the number of mutations being correlated are shown in the upper left of the scatter plot.
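As a rough illustration of the two numbers shown in the upper left of the plot, the sketch below computes a Pearson correlation coefficient and the count of mutations shared between two subsets. It is a minimal sketch, not the code behind this plot: the function names and the input layout (a hypothetical dict mapping a mutation label to its fitness estimate in one subset) are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlate_subsets(fitness_a, fitness_b):
    """Correlate fitness estimates for mutations present in both subsets.

    fitness_a, fitness_b: hypothetical dicts mapping a mutation label
    (e.g. "S:N501Y") to its fitness estimate in that subset.
    Returns (r, n), where n is the number of mutations being correlated.
    """
    shared = sorted(set(fitness_a) & set(fitness_b))
    xs = [fitness_a[m] for m in shared]
    ys = [fitness_b[m] for m in shared]
    return pearson_r(xs, ys), len(shared)
```

Only mutations estimated in both subsets can be plotted as points, which is why the sketch correlates the intersection of the two sets of mutations.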

You can mouse over points for details.

The minimum expected count slider below the plot sets how many expected counts of an amino acid we require before making a fitness estimate. Larger values yield more accurate estimates, but for fewer amino acids. So move the slider to the left to show estimates for more amino acids at lower confidence, and move it to the right to show estimates for fewer amino acids at higher confidence.
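The trade-off controlled by the slider can be sketched as a simple filter applied before correlating. This is a minimal sketch under stated assumptions, not the plot's actual implementation: it assumes each subset is a hypothetical dict mapping a mutation label to an `(expected_count, fitness_estimate)` pair, and it assumes a mutation must meet the threshold in both subsets to be shown.

```python
def shared_confident_estimates(subset_a, subset_b, min_expected_count):
    """Return paired fitness estimates that pass the expected-count threshold.

    subset_a, subset_b: hypothetical dicts mapping mutation label ->
        (expected_count, fitness_estimate).
    Assumption: a mutation is kept only if its expected count is at least
    min_expected_count in BOTH subsets.
    Returns two lists (xs, ys) of fitness estimates, ordered by mutation label.
    """
    xs, ys = [], []
    for mut in sorted(set(subset_a) & set(subset_b)):
        expected_a, fitness_a = subset_a[mut]
        expected_b, fitness_b = subset_b[mut]
        if expected_a >= min_expected_count and expected_b >= min_expected_count:
            xs.append(fitness_a)
            ys.append(fitness_b)
    return xs, ys
```

Raising `min_expected_count` can only shrink the set of mutations returned, which mirrors moving the slider to the right: fewer points, each backed by more expected counts.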

This plot only shows the clades with the largest numbers of sequences.

You can click or shift-click on specific genes in the legend below the plot to show only mutations in the selected gene(s).

See Bloom and Neher (2023) for a paper describing the work.

See https://github.com/jbloomlab/SARS2-mut-fitness for full computer code and data.

See https://jbloomlab.github.io/SARS2-mut-fitness/ for links to all interactive plots.

This plot is for the public_2022-01-31 dataset.