Test curve fitting on some real data

Test curve fitting on some noisy real data:

[1]:
import neutcurve

import pandas as pd

Read in the data:

[2]:
data = pd.read_csv("test_curves_data.csv")

Fit the curves and display fit parameters:

[3]:
fits = neutcurve.CurveFits(
    data,
    conc_col="concentration",
    fracinf_col="frac_infectivity",
    serum_col="serum",
    virus_col="strain",
    replicate_col="barcode",
)

fit_params = fits.fitParams(average_only=False, no_average=True).drop(
    columns=["nreplicates", "ic50_str"]
)
with pd.option_context("display.float_format", "{:.2g}".format):
    display(fit_params)
serum virus replicate ic50 ic50_bound midpoint midpoint_bound midpoint_bound_type slope top bottom r2 rmsd
0 SerumAd0 A/India-PUN-NIV328484/2021 TTGTCCCGAGACAACA_rep2 0.0026 interpolated 0.0026 0.0026 interpolated 3.8 1 0 0.86 0.15
1 SerumAd0 A/India-PUN-NIV328484/2021 TCTGTTCCGGCCCGAA_rep2 0.0023 interpolated 0.0023 0.0023 interpolated 1.5 1 0 0.91 0.12
2 SerumAd0 A/India-PUN-NIV328484/2021 GATCTAATAATACGGC_rep2 0.0025 interpolated 0.0025 0.0025 interpolated 2.5 1 0 0.78 0.19
3 SerumAd0 A/Togo/0274/2021 TAGCAGATGTATCAAT_rep2 0.0015 interpolated 0.0015 0.0015 interpolated 1.3 1 0 0.86 0.15
4 SerumAd0 A/Togo/0274/2021 GTAACATTATACGATT_rep2 0.0013 interpolated 0.0013 0.0013 interpolated 1.2 1 0 0.78 0.18
5 SerumAd0 A/Togo/0274/2021 AACGAATGAATTTCTT_rep2 0.0021 interpolated 0.0021 0.0021 interpolated 1.9 1 0 0.99 0.037
6 SerumBd0 A/Bangladesh/3210810034/2021 AACTATAGATCTAGAA 0.0018 interpolated 0.0018 0.0018 interpolated 4.2 1 0 1 0.0073
7 SerumBd0 A/Bangladesh/3210810034/2021 ACAAAAGTACCTCTAC 0.0021 interpolated 0.0021 0.0021 interpolated 8 1 0 0.95 0.099
8 SerumBd0 A/SouthAfrica/R14850/2021 AACTCCGCAGACACTG 0.00089 interpolated 0.00089 0.00089 interpolated 2 1 0 0.98 0.064
9 SerumBd0 A/SouthAfrica/R14850/2021 TACTCAACAAGATAAA 0.0011 interpolated 0.0011 0.0011 interpolated 3 1 0 0.74 0.2
10 SerumBd0 A/Bangladesh/8002/2021 AGTGTCCCTAAGAGGC 0.00097 interpolated 0.00097 0.00097 interpolated 2.8 1 0 0.86 0.15
11 SerumBd0 A/Bangladesh/8002/2021 GCAACGCCAAATAATT 0.0022 interpolated 0.0022 0.0022 interpolated 4.6 1 0 0.91 0.12
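
As a reading aid for the table above: the midpoint, slope, top, and bottom columns are parameters of a Hill curve. The sketch below assumes the usual form f(c) = bottom + (top - bottom) / (1 + (c / midpoint)^slope). The function name `hill` is hypothetical and is not part of the neutcurve API. With top = 1 and bottom = 0, the assumed curve passes through exactly 0.5 at the midpoint, which is consistent with the ic50 and midpoint columns being equal in every row.

```python
def hill(c, midpoint, slope, top=1.0, bottom=0.0):
    """Fraction infectivity at concentration c under the assumed Hill form."""
    return bottom + (top - bottom) / (1.0 + (c / midpoint) ** slope)


# Parameters taken from row 0 of the table above.
midpoint, slope = 0.0026, 3.8

# At c == midpoint the ratio c / midpoint is 1, so the curve sits halfway
# between top and bottom.
at_midpoint = hill(midpoint, midpoint, slope)

# With a positive slope the curve falls as concentration rises.
below = hill(midpoint / 10, midpoint, slope)
above = hill(midpoint * 10, midpoint, slope)

print(at_midpoint, below, above)
```

A larger slope makes the transition from `top` to `bottom` steeper around the midpoint, and a negative slope would make the curve rise with concentration instead of fall.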

Plot the curves:

[4]:
fig, _ = fits.plotReplicates(
    ncol=3,
    attempt_shared_legend=False,
    widthscale=1.2,
    legendfontsize=8,
    titlesize=12,
)
_images/test_curves_7_0.png

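Note that the fits are worse if we set fix_slope_first=False. By default (fix_slope_first=True), the curve is first fit with the slope fixed, and then all parameters, including the slope, are re-fit. Without that first step, two of the replicates below get a negative slope and a negative r2: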

[5]:
fits_nofix = neutcurve.CurveFits(
    data,
    conc_col="concentration",
    fracinf_col="frac_infectivity",
    serum_col="serum",
    virus_col="strain",
    replicate_col="barcode",
    fix_slope_first=False,
)

fit_params_nofix = fits_nofix.fitParams(average_only=False, no_average=True).drop(
    columns=["nreplicates", "ic50_str"]
)
with pd.option_context("display.float_format", "{:.2g}".format):
    display(fit_params_nofix)
serum virus replicate ic50 ic50_bound midpoint midpoint_bound midpoint_bound_type slope top bottom r2 rmsd
0 SerumAd0 A/India-PUN-NIV328484/2021 TTGTCCCGAGACAACA_rep2 0.0026 interpolated 0.0026 0.0026 interpolated 3.8 1 0 0.86 0.15
1 SerumAd0 A/India-PUN-NIV328484/2021 TCTGTTCCGGCCCGAA_rep2 0.0023 interpolated 0.0023 0.0023 interpolated 1.5 1 0 0.91 0.12
2 SerumAd0 A/India-PUN-NIV328484/2021 GATCTAATAATACGGC_rep2 3e-06 interpolated 3e-06 3e-06 interpolated -2.9 1 0 -0.7 0.54
3 SerumAd0 A/Togo/0274/2021 TAGCAGATGTATCAAT_rep2 0.0015 interpolated 0.0015 0.0015 interpolated 1.3 1 0 0.86 0.15
4 SerumAd0 A/Togo/0274/2021 GTAACATTATACGATT_rep2 2.6e-06 interpolated 2.6e-06 2.6e-06 interpolated -1.6 1 0 -1.2 0.56
5 SerumAd0 A/Togo/0274/2021 AACGAATGAATTTCTT_rep2 0.0021 interpolated 0.0021 0.0021 interpolated 1.9 1 0 0.99 0.037
6 SerumBd0 A/Bangladesh/3210810034/2021 AACTATAGATCTAGAA 0.0018 interpolated 0.0018 0.0018 interpolated 4.2 1 0 1 0.0073
7 SerumBd0 A/Bangladesh/3210810034/2021 ACAAAAGTACCTCTAC 0.0021 interpolated 0.0021 0.0021 interpolated 8 1 0 0.95 0.099
8 SerumBd0 A/SouthAfrica/R14850/2021 AACTCCGCAGACACTG 0.00089 interpolated 0.00089 0.00089 interpolated 2 1 0 0.98 0.064
9 SerumBd0 A/SouthAfrica/R14850/2021 TACTCAACAAGATAAA 0.0011 interpolated 0.0011 0.0011 interpolated 3 1 0 0.74 0.2
10 SerumBd0 A/Bangladesh/8002/2021 AGTGTCCCTAAGAGGC 0.00097 interpolated 0.00097 0.00097 interpolated 2.8 1 0 0.86 0.15
11 SerumBd0 A/Bangladesh/8002/2021 GCAACGCCAAATAATT 0.0022 interpolated 0.0022 0.0022 interpolated 4.6 1 0 0.91 0.12
[6]:
fig, _ = fits_nofix.plotReplicates(
    ncol=3,
    attempt_shared_legend=False,
    widthscale=1.2,
    legendfontsize=8,
    titlesize=12,
)
_images/test_curves_10_0.png
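
The two-stage idea behind fix_slope_first can be sketched in plain Python. This is only an illustration under stated assumptions, not neutcurve's implementation: the Hill form with top = 1 and bottom = 0 is assumed, the data are synthetic (not from test_curves_data.csv), and a crude pattern search stands in for whatever optimizer the library actually uses. The names `hill`, `sse`, and `pattern_search` are hypothetical.

```python
import math


def hill(c, log_mid, slope):
    """Assumed Hill form with top=1, bottom=0; midpoint given as log10."""
    z = slope * (math.log(c) - log_mid * math.log(10))
    # (c / midpoint) ** slope == exp(z); guard against overflow for large z.
    return 0.0 if z > 700 else 1.0 / (1.0 + math.exp(z))


def sse(params, concs, fracs):
    """Sum of squared residuals for params = [log10(midpoint), slope]."""
    log_mid, slope = params
    return sum((hill(c, log_mid, slope) - f) ** 2 for c, f in zip(concs, fracs))


def pattern_search(start, free, concs, fracs, step=0.5, tol=1e-6, max_rounds=10000):
    """Vary only the parameters indexed by `free`, accepting only improvements."""
    best, best_loss = list(start), sse(start, concs, fracs)
    for _ in range(max_rounds):
        if step <= tol:
            break
        improved = False
        for i in free:
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_loss = sse(trial, concs, fracs)
                if trial_loss < best_loss:
                    best, best_loss, improved = trial, trial_loss, True
        if not improved:
            step /= 2  # shrink the step once no move helps
    return best, best_loss


# Synthetic, deterministic data: true midpoint 0.002, true slope 3.
concs = [0.0002 * 2**i for i in range(8)]
noise = [0.03, -0.02, 0.04, -0.03, 0.02, -0.04, 0.03, -0.02]
fracs = [hill(c, math.log10(0.002), 3.0) + e for c, e in zip(concs, noise)]

# Stage 1: slope held at 1, only the midpoint (index 0) is fit.
start = [sum(math.log10(c) for c in concs) / len(concs), 1.0]
stage1, loss1 = pattern_search(start, free=[0], concs=concs, fracs=fracs)

# Stage 2: re-fit midpoint and slope together, starting from the stage 1 result.
stage2, loss2 = pattern_search(stage1, free=[0, 1], concs=concs, fracs=fracs)

print("stage 1:", 10 ** stage1[0], stage1[1], loss1)
print("stage 2:", 10 ** stage2[0], stage2[1], loss2)
```

The point of the first stage is that the second stage starts from a midpoint that already sits in a sensible region, so freeing the slope is less likely to send the optimizer toward a poor solution such as a negative slope.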

Check that the coefficient of determination (r2) for every replicate is at least as good when the slope is fixed first:

[7]:
r2_compare = (
    fit_params[["serum", "virus", "replicate", "r2"]]
    .merge(
        fit_params_nofix[["serum", "virus", "replicate", "r2"]].rename(
            columns={"r2": "r2_nofix"}
        ),
        validate="one_to_one",
    )
    # Round away float noise; "+ 0" turns any -0.0 into 0.0 for display.
    .assign(r2_improvement=lambda x: (x["r2"] - x["r2_nofix"]).round(5) + 0)
)

with pd.option_context("display.float_format", "{:.2g}".format):
    display(r2_compare)

assert (r2_compare["r2_improvement"] >= 0).all()
serum virus replicate r2 r2_nofix r2_improvement
0 SerumAd0 A/India-PUN-NIV328484/2021 TTGTCCCGAGACAACA_rep2 0.86 0.86 0
1 SerumAd0 A/India-PUN-NIV328484/2021 TCTGTTCCGGCCCGAA_rep2 0.91 0.91 0
2 SerumAd0 A/India-PUN-NIV328484/2021 GATCTAATAATACGGC_rep2 0.78 -0.7 1.5
3 SerumAd0 A/Togo/0274/2021 TAGCAGATGTATCAAT_rep2 0.86 0.86 0
4 SerumAd0 A/Togo/0274/2021 GTAACATTATACGATT_rep2 0.78 -1.2 2
5 SerumAd0 A/Togo/0274/2021 AACGAATGAATTTCTT_rep2 0.99 0.99 0
6 SerumBd0 A/Bangladesh/3210810034/2021 AACTATAGATCTAGAA 1 1 0
7 SerumBd0 A/Bangladesh/3210810034/2021 ACAAAAGTACCTCTAC 0.95 0.95 0
8 SerumBd0 A/SouthAfrica/R14850/2021 AACTCCGCAGACACTG 0.98 0.98 0
9 SerumBd0 A/SouthAfrica/R14850/2021 TACTCAACAAGATAAA 0.74 0.74 0
10 SerumBd0 A/Bangladesh/8002/2021 AGTGTCCCTAAGAGGC 0.86 0.86 0
11 SerumBd0 A/Bangladesh/8002/2021 GCAACGCCAAATAATT 0.91 0.91 0
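
The negative r2_nofix values in the table above may look surprising. Assuming the standard definition of the coefficient of determination, 1 - SS_res / SS_tot, a value below zero simply means the fitted curve predicts the data worse than a horizontal line at the mean of the observations. The sketch below uses made-up numbers and a hypothetical helper `r_squared`; it is not taken from neutcurve.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up fraction infectivity values that fall with concentration.
observed = [1.0, 0.9, 0.5, 0.1, 0.0]

# A perfect prediction, a flat line at the mean, and a curve that runs the
# wrong way (as a fit with a negative slope would).
r2_perfect = r_squared(observed, observed)
r2_flat = r_squared(observed, [sum(observed) / len(observed)] * len(observed))
r2_wrong_way = r_squared(observed, list(reversed(observed)))

print(r2_perfect, r2_flat, r2_wrong_way)
```

This matches the pattern in the comparison table: the two replicates with negative r2_nofix are the same two that received a negative slope when fix_slope_first=False.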