Combining CurveFits objects

We can also combine CurveFits objects that cover distinct serum/virus/replicate combinations using the CurveFits.combineCurveFits method. This is useful if you fit a large dataset in chunks and then want to merge all the results into a single object.
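As a rough illustration of the chunked workflow, the sketch below splits a list of sera into batches. The helper name `batches` and the serum names are hypothetical and not part of neutcurve; this only shows one way the work might be divided.

```python
def batches(items, size):
    """Yield consecutive batches of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up serum names, for illustration only
example_sera = ["serum-A", "serum-B", "serum-C", "serum-D", "serum-E"]
example_batches = list(batches(example_sera, 2))
print(example_batches)
```

Each batch could then be used to subset the data (for instance with `data[data["serum"].isin(batch)]`), fit with `neutcurve.CurveFits`, and the resulting objects passed to `CurveFits.combineCurveFits`. Splitting by serum keeps the chunks disjoint, which the combine step requires.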

Here is an example/test:

[1]:
import pandas as pd

import neutcurve

Read in the data:

[2]:
fi6v3_datafile = "Doud_et_al_2018-neutdata.csv"
data = pd.read_csv(fi6v3_datafile)

Create a CurveFits object with all the data:

[3]:
fits = neutcurve.CurveFits(data)
_ = fits.fitParams(average_only=False)

Now split the data to make separate CurveFits objects that we can combine:

[4]:
# split the data into two disjoint subsets that together cover all rows
data1 = data[(data["replicate"] == 1) | (data["serum"] == "H17-L19")]
data2 = data[(data["replicate"] != 1) & (data["serum"] != "H17-L19")]

# make a second split that is **not** disjoint from the first one
data2_invalid = data[data["replicate"] != 1]

# make fits to each of these
fit1 = neutcurve.CurveFits(data1)
_ = fit1.fitParams(average_only=False)
fit2 = neutcurve.CurveFits(data2)
_ = fit2.fitParams(average_only=False)
fit2_invalid = neutcurve.CurveFits(data2_invalid)
_ = fit2_invalid.fitParams(average_only=False)
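To see why the first split is disjoint while the second is not, the sketch below applies the same boolean conditions to a simplified, made-up grid of (serum, replicate) pairs. The grid is an assumption for illustration; the real data has more columns and rows.

```python
from itertools import product

# Simplified stand-in for the rows of `data`: every serum with replicates 1-3
rows = list(product(["FI6v3", "H17-L19"], [1, 2, 3]))


def in_data1(serum, rep):
    return rep == 1 or serum == "H17-L19"


def in_data2(serum, rep):
    # Logical complement of in_data1 (De Morgan), so the two subsets partition the rows
    return rep != 1 and serum != "H17-L19"


def in_data2_invalid(serum, rep):
    return rep != 1


# Every row lands in exactly one of data1 / data2
assert all(in_data1(*r) != in_data2(*r) for r in rows)

# Rows that land in both data1 and data2_invalid
overlap = [r for r in rows if in_data1(*r) and in_data2_invalid(*r)]
print(overlap)  # [('H17-L19', 2), ('H17-L19', 3)]
```

The overlap consists of the H17-L19 rows with a replicate other than 1, which is what makes the second split unsuitable for combining with the first.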

Combine the two fits using CurveFits.combineCurveFits; the result should be equivalent to fits:

[5]:
combined_fits = neutcurve.CurveFits.combineCurveFits([fit1, fit2])

pd.testing.assert_frame_equal(
    fits.fitParams(average_only=False),
    combined_fits.fitParams(average_only=False),
)

Combine fits only for certain sera and viruses, dropping specific serum/virus/replicate combinations:

[6]:
(
    neutcurve.CurveFits.combineCurveFits(
        [fit1, fit2],
        sera=["FI6v3"],
        viruses=["WT", "P80D", "V135T"],
        serum_virus_replicates_to_drop=[
            ("FI6v3", "P80D", "2"),
            ("FI6v3", "V135T", "1"),
            ("FI6v3", "V135T", "2"),
            ("FI6v3", "V135T", "3"),
        ],
    )
    .fitParams(average_only=False)
    .round(3)
)
[6]:
   serum  virus  replicate  nreplicates   ic50    ic50_bound  ic50_str  midpoint  midpoint_bound  midpoint_bound_type  slope  top  bottom     r2   rmsd
0  FI6v3     WT          1         <NA>  0.017  interpolated    0.0167     0.017           0.017         interpolated  2.505  1.0     0.0  0.996  0.028
1  FI6v3     WT          2         <NA>  0.019  interpolated     0.019     0.019           0.019         interpolated  2.513  1.0     0.0  0.986  0.053
2  FI6v3     WT          3         <NA>  0.015  interpolated    0.0152     0.015           0.015         interpolated  1.878  1.0     0.0  0.982  0.060
3  FI6v3     WT    average            3  0.017  interpolated     0.017     0.017           0.017         interpolated  2.279  1.0     0.0  0.992  0.039
4  FI6v3   P80D          1         <NA>  0.012  interpolated    0.0121     0.012           0.012         interpolated  2.025  1.0     0.0  0.980  0.069
5  FI6v3   P80D          3         <NA>  0.013  interpolated    0.0128     0.013           0.013         interpolated  2.059  1.0     0.0  0.994  0.035
6  FI6v3   P80D    average            2  0.012  interpolated    0.0125     0.012           0.012         interpolated  2.035  1.0     0.0  0.990  0.047
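The selection seen in this output can be mimicked in plain Python. The sketch below is not the library's implementation; `select_keys` and the list of available keys are hypothetical, and it assumes each serum/virus pair has replicates "1" to "3".

```python
def select_keys(keys, sera=None, viruses=None, to_drop=()):
    """Keep (serum, virus, replicate) keys that pass the filters (hypothetical helper)."""
    drop = set(to_drop)
    return [
        key
        for key in keys
        if (sera is None or key[0] in sera)
        and (viruses is None or key[1] in viruses)
        and key not in drop
    ]


# Made-up set of available keys, for illustration only
all_keys = [
    (serum, virus, rep)
    for serum in ["FI6v3", "H17-L19"]
    for virus in ["WT", "P80D", "V135T"]
    for rep in ["1", "2", "3"]
]

kept = select_keys(
    all_keys,
    sera=["FI6v3"],
    viruses=["WT", "P80D", "V135T"],
    to_drop=[
        ("FI6v3", "P80D", "2"),
        ("FI6v3", "V135T", "1"),
        ("FI6v3", "V135T", "2"),
        ("FI6v3", "V135T", "3"),
    ],
)
print(kept)
```

Under these assumptions the surviving keys are the three WT replicates and P80D replicates 1 and 3, matching the non-average rows of the table. V135T disappears entirely because all of its replicates are dropped.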

Make sure we cannot combine two fits with overlapping entries. This should raise an error because the two objects share serum/virus/replicate combinations:

[7]:
# NBVAL_RAISES_EXCEPTION

neutcurve.CurveFits.combineCurveFits([fit1, fit2_invalid])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 3
      1 # NBVAL_RAISES_EXCEPTION
----> 3 neutcurve.CurveFits.combineCurveFits([fit1, fit2_invalid])

File ~/neutcurve/neutcurve/curvefits.py:207, in CurveFits.combineCurveFits(curvefits_list, sera, viruses, serum_virus_replicates_to_drop)
    196 combined_fits.df = combined_fits._get_avg_and_stderr_df(combined_fits.df)
    197 if len(combined_fits.df) != len(
    198     combined_fits.df.groupby(
    199         [
   (...)
    205     )
    206 ):
--> 207     raise ValueError("duplicated sera/virus/replicate in `curvefits_list`")
    209 # combine sera
    210 combined_fits.sera = combined_fits.df[combined_fits.serum_col].unique().tolist()

ValueError: duplicated sera/virus/replicate in `curvefits_list`
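Judging from the traceback, the check appears to compare the number of rows in the combined data with the number of unique groups. A similar pre-check can be sketched in plain Python by counting keys before combining; `find_duplicates` and `check_no_duplicates` are hypothetical helpers, not neutcurve functions.

```python
from collections import Counter


def find_duplicates(key_lists):
    """Return sorted keys that occur more than once across (or within) the lists."""
    counts = Counter(key for keys in key_lists for key in keys)
    return sorted(key for key, n in counts.items() if n > 1)


def check_no_duplicates(key_lists):
    """Raise ValueError if any (serum, virus, replicate) key is repeated."""
    dups = find_duplicates(key_lists)
    if dups:
        raise ValueError(f"duplicated serum/virus/replicate: {dups}")


# Made-up keys, for illustration only
keys_a = [("FI6v3", "WT", "1"), ("H17-L19", "WT", "2")]
keys_b = [("FI6v3", "WT", "2"), ("H17-L19", "WT", "2")]

dups = find_duplicates([keys_a, keys_b])
print(dups)  # [('H17-L19', 'WT', '2')]
```

With DataFrames, the key lists could be built from the serum, virus, and replicate columns of each chunk, so that overlaps are reported by name before any fitting is done.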