Combining `CurveFits` objects

We can also combine CurveFits objects that cover distinct serum/virus/replicate combinations using the CurveFits.combineCurveFits method. This is useful if you fit a large dataset in chunks and then want to merge the results into a single object.

Here is an example, which also serves as a test:

[1]:

import pandas as pd

import neutcurve

Read in the data:

[2]:

fi6v3_datafile = "Doud_et_al_2018-neutdata.csv"
data = pd.read_csv(fi6v3_datafile)

Create a CurveFits object with all the data:

[3]:

fits = neutcurve.CurveFits(data)
_ = fits.fitParams(average_only=False)

Now split the data and fit a separate CurveFits object to each part, so that we have objects to combine:

[4]:

# split the data into two disjoint subsets
data1 = data[(data["replicate"] == 1) | (data["serum"] == "H17-L19")]
data2 = data[(data["replicate"] != 1) & (data["serum"] != "H17-L19")]

# make a second split that is **not** disjoint from the first one
data2_invalid = data[data["replicate"] != 1]

# make fits to each of these
fit1 = neutcurve.CurveFits(data1)
_ = fit1.fitParams(average_only=False)
fit2 = neutcurve.CurveFits(data2)
_ = fit2.fitParams(average_only=False)
fit2_invalid = neutcurve.CurveFits(data2_invalid)
_ = fit2_invalid.fitParams(average_only=False)
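
The masks for `data1` and `data2` are logical complements: by De Morgan's law, the negation of `A | B` is `~A & ~B`, so every row lands in exactly one of the two splits. The mask for `data2_invalid` drops the serum condition, so it also picks up rows that are already in `data1`. The following is a minimal pure-Python sketch of that reasoning. The rows are made up for illustration (the name `other-serum` and the replicate values are assumptions, not taken from the data file), and the helper functions are hypothetical, not part of neutcurve:

```python
# Hypothetical rows standing in for the real data: (serum, replicate) pairs.
rows = [
    ("FI6v3", 1), ("FI6v3", 2), ("FI6v3", 3),
    ("H17-L19", 1), ("H17-L19", 2),
    ("other-serum", 1), ("other-serum", 2),
]


def in_first(row):
    """Mirror of the `data1` mask: replicate 1 OR serum H17-L19."""
    serum, replicate = row
    return replicate == 1 or serum == "H17-L19"


def in_second(row):
    """Mirror of the `data2` mask: NOT replicate 1 AND NOT serum H17-L19."""
    serum, replicate = row
    return replicate != 1 and serum != "H17-L19"


def in_second_invalid(row):
    """Mirror of the `data2_invalid` mask: NOT replicate 1, any serum."""
    _, replicate = row
    return replicate != 1


# True if every row belongs to exactly one of the two valid splits.
partition_ok = all(in_first(r) != in_second(r) for r in rows)

# Rows claimed by both `data1` and `data2_invalid`.
overlap = [r for r in rows if in_first(r) and in_second_invalid(r)]

print(partition_ok)
print(overlap)
```

In this toy example the only overlapping row is the H17-L19 replicate that is not replicate 1, which is the kind of shared entry that makes `fit1` and `fit2_invalid` impossible to combine.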

Combine the two disjoint fits using CurveFits.combineCurveFits. The result should be equivalent to fits, which was fit to all the data at once:

[5]:

combined_fits = neutcurve.CurveFits.combineCurveFits([fit1, fit2])

pd.testing.assert_frame_equal(
    fits.fitParams(average_only=False),
    combined_fits.fitParams(average_only=False),
)

Combine fits only for certain sera and viruses, and drop specific serum/virus/replicate combinations:

[6]:

(
    neutcurve.CurveFits.combineCurveFits(
        [fit1, fit2],
        sera=["FI6v3"],
        viruses=["WT", "P80D", "V135T"],
        serum_virus_replicates_to_drop=[
            ("FI6v3", "P80D", "2"),
            ("FI6v3", "V135T", "1"),
            ("FI6v3", "V135T", "2"),
            ("FI6v3", "V135T", "3"),
        ],
    )
    .fitParams(average_only=False)
    .round(3)
)

[6]:

	serum	virus	replicate	nreplicates	ic50	ic50_bound	ic50_str	midpoint	midpoint_bound	midpoint_bound_type	slope	top	r2	rmsd
0	FI6v3	WT	1	<NA>	0.017	interpolated	0.0167	0.017	0.017	interpolated	2.505	1.0	0.996	0.028
1	FI6v3	WT	2	<NA>	0.019	interpolated	0.019	0.019	0.019	interpolated	2.513	1.0	0.986	0.053
2	FI6v3	WT	3	<NA>	0.015	interpolated	0.0152	0.015	0.015	interpolated	1.878	1.0	0.982	0.060
3	FI6v3	WT	average	3	0.017	interpolated	0.017	0.017	0.017	interpolated	2.279	1.0	0.992	0.039
4	FI6v3	P80D	1	<NA>	0.012	interpolated	0.0121	0.012	0.012	interpolated	2.025	1.0	0.980	0.069
5	FI6v3	P80D	3	<NA>	0.013	interpolated	0.0128	0.013	0.013	interpolated	2.059	1.0	0.994	0.035
6	FI6v3	P80D	average	2	0.012	interpolated	0.0125	0.012	0.012	interpolated	2.035	1.0	0.990	0.047
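
The output above is consistent with a simple selection rule: keep an entry if its serum is in `sera`, its virus is in `viruses`, and its (serum, virus, replicate) tuple is not listed in `serum_virus_replicates_to_drop`. The sketch below applies that rule to a made-up list of keys in plain Python. It assumes three replicates per serum/virus pair and includes a placeholder `other-virus`; it illustrates the selection logic only and is not the library's implementation:

```python
# Hypothetical (serum, virus, replicate) keys. Replicates are strings to match
# the form used in `serum_virus_replicates_to_drop` in the call above.
keys = [
    (serum, virus, str(rep))
    for serum in ["FI6v3", "H17-L19"]
    for virus in ["WT", "P80D", "V135T", "other-virus"]
    for rep in (1, 2, 3)
]

sera = {"FI6v3"}
viruses = {"WT", "P80D", "V135T"}
to_drop = {
    ("FI6v3", "P80D", "2"),
    ("FI6v3", "V135T", "1"),
    ("FI6v3", "V135T", "2"),
    ("FI6v3", "V135T", "3"),
}

# Keep a key only if it passes all three conditions.
kept = [
    key
    for key in keys
    if key[0] in sera and key[1] in viruses and key not in to_drop
]

for key in kept:
    print(key)
```

Because all three V135T replicates are listed in the drop list, that virus disappears entirely, while P80D keeps replicates 1 and 3. This matches the rows of the table above, where the P80D average is reported with `nreplicates` of 2.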

Make sure we cannot combine two fits with overlapping entries. This should raise an error because the two objects share some serum/virus/replicate combinations:

[7]:

# NBVAL_RAISES_EXCEPTION

neutcurve.CurveFits.combineCurveFits([fit1, fit2_invalid])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 3
      1 # NBVAL_RAISES_EXCEPTION
----> 3 neutcurve.CurveFits.combineCurveFits([fit1, fit2_invalid])

File ~/neutcurve/neutcurve/curvefits.py:207, in CurveFits.combineCurveFits(curvefits_list, sera, viruses, serum_virus_replicates_to_drop)
    196 combined_fits.df = combined_fits._get_avg_and_stderr_df(combined_fits.df)
    197 if len(combined_fits.df) != len(
    198     combined_fits.df.groupby(
    199         [
   (...)
    205     )
    206 ):
--> 207     raise ValueError("duplicated sera/virus/replicate in `curvefits_list`")
    209 # combine sera
    210 combined_fits.sera = combined_fits.df[combined_fits.serum_col].unique().tolist()

ValueError: duplicated sera/virus/replicate in `curvefits_list`
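
To find out which entries are shared before calling CurveFits.combineCurveFits, one option is to count how often each (serum, virus, replicate) key occurs across the objects. The sketch below does this in plain Python on made-up key lists. The helper `duplicated_keys` and the key lists are hypothetical and not part of neutcurve; how you extract the keys from your own objects or data frames is left open here:

```python
from collections import Counter

# Hypothetical key lists standing in for the entries of each CurveFits object.
fit1_keys = [
    ("FI6v3", "WT", "1"),
    ("H17-L19", "WT", "1"),
    ("H17-L19", "WT", "2"),
]
fit2_keys = [("FI6v3", "WT", "2"), ("FI6v3", "WT", "3")]
fit2_invalid_keys = fit2_keys + [("H17-L19", "WT", "2")]


def duplicated_keys(*key_lists):
    """Return the keys that occur more than once across all lists, sorted."""
    counts = Counter(key for keys in key_lists for key in keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Disjoint inputs: nothing is reported.
print(duplicated_keys(fit1_keys, fit2_keys))

# Overlapping inputs: the shared key is reported.
print(duplicated_keys(fit1_keys, fit2_invalid_keys))
```

An empty result means the inputs are disjoint. Any key that is reported needs to be removed from one of the inputs before the objects can be combined.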
