Combining CurveFits objects
We can also combine CurveFits objects for distinct sera/viruses/replicates using the CurveFits.combineCurveFits method. This is useful if you fit a large dataset in chunks and then want to combine all the results.
Here is an example/test:
[1]:
import pandas as pd
import neutcurve
Read in the data:
[2]:
fi6v3_datafile = "Doud_et_al_2018-neutdata.csv"
data = pd.read_csv(fi6v3_datafile)
Create a CurveFits object with all the data:
[3]:
fits = neutcurve.CurveFits(data)
_ = fits.fitParams(average_only=False)
Now split the data to make separate CurveFits objects that we can combine:
[4]:
# split the data in half
data1 = data[(data["replicate"] == 1) | (data["serum"] == "H17-L19")]
data2 = data[(data["replicate"] != 1) & (data["serum"] != "H17-L19")]
# make a second split that is **not** disjoint from the first one
data2_invalid = data[data["replicate"] != 1]
# make fits to each of these
fit1 = neutcurve.CurveFits(data1)
_ = fit1.fitParams(average_only=False)
fit2 = neutcurve.CurveFits(data2)
_ = fit2.fitParams(average_only=False)
fit2_invalid = neutcurve.CurveFits(data2_invalid)
_ = fit2_invalid.fitParams(average_only=False)
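Because CurveFits.combineCurveFits rejects inputs that share a serum/virus/replicate, it can help to check chunks for overlap before fitting them. The following is a minimal sketch, assuming each chunk can be reduced to a set of (serum, virus, replicate) tuples. The helper name `shared_keys` and the toy keys are hypothetical and are not part of neutcurve:

```python
from itertools import combinations


def shared_keys(chunks):
    """Return the (serum, virus, replicate) keys found in more than one chunk.

    `chunks` is a list of iterables of (serum, virus, replicate) tuples.
    This helper is hypothetical and not part of neutcurve.
    """
    overlap = set()
    # compare every pair of chunks and collect the keys they have in common
    for a, b in combinations([set(c) for c in chunks], 2):
        overlap |= a & b
    return overlap


# toy keys standing in for the rows of data1, data2, and data2_invalid
keys1 = {("FI6v3", "WT", 1), ("H17-L19", "WT", 1), ("H17-L19", "WT", 2)}
keys2 = {("FI6v3", "WT", 2), ("FI6v3", "WT", 3)}
keys2_invalid = keys2 | {("H17-L19", "WT", 2)}

disjoint_overlap = shared_keys([keys1, keys2])  # no shared keys
invalid_overlap = shared_keys([keys1, keys2_invalid])  # one shared key
```

With real data, one way to build such a key set from a data frame is `set(data1[["serum", "virus", "replicate"]].itertuples(index=False, name=None))`.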
Combine the two fits using CurveFits.combineCurveFits, which should yield an object equivalent to fits:
[5]:
combined_fits = neutcurve.CurveFits.combineCurveFits([fit1, fit2])
pd.testing.assert_frame_equal(
    fits.fitParams(average_only=False),
    combined_fits.fitParams(average_only=False),
)
Combine fits only for certain sera and viruses, dropping specific serum/virus/replicate combinations:
[6]:
(
    neutcurve.CurveFits.combineCurveFits(
        [fit1, fit2],
        sera=["FI6v3"],
        viruses=["WT", "P80D", "V135T"],
        serum_virus_replicates_to_drop=[
            ("FI6v3", "P80D", "2"),
            ("FI6v3", "V135T", "1"),
            ("FI6v3", "V135T", "2"),
            ("FI6v3", "V135T", "3"),
        ],
    )
    .fitParams(average_only=False)
    .round(3)
)
[6]:
 | serum | virus | replicate | nreplicates | ic50 | ic50_bound | ic50_str | midpoint | midpoint_bound | midpoint_bound_type | slope | top | bottom | r2 | rmsd |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FI6v3 | WT | 1 | <NA> | 0.017 | interpolated | 0.0167 | 0.017 | 0.017 | interpolated | 2.505 | 1.0 | 0.0 | 0.996 | 0.028 |
1 | FI6v3 | WT | 2 | <NA> | 0.019 | interpolated | 0.019 | 0.019 | 0.019 | interpolated | 2.513 | 1.0 | 0.0 | 0.986 | 0.053 |
2 | FI6v3 | WT | 3 | <NA> | 0.015 | interpolated | 0.0152 | 0.015 | 0.015 | interpolated | 1.878 | 1.0 | 0.0 | 0.982 | 0.060 |
3 | FI6v3 | WT | average | 3 | 0.017 | interpolated | 0.017 | 0.017 | 0.017 | interpolated | 2.279 | 1.0 | 0.0 | 0.992 | 0.039 |
4 | FI6v3 | P80D | 1 | <NA> | 0.012 | interpolated | 0.0121 | 0.012 | 0.012 | interpolated | 2.025 | 1.0 | 0.0 | 0.980 | 0.069 |
5 | FI6v3 | P80D | 3 | <NA> | 0.013 | interpolated | 0.0128 | 0.013 | 0.013 | interpolated | 2.059 | 1.0 | 0.0 | 0.994 | 0.035 |
6 | FI6v3 | P80D | average | 2 | 0.012 | interpolated | 0.0125 | 0.012 | 0.012 | interpolated | 2.035 | 1.0 | 0.0 | 0.990 | 0.047 |
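The output above is consistent with reading the three arguments as successive filters on (serum, virus, replicate) keys: keep the listed sera, keep the listed viruses, then remove the listed combinations. The sketch below illustrates that reading on toy keys. It is not neutcurve's implementation, and the function name `select_keys` is hypothetical:

```python
def select_keys(keys, sera=None, viruses=None, to_drop=()):
    """Filter (serum, virus, replicate) tuples; hypothetical, not neutcurve code.

    `sera` and `viruses` of None mean "keep all"; `to_drop` lists exact
    (serum, virus, replicate) combinations to remove.
    """
    to_drop = set(to_drop)
    return [
        (s, v, r)
        for (s, v, r) in keys
        if (sera is None or s in sera)
        and (viruses is None or v in viruses)
        and (s, v, r) not in to_drop
    ]


# toy keys: two sera, three viruses, three replicates (replicates as strings)
all_keys = [
    (s, v, r)
    for s in ["FI6v3", "H17-L19"]
    for v in ["WT", "P80D", "V135T"]
    for r in ["1", "2", "3"]
]

kept = select_keys(
    all_keys,
    sera=["FI6v3"],
    viruses=["WT", "P80D", "V135T"],
    to_drop=[
        ("FI6v3", "P80D", "2"),
        ("FI6v3", "V135T", "1"),
        ("FI6v3", "V135T", "2"),
        ("FI6v3", "V135T", "3"),
    ],
)
```

Here `kept` holds all three WT replicates and P80D replicates 1 and 3, matching the non-average rows of the table. V135T disappears entirely because every one of its replicates is dropped.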
Make sure we cannot combine two fits with overlapping entries. This should raise an error because the fits share serum/virus/replicates:
[7]:
# NBVAL_RAISES_EXCEPTION
neutcurve.CurveFits.combineCurveFits([fit1, fit2_invalid])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[7], line 3
1 # NBVAL_RAISES_EXCEPTION
----> 3 neutcurve.CurveFits.combineCurveFits([fit1, fit2_invalid])
File ~/neutcurve/neutcurve/curvefits.py:207, in CurveFits.combineCurveFits(curvefits_list, sera, viruses, serum_virus_replicates_to_drop)
196 combined_fits.df = combined_fits._get_avg_and_stderr_df(combined_fits.df)
197 if len(combined_fits.df) != len(
198 combined_fits.df.groupby(
199 [
(...)
205 )
206 ):
--> 207 raise ValueError("duplicated sera/virus/replicate in `curvefits_list`")
209 # combine sera
210 combined_fits.sera = combined_fits.df[combined_fits.serum_col].unique().tolist()
ValueError: duplicated sera/virus/replicate in `curvefits_list`
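The traceback shows that the error is raised when the number of rows differs from the number of distinct groups. The same idea can be sketched in plain Python by comparing the number of keys with the number of unique keys. This is an illustration only; the function name `check_no_duplicates` and its error message are hypothetical and are not neutcurve's:

```python
from collections import Counter


def check_no_duplicates(keys):
    """Raise ValueError if any (serum, virus, replicate) key occurs twice.

    Hypothetical helper illustrating the row-count versus group-count
    comparison; it is not part of neutcurve.
    """
    counts = Counter(keys)
    # more keys than unique keys means at least one key is repeated
    if len(keys) != len(counts):
        dups = sorted(k for k, n in counts.items() if n > 1)
        raise ValueError(f"duplicated serum/virus/replicate: {dups}")


# toy keys: the second list repeats one key from the first
chunk_a = [("FI6v3", "WT", "1"), ("H17-L19", "WT", "2")]
chunk_b = [("FI6v3", "WT", "2"), ("H17-L19", "WT", "2")]

try:
    check_no_duplicates(chunk_a + chunk_b)
    error_message = None
except ValueError as err:
    error_message = str(err)
```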