phydmslib.utils module
Utilities for phydmslib.
phydmslib.utils.BenjaminiHochbergCorrection(pvals, fdr)

Benjamini-Hochberg procedure to control the false discovery rate.
Calling arguments:
- pvals : a list of tuples of (label, p), where label is some label assigned to each data point and p is the corresponding P-value.
- fdr : the desired false discovery rate.

The return value is the 2-tuple (pcutoff, significantlabels). After applying the algorithm, all data points with p <= pcutoff are declared significant, and the labels for these data points are in significantlabels. If there are no significant data points, pcutoff is returned as the maximum P-value that would have made a single point significant.
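The procedure described above can be sketched in pure Python. This is a hypothetical re-implementation written only against the documented interface (a list of (label, p) tuples plus fdr, returning (pcutoff, significantlabels)); it is not the phydmslib source, and the name benjamini_hochberg_sketch is illustrative. It applies the standard step-up rule: sort the m P-values, find the largest rank k with p_(k) <= (k / m) * fdr, and declare every point with p at or below that P-value significant. When no rank passes, it returns fdr / m, the bound the smallest P-value would have had to meet.

```python
def benjamini_hochberg_sketch(pvals, fdr):
    """Hypothetical sketch of the Benjamini-Hochberg step-up procedure.

    pvals: list of (label, p) tuples; fdr: desired false discovery rate.
    Returns (pcutoff, significantlabels).
    """
    # Assumption: input validation is added here for the sketch only.
    if not pvals:
        raise ValueError("pvals must be non-empty")
    if not 0 < fdr < 1:
        raise ValueError("fdr must be between 0 and 1")
    m = len(pvals)
    # Sort by P-value so the k-th smallest P-value has rank k (1-based).
    ranked = sorted(pvals, key=lambda lp: lp[1])
    # Step-up: keep the LARGEST rank k with p_(k) <= (k / m) * fdr,
    # even if some smaller ranks fail their own threshold.
    k_max = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / m * fdr:
            k_max = k
    if k_max == 0:
        # Nothing is significant: return the largest P-value that would
        # have made the single smallest P-value significant (rank 1).
        return fdr / m, []
    pcutoff = ranked[k_max - 1][1]
    # Every point at or below the cutoff is significant (input order kept).
    significantlabels = [label for label, p in pvals if p <= pcutoff]
    return pcutoff, significantlabels


pvals = [('a', 0.001), ('b', 0.008), ('c', 0.039), ('d', 0.041),
         ('e', 0.042), ('f', 0.060), ('g', 0.074), ('h', 0.205)]
print(benjamini_hochberg_sketch(pvals, 0.05))  # (0.008, ['a', 'b'])
```

With m = 8 and fdr = 0.05 the per-rank thresholds are 0.00625 * k, so only the two smallest P-values pass and the cutoff is 0.008.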
phydmslib.utils.modelComparisonDataFrame(modelcomparisonfile, splitparams)

Converts a modelcomparison.md file to a pandas DataFrame.

Running phydms_comprehensive creates a file with the suffix modelcomparison.md. This function converts that file into a DataFrame that is easy to handle in downstream analysis.

Args:
- modelcomparisonfile (str): The name of the modelcomparison.md file.
- splitparams (bool): If True, create a new column for each model parameter in the ParamValues column, with a value of NaN for any model that does not have that parameter.

Returns:
A pandas DataFrame with the information in the model comparison file.
>>> import tempfile
>>> import numpy
>>> import pandas
>>> from phydmslib.utils import modelComparisonDataFrame
>>> with tempfile.NamedTemporaryFile(mode='w') as f:
...     _ = f.write('\n'.join([
...         '| Model | deltaAIC | LogLikelihood | nParams | ParamValues |',
...         '|-------|----------|---------------|---------|--------------|',
...         '| ExpCM | 0.00 | -1000.00 | 7 | x=1.0, y=2.0 |',
...         '| YNGKP | 10.2 | -1005.10 | 7 | x=1.3, z=0.1 |',
...         ]))
...     f.flush()
...     df_split = modelComparisonDataFrame(f.name, splitparams=True)
...     df_nosplit = modelComparisonDataFrame(f.name, splitparams=False)
>>> df_nosplit.equals(pandas.DataFrame.from_records(
...         [['ExpCM', 0, -1000, 7, 'x=1.0, y=2.0'],
...          ['YNGKP', 10.2, -1005.1, 7, 'x=1.3, z=0.1']],
...         columns=['Model', 'deltaAIC', 'LogLikelihood',
...                  'nParams', 'ParamValues']))
True
>>> df_split.equals(pandas.DataFrame.from_records(
...         [['ExpCM', 0, -1000, 7, 1.0, 2.0, numpy.nan],
...          ['YNGKP', 10.2, -1005.1, 7, 1.3, numpy.nan, 0.1]],
...         columns=['Model', 'deltaAIC', 'LogLikelihood',
...                  'nParams', 'x', 'y', 'z']))
True
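The splitparams behaviour can be illustrated with a stdlib-only sketch that parses the same pipe-table layout used in the example above into a list of row dicts. The function name parse_model_comparison_sketch is hypothetical and this is not how phydmslib implements modelComparisonDataFrame. It assumes every data row has exactly the five columns shown, that ParamValues holds comma-separated name=value pairs, and that the new parameter columns are ordered alphabetically.

```python
import math


def parse_model_comparison_sketch(text, splitparams):
    """Hypothetical parser for a modelcomparison.md-style pipe table.

    Returns a list of row dicts (one per model) instead of a DataFrame.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # First line is the header; the second is the |----| separator row.
    header = [c.strip() for c in lines[0].strip('|').split('|')]
    rows = []
    for line in lines[2:]:
        cells = [c.strip() for c in line.strip('|').split('|')]
        row = dict(zip(header, cells))
        # Convert the numeric columns; ParamValues stays a string for now.
        row['deltaAIC'] = float(row['deltaAIC'])
        row['LogLikelihood'] = float(row['LogLikelihood'])
        row['nParams'] = int(row['nParams'])
        rows.append(row)
    if not splitparams:
        return rows
    # Parse "x=1.0, y=2.0" into {'x': 1.0, 'y': 2.0} for every model.
    parsed = []
    for row in rows:
        params = {}
        for item in row.pop('ParamValues').split(','):
            if not item.strip():
                continue  # tolerate an empty ParamValues cell
            name, value = item.split('=')
            params[name.strip()] = float(value)
        parsed.append(params)
    # Union of parameter names across all models (alphabetical: an assumption).
    names = sorted({name for params in parsed for name in params})
    for row, params in zip(rows, parsed):
        for name in names:
            # Models lacking a parameter get NaN in that column.
            row[name] = params.get(name, math.nan)
    return rows
```

Passing the returned rows to pandas.DataFrame(rows) would give a DataFrame with one row per model; keeping the sketch to the standard library just makes the column-splitting and NaN-filling logic easy to follow.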