phydmslib.utils module

Utilities for phydmslib.

phydmslib.utils.BenjaminiHochbergCorrection(pvals, fdr)

Benjamini-Hochberg procedure to control false discovery rate.

Calling arguments:

pvals : a list of tuples of (label, p), where label is some label assigned to each data point and p is the corresponding P-value.

fdr : the desired false discovery rate

The return value is the 2-tuple (pcutoff, significantlabels). After applying the algorithm, all data points with p <= pcutoff are declared significant. The labels for these data points are in significantlabels. If there are no significant sites, pcutoff is returned as the maximum P-value that would have made a single point significant.
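The step-up procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the phydmslib implementation: the function name benjamini_hochberg_sketch is hypothetical, and reporting pcutoff as the Benjamini-Hochberg threshold fdr * k / m (rather than, say, the largest significant P-value) is an assumption chosen to agree with the return-value description.

```python
def benjamini_hochberg_sketch(pvals, fdr):
    """Hypothetical sketch: pvals is a list of (label, p) tuples.

    Returns (pcutoff, significantlabels), mirroring the documented return shape.
    """
    if not pvals:
        raise ValueError("pvals must contain at least one (label, p) tuple")
    m = len(pvals)
    # Rank the data points from smallest to largest P-value.
    ranked = sorted(pvals, key=lambda item: item[1])
    # Find the largest rank k whose P-value satisfies p_(k) <= fdr * k / m.
    k_max = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= fdr * k / m:
            k_max = k
    # Assumption: with no significant points, report the rank-1 threshold,
    # i.e. the largest P-value that would have made a single point significant.
    pcutoff = fdr * max(k_max, 1) / m
    significantlabels = [label for label, _ in ranked[:k_max]]
    return pcutoff, significantlabels


pcutoff, labels = benjamini_hochberg_sketch(
    [("a", 0.001), ("b", 0.02), ("c", 0.03), ("d", 0.5)], fdr=0.05)
print(labels)
```

Because the procedure is step-up, a point can be declared significant even when its own P-value exceeds its rank threshold, as long as a higher-ranked point passes. For example, with P-values 0.03 and 0.04 and fdr = 0.05, both points are significant in this sketch.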

phydmslib.utils.modelComparisonDataFrame(modelcomparisonfile, splitparams)

Converts a modelcomparison.md file to a pandas DataFrame.

Running phydms_comprehensive creates a file with the suffix modelcomparison.md. This function converts that file into a DataFrame that is easy to handle for downstream analysis.

Args:
modelcomparisonfile (str)

The name of the modelcomparison.md file.

splitparams (bool)

If True, create a new column for each model param in the ParamValues column, with values of NaN if that model does not have such a parameter.

Returns:

A pandas DataFrame with the information in the model comparison file.

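>>> import tempfile
>>> import numpy
>>> import pandas
>>> from phydmslib.utils import modelComparisonDataFrame
>>> with tempfile.NamedTemporaryFile(mode='w') as f: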
...     _ = f.write('\n'.join([
...        '| Model | deltaAIC | LogLikelihood | nParams | ParamValues  |',
...        '|-------|----------|---------------|---------|--------------|',
...        '| ExpCM | 0.00     | -1000.00      | 7       | x=1.0, y=2.0 |',
...        '| YNGKP | 10.2     | -1005.10      | 7       | x=1.3, z=0.1 |',
...         ]))
...     f.flush()
...     df_split = modelComparisonDataFrame(f.name, splitparams=True)
...     df_nosplit = modelComparisonDataFrame(f.name, splitparams=False)
>>> df_nosplit.equals(pandas.DataFrame.from_records(
...         [['ExpCM', 0, -1000, 7, 'x=1.0, y=2.0'],
...          ['YNGKP', 10.2, -1005.1, 7, 'x=1.3, z=0.1']],
...         columns=['Model', 'deltaAIC', 'LogLikelihood',
...                  'nParams', 'ParamValues']))
True
>>> df_split.equals(pandas.DataFrame.from_records(
...         [['ExpCM', 0, -1000, 7, 1.0, 2.0, numpy.nan],
...          ['YNGKP', 10.2, -1005.1, 7, 1.3, numpy.nan, 0.1]],
...         columns=['Model', 'deltaAIC', 'LogLikelihood',
...                  'nParams', 'x', 'y', 'z']))
True
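The splitparams behavior shown in the example above can be illustrated without pandas. The following is a minimal sketch, assuming ParamValues strings are comma-separated name=value pairs as in the example; split_param_values is a hypothetical helper, not part of phydmslib, and sorting the parameter names alphabetically is an assumption made only to keep the output deterministic.

```python
import math


def split_param_values(param_strings):
    """Hypothetical helper: turn strings like 'x=1.0, y=2.0' into NaN-filled rows."""
    parsed = []
    for text in param_strings:
        row = {}
        for item in text.split(","):
            name, value = item.split("=")
            row[name.strip()] = float(value)
        parsed.append(row)
    # The union of all parameter names becomes the set of new columns.
    names = sorted({name for row in parsed for name in row})
    # A model lacking a parameter gets NaN in that column.
    table = [[row.get(name, math.nan) for name in names] for row in parsed]
    return names, table


names, table = split_param_values(["x=1.0, y=2.0", "x=1.3, z=0.1"])
print(names)
print(table)
```

Each inner list of table corresponds to one model row, with one entry per parameter name in names.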