the phydms_comprehensive program

Overview

phydms_comprehensive is a program that simplifies usage of phydms for standard analyses. Essentially, phydms_comprehensive runs phydms for several different models to enable model comparisons and identify selection.

In its simplest usage, you simply provide phydms_comprehensive with an alignment and one or more files giving site-specific amino-acid preferences. The program then first uses RAxML to infer a tree under the GTRCAT model. Alternatively, you can specify the tree using --tree flag. For each set of site-specific amino-acid preferences, phydms_comprehensive optimizes the tree with the following models:

  • ExpCM

  • ExpCM with a gamma-distributed \(\omega\) (if using the --gammaomega flag).

  • ExpCM with a gamma-distributed \(\beta\) (if using the --gammabeta flag).

  • YNGKP_M0

  • YNGKP_M5

It also runs each of these ExpCM with averaged preferences as a control. Finally, it creates summaries that enable comparison among the models.

You could get all of this output by simply running phydms repeatedly, but using phydms_comprehensive automates this process for you.

See below for information on Command-line usage and Output files.

Command-line usage

Comprehensive phylogenetic model comparison and detection of selection informed by deep mutational scanning data. This program runs ‘phydms’ repeatedly to compare substitution models and detect selection. The ‘phydms’ package is written by the Bloom lab (see https://github.com/jbloomlab/phydms/contributors). Version 2.4.0. Full documentation at http://jbloomlab.github.io/phydms

usage: phydms_comprehensive [-h] (--raxml RAXML | --tree TREE) [--ncpus NCPUS]
                            [--brlen {scale,optimize}] [--omegabysite]
                            [--diffprefsbysite] [--gammaomega] [--gammabeta]
                            [--no-avgprefs] [--randprefs] [-v]
                            outprefix alignment prefsfiles [prefsfiles ...]

Positional Arguments

outprefix

Output file prefix.

This prefix can be a directory name (e.g. my_directory/) if you want to create a new directory. See Output files for a description of the created files.

alignment

Existing FASTA file with aligned codon sequences.

The alignment must meet the same specifications described in the phydms documentation for the argument of the same name (see the phydms program).

prefsfiles

Existing files with site-specific amino-acid preferences.

Provide the name of one or more files giving site-specific amino-acid preferences. These files should meet the same specifications described in the phydms documentation for the prefsfile that should accompany an ExpCM (see the phydms program).

Named Arguments

--raxml

Path to RAxML (e.g., ‘raxml’)

By default, phydms_comprehensive uses RAxML to infer a tree topology under the Jukes Cantor model and the command raxml. Use this argument to specify a path to RAxML other than raxml. The tree inferred by RAxML can be found in the same directory as the other output files.

--tree

Existing Newick file giving input tree.

If you want to instead fix the tree to some existing topology, use this argument and provide the name of a file giving a valid tree in Newick format. You cannot specify both --tree and --raxml.

--ncpus

Use this many CPUs; -1 means all available.

Default: -1

--brlen

Possible choices: scale, optimize

How to handle branch lengths: scale by single parameter or optimize each one

Default: “optimize”

--omegabysite

Fit omega (dN/dS) for each site.

Default: False

--diffprefsbysite

Fit differential preferences for each site.

Default: False

--gammaomega

Fit ExpCM with gamma distributed omega.

Default: False

--gammabeta

Fit ExpCM with gamma distributed beta.

Default: False

--no-avgprefs

No fitting of models with preferences averaged across sites for ExpCM.

Default: False

--randprefs

Include ExpCM models with randomized preferences.

Default: False

-v, --version

show program’s version number and exit

Output files

Running phydms_comprehensive will create the following output files, all with the prefix specified by outprefix.

Log file

A file with the suffix .log will be created that summarizes the overall progress of phydms_comprehensive. If outprefix is just a directory name, this file will be called log.log.

Model comparison files

A file with the suffix modelcomparison.md will be created that summarizes the model comparison. For each model, it reports the \(\Delta\rm{AIC}\), the optimized log likelihood, and the values of key parameters. This file is in Markdown format:

| Model                   | deltaAIC | LogLikelihood | nParams | ParamValues                                   |
|-------------------------|----------|---------------|---------|-----------------------------------------------|
| ExpCM_NP_prefs          | 0.00     | -3389.38      | 6       | beta=2.99, kappa=6.31, omega=0.78             |
| averaged_ExpCM_NP_prefs | 2586.44  | -4682.60      | 6       | beta=0.28, kappa=6.51, omega=0.12             |
| YNGKP_M5                | 2599.70  | -4683.23      | 12      | alpha_omega=0.30, beta_omega=2.41, kappa=5.84 |
| YNGKP_M0                | 2679.50  | -4724.13      | 11      | kappa=5.79, omega=0.11                        |

phydms output for each model

For each individual model, there will also be all of the expected phydms output files as described in the phydms program. These files will begin with the prefix specified by outprefix, which will be followed by the name of the model.