The phydms_comprehensive program
Overview
phydms_comprehensive is a program that simplifies usage of phydms for standard analyses. Essentially, phydms_comprehensive runs phydms for several different models to enable model comparisons and identify selection.
In the simplest case, you provide phydms_comprehensive with an alignment and one or more files giving site-specific amino-acid preferences.
The program then first uses RAxML to infer a tree under the GTRCAT model.
Alternatively, you can specify the tree using the --tree flag.
For each set of site-specific amino-acid preferences, phydms_comprehensive optimizes the tree with the following models:
- ExpCM
- ExpCM with a gamma-distributed \(\omega\) (if using the --gammaomega flag).
- ExpCM with a gamma-distributed \(\beta\) (if using the --gammabeta flag).
- YNGKP_M0
- YNGKP_M5
As a control, it also fits each of these ExpCM models with the preferences averaged across sites. Finally, it creates summaries that enable comparison among the models.
You could get all of this output by simply running phydms repeatedly, but using phydms_comprehensive automates this process for you.
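To make the bookkeeping concrete, the set of fits described above can be enumerated with a short Python sketch. The function name `planned_models` and the descriptive labels it returns are hypothetical (they are not the names that phydms_comprehensive writes to disk), and the sketch assumes that the YNGKP models are fit only once because they do not use preferences.

```python
def planned_models(prefs_names, gammaomega=False, gammabeta=False):
    """List descriptive labels for the models fit in one run.

    Illustrative only: the labels are hypothetical and are not the
    model names used in the phydms_comprehensive output files.
    """
    labels = []
    for name in prefs_names:
        variants = ["ExpCM"]
        if gammaomega:
            variants.append("ExpCM with gamma-distributed omega")
        if gammabeta:
            variants.append("ExpCM with gamma-distributed beta")
        for variant in variants:
            # Each ExpCM variant is fit with the actual preferences ...
            labels.append(f"{variant} ({name})")
            # ... and with site-averaged preferences as a control.
            labels.append(f"{variant} ({name}, averaged preferences)")
    # Assumption: the YNGKP models do not depend on the preferences,
    # so they appear once regardless of how many prefsfiles are given.
    labels += ["YNGKP_M0", "YNGKP_M5"]
    return labels


for label in planned_models(["prefs1"], gammaomega=True):
    print(label)
```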
See below for information on Command-line usage and Output files.
Command-line usage
Comprehensive phylogenetic model comparison and detection of selection informed by deep mutational scanning data. This program runs ‘phydms’ repeatedly to compare substitution models and detect selection. The ‘phydms’ package is written by the Bloom lab (see https://github.com/jbloomlab/phydms/contributors). Version 2.4.0. Full documentation at http://jbloomlab.github.io/phydms
usage: phydms_comprehensive [-h] (--raxml RAXML | --tree TREE) [--ncpus NCPUS]
[--brlen {scale,optimize}] [--omegabysite]
[--diffprefsbysite] [--gammaomega] [--gammabeta]
[--no-avgprefs] [--randprefs] [-v]
outprefix alignment prefsfiles [prefsfiles ...]
Positional Arguments
- outprefix
Output file prefix.
This prefix can be a directory name (e.g. my_directory/) if you want to create a new directory. See Output files for a description of the created files.
- alignment
Existing FASTA file with aligned codon sequences.
The alignment must meet the same specifications described in the phydms documentation for the argument of the same name (see the phydms program).
- prefsfiles
Existing files with site-specific amino-acid preferences.
Provide the name of one or more files giving site-specific amino-acid preferences. These files should meet the same specifications described in the phydms documentation for the prefsfile that should accompany an ExpCM (see the phydms program).
Named Arguments
- --raxml
Path to RAxML (e.g., ‘raxml’)
By default, phydms_comprehensive uses RAxML to infer a tree topology under the Jukes Cantor model and the command raxml. Use this argument to specify a path to RAxML other than raxml. The tree inferred by RAxML can be found in the same directory as the other output files.
- --tree
Existing Newick file giving input tree.
If you want to instead fix the tree to some existing topology, use this argument and provide the name of a file giving a valid tree in Newick format. You cannot specify both --tree and --raxml.
- --ncpus
Use this many CPUs; -1 means all available.
Default: -1
- --brlen
Possible choices: scale, optimize
How to handle branch lengths: scale by single parameter or optimize each one
Default: “optimize”
- --omegabysite
Fit omega (dN/dS) for each site.
Default: False
- --diffprefsbysite
Fit differential preferences for each site.
Default: False
- --gammaomega
Fit ExpCM with gamma distributed omega.
Default: False
- --gammabeta
Fit ExpCM with gamma distributed beta.
Default: False
- --no-avgprefs
No fitting of models with preferences averaged across sites for ExpCM.
Default: False
- --randprefs
Include ExpCM models with randomized preferences.
Default: False
- -v, --version
show program’s version number and exit
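For scripted analyses, the arguments above can be assembled into a command line from Python. The following is a minimal sketch: the helper `build_command` and the file names in the example are hypothetical, and only options listed in the usage message are emitted. It builds the argument list and prints it; running it (for example with `subprocess.run(cmd, check=True)`) requires phydms to be installed.

```python
import shlex

# Boolean switches taken from the usage message above.
SWITCHES = {
    "omegabysite", "diffprefsbysite", "gammaomega",
    "gammabeta", "no-avgprefs", "randprefs",
}


def build_command(outprefix, alignment, prefsfiles, *, tree=None, raxml=None,
                  ncpus=-1, brlen="optimize", switches=()):
    """Return the argument list for one phydms_comprehensive run.

    Hypothetical helper, not part of phydms.
    """
    # The usage message lists --raxml and --tree as mutually exclusive.
    if (tree is None) == (raxml is None):
        raise ValueError("specify exactly one of 'tree' or 'raxml'")
    if brlen not in ("scale", "optimize"):
        raise ValueError("brlen must be 'scale' or 'optimize'")
    unknown = set(switches) - SWITCHES
    if unknown:
        raise ValueError(f"unknown switches: {sorted(unknown)}")
    if not prefsfiles:
        raise ValueError("at least one prefsfile is required")

    cmd = ["phydms_comprehensive", outprefix, alignment, *prefsfiles]
    cmd += ["--tree", tree] if tree is not None else ["--raxml", raxml]
    cmd += ["--ncpus", str(ncpus), "--brlen", brlen]
    cmd += [f"--{name}" for name in switches]
    return cmd


# Example with placeholder file names.
cmd = build_command("results/", "alignment.fasta", ["prefs.csv"],
                    tree="tree.newick", ncpus=4, switches=["gammaomega"])
print(shlex.join(cmd))
```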
Output files
Running phydms_comprehensive will create the following output files, all with the prefix specified by outprefix.
Log file
A file with the suffix .log will be created that summarizes the overall progress of phydms_comprehensive. If outprefix is just a directory name, this file will be called log.log.
Model comparison files
A file with the suffix modelcomparison.md will be created that summarizes the model comparison.
For each model, it reports the \(\Delta\rm{AIC}\) relative to the best-fitting model, the optimized log likelihood, the number of parameters, and the values of key parameters.
This file is in Markdown format:
| Model | deltaAIC | LogLikelihood | nParams | ParamValues |
|-------------------------|----------|---------------|---------|-----------------------------------------------|
| ExpCM_NP_prefs | 0.00 | -3389.38 | 6 | beta=2.99, kappa=6.31, omega=0.78 |
| averaged_ExpCM_NP_prefs | 2586.44 | -4682.60 | 6 | beta=0.28, kappa=6.51, omega=0.12 |
| YNGKP_M5 | 2599.70 | -4683.23 | 12 | alpha_omega=0.30, beta_omega=2.41, kappa=5.84 |
| YNGKP_M0 | 2679.50 | -4724.13 | 11 | kappa=5.79, omega=0.11 |
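The table above can be read back and checked with a few lines of Python. This sketch assumes the standard definition \(\rm{AIC} = 2k - 2\ln L\), where \(k\) is nParams and \(\ln L\) is LogLikelihood; that definition is consistent with the values shown, but the helper names `parse_model_comparison` and `delta_aic` are hypothetical and not part of phydms.

```python
TABLE = """
| Model | deltaAIC | LogLikelihood | nParams | ParamValues |
|-------------------------|----------|---------------|---------|-----------------------------------------------|
| ExpCM_NP_prefs | 0.00 | -3389.38 | 6 | beta=2.99, kappa=6.31, omega=0.78 |
| averaged_ExpCM_NP_prefs | 2586.44 | -4682.60 | 6 | beta=0.28, kappa=6.51, omega=0.12 |
| YNGKP_M5 | 2599.70 | -4683.23 | 12 | alpha_omega=0.30, beta_omega=2.41, kappa=5.84 |
| YNGKP_M0 | 2679.50 | -4724.13 | 11 | kappa=5.79, omega=0.11 |
"""


def parse_model_comparison(markdown_text):
    """Parse the Markdown table into a list of dicts, one per model."""
    rows = []
    for line in markdown_text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header row and the dashed separator row.
        if cells[0] == "Model" or set(cells[0]) <= {"-"}:
            continue
        rows.append({
            "model": cells[0],
            "delta_aic": float(cells[1]),
            "loglik": float(cells[2]),
            "nparams": int(cells[3]),
        })
    return rows


def delta_aic(rows):
    """Recompute deltaAIC from AIC = 2k - 2 ln L, relative to the best model."""
    aic = {r["model"]: 2 * r["nparams"] - 2 * r["loglik"] for r in rows}
    best = min(aic.values())
    return {model: value - best for model, value in aic.items()}


rows = parse_model_comparison(TABLE)
recomputed = delta_aic(rows)
for r in rows:
    agrees = abs(recomputed[r["model"]] - r["delta_aic"]) < 0.01
    print(f'{r["model"]}: reported {r["delta_aic"]:.2f}, '
          f'recomputed {recomputed[r["model"]]:.2f}, agrees={agrees}')
```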
phydms output for each model
For each individual model, there will also be all of the expected phydms output files as described in the phydms program. These files will begin with the prefix specified by outprefix, which will be followed by the name of the model.