Command-line interface

The easiest way to use pdb_prot_align is typically via the command-line executable that is installed with the package. Its usage is described below.

Usage

Align proteins to reference and PDB.

usage: pdb_prot_align [-h] [-v] --protsfile PROTSFILE --refprot_regex
                      REFPROT_REGEX --pdbfile PDBFILE --chain_ids CHAIN_IDS
                      [CHAIN_IDS ...] --outprefix OUTPREFIX
                      [--ignore_gaps IGNORE_GAPS] [--drop_pdb DROP_PDB]
                      [--drop_refprot DROP_REFPROT] [--mafft MAFFT]
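
As a rough sketch of how the required arguments fit together, the Python snippet below assembles an invocation as an argument list and prints the equivalent shell command. The file names, the regex, and the output prefix are hypothetical placeholders, not values shipped with the package; only the flag names come from the usage line above.

```python
import shlex

# Hypothetical inputs and outputs; replace with your own paths.
cmd = [
    "pdb_prot_align",
    "--protsfile", "prots.fa",              # FASTA file of protein sequences
    "--refprot_regex", "reference_strain",  # regex matching the reference header
    "--pdbfile", "structure.pdb",           # PDB file
    "--chain_ids", "A", "B",                # one or more chain IDs
    "--outprefix", "results/",              # prefix (here a directory) for outputs
]

# Render the list as a single, safely quoted shell command line.
command_line = shlex.join(cmd)
print(command_line)

# To actually run it (requires pdb_prot_align and mafft to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems, which matters if the regex contains characters that the shell would otherwise interpret.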

Named Arguments

-v, --version

show program’s version number and exit

--protsfile

input FASTA file of protein sequences

--refprot_regex

regex for reference protein header in protsfile
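
Before running the program, it can help to check that a candidate regex picks out the intended header. The sketch below does this with Python's re module on a few made-up FASTA headers. It assumes the regex is applied as a search anywhere in the header and that exactly one header should match; neither detail is stated here, so treat both as assumptions.

```python
import re

# Hypothetical FASTA headers (without the leading ">").
headers = [
    "refprot_H3N2 reference strain",
    "strain_A 2009",
    "strain_B 2011",
]

# Hypothetical value for --refprot_regex.
refprot_regex = r"refprot_H3N2"

# Assumption: the pattern is searched for anywhere in the header.
matches = [h for h in headers if re.search(refprot_regex, h)]
print(matches)

# Assumption: a usable regex identifies exactly one reference protein.
if len(matches) != 1:
    raise ValueError(f"expected 1 matching header, found {len(matches)}")
```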

--pdbfile

input PDB file

--chain_ids

chains in PDB file to align; all chains aligning to a site must share the same residue number and amino acid, or an error will be raised

--outprefix

prefix for output files (can be or include a directory): “alignment.fa” (alignment with gaps relative to reference stripped); “alignment_unstripped.fa” (non-stripped alignment with PDB chains still included); “sites.csv” (sequential sites in reference, PDB sites, PDB chains, wildtype in reference, wildtype in PDB, site entropy in bits, number of effective amino acids at site, amino acid, frequency of amino acid)

--ignore_gaps

ignore gaps (-) when calculating frequencies, number of effective amino acids, and entropy

Default: True
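
To illustrate what ignoring gaps changes, here is a minimal sketch of per-site statistics for a single alignment column. The function site_stats is hypothetical and not part of pdb_prot_align. It assumes entropy is the Shannon entropy in bits and that the number of effective amino acids is 2 raised to that entropy; the package's exact definitions may differ.

```python
import math
from collections import Counter


def site_stats(column, ignore_gaps=True):
    """Return (frequencies, entropy_bits, n_effective) for one column.

    Hypothetical helper: `column` is a string with one character per
    sequence at this site, with "-" marking a gap.
    """
    residues = [aa for aa in column if not (ignore_gaps and aa == "-")]
    if not residues:
        raise ValueError("no residues left at this site")
    counts = Counter(residues)
    total = sum(counts.values())
    freqs = {aa: n / total for aa, n in counts.items()}
    # Shannon entropy in bits (assumed definition).
    entropy_bits = -sum(p * math.log2(p) for p in freqs.values())
    # Effective number of amino acids as 2**entropy (assumed definition).
    n_effective = 2 ** entropy_bits
    return freqs, entropy_bits, n_effective


# Two residues and two gaps at one site.
print(site_stats("AG--", ignore_gaps=True))
print(site_stats("AG--", ignore_gaps=False))
```

With gaps ignored, the column above counts as an even split between two amino acids; with gaps included, the gap character is treated as a third state and takes half of the frequency mass.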

--drop_pdb

drop PDB protein chains from “alignment.fa” and computation of stats in “sites.csv” output files

Default: True

--drop_refprot

drop reference protein from “alignment.fa” and computation of stats in “sites.csv” output files

Default: False

--mafft

path to mafft, potentially with additional args such as “mafft --reorder” (if multiple args, it all needs to be in quotes)

Default: “mafft”
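
The quoting requirement follows from how command-line parsing works: an option takes a single value, so a command with extra arguments has to arrive as one string. The sketch below demonstrates this with a stand-in argparse parser (mafft_demo is hypothetical, not the real program) and then splits the string with shlex. How pdb_prot_align itself splits the value is an assumption here.

```python
import argparse
import shlex

# Stand-in parser with a single option mirroring --mafft.
parser = argparse.ArgumentParser(prog="mafft_demo")
parser.add_argument("--mafft", default="mafft")

# Equivalent to typing: mafft_demo --mafft "mafft --reorder"
# The shell strips the quotes and passes one argument containing a space.
args = parser.parse_args(["--mafft", "mafft --reorder"])
print(args.mafft)

# Assumption: the value is later split into program and arguments.
mafft_cmd = shlex.split(args.mafft)
print(mafft_cmd)
```

Without the quotes, the shell would hand --reorder to the program as a separate argument, and the parser would reject it as an unrecognized option.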