main
Main PDB / protein alignment function.
You can run all of the command-line functionality of pdb_prot_align via the run() function.
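A call from Python might look like the following minimal sketch. The keyword names come from the run() signature documented on this page; every file name, the regex, the chain IDs, and the output prefix are hypothetical placeholders. The call is only attempted when the placeholder input files exist, and it assumes pdb_prot_align and mafft are installed.

```python
import os

# Keyword names follow the documented run() signature.
# All values below are hypothetical placeholders, not files shipped with the package.
run_kwargs = {
    "protsfile": "proteins.fa",        # FASTA file of protein sequences
    "refprot_regex": "reference",      # regex matching the reference protein header
    "pdbfile": "structure.pdb",        # PDB file with the chains to align
    "chain_ids": ["A", "B"],           # PDB chains to align
    "outprefix": "results/example_",   # prefix for output files; may include a directory
    "ignore_gaps": True,
    "drop_pdb": True,
    "drop_refprot": True,
    "mafft": "mafft",                  # path to mafft, optionally with extra args
}

# Only attempt the alignment when the placeholder inputs are really present.
if os.path.isfile(run_kwargs["protsfile"]) and os.path.isfile(run_kwargs["pdbfile"]):
    import pdb_prot_align.main  # imported lazily so the sketch loads without the package

    pdb_prot_align.main.run(**run_kwargs)
```

Collecting the arguments in a dictionary keeps the settings in one place, so the same values can be logged or reused across runs.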
pdb_prot_align.main.run(protsfile, refprot_regex, pdbfile, chain_ids, outprefix, ignore_gaps=True, drop_pdb=True, drop_refprot=True, mafft='mafft')
Run main function to align proteins to reference and PDB chain(s).
Note
This function implements the full command-line functionality of pdb_prot_align via a Python function.
Parameters
protsfile (str) – input FASTA file of protein sequences
refprot_regex (str) – regex for reference protein header in protsfile
pdbfile (str) – input PDB file
chain_ids (list) – chains in PDB file to align; all chains aligning to a site must share the same residue number and amino acid, otherwise an error is raised
outprefix (str) – prefix for output files (can be / include directory): “alignment.fa” (alignment with gaps relative to reference stripped); “alignment_unstripped.fa” (non-stripped alignment with PDB chains still included); “sites.csv” (sequential sites in reference, PDB sites, PDB chains, wildtype in reference, wildtype in PDB, site entropy in bits, n effective amino acids at site, amino acid, frequency of amino acid)
ignore_gaps (bool) – ignore gaps (-) when calculating frequencies, number of effective amino acids, and entropy
drop_pdb (bool) – drop PDB protein chains from “alignment.fa” and computation of stats in “sites.csv” output files
drop_refprot (bool) – drop reference protein from “alignment.fa” and computation of stats in “sites.csv” output files
mafft (str) – path to mafft, potentially with additional args such as “mafft --reorder” (if there are multiple args, the whole string must be in quotes)
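To illustrate the per-site statistics named for “sites.csv” and the effect of ignore_gaps, here is a minimal sketch that computes amino-acid frequencies, Shannon entropy in bits, and a number of effective amino acids for one alignment column. The helper site_stats is hypothetical and not part of pdb_prot_align. Defining the number of effective amino acids as 2 raised to the entropy in bits is an assumption, since this page only names the quantity.

```python
import math
from collections import Counter


def site_stats(column, ignore_gaps=True):
    """Return (frequencies, entropy in bits, effective amino acids) for one column.

    Hypothetical helper for illustration; not the package's own implementation.
    """
    # Optionally discard gap characters before counting.
    chars = [c for c in column if not (ignore_gaps and c == "-")]
    if not chars:
        raise ValueError("column has no characters to count")
    counts = Counter(chars)
    total = sum(counts.values())
    freqs = {aa: n / total for aa, n in counts.items()}
    # Shannon entropy in bits.
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Assumption: effective number of amino acids is 2 ** (entropy in bits).
    n_effective = 2 ** entropy
    return freqs, entropy, n_effective


# Column with one gap: counted as a third character only when gaps are kept.
freqs, entropy, n_eff = site_stats("AG-", ignore_gaps=True)
freqs_gap, entropy_gap, n_eff_gap = site_stats("AG-", ignore_gaps=False)
```

With gaps ignored, the column “AG-” has two equally frequent amino acids; with gaps kept, the gap counts as a third equally frequent character, which raises both the entropy and the effective number.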