main

Main PDB / protein alignment function.

You can run all of the command-line functionality of pdb_prot_align via the run() function.

pdb_prot_align.main.get_parser()

Command-line argparse.ArgumentParser for pdb_prot_align.
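As a minimal sketch, the parser can be inspected from Python to see the available command-line options. This assumes pdb_prot_align is installed; the helper name cli_help_text is hypothetical, and only the standard argparse.ArgumentParser.format_help() method is called on the returned parser.

```python
import importlib.util


def cli_help_text():
    """Return the CLI help text, or None if pdb_prot_align is not installed."""
    if importlib.util.find_spec("pdb_prot_align") is None:
        return None
    from pdb_prot_align.main import get_parser

    parser = get_parser()  # an argparse.ArgumentParser, per the description above
    return parser.format_help()


help_text = cli_help_text()
if help_text is not None:
    print(help_text)
```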

pdb_prot_align.main.run(protsfile, refprot_regex, pdbfile, chain_ids, outprefix, ignore_gaps=True, drop_pdb=True, drop_refprot=True, mafft='mafft')

Run main function to align proteins to reference and PDB chain(s).

Note

This function implements the full command-line functionality of pdb_prot_align via a Python function.
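A call might look like the sketch below. The file names, regex, chain IDs, and output prefix are hypothetical placeholders, and the wrapper name align is introduced only for illustration; the keyword names and defaults are taken from the signature above. Running it requires pdb_prot_align and mafft to be installed, so the wrapper is defined here but not invoked.

```python
# Hypothetical inputs: replace with your own files, regex, and chain IDs.
run_kwargs = {
    "protsfile": "prots.fa",         # FASTA file of protein sequences
    "refprot_regex": "reference",    # regex matching the reference header
    "pdbfile": "structure.pdb",      # PDB file
    "chain_ids": ["A", "B"],         # PDB chains to align
    "outprefix": "results/",         # prefix (here a directory) for outputs
    "ignore_gaps": True,             # defaults from the signature above
    "drop_pdb": True,
    "drop_refprot": True,
    "mafft": "mafft",
}


def align(kwargs):
    """Call pdb_prot_align.main.run with the given keyword arguments."""
    # Imported lazily so that this sketch loads even without the package.
    from pdb_prot_align.main import run

    run(**kwargs)
```

With the package and mafft available, align(run_kwargs) would run the alignment and write the output files described under outprefix.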

Parameters
  • protsfile (str) – input FASTA file of protein sequences

  • refprot_regex (str) – regex for reference protein header in protsfile

  • pdbfile (str) – input PDB file

  • chain_ids (list) – chains in PDB file to align; all chains aligning to a site must share the same residue number and amino acid, or an error is raised

  • outprefix (str) – prefix for output files (can be or include a directory). The output files are: “alignment.fa” (alignment with gaps relative to reference stripped); “alignment_unstripped.fa” (non-stripped alignment with PDB chains still included); “sites.csv” (sequential sites in reference, PDB sites, PDB chains, wildtype in reference, wildtype in PDB, site entropy in bits, number of effective amino acids at site, amino acid, frequency of amino acid)

  • ignore_gaps (bool) – ignore gaps (-) when calculating frequencies, number of effective amino acids, and entropy

  • drop_pdb (bool) – drop PDB protein chains from “alignment.fa” and computation of stats in “sites.csv” output files

  • drop_refprot (bool) – drop reference protein from “alignment.fa” and computation of stats in “sites.csv” output files

  • mafft (str) – path to mafft, potentially with additional args such as “mafft --reorder” (if multiple args, it all needs to be in quotes)
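To make the per-site statistics in “sites.csv” concrete, here is a minimal sketch of how amino-acid frequencies, entropy in bits, and the number of effective amino acids could be computed for one alignment column, with and without gaps. It is not the package's own code: the function name site_stats is hypothetical, and defining the number of effective amino acids as 2 raised to the entropy in bits is an assumption.

```python
import math
from collections import Counter


def site_stats(column, ignore_gaps=True):
    """Return (frequencies, entropy in bits, effective number) for a column.

    `column` is a string with one character per aligned sequence.
    """
    # Optionally drop gap characters before counting.
    chars = [c for c in column if not (ignore_gaps and c == "-")]
    if not chars:
        raise ValueError("column has no characters to count")
    counts = Counter(chars)
    total = sum(counts.values())
    freqs = {aa: n / total for aa, n in counts.items()}
    # Shannon entropy in bits: sum of p * log2(1 / p).
    entropy = sum(p * math.log2(1 / p) for p in freqs.values())
    # Assumption: effective number of amino acids is 2 ** entropy.
    n_effective = 2 ** entropy
    return freqs, entropy, n_effective


freqs, entropy, n_eff = site_stats("AAGG")
print(freqs, entropy, n_eff)
```

In this sketch, a column such as "AA--" is treated as fully conserved when ignore_gaps=True, while with ignore_gaps=False the gap character is counted like any other state.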