seqnumbering

Deals with sequence numbering conversions.

class dms_tools2.seqnumbering.TranscriptConverter(genbankfile, *, ignore_other_features=False, to_upper=True)

Bases: object

Convert sites in mRNA transcript to chromosome or CDS sites.

In mRNA sequencing, we identify mutations with respect to their position in the mRNA. But we may want to map them to the corresponding site numbers in the entire chromosome (a gene segment in the case of a segmented virus, or the entire viral genome in the case of a non-segmented virus), or to the CDSs encoded on that chromosome.

For all site numbers used as input and output for this class, numbering is assumed to be 1, 2, …. Note that this is different than the 0, 1, … numbering used by Python for strings.
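
As a minimal illustration of this numbering convention, a 1-based site i corresponds to Python string index i - 1. The helper name nt_at_site is hypothetical and is not part of dms_tools2:

```python
def nt_at_site(seq, i):
    """Return the nucleotide at 1-based site `i` of `seq`.

    Hypothetical helper for illustration only; not part of dms_tools2.
    """
    if not 1 <= i <= len(seq):
        raise ValueError(f"site {i} is outside the range 1..{len(seq)}")
    # Shift from the 1-based site number to Python's 0-based index.
    return seq[i - 1]


# Site 1 is the first character, which Python itself calls index 0.
first_nt = nt_at_site("AGCAAAAGCA", 1)
```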

Args:
genbankfile (str)

Genbank file with one or more loci, each of which should be a separate chromosome. The features of relevance are called mRNA and CDS. Each of these features should have a qualifier called label that gives its name.

ignore_other_features (bool)

If genbankfile contains features other than mRNA or CDS, ignore them (True) or raise an error (False)?

to_upper (bool)

Convert all sequences to upper case letters?

Attributes:
chromosomes (dict)

Keyed by chromosome names, values are Bio.SeqRecord.SeqRecord for chromosomes.

mRNAs (dict)

Keyed by mRNA names, values are Bio.SeqFeature.SeqFeature for mRNA.

CDSs (dict)

Keyed by CDS names, values are Bio.SeqFeature.SeqFeature for CDS.

mRNA_chromosome (dict)

Keyed by mRNA names, values are names of the chromosomes containing each mRNA.

chromosome_CDSs (dict)

Keyed by chromosome names, values are names of CDSs on chromosome.

Specify example genbankfile contents with required information. In this file, the chromosome is fluNS, and it encodes two mRNAs and two CDSs (for fluNS1 and fluNS2):

>>> genbank_text = '''
... LOCUS       fluNS                    890 bp    DNA              UNK 01-JAN-1980
... FEATURES             Location/Qualifiers
...      mRNA            2..868
...                      /label="fluNS1"
...      mRNA            join(2..56,529..868)
...                      /label="fluNS2"
...      CDS             27..719
...                      /label="fluNS1"
...      CDS             join(27..56,529..864)
...                      /label="fluNS2"
... ORIGIN
...         1 agcaaaagca gggtgacaaa gacataatgg atccaaacac tgtgtcaagc tttcaggtag
...        61 attgctttct ttggcatgtc cgcaaaagag ttgcagacca agaactaggt gatgccccat
...       121 tccttgatcg gcttcgccga gatcagaagt ccctaagagg aagaggcagc actcttggtc
...       181 tggacatcga aacagccacc cgtgctggaa agcaaatagt ggagcggatt ctgaaggaag
...       241 aatctgatga ggcactcaaa atgaccatgg cctctgtacc tgcatcgcgc tacctaactg
...       301 acatgactct tgaggaaatg tcaaggcact ggttcatgct catgcccaag cagaaagtgg
...       361 caggccctct ttgtatcaga atggaccagg cgatcatgga taagaacatc atactgaaag
...       421 cgaacttcag tgtgattttt gaccggctgg agactctaat attactaagg gccttcaccg
...       481 aagaggggac aattgttggc gaaatttcac cactgccctc tcttccagga catactgatg
...       541 aggatgtcaa aaatgcagtt ggggtcctca tcggaggact tgaatggaat aataacacag
...       601 ttcgagtctc tgaaactcta cagagattcg cttggagaag cagtaatgag aatgggagac
...       661 ctccactcac tccaaaacag aaacggaaaa tggcgggaac aattaggtca gaagtttgaa
...       721 gaaataaggt ggttgattga agaagtgaga cacagactga agataacaga gaatagtttt
...       781 gagcaaataa catttatgca agccttacaa ctattgcttg aagtggagca agagataaga
...       841 actttctcgt ttcagcttat ttaataataa aaaacaccct tgtttctact
... //
... '''

Now initialize a TranscriptConverter:

>>> import tempfile
>>> from dms_tools2.seqnumbering import TranscriptConverter
>>> with tempfile.NamedTemporaryFile(mode='r+') as genbankfile:
...     _ = genbankfile.write(genbank_text)
...     genbankfile.flush()
...     _ = genbankfile.seek(0)
...     converter = TranscriptConverter(genbankfile)

Confirm that the resulting TranscriptConverter contains one chromosome (fluNS) with the expected two mRNAs and CDSs:

>>> list(converter.chromosomes.keys())
['fluNS']
>>> sorted(converter.CDSs.keys())
['fluNS1', 'fluNS2']
>>> sorted(converter.mRNAs.keys())
['fluNS1', 'fluNS2']
>>> converter.mRNA_chromosome['fluNS1']
'fluNS'
>>> converter.mRNA_chromosome['fluNS2']
'fluNS'
>>> converter.chromosome_CDSs['fluNS']
['fluNS1', 'fluNS2']

Get site in chromosome (fluNS) that corresponds to a position in the fluNS1 mRNA, and then do the same for the fluNS2 mRNA, using TranscriptConverter.i_mRNAtoChromosome. Then check nucleotide identities with TranscriptConverter.ntIdentity:

>>> converter.i_mRNAtoChromosome('fluNS1', 60)
61
>>> converter.ntIdentity('fluNS', 61)
'A'
>>> converter.i_mRNAtoChromosome('fluNS2', 58)
531
>>> converter.ntIdentity('fluNS', 531)
'C'
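
The arithmetic behind these two results can be sketched in plain Python. This is an illustrative sketch only, not the implementation used by TranscriptConverter: the function name mrna_to_chromosome is hypothetical, exons are assumed to be given as 1-based inclusive (start, end) intervals in transcript order, and only the forward strand is handled.

```python
def mrna_to_chromosome(exons, i):
    """Map 1-based mRNA site `i` to a 1-based chromosome site.

    Hypothetical helper for illustration; not part of dms_tools2.
    `exons` is a list of (start, end) chromosome intervals, 1-based and
    inclusive, in transcript order (forward strand only).
    """
    if i < 1:
        raise ValueError("mRNA sites are numbered 1, 2, ...")
    remaining = i
    for start, end in exons:
        length = end - start + 1
        if remaining <= length:
            # The site falls inside this exon.
            return start + remaining - 1
        # Otherwise skip this exon and keep counting in the next one.
        remaining -= length
    raise ValueError(f"site {i} is beyond the end of the mRNA")


# Exon coordinates copied from the mRNA features in the example file.
site_ns1 = mrna_to_chromosome([(2, 868)], 60)
site_ns2 = mrna_to_chromosome([(2, 56), (529, 868)], 58)
```

For the spliced fluNS2 transcript, the first exon (2..56) contributes 55 sites, so mRNA site 58 is the third site of the second exon, which begins at chromosome site 529.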

Do the same for substrings of fluNS1 and fluNS2:

>>> converter.i_mRNAtoChromosome('fluNS1', 2, mRNAfragment='GATTGCTTTCT')
61
>>> converter.i_mRNAtoChromosome('fluNS2', 9, mRNAfragment='TTTCAGGACATACTGATG')
531

Get the amino-acid substitutions caused by a chromosome point mutation. We can specify the point mutation as a string or a tuple, and get output with 3-letter or 1-letter amino-acid codes:

>>> converter.aaSubstitutions('fluNS', 'A61T')
'fluNS1-Asp12Val'
>>> converter.aaSubstitutions('fluNS', ('A', 61, 'T'))
'fluNS1-Asp12Val'
>>> converter.aaSubstitutions('fluNS', 'A61T', aa_3letter=False)
'fluNS1-D12V'
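
The two accepted mutation formats carry the same information. A minimal sketch of normalizing either form to a (wt, i, mut) tuple follows; the function name parse_point_mutation and the restriction to the letters A, C, G, and T are assumptions made for illustration and may not match how the class parses its input:

```python
import re


def parse_point_mutation(mutation):
    """Normalize 'A61T' or ('A', 61, 'T') to a (wt, site, mut) tuple.

    Hypothetical helper for illustration; not part of dms_tools2.
    """
    if isinstance(mutation, str):
        # Assumed format: wildtype nt, 1-based site number, mutant nt.
        match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", mutation.upper())
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        wt, site, mut = match.group(1), int(match.group(2)), match.group(3)
    else:
        wt, site, mut = mutation
        site = int(site)
    return wt, site, mut


from_string = parse_point_mutation("A61T")
from_tuple = parse_point_mutation(("A", 61, "T"))
```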

Now look at a point mutation that affects fluNS1 and fluNS2, but only causes an amino-acid substitution in the former (is synonymous in the latter):

>>> converter.aaSubstitutions('fluNS', 'C531T')
'fluNS1-His169Tyr'

Now a mutation that causes amino-acid substitutions in both fluNS1 and fluNS2:

>>> converter.aaSubstitutions('fluNS', 'A532T')
'fluNS1-His169Leu_fluNS2-Ile12Leu'

aaSubstitutions(chromosome, mutation, *, aa_3letter=True)

Amino-acid substitutions from point mutation to chromosome.

Gets all amino-acid substitutions in CDSs caused by a point mutation to a chromosome.

Args:
chromosome (str)

Name of chromosome.

mutation (str or 3-tuple)

Point mutation in chromosome-based numbering. Either a str like “A50T”, or a tuple (wt, i, mut) where wt is the wildtype nt, i is the site, and mut is the mutant nt.

aa_3letter (bool)

Use 3-letter rather than 1-letter amino-acid codes?

Returns:

A string giving the amino-acid mutations:

  • If no mutations, returns empty str.

  • If one mutation, will be str like ‘fluNS1-Asp12Val’.

  • If several mutations, will be str like ‘fluNS1-His169Leu_fluNS2-Ile12Leu’.
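
The codon-level logic behind a result such as ‘fluNS1-D12V’ can be sketched for a single CDS as follows. This is a simplified illustration under stated assumptions, not the method's actual implementation: the function name aa_substitution and the toy sequence are hypothetical, the CDS is given as 1-based inclusive exon intervals on the forward strand, the standard genetic code is used, and only 1-letter codes without the CDS-name prefix are returned.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def aa_substitution(chromosome_seq, cds_exons, wt, site, mut):
    """Return a string like 'D2V', or '' if there is no amino-acid change.

    Hypothetical helper for illustration; not part of dms_tools2.
    """
    seq = chromosome_seq.upper()
    if seq[site - 1] != wt:
        raise ValueError(f"wildtype at site {site} is not {wt}")
    # 1-based chromosome sites that make up the CDS, in coding order.
    cds_sites = [s for start, end in cds_exons for s in range(start, end + 1)]
    if site not in cds_sites:
        return ""  # mutation lies outside this CDS
    cds_index = cds_sites.index(site)  # 0-based position within the CDS
    codon_start = cds_index - cds_index % 3
    codon_sites = cds_sites[codon_start:codon_start + 3]
    wt_codon = "".join(seq[s - 1] for s in codon_sites)
    mut_codon = "".join(mut if s == site else seq[s - 1] for s in codon_sites)
    wt_aa, mut_aa = CODON_TABLE[wt_codon], CODON_TABLE[mut_codon]
    if wt_aa == mut_aa:
        return ""  # synonymous mutation
    return f"{wt_aa}{codon_start // 3 + 1}{mut_aa}"


# Toy chromosome: the CDS at sites 3..14 reads ATG GAT CAT TAA.
toy_chromosome = "CCATGGATCATTAAGG"
nonsynonymous = aa_substitution(toy_chromosome, [(3, 14)], "A", 7, "T")
synonymous = aa_substitution(toy_chromosome, [(3, 14)], "T", 8, "C")
```

Because the CDS sites are flattened across exons before codons are taken, a codon that spans a splice junction is handled in the same way as any other codon.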

i_mRNAtoChromosome(mRNA, i, *, mRNAfragment=None)

Convert site number in mRNA to number in chromosome.

Args:
mRNA (str)

Name of a valid mRNA.

i (int)

Site number in mRNA.

mRNAfragment (str or None)

Substring sequence of mRNA. In this case, i is taken as the site in this substring of the mRNA. Useful because sometimes mutations may be called in a fragment of the full mRNA. Set to None if i is site in full mRNA.

Returns:

Site number in chromosome that contains mRNA.
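
One way to picture the mRNAfragment option is as an offset calculation: locate the fragment within the full mRNA, then shift i by the fragment's starting position. The sketch below makes assumptions that the documentation above does not confirm, in particular that the fragment must occur exactly once in the mRNA. The function name fragment_site_to_mrna_site is hypothetical.

```python
def fragment_site_to_mrna_site(mrna_seq, fragment, i):
    """Convert 1-based site `i` in `fragment` to a 1-based site in `mrna_seq`.

    Hypothetical helper for illustration; not part of dms_tools2.
    """
    mrna_seq, fragment = mrna_seq.upper(), fragment.upper()
    if not 1 <= i <= len(fragment):
        raise ValueError(f"site {i} is outside the fragment")
    offset = mrna_seq.find(fragment)  # 0-based start of the fragment
    if offset == -1:
        raise ValueError("fragment not found in mRNA")
    # Assumption of this sketch: an ambiguous placement is an error.
    if mrna_seq.find(fragment, offset + 1) != -1:
        raise ValueError("fragment occurs more than once in mRNA")
    return offset + i


# Toy sequences: 'GATT' starts at mRNA site 3, so its site 2 is mRNA site 4.
full_site = fragment_site_to_mrna_site("AAGATTGCC", "GATT", 2)
```

The resulting full-mRNA site can then be converted to a chromosome site in the usual way.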

ntIdentity(chromosome, i)

Gets identity at site in chromosome.

Args:
chromosome (str)

Name of chromosome.

i (int)

Site in chromosome.

Returns:

Nucleotide in chromosome at site i.