utils

class alignparse.utils.InFrameDeletionsToSubs(geneseq)

Bases: object

Convert in-frame codon-length deletions to substitutions.

Deletions are also shifted into frame when possible. Deletions that are not both in frame and codon length are left as deletions.

Parameters:

geneseq (str) – The sequence of the “wildtype” gene.

geneseq
Type:

str

Example

First, a case where deletions are already in frame:

>>> from alignparse.utils import InFrameDeletionsToSubs
>>> geneseq = 'ATG GCG TCA GTA CCG CAT CTA'.replace(' ', '')
>>> deltosubs = InFrameDeletionsToSubs(geneseq)
>>> deltosubs.dels_to_subs('A1C del4to6')
'A1C G4- C5- G6-'
>>> deltosubs.dels_to_subs('A1C del4to6 del9to11 del13to15 del19to20')
'A1C G4- C5- G6- G10- T11- A12- C13- C14- G15- del19to20'

Now a case where deletions need to be shifted to be in frame:

>>> mut_str = 'A1C del3to5 ins11GGG del14to16 del19to20'
>>> deltosubs.dels_to_subs(mut_str)
'A1C G4- C5- G6- ins11GGG C13- C14- G15- del19to20'

Now a case where deletions cannot be shifted into frame:

>>> geneseq2 = 'ATG ATC TCA ATA CAG GAT CTA'.replace(' ', '')
>>> deltosubs2 = InFrameDeletionsToSubs(geneseq2)
>>> deltosubs2.dels_to_subs(mut_str)
'A1C del3to5 ins11GGG del14to16 del19to20'

You can also directly test whether a given deletion can be shifted up or down in position while retaining the same sequence:

>>> deltosubs.shiftable(13, 13, 0)
True
>>> deltosubs.shiftable(13, 13, 1)
True
>>> deltosubs.shiftable(13, 13, -1)
False

dels_to_subs(mut_str)

str: Copy of mut_str with in-frame deletions as substitutions
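As a rough illustration of the conversion step alone (ignoring shifting), the sketch below expands a single in-frame, codon-length deletion into per-site gap substitutions. The helper name `expand_inframe_deletion` is hypothetical and not part of alignparse, and treating any multiple of three nucleotides as "codon length" is an assumption that the documentation above does not confirm.

```python
import re


def expand_inframe_deletion(geneseq, mutation):
    """Rewrite 'delXtoY' as gap substitutions if in frame and codon length.

    Hypothetical helper for illustration only; not part of alignparse.
    """
    match = re.fullmatch(r"del(\d+)to(\d+)", mutation)
    if match is None:
        return mutation  # substitutions and insertions pass through unchanged
    start, end = int(match.group(1)), int(match.group(2))
    length = end - start + 1
    # Assumption: "in frame" means the deletion starts at the first position
    # of a codon, and "codon length" means a multiple of three nucleotides.
    if (start - 1) % 3 != 0 or length % 3 != 0:
        return mutation
    # Sites are numbered 1, 2, ... so site i is geneseq[i - 1].
    return " ".join(f"{geneseq[i - 1]}{i}-" for i in range(start, end + 1))


geneseq = "ATGGCGTCAGTACCGCATCTA"
print(expand_inframe_deletion(geneseq, "del4to6"))    # G4- C5- G6-
print(expand_inframe_deletion(geneseq, "del19to20"))  # del19to20
```

The second call leaves the deletion untouched because it spans only two nucleotides, matching how `del19to20` is left as a deletion in the documented example.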

shiftable(start, end, shift)

Can deletion be shifted in sequence?

Parameters:
  • start (int) – Start of deletion in 1, 2, … numbering.

  • end (int) – End of deletion in 1, 2, … numbering (inclusive).

  • shift (int) – Amount we try to shift deletion.

Returns:

Can deletion be shifted?

Return type:

bool
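One way to read "retaining the same sequence" is that deleting the shifted range leaves the same remaining gene sequence as deleting the original range. The sketch below implements that reading as a standalone function. The name `deletion_shiftable` is hypothetical, and this is not necessarily how alignparse implements `shiftable`.

```python
def deletion_shiftable(geneseq, start, end, shift):
    """Does deleting start..end shifted by `shift` give the same sequence?

    Hypothetical helper for illustration; sites use 1, 2, ... numbering
    and `end` is inclusive.
    """
    new_start, new_end = start + shift, end + shift
    if new_start < 1 or new_end > len(geneseq):
        return False  # shifted deletion would fall outside the gene
    original = geneseq[: start - 1] + geneseq[end:]
    shifted = geneseq[: new_start - 1] + geneseq[new_end:]
    return original == shifted


geneseq = "ATGGCGTCAGTACCGCATCTA"
print(deletion_shiftable(geneseq, 13, 13, 0))   # True
print(deletion_shiftable(geneseq, 13, 13, 1))   # True
print(deletion_shiftable(geneseq, 13, 13, -1))  # False
```

Sites 13 and 14 are both `C`, so removing either one leaves the same sequence, whereas site 12 is `A`, which is why a shift of -1 fails.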

class alignparse.utils.MutationRenumber(number_mapping, old_num_col, new_num_col, wt_nt_col, *, err_suffix='', allow_letter_suffixed_numbers=False)

Bases: object

Re-number mutations.

Parameters:
  • number_mapping (pandas.DataFrame) – Data frame giving mapping from old to new numbering scheme.

  • old_num_col (str) – Column in number_mapping giving old site number.

  • new_num_col (str) – Column in number_mapping giving new site number.

  • wt_nt_col (str or None) – Column in number_mapping giving wildtype nucleotide at each site, or None to not check identity. The wildtype character may also be an amino acid.

  • err_suffix (str) – Append this message to any errors raised about invalid sites or mutation strings. Can be useful for debugging.

  • allow_letter_suffixed_numbers (bool) – Allow site numbers in number_mapping to have lowercase letter suffixes as in 214a.

old_to_new_site

Maps old site number to new one.

Type:

dict

old_to_wt

Maps old site number to wildtype nucleotide if using wt_nt_col.

Type:

dict or None

Example

>>> import pandas as pd
>>> from alignparse.utils import MutationRenumber
>>> number_mapping = pd.DataFrame({'old': [1, 2, 3],
...                                'new': [5, 6, 7],
...                                'wt_nt': ['A', 'C', 'G']})
>>> renumberer = MutationRenumber(number_mapping=number_mapping,
...                               old_num_col='old',
...                               new_num_col='new',
...                               wt_nt_col='wt_nt')
>>> renumberer.old_to_new_site
{1: 5, 2: 6, 3: 7}
>>> renumberer.old_to_wt
{1: 'A', 2: 'C', 3: 'G'}
>>> renumberer.renumber_muts('A1C del2to3 ins3GC')
'A5C del6to7 ins7GC'

Renumbering mutations that contain gaps or stop codons raises an error unless the corresponding flags are set:

>>> renumberer.renumber_muts("A1F C2- G3*")
Traceback (most recent call last):
  ...
ValueError: Cannot match C2- in A1F C2- G3*
>>> renumberer.renumber_muts("A1F C2- G3*",
...                          allow_gaps=True,
...                          allow_stop=True)
'A5F C6- G7*'

Use allow_letter_suffixed_numbers:

>>> suffixed_number_mapping = pd.DataFrame({'old': [1, 2, 3],
...                                         'new': ["5", "6", "6a"],
...                                         'wt_nt': ['A', 'C', 'G']})
>>> suffixed_renumberer = MutationRenumber(number_mapping=suffixed_number_mapping,
...                                        old_num_col='old',
...                                        new_num_col='new',
...                                        wt_nt_col='wt_nt')
Traceback (most recent call last):
  ...
ValueError: `number_mapping` column new not integer
>>> suffixed_renumberer = MutationRenumber(number_mapping=suffixed_number_mapping,
...                                        old_num_col='old',
...                                        new_num_col='new',
...                                        wt_nt_col='wt_nt',
...                                        allow_letter_suffixed_numbers=True)
>>> suffixed_renumberer.renumber_muts('A1C del2to3 ins3GC')
'A5C del6to6a ins6aGC'

renumber_muts(mut_str, allow_gaps=False, allow_stop=False)

Get re-numbered mutation string.

Parameters:
  • mut_str (str) – Mutations in format ‘A1C del2to3 ins3GG’.

  • allow_gaps (bool) – Allow gap (-) characters.

  • allow_stop (bool) – Allow stop (*) characters.

Returns:

A version of mut_str where sites have been renumbered.

Return type:

str
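To make the mutation-string format concrete, here is a minimal sketch of renumbering with regular expressions and a plain dict. The function `renumber_sketch` is hypothetical and is not the alignparse implementation: it handles only tokens of the form `A1C`, `del2to3`, and `ins3GG`, does no wildtype checking, and has no support for gaps, stop characters, or letter-suffixed site numbers.

```python
import re


def renumber_sketch(mut_str, old_to_new):
    """Renumber sites in `mut_str` using the dict `old_to_new`.

    Hypothetical helper for illustration only.
    """
    renumbered = []
    for token in mut_str.split():
        sub = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        deletion = re.fullmatch(r"del(\d+)to(\d+)", token)
        insertion = re.fullmatch(r"ins(\d+)([A-Z]+)", token)
        if sub:
            renumbered.append(f"{sub[1]}{old_to_new[int(sub[2])]}{sub[3]}")
        elif deletion:
            first = old_to_new[int(deletion[1])]
            last = old_to_new[int(deletion[2])]
            renumbered.append(f"del{first}to{last}")
        elif insertion:
            site = old_to_new[int(insertion[1])]
            renumbered.append(f"ins{site}{insertion[2]}")
        else:
            # Gap (-) and stop (*) characters are rejected in this sketch.
            raise ValueError(f"Cannot match {token} in {mut_str}")
    return " ".join(renumbered)


mapping = {1: 5, 2: 6, 3: 7}
print(renumber_sketch("A1C del2to3 ins3GC", mapping))  # A5C del6to7 ins7GC
```

In this sketch a site missing from the mapping surfaces as a `KeyError`; a fuller version would translate that into a more informative error.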

alignparse.utils.merge_dels(s)

Merge consecutive deletions.

Parameters:

s (str) – A single string of mutations.

Returns:

A mutation string in which consecutive deletions have been merged and all mutations are sorted by site.

Return type:

str
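The merging behaviour can be sketched as a sort followed by a single pass that joins each deletion to the previous one when it starts exactly one site after the previous deletion ends. Both `site_of` and `merge_consecutive_dels` are hypothetical names, and this is an illustration of the idea rather than the library's code.

```python
import re

DEL_PATTERN = re.compile(r"del(\d+)to(\d+)")


def site_of(mutation):
    """First site number appearing in a mutation token (hypothetical helper)."""
    return int(re.search(r"-?\d+", mutation).group())


def merge_consecutive_dels(mut_str):
    """Sort mutations by site and merge deletions that abut each other."""
    merged = []
    for mutation in sorted(mut_str.split(), key=site_of):
        current = DEL_PATTERN.fullmatch(mutation)
        previous = DEL_PATTERN.fullmatch(merged[-1]) if merged else None
        if current and previous and int(current[1]) == int(previous[2]) + 1:
            # Extend the previous deletion to cover the current one.
            merged[-1] = f"del{previous[1]}to{current[2]}"
        else:
            merged.append(mutation)
    return " ".join(merged)


print(merge_consecutive_dels("del16to20 del12to15 G85T del21to30"))
# del12to30 G85T
```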

Example

Merge consecutive deletions:

>>> from alignparse.utils import merge_dels
>>> merge_dels('del12to15 del21to30 del210to300 del16to20 '
...            'del1702to1909 del1910to1930 G885T G85T')
'del12to30 G85T del210to300 G885T del1702to1930'

alignparse.utils.qvals_to_accuracy(qvals, encoding='numbers')

Convert set of quality scores into average accuracy.

Parameters:
  • qvals (numpy.array, number, or str) – Q-values; see encoding for how they are encoded.

  • encoding ({'numbers', 'sanger'}) – If ‘numbers’ then qvals should be a numpy.array of Q-values or a number giving a single Q-value. If ‘sanger’, then qvals is a string, with the Q-value being the ASCII value minus 33.

Returns:

The average accuracy of the Q-values, or nan if qvals is empty.

Return type:

float or nan

Note

The probability \(p\) of an error at a given site is related to the Q-value \(Q\) by \(Q = -10 \log_{10} p\). The accuracy is one minus the average error rate.
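The relationship in the note can be worked through in a few lines of plain Python: invert \(Q = -10 \log_{10} p\) to get \(p = 10^{-Q/10}\), average the error rates, and subtract from one. The function names below are hypothetical, and the sketch uses lists rather than numpy arrays, so it illustrates the formula and is not the library's implementation.

```python
import math


def accuracy_from_qvals(qvals):
    """Average accuracy for a list of Q-values (hypothetical helper)."""
    if not qvals:
        return math.nan  # no sites, so the average is undefined
    error_rates = [10 ** (-q / 10) for q in qvals]
    return 1 - sum(error_rates) / len(error_rates)


def sanger_to_qvals(qual_str):
    """Decode a Sanger-encoded string: Q is the ASCII value minus 33."""
    return [ord(char) - 33 for char in qual_str]


print(round(accuracy_from_qvals([13, 77, 93]), 3))  # 0.983
print(sanger_to_qvals(".n~"))                       # [13, 77, 93]
```

Because the average is taken over error rates rather than over Q-values, a single low-quality site (here Q = 13) dominates the result.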

Example

>>> import numpy
>>> from alignparse.utils import qvals_to_accuracy
>>> qvals = numpy.array([13, 77, 93])
>>> round(qvals_to_accuracy(qvals), 3)
0.983
>>> round(qvals_to_accuracy(qvals[1 : ]), 3)
1.0
>>> qvals_to_accuracy(numpy.array([]))
nan
>>> qvals_str = '.n~'
>>> round(qvals_to_accuracy(qvals_str, encoding='sanger'), 3)
0.983
>>> round(qvals_to_accuracy(15), 3)
0.968

alignparse.utils.sort_mutations(mut_strs)

Sort mutations by site, combining multiple mutation strings into one.

Parameters:

mut_strs (str or list) – A single mutation string or a list of such strings.

Returns:

A single mutation string with all mutations sorted by site.

Return type:

str

Example

Sort a single mutation string:

>>> from alignparse.utils import sort_mutations
>>> sort_mutations('ins7GC A5C del2to3')
'del2to3 A5C ins7GC'

Sort a list of two mutation strings, including a negative site:

>>> sort_mutations(['ins7GC', 'A-5C del2to3'])
'A-5C del2to3 ins7GC'
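The sorting shown above can be approximated by extracting the first (possibly negative) integer from each mutation and using it as the sort key. The function `sort_by_site` is a hypothetical stand-in for illustration, not the alignparse implementation, and it assumes every mutation token contains a site number.

```python
import re


def sort_by_site(mut_strs):
    """Combine one or more mutation strings and sort mutations by site."""
    if isinstance(mut_strs, str):
        mut_strs = [mut_strs]
    mutations = [mut for mut_str in mut_strs for mut in mut_str.split()]

    def site(mutation):
        # The first integer in the token, allowing a leading minus sign.
        return int(re.search(r"-?\d+", mutation).group())

    return " ".join(sorted(mutations, key=site))


print(sort_by_site("ins7GC A5C del2to3"))          # del2to3 A5C ins7GC
print(sort_by_site(["ins7GC", "A-5C del2to3"]))    # A-5C del2to3 ins7GC
```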