utils
- class alignparse.utils.InFrameDeletionsToSubs(geneseq)
Bases: object
Convert in-frame codon-length deletions to substitutions.
Also shifts deletions to put them in frame when possible. Deletions that are not in-frame and codon length are left as deletions.
- Parameters:
geneseq (str) – The sequence of the “wildtype” gene.
- geneseq
- Type:
str
Example
First, a case where deletions are already in frame:
>>> geneseq = 'ATG GCG TCA GTA CCG CAT CTA'.replace(' ', '')
>>> deltosubs = InFrameDeletionsToSubs(geneseq)
>>> deltosubs.dels_to_subs('A1C del4to6')
'A1C G4- C5- G6-'
>>> deltosubs.dels_to_subs('A1C del4to6 del9to11 del13to15 del19to20')
'A1C G4- C5- G6- G10- T11- A12- C13- C14- G15- del19to20'
Now a case where deletions need to be shifted to be in frame:
>>> mut_str = 'A1C del3to5 ins11GGG del14to16 del19to20'
>>> deltosubs.dels_to_subs(mut_str)
'A1C G4- C5- G6- ins11GGG C13- C14- G15- del19to20'
Now a case where deletions cannot be shifted to be in frame:
>>> geneseq2 = 'ATG ATC TCA ATA CAG GAT CTA'.replace(' ', '')
>>> deltosubs2 = InFrameDeletionsToSubs(geneseq2)
>>> deltosubs2.dels_to_subs(mut_str)
'A1C del3to5 ins11GGG del14to16 del19to20'
You can also directly test whether a given deletion can be shifted up or down in position while retaining the same sequence:
>>> deltosubs.shiftable(13, 13, 0)
True
>>> deltosubs.shiftable(13, 13, 1)
True
>>> deltosubs.shiftable(13, 13, -1)
False
- shiftable(start, end, shift)
Can the deletion be shifted by the given amount while retaining the same sequence?
- Parameters:
start (int) – Start of deletion in 1, 2, … numbering.
end (int) – End of deletion in 1, 2, … numbering (inclusive).
shift (int) – Amount we try to shift deletion.
- Returns:
Whether the deletion can be shifted by shift.
- Return type:
bool
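The check described above can be illustrated with a small standalone sketch. It assumes that "shiftable" means that deleting the original window and deleting the shifted window leave identical sequences. The function name deletion_shiftable is hypothetical and is not part of alignparse, whose own implementation may differ.

```python
def deletion_shiftable(geneseq, start, end, shift):
    """Hypothetical stand-in for the check described above (not alignparse code).

    Returns True if deleting sites start..end (1-based, inclusive) and deleting
    the same-length window moved by `shift` sites give identical sequences.
    """
    new_start, new_end = start + shift, end + shift
    # The shifted window has to stay inside the sequence.
    if new_start < 1 or new_end > len(geneseq):
        return False
    # Sequence after removing the original window.
    original = geneseq[: start - 1] + geneseq[end:]
    # Sequence after removing the shifted window.
    shifted = geneseq[: new_start - 1] + geneseq[new_end:]
    return original == shifted


geneseq = "ATGGCGTCAGTACCGCATCTA"
print(deletion_shiftable(geneseq, 13, 13, 1))   # True: sites 13 and 14 are both C
print(deletion_shiftable(geneseq, 13, 13, -1))  # False: site 12 is A
```

Under this assumption, del3to5 in the first example sequence can move one site to del4to6 because both deletions leave the same sequence.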
- class alignparse.utils.MutationRenumber(number_mapping, old_num_col, new_num_col, wt_nt_col, *, err_suffix='', allow_letter_suffixed_numbers=False)
Bases: object
Re-number mutations.
- Parameters:
number_mapping (pandas.DataFrame) – Data frame giving mapping from old to new numbering scheme.
old_num_col (str) – Column in number_mapping giving old site number.
new_num_col (str) – Column in number_mapping giving new site number.
wt_nt_col (str or None) – Column in number_mapping giving wildtype nucleotide at each site, or None to not check identity. Is also allowed to be an amino acid.
err_suffix (str) – Append this message to any errors raised about invalid sites or mutation strings. Can be useful for debugging.
allow_letter_suffixed_numbers (bool) – Allow site numbers in number_mapping to have lowercase letter suffixes, as in 214a.
- old_to_new_site
Maps old site number to new one.
- Type:
dict
- old_to_wt
Maps old site number to wildtype nucleotide if using wt_nt_col.
- Type:
dict or None
Example
>>> number_mapping = pd.DataFrame({'old': [1, 2, 3],
...                                'new': [5, 6, 7],
...                                'wt_nt': ['A', 'C', 'G']})
>>> renumberer = MutationRenumber(number_mapping=number_mapping,
...                               old_num_col='old',
...                               new_num_col='new',
...                               wt_nt_col='wt_nt')
>>> renumberer.old_to_new_site
{1: 5, 2: 6, 3: 7}
>>> renumberer.old_to_wt
{1: 'A', 2: 'C', 3: 'G'}
>>> renumberer.renumber_muts('A1C del2to3 ins3GC')
'A5C del6to7 ins7GC'
Gap and stop-codon characters raise an error by default, and are allowed only if the corresponding flags are set:
>>> renumberer.renumber_muts("A1F C2- G3*")
Traceback (most recent call last):
  ...
ValueError: Cannot match C2- in A1F C2- G3*
>>> renumberer.renumber_muts("A1F C2- G3*",
...                          allow_gaps=True,
...                          allow_stop=True)
'A5F C6- G7*'
Use allow_letter_suffixed_numbers:
>>> suffixed_number_mapping = pd.DataFrame({'old': [1, 2, 3],
...                                         'new': ["5", "6", "6a"],
...                                         'wt_nt': ['A', 'C', 'G']})
>>> suffixed_renumberer = MutationRenumber(number_mapping=suffixed_number_mapping,
...                                        old_num_col='old',
...                                        new_num_col='new',
...                                        wt_nt_col='wt_nt')
Traceback (most recent call last):
  ...
ValueError: `number_mapping` column new not integer
>>> suffixed_renumberer = MutationRenumber(number_mapping=suffixed_number_mapping,
...                                        old_num_col='old',
...                                        new_num_col='new',
...                                        wt_nt_col='wt_nt',
...                                        allow_letter_suffixed_numbers=True)
>>> suffixed_renumberer.renumber_muts('A1C del2to3 ins3GC')
'A5C del6to6a ins6aGC'
- renumber_muts(mut_str, allow_gaps=False, allow_stop=False)
Get re-numbered mutation string.
- Parameters:
mut_str (str) – Mutations in format ‘A1C del2to3 ins3GG’.
allow_gaps (bool) – Allow gap (-) characters.
allow_stop (bool) – Allow stop (*) characters.
- Returns:
A version of mut_str where sites have been renumbered.
- Return type:
str
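To make the renumbering idea concrete, here is a minimal standalone sketch that maps site numbers through a plain dict. The function renumber_sketch and its regular expressions are assumptions made for illustration. They are not alignparse code, and they omit wildtype checking and the allow_gaps and allow_stop options.

```python
import re

# Patterns for the three mutation formats used in the examples above.
SUB_RE = re.compile(r"^([A-Z])(-?\d+)([A-Z])$")     # e.g. A1C
DEL_RE = re.compile(r"^del(-?\d+)to(-?\d+)$")       # e.g. del2to3
INS_RE = re.compile(r"^ins(-?\d+)([A-Z]+)$")        # e.g. ins3GC


def renumber_sketch(mut_str, old_to_new):
    """Hypothetical helper: renumber space-separated mutations via a dict."""
    renumbered = []
    for mut in mut_str.split():
        m = SUB_RE.match(mut)
        if m:
            wt, site, new = m.groups()
            renumbered.append(f"{wt}{old_to_new[int(site)]}{new}")
            continue
        m = DEL_RE.match(mut)
        if m:
            first, last = (old_to_new[int(site)] for site in m.groups())
            renumbered.append(f"del{first}to{last}")
            continue
        m = INS_RE.match(mut)
        if m:
            site, seq = m.groups()
            renumbered.append(f"ins{old_to_new[int(site)]}{seq}")
            continue
        # Anything else (including gap or stop characters) is rejected here.
        raise ValueError(f"Cannot match {mut} in {mut_str}")
    return " ".join(renumbered)


print(renumber_sketch("A1C del2to3 ins3GC", {1: 5, 2: 6, 3: 7}))
# A5C del6to7 ins7GC
```

Because the new site labels are simply interpolated into the output, the same sketch also handles letter-suffixed labels such as "6a" when they are supplied as strings in the dict.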
- alignparse.utils.merge_dels(s)
Merge consecutive deletions.
- Parameters:
s (str) – A single string of mutations.
- Returns:
A mutation string in which consecutive deletions have been merged and all mutations are sorted by site.
- Return type:
str
Example
Merge consecutive deletions:
>>> merge_dels('del12to15 del21to30 del210to300 del16to20 '
...            'del1702to1909 del1910to1930 G885T G85T')
'del12to30 G85T del210to300 G885T del1702to1930'
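One way to picture the merging step is the standalone sketch below. It assumes that two deletions are "consecutive" when one starts at the site immediately after the other ends. The name merge_dels_sketch is hypothetical, and the library's actual implementation may differ.

```python
import re

DEL_RE = re.compile(r"^del(-?\d+)to(-?\d+)$")
SITE_RE = re.compile(r"-?\d+")  # first (possibly negative) number in a mutation


def merge_dels_sketch(s):
    """Hypothetical helper: merge adjacent deletions, then sort by site."""
    dels, others = [], []
    for mut in s.split():
        m = DEL_RE.match(mut)
        if m:
            dels.append((int(m.group(1)), int(m.group(2))))
        else:
            others.append(mut)

    merged = []
    for start, end in sorted(dels):
        # Extend the previous deletion if this one begins right after it ends.
        if merged and start == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))

    muts = others + [f"del{start}to{end}" for start, end in merged]
    return " ".join(sorted(muts, key=lambda mut: int(SITE_RE.search(mut).group())))


print(merge_dels_sketch("del12to15 del21to30 del210to300 del16to20 "
                        "del1702to1909 del1910to1930 G885T G85T"))
# del12to30 G85T del210to300 G885T del1702to1930
```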
- alignparse.utils.qvals_to_accuracy(qvals, encoding='numbers')
Convert set of quality scores into average accuracy.
- Parameters:
qvals (numpy.array, number, or str) – Q-values; see encoding for how they are encoded.
encoding ({'numbers', 'sanger'}) – If ‘numbers’ then qvals should be a numpy.array of Q-values or a number giving a single Q-value. If ‘sanger’, then qvals is a string, with the Q-value being the ASCII value minus 33.
- Returns:
The average accuracy of the Q-values, or nan if qvals is empty.
- Return type:
float or nan
Note
The probability \(p\) of an error at a given site is related to the Q-value \(Q\) by \(Q = -10 \log_{10} p\). The accuracy is one minus the average error rate.
Example
>>> qvals = numpy.array([13, 77, 93])
>>> round(qvals_to_accuracy(qvals), 3)
0.983
>>> round(qvals_to_accuracy(qvals[1 : ]), 3)
1.0
>>> qvals_to_accuracy(numpy.array([]))
nan
>>> qvals_str = '.n~'
>>> round(qvals_to_accuracy(qvals_str, encoding='sanger'), 3)
0.983
>>> round(qvals_to_accuracy(15), 3)
0.968
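The relationship in the note can be worked through by hand with a short pure-Python sketch: invert \(Q = -10 \log_{10} p\) to get \(p = 10^{-Q/10}\), average the error probabilities, and subtract from one. The helper names accuracy_from_qvals and sanger_to_qvals are hypothetical and are not part of alignparse, which operates on numpy arrays.

```python
import math


def sanger_to_qvals(qual_str):
    """Decode a Sanger-encoded quality string: Q is the ASCII value minus 33."""
    return [ord(char) - 33 for char in qual_str]


def accuracy_from_qvals(qvals):
    """Hypothetical helper: one minus the mean error probability."""
    if not qvals:
        return math.nan
    # Invert Q = -10 * log10(p) to get the per-site error probability p.
    error_probs = [10 ** (-q / 10) for q in qvals]
    return 1 - sum(error_probs) / len(error_probs)


print(sanger_to_qvals(".n~"))                             # [13, 77, 93]
print(round(accuracy_from_qvals([13, 77, 93]), 3))        # 0.983
print(round(accuracy_from_qvals(sanger_to_qvals(".n~")), 3))  # 0.983
```

Note that the average is taken over error probabilities rather than over Q-values, so a single low-quality site (Q = 13 here) dominates the result.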
- alignparse.utils.sort_mutations(mut_strs)
Sort mutation string by site, and combine multiple mutation strings.
- Parameters:
mut_strs (str or list) – A single mutation string or a list of such strings.
- Returns:
A single mutation string with all mutations sorted by site.
- Return type:
str
Example
Sort a single mutation string:
>>> sort_mutations('ins7GC A5C del2to3')
'del2to3 A5C ins7GC'
Sort a list of two mutation strings, including a negative site:
>>> sort_mutations(['ins7GC', 'A-5C del2to3'])
'A-5C del2to3 ins7GC'
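Sorting by site can be sketched in a few lines, assuming the site of each mutation is the first (possibly negative) integer that appears in it. The name sort_mutations_sketch is hypothetical, and this is an illustration rather than the library's implementation.

```python
import re

SITE_RE = re.compile(r"-?\d+")  # first integer in a mutation, sign included


def sort_mutations_sketch(mut_strs):
    """Hypothetical helper: combine mutation strings and sort by site."""
    if isinstance(mut_strs, str):
        mut_strs = [mut_strs]
    # Flatten all strings into a single list of individual mutations.
    muts = [mut for mut_str in mut_strs for mut in mut_str.split()]
    return " ".join(sorted(muts, key=lambda mut: int(SITE_RE.search(mut).group())))


print(sort_mutations_sketch("ins7GC A5C del2to3"))
# del2to3 A5C ins7GC
print(sort_mutations_sketch(["ins7GC", "A-5C del2to3"]))
# A-5C del2to3 ins7GC
```

For a deletion such as del2to3, the first integer is the start site, so deletions are ordered by where they begin.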