syn_selection
Identifies enrichment of synonymous codons.
dms_tools2.syn_selection.syn_selection_by_codon(counts_pre, counts_post, pseudocount=0.5)
Identify sites with selection on synonymous codons.
- Runs a two-tailed Fisher's exact test for each codon (codon_x) at each site. The test returns a P-value reflecting the significance of the enrichment of codon_x.
After the Fisher P-value is calculated, a pseudocount is added to each codon count to calculate the odds ratio, which reflects the enrichment of codon_x after selection (relative to the other synonymous codons).
The pseudocount removes zeros, so the odds ratio is never NA or inf.
Rows where only one codon is represented pre-selection are dropped.
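The odds ratio described above can be sketched in plain Python. This is a minimal sketch, not the dms_tools2 implementation: the helper name `synonymous_odds_ratios` is hypothetical, and the formula (post-selection odds of the codon versus its synonymous alternatives, divided by the pre-selection odds, after adding the pseudocount to every codon count) is inferred from the values shown in this function's example output.

```python
def synonymous_odds_ratios(pre, post, pseudocount=0.5):
    """Odds ratio for each codon among the synonymous codons at one site.

    `pre` and `post` map codon -> raw count before and after selection.
    Hypothetical helper: the formula is inferred from the documented
    example output, not copied from the dms_tools2 source.
    """
    # Add the pseudocount to every codon count so no count is zero.
    pre = {codon: n + pseudocount for codon, n in pre.items()}
    post = {codon: n + pseudocount for codon, n in post.items()}
    aa_pre = sum(pre.values())    # all codons for the amino acid, pre-selection
    aa_post = sum(post.values())  # all codons for the amino acid, post-selection

    ratios = {}
    for codon in pre:
        # Odds of this codon versus the other synonymous codons.
        odds_pre = pre[codon] / (aa_pre - pre[codon])
        odds_post = post[codon] / (aa_post - post[codon])
        ratios[codon] = odds_post / odds_pre
    return ratios


# Site 1 of the documented example, with pseudocount=1.
site1 = synonymous_odds_ratios(
    {'ATT': 5, 'ATC': 100, 'ATA': 10},
    {'ATT': 5, 'ATC': 50, 'ATA': 75},
    pseudocount=1,
)
for codon, ratio in site1.items():
    print(codon, f"{ratio:.6f}")
# ATT 0.881890
# ATC 0.104685
# ATA 12.969697
```

With a positive pseudocount and at least two synonymous codons, both denominators are strictly positive, which is why the ratio cannot come out as NA or inf.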
- Args:
- counts_pre (str or pandas.DataFrame)
CSV file giving pre-selection codon counts, with columns named ‘site’ and ‘wildtype’ followed by one column per codon. Can also be a pandas DataFrame holding the same data.
- counts_post (str or pandas.DataFrame)
Like counts_pre, but for the post-selection counts.
- pseudocount (float or int, default 0.5)
Number to add to each codon count before calculating the odds ratio.
- Returns:
- A pandas DataFrame with the following columns:
‘site’
‘wildtype’ : wildtype codon at site
‘codon’ : codon we are analyzing at site
‘aa’ : amino acid
‘codon_pre’ : counts for codon of interest pre-selection
‘aa_pre’ : counts for all codons for amino acid pre-selection
‘codon_post’ : counts for codon of interest post-selection
‘aa_post’ : counts for all codons for amino acid post-selection
‘odds_ratio’ : enrichment of codon post-selection
‘P’ : P-value calculated using Fisher’s exact test
Example:
>>> pd.set_option('display.max_columns', None)  # display all columns
>>> pd.set_option('expand_frame_repr', False)  # do not break lines
>>> counts_pre = pd.DataFrame.from_records(
...         [(1, 'ATC', 5, 100, 10),
...          (2, 'ATT', 50, 10, 10),
...         ],
...         columns=['site', 'wildtype', 'ATT', 'ATC', 'ATA'],
...         )
>>> counts_post = pd.DataFrame.from_records(
...         [(1, 'ATC', 5, 50, 75),
...          (2, 'ATT', 50, 9, 11),
...         ],
...         columns=['site', 'wildtype', 'ATT', 'ATC', 'ATA'],
...         )
>>> syn_selection_by_codon(counts_pre, counts_post, 1)
   site  wildtype  codon  aa  codon_pre  aa_pre  codon_post  aa_post  odds_ratio             P
0     1       ATC    ATT   I          6     118           6      133    0.881890  1.000000e+00
1     1       ATC    ATC   I        101     118          51      133    0.104685  1.798192e-15
2     1       ATC    ATA   I         11     118          76      133   12.969697  7.500709e-17
3     2       ATT    ATT   I         51      73          51       73    1.000000  1.000000e+00
4     2       ATT    ATC   I         11      73          10       73    0.894661  1.000000e+00
5     2       ATT    ATA   I         11      73          12       73    1.108793  1.000000e+00
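
The ‘P’ column comes from a two-tailed Fisher's exact test. For readers who want to see what such a test computes, here is a minimal pure-Python sketch on a single 2x2 table. The function name `fisher_exact_two_tailed` is hypothetical, and the table layout used in the usage lines (rows are pre- and post-selection; columns are the codon of interest and the other synonymous codons) is an assumption, not taken from the dms_tools2 source. The two-tailed rule used here sums the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test P-value for the table [[a, b], [c, d]].

    Hypothetical helper for illustration; not the dms_tools2 implementation.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)  # smallest feasible top-left value
    hi = min(row1, col1)      # largest feasible top-left value
    # Sum every table no more likely than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(
        prob(x) for x in range(lo, hi + 1)
        if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p)


# Assumed layout for codon ATA at site 1 of the example (raw counts,
# no pseudocount): pre = 10 ATA vs 105 other, post = 75 ATA vs 55 other.
print(fisher_exact_two_tailed(10, 105, 75, 55))
```

The test is run on raw counts here because the description above states that the pseudocount is only added after the P-value has been calculated.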