fracsurvive
Performs operations related to estimating the fraction of each mutation that survives a selection.
dms_tools2.fracsurvive.avgMutFracSurvive(mutfracsurvivefiles, avgtype)
Gets the mean or median fraction surviving for each mutation.
Typically used to get an average across replicates.
- Args:
- mutfracsurvivefiles (list)
List of CSV files with mutfracsurvive as returned by dms2_fracsurvive.
- avgtype (str)
Type of “average” to calculate. Possibilities: mean, median.
- Returns:
A pandas.DataFrame containing the mean or median fraction surviving for each mutation (mutfracsurvive).
>>> tf = tempfile.NamedTemporaryFile
>>> with tf(mode='w') as f1, tf(mode='w') as f2, tf(mode='w') as f3:
...     x = f1.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '156,G,S,0.9\n'
...                  '157,K,D,0.1')
...     f1.flush()
...     x = f2.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '157,K,D,0.1\n'
...                  '156,G,S,1.0')
...     f2.flush()
...     x = f3.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '157,K,D,0.4\n'
...                  '156,G,S,0.5')
...     f3.flush()
...     mean = avgMutFracSurvive([f1.name, f2.name, f3.name],
...                              'mean')
...     median = avgMutFracSurvive([f1.name, f2.name, f3.name],
...                                'median')
>>> (mean['site'] == [156, 157]).all()
True
>>> (median['site'] == [156, 157]).all()
True
>>> (mean['wildtype'] == ['G', 'K']).all()
True
>>> (median['wildtype'] == ['G', 'K']).all()
True
>>> (mean['mutation'] == ['S', 'D']).all()
True
>>> (median['mutation'] == ['S', 'D']).all()
True
>>> numpy.allclose(mean['mutfracsurvive'], [0.8, 0.2])
True
>>> numpy.allclose(median['mutfracsurvive'], [0.9, 0.1])
True
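The averaging described above can be approximated with a plain pandas group-by. The sketch below is an illustration only, not the dms_tools2 implementation: the helper name avg_mut_frac_survive_sketch and the as_buffers function are hypothetical, and in-memory buffers stand in for the CSV files that dms2_fracsurvive would write. The input values are the ones used in the doctest.

```python
import io

import pandas as pd


def avg_mut_frac_survive_sketch(csv_sources, avgtype):
    """Average mutfracsurvive across replicates (hypothetical helper)."""
    if avgtype not in ("mean", "median"):
        raise ValueError(f"avgtype must be 'mean' or 'median', got {avgtype!r}")
    # Stack all replicates into one long table.
    stacked = pd.concat([pd.read_csv(src) for src in csv_sources],
                        ignore_index=True)
    # Group rows that describe the same mutation, then average across
    # replicates. The group keys come back sorted by site.
    return (stacked
            .groupby(["site", "wildtype", "mutation"])["mutfracsurvive"]
            .agg(avgtype)
            .reset_index())


# Replicate contents copied from the doctest; row order differs on purpose.
replicate_csvs = [
    "site,wildtype,mutation,mutfracsurvive\n156,G,S,0.9\n157,K,D,0.1",
    "site,wildtype,mutation,mutfracsurvive\n157,K,D,0.1\n156,G,S,1.0",
    "site,wildtype,mutation,mutfracsurvive\n157,K,D,0.4\n156,G,S,0.5",
]


def as_buffers():
    """Fresh file-like objects, since each buffer can only be read once."""
    return [io.StringIO(text) for text in replicate_csvs]


mean = avg_mut_frac_survive_sketch(as_buffers(), "mean")
median = avg_mut_frac_survive_sketch(as_buffers(), "median")
print(mean)
print(median)
```

Grouping on site, wildtype, and mutation makes the result independent of the row order within each replicate file, which is the property the doctest exercises.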
dms_tools2.fracsurvive.computeMutFracSurvive(libfracsurvive, sel, mock, countcharacters, pseudocount, translate_to_aa, err=None, mincount=0, aboveavg=False)
Compute fraction surviving for each mutation.
- Args:
- libfracsurvive (float)
Overall fraction of selected library that survives relative to mock-selected. Should be >= 0 and <= 1.
- sel (pandas.DataFrame)
Counts for selected sample. Columns should be site, wildtype, and every character in countcharacters.
- mock (pandas.DataFrame)
Like sel but counts for mock-selected sample.
- countcharacters (list)
List of all characters (e.g., codons).
- pseudocount (int or float > 0)
Pseudocount to add to counts.
- translate_to_aa (bool)
Should be True if counts are for codons and we are estimating mutfracsurvive for amino acids, False otherwise.
- err (pandas.DataFrame or None)
Optional error-control counts, in same format as sel.
- mincount (int >= 0)
Report as NaN the mutfracsurvive for any mutation where neither sel nor mock has at least this many counts.
- aboveavg (bool)
If True, compute the fraction surviving above the library average by subtracting libfracsurvive and then setting any negative values to 0.
- Returns:
A pandas.DataFrame with the fraction surviving for each mutation. Columns are site, wildtype, mutation, mutfracsurvive.
>>> countchars = ['A', 'C', 'G', 'T']
>>> libfracsurvive = 0.1
>>> pseudocount = 5
>>> mock = pandas.DataFrame.from_records(
...         [(1, 'A', 95, 95, 95, 95), (2, 'C', 195, 195, 95, 95)],
...         columns=['site', 'wildtype', 'A', 'C', 'G', 'T'])
>>> sel = pandas.DataFrame.from_records(
...         [(1, 'A', 390, 90, 90, 190), (2, 'C', 390, 190, 390, 190)],
...         columns=['site', 'wildtype', 'A', 'C', 'G', 'T'])
>>> mutfracsurvive = computeMutFracSurvive(libfracsurvive, sel, mock,
...         countchars, pseudocount, False)
>>> {'site', 'wildtype', 'mutation', 'mutfracsurvive'} == set(
...         mutfracsurvive.columns)
True
>>> mutfracsurvive = mutfracsurvive.sort_values(['site', 'mutation'])
>>> all(mutfracsurvive['site'] == [1, 1, 1, 1, 2, 2, 2, 2])
True
>>> all(mutfracsurvive['mutation'] == countchars + countchars)
True
>>> numpy.allclose(mutfracsurvive.query('site == 1')['mutfracsurvive'],
...         [0.2, 0.05, 0.05, 0.1])
True
>>> numpy.allclose(mutfracsurvive.query('site == 2')['mutfracsurvive'],
...         [0.1, 0.05, 0.2, 0.1])
True
>>> mutfracsurvive_above = computeMutFracSurvive(libfracsurvive,
...         sel, mock, countchars, pseudocount, False, aboveavg=True)
>>> all(mutfracsurvive['site'] == mutfracsurvive_above['site'])
True
>>> all(mutfracsurvive['mutation'] == mutfracsurvive_above['mutation'])
True
>>> numpy.allclose(mutfracsurvive_above.query('site == 1')
...         ['mutfracsurvive'], [0.1, 0, 0, 0])
True
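The numbers in the doctest can be reproduced by scaling the ratio of selected to mock frequencies by libfracsurvive, with the pseudocount scaled up for whichever sample is deeper at each site. That formula is an inference from the doctest values, not a statement of how dms_tools2 implements the calculation. The sketch below uses a hypothetical helper, mut_frac_survive_sketch, and deliberately omits err, mincount, and translate_to_aa.

```python
import numpy as np
import pandas as pd


def mut_frac_survive_sketch(libfracsurvive, sel, mock, chars, pseudocount,
                            aboveavg=False):
    """Hypothetical re-derivation; assumes sel and mock rows are aligned."""
    sel_counts = sel[chars].to_numpy(dtype=float)
    mock_counts = mock[chars].to_numpy(dtype=float)

    # Per-site sequencing depth of each sample.
    sel_depth = sel_counts.sum(axis=1, keepdims=True)
    mock_depth = mock_counts.sum(axis=1, keepdims=True)

    # Assumption: the deeper sample gets a proportionally larger pseudocount.
    sel_pseudo = pseudocount * np.maximum(1.0, sel_depth / mock_depth)
    mock_pseudo = pseudocount * np.maximum(1.0, mock_depth / sel_depth)

    # Pseudocount-adjusted frequencies of each character at each site.
    sel_adj = sel_counts + sel_pseudo
    mock_adj = mock_counts + mock_pseudo
    sel_freq = sel_adj / sel_adj.sum(axis=1, keepdims=True)
    mock_freq = mock_adj / mock_adj.sum(axis=1, keepdims=True)

    # Enrichment relative to mock, scaled by the library-wide survival.
    frac = libfracsurvive * sel_freq / mock_freq
    if aboveavg:
        # Subtract the library average and floor negative values at 0.
        frac = np.clip(frac - libfracsurvive, 0.0, None)

    wide = pd.concat(
        [sel[["site", "wildtype"]],
         pd.DataFrame(frac, columns=chars, index=sel.index)],
        axis=1)
    return wide.melt(id_vars=["site", "wildtype"],
                     var_name="mutation", value_name="mutfracsurvive")


chars = ["A", "C", "G", "T"]
columns = ["site", "wildtype"] + chars
mock = pd.DataFrame.from_records(
    [(1, "A", 95, 95, 95, 95), (2, "C", 195, 195, 95, 95)], columns=columns)
sel = pd.DataFrame.from_records(
    [(1, "A", 390, 90, 90, 190), (2, "C", 390, 190, 390, 190)], columns=columns)

result = (mut_frac_survive_sketch(0.1, sel, mock, chars, 5)
          .sort_values(["site", "mutation"]))
above = (mut_frac_survive_sketch(0.1, sel, mock, chars, 5, aboveavg=True)
         .sort_values(["site", "mutation"]))
print(result)
print(above)
```

At site 1 the selected sample is twice as deep as the mock sample (760 versus 380 counts), so under this assumption its pseudocount becomes 10 while the mock pseudocount stays at 5. The adjusted frequency of A is then 0.5 in the selected sample and 0.25 in the mock sample, giving 0.1 × 2 = 0.2.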
dms_tools2.fracsurvive.mutToSiteFracSurvive(mutfracsurvive)
Computes sitefracsurvive from mutfracsurvive.
- Args:
- mutfracsurvive (pandas.DataFrame)
DataFrame with mutfracsurvive, as returned by computeMutFracSurvive.
- Returns:
- The dataframe sitefracsurvive, which has the following columns:
site
avgfracsurvive: average mutfracsurvive over non-wildtype characters
maxfracsurvive: maximum mutfracsurvive over non-wildtype characters at site
Mutations for which mutfracsurvive is NaN are ignored. If every mutation at a site is NaN, the site values are NaN as well.
>>> mutfracsurvive = (pandas.DataFrame({
...         'site':[1, 2, 3, 4],
...         'wildtype':['A', 'G', 'C', 'G'],
...         'A':[numpy.nan, 0.6, 0.1, numpy.nan],
...         'C':[0.2, numpy.nan, numpy.nan, numpy.nan],
...         'G':[0.8, 0.9, 0.2, 0.1],
...         'T':[0.2, 0.0, 0.3, numpy.nan],
...         })
...         .melt(id_vars=['site', 'wildtype'],
...               var_name='mutation', value_name='mutfracsurvive')
...         .reset_index(drop=True)
...         )
>>> sitefracsurvive = mutToSiteFracSurvive(mutfracsurvive)
>>> all(sitefracsurvive.columns == ['site', 'avgfracsurvive',
...         'maxfracsurvive'])
True
>>> all(sitefracsurvive['site'] == [1, 2, 3, 4])
True
>>> numpy.allclose(sitefracsurvive['avgfracsurvive'],
...         [0.4, 0.3, 0.2, numpy.nan], equal_nan=True)
True
>>> numpy.allclose(sitefracsurvive['maxfracsurvive'],
...         [0.8, 0.6, 0.3, numpy.nan], equal_nan=True)
True
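The site-level summary can be sketched as a filter followed by a grouped aggregation. This is a minimal illustration under the assumption that wildtype rows are simply dropped before aggregating, which matches the doctest values (site 2 reports a maximum of 0.6 even though its wildtype G row holds 0.9). The name mut_to_site_sketch is hypothetical and the code is not the dms_tools2 implementation.

```python
import numpy as np
import pandas as pd


def mut_to_site_sketch(mutfracsurvive):
    """Per-site mean and max over non-wildtype characters (hypothetical)."""
    # Keep only rows where the character differs from the wildtype.
    nonwt = mutfracsurvive[
        mutfracsurvive["mutation"] != mutfracsurvive["wildtype"]]
    # pandas mean/max skip NaN by default and return NaN for all-NaN groups.
    return (nonwt
            .groupby("site")["mutfracsurvive"]
            .agg(avgfracsurvive="mean", maxfracsurvive="max")
            .reset_index())


# Same input as the doctest, built in wide form and melted to long form.
mut_df = (pd.DataFrame({
              "site": [1, 2, 3, 4],
              "wildtype": ["A", "G", "C", "G"],
              "A": [np.nan, 0.6, 0.1, np.nan],
              "C": [0.2, np.nan, np.nan, np.nan],
              "G": [0.8, 0.9, 0.2, 0.1],
              "T": [0.2, 0.0, 0.3, np.nan],
          })
          .melt(id_vars=["site", "wildtype"],
                var_name="mutation", value_name="mutfracsurvive"))

site_summary = mut_to_site_sketch(mut_df)
print(site_summary)
```

A site whose only rows are wildtype rows would be absent from this sketch's output, because the filter removes it before grouping. The sketch does not attempt to handle that case.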