fracsurvive

Performs operations related to estimating the fraction of each mutation that survives a selection.

dms_tools2.fracsurvive.avgMutFracSurvive(mutfracsurvivefiles, avgtype)

Gets the mean or median fraction surviving for each mutation.

Typically used to get an average across replicates.

Args:

mutfracsurvivefiles (list): List of CSV files with mutfracsurvive as returned by dms2_fracsurvive.
avgtype (str): Type of “average” to calculate, either mean or median.

Returns:

A pandas.DataFrame containing the mean or median fraction surviving for each mutation (mutfracsurvive).

>>> import tempfile
>>> import numpy
>>> from dms_tools2.fracsurvive import avgMutFracSurvive
>>> tf = tempfile.NamedTemporaryFile
>>> with tf(mode='w') as f1, tf(mode='w') as f2, tf(mode='w') as f3:
...     x = f1.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '156,G,S,0.9\n'
...                  '157,K,D,0.1')
...     f1.flush()
...     x = f2.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '157,K,D,0.1\n'
...                  '156,G,S,1.0')
...     f2.flush()
...     x = f3.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '157,K,D,0.4\n'
...                  '156,G,S,0.5')
...     f3.flush()
...     mean = avgMutFracSurvive([f1.name, f2.name, f3.name],
...             'mean')
...     median = avgMutFracSurvive([f1.name, f2.name, f3.name],
...             'median')
>>> (mean['site'] == [156, 157]).all()
True
>>> (median['site'] == [156, 157]).all()
True
>>> (mean['wildtype'] == ['G', 'K']).all()
True
>>> (median['wildtype'] == ['G', 'K']).all()
True
>>> (mean['mutation'] == ['S', 'D']).all()
True
>>> (median['mutation'] == ['S', 'D']).all()
True
>>> numpy.allclose(mean['mutfracsurvive'], [0.8, 0.2])
True
>>> numpy.allclose(median['mutfracsurvive'], [0.9, 0.1])
True
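
The same averaging can be sketched with plain pandas. This is a minimal illustration and not necessarily how avgMutFracSurvive is implemented: the helper name avg_mutfracsurvive_sketch is hypothetical, and it takes data frames that are already loaded rather than CSV file names.

```python
import pandas as pd


def avg_mutfracsurvive_sketch(dfs, avgtype):
    """Hypothetical helper: average mutfracsurvive across replicates.

    `dfs` is a list of data frames with the columns site, wildtype,
    mutation, and mutfracsurvive; `avgtype` is 'mean' or 'median'.
    """
    if avgtype not in ('mean', 'median'):
        raise ValueError(f"invalid avgtype: {avgtype}")
    # Stack the replicates, then average each mutation across them.
    stacked = pd.concat(dfs, ignore_index=True)
    return (stacked
            .groupby(['site', 'wildtype', 'mutation'])['mutfracsurvive']
            .agg(avgtype)
            .reset_index())
```

Because the rows are grouped by site, wildtype, and mutation, the order of rows within each replicate does not matter, which is also the case in the doctest above.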

dms_tools2.fracsurvive.computeMutFracSurvive(libfracsurvive, sel, mock, countcharacters, pseudocount, translate_to_aa, err=None, mincount=0, aboveavg=False)

Compute fraction surviving for each mutation.

Args:

libfracsurvive (float): Overall fraction of selected library that survives relative to mock-selected. Should be >= 0 and <= 1.
sel (pandas.DataFrame): Counts for selected sample. Columns should be site, wildtype, and every character in countcharacters.
mock (pandas.DataFrame): Like sel but counts for mock-selected sample.
countcharacters (list): List of all characters (e.g., codons).
pseudocount (int or float > 0): Pseudocount to add to counts.
translate_to_aa (bool): Should be True if counts are for codons and we are estimating mutfracsurvive for amino acids, False otherwise.
err (pandas.DataFrame or None): Optional error-control counts, in same format as sel.
mincount (int >= 0): Report as NaN the mutfracsurvive for any mutation where neither sel nor mock has at least this many counts.
aboveavg (bool): If True, compute the fraction surviving above the library average by subtracting libfracsurvive and then setting any negative values to 0.

Returns:

A pandas.DataFrame with the fraction surviving for each mutation. Columns are site, wildtype, mutation, mutfracsurvive.

>>> import numpy
>>> import pandas
>>> from dms_tools2.fracsurvive import computeMutFracSurvive
>>> countchars = ['A', 'C', 'G', 'T']
>>> libfracsurvive = 0.1
>>> pseudocount = 5
>>> mock = pandas.DataFrame.from_records(
...         [(1, 'A', 95, 95, 95, 95), (2, 'C', 195, 195, 95, 95)],
...         columns=['site', 'wildtype', 'A', 'C', 'G', 'T'])
>>> sel = pandas.DataFrame.from_records(
...         [(1, 'A', 390, 90, 90, 190), (2, 'C', 390, 190, 390, 190)],
...         columns=['site', 'wildtype', 'A', 'C', 'G', 'T'])
>>> mutfracsurvive = computeMutFracSurvive(libfracsurvive, sel, mock,
...         countchars, pseudocount, False)
>>> {'site', 'wildtype', 'mutation', 'mutfracsurvive'} == set(
...         mutfracsurvive.columns)
True
>>> mutfracsurvive = mutfracsurvive.sort_values(['site', 'mutation'])
>>> all(mutfracsurvive['site'] == [1, 1, 1, 1, 2, 2, 2, 2])
True
>>> all(mutfracsurvive['mutation'] == countchars + countchars)
True
>>> numpy.allclose(mutfracsurvive.query('site == 1')['mutfracsurvive'],
...         [0.2, 0.05, 0.05, 0.1])
True
>>> numpy.allclose(mutfracsurvive.query('site == 2')['mutfracsurvive'],
...         [0.1, 0.05, 0.2, 0.1])
True
>>> mutfracsurvive_above = computeMutFracSurvive(libfracsurvive,
...         sel, mock, countchars, pseudocount, False, aboveavg=True)
>>> all(mutfracsurvive['site'] == mutfracsurvive_above['site'])
True
>>> all(mutfracsurvive['mutation'] == mutfracsurvive_above['mutation'])
True
>>> numpy.allclose(mutfracsurvive_above.query('site == 1')
...         ['mutfracsurvive'], [0.1, 0, 0, 0])
True
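
The arithmetic behind the doctest values can be worked through for a single site in plain Python. This is a sketch under stated assumptions, not the function's actual implementation. The helper name mutfracsurvive_site_sketch is hypothetical. It assumes the pseudocount is scaled up for the deeper of the two samples, which reproduces the numbers above but is not stated in the argument descriptions. It ignores err, mincount, and translate_to_aa.

```python
def mutfracsurvive_site_sketch(libfracsurvive, sel_counts, mock_counts,
                               pseudocount, aboveavg=False):
    """Hypothetical helper: fraction surviving per character at one site.

    `sel_counts` and `mock_counts` list the counts in the same character
    order, and both are assumed to have positive totals.
    """
    n_sel, n_mock = sum(sel_counts), sum(mock_counts)
    # Assumption: the deeper sample gets a proportionally larger pseudocount.
    p_sel = pseudocount * max(1.0, n_sel / n_mock)
    p_mock = pseudocount * max(1.0, n_mock / n_sel)
    sel = [c + p_sel for c in sel_counts]
    mock = [c + p_mock for c in mock_counts]
    # Convert the padded counts to frequencies within each sample.
    f_sel = [c / sum(sel) for c in sel]
    f_mock = [c / sum(mock) for c in mock]
    # Scale the library-wide fraction by each character's enrichment.
    fracs = [libfracsurvive * s / m for s, m in zip(f_sel, f_mock)]
    if aboveavg:
        # Subtract the library average and floor negative values at 0.
        fracs = [max(0.0, f - libfracsurvive) for f in fracs]
    return fracs
```

For site 1 of the doctest, the selected sample has twice the total counts of the mock sample, so under this assumption the pseudocounts are 10 and 5. The padded counts are then 400, 100, 100, 200 against 100 each, giving enrichments of 2, 0.5, 0.5, and 1, and fractions surviving of 0.2, 0.05, 0.05, and 0.1 after multiplying by libfracsurvive = 0.1.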

dms_tools2.fracsurvive.mutToSiteFracSurvive(mutfracsurvive)

Computes sitefracsurvive from mutfracsurvive.

Args:

mutfracsurvive (pandas.DataFrame): Data frame of mutfracsurvive values, as returned by computeMutFracSurvive.

Returns:

The dataframe sitefracsurvive, which has the following columns:

site
avgfracsurvive: average mutfracsurvive over the non-wildtype characters at the site
maxfracsurvive: maximum mutfracsurvive over the non-wildtype characters at the site

Mutations for which mutfracsurvive is NaN are ignored, and the site values are also NaN if all mutation values are NaN for that site.

>>> import numpy
>>> import pandas
>>> from dms_tools2.fracsurvive import mutToSiteFracSurvive
>>> mutfracsurvive = (pandas.DataFrame({
...         'site':[1, 2, 3, 4],
...         'wildtype':['A', 'G', 'C', 'G'],
...         'A':[numpy.nan, 0.6, 0.1, numpy.nan],
...         'C':[0.2, numpy.nan, numpy.nan, numpy.nan],
...         'G':[0.8, 0.9, 0.2, 0.1],
...         'T':[0.2, 0.0, 0.3, numpy.nan],
...         })
...         .melt(id_vars=['site', 'wildtype'],
...               var_name='mutation', value_name='mutfracsurvive')
...         .reset_index(drop=True)
...         )
>>> sitefracsurvive = mutToSiteFracSurvive(mutfracsurvive)
>>> all(sitefracsurvive.columns == ['site', 'avgfracsurvive',
...         'maxfracsurvive'])
True
>>> all(sitefracsurvive['site'] == [1, 2, 3, 4])
True
>>> numpy.allclose(sitefracsurvive['avgfracsurvive'],
...         [0.4, 0.3, 0.2, numpy.nan], equal_nan=True)
True
>>> numpy.allclose(sitefracsurvive['maxfracsurvive'],
...         [0.8, 0.6, 0.3, numpy.nan], equal_nan=True)
True
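
The site-level summary can be sketched with a pandas groupby. This is an illustration rather than the function's actual implementation, and the helper name site_summary_sketch is hypothetical. It relies on pandas skipping NaN values in grouped mean and max, so a site whose non-wildtype values are all NaN comes out as NaN.

```python
import pandas as pd


def site_summary_sketch(mutfracsurvive):
    """Hypothetical helper: per-site average and maximum mutfracsurvive.

    Expects the columns site, wildtype, mutation, and mutfracsurvive.
    Wildtype rows are dropped before summarizing.
    """
    nonwt = mutfracsurvive[
        mutfracsurvive['mutation'] != mutfracsurvive['wildtype']]
    grouped = nonwt.groupby('site')['mutfracsurvive']
    # Grouped mean and max skip NaN; an all-NaN group yields NaN.
    return pd.DataFrame({
        'avgfracsurvive': grouped.mean(),
        'maxfracsurvive': grouped.max(),
    }).reset_index()
```

In the doctest data, site 2 has wildtype G with a value of 0.9, yet its maximum is 0.6, because only the non-wildtype characters A and T contribute.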
