fracsurvive

Performs operations related to estimating the fraction of each mutation that survives a selection.

dms_tools2.fracsurvive.avgMutFracSurvive(mutfracsurvivefiles, avgtype)

Gets mean or median mutation fraction surviving.

Typically used to get an average across replicates.

Args:
mutfracsurvivefiles (list)

List of CSV files with mutfracsurvive as returned by dms2_fracsurvive.

avgtype (str)
Type of “average” to calculate. Possibilities:
  • mean

  • median

Returns:

A pandas.DataFrame containing the mean or median fraction surviving for each mutation (mutfracsurvive).

>>> import tempfile
>>> import numpy
>>> from dms_tools2.fracsurvive import avgMutFracSurvive
>>> tf = tempfile.NamedTemporaryFile
>>> with tf(mode='w') as f1, tf(mode='w') as f2, tf(mode='w') as f3:
...     x = f1.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '156,G,S,0.9\n'
...                  '157,K,D,0.1')
...     f1.flush()
...     x = f2.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '157,K,D,0.1\n'
...                  '156,G,S,1.0')
...     f2.flush()
...     x = f3.write('site,wildtype,mutation,mutfracsurvive\n'
...                  '157,K,D,0.4\n'
...                  '156,G,S,0.5')
...     f3.flush()
...     mean = avgMutFracSurvive([f1.name, f2.name, f3.name],
...             'mean')
...     median = avgMutFracSurvive([f1.name, f2.name, f3.name],
...             'median')
>>> (mean['site'] == [156, 157]).all()
True
>>> (median['site'] == [156, 157]).all()
True
>>> (mean['wildtype'] == ['G', 'K']).all()
True
>>> (median['wildtype'] == ['G', 'K']).all()
True
>>> (mean['mutation'] == ['S', 'D']).all()
True
>>> (median['mutation'] == ['S', 'D']).all()
True
>>> numpy.allclose(mean['mutfracsurvive'], [0.8, 0.2])
True
>>> numpy.allclose(median['mutfracsurvive'], [0.9, 0.1])
True
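
The averaging step can be pictured as a group-by over the stacked replicate tables. The following is a minimal sketch, not the dms_tools2 implementation: the helper name average_replicates is hypothetical, and it takes data frames that are already loaded rather than file names. It reuses the three replicate tables from the doctest above.

```python
import io

import pandas as pd


def average_replicates(frames, avgtype):
    """Hypothetical helper: average mutfracsurvive across replicate frames.

    Illustrative only; this is not how dms_tools2 is known to do it.
    """
    if avgtype not in ('mean', 'median'):
        raise ValueError(f"invalid avgtype: {avgtype}")
    # Stack the replicates, then average each (site, wildtype, mutation) group.
    combined = pd.concat(frames, ignore_index=True)
    return (combined
            .groupby(['site', 'wildtype', 'mutation'], sort=True)
            ['mutfracsurvive']
            .agg(avgtype)
            .reset_index())


# Same three replicates as in the doctest, read from in-memory CSV text.
replicate_csvs = [
    'site,wildtype,mutation,mutfracsurvive\n156,G,S,0.9\n157,K,D,0.1',
    'site,wildtype,mutation,mutfracsurvive\n157,K,D,0.1\n156,G,S,1.0',
    'site,wildtype,mutation,mutfracsurvive\n157,K,D,0.4\n156,G,S,0.5',
]
frames = [pd.read_csv(io.StringIO(text)) for text in replicate_csvs]
mean_df = average_replicates(frames, 'mean')
median_df = average_replicates(frames, 'median')
```

Grouping on site, wildtype, and mutation means the row order within each replicate file does not matter, which is why the second and third files can list site 157 first.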
dms_tools2.fracsurvive.computeMutFracSurvive(libfracsurvive, sel, mock, countcharacters, pseudocount, translate_to_aa, err=None, mincount=0, aboveavg=False)

Compute fraction surviving for each mutation.

Args:
libfracsurvive (float)

Overall fraction of selected library that survives relative to mock-selected. Should be >= 0 and <= 1.

sel (pandas.DataFrame)

Counts for selected sample. Columns should be site, wildtype, and every character in countcharacters.

mock (pandas.DataFrame)

Like sel but counts for mock-selected sample.

countcharacters (list)

List of all characters (e.g., codons).

pseudocount (int or float > 0)

Pseudocount to add to counts.

translate_to_aa (bool)

Should be True if counts are for codons and we are estimating mutfracsurvive for amino acids, False otherwise.

err (pandas.DataFrame or None)

Optional error-control counts, in same format as sel.

mincount (int >= 0)

Report mutfracsurvive as NaN for any mutation where neither sel nor mock has at least this many counts.

aboveavg (bool)

If True, compute the fraction surviving above the library average by subtracting libfracsurvive and then setting any negative values to 0.

Returns:

A pandas.DataFrame with the fraction surviving for each mutation. Columns are site, wildtype, mutation, mutfracsurvive.

>>> import numpy
>>> import pandas
>>> from dms_tools2.fracsurvive import computeMutFracSurvive
>>> countchars = ['A', 'C', 'G', 'T']
>>> libfracsurvive = 0.1
>>> pseudocount = 5
>>> mock = pandas.DataFrame.from_records(
...         [(1, 'A', 95, 95, 95, 95), (2, 'C', 195, 195, 95, 95)],
...         columns=['site', 'wildtype', 'A', 'C', 'G', 'T'])
>>> sel = pandas.DataFrame.from_records(
...         [(1, 'A', 390, 90, 90, 190), (2, 'C', 390, 190, 390, 190)],
...         columns=['site', 'wildtype', 'A', 'C', 'G', 'T'])
>>> mutfracsurvive = computeMutFracSurvive(libfracsurvive, sel, mock,
...         countchars, pseudocount, False)
>>> {'site', 'wildtype', 'mutation', 'mutfracsurvive'} == set(
...         mutfracsurvive.columns)
True
>>> mutfracsurvive = mutfracsurvive.sort_values(['site', 'mutation'])
>>> all(mutfracsurvive['site'] == [1, 1, 1, 1, 2, 2, 2, 2])
True
>>> all(mutfracsurvive['mutation'] == countchars + countchars)
True
>>> numpy.allclose(mutfracsurvive.query('site == 1')['mutfracsurvive'],
...         [0.2, 0.05, 0.05, 0.1])
True
>>> numpy.allclose(mutfracsurvive.query('site == 2')['mutfracsurvive'],
...         [0.1, 0.05, 0.2, 0.1])
True
>>> mutfracsurvive_above = computeMutFracSurvive(libfracsurvive,
...         sel, mock, countchars, pseudocount, False, aboveavg=True)
>>> all(mutfracsurvive['site'] == mutfracsurvive_above['site'])
True
>>> all(mutfracsurvive['mutation'] == mutfracsurvive_above['mutation'])
True
>>> numpy.allclose(mutfracsurvive_above.query('site == 1')
...         ['mutfracsurvive'], [0.1, 0, 0, 0])
True
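
The numbers in the doctest above can be reproduced by hand with the sketch below. It is one computation consistent with those values, not a statement of what dms_tools2 does internally. The function name sketch_mut_frac_survive is hypothetical. Scaling the pseudocount up for the deeper sample at each site is an assumption. The sketch also assumes sel and mock list the same sites in the same order with nonzero depth, and it omits err and translate_to_aa entirely.

```python
import numpy as np
import pandas as pd


def sketch_mut_frac_survive(libfracsurvive, sel, mock, chars, pseudocount,
                            mincount=0, aboveavg=False):
    """Hypothetical sketch; NOT the dms_tools2 implementation."""
    sel_n = sel[chars].to_numpy(dtype=float)
    mock_n = mock[chars].to_numpy(dtype=float)

    # ASSUMPTION: the pseudocount is scaled up for whichever sample has
    # the greater total depth at each site.
    sel_depth = sel_n.sum(axis=1, keepdims=True)
    mock_depth = mock_n.sum(axis=1, keepdims=True)
    sel_pc = pseudocount * np.maximum(1.0, sel_depth / mock_depth)
    mock_pc = pseudocount * np.maximum(1.0, mock_depth / sel_depth)

    # Per-site character frequencies after adding pseudocounts.
    sel_freq = (sel_n + sel_pc) / (sel_n + sel_pc).sum(axis=1, keepdims=True)
    mock_freq = ((mock_n + mock_pc)
                 / (mock_n + mock_pc).sum(axis=1, keepdims=True))

    # Library-wide survival times the enrichment of each character.
    frac = libfracsurvive * sel_freq / mock_freq

    # mincount: NaN where neither sample reaches the threshold.
    frac[(sel_n < mincount) & (mock_n < mincount)] = np.nan

    # aboveavg: subtract the library average and floor at zero.
    if aboveavg:
        frac = np.clip(frac - libfracsurvive, 0, None)

    wide = pd.DataFrame(frac, columns=chars)
    wide.insert(0, 'wildtype', sel['wildtype'].to_numpy())
    wide.insert(0, 'site', sel['site'].to_numpy())
    return wide.melt(id_vars=['site', 'wildtype'],
                     var_name='mutation', value_name='mutfracsurvive')


# Same inputs as the doctest above.
chars = ['A', 'C', 'G', 'T']
cols = ['site', 'wildtype'] + chars
mock = pd.DataFrame.from_records(
    [(1, 'A', 95, 95, 95, 95), (2, 'C', 195, 195, 95, 95)], columns=cols)
sel = pd.DataFrame.from_records(
    [(1, 'A', 390, 90, 90, 190), (2, 'C', 390, 190, 390, 190)], columns=cols)
result = sketch_mut_frac_survive(0.1, sel, mock, chars, 5)
above = sketch_mut_frac_survive(0.1, sel, mock, chars, 5, aboveavg=True)
```

At site 1 the selected sample is twice as deep as the mock sample (760 versus 380 counts), so under the stated assumption it receives a pseudocount of 10 rather than 5. The resulting frequencies are 0.5, 0.125, 0.125, 0.25 for selected and 0.25 each for mock, which gives the doctest values after multiplying the ratio by libfracsurvive.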
dms_tools2.fracsurvive.mutToSiteFracSurvive(mutfracsurvive)

Computes sitefracsurvive from mutfracsurvive.

Args:
mutfracsurvive (pandas.DataFrame)

Data frame with mutfracsurvive as returned by computeMutFracSurvive.

Returns:
The dataframe sitefracsurvive, which has the following columns:
  • site

  • avgfracsurvive: average mutfracsurvive over non-wildtype characters

  • maxfracsurvive: maximum mutfracsurvive over non-wildtype characters at the site

Mutations for which mutfracsurvive is NaN are ignored, and the site values are also NaN if all mutation values are NaN for that site.

>>> import numpy
>>> import pandas
>>> from dms_tools2.fracsurvive import mutToSiteFracSurvive
>>> mutfracsurvive = (pandas.DataFrame({
...         'site':[1, 2, 3, 4],
...         'wildtype':['A', 'G', 'C', 'G'],
...         'A':[numpy.nan, 0.6, 0.1, numpy.nan],
...         'C':[0.2, numpy.nan, numpy.nan, numpy.nan],
...         'G':[0.8, 0.9, 0.2, 0.1],
...         'T':[0.2, 0.0, 0.3, numpy.nan],
...         })
...         .melt(id_vars=['site', 'wildtype'],
...               var_name='mutation', value_name='mutfracsurvive')
...         .reset_index(drop=True)
...         )
>>> sitefracsurvive = mutToSiteFracSurvive(mutfracsurvive)
>>> all(sitefracsurvive.columns == ['site', 'avgfracsurvive',
...         'maxfracsurvive'])
True
>>> all(sitefracsurvive['site'] == [1, 2, 3, 4])
True
>>> numpy.allclose(sitefracsurvive['avgfracsurvive'],
...         [0.4, 0.3, 0.2, numpy.nan], equal_nan=True)
True
>>> numpy.allclose(sitefracsurvive['maxfracsurvive'],
...         [0.8, 0.6, 0.3, numpy.nan], equal_nan=True)
True
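
The site-level summary in the doctest above can be approximated with a filter followed by a group-by. This is a minimal sketch with a hypothetical function name (sketch_site_frac_survive), not the library's code. It assumes that wildtype rows are excluded from both the average and the maximum, which is what the expected values above imply (site 2 has a wildtype G value of 0.9 but a reported maximum of 0.6).

```python
import numpy as np
import pandas as pd


def sketch_site_frac_survive(mutfracsurvive):
    """Hypothetical sketch; NOT the dms_tools2 implementation."""
    # Drop rows where the "mutation" is actually the wildtype character.
    nonwt = mutfracsurvive[
        mutfracsurvive['mutation'] != mutfracsurvive['wildtype']]
    # pandas skips NaN in group-wise mean and max, and returns NaN for a
    # group whose values are all NaN.
    return (nonwt
            .groupby('site')['mutfracsurvive']
            .agg(avgfracsurvive='mean', maxfracsurvive='max')
            .reset_index())


# Same input as the doctest above.
mut_df = (pd.DataFrame({
              'site': [1, 2, 3, 4],
              'wildtype': ['A', 'G', 'C', 'G'],
              'A': [np.nan, 0.6, 0.1, np.nan],
              'C': [0.2, np.nan, np.nan, np.nan],
              'G': [0.8, 0.9, 0.2, 0.1],
              'T': [0.2, 0.0, 0.3, np.nan],
              })
          .melt(id_vars=['site', 'wildtype'],
                var_name='mutation', value_name='mutfracsurvive'))
site_df = sketch_site_frac_survive(mut_df)
```

Site 4 shows the all-NaN case: its only non-NaN value belongs to the wildtype G, so after filtering, both summary columns come out as NaN.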