dms_editsites

Overview

dms_editsites is a program included with the dms_tools package. It edits the sites in a data file: renumbering them, removing the data for specified sites, or retaining only specified sites. See Examples for illustrations of how you might do this.

After you install dms_tools, this program will be available to run at the command line.

Command-line usage

Edits sites in a data file. Typically you would use this program if you wanted to renumber sites or remove certain sites. This script is part of dms_tools (version 1.1.20) written by the Bloom Lab (see https://github.com/jbloomlab/dms_tools/graphs/contributors for all contributors). Detailed documentation is at http://jbloomlab.github.io/dms_tools/

usage: dms_editsites [-h] [--skipfirstline] [-v]
                     infile outfile {renumber,remove,retain} edit_file

Positional Arguments

infile

Existing data file. This could be a deep mutational scanning counts file, a preferences file, a differential preferences file, or any other file with the following format: blank lines or lines beginning with “#” (comment lines) are ignored; every other line must begin with an entry giving a unique site number (such as “1” or “2A”). The line may then have an arbitrary number of other entries separated from the site number by whitespace. If the lines have no whitespace, then we look for comma separators.

Typically, this file might be a preferences_file, a diffpreferences_file, or a dms_counts file.
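The line format described above can be sketched in Python. This is only an illustration of the stated rules, not the parser that dms_tools itself uses; parse_site is a hypothetical helper name, and treating a line with no whitespace as comma-separated is an assumption based on the description of “infile”.

```python
def parse_site(line):
    """Return the site entry of a data line, or None for blank and comment lines.

    Hypothetical helper: it mirrors the format description, not dms_tools code.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank lines and comment lines are ignored
    # Split on whitespace if the line has any; otherwise look for commas.
    if any(ch.isspace() for ch in stripped):
        return stripped.split(None, 1)[0]
    return stripped.split(",", 1)[0].strip()


print(parse_site("2 K 3.7887 0.00871112"))  # whitespace-separated line
print(parse_site("2A,0.1,0.2"))             # comma-separated line
print(parse_site("# POSITION WT"))          # comment line
```

The site is kept as a string rather than converted to an integer, because site numbers such as “2A” are allowed.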

outfile

The created output file in which the site editing has been performed on “infile”. If this output file already exists, it is overwritten.

edit_method

Possible choices: renumber, remove, retain

How to do the editing: renumber sites, remove specified sites, or retain only specified sites.

edit_file

Existing file specifying how edits are made. If “edit_method” is “renumber”, then all non-comment lines (those not beginning with “#”) must have two space-delimited entries: the existing site in “infile” and the new site number that replaces it. All sites must be specified, and if the new number is “None” then the site is removed in the created file. If “edit_method” is “remove”, then each line should have a site as its first entry, and all of the listed sites are removed. If “edit_method” is “retain”, then each line should have a site as its first entry, and only the listed sites are retained.
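The three editing methods can be summarized with a short Python sketch. It is a minimal illustration of the rules stated for “edit_file”, not the dms_tools implementation; read_edit_file and edit_site are hypothetical names, and the handling of blank lines is an assumption.

```python
def read_edit_file(lines, edit_method):
    """Parse edit-file lines into a dict (renumber) or a set of sites (remove/retain)."""
    # Assumption: blank lines are skipped along with "#" comment lines.
    entries = [ln.split() for ln in lines
               if ln.strip() and not ln.lstrip().startswith("#")]
    if edit_method == "renumber":
        mapping = {}
        for entry in entries:
            if len(entry) != 2:
                raise ValueError("renumber lines need two entries: %r" % (entry,))
            old, new = entry
            mapping[old] = None if new == "None" else new  # "None" removes the site
        return mapping
    if edit_method in ("remove", "retain"):
        return {entry[0] for entry in entries}  # only the first entry is the site
    raise ValueError("unknown edit_method: %r" % (edit_method,))


def edit_site(site, edits, edit_method):
    """Return the site to write to the output, or None if its line is dropped."""
    if edit_method == "renumber":
        if site not in edits:
            raise ValueError("site %s is not listed in the edit file" % site)
        return edits[site]
    if edit_method == "remove":
        return None if site in edits else site
    if edit_method == "retain":
        return site if site in edits else None
    raise ValueError("unknown edit_method: %r" % (edit_method,))


scheme = read_edit_file(["#ORIGINAL_SITE NEW_SITE", "2 1", "3 2", "4 2A"], "renumber")
print([edit_site(s, scheme, "renumber") for s in ["2", "3", "4"]])  # prints ['1', '2', '2A']
```

Raising an error for an unlisted site under “renumber” reflects the requirement that all sites be specified; how the real program reports that condition is not stated here.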

Named Arguments

--skipfirstline

Skip the edit operation on the first line. This could be helpful if dealing with a CSV file in pandas format, where the first line holds column names rather than a site.

Default: False

-v, --version

Show program’s version number and exit.

Examples

Renumbering sites

If we have the preferences.txt file with these contents:

# POSITION WT SITE_ENTROPY PI_A PI_C PI_D PI_E PI_F PI_G PI_H PI_I PI_K PI_L PI_M PI_N PI_P PI_Q PI_R PI_S PI_T PI_V PI_W PI_Y PI_* PI_A_95 PI_C_95 PI_D_95 PI_E_95 PI_F_95 PI_G_95 PI_H_95 PI_I_95 PI_K_95 PI_L_95 PI_M_95 PI_N_95 PI_P_95 PI_Q_95 PI_R_95 PI_S_95 PI_T_95 PI_V_95 PI_W_95 PI_Y_95 PI_*_95
2 K 3.7887 0.00871112 0.0566807 0.00209889 0.0126265 0.123541 0.0419474 0.012759 0.198294 0.0609558 0.0861918 0.0132844 0.0314975 0.0410346 0.0578467 0.0166405 0.0588795 0.0873109 0.0655377 0.00906756 0.0098667 0.0052279 0.00377015,0.0161776 0.0368293,0.0866459 5.25935e-05,0.00750028 0.00147735,0.0305943 0.0838463,0.17365 0.0290031,0.0584818 0.00555741,0.0238828 0.156947,0.244985 0.0549586,0.0681017 0.0664255,0.113842 0.000847956,0.041653 0.0221782,0.0430464 0.0290997,0.0551884 0.0355218,0.091347 0.00758911,0.0250887 0.0443249,0.0761826 0.0677839,0.113325 0.0425034,0.0981422 0.000771089,0.0277961 0.00358541,0.0201952 0.000127331,0.0154344
3 A 2.83514 0.0437537 0.0334546 0.0309614 0.00177409 0.0895918 0.00481327 0.00159833 0.0194826 0.00612203 0.45822 0.00724804 0.0214667 0.00343922 0.0070182 0.00247844 0.019586 0.0234391 0.166837 0.0286347 0.0188882 0.0111935 0.0382826,0.0494847 0.0159482,0.061066 0.0146872,0.054093 3.31649e-05,0.00560408 0.0547333,0.132015 9.98457e-05,0.0141004 4.33912e-05,0.0056256 0.00931897,0.0363372 0.000604372,0.0187545 0.385594,0.531808 0.000164964,0.0337412 0.00394407,0.0612625 0.000229564,0.00835275 0.000259382,0.0237583 0.000710602,0.00561234 0.00861317,0.0325203 0.00902808,0.0448718 0.123725,0.227516 0.00809207,0.0680912 0.00888236,0.0354431 0.00412041,0.0228625
4 K 3.90821 0.0442179 0.0282262 0.00917627 0.0381669 0.00258029 0.00505373 0.117934 0.0259991 0.0512432 0.0905123 0.0170454 0.0126292 0.0614701 0.0848777 0.141501 0.0361854 0.0180969 0.0514335 0.104027 0.0562738 0.0033496 0.0281289,0.0676053 0.0130852,0.0517097 0.00165841,0.0237626 0.00962184,0.0932152 6.39066e-05,0.00890049 0.00132784,0.0114412 0.0783091,0.176141 0.0131793,0.0426951 0.0448744,0.0578793 0.0643019,0.115842 0.00299087,0.0466871 0.00124322,0.0341616 0.0425182,0.0870665 0.0530961,0.124081 0.103449,0.195545 0.0238936,0.0533164 0.009447,0.0303053 0.0345412,0.071387 0.0575267,0.175514 0.0281562,0.0992686 0.000150733,0.00985258

and the renumbering_scheme.txt file with these contents:

#ORIGINAL_SITE NEW_SITE
2 1
3 2
4 2A

then the command:

dms_editsites preferences.txt renumbered_preferences.txt renumber renumbering_scheme.txt

creates the file renumbered_preferences.txt with these contents:

# POSITION WT SITE_ENTROPY PI_A PI_C PI_D PI_E PI_F PI_G PI_H PI_I PI_K PI_L PI_M PI_N PI_P PI_Q PI_R PI_S PI_T PI_V PI_W PI_Y PI_* PI_A_95 PI_C_95 PI_D_95 PI_E_95 PI_F_95 PI_G_95 PI_H_95 PI_I_95 PI_K_95 PI_L_95 PI_M_95 PI_N_95 PI_P_95 PI_Q_95 PI_R_95 PI_S_95 PI_T_95 PI_V_95 PI_W_95 PI_Y_95 PI_*_95
1 K 3.7887 0.00871112 0.0566807 0.00209889 0.0126265 0.123541 0.0419474 0.012759 0.198294 0.0609558 0.0861918 0.0132844 0.0314975 0.0410346 0.0578467 0.0166405 0.0588795 0.0873109 0.0655377 0.00906756 0.0098667 0.0052279 0.00377015,0.0161776 0.0368293,0.0866459 5.25935e-05,0.00750028 0.00147735,0.0305943 0.0838463,0.17365 0.0290031,0.0584818 0.00555741,0.0238828 0.156947,0.244985 0.0549586,0.0681017 0.0664255,0.113842 0.000847956,0.041653 0.0221782,0.0430464 0.0290997,0.0551884 0.0355218,0.091347 0.00758911,0.0250887 0.0443249,0.0761826 0.0677839,0.113325 0.0425034,0.0981422 0.000771089,0.0277961 0.00358541,0.0201952 0.000127331,0.0154344
2 A 2.83514 0.0437537 0.0334546 0.0309614 0.00177409 0.0895918 0.00481327 0.00159833 0.0194826 0.00612203 0.45822 0.00724804 0.0214667 0.00343922 0.0070182 0.00247844 0.019586 0.0234391 0.166837 0.0286347 0.0188882 0.0111935 0.0382826,0.0494847 0.0159482,0.061066 0.0146872,0.054093 3.31649e-05,0.00560408 0.0547333,0.132015 9.98457e-05,0.0141004 4.33912e-05,0.0056256 0.00931897,0.0363372 0.000604372,0.0187545 0.385594,0.531808 0.000164964,0.0337412 0.00394407,0.0612625 0.000229564,0.00835275 0.000259382,0.0237583 0.000710602,0.00561234 0.00861317,0.0325203 0.00902808,0.0448718 0.123725,0.227516 0.00809207,0.0680912 0.00888236,0.0354431 0.00412041,0.0228625
2A K 3.90821 0.0442179 0.0282262 0.00917627 0.0381669 0.00258029 0.00505373 0.117934 0.0259991 0.0512432 0.0905123 0.0170454 0.0126292 0.0614701 0.0848777 0.141501 0.0361854 0.0180969 0.0514335 0.104027 0.0562738 0.0033496 0.0281289,0.0676053 0.0130852,0.0517097 0.00165841,0.0237626 0.00962184,0.0932152 6.39066e-05,0.00890049 0.00132784,0.0114412 0.0783091,0.176141 0.0131793,0.0426951 0.0448744,0.0578793 0.0643019,0.115842 0.00299087,0.0466871 0.00124322,0.0341616 0.0425182,0.0870665 0.0530961,0.124081 0.103449,0.195545 0.0238936,0.0533164 0.009447,0.0303053 0.0345412,0.071387 0.0575267,0.175514 0.0281562,0.0992686 0.000150733,0.00985258

Removing sites

If we have the preferences.txt file with these contents:

# POSITION WT SITE_ENTROPY PI_A PI_C PI_D PI_E PI_F PI_G PI_H PI_I PI_K PI_L PI_M PI_N PI_P PI_Q PI_R PI_S PI_T PI_V PI_W PI_Y PI_* PI_A_95 PI_C_95 PI_D_95 PI_E_95 PI_F_95 PI_G_95 PI_H_95 PI_I_95 PI_K_95 PI_L_95 PI_M_95 PI_N_95 PI_P_95 PI_Q_95 PI_R_95 PI_S_95 PI_T_95 PI_V_95 PI_W_95 PI_Y_95 PI_*_95
2 K 3.7887 0.00871112 0.0566807 0.00209889 0.0126265 0.123541 0.0419474 0.012759 0.198294 0.0609558 0.0861918 0.0132844 0.0314975 0.0410346 0.0578467 0.0166405 0.0588795 0.0873109 0.0655377 0.00906756 0.0098667 0.0052279 0.00377015,0.0161776 0.0368293,0.0866459 5.25935e-05,0.00750028 0.00147735,0.0305943 0.0838463,0.17365 0.0290031,0.0584818 0.00555741,0.0238828 0.156947,0.244985 0.0549586,0.0681017 0.0664255,0.113842 0.000847956,0.041653 0.0221782,0.0430464 0.0290997,0.0551884 0.0355218,0.091347 0.00758911,0.0250887 0.0443249,0.0761826 0.0677839,0.113325 0.0425034,0.0981422 0.000771089,0.0277961 0.00358541,0.0201952 0.000127331,0.0154344
3 A 2.83514 0.0437537 0.0334546 0.0309614 0.00177409 0.0895918 0.00481327 0.00159833 0.0194826 0.00612203 0.45822 0.00724804 0.0214667 0.00343922 0.0070182 0.00247844 0.019586 0.0234391 0.166837 0.0286347 0.0188882 0.0111935 0.0382826,0.0494847 0.0159482,0.061066 0.0146872,0.054093 3.31649e-05,0.00560408 0.0547333,0.132015 9.98457e-05,0.0141004 4.33912e-05,0.0056256 0.00931897,0.0363372 0.000604372,0.0187545 0.385594,0.531808 0.000164964,0.0337412 0.00394407,0.0612625 0.000229564,0.00835275 0.000259382,0.0237583 0.000710602,0.00561234 0.00861317,0.0325203 0.00902808,0.0448718 0.123725,0.227516 0.00809207,0.0680912 0.00888236,0.0354431 0.00412041,0.0228625
4 K 3.90821 0.0442179 0.0282262 0.00917627 0.0381669 0.00258029 0.00505373 0.117934 0.0259991 0.0512432 0.0905123 0.0170454 0.0126292 0.0614701 0.0848777 0.141501 0.0361854 0.0180969 0.0514335 0.104027 0.0562738 0.0033496 0.0281289,0.0676053 0.0130852,0.0517097 0.00165841,0.0237626 0.00962184,0.0932152 6.39066e-05,0.00890049 0.00132784,0.0114412 0.0783091,0.176141 0.0131793,0.0426951 0.0448744,0.0578793 0.0643019,0.115842 0.00299087,0.0466871 0.00124322,0.0341616 0.0425182,0.0870665 0.0530961,0.124081 0.103449,0.195545 0.0238936,0.0533164 0.009447,0.0303053 0.0345412,0.071387 0.0575267,0.175514 0.0281562,0.0992686 0.000150733,0.00985258

and the file remove_sites.txt with these contents:

#sites to remove
2

then the command:

dms_editsites preferences.txt pruned_preferences.txt remove remove_sites.txt

creates the file pruned_preferences.txt with these contents:

# POSITION WT SITE_ENTROPY PI_A PI_C PI_D PI_E PI_F PI_G PI_H PI_I PI_K PI_L PI_M PI_N PI_P PI_Q PI_R PI_S PI_T PI_V PI_W PI_Y PI_* PI_A_95 PI_C_95 PI_D_95 PI_E_95 PI_F_95 PI_G_95 PI_H_95 PI_I_95 PI_K_95 PI_L_95 PI_M_95 PI_N_95 PI_P_95 PI_Q_95 PI_R_95 PI_S_95 PI_T_95 PI_V_95 PI_W_95 PI_Y_95 PI_*_95
3 A 2.83514 0.0437537 0.0334546 0.0309614 0.00177409 0.0895918 0.00481327 0.00159833 0.0194826 0.00612203 0.45822 0.00724804 0.0214667 0.00343922 0.0070182 0.00247844 0.019586 0.0234391 0.166837 0.0286347 0.0188882 0.0111935 0.0382826,0.0494847 0.0159482,0.061066 0.0146872,0.054093 3.31649e-05,0.00560408 0.0547333,0.132015 9.98457e-05,0.0141004 4.33912e-05,0.0056256 0.00931897,0.0363372 0.000604372,0.0187545 0.385594,0.531808 0.000164964,0.0337412 0.00394407,0.0612625 0.000229564,0.00835275 0.000259382,0.0237583 0.000710602,0.00561234 0.00861317,0.0325203 0.00902808,0.0448718 0.123725,0.227516 0.00809207,0.0680912 0.00888236,0.0354431 0.00412041,0.0228625
4 K 3.90821 0.0442179 0.0282262 0.00917627 0.0381669 0.00258029 0.00505373 0.117934 0.0259991 0.0512432 0.0905123 0.0170454 0.0126292 0.0614701 0.0848777 0.141501 0.0361854 0.0180969 0.0514335 0.104027 0.0562738 0.0033496 0.0281289,0.0676053 0.0130852,0.0517097 0.00165841,0.0237626 0.00962184,0.0932152 6.39066e-05,0.00890049 0.00132784,0.0114412 0.0783091,0.176141 0.0131793,0.0426951 0.0448744,0.0578793 0.0643019,0.115842 0.00299087,0.0466871 0.00124322,0.0341616 0.0425182,0.0870665 0.0530961,0.124081 0.103449,0.195545 0.0238936,0.0533164 0.009447,0.0303053 0.0345412,0.071387 0.0575267,0.175514 0.0281562,0.0992686 0.000150733,0.00985258
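The “retain” method has no worked example here, so the following Python sketch shows how it relates to “remove” for the same three sites. It is an illustration under stated assumptions, not the dms_tools code: filter_sites is a hypothetical function, the data rows are abbreviated to the first three columns of preferences.txt, and passing blank lines through unchanged is an assumption.

```python
def filter_sites(data_lines, listed_sites, edit_method):
    """Keep or drop data lines by their first entry; comment lines pass through."""
    if edit_method not in ("remove", "retain"):
        raise ValueError("unknown edit_method: %r" % (edit_method,))
    kept = []
    for line in data_lines:
        # Assumption: blank lines are copied unchanged, like "#" comment lines.
        if not line.strip() or line.startswith("#"):
            kept.append(line)
            continue
        listed = line.split()[0] in listed_sites
        if (edit_method == "remove") != listed:
            kept.append(line)  # remove keeps unlisted sites; retain keeps listed ones
    return kept


# Abbreviated stand-ins for the preferences.txt rows (first three columns only).
data = ["# POSITION WT SITE_ENTROPY", "2 K 3.7887", "3 A 2.83514", "4 K 3.90821"]
pruned = filter_sites(data, {"2"}, "remove")
retained = filter_sites(data, {"3", "4"}, "retain")
print(pruned == retained)  # prints True
```

In this example, removing site 2 and retaining sites 3 and 4 give the same output, so “retain” is the more convenient choice when the list of sites to keep is shorter than the list to discard.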