dms_editsites
Overview
dms_editsites is a program included with the dms_tools package. It renumbers sites in a data file, removes the data for specified sites, or retains only specified sites.
See Examples for illustrations of how you might do this.
After you install dms_tools, this program will be available to run at the command line.
Command-line usage
Edits sites in a data file. Typically you would use this program if you wanted to renumber sites or remove certain sites. This script is part of dms_tools (version 1.1.20) written by the Bloom Lab (see https://github.com/jbloomlab/dms_tools/graphs/contributors for all contributors). Detailed documentation is at http://jbloomlab.github.io/dms_tools/
usage: dms_editsites [-h] [--skipfirstline] [-v]
infile outfile {renumber,remove,retain} edit_file
Positional Arguments
infile | Existing data file. This could be a deep mutational scanning counts file, a preferences file, a differential preferences file, or any other file with the following format: blank lines or lines beginning with “#” (comment lines) are ignored; every other line must begin with an entry giving a unique site number (such as “1” or “2A”). The line may then have an arbitrary number of other entries separated from the site number by whitespace. If the lines have no whitespace, then we look for comma separators. Typically, this file might be a preferences_file, a diffpreferences_file, or a dms_counts. |
outfile | The created output file in which the site editing has been performed on “infile”. If this output file already exists, it is overwritten. |
edit_method | Possible choices: renumber, remove, retain. How to do the editing: renumber sites, remove specified sites, or retain only specified sites. |
edit_file | Existing file specifying how edits are made. If “edit_method” is “renumber”, then all non-comment lines (those not beginning with “#”) must have two space-delimited entries: the existing site in “infile” and the new site number that replaces it. All sites must be specified; if the new number is “None”, the site is removed from the created file. If “edit_method” is “remove”, then each line should have a site as its first entry, and all of the listed sites are removed. If “edit_method” is “retain”, then each line should have a site as its first entry, and only the listed sites are retained. |
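The rules above for reading “infile” and “edit_file” can be sketched in Python. This is a minimal illustration written from the descriptions in this section, not the dms_tools source code: the function names get_site, read_edit_spec, and edit_sites are hypothetical, and copying comment and blank lines to the output unchanged is an assumption.

```python
def get_site(line):
    """Return the site label that starts a data line, or None for blank/comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    first = stripped.split()[0]
    if first == stripped:
        # No whitespace anywhere on the line, so look for comma separators.
        first = stripped.split(",")[0]
    return first


def read_edit_spec(edit_lines, method):
    """Parse edit-file lines into a dict (renumber) or a set (remove, retain)."""
    entries = [ln.split() for ln in edit_lines
               if ln.strip() and not ln.strip().startswith("#")]
    if method == "renumber":
        for entry in entries:
            if len(entry) != 2:
                raise ValueError(f"expected two entries per line, got {entry}")
        return {old: new for old, new in entries}
    if method in ("remove", "retain"):
        return {entry[0] for entry in entries}
    raise ValueError(f"unknown edit method: {method}")


def edit_sites(lines, method, edit_lines):
    """Return the edited copy of lines (hypothetical helper, not the real program)."""
    spec = read_edit_spec(edit_lines, method)
    edited = []
    for line in lines:
        site = get_site(line)
        if site is None:
            # Assumption: comment and blank lines pass through unchanged.
            edited.append(line)
        elif method == "renumber":
            if site not in spec:
                raise ValueError(f"site {site} is missing from the edit file")
            if spec[site] != "None":
                body = line.lstrip()
                # Swap the leading site label, keep the rest of the line as-is.
                edited.append(spec[site] + body[len(site):])
        elif method == "remove":
            if site not in spec:
                edited.append(line)
        else:  # retain
            if site in spec:
                edited.append(line)
    return edited


if __name__ == "__main__":
    data = ["# POSITION WT\n", "2 K 0.1\n", "3 A 0.2\n", "4 K 0.3\n"]
    scheme = ["#ORIGINAL_SITE NEW_SITE\n", "2 1\n", "3 2\n", "4 2A\n"]
    print("".join(edit_sites(data, "renumber", scheme)), end="")
```

The data lines in the demo are shortened stand-ins for a real preferences file; only the first entry on each line matters for the edit.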
Named Arguments
--skipfirstline | Skip the edit operation on the first line of “infile” (for example, a header row). This could be helpful if dealing with a CSV file in pandas format. Default: False |
-v, --version | show program’s version number and exit |
Examples
Renumbering sites
If we have the preferences.txt file with these contents:
# POSITION WT SITE_ENTROPY PI_A PI_C PI_D PI_E PI_F PI_G PI_H PI_I PI_K PI_L PI_M PI_N PI_P PI_Q PI_R PI_S PI_T PI_V PI_W PI_Y PI_* PI_A_95 PI_C_95 PI_D_95 PI_E_95 PI_F_95 PI_G_95 PI_H_95 PI_I_95 PI_K_95 PI_L_95 PI_M_95 PI_N_95 PI_P_95 PI_Q_95 PI_R_95 PI_S_95 PI_T_95 PI_V_95 PI_W_95 PI_Y_95 PI_*_95
2 K 3.7887 0.00871112 0.0566807 0.00209889 0.0126265 0.123541 0.0419474 0.012759 0.198294 0.0609558 0.0861918 0.0132844 0.0314975 0.0410346 0.0578467 0.0166405 0.0588795 0.0873109 0.0655377 0.00906756 0.0098667 0.0052279 0.00377015,0.0161776 0.0368293,0.0866459 5.25935e-05,0.00750028 0.00147735,0.0305943 0.0838463,0.17365 0.0290031,0.0584818 0.00555741,0.0238828 0.156947,0.244985 0.0549586,0.0681017 0.0664255,0.113842 0.000847956,0.041653 0.0221782,0.0430464 0.0290997,0.0551884 0.0355218,0.091347 0.00758911,0.0250887 0.0443249,0.0761826 0.0677839,0.113325 0.0425034,0.0981422 0.000771089,0.0277961 0.00358541,0.0201952 0.000127331,0.0154344
3 A 2.83514 0.0437537 0.0334546 0.0309614 0.00177409 0.0895918 0.00481327 0.00159833 0.0194826 0.00612203 0.45822 0.00724804 0.0214667 0.00343922 0.0070182 0.00247844 0.019586 0.0234391 0.166837 0.0286347 0.0188882 0.0111935 0.0382826,0.0494847 0.0159482,0.061066 0.0146872,0.054093 3.31649e-05,0.00560408 0.0547333,0.132015 9.98457e-05,0.0141004 4.33912e-05,0.0056256 0.00931897,0.0363372 0.000604372,0.0187545 0.385594,0.531808 0.000164964,0.0337412 0.00394407,0.0612625 0.000229564,0.00835275 0.000259382,0.0237583 0.000710602,0.00561234 0.00861317,0.0325203 0.00902808,0.0448718 0.123725,0.227516 0.00809207,0.0680912 0.00888236,0.0354431 0.00412041,0.0228625
4 K 3.90821 0.0442179 0.0282262 0.00917627 0.0381669 0.00258029 0.00505373 0.117934 0.0259991 0.0512432 0.0905123 0.0170454 0.0126292 0.0614701 0.0848777 0.141501 0.0361854 0.0180969 0.0514335 0.104027 0.0562738 0.0033496 0.0281289,0.0676053 0.0130852,0.0517097 0.00165841,0.0237626 0.00962184,0.0932152 6.39066e-05,0.00890049 0.00132784,0.0114412 0.0783091,0.176141 0.0131793,0.0426951 0.0448744,0.0578793 0.0643019,0.115842 0.00299087,0.0466871 0.00124322,0.0341616 0.0425182,0.0870665 0.0530961,0.124081 0.103449,0.195545 0.0238936,0.0533164 0.009447,0.0303053 0.0345412,0.071387 0.0575267,0.175514 0.0281562,0.0992686 0.000150733,0.00985258
and the renumbering_scheme.txt file with these contents:
#ORIGINAL_SITE NEW_SITE
2 1
3 2
4 2A
then the command:
dms_editsites preferences.txt renumbered_preferences.txt renumber renumbering_scheme.txt
creates the file renumbered_preferences.txt with these contents:
# POSITION WT SITE_ENTROPY PI_A PI_C PI_D PI_E PI_F PI_G PI_H PI_I PI_K PI_L PI_M PI_N PI_P PI_Q PI_R PI_S PI_T PI_V PI_W PI_Y PI_* PI_A_95 PI_C_95 PI_D_95 PI_E_95 PI_F_95 PI_G_95 PI_H_95 PI_I_95 PI_K_95 PI_L_95 PI_M_95 PI_N_95 PI_P_95 PI_Q_95 PI_R_95 PI_S_95 PI_T_95 PI_V_95 PI_W_95 PI_Y_95 PI_*_95
1 K 3.7887 0.00871112 0.0566807 0.00209889 0.0126265 0.123541 0.0419474 0.012759 0.198294 0.0609558 0.0861918 0.0132844 0.0314975 0.0410346 0.0578467 0.0166405 0.0588795 0.0873109 0.0655377 0.00906756 0.0098667 0.0052279 0.00377015,0.0161776 0.0368293,0.0866459 5.25935e-05,0.00750028 0.00147735,0.0305943 0.0838463,0.17365 0.0290031,0.0584818 0.00555741,0.0238828 0.156947,0.244985 0.0549586,0.0681017 0.0664255,0.113842 0.000847956,0.041653 0.0221782,0.0430464 0.0290997,0.0551884 0.0355218,0.091347 0.00758911,0.0250887 0.0443249,0.0761826 0.0677839,0.113325 0.0425034,0.0981422 0.000771089,0.0277961 0.00358541,0.0201952 0.000127331,0.0154344
2 A 2.83514 0.0437537 0.0334546 0.0309614 0.00177409 0.0895918 0.00481327 0.00159833 0.0194826 0.00612203 0.45822 0.00724804 0.0214667 0.00343922 0.0070182 0.00247844 0.019586 0.0234391 0.166837 0.0286347 0.0188882 0.0111935 0.0382826,0.0494847 0.0159482,0.061066 0.0146872,0.054093 3.31649e-05,0.00560408 0.0547333,0.132015 9.98457e-05,0.0141004 4.33912e-05,0.0056256 0.00931897,0.0363372 0.000604372,0.0187545 0.385594,0.531808 0.000164964,0.0337412 0.00394407,0.0612625 0.000229564,0.00835275 0.000259382,0.0237583 0.000710602,0.00561234 0.00861317,0.0325203 0.00902808,0.0448718 0.123725,0.227516 0.00809207,0.0680912 0.00888236,0.0354431 0.00412041,0.0228625
2A K 3.90821 0.0442179 0.0282262 0.00917627 0.0381669 0.00258029 0.00505373 0.117934 0.0259991 0.0512432 0.0905123 0.0170454 0.0126292 0.0614701 0.0848777 0.141501 0.0361854 0.0180969 0.0514335 0.104027 0.0562738 0.0033496 0.0281289,0.0676053 0.0130852,0.0517097 0.00165841,0.0237626 0.00962184,0.0932152 6.39066e-05,0.00890049 0.00132784,0.0114412 0.0783091,0.176141 0.0131793,0.0426951 0.0448744,0.0578793 0.0643019,0.115842 0.00299087,0.0466871 0.00124322,0.0341616 0.0425182,0.0870665 0.0530961,0.124081 0.103449,0.195545 0.0238936,0.0533164 0.009447,0.0303053 0.0345412,0.071387 0.0575267,0.175514 0.0281562,0.0992686 0.000150733,0.00985258
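Because a renumbering file must specify every site in “infile”, and every line in a data file must begin with a unique site number, it can help to check a scheme before running the command. The sketch below is a hypothetical pre-check, not part of dms_tools: the name check_scheme is invented here, and whether dms_editsites itself reports duplicated new site numbers is not stated in this documentation.

```python
def check_scheme(infile_sites, scheme):
    """Return (missing, duplicates) for a renumbering scheme.

    infile_sites: site labels found in the data file, as strings.
    scheme: dict mapping each existing site to its new label ("None" drops it).
    """
    # Sites in the data file that the scheme forgot to mention.
    missing = [site for site in infile_sites if site not in scheme]
    # New labels that will actually be written to the output file.
    new_labels = [scheme[site] for site in infile_sites
                  if site in scheme and scheme[site] != "None"]
    duplicates = sorted({label for label in new_labels
                         if new_labels.count(label) > 1})
    return missing, duplicates


if __name__ == "__main__":
    # The scheme from the example above: nothing missing, nothing duplicated.
    scheme = {"2": "1", "3": "2", "4": "2A"}
    print(check_scheme(["2", "3", "4"], scheme))
```

An empty pair of lists means the scheme covers every site and gives each retained site a distinct new number.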
Removing sites
If we have the preferences.txt file with these contents:
# POSITION WT SITE_ENTROPY PI_A PI_C PI_D PI_E PI_F PI_G PI_H PI_I PI_K PI_L PI_M PI_N PI_P PI_Q PI_R PI_S PI_T PI_V PI_W PI_Y PI_* PI_A_95 PI_C_95 PI_D_95 PI_E_95 PI_F_95 PI_G_95 PI_H_95 PI_I_95 PI_K_95 PI_L_95 PI_M_95 PI_N_95 PI_P_95 PI_Q_95 PI_R_95 PI_S_95 PI_T_95 PI_V_95 PI_W_95 PI_Y_95 PI_*_95
2 K 3.7887 0.00871112 0.0566807 0.00209889 0.0126265 0.123541 0.0419474 0.012759 0.198294 0.0609558 0.0861918 0.0132844 0.0314975 0.0410346 0.0578467 0.0166405 0.0588795 0.0873109 0.0655377 0.00906756 0.0098667 0.0052279 0.00377015,0.0161776 0.0368293,0.0866459 5.25935e-05,0.00750028 0.00147735,0.0305943 0.0838463,0.17365 0.0290031,0.0584818 0.00555741,0.0238828 0.156947,0.244985 0.0549586,0.0681017 0.0664255,0.113842 0.000847956,0.041653 0.0221782,0.0430464 0.0290997,0.0551884 0.0355218,0.091347 0.00758911,0.0250887 0.0443249,0.0761826 0.0677839,0.113325 0.0425034,0.0981422 0.000771089,0.0277961 0.00358541,0.0201952 0.000127331,0.0154344
3 A 2.83514 0.0437537 0.0334546 0.0309614 0.00177409 0.0895918 0.00481327 0.00159833 0.0194826 0.00612203 0.45822 0.00724804 0.0214667 0.00343922 0.0070182 0.00247844 0.019586 0.0234391 0.166837 0.0286347 0.0188882 0.0111935 0.0382826,0.0494847 0.0159482,0.061066 0.0146872,0.054093 3.31649e-05,0.00560408 0.0547333,0.132015 9.98457e-05,0.0141004 4.33912e-05,0.0056256 0.00931897,0.0363372 0.000604372,0.0187545 0.385594,0.531808 0.000164964,0.0337412 0.00394407,0.0612625 0.000229564,0.00835275 0.000259382,0.0237583 0.000710602,0.00561234 0.00861317,0.0325203 0.00902808,0.0448718 0.123725,0.227516 0.00809207,0.0680912 0.00888236,0.0354431 0.00412041,0.0228625
4 K 3.90821 0.0442179 0.0282262 0.00917627 0.0381669 0.00258029 0.00505373 0.117934 0.0259991 0.0512432 0.0905123 0.0170454 0.0126292 0.0614701 0.0848777 0.141501 0.0361854 0.0180969 0.0514335 0.104027 0.0562738 0.0033496 0.0281289,0.0676053 0.0130852,0.0517097 0.00165841,0.0237626 0.00962184,0.0932152 6.39066e-05,0.00890049 0.00132784,0.0114412 0.0783091,0.176141 0.0131793,0.0426951 0.0448744,0.0578793 0.0643019,0.115842 0.00299087,0.0466871 0.00124322,0.0341616 0.0425182,0.0870665 0.0530961,0.124081 0.103449,0.195545 0.0238936,0.0533164 0.009447,0.0303053 0.0345412,0.071387 0.0575267,0.175514 0.0281562,0.0992686 0.000150733,0.00985258
and the file remove_sites.txt with these contents:
#sites to remove
2
then the command:
dms_editsites preferences.txt pruned_preferences.txt remove remove_sites.txt
creates the file pruned_preferences.txt with these contents:
# POSITION WT SITE_ENTROPY PI_A PI_C PI_D PI_E PI_F PI_G PI_H PI_I PI_K PI_L PI_M PI_N PI_P PI_Q PI_R PI_S PI_T PI_V PI_W PI_Y PI_* PI_A_95 PI_C_95 PI_D_95 PI_E_95 PI_F_95 PI_G_95 PI_H_95 PI_I_95 PI_K_95 PI_L_95 PI_M_95 PI_N_95 PI_P_95 PI_Q_95 PI_R_95 PI_S_95 PI_T_95 PI_V_95 PI_W_95 PI_Y_95 PI_*_95
3 A 2.83514 0.0437537 0.0334546 0.0309614 0.00177409 0.0895918 0.00481327 0.00159833 0.0194826 0.00612203 0.45822 0.00724804 0.0214667 0.00343922 0.0070182 0.00247844 0.019586 0.0234391 0.166837 0.0286347 0.0188882 0.0111935 0.0382826,0.0494847 0.0159482,0.061066 0.0146872,0.054093 3.31649e-05,0.00560408 0.0547333,0.132015 9.98457e-05,0.0141004 4.33912e-05,0.0056256 0.00931897,0.0363372 0.000604372,0.0187545 0.385594,0.531808 0.000164964,0.0337412 0.00394407,0.0612625 0.000229564,0.00835275 0.000259382,0.0237583 0.000710602,0.00561234 0.00861317,0.0325203 0.00902808,0.0448718 0.123725,0.227516 0.00809207,0.0680912 0.00888236,0.0354431 0.00412041,0.0228625
4 K 3.90821 0.0442179 0.0282262 0.00917627 0.0381669 0.00258029 0.00505373 0.117934 0.0259991 0.0512432 0.0905123 0.0170454 0.0126292 0.0614701 0.0848777 0.141501 0.0361854 0.0180969 0.0514335 0.104027 0.0562738 0.0033496 0.0281289,0.0676053 0.0130852,0.0517097 0.00165841,0.0237626 0.00962184,0.0932152 6.39066e-05,0.00890049 0.00132784,0.0114412 0.0783091,0.176141 0.0131793,0.0426951 0.0448744,0.0578793 0.0643019,0.115842 0.00299087,0.0466871 0.00124322,0.0341616 0.0425182,0.0870665 0.0530961,0.124081 0.103449,0.195545 0.0238936,0.0533164 0.009447,0.0303053 0.0345412,0.071387 0.0575267,0.175514 0.0281562,0.0992686 0.000150733,0.00985258
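The “retain” method is the complement of “remove”: listing the sites to keep instead of the sites to drop. As a rough sketch, the Python below builds the text of a retain file that should correspond to the removal above. The helper names complement_sites and sites_file_text and the file name retain_sites.txt are hypothetical, and the equivalence is inferred from the description of “edit_file” rather than tested against the program.

```python
def complement_sites(all_sites, listed):
    """Return the sites in all_sites that are not in listed, keeping their order."""
    listed = set(listed)
    return [site for site in all_sites if site not in listed]


def sites_file_text(sites, comment):
    """Format a remove/retain edit file: one comment line, then one site per line."""
    return "".join([f"#{comment}\n"] + [f"{site}\n" for site in sites])


if __name__ == "__main__":
    # preferences.txt above holds sites 2, 3 and 4; the example removes site 2.
    keep = complement_sites(["2", "3", "4"], ["2"])
    print(sites_file_text(keep, "sites to retain"), end="")
```

Saving that text as retain_sites.txt and running dms_editsites preferences.txt pruned_preferences.txt retain retain_sites.txt would then be expected to keep only sites 3 and 4, as in the removal example.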