Introduction
I made this guide for biologists with moderate programming experience who want to improve the way they work. This guide is not intended to teach you a particular programming language (the Learning to Program section below has a quick aside about that). Instead, you'll learn how to organize your projects, use remote resources, create reproducible software environments, and build computational pipelines. My goal is that you learn not only the how but also the why behind each practice.
A Quick Note
This guide was made with members of the Bloom lab in mind. In the future, I might expand this resource to apply to a more general audience.
Outline
Although each section is useful in isolation, you'll get the most out of this guide if you follow it in order.
- Coding Smarter with Online Tools
Taking advantage of online resources
- Setting up your IDE
Using VSCode to streamline your coding workflow
- Organizing your projects
Creating clearly named, well organized projects
- Using remote resources
Increasing your computational capabilities with remote resources
- Managing software environments
Using conda to isolate software environments
- Tracking your code
Versioning your code with git
- Working collaboratively
Collaborating with GitHub
- Reproducible workflows and pipelines
Building reproducible pipelines with Snakemake
- Coding best practices
Writing clear code
Learning to Program
Some basic programming knowledge is necessary to get the most out of this guide. Below, I'll give a quick overview of the programming languages that are widely used by biologists so you can pick the language that best suits your needs.
Generally, programming languages are either compiled (you write source code and a separate program called a compiler transforms it into an executable program) or interpreted (you write a script that can be executed at any point without a separate compilation step). Compiled programs are fast and can be built to run on different operating systems; however, compiled languages tend to be more difficult to learn, build, and debug. Interpreted languages are usually not as fast as compiled languages, but they're much easier to learn. Both types of language have a place in biological programming. Compiled languages are great for writing programs that run quickly on huge amounts of data (like an aligner). Interpreted languages are great for data analysis and plotting. Most biologists will want to learn an interpreted language.
Common compiled programming languages used by biologists are C/C++, Java, and Rust. They're used to write short-read aligners like BWA (C/C++), genome browsers like IGV (Java), and tools for single-cell genomics like cellranger (Rust). You're likely to interact with tools written in a compiled language, but you probably won't need to write your own.
Common interpreted languages are Python, R, JavaScript, and Perl. Python and R are the most widely used by biologists. Perl was once common among biologists, but it has fallen out of favor. JavaScript is the principal programming language of the web and is occasionally used by biologists to build websites and interactive dashboards. If you're a biologist who wants to learn programming, stick with Python or R.
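To make the interpreted workflow concrete, here is a minimal Python sketch of the kind of small analysis script a biologist might write. The function name `gc_content`, the file name, and the example sequence are all made up for illustration; nothing here comes from a specific lab pipeline.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        # Avoid dividing by zero for an empty sequence.
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


if __name__ == "__main__":
    example = "ATGCGCTA"  # hypothetical sequence, for illustration only
    print(f"GC content: {gc_content(example):.2f}")
```

If you save this as, say, `gc_content.py`, you can run it directly with `python gc_content.py`; there is no separate compile step between editing the script and running it, which is a large part of why interpreted languages feel easier to learn and debug.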
When choosing whether to learn Python or R, consider the following factors:
1. What are my research needs?
Python and R share most of the same core features, but they have different strengths. Python is a better general-purpose programming language with many useful libraries for biological problems. R is geared towards statistics and data analysis, so programming for these scenarios is more natural. Additionally, R has more widely used packages for analyzing gene expression and single-cell data. Learn the language with the best set of tools for the type of research you're doing.
2. What are people around me using?
If possible, learn the language that people around you (in your lab or institute) are using. Sharing a language makes it easier to collaborate, and your colleagues will be a valuable resource while you're learning.
TIP
Learn Python if you're a member of the Bloom lab.
Suggestions
If you have any suggestions for topics that aren't covered in this guide, please post them in the corresponding Discussions thread.