Concrete tools and practices for making computational research reproducible, from version control to environment management and documentation.
Studies across multiple fields have shown that a significant fraction of computational results cannot be independently reproduced, even by the original authors. A 2016 Nature survey of over 1,500 researchers found that roughly 70% had tried and failed to reproduce another scientist's experiments, and more than half had failed to reproduce their own.
The causes are rarely fraud. They are mundane: undocumented dependencies, hardcoded file paths, forgotten preprocessing steps, random seeds that were not recorded, and software environments that shifted between runs.
The good news is that reproducible computational research is achievable with tools and practices that are already available. It requires discipline, not heroism.
Code. Every script, notebook, and analysis program should be in a Git repository. Commit frequently with meaningful messages. Tag releases that correspond to published results.
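As a minimal sketch of this practice, the commands below build a tiny repository and tag the commit behind a published result. The repository and file names are hypothetical:

```shell
# Sketch: a minimal Git history with a tagged release for a published result.
# Repository and file names here are illustrative.
mkdir -p analysis-repo && cd analysis-repo
git init -q
echo "print('analysis')" > analyze.py
git add analyze.py
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Add analysis script for cohort filtering"
# Tag the exact commit the paper's figures were generated from
git -c user.name=demo -c user.email=demo@example.com \
    tag -a v1.0-paper -m "Code state for the submitted manuscript"
git tag --list
```

Anyone (including future you) can later run `git checkout v1.0-paper` to recover exactly the code that produced the published numbers.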
What to commit: source code, configuration files, environment specifications, and small reference data.
What not to commit: large data files, generated outputs, and secrets or credentials (add these to .gitignore).
Data. For datasets too large for Git, use data versioning tools such as DVC or Git LFS, which keep lightweight pointers in the repository while the data itself lives in external storage.
Environment. Record your software environment explicitly (more on this below).
The most common reproducibility failure: "It works on my machine." Software environments drift over time. A package update changes a default parameter. A system library version affects numerical precision. An operating system upgrade breaks a compiled dependency.
Environment specification files capture what is installed:
- `requirements.txt` or `pyproject.toml` for Python (pin versions: `numpy==1.24.3`, not just `numpy`)
- `renv.lock` for R (renv captures the exact package versions)
- `environment.yml` for Conda environments

Containers go further by capturing the entire operating system environment:
- Pin the base image to an exact version (`FROM python:3.11.4-slim`, not `FROM python:latest`)

Virtual machines provide the most complete isolation but are heavy. They are useful for archiving an exact environment for long-term reproducibility.
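A pinned container definition might look like the following sketch. The file layout (a `requirements.txt` and a `src/analyze.py` entry point) is assumed, not prescribed:

```dockerfile
# Sketch: pinned environment for a Python analysis project.
# Pin the base image to an exact version, never :latest.
FROM python:3.11.4-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes;
# versions are pinned inside requirements.txt.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY src/ ./src/
CMD ["python", "src/analyze.py"]
```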
In practice, the sweet spot is containers plus pinned dependencies. This covers the vast majority of reproducibility failures without excessive overhead.
If reproducing your results requires a human to remember and execute steps in the right order, reproducibility depends on that human's memory and attention. Automate instead.
Workflow managers (Make, Snakemake, Nextflow) define the steps, their dependencies, and how to execute them. Running snakemake or make all regenerates all results from raw data.
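A minimal Snakemake pipeline, for illustration, might look like this; the rule names, scripts, and file paths are hypothetical:

```
# Sketch of a minimal Snakefile; rule, script, and file names are hypothetical.
rule all:
    input: "results/figure1.png"

rule preprocess:
    input: "data/raw/measurements.csv"
    output: "data/processed/clean.csv"
    shell: "python src/clean.py {input} {output}"

rule plot:
    input: "data/processed/clean.csv"
    output: "results/figure1.png"
    shell: "python src/plot.py {input} {output}"
```

Because each rule declares its inputs and outputs, Snakemake rebuilds only what is stale; running `snakemake` from a clean checkout regenerates everything from the raw data.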
Continuous integration (GitHub Actions, GitLab CI) can automatically run your analysis pipeline when code changes, catching reproducibility breakdowns before they accumulate.
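For instance, a GitHub Actions workflow along these lines would re-run the pipeline on every push. The file path and step details are a sketch, assuming a pip-installable project with a Snakemake pipeline:

```yaml
# Sketch: .github/workflows/reproduce.yml (names and versions are assumptions)
name: reproduce
on: [push]
jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: snakemake --cores 1
```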
Literate programming (Jupyter Notebooks, R Markdown, Quarto) interleaves code, documentation, and results in a single document. This is excellent for analyses that need narrative explanation, but beware: notebooks can be executed out of order, creating hidden state. Use tools like nbstripout to clean notebook outputs before committing, and always verify that the notebook runs cleanly from top to bottom.
Here is a concrete workflow for reproducible computational research:
```
my-project/
├── README.md        # Project description, how to reproduce
├── LICENSE          # Data and code licensing
├── Dockerfile       # Environment definition
├── Snakefile        # Pipeline definition
├── config.yaml      # Analysis parameters
├── data/
│   ├── raw/         # Immutable raw data (or DVC-tracked pointers)
│   └── processed/   # Generated, gitignored
├── src/             # Analysis source code
├── results/         # Generated outputs, gitignored
└── docs/            # Methodology documentation
```
To change the analysis:

- Edit code in `src/`
- Edit the `Snakefile` if the pipeline structure changes
- Edit `config.yaml` if parameters change
- Rerun with `snakemake --use-singularity` (or Docker)

Random seeds. Any analysis involving randomness (simulations, bootstrapping, train/test splits) must record and set random seeds. Document them in the configuration file.
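As a small sketch, the snippet below seeds the generator from a config value and verifies that re-seeding replays the exact same draws; the config dict stands in for an entry in a file like `config.yaml`:

```python
# Sketch: set and record the random seed so a stochastic analysis can be
# replayed exactly. The config dict stands in for a config-file entry.
import random

config = {"seed": 42, "n_bootstrap": 5}

random.seed(config["seed"])                 # seed before any random draws
sample = [random.random() for _ in range(config["n_bootstrap"])]

random.seed(config["seed"])                 # same seed => identical sequence
replay = [random.random() for _ in range(config["n_bootstrap"])]

assert sample == replay                     # the run is exactly repeatable
print(f"seed={config['seed']} recorded alongside results")
```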
Floating-point non-determinism. Parallel computation can produce slightly different results across runs due to floating-point arithmetic ordering. Document expected precision and use tolerance-based comparisons in tests.
Hidden dependencies. System libraries, environment variables, and filesystem state can affect results without appearing in your dependency specifications. Containers catch most of these.
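Beyond containers, it helps to record a provenance snapshot next to each result. A minimal sketch, where the particular environment variables captured are illustrative choices, not a standard list:

```python
# Sketch: record interpreter, OS, and selected environment variables next to
# the results; the keys captured here are illustrative, not exhaustive.
import json
import os
import platform
import sys

provenance = {
    "python": sys.version.split()[0],
    "platform": platform.platform(),
    "env": {k: os.environ.get(k, "") for k in ("LANG", "OMP_NUM_THREADS")},
}
print(json.dumps(provenance, indent=2))
```

Writing this dictionary to a file alongside each output makes "what was this run on?" answerable months later.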
Manual steps. "Then I manually adjusted the color scale in the figure" breaks reproducibility. Script everything, including figure generation.
Key takeaway: Reproducibility is not about perfection. It is about giving someone else (including future you) a reasonable chance of getting the same results. Version control your code and data. Specify your environment. Automate your workflow. Document what you did and why. These practices take effort upfront but save far more time in the long run.