AquaINFRA News

Open Science Isn't Just About Access... It's About Reproducibility

April 27th, 2026

Open science has become one of the defining policy ambitions of modern research. The European Commission now requires open access to all peer-reviewed publications arising from Horizon Europe funding. The infrastructure for making papers freely available has never been stronger.

But open access to publications is only the most visible layer of a much deeper problem. The real promise of open science is not that anyone can read your paper; it is that anyone can reproduce your results. And in aquatic science, that promise remains largely unfulfilled.

The reproducibility problem is not abstract

The so-called “replication crisis” first gained widespread attention in psychology and biomedical research, where landmark studies in the 2010s found that a significant proportion of published findings could not be independently replicated. But the problem is not confined to those fields. In ecology and environmental science, reproducibility failures are common. They are simply less visible, because fewer researchers attempt exact replications of observational studies.

In aquatic science specifically, the challenges are acute. Consider a study that estimates nutrient loading in a coastal catchment. The result depends on which hydrological model was used, how land-use data were classified, what statistical methods were applied to fill gaps in monitoring records, and how boundary conditions were defined. Each of these steps involves choices that are rarely documented in sufficient detail for another researcher to repeat the analysis exactly.

A 2020 survey of ecology papers found that fewer than 30 per cent provided access to both the data and the code needed to reproduce the published results. In marine and freshwater science, the figure is likely lower, given the field’s reliance on large observational datasets held by national monitoring agencies with varying data-sharing policies.

Why aquatic science faces particular difficulties

Several features of aquatic research make reproducibility especially difficult.

Data heterogeneity. Water quality parameters are measured using different protocols across Europe. Chlorophyll-a concentrations in a Finnish lake and a Portuguese reservoir may both be reported in micrograms per litre, but the extraction methods, spectrophotometric corrections, and quality assurance procedures behind those numbers can differ substantially. Without standardised metadata describing the analytical method, combining or comparing datasets introduces hidden uncertainty.
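One way to make that hidden uncertainty explicit is to attach the analytical method to the measurement itself as machine-readable metadata. A minimal sketch in Python, with illustrative field names rather than any formal standard:

    # Illustrative only: a measurement record that carries its analytical
    # method with it. Field names are hypothetical, not a formal schema.
    measurement = {
        "parameter": "chlorophyll_a",
        "value": 12.4,
        "unit": "ug/L",
        "method": {
            "extraction": "hot ethanol",      # vs. acetone extraction
            "detection": "spectrophotometry",
            "acid_correction": True,          # phaeopigment correction applied
            "reference": "ISO 10260:1992",    # example method reference
        },
    }

With records like this, two chlorophyll-a datasets can be compared with the methodological differences in view rather than hidden.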

Ad hoc data processing. Much of the data wrangling in aquatic science happens in spreadsheets or custom scripts that are never shared. A researcher might remove outliers based on expert judgement, interpolate missing values using a method chosen informally, or apply unit conversions that are documented only in a lab notebook. These steps are part of the scientific method, but they are rarely part of the published record.
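The remedy is to move those choices out of the notebook and into shared, documented code. A sketch of what the same steps might look like in Python with pandas; the column names and thresholds are hypothetical:

    import pandas as pd

    def clean_series(df: pd.DataFrame) -> pd.DataFrame:
        """Make every cleaning choice explicit; assumes a datetime 'date' column."""
        df = df.copy()

        # Unit conversion, documented in code rather than a lab notebook:
        # the source agency reports mg/L, the analysis uses ug/L.
        df["chl_a_ugL"] = df["chl_a_mgL"] * 1000.0

        # Outlier rule, stated explicitly: values more than three standard
        # deviations from the mean become missing, then gap-filled below.
        mean, sd = df["chl_a_ugL"].mean(), df["chl_a_ugL"].std()
        df.loc[(df["chl_a_ugL"] - mean).abs() > 3 * sd, "chl_a_ugL"] = float("nan")

        # Gap filling, stated explicitly: time-based linear interpolation,
        # never filling more than seven consecutive missing values.
        df = df.set_index("date").sort_index()
        df["chl_a_ugL"] = df["chl_a_ugL"].interpolate(method="time", limit=7)
        return df.reset_index()

Each decision is now part of the published record, not a private judgement.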

Complex modelling chains. Hydrological and ecological models often involve sequences of linked tools. For example, a climate downscaling model feeds into a rainfall-runoff model, which then feeds into a water quality model. Each link in the chain has its own parameterisation, calibration data, and software dependencies. Reproducing the full chain requires not just access to the final model, but to every intermediate step and the exact software versions used.
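One lightweight way to make such a chain inspectable is to log every link as it runs. The sketch below is illustrative: the step functions stand in for real models, and the log format is an assumption, not an established convention:

    import json
    import platform

    def run_chain(forcing, steps):
        """Run linked model steps in order, logging each link as it executes.

        `steps` is a list of (name, function, params) tuples standing in
        for a real chain such as downscaling -> rainfall-runoff -> water
        quality."""
        log = {"python": platform.python_version(), "links": []}
        data = forcing
        for name, model, params in steps:
            data = model(data, **params)
            log["links"].append({"model": name, "params": params})
        with open("chain_provenance.json", "w") as fh:
            json.dump(log, fh, indent=2)  # one record per link in the chain
        return data

This does not capture software dependencies by itself, but it records which link produced which intermediate result, and with what parameters.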

Institutional data silos. Much aquatic monitoring data sits within national agencies that may share aggregated statistics but not raw observations. Even when data are technically available, access may require formal agreements, registration on national portals, or navigation of language barriers. This makes it difficult for an independent researcher to obtain the same input data used in a published study.

What reproducible workflows actually look like

Reproducibility is not an all-or-nothing proposition. It exists on a spectrum, and even partial improvements can make a significant difference.

Version-controlled code. The most basic step is to share the analysis code alongside the publication, using a platform such as GitHub or GitLab, with a permanent archive on Zenodo or a similar repository. This means not just the final script, but the full history of changes, so that another researcher can see exactly what was done and in what order.
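A small habit that builds on version control is stamping every output file with the commit that produced it. A minimal sketch, assuming the analysis runs inside a git repository:

    import subprocess

    def current_commit() -> str:
        """Return the git commit hash of the analysis code, so any output
        file can be traced to the exact version that produced it."""
        return subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()

    # e.g. results.to_csv(f"nutrient_loads_{current_commit()[:8]}.csv")

The "results" object in the final line is hypothetical; the point is the traceable link between file and commit.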

Containerised environments. Software dependencies are a common source of irreproducibility. A script that ran correctly in R 4.1 with a particular set of packages may fail or produce different results in R 4.3. Container technologies such as Docker and Singularity allow researchers to package the entire computational environment, ensuring that the same software stack is available to anyone who wants to reproduce the analysis.
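Even where a container is not practical, a script can at least record the exact interpreter and package versions in use. A minimal Python sketch of that lighter-weight step, which complements rather than replaces a container image:

    import sys
    from importlib import metadata

    def freeze_environment(path="environment_lock.txt"):
        """Write the interpreter version and every installed package
        version to a file shipped alongside the analysis."""
        with open(path, "w") as fh:
            fh.write(f"# python {sys.version.split()[0]}\n")
            for dist in sorted(metadata.distributions(),
                               key=lambda d: (d.metadata["Name"] or "").lower()):
                fh.write(f"{dist.metadata['Name']}=={dist.version}\n")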

Workflow management systems. Tools such as Galaxy allow researchers to define complex analytical pipelines as formal workflows, specifying the inputs, outputs, and dependencies of each step. These workflows can be shared, inspected, and re-executed, providing a complete record of the analytical process.
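Galaxy workflows are built through its own interface, but the underlying idea can be shown in plain Python: a pipeline declared as data, with each step naming its inputs and outputs, executed by a generic runner. The step and file names below are hypothetical:

    # Illustrative only: not Galaxy's format, just the idea of a declared,
    # re-executable pipeline with explicit inputs and outputs per step.
    WORKFLOW = [
        {"step": "harmonise_units",  "inputs": ["raw_obs.csv"],    "outputs": ["obs_ugL.csv"]},
        {"step": "fill_gaps",        "inputs": ["obs_ugL.csv"],    "outputs": ["obs_filled.csv"]},
        {"step": "estimate_loading", "inputs": ["obs_filled.csv", "flow.csv"],
                                     "outputs": ["loads.csv"]},
    ]

    def run(workflow, registry):
        # `registry` maps step names to functions; the execution order
        # and data dependencies are explicit, inspectable, repeatable.
        for step in workflow:
            registry[step["step"]](step["inputs"], step["outputs"])

Because the pipeline is data, it can be shared and inspected independently of the code that executes it.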

FAIR data principles in practice. The FAIR principles (Findable, Accessible, Interoperable, and Reusable) were formulated in 2016 precisely to address these issues. But FAIR is not just a set of aspirations for data repositories. Applied rigorously, it means that every dataset used in a study should have a persistent identifier, machine-readable metadata describing its provenance and structure, and a clear licence specifying reuse conditions.
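In code, a FAIR-aligned dataset record might carry fields like these. The sketch is illustrative, not a formal metadata schema, and the DOI is a placeholder:

    # Illustrative only: hypothetical field names, placeholder identifier.
    dataset_record = {
        "identifier": "https://doi.org/10.5281/zenodo.0000000",  # persistent ID
        "title": "Coastal catchment nutrient monitoring, 2015-2024",
        "licence": "CC-BY-4.0",  # reuse conditions, stated up front
        "provenance": {"source": "national monitoring agency export",
                       "retrieved": "2026-04-01"},
        "structure": {"columns": ["date", "station", "tn_mgL", "tp_mgL"]},
    }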

Provenance tracking. A reproducible workflow records not just what was done, but also where the input data came from, when they were accessed, and what transformations were applied. This kind of provenance metadata is essential for studies that draw on monitoring data that may be updated or corrected after publication.
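A few lines of code can capture that information at the moment of access. In the sketch below (the function and file layout are assumptions), a SHA-256 digest pins each input file, so a later upstream correction becomes detectable as a mismatch:

    import hashlib
    import json
    from datetime import datetime, timezone

    def record_provenance(path, source_url, transformations):
        """Record where an input file came from, when it was accessed,
        and a digest of its exact content at that moment."""
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        entry = {
            "file": path,
            "source": source_url,
            "accessed": datetime.now(timezone.utc).isoformat(),
            "sha256": digest,
            "transformations": transformations,  # e.g. "unit conversion, gap filling"
        }
        with open(path + ".provenance.json", "w") as fh:
            json.dump(entry, fh, indent=2)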

The cultural dimension

Technical solutions are necessary but not sufficient. Reproducibility also requires a shift in professional incentives. Currently, researchers are rewarded primarily for publications and citations, not for the quality of their data management or the transparency of their methods. Preparing a fully reproducible workflow takes time, and that is time that could otherwise be spent writing another paper.

Some progress is being made. Journals such as Environmental Modelling & Software now encourage or require submission of code and data. The European Commission’s Open Research Europe platform mandates open peer review and data availability. Research infrastructure projects, including those operating within the European Open Science Cloud framework, are building platforms where researchers can develop, share, and execute analytical workflows in standardised environments.

But these remain exceptions rather than the norm. In many areas of aquatic science, the standard practice is still to describe methods in prose, provide summary statistics rather than raw data, and leave the computational details to the supplementary material, if they are documented at all.

Why this matters beyond academia

Reproducibility is not merely an internal concern for the research community. Policy decisions about water management, eutrophication control, and biodiversity protection depend on scientific evidence. If the analyses underpinning those decisions cannot be independently verified, the evidence base is weaker than it appears.

The EU Water Framework Directive, for example, relies on member states conducting ecological status assessments using nationally calibrated methods. If the calibration procedure is not fully documented and reproducible, it becomes difficult to compare assessments across countries or to identify whether apparent differences in ecological status reflect genuine environmental variation or methodological artefacts.

Similarly, climate change impact assessments for freshwater resources often involve complex modelling chains. If these are not reproducible, it is impossible to evaluate how sensitive the conclusions are to particular modelling choices. This is a significant gap when the results inform billion-euro investment decisions in flood protection or water supply infrastructure.

Moving forward

The path towards reproducible aquatic science does not require a revolution. It requires consistent application of practices that already exist: sharing code and data, using version control, documenting analytical choices explicitly, and adopting workflow tools that make complex analyses transparent and re-executable.

Virtual research environments (platforms that combine data access, computational tools, and workflow management in a single infrastructure) offer a practical way to lower the barrier to reproducibility. When a researcher can develop an analysis in an environment that automatically tracks data provenance, software versions, and processing steps, reproducibility becomes a by-product of the research process rather than an additional burden.

Open science is about more than removing paywalls. It is about making the full chain of evidence, from raw observation to published conclusion, available for scrutiny. In aquatic science, where the stakes include the health of Europe’s rivers, lakes, and seas, that transparency is not optional. It is essential.