AquaINFRA News

Europe's Water Data Is Scattered Across Hundreds of Databases. Can We Fix That?

April 22nd, 2026

Try a simple exercise. Pick a European river that flows through two or more countries (the Danube, the Rhine, or the Elbe, say) and attempt to assemble a continuous dataset of nitrogen concentrations from source to sea, covering the last twenty years.

You will need data from multiple national monitoring agencies, each with its own database, its own access procedures, and its own way of recording measurements. You will encounter different chemical parameters (total nitrogen, nitrate-nitrogen, dissolved inorganic nitrogen), different units, different sampling frequencies, and different quality assurance protocols. When the river reaches the coast, you will need to switch from freshwater monitoring systems to marine ones, which operate under entirely separate institutional frameworks.
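To make the unit problem concrete, here is a minimal Python sketch of one small piece of it: bringing nitrate readings reported in different units onto a common basis of milligrams of nitrogen per litre. The unit labels, station names, and values are invented for illustration. Real agency exports will use their own conventions, and different parameters (total nitrogen versus nitrate-nitrogen, for instance) cannot be converted into one another this way.

```python
# Molar masses in g/mol (standard atomic weights, rounded).
N_MOLAR_MASS = 14.007
NO3_MOLAR_MASS = N_MOLAR_MASS + 3 * 15.999  # nitrate ion, NO3-

# Factors that convert a nitrate reading to mg N/L.
# The unit labels are hypothetical; each source would need its own mapping.
TO_MG_N_PER_L = {
    "mg N/L": 1.0,
    "mg NO3/L": N_MOLAR_MASS / NO3_MOLAR_MASS,  # mass of nitrate -> mass of nitrogen
    "umol/L": N_MOLAR_MASS / 1000.0,            # micromoles of N -> milligrams of N
}


def to_mg_n_per_l(value, unit):
    """Convert a nitrate reading to mg N/L, or raise if the unit is unknown."""
    try:
        return value * TO_MG_N_PER_L[unit]
    except KeyError:
        raise ValueError(f"unrecognised unit: {unit!r}") from None


# Invented readings from three imaginary stations along one river.
readings = [
    {"station": "A", "value": 50.0, "unit": "mg NO3/L"},
    {"station": "B", "value": 11.3, "unit": "mg N/L"},
    {"station": "C", "value": 800.0, "unit": "umol/L"},
]

harmonised = [
    {
        "station": r["station"],
        "nitrate_mg_n_per_l": round(to_mg_n_per_l(r["value"], r["unit"]), 2),
    }
    for r in readings
]
```

Raising an error on an unknown unit, rather than guessing, is deliberate: a silent mistake in a conversion factor is much harder to spot later than a failed import.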

This exercise will take you days, possibly weeks. Most researchers, understandably, do not bother. They work with whatever data they can access most easily, which usually means the data from their own country, their own institution, or a single repository they already know. The result is that pan-European water science is far less integrated than the continent’s interconnected water systems demand.

The architecture of fragmentation

Europe’s water data landscape is not chaotic by accident. It is the product of rational decisions made at different levels (national, European, institutional), each optimised for a specific purpose but never designed to work as a unified whole.

At the national level, EU member states are required to monitor water quality under two major directives. The Water Framework Directive (WFD), adopted in 2000, covers rivers, lakes, groundwater, and transitional and coastal waters. The Marine Strategy Framework Directive (MSFD), adopted in 2008, covers marine waters out to the limit of exclusive economic zones. Both directives require regular reporting to the European Environment Agency (EEA), but the monitoring itself is carried out by national agencies (environment ministries, water authorities, geological surveys) using nationally defined methods.

The data that member states report to the EEA feeds into the Water Information System for Europe (WISE), which provides aggregated freshwater data, and into the MSFD reporting mechanism. But WISE is primarily a policy tool, designed to track compliance with the WFD. It is not a research database. The data are aggregated, the spatial resolution is often coarse, and access to the underlying raw measurements typically requires going back to the national source.

On the marine side, EMODnet, the European Marine Observation and Data Network, aggregates data from national oceanographic centres across seven thematic areas: bathymetry, geology, physics, chemistry, biology, seabed habitats, and human activities. EMODnet has made significant progress in harmonising marine data and providing open access, but it operates independently of freshwater systems. A researcher studying nutrient transport from rivers to coastal seas must straddle both worlds.

Then there are the research infrastructures. ICES, the International Council for the Exploration of the Sea, manages datasets on fisheries, oceanography, and marine contaminants contributed by its 20 member countries. LifeWatch ERIC provides biodiversity data infrastructure. DANUBIS supports research in the Danube basin. The Copernicus Marine Service delivers satellite-derived and modelled ocean data. Each of these serves a specific community, and serves it well, but none was built to interoperate with the others.

Add to this the institutional repositories maintained by universities, research institutes, and individual projects, many containing valuable long-term datasets that are discoverable only if you happen to know they exist, and the picture becomes clear. Europe does not lack water data. It lacks the ability to find, combine, and use that data efficiently.

Why interoperability is so difficult

The fragmentation is not simply a matter of different websites hosting different files. The barriers run deeper.

Vocabularies and ontologies. Different communities describe the same things in different ways. A marine ecologist’s “chlorophyll-a concentration in surface water” and a freshwater limnologist’s “Chl-a (0–2m)” may refer to functionally identical measurements, but without a shared vocabulary, neither a machine nor a researcher outside the field can recognise them as such. Efforts to develop common vocabularies exist, such as the NERC Vocabulary Server maintained by the British Oceanographic Data Centre, but adoption is uneven.
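A rough Python sketch of what a vocabulary mapping does, using the two chlorophyll labels above: each community's local label is looked up in a table and resolved to a shared concept identifier, so that a program can tell the two describe the same thing. The identifiers here (prefixed `example:`) are made up for illustration and are not terms from the NERC Vocabulary Server or any real vocabulary. Whether two labels are truly equivalent is a judgment a domain expert has to make when building the table.

```python
# Hypothetical mapping table: (community, normalised local label) -> shared concept ID.
LOCAL_TO_SHARED = {
    ("marine", "chlorophyll-a concentration in surface water"): "example:chlorophyll_a_surface",
    ("freshwater", "chl-a (0–2m)"): "example:chlorophyll_a_surface",
    ("freshwater", "total nitrogen"): "example:total_nitrogen",
}


def shared_concept(community, label):
    """Return the shared concept ID for a local label, or None if unmapped."""
    # Normalise case and whitespace so trivial spelling differences still match.
    key = (community, " ".join(label.lower().split()))
    return LOCAL_TO_SHARED.get(key)


def same_measurement(a, b):
    """True only if both (community, label) pairs map to the same known concept."""
    concept_a = shared_concept(*a)
    concept_b = shared_concept(*b)
    return concept_a is not None and concept_a == concept_b


marine = ("marine", "Chlorophyll-a concentration in surface water")
limno = ("freshwater", "Chl-a (0–2m)")
match = same_measurement(marine, limno)
```

An unmapped label returns None and never counts as a match, which keeps the sketch conservative: it would rather miss a pairing than assert a false one.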

Metadata standards. Some databases use ISO 19115 for geospatial metadata. Others use Dublin Core. Many use bespoke schemas or, worse, no formal metadata standard at all. Without consistent metadata, automated discovery and harvesting of datasets is extremely difficult.
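As an illustration of why mixed schemas get in the way of harvesting, the Python sketch below maps two differently shaped metadata records onto one minimal common record and reports what is missing. The Dublin Core side uses real element names (title, date, rights, identifier). The nested "ISO-like" record is a heavily simplified stand-in, assumed for the example, and does not reproduce the actual ISO 19115 structure. The choice of four common fields is also an assumption.

```python
def from_dublin_core(record):
    """Map a flat Dublin Core style record to the common fields."""
    return {
        "title": record.get("title"),
        "date": record.get("date"),
        "licence": record.get("rights"),
        "identifier": record.get("identifier"),
    }


def from_iso_like(record):
    """Map a simplified, hypothetical ISO-style nested record to the common fields."""
    citation = record.get("citation", {})
    return {
        "title": citation.get("title"),
        "date": citation.get("date"),
        "licence": record.get("constraints"),
        "identifier": citation.get("identifier"),
    }


def missing_fields(common):
    """List the common fields that are absent or empty, in alphabetical order."""
    return sorted(name for name, value in common.items() if not value)


# Invented example records.
dc_record = {"title": "River nitrate series", "date": "2025", "identifier": "example-id-1"}
iso_record = {
    "citation": {"title": "Coastal nutrient survey", "date": "2024", "identifier": "example-id-2"},
    "constraints": "CC BY 4.0",
}

dc_gaps = missing_fields(from_dublin_core(dc_record))
iso_gaps = missing_fields(from_iso_like(iso_record))
```

Here the first record has no rights statement, so a harvester working from the common record would flag its licence as unknown instead of assuming the data are open.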

Licensing and access. Not all water data in Europe are open. Some national agencies restrict access to raw monitoring data, requiring formal requests or data-sharing agreements. Others provide data freely but without clear licensing, which creates uncertainty about whether the data can be reused, redistributed, or combined with other sources. The FAIR principles (Findable, Accessible, Interoperable, Reusable) provide a framework for addressing this, but many datasets predate the FAIR movement and have not been retroactively updated.

The freshwater-marine divide. This is perhaps the most fundamental barrier. Freshwater and marine science have developed as largely separate disciplines, with different institutions, different journals, different conferences, and different data cultures. The WFD and MSFD, despite both addressing water quality, were developed independently and use different classification systems. The zone where they meet (transitional and coastal waters) is precisely where integration matters most and where data gaps are widest.

What would it take to fix this?

Complete unification of Europe’s water data into a single database is neither realistic nor desirable. The diversity of monitoring programmes reflects genuine differences in what is being measured, why, and by whom. Forcing everything into a single schema would lose important nuance.

The more promising approach is federation: connecting existing databases through shared standards, common APIs, and discovery services that allow researchers to search across multiple sources simultaneously. This is the model that EOSC, the European Open Science Cloud, is pursuing across all scientific disciplines, and it is the approach taken by projects like AquaINFRA in the aquatic domain.

Federation requires agreement on minimum metadata standards, adoption of persistent identifiers for datasets, and development of mapping services that translate between vocabularies. It also requires institutional willingness to expose data through standardised interfaces, something that depends as much on policy and incentives as on technology.
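The idea can be sketched in a few lines of Python. Each catalogue stays where it is and is represented by a function that answers a query, and a thin discovery layer asks them all, merges the answers by persistent identifier, and notes which sources did not respond. Everything here is hypothetical: the catalogue names, the `pid` field, and the in-memory records stand in for real services, and this is not a description of how AquaINFRA or EOSC implement federation.

```python
def search_federation(sources, parameter):
    """Query every source for a parameter and merge results by persistent identifier.

    Returns (datasets, unavailable): merged records, each listing the sources
    that hold it, plus the names of sources that failed to answer.
    """
    merged = {}
    unavailable = []
    for name, query in sources.items():
        try:
            records = query(parameter)
        except Exception:
            # One unreachable catalogue should not break the whole search.
            unavailable.append(name)
            continue
        for record in records:
            entry = merged.setdefault(record["pid"], {**record, "sources": []})
            entry["sources"].append(name)
    return list(merged.values()), unavailable


# Hypothetical catalogues, modelled as plain functions over in-memory data.
def national_catalogue(parameter):
    data = [{"pid": "example-pid-001", "title": "River nitrate, upstream", "parameter": "nitrate"}]
    return [r for r in data if r["parameter"] == parameter]


def marine_catalogue(parameter):
    data = [
        {"pid": "example-pid-001", "title": "River nitrate, upstream", "parameter": "nitrate"},
        {"pid": "example-pid-002", "title": "Coastal nitrate profiles", "parameter": "nitrate"},
    ]
    return [r for r in data if r["parameter"] == parameter]


def offline_catalogue(parameter):
    raise ConnectionError("catalogue unreachable")


sources = {
    "national": national_catalogue,
    "marine": marine_catalogue,
    "offline": offline_catalogue,
}
datasets, unavailable = search_federation(sources, "nitrate")
```

The persistent identifier is what lets the discovery layer see that two catalogues list the same dataset, which is one reason identifiers sit alongside metadata and vocabularies in the list of requirements.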

Progress is being made. EMODnet’s ingestion pipeline has demonstrated that marine data from diverse national sources can be harmonised and made accessible. The INSPIRE Directive has pushed member states towards standardised geospatial data sharing. And the growing emphasis on open science in EU funding programmes is creating stronger incentives for data providers to adopt FAIR practices.

But the gap between aspiration and reality remains wide. Many of the researchers who most need integrated water data (those working on cross-border pollution, climate impacts on water resources, or land-sea interactions) are still spending a disproportionate share of their time on data wrangling rather than analysis.

The data exist. The challenge is making them work together.