AquaINFRA News

What 'FAIR Data' Actually Means in Practice (And Why It's Harder Than It Sounds)

March 31st, 2026

Almost everyone in research now agrees that data should be FAIR — Findable, Accessible, Interoperable, and Reusable. The FAIR principles, first articulated in 2016 by Wilkinson and colleagues, have become a fixture of funding proposals, institutional policies, and strategic plans across Europe and beyond. The European Commission requires FAIR data management in Horizon Europe projects. National funding bodies have followed suit. University libraries have written guides. Conference panels have discussed the concept at length.

And yet, in practice, most research data is still not FAIR. A 2020 assessment of European research data found that the majority of datasets failed to meet even basic FAIR criteria. In aquatic science — a field that depends on data from hundreds of monitoring networks, research cruises, sensor arrays, and citizen science programmes — the gap between aspiration and reality is particularly wide.

This is not because researchers are unwilling. It is because making data genuinely FAIR is a great deal harder than endorsing the idea.

What each letter actually requires

The four FAIR principles sound intuitive. In practice, each one involves a set of specific, sometimes demanding technical requirements.

Findable

For data to be findable, it must be described with rich metadata — structured information about what the data contains, who collected it, when and where it was gathered, and how it was processed. This metadata must itself be stored in a searchable resource, and both the data and its metadata must be assigned a persistent identifier, such as a Digital Object Identifier (DOI).

This sounds simple enough. But consider what it requires of a research group that has just completed a two-year monitoring campaign on a Mediterranean coastal lagoon. They have thousands of water quality measurements, each with associated spatial coordinates, timestamps, depth values, and instrument calibration records. Creating metadata that accurately describes this dataset — in a standardised format that a search engine or data portal can parse — is a substantial task. It requires time, expertise, and familiarity with metadata standards such as ISO 19115 for geospatial data or Darwin Core for biological observations.
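To make "rich metadata" concrete, the sketch below shows the kind of structured record such a campaign might produce. It is illustrative only: the field names loosely echo Darwin Core and ISO 19115 concepts rather than implementing either standard, and the DOI and all values are invented.

```python
import json

# A minimal, illustrative metadata record for a hypothetical two-year
# lagoon monitoring campaign. Field names loosely echo Darwin Core and
# ISO 19115 concepts; the DOI and all values are invented placeholders.
metadata = {
    "identifier": "doi:10.xxxx/example-lagoon",  # persistent identifier (placeholder)
    "title": "Water quality monitoring, Mediterranean coastal lagoon",
    "creators": ["Example Research Group"],
    "temporal_extent": {"start": "2024-03-01", "end": "2026-02-28"},
    "spatial_extent": {  # bounding box in decimal degrees
        "west": 3.85, "east": 4.10, "south": 43.45, "north": 43.60,
    },
    "variables": [
        {"name": "chlorophyll_a", "units": "mg m-3", "method": "fluorometry"},
        {"name": "dissolved_oxygen", "units": "mg L-1", "method": "optode"},
    ],
    "licence": "CC BY 4.0",
}

# Serialising to JSON makes the record machine-readable, so a data
# portal or search engine can index it without human interpretation.
record = json.dumps(metadata, indent=2)
print(record)
```

A record like this is what separates a dataset that is merely deposited from one that is findable: every field is something a machine can match against a search query.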

Many research groups lack the resources or training to do this well. The result is that datasets are deposited in repositories with minimal metadata — a title, an author name, perhaps a brief description — making them technically discoverable but practically invisible to anyone who did not already know they existed.

Accessible

Accessibility means that data can be retrieved through a standardised, open protocol — typically via the internet — and that the conditions under which access is granted are clearly stated. Importantly, FAIR does not require that all data be open. Data may be restricted for legitimate reasons, including privacy, commercial sensitivity, or national security. But the metadata should always be accessible, even if the data itself is not, and the access procedures should be transparent and machine-readable.

In aquatic science, accessibility is complicated by the sheer diversity of data sources. Environmental monitoring data may be held by national agencies, regional authorities, research institutes, or international organisations, each with different access policies, different web interfaces, and different authentication requirements. A researcher trying to assemble a pan-European dataset on river nutrient concentrations may need to navigate dozens of separate portals, each with its own registration process and terms of use.

The technical infrastructure for standardised data access exists. Protocols such as OPeNDAP for scientific datasets and OGC web services for geospatial data provide well-established mechanisms. But adoption is uneven, and many data holders still distribute their data as downloadable files on project websites, with no programmatic access and no guarantee of long-term availability.
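To illustrate what programmatic access looks like in practice, the sketch below builds a DAP2-style subset request of the kind OPeNDAP servers answer. The server URL and variable names are hypothetical; only the URL grammar (dataset URL, a response suffix such as `.ascii`, then `?variable[index ranges]`) follows the DAP2 convention.

```python
# Hypothetical OPeNDAP endpoint -- not a real server.
BASE = "https://data.example.org/opendap/lagoon.nc"

def subset_url(variable: str, time_range: tuple) -> str:
    """Build a DAP2 constraint-expression request for one variable over
    an index range of the time dimension, asking for an ASCII response.
    The [start:stride:stop] syntax follows the DAP2 convention."""
    start, stop = time_range
    return f"{BASE}.ascii?{variable}[{start}:1:{stop}]"

url = subset_url("chlorophyll_a", (0, 99))
print(url)
```

The point of such protocols is uniformity: the same request shape works against any compliant server, from any HTTP client, which is what "standardised, open protocol" means in the Accessible principle — as opposed to a bespoke download page per provider.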

Interoperable

Interoperability is arguably the most challenging of the four principles. It requires that data use shared vocabularies, ontologies, and formats, so that datasets from different sources can be combined without manual translation.

This is where aquatic science faces some of its greatest difficulties. Freshwater and marine monitoring programmes, as discussed elsewhere on this blog, often use different variable names, different units, different classification systems, and different quality flags for essentially the same measurements. A chlorophyll-a concentration measured in a Finnish lake and a chlorophyll-a concentration measured in the Adriatic Sea may be recorded in entirely different formats, using different analytical methods, and tagged with different quality descriptors.

Achieving interoperability requires agreement on controlled vocabularies — standardised lists of terms and definitions that everyone uses consistently. In ocean science, the NERC Vocabulary Server maintained by the British Oceanographic Data Centre provides an extensive set of standardised terms. Freshwater science has fewer widely adopted equivalents. Bridging the two requires mapping between existing vocabularies, filling gaps, and persuading data producers to adopt common standards — a process that is as much social and institutional as it is technical.
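The mechanics of such a mapping can be sketched in a few lines. The canonical names below are invented for the example — in a real pipeline they would be URIs from a controlled vocabulary such as the NERC Vocabulary Server — but the shape of the problem is accurate: translate each provider's local variable name and unit into one shared term and one canonical unit.

```python
# Illustrative mapping from provider-specific variable names to a shared
# canonical term. The provider and term names are invented for this
# example; real pipelines would map to controlled-vocabulary URIs.
LOCAL_TO_CANONICAL = {
    ("finnish_lakes", "klorofylli_a"): "chlorophyll_a_concentration",
    ("adriatic_buoys", "chl_a"): "chlorophyll_a_concentration",
    ("adriatic_buoys", "O2_dis"): "dissolved_oxygen_concentration",
}

# Conversion factors into the chosen canonical unit (mg m-3).
# Note 1 ug/L == 1 mg/m^3, so that factor is exactly 1.
UNIT_FACTORS = {
    ("ug/l", "mg m-3"): 1.0,
    ("mg/l", "mg m-3"): 1000.0,
}

def harmonise(source, name, value, unit, canonical_unit="mg m-3"):
    """Translate one measurement into the shared term and unit."""
    canonical_name = LOCAL_TO_CANONICAL[(source, name)]
    factor = UNIT_FACTORS[(unit.lower(), canonical_unit)]
    return canonical_name, value * factor, canonical_unit

print(harmonise("finnish_lakes", "klorofylli_a", 3.2, "ug/l"))
# -> ('chlorophyll_a_concentration', 3.2, 'mg m-3')
```

The code is trivial; populating and maintaining the mapping tables is not. That table is precisely the social and institutional agreement the paragraph above describes.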

Reusable

Reusability demands that data be well-described, clearly licensed, and accompanied by enough provenance information for a new user to understand how the data was collected and judge whether it is suitable for their purpose.

Clear licensing is a particular sticking point. Many publicly funded datasets in Europe lack explicit licence statements, leaving potential users uncertain about whether they may legally reuse the data, and under what conditions. The absence of a licence is not the same as open access — legally, it often means the opposite. Creative Commons licences, particularly CC BY 4.0 and CC0, provide straightforward solutions, but their adoption in environmental monitoring is far from universal.

Provenance — the record of how data was generated, processed, and transformed — is equally important and equally neglected. A dissolved oxygen measurement is of limited use without knowing the instrument type, calibration date, sampling depth, and quality control procedures applied. Yet this information is frequently lost when data moves from the original collector to a repository, or from one format to another.
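One way to keep that information from being lost is to make it part of the data structure itself rather than a footnote in a separate document. The sketch below (invented field values, hypothetical instrument) carries the provenance the paragraph lists alongside the measurement it describes.

```python
from dataclasses import dataclass, asdict

# A sketch of carrying provenance alongside a measurement instead of
# losing it in transit. The fields mirror the examples in the text
# (instrument, calibration, depth, quality control); values are invented.
@dataclass(frozen=True)
class OxygenMeasurement:
    value_mg_per_l: float
    timestamp: str
    depth_m: float
    instrument: str          # instrument type and model
    calibration_date: str    # last calibration before this reading
    qc_procedure: str        # quality-control routine applied

m = OxygenMeasurement(
    value_mg_per_l=8.4,
    timestamp="2025-06-14T09:30:00Z",
    depth_m=2.5,
    instrument="optode (example model)",
    calibration_date="2025-06-01",
    qc_procedure="range and spike checks",
)

# Exporting the whole record keeps provenance attached when the data
# moves between formats or repositories.
print(asdict(m))
```

When the record travels as a unit, a conversion between formats has to actively discard the provenance fields to lose them — the default failure mode is reversed.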

Why it is so hard in aquatic science

Several features of aquatic science make FAIR implementation especially difficult.

The data landscape is extraordinarily fragmented. Europe's aquatic environment is monitored by hundreds of organisations — national environment agencies, water authorities, marine institutes, universities, and non-governmental organisations — each with its own data management traditions. There is no single authority that can mandate standards across this landscape.

Much of the most valuable data is collected under regulatory obligations, not research programmes. Data gathered to comply with the Water Framework Directive or the Marine Strategy Framework Directive is often formatted for regulatory reporting rather than scientific reuse. The reporting templates used by the European Environment Agency serve their regulatory purpose well but were not designed with FAIR principles in mind.

Long-term monitoring data presents particular challenges. Some European water quality time series stretch back decades, with records kept in formats ranging from handwritten field notebooks to proprietary database systems. Making these historical records FAIR requires significant retrospective effort — digitisation, standardisation, and metadata creation — that few organisations have the capacity to undertake without dedicated funding.

Finally, there is the human dimension. FAIR implementation requires skills that many researchers were not trained in: data management, metadata standards, controlled vocabularies, persistent identifier systems. It requires time that competes with other demands — writing papers, supervising students, conducting fieldwork. And it requires institutional support — data management infrastructure, training programmes, and recognition that data curation is a legitimate and valuable professional activity.

Progress and pragmatism

None of this is grounds for pessimism. Significant progress has been made. The European Open Science Cloud (EOSC) is building shared infrastructure for FAIR data across disciplines. Domain-specific initiatives, including AquaINFRA's work on virtual research environments for aquatic science, are developing practical tools and standards tailored to the needs of the water research community. National data centres in several European countries have made substantial advances in FAIR compliance.

But it is important to be honest about the scale of the challenge. FAIR data is not achieved by adding a DOI to a spreadsheet and uploading it to a repository. It requires sustained investment in infrastructure, standards, training, and cultural change. The principles are sound. The hard part is the practice.