
Most research methods are described after the fact. A paper appears, the method works, and the years of trial and error that produced it are folded away into a tidy section headed "Methods". The AquaINFRA use case for the Daugava and the Gulf of Riga is being built the other way round: in public, on GitHub, while it is still rough. You can read the code, see the dead ends, and clone the whole thing today if you want to.
It starts with one of the oldest instruments in oceanography. A Secchi disk is a plain white plate lowered into the water on a line until the observer can no longer see it. The depth at which it disappears is the measure of the water's transparency. It is simple, it is cheap, and in the Baltic it has been done in much the same way for decades. That long record is the whole point.
We put fourteen questions to the researchers at the Latvian Institute of Aquatic Ecology who are building the workflow. Astra Labuce answered them.
The question sounds narrow and is not. "The research question answers if there are significant changes in the analysed variable, that is, water transparency, and in which regions of the Gulf of Riga they can be detected," Labuce says. "In plain terms, decision-makers using this tool can see that in the eastern and northern regions of the Gulf of Riga water transparency has decreased significantly."
Why that matters takes a sentence more. "Water transparency is an overarching water environment parameter. When transparency decreases, less light penetrates the water column, which can affect primary production, underwater vegetation, habitat suitability, and overall ecosystem functioning. A significant decrease tells decision-makers where changes are occurring, how extensive they are, and where further investigation or management actions may be needed."
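For readers who want to see what "significant changes" can mean in practice, here is a minimal sketch of one common trend test, the Mann-Kendall test, applied to yearly Secchi depths per region. The article does not say which statistical test the workflow uses, so treat the choice of test as an assumption. The function names, the 0.05 threshold and the numbers are invented for illustration, and the real analysis is written in R.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test with tie correction.

    Returns (S, z, p). Negative S means values tend to fall over time.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    # S counts later-minus-earlier pairs: +1 for a rise, -1 for a fall.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the no-trend hypothesis, corrected for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return s, z, p


def regions_with_decline(series_by_region, alpha=0.05):
    """Names of regions whose series shows a significant downward trend."""
    flagged = []
    for region, values in series_by_region.items():
        s, _, p = mann_kendall(values)
        if s < 0 and p < alpha:
            flagged.append(region)
    return flagged


# Invented yearly mean Secchi depths in metres, NOT real monitoring data.
example_secchi_m = {
    "east": [4.0, 3.9, 3.8, 3.7, 3.6, 3.5, 3.4, 3.3, 3.2, 3.1],
    "west": [3.5, 3.6, 3.4, 3.5, 3.6, 3.4, 3.5, 3.6, 3.4, 3.5],
}
flagged_regions = regions_with_decline(example_secchi_m)
```

A rank-based test like this asks only whether later values tend to be lower than earlier ones, which suits a record built up over decades by many different observers.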
So why a white disk and a piece of string, when modern sensors exist? "Secchi depth is one of the few indicators that allows us to look back several decades, and that is the main reason we chose it," she says. Other methods can measure transparency, but their records are short. The trade-off is that the disk depends on a person. "It heavily relies on the eyesight of the person who performs the measurement, so changes in observers can affect the results. It is also important not to measure on the sunny side of the vessel, though this is not always possible." All of it goes into the monitoring protocols and has to be accounted for when the data is interpreted.

And it has a blind spot. A Secchi reading tells you that the water is changing, not why. "A lower value may be caused by phytoplankton, suspended particles, coloured dissolved organic matter, or a combination. When several factors act together, it is impossible to quantify the exact contribution of each." That is where the science is now heading: "We are looking for the source of the change by linking land cover changes in the catchment area, expressed through river hydrochemistry, to the decrease in water transparency observed in the Gulf."
A long-term decline in transparency has two main suspects: eutrophication, where more nutrients feed more algae, and brownification, where more coloured dissolved organic matter washes in from the land. Telling them apart is a genuine scientific question. For a manager, Labuce argues, it is sometimes the wrong one to lead with.
"Managers do not necessarily care whether the water is getting darker because of increased phytoplankton or increased dissolved organic matter. They care about what is driving the change and whether anything can be done about it." The mechanism matters for the explanation. The source matters for the response.
Is darkening water good news or bad? Neither, in her framing, and that is the honest answer. "Water darkening is news of change, not necessarily good or bad. It signals that the ecosystem is changing." The likely consequences are concrete enough: perennial macroalgae, which need light to photosynthesise and are already squeezed between the depth where light runs out and the shallows where waves batter them, may decline or disappear. Pelagic food webs can shift too, depending on the cause. If dissolved organic matter is driving the change, it feeds bacteria, and "bacterial activity and production may increase at the expense of phytoplankton biomass."
The trend, she says, is strong enough to warrant attention. Attention is not action. "To move from attention to action, we first need to understand what is driving the change, what the ecological consequences might be, and whether there are realistic management options. We need these answers while there is still time to respond. This is exactly why research like this is needed."
The distinctive thing about this use case is that it was opened up early, marked pre-alpha, long before it was polished. "Scientists are used to sharing polished results, but polishing takes time," Labuce says. "Building the workflow step by step and opening up the analysis, even just for fellow researchers, can bring insights from different perspectives. It can help identify mistakes, suggest improvements, or show that the workflow could be useful for other regions and situations as well."
The workflow is kept in both R and Python, which sounds like duplicated effort and is not. "The analysis itself is conducted entirely in R. Python was mainly used to fit the analysis into the Galaxy system and make it work within the workflow environment." R does the science. Python does the plumbing that turns each step into something the web can call.
That plumbing is where the time went, and the answer here is the most useful thing a working scientist can tell another. "Turning a spreadsheet into a script is not really a problem. Turning a script into a reproducible and generic script was the real challenge." The hours disappeared into "bug fixing, variable names, input and output definitions, and all the small script-related details that are often invisible to the end user. Research scripts are usually written for a specific dataset and a specific purpose. Making them robust enough to work in a reusable workflow requires much more attention to detail." The surprise was the ratio: how much longer the small details took than the actual analysis.
How long would this once have taken? Labuce can answer from her own experience, because she inherited the original script from a colleague. "It was very much a process of trial and error until I figured out how everything worked. I also had to rewrite parts because there were sections I could not get to run," whether because of package versions, local configuration, or some other detail that is hard to pin down. "It took me more than a week to fully understand the script, and that was before AI tools became widely available." Only then did the work of generalising it for Galaxy begin, a larger effort still, spread across much of one summer and several people.
If you cloned the repository tomorrow, what is the one thing not yet in the README? "That this specific workflow is only the first step of a scientific analysis. It gives answers, but it also creates new questions. No matter how many answers you find, new questions will emerge, and they will require new approaches, new data, and new ideas."
The six steps are deliberately small and modular, so they can be rearranged or reused. Asked which step others will reach for most, Labuce declines to guess. "Scientists work in mysterious ways. From my perspective, the whole workflow makes the most sense used together. The analysis is generic and can be applied to different parameters, not only Secchi depth, and to different study areas." If pressed, she would pick subsetting a dataset to a polygon of interest, "simply because I use it regularly myself. But I would not be surprised if other researchers found completely different uses. Different minds see different possibilities, and that is one of the reasons for making workflows openly available."
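Subsetting a dataset to a polygon of interest is easy to picture in code. The sketch below uses a plain ray-casting test on longitude and latitude. It is an illustration only, not the repository's implementation: the function names, the record keys and the rectangular polygon are all hypothetical, and a real workflow would more likely rely on a dedicated spatial library and a proper boundary file.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex is joined
    back to the first. Coordinates are treated as planar, which is a
    simplification that is only reasonable for small areas.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the east of the point.
            if lon < x_cross:
                inside = not inside
    return inside


def subset_to_polygon(records, polygon, lon_key="longitude", lat_key="latitude"):
    """Keep only the records whose coordinates fall inside the polygon."""
    return [r for r in records if point_in_polygon(r[lon_key], r[lat_key], polygon)]


# Hypothetical bounding box and stations, NOT a real basin boundary.
example_polygon = [(22.0, 57.0), (24.5, 57.0), (24.5, 59.0), (22.0, 59.0)]
example_records = [
    {"station": "A", "longitude": 23.5, "latitude": 57.8},
    {"station": "B", "longitude": 20.0, "latitude": 57.8},
]
inside_records = subset_to_polygon(example_records, example_polygon)
```

Keeping the column names as parameters rather than hard-coding them is the kind of small generalisation that lets one step serve different datasets and study areas.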
The proof is that it has already moved. The same workflow has been run on other Baltic sub-basins, including the Bothnian Bay and the Gulf of Finland. "From a technical perspective, only minor adaptations were needed," she says, mostly small things like input files that came as comma or semicolon separated text. "From a scientific perspective, however, each area still needs to be understood in its own context. A workflow can be transferred much more easily than ecological interpretation."
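The comma-versus-semicolon problem is a good example of a "minor adaptation". One way to absorb it, sketched here with Python's standard csv module, is to guess the separator from the header line before parsing. The function names and sample rows are hypothetical, and the article does not describe how the project itself handled the difference.

```python
import csv
import io


def detect_delimiter(header_line, candidates=(",", ";")):
    """Pick the candidate separator that occurs most often in the header."""
    counts = {d: header_line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("no known delimiter found in header line")
    return best


def read_table(text):
    """Parse delimited text into a list of dicts, whatever the separator."""
    if not text.strip():
        raise ValueError("input is empty")
    delimiter = detect_delimiter(text.splitlines()[0])
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Invented sample rows, NOT real monitoring data.
comma_text = "station,year,secchi_m\nA,2001,3.5\n"
semicolon_text = "station;year;secchi_m\nA;2001;3,5\n"
comma_rows = read_table(comma_text)
semicolon_rows = read_table(semicolon_text)
```

The sketch looks only at the header because semicolon-separated files often use a decimal comma in the data rows, which would confuse a count taken over the whole file. Converting values such as "3,5" to numbers would be a separate step.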
It also reaches towards the formal machinery of regional reporting. Asked whether work like this could feed HELCOM monitoring and assessment, Labuce is direct: "Yes. In AquaINFRA there is a workflow developed on the basis used in HELCOM assessment procedures. This is one way to make the black box of environmental assessments more transparent."
AquaINFRA finishes at the end of 2026. What would success look like a year on? "Opening the repository a year later and seeing that the workflow has a life of its own. That could mean somebody reused it, modified it, improved it, or applied it to a completely different question than the one we originally had in mind."
We ended on the unglamorous work that sits under every environmental headline, and Labuce's answer is worth quoting at length.
"I wish more people understood how much work stands behind a single environmental indicator. People see the final number, graph, or headline, but not the years of monitoring, quality control, data management, analysis, and expert discussions that make that number possible. They do not see sampling in rough waters when most of the crew is seasick. They do not see spoiled samples, failed analysis, or equipment malfunctions. And perhaps most importantly, they do not see how much work goes into dealing with these issues before the data can be trusted."
"Environmental data rarely arrives ready to use. Before any assessment can be made, data needs to be collected, validated, harmonised, documented, analysed, and interpreted. Much of this work is invisible, but without it the final result would not be trustworthy. In a way, environmental statistics are like an iceberg. The number everyone sees is only a small part of the work that sits underneath it."
The Gulf of Riga is one test bed. The repository, dead ends and all, is the point: a working method, made public, that someone else can pick up and carry somewhere new.
With thanks to Astra Labuce of the Latvian Institute of Aquatic Ecology. The Daugava and Gulf of Riga workflow is part of AquaINFRA Work Package 5 and is published on the European Galaxy server. The repository is public and in active development: https://github.com/AstraLabuce/aquainfra-usecase-Daugava