Statistics is hard to get right

What factors influence the probability of a dive leading to DCS? Many of us would love to know the answer to that question. Wouldn’t it be great if we had a huge database of dives with depth profiles and lots of additional data from many different divers, together with information on whether those dives led to various levels of decompression sickness? Then we could run all kinds of statistics and see which factors of a dive make undesirable outcomes more or less likely.

Recently, a paper by Alessandro Marroni, Jacek Kot, Massimo Pieri, Riccardo Pelliccia and Costantino Balestra (four of whom are associated with the DAN Europe Research Division) in the journal “International Maritime Health” made the rounds on social media, and it aims to do exactly this. Specifically, DAN keeps a database of more than 100,000 dives, together with information about the almost 6,000 divers and the circumstances of each dive, on which they ran various statistics. Among these are 628 profiles of dives that led to decompression sickness. The authors looked at various factors and determined whether they differed between dives with and without DCS.

Some of the reported results were expected: Dives with DCS tend to have higher gas loadings in the divers’ tissues (more on that below), and this seems to be the factor with the strongest influence. For repetitive dives, longer surface intervals lead to a lower rate of DCS, and a higher workload during the dive leads to more DCS.

But some other results were quite surprising: According to the study, women have a three to four times higher risk of getting bent than men. Body mass index has an influence, but the lowest rate of DCS was amongst divers who are moderately to severely obese; moderately underweight divers appear to have a five times higher DCS risk than severely obese divers! Exercise before the dive doubles the DCS risk. Negative feelings before the dive reduce the DCS risk by 70%, and feeling tired before the dive also lowers the risk. In a sequence of repetitive dives, the later dives are less and less likely to lead to DCS.

https://xkcd.com/552/ CC-BY-NC

Of course, correlation should not be confused with causation. You should not acquire a beer belly and leave your deco gases on the shore to lower your decompression risk. In some cases it seems rather clear how an initially unexpected result can be explained: For example, using more deco gases was correlated with a higher chance of getting DCS. The authors rightly note that there might be confounding factors; in this case, dives with more deco gases tend to be more advanced tec dives that might therefore have an inherently higher risk of DCS. But still, many of the findings of this study run counter to what we believed before. In the words of Carl Sagan: Extraordinary claims require extraordinary evidence! So let’s look at the details more carefully.

How does DAN collect data?

The most important factor is unfortunately not really discussed in the paper: How do the dives actually get into DAN’s database? For the statistics to be meaningful, the dives should be representative of those done by the general diving population. Furthermore, the outcome (DCS or not) must not influence the likelihood of a dive being included in the database. Otherwise there would be a bias that is very hard or impossible to correct for.

And this aspect, it seems, is already where the study gets into difficult waters. Apparently, there were very different ways in which dives could end up in DAN’s database. Unfortunately, there is no clear statement anywhere about how exactly the data was collected, but the DAN research site gives some hints. There exists an earlier publication by a team with a large overlap with that of the 2026 study, Cialoni et al. 2017, which serves as a reference for our explanations, as it treats a smaller, older sample of dives from seemingly the same database. The profiles from the 2017 study can thus be assumed to also be part of the present-day data.

Since around 2014, DAN has invited divers to submit dives for statistical analysis, e.g. via a tool called the Diver Safety Guardian (DSG), and recently also via a successor, DANA Health (which is free for members, but on a subscription plan for non-members). Before this, they collected profile data at various events directly from the divers – no information about how exactly this happened is available, but participation was always voluntary.

The DSG offers divers an advantage for submitting their data: They can see an analysis of their dive, including some indicators of the estimated risk arising from it. And this, attractive and well-intended as it is, should already ring some alarm bells: Unless you make divers submit all their dives, or at least a truly random sample, you have to expect people to preferentially submit dives they found “remarkable” or “interesting” – the opposite of typical. Maybe divers are especially keen to submit dives that were particularly deep or long, or where they encountered some sort of near-miss. People will almost inevitably report fewer of the “boring” dives they do on any given weekend. This alone presents a nearly insurmountable problem for extracting meaningful statistical numbers. But it gets worse.

DAN does collect profile data plus additional medical data at “laboratory events”, and they regularly conduct dedicated research where specific profiles are tested. That in itself is of course great. But again, those dives will not be truly representative of “random dives, done by random divers on a random day”. Which of these data did or did not enter the database is not really clear. But in the 2017 paper, there were 970 dive profiles with subsequent bubble measurements. Obviously, those were not randomly reported, and they are part of the nearly 40,000 profiles analysed in that first batch.

We have to guess that at least the dives voluntarily submitted online via the DSG without further information were generally counted as “no DCS”. This guess is based on the direct observation that the questionnaire accompanying the dive upload, at least in today’s DSG, does not ask about symptoms of decompression sickness. So even if someone uploaded dives that ended with some symptoms, perhaps hoping to find out what was “wrong” with their profile, this would likely not automatically lead to their dive being counted as a DCS dive. We don’t know what fraction of the dive profiles comes from this collection, or whether other collection channels followed up to check if a dive ended with DCS, but this is what is actually visible.

The 628 dives that were marked as “DCS” thus seem to have entered the database via a different pathway. DAN being one of the most well-known medical assistance organisations in diving, one can imagine that at least part of those cases were reported directly as DCS cases by divers seeking medical assistance or advice of various kinds. It may well be that there were more and different ways in which DCS cases were collected, but this does not become clear from the paper. Either way, the demographics of the DCS dives are inevitably different from those of the “no DCS” dives. Some of the divers who incurred DCS may of course have reported more profiles than only the dive that went wrong, but this is still not “random divers sharing their dives” – those divers do it because they have, or suspect, symptoms and seek assistance.
The paper explains neither how exactly the DCS data was collected, nor how “DCS” is defined, nor what precisely the criteria are for a dive to be counted as a “DCS dive” – is it a diagnosis by a medical professional, self-reported symptoms, a confirmed insurance claim? The first database analysis from 2017 treats the DCS cases as a separate set of data, not as part of all profiles, and that still seems to us the more correct thing to do. The 2026 paper seemingly treats both datasets as comparable. The DCS incidence quoted in the abstract, however, is then just the fraction of studied dives that were marked as DCS, and will not be representative of the incidence of DCS throughout the diving population.

Why the data collection style matters

The big elephant in the room with this database is that it seemingly collects data from fundamentally different sources. The profiles thus cannot at all represent “dives as done by recreational divers”; they are a mixture of recreational dives, technical dives, dives that ended with DCS, dives where the outcome is not known, dives that were reported out of curiosity to try the tool, and large numbers of dives reported by a few divers specifically interested in contributing.

If – as we have to assume – the DCS dives were indeed collected in a different way from the more general profile data, the two subsets become really hard to compare. In fact, this means it is not possible to calculate a real incidence from such a comparison. The two datasets would instead have to be treated as fundamentally different.

Furthermore, self-reporting is never a good basis for statistics. We have already seen that the profiles that are shared may differ substantially from the profiles that are actually dived throughout the population. And it is not clear whether men and women are equally likely to report their dives. It is easy to imagine that women do more than ca. 13% of all dives worldwide (the fraction used in the study), but are on average less likely than men to participate in any DAN data collection. At the same time, women might be more likely to attribute light symptoms to DCS and consequently report a diving accident to DAN. At least part of this effect honestly seems almost unavoidable, and unless it can be rigorously accounted for, the datasets do not make it possible to infer any real dependence of the DCS risk on gender. We have to treat the result of a three to four times higher risk for females with strong caution at the very least; indeed, it is simply not possible to make a definite statement on whether a significant difference even exists on the basis of the present data alone.
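To see how differential reporting alone can manufacture a risk ratio, here is a minimal sketch. All numbers in it (true risk, reporting rates) are made-up assumptions for illustration, not values from the study; the point is only that if DCS dives are nearly always reported while uneventful dives are self-reported at gender-specific rates, the database will show an elevated apparent risk even when the true risk is identical.

```python
# Hedged sketch: how self-selection can fabricate a gender risk ratio.
# All rates below are invented for illustration.

def apparent_risk_ratio(true_risk, female_share, report_rate_f, report_rate_m):
    """Apparent DCS rate ratio (female/male) in a database where
    uneventful dives are self-reported at gender-specific rates,
    while DCS dives are always reported (e.g. when seeking help)."""
    dives = 1_000_000
    dives_f = dives * female_share
    dives_m = dives * (1 - female_share)
    # DCS dives: assume all of them end up in the database
    dcs_f = dives_f * true_risk
    dcs_m = dives_m * true_risk
    # uneventful dives: reported at gender-specific rates
    ok_f = (dives_f - dcs_f) * report_rate_f
    ok_m = (dives_m - dcs_m) * report_rate_m
    rate_f = dcs_f / (dcs_f + ok_f)
    rate_m = dcs_m / (dcs_m + ok_m)
    return rate_f / rate_m

# Identical true risk for both genders, but women submit uneventful
# dives at a third of the male rate:
rr = apparent_risk_ratio(true_risk=1e-4, female_share=0.13,
                         report_rate_f=0.01, report_rate_m=0.03)  # ~ 3
```

With these invented numbers the database would show roughly the threefold "female risk" reported in the paper, purely as an artefact of who reports what.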

Similar surprises may lurk in the dependence on BMI: We could imagine that a relatively large number of reported DCS cases came from more advanced technical dives (the dependence of the risk on tissue loadings seems to be a quite stable result). But the demographics of technical divers could well differ from the general diving population with respect to body fat. An obese diver might thus well be found in the general diving population (and count in the “no DCS” group) but be much rarer in an inherently risk-prone technical diving subgroup. If those obese divers did attempt the same more ambitious dives, their DCS rate might in fact be as high as (or even higher than?) that of the people actually doing those dives. This is akin to the observation that musical directors have an above-average life expectancy (due to zero child mortality amongst them).

Analysing the dives as independent events

The study seemingly compares dives from fundamentally different sources and proceeds with an analysis. This leaves a lot of questions about what exactly the sought-after effects were, how the method was calibrated and validated (e.g. was a split into training and test samples done, or attempted?), and why the numbers used seem inconsistent in some places. But one of the biggest problems is that all the collected profiles are treated as independent events. In fact, an important part of the analysis – everything related to the body of the diver – is not independent at all. A lot of divers reported only one dive, while others reported far more; up to 1,432 dives come from one single diver, with the same body and probably a comparable diving style, leading to similar profiles as well. A diver who reports a lot counts much more in the analysis than a diver who reports only a small fraction of their dives.
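A hedged back-of-the-envelope calculation shows why this matters. The classic Kish design effect, DEFF = 1 + (m − 1) × ICC, quantifies how much the effective sample size shrinks when observations come in correlated clusters; the intraclass correlation used below is an assumed value, not one estimated from DAN’s data.

```python
# Sketch: effective sample size under clustering (Kish design effect).
# The ICC of 0.3 is an assumption for illustration only.

def effective_sample_size(n_obs, obs_per_cluster, icc):
    """Effective number of independent observations for equal-sized
    clusters, using the design effect DEFF = 1 + (m - 1) * ICC."""
    deff = 1 + (obs_per_cluster - 1) * icc
    return n_obs / deff

# 100,000 dives, hypothetically ~17 dives per diver, with an assumed
# within-diver correlation of 0.3 in the outcome-relevant variables:
n_eff = effective_sample_size(100_000, 17, 0.3)  # far below 100,000

# The single diver who contributed 1,432 dives adds much less
# information than 1,432 independent divers would:
n_eff_one = effective_sample_size(1_432, 1_432, 0.3)
```

Under these assumptions the 100,000 dives behave statistically like fewer than 20,000 independent ones, and the 1,432 dives from one diver like only a handful – which is why treating every profile as an independent event overstates the precision of the results.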

And more problems …

There are further issues in this paper. For example, there is the criterion “purpose of dive” with the possibilities “recreational, instructional, guidance, student, technical and other”, which were assigned the arbitrary numerical values 1 to 6, and these ad hoc values were then used in a formula to fit the probability of DCS. This honestly makes no sense at all. If anything, those six possibilities should have been encoded as binary variables (0 meaning “no” and 1 meaning “yes”).
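A minimal sketch of the difference – the category names are from the paper, while the function names and the numeric assignment are ours, for illustration:

```python
# The six "purpose of dive" categories from the paper:
PURPOSES = ["recreational", "instructional", "guidance",
            "student", "technical", "other"]

def ordinal_code(purpose):
    """The problematic scheme: an arbitrary 1..6 code that implies
    'technical' (5) is five times 'recreational' (1), and that
    'guidance' lies between 'instructional' and 'student' on some
    meaningful scale."""
    return PURPOSES.index(purpose) + 1

def one_hot(purpose):
    """The appropriate scheme: six independent 0/1 indicators, so a
    regression can assign each purpose its own effect, independent of
    the order in which the categories happen to be listed."""
    return [1 if p == purpose else 0 for p in PURPOSES]

# ordinal_code("technical") yields 5, but nothing about a technical
# dive is "five units of purpose"; one_hot("technical") yields
# [0, 0, 0, 0, 1, 0], which carries no spurious ordering.
```

With the ordinal coding, simply relabeling the categories in a different order would change the fitted model – a clear sign that the encoding injects structure the data does not have.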

As a measure of tissue loading (which ended up being the strongest predictor of the DCS outcome, and this in itself is largely uncontested), the authors of the study use a numerical value, the “DAN Surface Supersaturation Gradient” (DSSG), which is not well defined in the paper. The authors do refer to the 2017 study, which however states that it used classical gradient factors (GFs).
The name “DSSG” could indeed be interpreted to mean the gradient factor at the end of the dive (i.e. when a depth of 0 m is reached), but on inspection this is not what it is. If we assume that in the 2017 study the GFs were calculated according to Baker’s prescription, the DSSGs cannot actually be GFs without making the data incompatible, or without assuming that the 320 DCS cases from the 2017 study are not included in the 2026 study (but why would they not be?). This is because the numbers of the DCS cases simply would not match. There were in total 320 cases in 2017, and now there are 628 cases – a plus of 308 cases, which seems not implausible for that time interval. The issue is that there were 46 (or 59 – the presentation of the data in table 4 of that paper is not entirely unambiguous at this point) cases of DCS with a maximum GF under 0.7 (70%) in the 2017 study, but there are only 29 cases with a DSSG of 0.7 or lower in the new paper. So cases would have to disappear if the DSSG were a gradient factor…
Without the full dataset, we cannot determine what exactly was calculated in either paper, but the interpretation of the DSSG as a GF seems to be in conflict with the reported numbers, under the assumption that the data from the 2017 study is part of the 2026 one.

Of course, the DSSG could be the fraction of the plain-vanilla 100/100 Bühlmann M-value. Without doubt, both metrics work as descriptions of tissue loading, but they are numerically not the same and cannot be mixed. In any case, it should be clearly communicated to the avid diver what exactly is used. This really matters for the diving community, because the most common number for evaluating tissue supersaturation is the GF – a number divers know, manage, and think they understand. At this point we want to caution against an idea divers may have (although it is also not encouraged in the paper): One should be very careful about drawing conclusions from the numbers reported in this study about which gradient factors should be considered “safe”; note that the gradient factor is generally smaller than the percentage of the M-value as long as we stay in the regime generally considered safe!
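To make the difference concrete, here is a small numerical sketch. The a and b coefficients are illustrative values of roughly the right order of magnitude, not an exact ZH-L16 compartment; only the qualitative point matters: for the same tissue state, Baker’s gradient factor and the plain fraction of the M-value are different numbers.

```python
# Sketch: gradient factor (Baker) vs. plain fraction of the Buehlmann
# M-value. The a/b coefficients are illustrative, not a real ZH-L16 set.

def m_value(p_amb, a, b):
    """Buehlmann tolerated tissue tension [bar] at ambient pressure
    p_amb [bar], with compartment coefficients a and b."""
    return p_amb / b + a

def gradient_factor(p_tissue, p_amb, a, b):
    """Baker's GF: supersaturation as a fraction of the M-value
    *gradient* above ambient pressure."""
    return (p_tissue - p_amb) / (m_value(p_amb, a, b) - p_amb)

def m_value_fraction(p_tissue, p_amb, a, b):
    """Tissue tension as a plain fraction of the M-value itself."""
    return p_tissue / m_value(p_amb, a, b)

# Surfacing (p_amb = 1 bar) with a tissue tension of 1.4 bar, using
# illustrative coefficients a = 0.6, b = 0.85:
gf   = gradient_factor(1.4, 1.0, 0.6, 0.85)   # ~ 0.52
frac = m_value_fraction(1.4, 1.0, 0.6, 0.85)  # ~ 0.79
```

For this tissue state the GF is about 0.52 while the M-value fraction is about 0.79 – the same dive, two very different-looking "percentages", which is exactly why it matters which one the DSSG actually is.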

So, unfortunately, without sufficient control of the various biases in the dives considered in the analysis, and in particular with different selection criteria in the two groups (DCS/no DCS), one has to be extremely careful about drawing qualitative conclusions, let alone quantitative ones. To be honest, one would have hoped that these questions had come up in the peer review of this paper and been addressed before it appeared in the journal.

Is there a brighter outlook?

The pessimistic view we have to take here of the reliability of some of the results is especially sad given that DAN is almost uniquely positioned to collect such large volumes of dive data across a wide cross-section of the diving population, and given the eagerness of the diving community to learn about risk factors, digest results, and potentially adapt its diving habits. The large number of collected dives can be seen as an indicator of that, and is a heartening result of the efforts! We sincerely hope that ways can be found to construct suitable pathways for the collection of truly representative datasets, and for more robust analyses. The diving world is definitely waiting for that, and it is our hope that the criticism raised here is not perceived negatively, but rather as constructive input for future work. We absolutely hope that our readers do not take the points raised in this post as a general discouragement of data collection and statistical analysis. On the contrary: science needs all of your participation! Statistics is hard to do in diving, and maybe it is no surprise that the road to results is bumpy. But in the meantime, please do not draw definite conclusions, especially not about individual real-life cases you may encounter, on the basis of the numbers from this study alone.

For this blog post, I collaborated with Veronika Sievers and Dominik Elsässer, the authors of a highly recommended book on decompression theory and main contributors to the Punkfish Academy.
