Sandra Eldridge is Professor of Biostatistics and Director of the Pragmatic Clinical Trials Unit at Barts and The London School of Medicine and Dentistry, Queen Mary University of London. In this blog post, Professor Eldridge discusses the recent excitement around some promising clinical trial results that suggest that vitamin D improved the recovery of hospitalised COVID-19 patients, and she explains why the trial may be flawed.
On 13th February David Davis MP tweeted: “The findings of this large and well conducted study should result in this therapy being administered to every Covid patient in every hospital in the temperate latitudes.”
The therapy in question was vitamin D, and the study was a Spanish randomised trial purporting to show a 60 per cent drop in death rates amongst hospitalised patients. The story was covered in other media, but an avalanche of responses on Twitter and directly to the website where the study details were posted indicated that not everyone agreed with David Davis’ assessment, and the study was eventually removed from the website. So, what was wrong with this study?
Over the past year, everyone has become increasingly familiar with clinical trials. The backbone of clinical research, clinical trials are based on a simple idea: to see whether a treatment works, we take a group of individuals who might benefit from the treatment if it does work, give it to half of them and not to the other half, and then compare what happens to the individuals in the two halves.
Yes, trials can be more complicated, but this is the essence. Furthermore, two things are key as we construct this comparison. First, a large enough number of people must be involved in the comparison. Second, the comparison must be “fair”. Putting all the older people in one group and the younger in the other would obviously undermine any findings, and if the government had tried to pursue policies for the treatment of COVID-19 based on trials of, say, eight patients, no-one, whatever their background, would have trusted them.
Understanding how to get large enough numbers is fairly straightforward. To make the comparison as fair as possible, we use randomisation, a statistical technique for deciding whether individuals get the treatment or not. Though everything is done by computer nowadays, the easiest way to understand randomisation is to think of tossing a coin for every individual – heads you go in the treatment group, tails you go in the non-treatment group. It’s accepted as the best way of making sure the two groups end up being comparable.
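For the curious, here is what that computerised coin toss might look like. This is a minimal sketch in Python, purely illustrative; real trials use dedicated, audited randomisation software, and the patient labels here are invented:

```python
import random

def randomise(patient_ids, seed=2021):
    """Assign each patient to 'treatment' or 'control' by a virtual coin toss."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    return {pid: "treatment" if rng.random() < 0.5 else "control"
            for pid in patient_ids}

patients = [f"patient_{i:03d}" for i in range(1, 11)]
for pid, arm in randomise(patients).items():
    print(pid, "->", arm)
```

With enough patients, chance alone tends to balance age, sex, severity of illness and everything else, measured or unmeasured, across the two groups.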
So, was this Spanish study large enough, and did it use randomisation? With over 900 patients, and described by the authors as an 'open randomised controlled trial', on the face of it the answer is yes. However, dig a little deeper, as many of those who tweeted in response to David Davis did, and things begin to fall apart. This study has a remarkably large number of flaws. Here I outline two which, unfortunately, might leave the purported results in tatters.
First, although the study is described by the authors as a randomised controlled trial, it wasn’t the 900-plus individuals in the trial who were randomised, it was hospital wards: eight of them. Five were selected at random to give vitamin D to their patients; patients on the other three wards did not receive it. Randomising entities other than individual patients is a perfectly legitimate trial design, and in this case it earns the trial the designation of a “cluster” randomised trial. However, eight is rather a small number.
What if the wards have very different characteristics and, when randomised, just by chance the two groups aren’t very comparable? There might, for example, be one very large pioneering ward amongst the eight; whichever group that ward is randomised into will have an unfair advantage in showing patient benefit. This is not quite the same as having only eight patients in a trial, which most people would consider suspect, but it has similarities. It doesn’t necessarily invalidate the trial, but the analysis does need to account for the variation between wards, because patients on the same ward tend to be more alike than patients on different wards, so the data carry less independent information than the raw patient count suggests. This was not done in this trial, which means the results presented are too precise: the uncertainty around the estimated effect is understated.
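A toy simulation makes the danger vivid. The sketch below (again Python; the ward sizes and recovery rates are entirely invented, and there is deliberately no treatment effect at all) randomises eight wards, five to “treatment” and three to “control” as in the Spanish study, and counts how often the two arms end up differing by more than ten percentage points purely by chance:

```python
import random
import statistics

def simulate_null_trial(rng):
    """One cluster trial: 8 wards, 5 'treated', and NO true treatment effect."""
    ward_rates = [rng.uniform(0.4, 0.8) for _ in range(8)]  # wards differ in quality
    ward_sizes = [rng.randint(80, 150) for _ in range(8)]   # and in size
    order = list(range(8))
    rng.shuffle(order)
    treated_wards = set(order[:5])  # wards, not patients, are randomised

    outcomes = {"treated": [], "control": []}
    for ward in range(8):
        arm = "treated" if ward in treated_wards else "control"
        for _ in range(ward_sizes[ward]):
            outcomes[arm].append(1 if rng.random() < ward_rates[ward] else 0)
    return statistics.mean(outcomes["treated"]) - statistics.mean(outcomes["control"])

rng = random.Random(2021)
diffs = [simulate_null_trial(rng) for _ in range(2000)]
share = sum(abs(d) > 0.10 for d in diffs) / len(diffs)
print(f"Null trials showing a >10-point 'effect': {share:.0%}")
```

Run it and a substantial minority of these do-nothing trials produce double-digit gaps between the arms; an analysis that ignored the ward level would present many of them as convincing evidence.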
So, let’s solve this. Let’s give the trial data to some statisticians who know how to make the appropriate adjustments, and see how the results pan out. Unfortunately, a further potential flaw in this trial means that even this may not rectify things. This flaw lies in the way the patients themselves were assigned to the wards. It appears (though the publication is not very clear on this) that they may have been recruited to the trial by hospital staff who knew which wards were treatment wards and which were non-treatment wards.
It doesn’t take very much thought to realise that this may have influenced the doctors’ decisions about which patients to recruit into the trial, and which ward to send them to. We do not know whether there were safeguards to ensure that patients in the treated and non-treated groups were comparable. Though some statistical techniques can go part of the way towards accounting for this problem, sadly they cannot do so with certainty; this is a case of unknown unknowns.
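One more toy simulation shows how recruiters who know the allocation can manufacture a treatment effect out of nothing. The “frailty” score below is an invented stand-in for how sick a patient is; this is not the actual recruitment process, which the publication does not describe:

```python
import random
import statistics

rng = random.Random(7)

def recovered(frailty):
    """Frailer patients recover less often: recovery probability is 1 - 0.6*frailty."""
    return 1 if rng.random() < 1 - 0.6 * frailty else 0

def run_trial(n_per_arm, biased):
    """Recruit patients when staff know which wards give the treatment.

    The treatment itself does nothing in this simulation; if `biased`,
    frailer patients are (perhaps unconsciously) steered to control wards.
    """
    treated, control = [], []
    while len(treated) < n_per_arm or len(control) < n_per_arm:
        frailty = rng.random()  # 0 = robust, 1 = very frail
        if biased and frailty > 0.5 and len(control) < n_per_arm:
            control.append(frailty)
        elif len(treated) < n_per_arm:
            treated.append(frailty)
        else:
            control.append(frailty)
    return (statistics.mean(recovered(f) for f in treated),
            statistics.mean(recovered(f) for f in control))

print("fair recruitment   (treated, control):", run_trial(400, biased=False))
print("biased recruitment (treated, control):", run_trial(400, biased=True))
```

In the biased run the “treated” group ends up systematically healthier, so it shows a markedly better recovery rate even though the treatment does nothing whatsoever here. And crucially, no after-the-fact statistical adjustment can fully repair this, because we cannot adjust for differences we never measured.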
With a bit of thought and the expertise of those versed in trial design this trial could have been conducted more robustly. As it is, David Davis and others may have been, sadly, misled.