In most of the Western world, two drugs are used in the fight against covid-19. The first is remdesivir, an antiviral agent injected just before the ventilator stage; the second is the immunosuppressive steroid dexamethasone, used at the ventilator stage to dampen the real killer: the cytokine storm.
For patients hospitalized with covid-19, mortality is very high (13.1%), as illustrated in the chart below:
The strategy of “flattening the curve” is not about preventing covid-19 infections, but about spreading them over a longer period so that health care systems are not overwhelmed. An overwhelmed system increases mortality not only among covid-19 patients but also among other patients, for whom fewer resources remain. Avoiding hospitalizations is therefore crucial in this pandemic.
Conveniently, several microbiologists, doctors, and epidemiologists claim that the malaria drug hydroxychloroquine is effective against covid-19 in the outpatient stage, i.e., at the onset of the disease. They claim the remedy can fight the disease before it progresses to the point of requiring hospitalization. Its proponents include Didier Raoult (the most cited microbiologist in Europe) and Harvey Risch (professor of epidemiology at Yale University).
What if hydroxychloroquine works? What if it is cheap and safe? What if it is already available domestically in large quantities? You probably think that if all of the above were true, it would have been implemented a long time ago. If it is true, it means that the authorities have failed badly, and that is a scary thought.
There are 105 treatment candidates against covid-19, and two of them are used in the West. In this analysis, I will try to answer the question: should a third have been used?
The case of hydroxychloroquine has been very politicized. I will try to present the arguments from western authorities in an objective manner. Afterward, I will try to dissect them.
Quick facts about hydroxychloroquine
Hydroxychloroquine (HCQ) is the treatment candidate with by far the largest literature behind it with regard to covid-19: 162 studies have been published worldwide, 96 of them peer-reviewed. For comparison, 7 studies about remdesivir have been published.
Hydroxychloroquine is used against some types of malaria and is a safer version of chloroquine that has been used since 1934. It is a broad-spectrum agent which is also used against a total of 24 other disorders such as arthritis. HCQ is part of the WHO’s “List of Essential Medicines”.
Some quick concepts before we move on
When you conduct studies, you measure certain outcome variables: mortality, hospitalization rate, median hospitalization time, the development of viral concentration, etc.
RR: Relative risk
These outcome variables are compared between the groups in a study via the RR: the ratio between the risk of an outcome in two different groups (e.g., the risk of death in a control group versus an HCQ group). An RR of 1 means no difference in risk; above 1 the risk increases, and below 1 it decreases. For example, an RR of 0.4 means that the risk is 60% lower for one group.
P-value

The p-value is the probability of seeing results at least as extreme as yours if pure chance were at work. If you flip a coin 4 times and get tails every time, you may suspect that the coin is rigged. The p-value of that observation is 0.5 x 0.5 x 0.5 x 0.5 = 0.0625 = 6.25%, which helps us evaluate the suspicion. Similarly, the p-value can quantify the probability that the observed difference between the groups in a study was due to chance (assuming the intervention is no better than a placebo or standard of care).
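Both concepts can be made concrete with a few lines of Python (the group sizes in the RR example are hypothetical, chosen only for illustration):

```python
# Relative risk: the ratio of the risk of an outcome in two groups.
def relative_risk(events_a, total_a, events_b, total_b):
    return (events_a / total_a) / (events_b / total_b)

# Hypothetical example: 4 deaths among 100 HCQ patients vs. 10 deaths
# among 100 controls gives RR = 0.04 / 0.10 = 0.4, i.e. 60% lower risk.
rr = relative_risk(4, 100, 10, 100)
print(rr)

# p-value of the coin example: the probability of 4 tails in 4 fair flips.
p_coin = 0.5 ** 4
print(p_coin)  # 0.0625, i.e. 6.25%
```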
What do the health authorities say?
EMA (European Medicines Agency)
EMA evaluates medicines against infectious diseases on behalf of all the EU countries. This is the so-called central procedure. So, let’s look at the EMA’s assessment of HCQ.
“These medicines (hydroxychloroquine and chloroquine, ed.) have not shown any beneficial effects in treating COVID-19 in the relevant treatment arms of ongoing large randomized clinical trials (e.g. the Solidarity trial, the Recovery trial, and the Discovery trial). Chloroquine and hydroxychloroquine may cause certain side effects, including heart rhythm problems.”
As the EMA states, there is a lack of positive results from large randomized clinical trials. There have not been many such trials, as they require extensive collaboration and coordination between many hospitals, researchers, and doctors, something that can often only be done at the state level.
The only three such trials are dubbed RECOVERY (English), DISCOVERY (French), and SOLIDARITY (WHO). RECOVERY reached a negative conclusion on HCQ. DISCOVERY stopped its HCQ arm on June 17 due to poor results but has yet to publish its findings. SOLIDARITY also stopped its HCQ arm, on July 4, citing “little or no measured power”.
By the same token, the WHO conclusion on hydroxychloroquine is as follows (from 31 July):
The WHO’s guidelines are based largely on meta-analyses by Cochrane, an international network of researchers and institutions that reviews studies from around the world on the effects of various treatments. Cochrane has published two such meta-analyses regarding hydroxychloroquine and covid-19.
On June 11, Cochrane, on behalf of the WHO, carried out an analysis of the literature (though it included only 5 studies) and came to the following conclusion:
“The current evidence on the efficacy and safety of hydroxychloroquine for the treatment of COVID-19 is limited and of very low certainty. Hydroxychloroquine was not associated with a difference in overall mortality when compared to standard care for the treatment of COVID-19. Limited evidence suggested that hydroxychloroquine may result in more adverse events than standard care for COVID-19 treatment.”
In the context of an epidemic, June 11 is light-years ago. Cochrane Australia, on the other hand, offers an up-to-date meta-analysis of HCQ. Cochrane Australia is part of Australia’s task force for evaluating covid-19 treatments (the National COVID-19 Clinical Evidence Taskforce).
Australian health authorities (Cochrane Australia)
Their assessment of HCQ was updated on October 15, 2020. They conclude:
“Based on the available evidence, hydroxychloroquine is potentially harmful and no more effective than standard care in treating patients with COVID-19. We therefore recommend that hydroxychloroquine should not be used.”
The meta-analysis found that HCQ increased the risk of dying by 7%, though not statistically significantly (RR 1.07, 95% confidence interval: 0.97–1.18). The meta-analysis was based on data from 13 RCTs (a total of 6,300 patients); however, the vast majority of the evidence came from the RECOVERY study (4,716 patients). In the calculation of the mortality variable, RECOVERY’s results carried 98% of the weight.
UK health authorities
In the UK, the Medicines and Healthcare products Regulatory Agency (MHRA) concludes the following based on two studies:
“The MHRA took into account the results released from the RECOVERY trial, showing no beneficial effect of hydroxychloroquine in patients hospitalized with COVID-19, and a New England Journal of Medicine publication (Boulware et al., ed.) on using hydroxychloroquine as postexposure prophylaxis of COVID-19, concluding that hydroxychloroquine did not prevent illness compatible with Covid-19 or confirmed infection.”
American health authorities
The US National Institutes of Health (NIH) also recommends against HCQ:
“The Panel recommends against the use of HCQ with or without AZM for the treatment of COVID-19 in hospitalized patients (AI).
In nonhospitalized patients, the Panel recommends against the use of HCQ with or without AZM for the treatment of COVID-19, except in a clinical trial (AI).”
The NIH provides a more thorough argumentation for its recommendation (updated October 9, 2020), based on a systematic review of the literature.
All of the included studies had a negative conclusion.
Finally, they addressed a rebuttal in the form of the observational study Arshad et al., which concluded that mortality decreased by 51.3% (p = 0.009); the NIH judged, however, that this did not outweigh the evidence from the other studies.
Problems with western health authorities’ argumentation
In large parts of the West, there is a clear consensus that HCQ is not effective against covid-19. But when one looks at their conclusions, it is clear that they are based on a fraction of the literature.
The two most comprehensive assessments come from the NIH and Cochrane Australia. At the time of its last update, the NIH’s systematic review of “the literature” included only 10.3% of the literature. Cochrane Australia’s meta-analysis was based only on RCTs and included 68% of all RCTs at the time (13 out of 19); however, that amounted to only 11.7% of the total literature.
Today, a total of 134 experimental or observational studies determining the efficacy of HCQ against covid-19 have been published — all of which can be seen here (along with meta-analyses, safety studies, etc.).
It is standard scientific practice to exclude biased and unreliable studies when conducting a meta-analysis (or systematic review), but in this case it seems that studies were selected subjectively rather than deselected objectively.
Below is the distribution of the results from all experimental and observational studies on HCQ and covid-19:
The conclusions of Western health authorities are based almost exclusively on the experimental studies called RCTs, of which 20 have been published. Anthony Fauci (head of NIAID, part of the NIH) stated the following in a clip from July 29:
“I look at the evidence. And data from all trials that were randomized and controlled in a proper way show consistently that hydroxychloroquine is not effective against covid-19”.
The above statement reflects what Harvey Risch, a professor of epidemiology at Yale University, calls RCT fundamentalism in the West: a tendency to equate evidence with RCTs. Why are RCTs so preferred?
RCTs are good at reducing bias, in particular the impact of confounding variables that can distort the outcome. However, they are expensive and time-consuming, which is why six times as many observational studies have been published on HCQ and covid-19.
In observational studies, researchers examine existing databases (patient registers, medical records, etc.) and look for statistical connections among variables. Suppose data have been collected on people hospitalized for the same disease, some of whom were given a drug: one could then calculate the mortality rate for those who received it and for those who did not. But perhaps the reason one group got the drug was that they were extra sick; that is a confounding variable. For covid-19, age and co-morbidities strongly influence disease severity, which can blur the results of poorly conducted studies. So even though there is a correlation in an observational study, there is not necessarily causality. This is the risk that RCTs are good at reducing.
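The confounding problem can be demonstrated with a small simulation (entirely synthetic numbers: a drug with zero real effect that is preferentially given to the sickest patients):

```python
import random

random.seed(1)

# Simulate 100,000 hospitalized patients. The drug has NO effect on
# mortality, but severely ill patients are more likely to receive it.
treated_deaths = treated = untreated_deaths = untreated = 0
for _ in range(100_000):
    severe = random.random() < 0.3                   # 30% are severely ill
    gets_drug = random.random() < (0.8 if severe else 0.2)
    died = random.random() < (0.30 if severe else 0.05)  # risk depends only on severity
    if gets_drug:
        treated += 1
        treated_deaths += died
    else:
        untreated += 1
        untreated_deaths += died

rr = (treated_deaths / treated) / (untreated_deaths / untreated)
print(f"naive RR = {rr:.2f}")  # well above 1: the useless drug "looks" harmful
```

The naive comparison makes an inert drug look like it nearly triples mortality, purely because treatment assignment tracks severity. This is the distortion RCTs remove by randomizing who gets the drug.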
The NIH also writes, in its interpretation of the HCQ literature, why observational studies are largely disregarded:
“Many of the observational studies that have evaluated the use of chloroquine or hydroxychloroquine in patients with COVID-19 have attempted to control for confounding variables. However, study arms may be unbalanced in some of these studies, and some studies may not account for all potential confounding factors. These factors limit the ability to interpret and generalize the results from observational studies; therefore, results from these studies are not as definitive as those from large randomized trials.”
On the other hand, an analysis by researchers from Oxford has shown that only 10% of Cochrane’s recommendations from 2015–19 are based on evidence from several positive RCTs (the so-called “gold standard”). Another study of cardiology medicine showed that only 14.3% of the recommendations of the European Society of Cardiology (ESC) were based on the “gold standard”.
Does this mean that much of the medicine we receive for other disorders does not necessarily work? Not according to a huge meta-analysis that compared the results of 2,492 observational studies and RCTs across 228 different disorders. The analysis concluded:
“There is little evidence for significant effect estimate differences between observational studies and RCTs, regardless of specific observational study design, heterogeneity, or inclusion of studies of pharmacological interventions.”
This conclusion is consistent with Kovesdy et al., who write:
“Experiments such as RCTs are in fact themselves merely observations; what distinguishes them is the apparent control that we believe we have over the circumstances of the observations. However, RCTs too are subject to a number of fallacies (vide infra) which could render their results invalid or questionable. It is thus naive to look at RCTs as a universal panacea that will always tell us what is right or what is wrong; as it is also simplistic to question the validity of observational studies merely because of their nonexperimental and non-randomized nature.”
It therefore seems somewhat undesirable to shrink the pool of potential knowledge from 131 studies to 20 because one insists on the superiority of the RCT.
This phenomenon is certainly not unique to HCQ and covid-19. In 2017, three years before covid-19, the former head of the Centers for Disease Control and Prevention (CDC), Thomas Frieden, wrote an article in the New England Journal of Medicine about this exact trend:
“Current evidence-grading systems are biased toward RCTs, which may lead to inadequate consideration of non-RCT data. Objections to observational studies include the potential for bias from unrecognized factors along with the belief that these studies overestimate treatment effects…
This line of reasoning does not suggest that the Food and Drug Administration should be less stringent in their review of drug safety and efficacy, but rather that there should be rigorous review of all potentially valid data sources.”
Frieden also pointed out in the article that especially in epidemic situations, RCT-fundamentalism can be problematic:
“… These limitations also affect the use of RCTs for urgent health issues, such as infectious disease outbreaks, for which public health decisions must be made quickly on the basis of limited and often imperfect available data. ”
104 observational studies on HCQ and covid-19 have been published; 77 of them showed a positive effect on the main outcome variable and 27 a negative effect. Nevertheless, the argument against HCQ among Western authorities boils down to at most 13 studies, consisting (with a few exceptions) only of RCTs.
Insignificance = no effect?
If hydroxychloroquine is effective against covid-19 then that should be apparent in the RCTs. That is a fair assumption. Let us, therefore, look at the conclusions from the 20 published RCTs (negative conclusions are red and positive conclusions are green):
Thus, the authors of 17 out of 20 RCTs conclude that HCQ does not appear to have a positive effect. Most importantly, this is the conclusion of the two largest RCTs, RECOVERY and SOLIDARITY, as well as of the next-largest studies, Mitja, Skipper, and Cavalcanti et al.
If you examine the actual results and data from the studies, you see an almost diametrically opposite picture. HCQ had a measured positive effect in 14 out of 20 studies, and mixed positive/negative effects depending on the outcome variable in 2 others (the orange ones):
Why this discrepancy between results and interpretation? The answer has to do with significance. Take, for example, Mitja et al., which concludes that “no benefit was observed with HCQ”. The study is used by the NIH in its arguments against HCQ.
In the study, 8 out of 128 patients in the HCQ arm were hospitalized, versus 12 out of 143 in the control arm. This corresponds to an RR of 0.75 for the HCQ arm. But couldn’t that be due to coincidence? Yes! Therefore, we use the p-value to tell us the probability that the fluctuation was due to chance (assuming the null hypothesis is true: HCQ = control):
The p-value of the observation is 0.516. This means there is a good 50% probability that it could happen by chance. The convention is that only when there is less than a 5% probability of chance do we trust the effect of the drug; this threshold is called the significance level. So how few admissions in the HCQ arm would hypothetically have been needed for the p-value to fall below 0.05, so that we would trust the drug? This is illustrated in the graph below:
As illustrated, 3 hospital admissions in the HCQ arm (with still 12 in the control arm) were needed before we would believe that the medication is effective. In other words, only once the RR is below 0.28, i.e., the effectiveness is 72% or above, do we believe it works. A measured effect of 20%, 30%, or, as in this case, 25% would not be enough. This is the problem with drugs that are not miracle cures but (potentially) work: they are difficult to prove.
In this case, the number of events is so small (8 and 12 admissions, respectively) that it could easily be due to coincidence; hence the p-value of 0.516. We cannot, for good reason, conclude anything on this basis, and certainly not that HCQ has “no benefit,” as the authors did.
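Both the p-value and the hypothetical threshold scan can be reproduced with a stdlib-only sketch. I use Fisher’s exact test with the common “sum of less-likely tables” two-sided definition, so the p-value may differ slightly from the 0.516 the study reports:

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]],
    summing all tables no more likely than the observed one."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def p_table(x):
        # hypergeometric probability that cell (1,1) equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    p_obs = p_table(a)
    lo_x = max(0, row1 + col1 - n)
    hi_x = min(row1, col1)
    return sum(p for p in (p_table(x) for x in range(lo_x, hi_x + 1))
               if p <= p_obs * (1 + 1e-9))

# Mitja et al.: 8 of 128 hospitalized on HCQ vs. 12 of 143 in the control arm
rr = (8 / 128) / (12 / 143)
p = fisher_two_sided(8, 120, 12, 131)
print(f"RR = {rr:.2f}, p = {p:.2f}")  # RR ~0.75, p far above 0.05

# Scan downward: the largest number of HCQ admissions (control fixed at 12)
# that would still have produced a significant result.
for k in range(8, -1, -1):
    if fisher_two_sided(k, 128 - k, 12, 131) < 0.05:
        print(f"p < 0.05 only at {k} admissions or fewer")
        break
```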
Take Boulware et al. as another example. Here, an overall 18% lower risk of infection was measured in the HCQ group, but the p-value was 0.35. Instead of concluding that “HCQ did not prevent illness”, one should instead have concluded something like: “The results are more compatible with a decrease in risk of infection”.
Am I trying to convince you that the researchers behind most of the RCTs have misinterpreted their data? It may sound absurd, yet that is the case. Am I saying that they should have concluded that HCQ works? No. They should instead have concluded that the results are more compatible with a positive effect than with chance.
The problem is that although doctors are skilled in their profession, they are not mathematicians. Almost on an equal footing with the principle of RCT superiority, there seems to be a rigid understanding of significance. The Indian professor Chittaranjan Andrade points out the problem:
“Next, imagine that instead of obtaining P = 0.04, you obtained P = 0.14 in the imaginary RCT described earlier. In this situation, we do not reject the null hypothesis, based on the 5% threshold. So, can we conclude that the drug is no different from placebo? Certainly not, and we definitely cannot conclude that the drug is similar to placebo, either. After all, we did find that there was a definite difference in the response rate between drug and placebo; it is just that this difference did not meet our arbitrary cut-off for statistical significance…
Consequently, it should be considered fallacious to insert an arbitrary threshold to define results as significant or nonsignificant, as though significant versus nonsignificant results are in some ways categorically different the way people who are dead versus alive are categorically different… In fact, declaring significance may give us a false sense of confidence that a finding exists in the population, while rejecting significance may give us a false sense of confidence that the finding does not exist. ”
The interesting math happens when many studies report a positive effect, even if they are individually insignificant. If HCQ were ineffective, we should observe a fairly equal distribution of positive and negative results, because sometimes coincidences are on HCQ’s side and other times they are not.
15 RCTs measured a positive effect (including two positive studies with negative sub-findings). The chance that an ineffective agent would make 15 out of 20 datasets show a positive effect is equivalent to flipping a coin (since negative and positive results are equally likely) 20 times and getting heads at least 15 times. The probability is 2%, below the accepted threshold of coincidence:
When we look at all the studies, this is the picture:
102 out of 134 studies showed a positive effect measured on the most important outcome variable. The chance that an ineffective agent could get 102 out of 134 to show a positive result is 1 in 2 billion (p = 0.00000000052).
One might object that studies often have several different outcome variables. Assume that all studies have 2 variables and that the authors behind Figure 6 chose the positive one whenever possible. That would be equivalent to flipping the coin twice as many times, and the probability of getting heads (positive outcomes) at least 102 times would then be 99.99%. This small detail creates a diametrically opposite picture. Therefore, the authors behind the analysis made sure to consistently choose the “most important” variable for each study: mortality before hospitalization rate, and hospitalization rate before median hospitalization time, etc.
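The three coin-flip probabilities above can be checked with an exact binomial tail (a sign-test style calculation; exact values may differ in the last digit from the article’s figures):

```python
from math import comb

def tail_at_least(k, n):
    """Probability of at least k heads in n fair coin flips."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(tail_at_least(15, 20))    # ~0.021: 15 positive RCTs out of 20
print(tail_at_least(102, 134))  # roughly one in a billion: 102 of 134 studies
print(tail_at_least(102, 268))  # ~0.9999: the "two outcomes per study" objection
```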
RCTs are impractical and expensive by their very nature, and this often limits them to a small population. With a smaller sample, the statistical probability that fluctuations are due to coincidence increases. This places great demands on the effectiveness of the intervention if the p-value is to fall below 5%. In the case of Mitja et al., for example, the effectiveness needed to be 72% or more.
Meta-analyses are therefore convenient: they statistically combine data from several different studies, increasing the effective “population” and enabling moderate effects to become statistically significant.
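The demand that small samples place on effect size can be quantified with the standard normal-approximation sample-size formula for comparing two proportions (my own illustration; the event rates are chosen to resemble Mitja et al.):

```python
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a difference between
    event rates p1 and p2 (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

# Event rates resembling Mitja et al.: 8.4% vs. 6.3% hospitalization (RR ~0.75)
print(round(n_per_arm(0.084, 0.063)))  # ~2400 per arm; Mitja et al. had ~135
```

In other words, detecting a genuine 25% relative reduction at these event rates would require an order of magnitude more patients than the trial enrolled.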
A meta-analysis from September 30 looked at the results of the RCTs that analyzed HCQ’s efficacy as post-exposure prophylaxis (PEP): Mitja et al. (July 16), Skipper et al., Mitja et al. (July 26), Rajasingham et al., and Boulware et al. Although each study had a negative interpretation, the combined data showed a significant 25% improvement with HCQ:
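The pooling mechanism behind such a meta-analysis can be sketched with fixed-effect inverse-variance weighting on the log-RR scale. The three input studies below are hypothetical, not the actual PEP trials; the point is only that individually nonsignificant RRs can pool into a significant one:

```python
from math import exp, log, sqrt

def pool_fixed_effect(results):
    """Fixed-effect inverse-variance pooling of relative risks.
    `results` holds (rr, ci_low, ci_high) tuples; the standard error of
    log(RR) is recovered from the width of the 95% confidence interval."""
    weights, weighted_logs = [], []
    for rr, ci_lo, ci_hi in results:
        se = (log(ci_hi) - log(ci_lo)) / (2 * 1.96)
        w = 1 / se ** 2
        weights.append(w)
        weighted_logs.append(w * log(rr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return (exp(pooled_log),
            exp(pooled_log - 1.96 * pooled_se),
            exp(pooled_log + 1.96 * pooled_se))

# Hypothetical trials: each RR is below 1, but each CI crosses 1 (nonsignificant)
studies = [(0.75, 0.52, 1.08), (0.80, 0.55, 1.16), (0.73, 0.48, 1.11)]
rr, ci_lo, ci_hi = pool_fixed_effect(studies)
print(f"pooled RR {rr:.2f} (95% CI {ci_lo:.2f}-{ci_hi:.2f})")  # CI now entirely below 1
```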
One may ask: is the RCTs’ lack of statistical significance due to an ineffective drug or to populations that are too small? This question has been elucidated by 42 mathematicians, statisticians, and physicians, who made the following table comparing the results for dexamethasone in the RECOVERY study with the results for HCQ in Boulware, Skipper, and Mitja et al.:
The table shows that the measured effect for dexamethasone was actually smaller than that of HCQ, but statistical significance was obtained thanks to a larger population. In column 6, they calculated the hypothetical p-value for the HCQ studies if the measured trend were projected onto a population the size of RECOVERY’s. They found that Boulware and Skipper et al. would achieve a lower p-value than RECOVERY, while Mitja et al. would be on the verge of significance. The dexamethasone results in RECOVERY were hailed and immediately used worldwide, while the results in Boulware, Skipper, and Mitja et al. were used to argue against HCQ, although the insignificance was most likely due to small populations.
In Eric Strong’s (Stanford University) introduction to evidence-based medicine, he makes it clear that many physicians believe evidence is about “just being able to recite the abstracts and conclusions of the most important studies in the field.” Perhaps it plays a role in the opposition to HCQ that too much emphasis is placed on the authors’ own interpretations?
RECOVERY and SOLIDARITY: Fundamental methodological flaws
Although many of the published RCTs measured a positive effect with HCQ, this is not the case for arguably the two most important studies in the field of covid-19 treatment: RECOVERY and SOLIDARITY.
The RECOVERY trial was a nationwide English RCT conducted in the spring by researchers from Oxford University, in which 1,561 hospitalized patients were given HCQ and compared with 3,155 patients who received standard care. The SOLIDARITY trial was a multinational RCT coordinated by the WHO, in which 954 hospitalized patients were given HCQ and compared with a control group of 4,088. In other words, these are very large RCTs.
RECOVERY showed an HCQ mortality rate that was 9% higher than the control group (p = 0.15), while SOLIDARITY showed a mortality rate that was 19% higher (p = 0.23).
The results of these two studies have a major influence on the West’s rejection of HCQ. RECOVERY is the first study in the NIH’s systematic review and carried 98% of the weight in the Australian Covid Task Force’s meta-analysis of HCQ mortality (although 7 other studies were included).
However, there were two major methodological issues in the studies that are apparently being neglected:
Dosage of HCQ: Toxic
Below is RECOVERY’s and SOLIDARITY’s dosing protocol for HCQ:
Both studies followed the same protocol, and a quick calculation tells us that the patients received 9.6 grams of HCQ over 10 days, including 2.4 grams in the first 24 hours (the loading dose). This is much more than what has been used in other studies: based on a recent peer-reviewed study, the optimal dose is 4.1 g over 10 days with a “high initial dose of 1.3 g”.
Thus, the two studies used a loading dose almost twice the recommended one (and the recommended loading dose is already high). Over 10 days, the dosage was about 2.3 times higher than recommended. Is that just more of the good stuff?
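The arithmetic can be verified directly. The exact schedule below (800 mg at hours 0 and 6, then 400 mg every 12 hours through day 10) is an assumption on my part, but it reproduces the totals quoted in the text:

```python
# Day 1: 800 mg at hours 0 and 6, then 400 mg at hours 12 and 24 -> 2.4 g
first_day_mg = 800 + 800 + 400 + 400
# Days 2-10: 400 mg twice daily
total_mg = first_day_mg + 9 * 2 * 400
print(total_mg / 1000)  # 9.6 g over 10 days

# Compare with the peer-reviewed optimum cited above: 4.1 g total, 1.3 g loading
print(round((total_mg / 1000) / 4.1, 2))      # ~2.34x the recommended total
print(round((first_day_mg / 1000) / 1.3, 2))  # ~1.85x the recommended loading dose
```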
No. A Brazilian RCT (Borba et al.) from April showed that a dose of 12 g over 10 days (only 25% more than in RECOVERY and SOLIDARITY) significantly increased mortality (OR 2.8) compared to a low-dose protocol. The Brazilian researchers were subsequently accused of negligent manslaughter of 11 patients who died directly from an overdose.
In late May, the Indian Council of Medical Research (ICMR) wrote to the WHO that it was concerned about the HCQ doses used in SOLIDARITY, which were four times higher than those used in India, where there is widespread and coordinated use of the drug against covid-19.
For HCQ, the lethal dose is estimated at about 3–5 grams (4 g for a 75 kg person). The 2.4 grams given within the first 24 hours are thus not far from a potentially deadly single dose of 3 grams.
It is therefore very worrying that toxic doses were used, which undoubtedly contributed to the higher mortality. There are indications that the dosage was confused with that used for hydroxyquinoline against dysentery.
For unexplained reasons, the dosage is not included in the NIH’s assessment of RECOVERY’s limitations (even though the NIH knew about the toxic dosing from Borba et al., which was included in its systematic review):
Timing of HCQ: Too late

HCQ, as an antiviral, should be used on an outpatient basis against covid-19; that is, it is more effective the sooner it is given after the onset of symptoms. Why?
“Treating covid-19 is very phase-specific and this is missed by many so-called experts”
That is the message from Paul Marik, professor of medicine at Eastern Virginia Medical School and author of the MATH+ protocol for the treatment of covid-19 in hospitals. HCQ is used as an antiviral against covid-19, which is why, according to all its proponents, it should be used while viral replication is the problem. And that is at the beginning:
The image above illustrates the course of the disease. Viral replication and concentration are greatest at the beginning, when there is an infection in the upper respiratory tract. When the disease later develops into a lower respiratory tract infection, the problem is no longer viral replication but the body’s inflammatory response to viral debris, which causes breathing difficulties.
Antiviral drugs are thus effective before the immune system stops the viral replication (and once the immune response takes over, it tends, especially in the elderly, to do more harm than good). The widely used Tamiflu (oseltamivir), an antiviral for influenza, for example, works only within the first 48 hours of symptoms and quickly becomes ineffective thereafter.
Hydroxychloroquine works mainly by acting as a zinc ionophore in human cells (which is why zinc is a crucial supplement to HCQ): it creates a passage through the cell membrane for zinc ions to enter. Zinc has strong antiviral properties and is an important part of the immune system’s viral response, but increased zinc intake does not by itself raise the zinc concentration inside cells. That requires a zinc ionophore, where HCQ has been shown to be the most effective:
In addition, HCQ works by raising the intracellular pH slightly, interfering with the replication of the covid-19 virus, a replication that is greatest just around the onset of symptoms and shortly after.
So when it comes to HCQ’s efficacy against covid-19, it is crucial to ask: at what stage? All 21 studies on early use of HCQ have reported a positive effect:
The above meta-analysis of all studies on early HCQ use finds a highly significant 63% improvement (RR 0.37, 95% CI 0.29–0.48).
The patients in RECOVERY, by contrast, were very far along in the disease course and very ill. Mortality was 27% in the HCQ group and 25% in the control group, which is extremely high (higher than Italian ventilator patients’ mortality of 23%). All were hospitalized, just over 60% received non-invasive oxygen, and 16.8% were on a ventilator. The median time from onset of symptoms to initiation of treatment was 9 days:
For SOLIDARITY, the information is not as detailed, but the patients were still undoubtedly very far along in the course of the disease. All were hospitalized, and 63% (602) received either oxygen or ventilation:
However, late administration of HCQ is not exclusive to RECOVERY and SOLIDARITY. It is the case for 86 of all 134 studies and for 11 of the 20 RCTs.
Assessing the effectiveness of HCQ on outpatients based on such studies is erroneous.
Opposition to HCQ is largely based on assessing the evidence from the 20 RCTs, even though a total of 134 studies have been published. 17 of the 20 RCT interpretations are indeed negative, but the vast majority of the RCTs actually measured a positive effect, albeit an insignificant one. Insignificance, however, does not mean ineffectiveness, and if you look at the published studies in their entirety rather than individually, it is extremely unlikely that the results were due to chance. The two main studies showing a negative effect of HCQ, RECOVERY and SOLIDARITY, are characterized by fundamental methodological problems. Outpatient treatment with HCQ + zinc thus seems a very relevant candidate for lowering hospitalizations and saving lives.
This is how the rest of the world has assessed the matter: