The Heroic Age

Sometimes it feels like times are uncertain and that things are essentially not working for most people. Even though we keep reminding ourselves it’s not an apocalypse yet, or a war yet, or a pogrom yet, the sense of unease is widespread. We know this.

In times like these, in the quiet between the battles or the rest between the long marches, before the war is over or before the fires are out, people talk about how it was better as a way of talking about how it will be again. We spin our stories and put them up for the dawn to gasp at, if the dawn comes. This is not the only way to make a story but the ones that survive this way of thinking have been tested and stick hard in the mind and speak straight to the soul. In this way we often end up with stories of a heroic age.

The need for heroes

Some people just like heroes, I suppose. It feels partly determined by culture, so that in some places there’s a need for big old swaggering heroes all the time, even in the middle of the good times, and in others we’re more wary of giving all our dreams up for someone else to hold.

That aside, the hero stories get built as a promise to the future that we’ll be there, doing those things again; or a golden dream in the night, that someone has saved us from the monsters and we will wake in the land of the young; or a stirring of the pot of magic porridge to fill us with the battle fever ourselves and dig us out of the hiding caves. Those are how we tell ourselves who and why we are, but I want to talk about a giddier, unmoored, entryist sort of story from the heroic age, the sort of story we might tell of Jotunheim or the unSeely Court as a warning of how we might go wrong if we use the right words the wrong way.

This is a story of heroes made to draw the glamour around oneself, to take the face of the hero and wear it. Now. This is the sort of shit that Loki or Lugh or Merlin or Maui or Kalolo might pull, and we accept them as our own: why am I saying it’s of the deep?

Mens Rea

It’s partly the fact that the Trickster might be a wanker but comes back with the fire or the porridge or the pearl, and everyone knows it was a trick. The tricks are usually pulled on the powerful and if they don’t work out, there is punishment and the Trickster keeps their place. The thing that keeps us from savagery is continuity of conscience, and to take a face for ever is not the sort of lie that most cultures praise, for the simple reason that identity is an important part of judgment.

These hero stories are told in order to get and keep power that is unearned; and so it is when Prometheus runs cackling from the cave mouth with the fire. The difference is that Prometheus doesn’t use it to trade up into a management role, he’s back at the gags within the day. What makes me suspicious is when people assert that they are the hero now, and they will remain the hero no matter their actions. Trust me, they say.

Trust me

No mate, I will not. Anyone telling me to trust them immediately raises my suspicion about why they feel the need to bypass the usual trust building measures. Same with a few other statements, almost specific to my area of medicine but with some more general resonance:

I’m extremely empathic

…then you will be feeling my cringe right now, too?

I’m very smart

…hence the self awareness. I see.

I care about this patient group

…and will do anything to them in order to prove that

I understand

…which you will presumably prove in the next few sentences, right?

Well. I don’t want this to detract from “ordinary heroes”, who toil away and do the things that keep the world upright. Those who work and sing and tell these stories, who trust and love and do, whether in the end they are right or wrong is less important than that they are looking out for each other, and that’s how in the end the dawn will come again or if it fails we will go bravely and with heart.

I do mean to point this at anyone who uses volume to overwhelm discourse, or assertion to overwhelm equipoise, or opinion to overwhelm evidence, or status to overwhelm judgment. I almost want to add, using rules to overwhelm principles but it’s really the arbitrary writing of the rules that comes over Bifrost and those who follow along are not the fake heroes, they’re as much the victims as anyone else. In some of my most shameful episodes I have co-victimised under such a regime and as usual, I do want to be told if I’m doing anything I write about here.

A new hope

So. What sort of hero do I see for us now? I really do think we need the Trickster to form and help us out in several ways, not least in changing how we use energy and resources. Even the Trickster is being faked in various blond bodies at the moment, though, and that might be the dirtiest trick of Jotunheim. Mostly we need calm and a focus on removing the active threats that we are continuously generating ourselves.


Journal club: Andromeda Shock

Reinventing the sled

Bob Wright, who is happily retired from St Vincent’s in Sydney and remembered with varying degrees of awe, fondness and scorn, used to say that you had to examine patients regularly to figure out whether you were doing them good or ill. The scorn comes from those who note that he didn’t publish much, who mostly missed out on his shrewd reading of the state of the art. The awe comes from those who realise late on that he was mostly right despite often being against the fashions of the day, or decade, or millennium. The fondness is from those who saw how much work he put into getting it right, with a sceptical and intense focus on what can be learned and applied from the literature and the corridor conversations.

Is it an art?

Obviously I don’t think medicine is an art. I wouldn’t write this journal club if I thought it was a personal aesthetic and moral crusade. I would be writing about how to become a better person, or how in the end nobody could trust anything said by anyone outside of a closed list written by me. It’s a profession, which is wider and less pure than an art: it requires science and reason and art and dirty dealings, and most of that is quite beyond me as it is beyond most people. Fortunately there are things that we can do that are quite straightforward and require none of the hard stuff. After using a cursory amount of the hard stuff to suspect a serious infection, you do some neatly described things from a closed list, early and quickly, and in so doing you make it less likely that the afflicted will die. As that list was drawn up in something of a hurry, items have been falling off it for some years including central venous oxygen saturation monitoring and activated protein C. Other items are flirting with the list; the list’s parents want to believe in them but note that the last one the list took to the prom ended up pretty trashed, and the parents are rightly cautious. In case that metaphor went too far off the rails, I’m talking about steroids, who ruined senior prom for pulmonary fibrosis, a subject newly close to my bruised heart.

What is the question?

Other things are on the list despite nobody being that into them, like a first boyfriend who’s never obviously caused harm but everyone feels might be holding the list back in reaching its full potential. Large bolus fluid resuscitation is one of those things. But how to decide? Do we need different fluid, less fluid, fluid in fewer people, the same fluid but slower, or some combination of those? And how would we test it? There have been several good attempts at doing so, none of which really came up with an answer. My guess is that it was because none of the attempts managed to isolate a fluid administration behaviour while leaving other behaviours unchanged. The allocated behaviour often affected a combination of those questions, while at the same time being unable to measure all of the other changes in behaviour among astute and present clinicians who reacted to the allocated change. Then, there was little willingness to adjust for those changes in behaviour even when they were measured, or they were measured in a way that didn’t allow adjustment, again because of reluctance.

In this trial [1] the groundwork was laid for some of those future trials. It sought to decide whether death was more common among those with septic shock if using frequent lactate, or frequent clinical examination with capillary refill time (CRT). It hoped to determine whether lactate or CRT was the better way to guide resuscitation.

What is resuscitation?

Well, that’s the thing. The protocol used here was very easy to explain. Patients entered the study if they had been given 20 ml/kg fluid and were still on vasopressors to maintain a MAP of >= 65 mmHg. One by one various “tests” of interventions were applied, each taking an hour or two, and if one was positive then that intervention was continued and the testing stopped until the next lactate or CRT time. If it was negative then the next intervention was tested. In order, the interventions were fluid boluses; then inodilators if that didn’t work, plus vasopressors for a MAP >= 80 mmHg if they had chronic hypertension.

That’s a lot of potential sets of actions to be allocated to, each one conditional on response to a first allocation. That is quite hard to analyse. My very first thought on reading it was that if you do lots of tests you’ll fail some tests or, in more time worn language, “if you don’t take the temperature you can’t find a fever”. More testing leads ineluctably to more intervention in this trial, and at the design phase I would have expected someone to mention that it is then quite a difficult thing to understand if the results are not all in the same direction. Right enough, of those who were administered the test, the same proportion ‘passed’ in each group, which opens the obvious question, “would those not administered the test also have ‘passed’?”, and so which bit of the intervention is helping or harming, if all the bits are being administered to different people? Of course, the intention to treat analysis is easier to understand and tests the question “do I use the lab-intensive lactate or the clinician-intensive CRT?”. That was the registered primary outcome, and that is what we’ll look at first.
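The “more tests, more failed tests” point is just arithmetic, and a few lines make it concrete. This is a minimal sketch, not the trial’s analysis: it assumes each reassessment is independent and has the same chance of flagging “abnormal”, and both the 20% chance and the numbers of reassessments are made-up values chosen only for illustration.

```python
def prob_any_positive(p_positive: float, n_tests: int) -> float:
    """Chance that at least one of n independent tests comes back positive."""
    return 1.0 - (1.0 - p_positive) ** n_tests


# Hypothetical numbers: every reassessment has a 20% chance of flagging
# "abnormal", regardless of which monitoring strategy is in use.
P_FLAG = 0.20

for n_reassessments in (1, 4, 8):
    chance = prob_any_positive(P_FLAG, n_reassessments)
    print(f"{n_reassessments} reassessments: {chance:.2f} chance of at least one flag")
```

Under those assumptions, the arm that is looked at more often triggers more downstream intervention even if the two arms contain identical patients, which is the inferential clouding described above.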

It was “non-significant”, in that the mortality was 35% in those monitored with CRT and exposed to subsequent resulting interventions, and 43% in those monitored with lactate, and exposed to subsequent resulting interventions, with a probability of that result or one more extreme of 0.06 under the null hypothesis.
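For a feel of how close to the line that sits, here is a crude two-proportion z-test. It is a sketch under loud assumptions: the arm sizes (212 each) and death counts (74 and 92) are not given in this post and are chosen only because they reproduce 35% and 43%. The trial’s own p value came from a time-to-event model, so this simpler test need not return exactly 0.06.

```python
import math


def two_proportion_test(deaths_a, n_a, deaths_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled variance)."""
    risk_a = deaths_a / n_a
    risk_b = deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (risk_a - risk_b) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return risk_a - risk_b, z, p_value


# ASSUMED counts, picked to match 35% (CRT) and 43% (lactate).
risk_diff, z, p_value = two_proportion_test(74, 212, 92, 212)
print(f"risk difference {risk_diff:+.3f}, z {z:.2f}, p {p_value:.3f}")
```

The point of running it is not the third decimal place but seeing how a handful of deaths either way would move the result across the conventional threshold.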

Oh come on

Well that was disappointing. If it were p=0.8, or p<0.01, we would be happy and then happily start savaging the methods. But that’s like starting a really good movie and finding the last bit chewed up by the dog. What’s the actual answer?

And why am I so invested all of a sudden? Truly, I’m not going to be implementing either of these protocols religiously and so the lesson of the overall ITT analysis is not the most important one to me, nor to the majority of readers. The hook for me is the story of clinical examination and the remote possibility that it actually is as Bob said, once again. I’ve always believed that you have to go back to the patient after you’ve done something, or withheld something significant, and see what you’ve done. If for nothing else, you maintain the relationship and increase your own therapeutic value, and you allow yourself to learn about the effect of what you’re doing in a process that continues until you stop returning, or looking, or listening, or deciding, or caring. Some believe that you need to go back and write something, which I suppose is fair enough, as long as you’re writing it for the information of others and not just to signal virtue or ostentatiously coroner-proof the notes. That’s just distracting.

Then the direction of the point estimate is exactly where my bias would want it to be: regular clinical contact is not inferior and seems to be superior to what my bias tells me is a distractor.


The clinical interaction uses all of your senses and all of your qualities. Signals are floating about the room, the bandwidth is extremely wide and you are both receiver and integrator. If you’re distracted or angry or feeling disempowered, if you’re sick or (to a lesser extent) tired or haven’t read enough or feeling too emotionally attached or disconnected, even contemptuous of the patient and their life choices, you won’t process it all well enough. And then someone stands in the corner of the room with a flashy sign saying something like “look, lactate!”

My experience of being mentored is not extensive but I’ve watched many people you could call models. A powerful action was always to bring down the volume a little, sit in close to the patient and talk to the patient, the relative, the sitter or someone who has been watching and has not already decided what is their agenda. That might be bedside nurse or other involved clinician, but often they have their agenda and want to filter the whole clinical experience, to drop the bandwidth and distract your attention from what they see as minutiae or irrelevancies, towards what they’ve decided is important. These days it’s growing harder to do that as people shorten their patience and attention span, and clinical examination is downrated in the list of things calling on patience and attention.

Be kind to time

This is as much a note to self as a general practice point. There are clear pure moments where the universe will sing to you if you will let it. Among them is the first time you meet a very sick patient and everyone else is paralysed, locked deep inside their head and running their wheel. That sort of moment is cleared of obstructions by psychology, others are cleared by social structure, such as the ward round in which you’re left for a precious five minutes talking to the patient rather than the computer. It was a surprise to me that other such moments exist outside of me, such as when someone else is eliciting responses and either through great art or holy ignorance is getting responses that I didn’t. Those times need to be protected as well, while not simply casting a blanket rule that everyone needs to take turns with the teddy to talk, including the patient.

Where to now?

Well. There’s so much here. I won’t be doing either of these in their entirety but I will be feeling reassured that I don’t need to measure lactate more often, or give more fluid. I’m still not sure what to do if someone else gives heaps of fluid or inodilators or asks for a higher MAP target. Do I carry it on? Stop it? Stop if side effects? Look harder for side effects? Ignore side effects unless reported by someone else?

If I were designing the next trial I would write out the expected proportion of each group who would be tested and exposed to the higher intensity intervention, I would assume some range of possible effect sizes for each intervention and I would simulate it extensively even to power for a frequentist ITT analysis. I wouldn’t include any interventions that couldn’t be somehow transparently modelled before starting, so that others could take the results and work better with them.

I would still try not to expose one group to more conditional interventions than the other. That makes the statement of results unnecessarily difficult, as above.
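The sort of simulation I mean can be sketched in a few lines. Everything here is hypothetical: the function names, the baseline risk of 0.40, the exposure proportions of 0.3 and 0.6, and the single risk ratio standing in for “the intervention” are all assumptions for illustration, and a real design would model each conditional intervention separately.

```python
import math
import random


def two_sided_p(deaths_a, n_a, deaths_b, n_b):
    """Two-sided p value from a pooled two-proportion z-test."""
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation, so no evidence of a difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (deaths_a / n_a - deaths_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_arm(n, base_risk, expose_prob, risk_ratio, rng):
    """Count deaths in one arm; only exposed patients get the risk ratio."""
    deaths = 0
    for _ in range(n):
        exposed = rng.random() < expose_prob
        risk = min(1.0, base_risk * risk_ratio) if exposed else base_risk
        deaths += rng.random() < risk
    return deaths


def itt_power(n_per_arm, base_risk, expose_a, expose_b, risk_ratio,
              n_sims=1000, alpha=0.05, seed=1):
    """Share of simulated trials whose ITT comparison has p < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        d_a = simulate_arm(n_per_arm, base_risk, expose_a, risk_ratio, rng)
        d_b = simulate_arm(n_per_arm, base_risk, expose_b, risk_ratio, rng)
        significant += two_sided_p(d_a, n_per_arm, d_b, n_per_arm) < alpha
    return significant / n_sims


# Hypothetical scenario: arm A escalates in 30% of patients, arm B in 60%,
# across a range of assumed effects of escalation on mortality.
for rr in (0.8, 1.0, 1.25):
    print(rr, itt_power(200, 0.40, 0.3, 0.6, rr))
```

Running a grid like this before recruitment shows how much of any ITT difference could be produced purely by unequal exposure to the conditional intervention, and how big the trial has to be to see it.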

Evidence based medicine questions

  • Was the actual intervention delivered well and easily described, including all “supplementary” differences as a result of the intervention and referred to in the protocol?
    • Yes. But the problem here was the conditional nature of much of the intervention, and subsequent inferential clouding.
  • Were the groups comparable, and how can this be ensured? Or is it not as important to ensure it in a RCT?
    • Yes. Block randomising and centre effects, plus the use of Cox proportional hazards for the primary outcome of time to death censored at day 28, both minimise difference in baseline and account for any remaining baseline imbalance, although the need for that is hotly debated.
  • How important is baseline balance to “comparability” and future inference?
    • Even hotterly debated. Any position is supported by some logic and has been taken by experts, from “critically important, you can’t trust results you haven’t done in your own population” to “the long run ensures that your decisions are only wrong alpha percent of the time as long as you strictly stick to your protocol interpretation”. My own is that you will be more efficient if you take some account of it and will get to the next question more surely.
  • What is the meaning of this 35% vs 43% mortality and a p value of 0.06?
    • Oh come on.

1. Hernández G, Ospina-Tascón GA, Damiani LP, et al. Effect of a Resuscitation Strategy Targeting Peripheral Perfusion Status vs Serum Lactate Levels on 28-Day Mortality Among Patients With Septic Shock: The ANDROMEDA-SHOCK Randomized Clinical Trial. JAMA. February 2019. doi:10.1001/jama.2019.0071

Journal club: It’s all gone quiet over there

The title seems fitting for my first post in approximately one eighth of a geological epoch, and it has a passing relationship to the paper [1] discussed by Lachie Inch on Easter Friday. Which itself is just the best sign of what a 24 hour, 365 day unit we really are. This turned out quite long.

This paper tells the unsurprising story that doctors don’t follow scripts or structure very well. In particular, the researchers described and found little evidence that doctors had used either checklisted inquiry about 7 categories of patient’s perceived values or “3 deliberative strategies” to elicit the right decision for a patient from their proxies: how the patient would feel about the state they’re in, how they would express their values if able and a clinician’s recommendation based on those values.

They used transcripts of conversations, which I always see as slightly weird and very biasing, and qualitative coding, where someone reads the transcript and says “yes, the strategy was used here” or not. The categories are things that we’d all want people to take into account at the end or indeed the middle or beginning of our lives: independence, burden of treatment (ickily juxtaposed with prolonged life), physical prowess, social network, cognitive function, living as long as possible, and spirituality. The gruelling coding was applied to 108 conversations and found that under 35% of them talked about each of these, with clinicians initiating or sustaining those components in fewer than 15% of conversations. These are quite confronting and while it’s not appropriate or necessary in every chat (you can often get to a clear message or consensus without asking about how they like their eggs), there are opportunities to know the people better for whom we care.

The EBM (and subject domain) questions were:

  • How is a cohort study defined?
    • I’d answer that it’s unsatisfactory in most of the literature. A cohort is defined most generally as a group that has characteristics defined before the outcome is available. This is usually expanded to include groups that are defined by characteristics that are measured before the outcome is available, in databases or registries where the outcome is available. There’s something amiss in the epistemology of that definition, where we pretend that we are wilfully ignoring information in order to define our cohort. It’s hard to do so practically and most retrospective “cohorts” have biases due to researcher degrees of freedom, where the case definitions, their boundaries and the exclusions are subtly influenced by having outcome information available.
  • What is different about the possible biases in a cohort study compared to a case-control or cross sectional study?
    • See above. Ideally cohorts represent the source population but that’s more often honoured in the breach if they’re retrospective. Representing the population is a powerful way to say that you are observing the causal effect of the characteristics that define the cohort. Cross sectional studies look at everyone, or a sample from everyone, who have either exposure or outcome, and correlate the two: their biases can be fundamental unless they are an essentially random sample from absolutely everyone (such as the Global Burden of Disease studies).
  • Do you have any comments on natural language processing or coders coding speech from a recording or transcript?
    • I just think it’s quite bias-prone. It ignores gestural or emotive cues and relies on the absolute attention, judgment and consistency of the human coder, and it is very much not scaleable so it relies on these tiny samples of 108.
  • What is our duty to the family’s views when we are trying to decide the right approach to end of life care?
    • This is partly a moral question: to be effective and faithful to the patient’s needs, we doctors need to have a relationship and a partnership with proxy decision makers. To be human, we need that relationship with those who love the patient. It’s also partly a law question and where you stand on that is mostly determined by your experience with the law and lawyers. Between those often conflicting spheres of influence – morality often defining what we see as right, and law what is permitted – lie a set of decisions that are good for the patient and the family. Then, having decided the approach or direction or whatever we call it, there are further technical or medical decisions about how it can be done, and we iterate morally and legally to yet further decisions about which of those are best.
    • I see our general duty in quite a Calvinist, or Stoic, way: to use our arts to do the right thing. Our duty to the family’s views is to hear them and honour them, even if not in acquiescence.

There’s been a lot of focus recently on goals of care in my workplace, and it’s probably overdue and probably useful. I went to a session recently that I’m arrogant enough to construe as more useful in the anthropology of the event than the actual learning or development. The audience were grindstone-apposed clinicians, such as consultants and Aboriginal liaison officers and social or pastoral care workers, and so the desires for the session were mostly quite unfocused and contained only a little jargon. There’s a grammar to the whole “communication” teaching world, and it’s a strong assumption in most of the teaching that better familiarity and facility with that grammar or jargon will improve some aspect of practice in the wild environment. I guess that is generally something I would not like to believe: there are indeed teachable principles of good communication but anyone who’s been in stressful situations with a wide variety of people can tell you that there are more ways than one to communicate well. I learn about this from patients and families, and my trainees and very occasionally – when we can get over the choking weight of snobbery and let each other listen, when we can let the hierarchy go and bare our fragility to each other, by the light of the blue moons when we can be at once brave and honest and stalwart yet short of certain – my colleagues.

I even learn from people telling me where I’ve done wrong: that most quickly and painfully and, oddly, in private as I lick my wounds and reflect on what malformed demons had me speaking so, rather than in simply accepting the guidance and integrating like a computer or a grownup. I think I might not be alone in that. I don’t really learn from people giving me a script or pretending to be a patient or attempting medium-fidelity simulation of emotions, because I’ve been around the block and I’ve seen mothers wail at their child’s passing and young men watch death come over the hill for them and young women pass under its shadow and come unprepared into the harsh brightness of survival and old women laugh in its nasty teeth and old men hug it tight with thanks. I and probably you and everyone you have seen stare into the middle distance when telling a story have seen this and been marked, and developed antennae about what people are doing and when they will listen, and lost an easy facility with jingles and scripts along the way. If we need anything it’s assurance and calm and space and the authority to take the situation and own it, consequences and all.

I’m going to go further and say that a normative approach to teaching communication will rub out quite a lot of inchoate but helpful reflexes in those taught, hopefully replacing them with other useful reflexes if done well. The net effect may be a gain or a loss but will be more familiar, and more assessable, and eventually, if everyone including the patients and their families do the same sort of communication training, will become the new secular culture and so of course it will work better. In a way, it will impose itself best on those incapable of understanding it. If it does so, the whole taught style and structure will need quite a lot of maintenance to keep it working. This may explain some of the results of communication teaching or interventions around specific, high pressure moments: their effects fade, and effects outside their immediate zone of influence are barely detectable [2].

So. This course. The clinicians wanted to learn about how to be nice yet stand their ground, or to get people to agree with their perception of the clinical course, or to crack the hard nuts who would not agree with the right thing. Or they wanted to hear that they were doing OK, or that there was some way out, or that there were people to hand it over to. That’s all good, and there were some things in it for all those needs, I suppose, if they hadn’t read very much either during or since an undergraduate medical degree. There were some startlingly effective uses of yarning which, as usual, hit me right in the feels: that way of circling a topic and somehow checking that everyone sees it, and that everyone has the same feeling about it and knows that we’re all in the same game, has a very modern appeal and is very ancient indeed. It’s not teachable, it’s coachable; it’s not a collection of microskills, it’s a mindset about the relationship; it’s not referrable to a template in the absence of shared worldviews, so it’s not assessable and saleable. There were some spectacularly missed cues, and a paternalistic attitude to ancient knowledge, and some lip service. There were lots of people wanting to do the right thing. A little more coaching and a little less teaching might have got a little further and more durably.

There was not an emphasis on the thing that I’ve seen as the most effective thing to give people for end of life conversations, which is moral courage. If you can face someone who is going to die and tell them practical things, and listen to their fears and let them be at least a little of themselves at the end of the day, then you will do them a great service. There are lots of neurolinguistic tricks and culturally specific shortcuts for each culture – which you should probably not try unless you are at home – that are probably oversold in courses like this unless they are unusually culturally aware. But what matters when faced with death, your own or someone else’s, is the courage to be honest.

1. Scheunemann LP, Ernecoff NC, Buddadhumaruk P, et al. Clinician-Family Communication About Patients’ Values and Preferences in Intensive Care Units. JAMA Internal Medicine. April 2019. doi:10.1001/jamainternmed.2019.0027

2. Carson SS, Cox CE, Wallenstein S, et al. Effect of Palliative Care–Led Meetings for Families of Patients With Chronic Critical Illness: A Randomized Clinical Trial. JAMA. 2016;316(1):51. doi:10.1001/jama.2016.8474

Journal Club: PPI for bleeding

In which Pawel Irisik examines a Big Dumb Trial.  This one [1] is exactly the sort of question that needs a BDT: proton pump inhibitors versus placebo in ICU patients. tl;dr they resulted in about half the low bleeding risk to which ICU patients are exposed given placebo, but no difference in any other outcome.

In most ICUs in OECD countries we give acid suppression to everybody, either proton pump inhibitors or H2 receptor antagonists.  This started in the old days to stop stress ulcers, giant erosive prostaglandin mediated things that covered the whole stomach, of which many bled and with which many died.

The background

Stress ulcers don’t seem to be around much any more, so it’s hard to lavish much resource on investigating their cause, but it was probably a general systemic illness effect, with deprivation of food an important contributor.  GI bleeding among ICU patients on PPIs is much rarer now than was GI bleeding among ICU patients on PPIs in days of stress ulcer yore.  So the question is, are PPIs as useful as they once seemed, or should the credit go to the overall less brutal ICU practice of the now?

With that hint that routine PPIs are part of the days of cardiac output optimisation and paralysis for “fighting the ventilator”, there are many great plans worldwide to answer the question, “is it better to give acid suppression or not?”. A little late, perhaps, as this trial found that 42% of patients are already on them at home. Still, blanket PPI coverage is a troublesome rock sticking out in the tide of best practice that forms modern ICU, so the question will be answered well.

Because it’s such a routine practice, the silky powers of the ANZICS CTG have been brought to bear, to point out the case for equipoise. Just riding out as an expert or a managerial style research capo, with a banner saying “PPI v placebo”, might not persuade many people that there really is a question.  After all, “We give PPIs to everyone.  We have no more stress ulcers. People used to die of that shit.  Have you run out of topics, you fool?”

The actual sciencey reasons have been WELL dissected by some of the most productive and best communicators of the ANZICS CTG, which means the whole topic is much better dealt with elsewhere. Apparently they’re associated with C difficile infection and myocardial infarction, for misty reasons. What interested me to put this on Journal Club is that very process of building equipoise. It’s one of the strengths of the CTG, convincing all kinds of ICUs that there is a real question, that we don’t know things and that doesn’t make us bad people, that we can answer it and that in doing so we will make things better for patients.  It’s nice.

There is a planned pantoprazole-v-placebo trial looking at GI bleeding, REVISE. That’s pausing for thought after this trial.  There is a planned pantoprazole-v-ranitidine cluster randomised trial using administrative data (the things that hospitals do to get paid for casemix-based funding in Australia, and to stop their chief executives from getting the sack in other places), PEPTIC.  That’s potentially going to change everything, as the first one that will really be able to be run and give a clear answer using hospital-scale resources.  The point is, there are people articulating clearly that there might be reasons to stop giving everybody PPIs.


Equipoise is poorly understood. I take it to mean that, extracting emotion and latent preferences and obvious heuristic biases, the person who has equipoise believes that several options may be as good as each other.  So one intervention might have more efficacy but less tolerability and so overall be less useful, if the only option is to give it to all the same patients.  Or one might be new; or one might be so old that the information on it might as well be new, because the data is all low quality; or there are significant reasons why inference from another population is not transferable (Heterogeneity of treatment effect, differential treatment effect, novel confounders, under-dispersed interactions …).  Overall, it means that the deciding person is comfortable with any of the options.

There are those who have a narrow definition of equipoise: one says that you must believe that the interventions are equal but simply lack information, another that you must have exactly equal belief that each one is superior and so on.  Because these are almost all entirely foolish, they mostly end up with people denying the existence of equipoise.  It’s the same sort of foolishness that expects doctors always to know the right treatment… or at least to say they do.  It’s an adversarial courtroom definition and it has no place in science.  It denies the possibility that a person can be truly, honestly prepared to be just wrong without defending the wrongness, and intending to come to truth not by an authority informing them of their wrongness but by inquiry and reopening the old facts in light of the new.

So I say personal clinical equipoise can be as far from the zenith as “I sort of prefer this, going on my biases, but I know how often everyone else is wrong when they go by their biases so I’ll ditch my ego and support the research for the sake of patients”. It’s the kind of equipoise in which you could offer orchidectomy patients the three options of prophylactic radiotherapy, prophylactic carb& gem, or watchful waiting as if it’s a normal choice of the kind that humans make about themselves.  See, it does exist. It’s the kind of equipoise by which a seasoned anti-Parenteral Nutrition activist might push for their unit to be involved in a supplemental PN trial because look how often people are wrong about nutrition (ICYMI, it’s every time they speak or write anything about it. Every time.). It’s a bit meridional for some, but it’s the way we learn how not to harm patients.

Community equipoise, on the other hand, is when there are camps of opinion where people genuinely believe that their preference is better, yet are happy to randomise to get the answer they expect.  How can this be OK? Surely if you have the tablets from the Mount you want to proselytise, not just keep them in your Holy? Well, apart from the fact that some people are happy to have others in error, that’s just not the way scientifically derived opinions are formed.  People who believe that they have the true knowledge and others are inferior are dangerous, because that is not a belief that can be tested, challenged, falsified or mitigated.  Again, it comes down to the personal responsibility to recognise when you might be wrong and hopefully our ICU community is grown up enough to display that.

Also, on a more jihadi note, when there are camps around certain practices but not a whole monolithic religion yet (say the Righteous Sect of Early Tracheostomy and the Noble Sect of Late), it might not be possible to have your chosen belief dominate, and a trial is the only way, pragmatically, that you will win. So community equipoise is not simply weak will and group think but a principled way to move your whole community to safer, more effective practice through research.

Intention to Treat

So, this is an intervention that is applied across the whole ICU population: you need to know what the effect is on the next admission. So you need an intention to treat analysis (which they did as their primary analysis; I’m just emphasising).

If there were potentially side effects that were dose dependent, or significant proportions of people were unlikely to get it despite being allocated, or heaven forfend if we were somehow capable of giving medicine differentially to the members of externally identifiable segments of the polity (cf Macpherson Report), then other analyses such as Per Protocol or Sensitivity might be more important.  It all depends on the intended use of the information, with a strong additional feature that ITT analysis is less susceptible to selection bias on the part of the analyst.

This paper

It’s a large, multicentre product of the Scandinavian Critical Care Trials Group, who as well as being apparently limitlessly pleasant in plenary and discussion, are also methodological ultras, so the randomisation scheme and follow up are presumably faultless and look so from the report.  3298 were randomised to 40mg pantoprazole daily or placebo until ICU discharge and 31.1% versus 30.4% died by day 90, with 2.5% and 4.2% having significant GI bleeding.  Oddly, only 20 of the 3298 withdrew entirely, which in light of the equipoise discussion shows people have a real desire to help to find the answer even if it’s not relevant to their own care.
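To put those headline percentages on a more usable scale, here is a minimal sketch of the arithmetic for absolute risk difference and number needed to treat. It uses only the point estimates quoted above; the function and variable names are my own, and it ignores confidence intervals entirely, so treat it as a back-of-envelope aid rather than a reanalysis.

```python
# Point estimates as quoted in the text above (proportions, not percentages).
bleed_ppi, bleed_placebo = 0.025, 0.042   # clinically significant GI bleeding
death_ppi, death_placebo = 0.311, 0.304   # death by day 90


def risk_difference(control, treated):
    """Absolute risk in the control arm minus absolute risk in the treated arm."""
    return control - treated


# Absolute risk reduction for bleeding, and its reciprocal (number needed to treat).
arr_bleed = risk_difference(bleed_placebo, bleed_ppi)
nnt_bleed = 1 / arr_bleed

# The same subtraction for mortality; a negative value means more deaths on pantoprazole.
rd_death = risk_difference(death_placebo, death_ppi)

print(f"Bleeding: ARR = {arr_bleed:.3f}, NNT = {nnt_bleed:.0f}")
print(f"Mortality: risk difference = {rd_death:.3f}")
```

On these point estimates you would treat roughly 59 patients to prevent one significant bleed, while the mortality difference is small and points the other way.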

I find it a little weird that they don’t report more about bleeding in the main paper, but table S8 in the supplements shows “overt” bleeding in 5.4 v 9.0%. “Significant” was bleeding associated with 20mmHg drop in BP, 2 unit drop in Hb, 2 unit transfusion or 20% increase in vasopressors; which is a little loose for most future purposes. Only 16/88 vs 28/148 had endoscopy for their overt bleed; the numerators may be the same or different for the significant bleeds, but the denominators are 41 and 69. Either way, that’s well outside of our usual or recommended practice. 3 v 5 had surgery and 2 v 4 endovascular haemostasis.  Oddly, although 14/41 vs 21/69 were ulcers or gastritis as expected, only 6/41 and 14/69 were “other”, which I suppose means all sorts of things but implies some in the placebo arm were not preventable by PPI.

Also, 32.5 versus 29.6% of patients had a transfusion over the 90 days.  Yes, more transfusions in the pantoprazole group. Maddeningly, there’s no total RBC unit count, just a median 0 and range 0 to 1 for each patient.  There was no association with C difficile or MI.

The EBM questions:

1. What is equipoise? And what is community equipoise? What are your duties as a doctor, to the institution and science and the patient, when your equipoise conflicts with someone else’s equipoise? Is it OK just not to “believe in high flow” for example?
2. And most importantly, when is a per protocol analysis appropriate, as opposed to an Intention to Treat analysis?

  1. Krag M, Marker S, Perner A, et al. Pantoprazole in Patients at Risk for Gastrointestinal Bleeding in the ICU. New England Journal of Medicine. October 2018. doi:10.1056/NEJMoa1714919

Journal Club: Blood tests for brain damage

Some Evidence Based Medicine questions for this episode are:

  • What is a “false positive” in the context of this sort of study / report?
  • How do you calculate / quote / control false positives?
  • What do the answers to the above questions mean for our decision to adopt or ignore a new test for something?


I recently read about the promising serum neurofilament light chain test, on twitter.  This might be a more common way to hear about promising papers in the future but for now it all seems a bit new and confusing.  The background is that the TTM trial [1] of two standard targets for temperature control after cardiac arrest had a null result: that is, if you attempt to limit further brain damage after cardiac arrest using cooling, then it doesn’t matter whether you target 33 or 36 degrees across the whole population.  While doing that trial many participants had blood taken to find out more about the biology of cardiac arrest. This was not quite part of the trial design: a trial is a closed question, with exacting design to produce an answer that you can understand. There were no calculations on power or error control, no preregistration, no cross validation, and nothing about the biobank was going to answer the trial question. What it does is provide standardised information on a well characterised cohort in whom at least one part of treatment definitely has an independent causal effect on outcome, and really that is worth an awful lot.

Mechanism is a really vital part of trial design and I think no trial should be publicly funded without having a clear discussion about whether the mechanisms behind it are clear. If they are not crystal clear then a trial is a very good setting to find out more.  I’d call the TTM version the entry level mechanism evaluation: blood samples were stored for later analysis, to characterise the biochemical changes over the first couple of days in a group of patients with a carefully measured clinical course. This might generously be called a study within a trial or SWAT [2] and as a fan of branding I like it except that they missed the golden opportunity to call it a trial-within-a-trial.

In this paper NLC is reported as predicting outcome very precisely: a single blood test at day 3 can tell you whether the patient has very severe or not very severe injury, without needing any other tests or indeed clinical examination, and is not wrong very often.  Great, sign us up!

Sometimes things are as good as they seem

This is biology, which is a messy science: there are all sorts of complicated things going on. Press button A and you don’t always get response Z. It is, though, a science. If something has a true relationship to something else then it will show up repeatedly.  In this case there are lots of molecules that appear in the CSF and blood of those with brain damage, and several of those were measured. For NLC, the level in CSF correlates closely with the amount of brain damage, and in serum slightly less well, but there was hope that it would be useful.

Sometimes they are not

If you suffer damage that will kill you, then your NLC may not always be the same number, or even above a certain threshold. This might be because the causes of death are other than brain damage, or the brain damage just doesn’t cause the predicted elevation of NLC, or anything else that is associated with both the predictor and outcome.  These “anything elses” are confounders if they modify the outcome both in the presence and in the absence of the predictor. If they have no effect alone, but alter the effect of the predictor on the outcome, then they are quite reasonably called effect modifiers or interactors. If none of these things are measured then you can still estimate the effect of the predictor on the outcome if the predictor is randomly assigned and you have enough outcomes; the magical power of randomisation again.

This study is a cohort and does not have the magical power of randomisation, because participants were not randomly assigned to any part of the things that are measured.  So the certainty attached to a certain strength of association depends on some things which are fairly difficult to model. This might have become fairly complex if there were a large effect of treatment assignment, for example if 33 were better than 36 in those with more than 10 minutes of CPR. As it is, even if there were a single hypothesis about NLC there are many ways that it could have been higher at some time point in those who were apparently doomed to die. It might have been because of the inevitable result of a certain injury severity, the unconscious and undetectable lag in treating those who are not doing so well (but were not certain to die), or conversely the effect of aggressive treatment for those who are not doing so well (and were harmed by the overall aggressive treatment), a preponderance of certain injury types in the group who eventually died which was not causally associated with death but was causally associated with NLC levels and so on. Add a few other molecules from a few different parts of the neuron and the chance that a positive finding is not as informative as it seems is fairly high.

I’m not sure, after reading it a few times, whether they used a standard train-and-test approach. In that approach you would select some of the cohort at random and sequester them somewhere without looking at them until you had a hypothesis to test.  You would play around with the remainder until you had some hypotheses, then commit your analysis plan for the test set before running it, just so that you had less chance of being fooled.  There are other ways to do it and this one is probably very conservative, but if you have found a new biosignal that is as transformative as the twitter press, then you will not lose it in a 30% test sample. In this paper I don’t think they did so; splitting was used more as a resampling technique to determine the uncertainty around the estimates of the information that each molecule added to the prediction of outcomes. Like a bootstrap [3].
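The two ideas in that paragraph can be sketched side by side. This is a minimal illustration with invented function names and a made-up list of numbers. It is not the method of the paper. The first function sequesters a random 30% of a cohort; the second resamples with replacement to put a rough percentile interval around an estimate.

```python
import random
import statistics


def holdout_split(ids, test_fraction=0.3, seed=0):
    """Shuffle the cohort once and set aside a test set that is not looked at yet."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def bootstrap_interval(values, stat=statistics.mean, n_resamples=2000, seed=0):
    """Percentile bootstrap: recompute the statistic on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return lower, upper


# Hypothetical usage: 100 participant ids and ten made-up measurements.
train_ids, test_ids = holdout_split(range(100))
low, high = bootstrap_interval([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(len(train_ids), len(test_ids), low, high)
```

The split answers "would this hypothesis survive data I have not seen?", whereas the bootstrap only answers "how wobbly is this estimate in the data I have?", which is why the second cannot stand in for the first.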

What are the ROCs?

Any test is a combination of data and a rule.  Otherwise it’s not a test.  For example, if you were to use the 97.5th centile of the value in normal adult male humans as the cutoff above which everyone is assumed to have the disease you’re looking for, then no test would work very well. So there is some rule applied to the data that generates a “positive test”. The Receiver Operating Characteristic curve joins the dots of sensitivity and specificity of the whole test under various versions of the rule. Hopefully the identical data should produce unique dots for every chosen level of sensitivity or specificity, so choosing several rules produces a single line on the curve. For example, above a certain very high threshold of NLC nobody is alive and so the specificity of (NLC>threshold) is 100% for that threshold. Below that threshold, however, there are a lot of people who died anyway, and so the sensitivity is quite low.  With a lower threshold some will still survive despite the high predictor and so the specificity is lower. And so on until the dots can be joined.

Different data, such as a different molecule or the addition of clinical examination, give different qualities of prediction and the most predictive model can be seen graphically as being further from the line of no information.
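Joining the dots can be done by hand. The sketch below is illustrative only: the NLC values and outcomes are invented, the function names are mine, and the rule is simply "positive if the value is at or above the threshold". It produces one dot per threshold and then the area under the joined line by the trapezoid rule.

```python
def roc_points(values, died):
    """One (1 - specificity, sensitivity) dot per threshold rule 'value >= t'."""
    n_died = sum(died)
    n_survived = len(died) - n_died
    points = [(0.0, 0.0)]  # a threshold above every value: nobody tests positive
    for t in sorted(set(values), reverse=True):
        true_pos = sum(1 for v, d in zip(values, died) if v >= t and d)
        false_pos = sum(1 for v, d in zip(values, died) if v >= t and not d)
        points.append((false_pos / n_survived, true_pos / n_died))
    return points


def area_under_curve(points):
    """Trapezoid rule over the joined dots; 0.5 is the line of no information."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up data: ten patients, a biomarker value each, and whether they died (1) or not (0).
nlc = [5, 12, 20, 35, 50, 80, 150, 400, 900, 2000]
died = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

dots = roc_points(nlc, died)
print(dots)
print(area_under_curve(dots))
```

In this toy cohort the four highest thresholds give dots with 100% specificity and steadily rising sensitivity, which is the pattern described above, and the area comes out at about 0.92.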

This is related to the other information metric used in the report, the Akaike Information Criterion. There are long good descriptions of this, and short dud ones.  Here’s a short dud one: it’s a way to pick the simplest model that best fits the data.
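A slightly less dud version, still short: the usual form is AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximised likelihood, and the model with the lowest value is preferred. The sketch below uses hypothetical log-likelihoods for two imaginary models; none of the numbers come from the paper.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: a penalty for parameters minus a reward for fit."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical models: the richer one fits slightly better but spends three more parameters.
simple_model = aic(log_likelihood=-120.0, n_parameters=2)
richer_model = aic(log_likelihood=-118.5, n_parameters=5)

preferred = "simple" if simple_model < richer_model else "richer"
print(simple_model, richer_model, preferred)
```

Here the small gain in fit does not pay for the extra parameters, so the simpler model wins.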

So more work is needed, yeah?

Yeah.  Sorry.  This is a training set, really.


  1. Nielsen N, Wetterslev J, Cronberg T, et al. Targeted Temperature Management at 33°C versus 36°C after Cardiac Arrest. New England Journal of Medicine. 2013;369(23):2197-2206. doi:10.1056/NEJMoa1310519
  2. Treweek S, Bevan S, Bower P, et al. Trial Forge Guidance 1: what is a Study Within A Trial (SWAT)? Trials. 2018;19(1). doi:10.1186/s13063-018-2535-5
  3. Efron B, Diaconis P. Computer-Intensive Methods in Statistics. Scientific American. 1983;248(5):116-130.

The quolls

Quolls are fast, aggressive, intelligent but tragically short lived marsupial predators. Like a GURPS Uplift client that spent too much on their attributes and had to cover the cost with the Short-Lived flaw, they have little tribal opportunity to pass techniques on to subsequent generations.

They are hence distinct from those who perform Qualitative Research, which is

Research involving the studied use
of empirical materials such as case
studies, personal experience, life
stories, interviews, observations,
and cultural texts

National Statement 2018 [1]

Although again Wikipedia is a bit more nuanced and both less bombastic and more specific, where the National Statement would leave the uninitiated wondering what the hell all other research was.

Qualitative research is a scientific method of observation to gather non-numerical data.

The dustbin of opinions[2]

These Quals are measurably slower, but equally aggressive and intelligent; I have no empirical data on their lifespan but they are considered predators (unpublished work)[3] and have a cultivated ability to pass on techniques, albeit in modified forms and subject to considerable schismatism. Their techniques involve things as apparently simple as the semi-structured interview – structured interviews or surveys being not qualitative because they produce results in closed categories – and as abstruse and resilient to paraphrase as Rural Participatory Action Research.

I am obviously from a different cultural hinterland from the Quals.  Much of what is called research is what I would previously have considered to be essay writing or journalism, whether reporting or editorialising. Sometimes causal statements are made on the basis of observations that while gruelling might equally be described as casual. Some qualitative research reaches a high orbit of empowerment and partnership without being destructive or colonial upon any viewpoint, while most of it only wants to persuade you that it does so.  But past all that, the defining drawback of qualitative research for me is that its reports are very long. I would almost rather hear the individual accounts than their interpretation by the researcher.

In saying these things I may be endangering my relationships with several researchers whose work and views I respect, and of course I’m being mostly flippant.  There are activities that are called qualitative research which descend from a venerable and redoubtable tradition of allowing the powerless to speak, or which question how we ask fundamental questions. To the extent to which these are problematic, it’s due to their need to cosy into the biomedical machine and be presented as the same sort of endeavour as mechanistic laboratory research, clinical trials, hypothesis-driven epidemiology and economic health service evaluation.

There are also activities that would be quite recognisable as feed-in topics but understandably don’t want to be subsidiarised to the eventual numbers, logic, deductive-centred project to take all the glory, funds and oxygen. Establishing the rationale for a question, checking whether there is equipoise for a trial, addressing the needs of excluded participants, building a community of inquiry and so on, are the sort of groundwork that has always accompanied any good mainstream project, but now they have their own rules and will probably be done better and for more respect and reward.

This week’s journal club article covers that ground and was the prompt for this apparently ill-judged but in truth simply curmudgeonly rant.  I always find it hard to review these.  In the end I have adopted a pretty stripped back approach to reading qualitative research: look at the data type first (what sort of information is being gathered, and how is it being processed before being interpreted?). Then you can move on to seeing where the biases might be, and so deciding whether you trust the results.


National Health and Medical Research Council. National Statement on Ethical Conduct in Human Research (2007) (Updated May 2015). (2011). Available at: (Accessed: 22nd August 2018)
In publication, personal opinion, anonymous.

Journal club: The quolls show empathy


John Floridis has nobly taken the bait and agreed to review a Qualitative Research article for Journal Club this coming Monday.  This is possibly less brave than it seems but still gets the points for Gryffindor.  I have a couple of “qualitative projects” on the bubble and I work in a broadly “qualitative-friendly” unit, but the overall perception of Intensive Care is an intimidating wall of data, like the Matrix with added hurt.  This is obviously not true, as you see from attending any unit meeting where expressions of human sympathy and raw emotion are increasingly common and are things that I applaud.  To that end, here is a report of audio recorded formal family conversations (or care conferences) in paediatric intensive care, designed to relate expressions of empathy to the subsequent course of the conversation[1].


The techniques, being unfamiliar, may take some reading. Audio recordings were analysed: in this setting, analysis is often designed after becoming familiar with the data.  This is obviously prone to certain cognitive biases on the part of the researcher and a clear description goes some way but as far as I can tell not all the way to  showing the readers, who are really the users of the information, whether they can trust the conclusions.  In this case the qualitative part was fairly circumscribed.  The care conferences were of a formal and scheduled sort, and consent was obtained before they started from the families and the physicians. The recordings were coded for the presence and type of certain features of the conversation only: whether a statement of empathy was made and what sort it was from a closed list [2], whether it was buried in the middle of a jargon cascade or unburied (allowed to be heard by itself), and what happened next in the conversation.

The techniques listed include “direct content analysis” and “investigator triangulation”, which can be looked up. The important thing is that these involved judgment on the part of those who decided what to code and how to code them. This judgment is the qualitative part.  Then those codes were treated as categorical characteristics of the conversation and analysed for their association with each other, using descriptive statistics and a hypothesis test, in this case the chi squared test. Does this p value mean what it seems to mean? As usual, it depends what you started with and if you just view it as another statement of how unlikely the apparent correlations are, if there is in truth no correlation, then it is harmless and maybe helps. The smaller point I’d like to slip in is that, as demonstrated by this p-value, all research contains qualitative elements.  The decision to apply a Pearson chi squared test to abstracted new parameters from these conversations is an entirely qualitative one which was not reported according to the qualitative research guidelines, because it is out of scope.  This sort of thing is more discussed in the context of data science. Check out the NSSdeviations podcast for wrangling with it IRL or the Book of Why which is trendy at the moment for a philosophical approach to causal inference, or a million other solutions in rpubs.
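For anyone who wants to see what the chi squared test is doing to those codes, here is a minimal sketch in plain Python. The 2x2 counts are invented for illustration (rows could be buried versus unburied statements, columns two kinds of family response) and are not the paper's data. The p-value shortcut used here only holds for one degree of freedom.

```python
import math


def pearson_chi_squared(table):
    """Sum of (observed - expected)^2 / expected, with expected counts from the margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


def p_value_one_df(statistic):
    """Upper tail of a chi squared distribution with exactly one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Made-up counts, not taken from the study.
counts = [[30, 10],
          [20, 20]]
stat, dof = pearson_chi_squared(counts)
print(stat, dof, p_value_one_df(stat))
```

Every number that goes into that table was produced by a human deciding what counts as "buried", which is the point being made above: the arithmetic is mechanical, the inputs are not.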

The questions for personal growth and life satisfaction are:

What are the indicators of study quality in qualitative research?

When is qualitative research an appropriate use of time versus simply doing something such as a practice improvement?

  1. October TW, Dizon ZB, Arnold RM, Rosenberg AR. Characteristics of Physician Empathetic Statements During Pediatric Intensive Care Conferences With Family Members: A Qualitative Study. JAMA Network Open. 2018;1(3):e180351. doi:10.1001/jamanetworkopen.2018.0351
  2. There are apparently five sorts of empathetic statement.  I am shocked that the JAMA subeditors didn’t detect that the authors used a “pneumonic” to classify these as naming, understanding, respecting, supporting, or exploring.  This NURSE mnemonic seems a sensible and accepted list but it’s sort of not qualitative but unordered categorical.

Journal Club: preventative dexmedetomidine

Delayed again, here is Journal Club from the 10th of August. I have not given up on the back catalogue, I’ve just placed it on notice.  This one [1] was interesting to me because it seems to add to the drip of evidence [2, 3] overturning my prior belief that dexmedetomidine is just a sedative, usefully limited to a low maximum dose and slow titration by pharmacokinetics, that permits more human treatment of patients.
Jilly Cooper did a good job of fighting through the many voices to present the clinical evidence of this paper.  It has lots of good qualities: double blind, permuted block randomisation to the treatment or placebo and tightly protocolised delivery, outcomes measured using widely accepted standard instruments, several outcomes measured which adds to “internal validity”.
It has some bad qualities: a convenience sample (I made a little squeak when I read this. A convenience sample, published in the Blue Journal? My honour!), two centres, an unusual patient population not currently suffering from delirium 2/5 of whom were not currently sedated despite being invasively ventilated.
It has some neutral qualities: among these I would put the intervention, a fixed low dose alpha agonist overnight after switching off any other sedatives.  There were strong views and raised voices about the intervention but this is my show and I’ll therefore have the last word: although I wouldn’t consider that recipe, despite reading quite a lot of alpha-fu over the last couple of months I don’t pretend to have The Wisdom when it comes to dexmedetomidine. The multiple measures are probably a neutral feature. Although it allows a rich description of the phenotype achieved by each patient there is no way that information would really be used in this trial. The multilevel (~=clustered / correlated / autocorrelated) structure of the data across all these measures is not easy to deal with and the temptation to overturn a primary trial result would be very strong. It was probably a fishing expedition for future trials or future use of the instruments themselves and as such would raise suspicions at our ethics committee because of the burden on participants and the high likelihood of wasted data.
The bottom line is: it seemed to work, although only at about day 7 before which the lines on the failure-time graph are pretty close.  After that they are not close.
The validity question came up again and so I’ve written a post about validity because I weary of this.  In summary it says that a trial conducted like this gives you information about the total effect of the intervention [4, 5]  and there are limited legitimate ways to discard the data: bias (limited here), chance (for which you need to know stats-fu) and differential treatment effect between the studied patients and any patients you might see (rarely a legitimate attack, but included for completeness [4]).  My view is that this trial gives more information about the biological effect of dexmedetomidine. Using this and other evidence it seems that I was wrong and that the effect is to reduce the probability of delirium in those at risk and to shorten its duration and reduce its severity.
Finally, the questions for personal development:
When you see a Kaplan-Meier curve how do you personally interpret it? Now quickly read, I don’t know, Wikipedia on survival analysis for about 5 minutes. Was your intuition correct or are there subtle problems with a naive reading of the lines?
When an intervention seems to work, as here, but is given in a strange way, as here, the temptation is to give it either in your own way (unsupported by evidence) or in the strange way (unfamiliar and prone to novel errors). How do you approach this question and how do you frame the discussions with the people who will help you decide?
What is meta-analysis and how would it weight this study? Not judge it, just weight it.
  1. Skrobik, Y., Duprey, M. S., Hill, N. S. & Devlin, J. W. Low-Dose Nocturnal Dexmedetomidine Prevents ICU Delirium. A Randomized, Placebo-controlled Trial. American Journal of Respiratory and Critical Care Medicine 197, 1147–1156 (2018).
  2. Riker, R. R. et al. Dexmedetomidine vs midazolam for sedation of critically ill patients: a randomized trial. JAMA 301, 489–499 (2009).
  3. Reade, M. C. et al. Effect of Dexmedetomidine Added to Standard Care on Ventilator-Free Time in Patients With Agitated Delirium: A Randomized Clinical Trial. JAMA 315, 1460 (2016).
  4. Hayes, R. J. & Bennett, S. Simple sample size calculation for cluster-randomized trials. Int. J. Epidemiol. 28, 319–326 (1999).


Validity is a rubbish word.  It is debased coin, so clipped and filed and recast that it barely contains any silver.  There are alternatives that mean more about the problems or solutions in reading evidence, and there is no certainty that your interlocutor means the same as you by the word Validity.

The Oxford English Dictionary calls it

the quality of being logically or factually sound

(which is it, OED? Logically or factually? Because they are friends but not twins). Merriam-Webster adds


which is even worse. Wikipedia is probably more nuanced but a better working definition of a non-working concept simply reveals its flaws and the length of the disambiguation page is a good guide to the structural ambiguity of the topic.

My Epidemiology notes from the MSc reflect the ambiguity, with some statistical definitions and some philosophical ones.  The statistical ones are operational and clear, but probably don’t mean what you think they mean (and are not all the same), and the philosophical ones are glossy fakes, signifying nothing useable. Statistical validity might mean “the correlation of a quantity’s measure with its true value”, which is fine as long as you have the true value to hand or are assuming that the true value is asymptotically estimable. If this is not the case, and you then question the “validity” of something, then you are making a philosophical statement.  If I’m being kind to its users, as I believe one should always be kind to those who know not what they do, I might accept that it is a useful container word, which can suggest a whole universe of other meanings without immediately specifying them but indicating that the conversation is about to take that turn.  A lower quality turn, to be sure, but it’s good to have a warning.

Philosophical validity looks clean if you never have to use, hear or see it again.

“If the groups are comparable they have internal validity”

“if the conclusions are applicable to other populations it has external validity”

Here, validity might mean the extent to which people believe something or can use something to work with other things (face validity or construct validity are commonly used tokens). This is circular, which is fine as long as you’re not trying to make a logical argument.  Or it might be external validity, which is something to do with the similarity of a situation to another situation and would be an entirely insufficient expression to introduce the nuanced and uncertain conversation about identities and high dimensional differences, even if there were no other meanings of the word validity.  It might mean representation validity, a combination of the effect that a theory or description or translated word or token of communication has on the perception of someone who is exposed to it and the closeness of that effect to some defined effect; now this is obviously nonsense and not to get too circular, has a validity that depends on each of the participants in the exchange.

I have almost finished but my special spite is reserved for internal validity which is a sort of slur that can be applied to any report, protocol, discussion, explanation or even idea that you don’t like and seems to need no further explanation of what are the problems and whether they can be solved.  It’s the lazy science fight equivalent of dropping your pint, the precursor to putting the head on your opponent (which you can spot in your next science fight by somebody saying “that’s been disproven”, or in Australia, “that’s controversial”).

What can replace these catchy container words?

  1. If you mean “similar” then say it. Similarly with “the same”, or more mathematical or logical concepts such as “correlated”, “predicted” and so on. You will have to do some explaining about their dimension and degree of similarity but that is all to the good.
  2. You might prefer to say something more emotive such as “agree”, for translations or opinions.  This might be liberating. I don’t think there is such a thing as universal objectivity, but you might disagree.  Yes that’s a joke.
  3. If you hate something then either come clean or haud your wheesht. Although we don’t talk about science fights I can reveal that they are more fun when they are conducted in simple language about high concepts.
  4. If you are talking about bias then you need to start talking about the sources, effects and any corrections. I have come to the tentative conclusion that while bias is officially not correctable, it might be possible to use techniques or outside data to infer things even in the presence of bias. Maybe that makes it “not bias any more”, maybe we need to be more elastic about what bias is: that’s a topic for another time.

Happy validating.

Bicarbonate for metabolic acidosis

I am an acronym connoisseur. I have as much disdain as anyone for the contrived acronym (Levosimendan to Reduce Mortality in High Risk Cardiac Surgery Patients: A Multicenter Randomized Controlled Trial does not acronymise* to CHEETAH), but on the other hand a well chosen one, or a narrowly missed one, can be a source of simple joy. When designing a trial name, I do wonder what determines the need for an acronym. Is it a thing that some people demand? Perhaps there’s one devotee at the table who sits there bursting with mirth until she’s allowed to unleash the choice marketing ploy, or maybe it’s just a matter of a checklist. Either way, someone dropped a bollock with acronymising this trial of bicarb for metabolic acidosis in the ICU [1]:


* this is a word, obviously.

8.4% sodium bicarbonate, let us catechise, is a molar solution, that is, one containing one thousand millimoles per litre of both sodium and bicarbonate. That makes it hypertonic. This trial used half that strength, giving 250 ml (one eighth mole!) over 30 minutes, to a maximum of 1000 ml (500 mmol!!!) over 24 hours. Such a blast of sodium increases various things, including myocardial contractility, no matter what state your body is in, simply because it increases the half cell potential for sodium. Then, as bicarbonate converts to carbon dioxide, it raises the CO2 concentration, which is then breathed out, leaving a little more water and a little more hydroxide paired with the sodium. The effect of infusing the sodium bicarbonate is modest until you can breathe out the carbon dioxide: the difference between the amount of hydrogen ions that would be available in the serum alone and the amount that are now available because of the passage of carbons out of bicarbonate and into carbon dioxide is very small, as you see if you infuse molar sodium bicarbonate into a paralysed person and add dead space until the end tidal carbon dioxide is constant: the pH judders a little but doesn’t move a lot. Thankfully, keeping the ventilation the same, or titrating to an end tidal CO2, causes the sodium hydroxide to accumulate pretty quickly, the intracellular CO2 not to rise and the overall situation to turn to alkalosis.
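
For anyone who wants to check the dose arithmetic, here is a minimal sketch. It assumes a molar mass for sodium bicarbonate of roughly 84 g/mol and reads a percentage strength as grams per 100 ml; the function names are mine, made up for illustration, and nothing here comes from the trial protocol.

```python
# Sketch of the dose arithmetic; names are illustrative, not from the trial.
NAHCO3_MOLAR_MASS_G_PER_MOL = 84.0  # assumption: approximate molar mass of NaHCO3


def mmol_per_litre(percent_w_v: float) -> float:
    """Convert a % w/v strength (g per 100 ml) to mmol/l of sodium (and of bicarbonate)."""
    grams_per_litre = percent_w_v * 10.0
    return grams_per_litre / NAHCO3_MOLAR_MASS_G_PER_MOL * 1000.0


def dose_mmol(percent_w_v: float, volume_ml: float) -> float:
    """Millimoles delivered by a given volume at a given strength."""
    return mmol_per_litre(percent_w_v) * volume_ml / 1000.0


if __name__ == "__main__":
    print(mmol_per_litre(8.4))     # full strength: about one mole per litre
    print(dose_mmol(4.2, 250.0))   # one 250 ml bolus at half strength
    print(dose_mmol(4.2, 1000.0))  # the 24-hour maximum of 1000 ml
```

Under those assumptions the half-strength bolus comes out at about an eighth of a mole and the daily maximum at about half a mole, which matches the figures in the text.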

An alternative, as in [2], is to use sodium lactate which does very similar things to the half cell potentials but additionally causes vasodilatation. But that’s nothing to do with this trial.

This was a parallel group randomised controlled trial powered for an absolute reduction in mortality of 15%, or 30 fewer deaths in the intervention arm of 200 patients than in the control arm of 200 patients, an odds ratio of 0.52 if the relevant risks of death were 45% and 30% as assumed. To reiterate every other time I’ve presented such things, that’s not a treatment, that’s a cure. Antibiotics have that sort of effect. Not much else does, and it’s a little odd to expect a solution that’s been in and out of favour for many years to have that effect. Adding to my surprise are the inclusion criteria, which in our ICU carry a mortality of roughly 20% (pH <7.2 with HCO3 <20 and PaCO2 <45) and which would not currently be a threshold to make any of us think of bicarbonate.
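
The design numbers above can be reproduced in a few lines. This is only a sketch of the arithmetic, using the assumed risks of 45% and 30% and the arm size of 200 quoted in the text; the helper names are hypothetical and it is not a power calculation.

```python
# Sketch: odds ratio and deaths avoided implied by the design assumptions.
def odds(risk: float) -> float:
    """Convert a probability of death into odds."""
    return risk / (1.0 - risk)


def odds_ratio(risk_treated: float, risk_control: float) -> float:
    """Odds of death on treatment divided by odds of death on control."""
    return odds(risk_treated) / odds(risk_control)


CONTROL_RISK = 0.45  # assumed control arm mortality, from the text
TREATED_RISK = 0.30  # assumed intervention arm mortality, from the text
ARM_SIZE = 200       # patients per arm, from the text

design_odds_ratio = odds_ratio(TREATED_RISK, CONTROL_RISK)
deaths_avoided = (CONTROL_RISK - TREATED_RISK) * ARM_SIZE

if __name__ == "__main__":
    print(round(design_odds_ratio, 2))  # odds ratio the trial was powered for
    print(round(deaths_avoided))        # fewer deaths expected in the intervention arm
```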

So it’s an optimistically powered trial of an extended indication for a physiologically titrated treatment with little evidence in its current restricted use. That doesn’t sound promising.  And so another layer of stun* hit me when it showed a clear signal for improved survival in the intervention group, albeit in the setting of 55% control group mortality, and an odds ratio of 0.7ish for death within 28 days. From the KM curves, all the separation has happened by day 3, but persists forever (or 28 days, which is forever as far as funding is concerned). From the tables, it cures renal failure while leaving everything else the same except for the prespecified outcomes.

The patients were clearly sicker than the threshold, with a pH of 7.15 and bicarbonate of 13, just about where most of my colleagues would consider using bicarbonate. The difference between control and intervention was greater in the prespecified subgroup (out of 400?!) who had renal failure, which is again a surprise. If some of your acidosis is due to renal damage causing inability to regenerate bicarbonate, rather than sickness severity causing excess anions, then you are generally less sick; and so the effect of a treatment would usually be less assuming it has a constant effect across all categories. That effect, known as risk magnification, means that the greatest reduction in absolute risk is found in the middle of the distribution of baseline risk for any treatment that has a constant and uniform effect as measured by the odds ratio. It’s yet another thing that is sometimes called Heterogeneity of Treatment Effect, which is actually getting overused as a term and is thereby losing its own meaning [3].
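
Risk magnification is easier to see with numbers. The sketch below assumes a treatment whose odds ratio is the same at every baseline risk, and borrows 0.52 (the design odds ratio mentioned earlier) purely as an illustrative value; the function names are mine.

```python
# Sketch: absolute risk reduction across baseline risks for a constant odds ratio.
def treated_risk(baseline_risk: float, odds_ratio: float) -> float:
    """Risk on treatment when the baseline odds are multiplied by odds_ratio."""
    treated_odds = baseline_risk / (1.0 - baseline_risk) * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def absolute_risk_reduction(baseline_risk: float, odds_ratio: float) -> float:
    """Baseline risk minus risk on treatment."""
    return baseline_risk - treated_risk(baseline_risk, odds_ratio)


ILLUSTRATIVE_OR = 0.52  # assumption: reused from the design figures, for illustration

# Scan baseline risks from 1% to 99% and find where the absolute benefit peaks.
baseline_risks = [i / 100 for i in range(1, 100)]
best_baseline = max(
    baseline_risks,
    key=lambda p: absolute_risk_reduction(p, ILLUSTRATIVE_OR),
)

if __name__ == "__main__":
    for p in (0.10, 0.50, 0.90):
        print(p, round(absolute_risk_reduction(p, ILLUSTRATIVE_OR), 3))
    print("largest absolute benefit near baseline risk", best_baseline)
```

With the same odds ratio everywhere, the absolute benefit is small at both low and high baseline risk and largest somewhere in the middle, which is why a bigger effect in a supposedly less sick subgroup is a surprise.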

Overall I can’t find any other reasons for this to have had such an effect, and yet I would still like to see a confirmatory trial before I start rewriting anything. The sample size was a little light, the outcomes were strangely distributed, the analysis coincidentally fell over the Bright Line of p=0.05 after adjustment, the interweaving analyses raise more questions than they answer, and there is not enough longitudinal, intermediary, response-measuring detail to explain the overall observed effect; but it has a little biological plausibility and there is no signal for harm. I’ll probably use bicarbonate a little more, I just won’t be busting it out for everyone with a pH of 7.2.

* also a word. Deal.


1. Jaber S, Paugam C, Futier E, et al. Sodium bicarbonate therapy for patients with severe metabolic acidaemia in the intensive care unit (BICAR-ICU): a multicentre, open-label, randomised controlled, phase 3 trial. The Lancet. 2018 (accessed June 16, 2018).
2. Ichai C, Payen J-F, Orban J-C, Quintard H, Roth H, Legrand R, et al. Half-molar sodium lactate infusion to prevent intracranial hypertensive episodes in severe traumatic brain injured patients: a randomized controlled trial. Intensive Care Medicine. 2013 Aug;39(8):1413–22.
3. Kent DM, Rothwell PM, Ioannidis JP, Altman DG, Hayward RA. Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal. Trials. 2010;11(1):85.