Gerard E. Dallal, Ph.D.
Asking the Right Question
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. -- John W. Tukey (1962), "The future of data analysis", Annals of Mathematical Statistics 33, 1-67 (page 13).
Chu-chih [Gutei] Raises One Finger : The Gateless Barrier Case 3
Whenever Chu-chih was asked a question, he simply raised one finger. One day a visitor asked Chu-chih's attendant what his master preached. The boy raised a finger. Hearing of this, Chu-chih cut off the boy's finger with a knife. As he ran from the room, screaming with pain, Chu-chih called to him. When he turned his head, Chu-chih raised a finger. The boy was suddenly enlightened.
When Chu-chih was about to die, he said to his assembled monks: "I received this one-finger Zen from T'ien-lung. I used it all my life but never used it up." With this he entered his eternal rest.
Here is the argument for Intention-To-Treat analysis in its purest form:
At the conclusion of a research study, researchers will often argue for the exclusion of subjects who haven't followed the protocol. At first blush, this may seem reasonable, especially in a randomized, controlled trial where the expectation is that randomized controls will ensure the validity of the study. However, if dropouts and nonadherent subjects are ignored, there is the possibility that bias will be introduced or, to put it in a less technical way with a bit of hyperbole
Consider two weight loss diets, one of which is effective while the other isn't.
In response to this concern, it is now commonplace, if not standard practice, to see study sponsors and funding agencies specify that study data be subjected to an Intention-To-Treat (ITT) analysis with "followup and case ascertainment continued regardless of whether participants continued in the trial". Regardless means
ITT requires investigators to do everything they can to ensure that their data are complete, especially with regard to the primary outcome measure.
If you think about a nagging question long enough, say 25 years, one of two things will happen:
I've finally reached state (2) with Intention-To-Treat analysis. It's not
that I have anything new to say this time around. It's more that I now see
what I've been saying all along! As a trip to the
Wayback Machine demonstrates, there has been one constant throughout this
entire series of notes, which I quote from the earliest version captured, back
The proper approach is to ignore labels, understand the
research question, and perform the proper analysis whatever it's
So, here it is in a nutshell:
Intention-To-Treat Analysis, as it
is often used, is a FRAUD!
I was somewhat hesitant when I first wrote those words. Hyperbole is one thing, but I wondered if this was a bit over the top. I don't wonder anymore.
The reason ITT is a fraud and the reason I no longer wonder whether my language is too strong is simple. Investigators start out with what seems like a straightforward question: Does something work? The investigators will be told that they cannot simply look at everyone who followed the rules of the study. The Intention-To-Treat principle says that they must consider everyone who took part in the study, regardless of adherence! However, the advocates of Intention-To-Treat often fail to acknowledge that this changes the research question! The Intention-To-Treat analysis is often presented as the proper way to evaluate the original question and therein lies the FRAUD.
Here's an example: Subjects are put on a diet. Everyone who sticks with the diet achieves the desired effect. However, as is often the case with diet studies, people stop following the diet for one reason or another. Let's suppose that half of the participants stop following the diet and that none of those who stop achieves the desired effect. There are two ways to view the data:
Both of these statements are true and contain valuable information, but they answer two different questions!
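The two views can be put side by side with a few lines of arithmetic. This is only a sketch: the cohort size of 100 and the names `Subject`, `success_rate`, `adherence_answer`, and `assignment_answer` are assumptions made for illustration, not part of any real study.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    adhered: bool   # stayed on the diet
    success: bool   # achieved the desired effect


def success_rate(subjects):
    """Fraction of the given subjects who achieved the desired effect."""
    subjects = list(subjects)
    return sum(s.success for s in subjects) / len(subjects)


# Hypothetical cohort of 100: half adhere and all of them succeed;
# half stop following the diet and none of them succeed.
cohort = [Subject(True, True)] * 50 + [Subject(False, False)] * 50

# Question of adherence: what happens to people who follow the diet?
adherence_answer = success_rate(s for s in cohort if s.adhered)

# Question of assignment: what happens to people who are put on the diet?
assignment_answer = success_rate(cohort)

print(f"Success among those who followed the diet: {adherence_answer:.0%}")
print(f"Success among everyone put on the diet:    {assignment_answer:.0%}")
```

The same data yield 100% and 50%. Neither number is wrong; each answers its own question.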
The FRAUD occurs when the answer to the question of assignment is given as though it were the answer to the question of adherence! This is invariably what happens when emphasis is placed on choosing between types of analyses rather than the questions they answer. The problems go away if those who insist on ITT analysis would recognize that they've changed the question under investigation.
Put this way, it's hard to get worked up over ITT because it now becomes so...lame. ITT proponents argue why an ITT analysis is better ("Here's why you do an ITT analysis..."), but once we recognize that an ITT analysis changes the research question, we find ourselves discussing not why one analysis is preferred to another but why the research question should be changed! Keeping in mind the quotation from John Tukey that opens this note, it becomes incumbent on those who would insist on an ITT analysis to either
There are cases where an ITT type of analysis is appropriate because the original question is one of assignment. (What happens if you give someone this stuff?) Such questions demand an ITT type of analysis. In those cases, I require and perform analyses that would be called Intention-To-Treat...but not because they are called Intention-To-Treat. Rather, it's because that's what the research question demands.
Let's start with some terminology. There are generally two broad types of analyses:
In previous versions of this note, I made the mistake of buying into the ITT proponents' characterization of the issue in terms of which analysis should be performed. They then had me fussing over which type of analysis was "better". But, when the start of the discussion changes from changing the method of analysis to changing the research question, it becomes almost too ludicrous to continue. Your hypothesis is your hypothesis!
We can (and should!) talk about why a per protocol analysis might have problems and whether the appropriate inferences can be drawn, but that has nothing to do with whether an Intention-To-Treat analysis is appropriate for the question at hand.
The discussion of Intention-To-Treat is often complicated by the early introduction of issues involving missing data. It's natural to want to do this. ITT analyses often involve missing data because subjects drop out of the study or are lost to followup. However, missing data complicate the discussion and have the potential to confuse the issues. Subjects can be nonadherent without having any of their data missing, so we'll begin by assuming that the data are complete and only adherence is at issue. The topic of Missing Data will be discussed in its own section later.
With that out of the way...
There are four major ways in which proponents of intention-to-treat analysis claim ITT analysis is sound where PP analysis can be faulty (Actually, I'm being kind. Typically, ITT proponents argue why ITT is the proper form of analysis, but it can't be because it answers a different question from the PP analysis.):
Dealing with questionable outcomes and guarding against conscious or unconscious introductions of bias
Paul Meier (of Kaplan-Meier fame), then of the University of Chicago, once offered an example involving a subject in a heart disease study where there is a question of whether his death should be counted against his treatment or set aside. The subject disappeared after falling off his boat. He had been observed carrying two six-packs of beer on board before setting off alone. Meier argues that most researchers would set this event aside as unrelated to the treatment, while intention-to-treat would require the death be counted against the treatment. But suppose, Meier continues, that the beer is eventually recovered and every can is unopened. Intention-to-treat does the right thing in any case. By treating all events the same way, deaths unrelated to treatment should be equally likely to occur in all groups and the worst that can happen is that the treatment effects will be watered down by the occasional, randomly occurring outcome unrelated to treatment. If we pick and choose which events should count, we risk introducing bias into our estimates of treatment effects.
Guarding against informative dropouts
This was illustrated by the introductory example involving two weight loss diets, where the effective diet looked worse than it really was because the only subjects following the ineffective diet who remained in the study were those losing weight. ITT would demand the inclusion of everyone who started on the diet.
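A small simulation shows how informative dropout can distort a comparison of completers. Everything numeric here is an assumption chosen for illustration: the mean losses, the standard deviation, and the rule that subjects on the ineffective diet stay only if they happen to be losing at least 2 kg. In keeping with the complete-data assumption, the final weights of those who quit are taken to be known.

```python
import random
from statistics import mean

random.seed(1)
N = 1000  # subjects per diet (assumed)

# Assumed true weight losses in kg (positive = weight lost).
effective = [random.gauss(5.0, 3.0) for _ in range(N)]    # diet that works
ineffective = [random.gauss(0.0, 3.0) for _ in range(N)]  # diet that does not

# Assumed dropout rule: on the ineffective diet, only subjects who happen
# to be losing at least 2 kg stay in the study.
ineffective_completers = [x for x in ineffective if x >= 2.0]

# Comparison using only those who remained on the ineffective diet.
completers_diff = mean(effective) - mean(ineffective_completers)

# Comparison using everyone who started each diet.
everyone_diff = mean(effective) - mean(ineffective)

print(f"Advantage of effective diet, completers only: {completers_diff:.1f} kg")
print(f"Advantage of effective diet, everyone:        {everyone_diff:.1f} kg")
```

Under these assumptions the completers-only comparison badly understates the advantage of the effective diet, because the ineffective diet is represented only by the subjects who were losing weight anyway.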
Preserving baseline comparability between treatment groups achieved by randomization.
There have been studies where outcome was unrelated to treatment but was related to adherence. That is, success was determined not by the treatment the subject was given, but by how well the subject adhered to instructions, whatever they were. In many cases, potentially nonadherent subjects may be more likely to quit a particular treatment. For example, a nonadherent subject might be more likely to quit when assigned to strenuous exercise than to stretching exercises. In a per protocol or on treatment analysis, the balance in adherence achieved at baseline will be lost and the resulting bias might make one of two equivalent treatments appear to be better than it truly is simply because one group of subjects is, on the whole, more adherent.
In the spirit of Paul Meier's example, consider a study in which severely ill subjects are randomly assigned to surgery or drug therapy. There will be early deaths in both groups. It would be tempting to exclude the early deaths of those in the surgery group who died before getting the surgery on the grounds that they never got the surgery. However, those who died prior to surgery were presumably among the least healthy subjects. Excluding them has the effect of making the drug therapy group much less healthy, on average, at baseline than the surgery group that remains.
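The loss of balance can be sketched with a simulation in which the treatments are equivalent by construction. The trait called `conscientiousness`, the quitting thresholds, and the sample size are all hypothetical; the only point is that outcome depends on the subject, not the treatment, while quitting depends on both.

```python
import random
from statistics import mean

random.seed(3)
N = 2000  # subjects per arm (assumed)


def simulate_arm(quit_threshold):
    """Return (outcome, completed) pairs. The treatment itself has no effect."""
    arm = []
    for _ in range(N):
        conscientiousness = random.gauss(0.0, 1.0)             # hypothetical adherence trait
        outcome = conscientiousness + random.gauss(0.0, 1.0)   # depends on the trait only
        completed = conscientiousness > quit_threshold         # less adherent subjects quit
        arm.append((outcome, completed))
    return arm


strenuous = simulate_arm(quit_threshold=0.0)    # assumed: the less adherent half quits
stretching = simulate_arm(quit_threshold=-2.0)  # assumed: almost nobody quits


def per_protocol_mean(arm):
    return mean(y for y, done in arm if done)


def assigned_mean(arm):
    return mean(y for y, _ in arm)


pp_diff = per_protocol_mean(strenuous) - per_protocol_mean(stretching)
itt_diff = assigned_mean(strenuous) - assigned_mean(stretching)

print(f"Per protocol difference (strenuous - stretching): {pp_diff:.2f}")
print(f"As-assigned difference (strenuous - stretching):  {itt_diff:.2f}")
```

In this sketch the per protocol comparison favors strenuous exercise even though neither treatment does anything, while the as-assigned comparison stays near zero.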
Sometimes, what appears to be a problem with maintaining baseline comparability is something quite different. The real issue with the medication/surgery example is recognizing that the treatment is not only what happens during and after the surgical procedure. It includes what happens during the time spent waiting for the procedure to take place! Those subjects who died awaiting surgery might have survived if they were given the medication immediately.
Reflecting performance in the population
Intention-to-treat analysis is said to be more realistic because it reflects what might be observed in actual clinical practice. In practice, patients may not adhere, they may change treatments, they may die accidentally. ITT factors this into its analysis. It answers the public health question of what happens when a recommendation is made to the general public and the public decides how to implement it. The results of an intention-to-treat analysis can be quite different from the treatment effect observed when adherence is perfect.
If we keep in mind that the proponents of ITT are asking us to consider a different research question, their claims regarding the problems of conducting a valid per protocol study are true to a degree (* break my heart; good science isn't easy!). If all you want to do is find out what happens when subjects are assigned to treatment, then all you have to do is assign them to treatment. Then, it's just a matter of collecting the data and the question is answered. (Or is it? I think not! More about this later.)
However, researchers are often NOT interested in the assignment question. They care about the adherence question only. The two questions are NOT the same. Also, I believe that the arguments against per protocol analyses are not as strong as the ITT proponents suggest.
Let's consider once again the four supposed advantages of ITT after noting that in the absence of bias, the only thing ITT does is add noise to the signal from a PP analysis. If there's a moderate signal coming out of the PP analysis, an ITT analysis sprinkles on some random non-adherent subjects to attenuate it.
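The attenuation is easy to see in a simulation with no bias at all. The effect size, the 60% adherence rate, and the rule that non-adherent subjects get no benefit are assumptions made for this sketch; adherence is decided by a coin flip so that it is unrelated to outcome.

```python
import random
from statistics import mean

random.seed(2)
N = 2000            # subjects per group (assumed)
TRUE_EFFECT = 4.0   # assumed benefit for a subject who actually takes the treatment
ADHERENCE = 0.6     # assumed fraction of the treatment group that adheres


def outcome(gets_benefit):
    """Outcome with unit-variance noise around 0 or around TRUE_EFFECT."""
    return random.gauss(TRUE_EFFECT if gets_benefit else 0.0, 1.0)


control = [outcome(False) for _ in range(N)]

# Adherence is a coin flip, independent of outcome, so no bias is introduced.
adhered = [random.random() < ADHERENCE for _ in range(N)]
treated = [outcome(a) for a in adhered]

# PP: adherent treated subjects versus controls.
pp_effect = mean(y for y, a in zip(treated, adhered) if a) - mean(control)

# ITT: everyone assigned to treatment versus controls.
itt_effect = mean(treated) - mean(control)

print(f"PP estimate:  {pp_effect:.2f}")
print(f"ITT estimate: {itt_effect:.2f}")
```

Under these assumptions the ITT estimate lands near the PP estimate multiplied by the adherence rate (roughly 4.0 × 0.6 = 2.4), which is the "sprinkling" of non-adherent subjects at work.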
This strikes me as a weak argument. In Meier's example, the decision to exclude could be made by someone blinded to treatment. ITT resolves the issue by including everyone and everything so that any noise would affect all treatments equally. Blinded decisions can often do much the same to eliminate bias. The major difference is that ITT includes the noise while blinded assessment tries to exclude it.
This is true. Dropouts and nonadherence will always be a concern because one can never be certain of what biases they might introduce. Completers and noncompleters can be examined to see whether they differ in any identifiable way, while the treatments themselves can be examined for differential dropout rates. While it would be encouraging to have such checks come up negative, they are not a guarantee that no bias was introduced. But, no one ever said good science was easy.
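One of the checks mentioned above, comparing dropout rates between treatments, might look like the following. The dropout counts are hypothetical, and the pooled two-proportion z-test is just one common choice for the comparison, not a method prescribed by this note.

```python
from math import erfc, sqrt


def two_proportion_z(drop_a, n_a, drop_b, n_b):
    """Pooled two-proportion z-test for a difference in dropout rates.

    Returns the z statistic and the two-sided p-value from the
    normal approximation.
    """
    p_a, p_b = drop_a / n_a, drop_b / n_b
    pooled = (drop_a + drop_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 12 of 100 dropped out on treatment A, 30 of 100 on B.
z, p_value = two_proportion_z(12, 100, 30, 100)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

A small p-value flags differential dropout worth investigating. A large one is encouraging but, as noted, is no guarantee that the dropouts were noninformative.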
However, in nutrition studies, almost all dropouts are noninformative, that is, unrelated to the outcome. My colleagues study treatments that are easily tolerated (eat this, drink that, take this pill). In addition, our volunteers are extremely health and diet conscious. When there are dropouts, it is invariably because of loss of interest, moving out of the area, changing jobs, or a change in family situation. The only effect of an intention-to-treat analysis in such cases is to add noise to the data.
In many cases, baseline comparability can be preserved by using statistical adjustments. Adjusting for adherence can eliminate some confounding due to differential dropout. However, this presumes we know what to adjust for, which is not always the case.
There is a bit of truth to this claim. However, it is more complicated than it first appears. Adherence during a trial might be quite different from adherence once a treatment has been proven effective. In such cases, ITT will NOT reflect what will happen in practice. We see time and again where a health claim is reported and the next thing you know it seems as though everyone is acting on it. The explosion in the consumption of soy and blueberries and the avoidance of hormone replacement therapy are just a few recent examples. People could always have done this, but they chose not to until presented with some preliminary evidence that is often weak and contradictory. There is no reason to think that changes in attitude would be any less dramatic once treatments have actually been shown to be effective.
Imagine a controlled trial where some participants take a small pill daily while others are required to undergo a daily series of uncomfortable injections. It almost goes without saying that the dropout rate in the injection group will vastly outpace that of the pill group. Because of the high dropout rate, an Intention-To-Treat analysis is not likely to show the superiority of injections even if they are effective while the pill is not.
That's okay, ITT's proponents would argue. What's the point of a superior injection if no one will inject? Yet somewhere between 300,000 and 500,000 insulin-dependent diabetics put themselves through this discomfort daily because they KNOW the benefits of taking insulin! Knowing that something works can often be a great motivator.
There may be cases where an intention-to-treat analysis will truly reflect the way the treatments will behave in practice because adherence during the trial will reflect adherence after the treatment is proven effective. I have been told that this is true in the field of mental health. I suspect this is the rare exception rather than the rule.
There are some circumstances that demand an intention-to-treat type of analysis. If the question is, "What happens once a treatment is started or recommended?" subjects must be followed once a treatment is started or recommended, regardless of what else happens. This is typical of the studies I see as a member of my institution's Scientific Review Committee. Most involve comparing two medical treatments. The research question is invariably whether subjects starting on one treatment fare better than subjects starting on another. I invariably insist that an ITT analysis be performed because a health care provider needs to know what happens when subjects are prescribed (started on) a particular treatment. ITT is the appropriate form of analysis because it is dictated by the research question!*
What I object to is not the analysis but the way the term "Intention-To-Treat" is often used as a magical incantation
When used this way--without thinking about the underlying research question and the proper way to answer it--"Intention-To-Treat Analysis" is no different from the way Chu-chih's assistant (mis)used his master's one-finger Zen.
Whenever I evaluate a study, I don't care one bit what the investigators call the analysis. I invariably examine the study to learn what question prompted the research and assure myself that the particular analysis is appropriate for providing the answer. I ask myself "What is the research question?" and perform the proper analysis whatever it's called. Sometimes I perform both an intention-to-treat analysis and an on treatment analysis, using the results from the different analyses to answer different research questions.
David Salsburg once asked what to do about an intention-to-treat analysis if at the end of a trial it was learned that everyone assigned treatment A was given treatment B and vice-versa. I got to live his joke. In a placebo-controlled vitamin E supplementation study conducted in nursing homes, the packager delivered the pills just as the trial was scheduled to start. Treatments were given to the first few dozen subjects. As part of the protocol, random samples of the packaged pills were analyzed to ensure the vitamin E did not lose potency during packaging. We discovered the pills were mislabeled--E as placebo and placebo as E. Since this was discovered a few weeks into the trial and no one had received refills, there was no possibility of anyone receiving something different from what was originally dispensed. We relabeled existing stores properly and I switched the assignment codes for those who had already been given pills to reflect what they actually received. How shall I handle the intention-to-treat analysis?
This slip-up aside, this is an interesting study because it argues both for and against an ITT type of analysis to answer the adherence question.
The bottom line is that an ITT type of analysis may be appropriate in some cases, but it's not a magic charm. A good analyst performs an ITT analysis not to do ITT but because the analysis demanded by the research question just happens to be ITT. The analysis would be performed whether or not there were something called ITT. The one good thing about ITT is that it forced some people to think about the issues behind the recommendation, but I believe the good is more than offset by thoughtless use.
ITT is typically used as an umbrella to cover two distinct issues--adherence and missing data. They are distinct because even subjects who drop out may return for final measurements, while even adherent subjects may be missing data.
Common sense says, "The data are missing! How can they possibly be filled in?!" Common sense is right! They can't!
Many imputation methods have been suggested by some very bright people, but there's nothing much that can be done without making lots of critical unverifiable assumptions.
Consider a longitudinal study that some subjects fail to complete. To keep things simple, think of a weight loss study. ITT says that the investigator should do everything possible to persuade the subjects to return for their final weighing. This is in keeping with the "how people will follow the recommendation" function of ITT analyses. In this case, ITT would be correct, not because it's ITT, but rather because it is in keeping with the goals of the study.
What about missing data? Some subjects will have dropped out and refused to have their final measurements taken, while the investigator may have lost contact with others. ITT mandates that we "fill in the blanks". The final values should be replaced with a "best guess". But, how?!
All of the approaches are merely different ways of forecasting what the final measurement might have been, and analyzing the data with those imputed values. Those who believe in imputation say that it is often good practice to try more than one way to "fill in the blanks" to see whether conclusions change with the different methods.
I find imputation, like ITT, to be smoke and mirrors. Imputation is another kind of forecasting. If it could be done reliably, those who claim to be able to do it would have made a fortune in the stock market and retired ages ago. It hasn't happened. It is easy to show that any imputation technique may hurt rather than help depending on the circumstances, by constructing examples like the weight loss study. Consider the four methods listed above:
No matter how fancy the acronym or how elegant/confusing the mathematics, the bottom line remains the same: Subjects dropped out! Data are missing! There is no reason for any of these approaches to work other than an assumption that they will, which is no reason at all.
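A constructed weight loss example makes the point concrete. The two imputation rules used here, last observation carried forward and baseline carried forward, are chosen only because they are simple to state; they are an assumption of this sketch and not necessarily the methods discussed above. The four subjects and both "truths" are invented.

```python
from statistics import mean

# Hypothetical weight losses in kg at (midpoint, final visit).
# None marks a final value that is missing because the subject dropped out.
observed = [(3.0, 6.0), (2.5, 5.0), (4.0, None), (3.5, None)]


def locf(mid, final):
    """Last observation carried forward: reuse the midpoint value."""
    return mid if final is None else final


def bocf(mid, final):
    """Baseline carried forward: assume no change from baseline."""
    return 0.0 if final is None else final


locf_mean = mean(locf(m, f) for m, f in observed)
bocf_mean = mean(bocf(m, f) for m, f in observed)

# Two equally unverifiable possibilities for what really happened to the dropouts.
truth_if_regained = mean([6.0, 5.0, 0.0, 0.0])   # dropouts regained everything
truth_if_kept_going = mean([6.0, 5.0, 8.0, 7.0]) # dropouts kept losing on their own

print(f"LOCF estimate: {locf_mean}")
print(f"BOCF estimate: {bocf_mean}")
print(f"Truth if dropouts regained:   {truth_if_regained}")
print(f"Truth if dropouts kept going: {truth_if_kept_going}")
```

Each rule is right under one story and wrong under the other, and the observed data cannot say which story is true.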
In summary, the Intention-To-Treat approach is a tool. In some circumstances, it may be the right tool, but a slavish devotion to ITT is as bad as a slavish devotion to any other approach or method. One size does not fit all. The proper approach, as I've asserted since the earliest version of this note, is to ignore labels, understand the research question, and perform the proper analysis whatever it's called!
There remains one question begging to be asked, so I'll ask it: If missing data and the elimination of nonadherent subjects bias a per protocol analysis and ITT is not the panacea some make it out to be, what happens when both approaches are suspect? The answer is simple, if unpleasant: Who knows? Not every problem is amenable to a solution that can be summarized in a catch-phrase. If enough data are missing or enough subjects are lost to followup, the results may be suspect whatever one does. Situations like this can only be handled with great care and attention to detail on a case-by-case basis.