Intention-to-Treat Analysis

Gerard E. Dallal, Ph.D.

Asking the Right Question
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. -- John W. Tukey (1962), "The future of data analysis", Annals of Mathematical Statistics 33, 1-67 (page 13).

Chu-chih [Gutei] Raises One Finger : The Gateless Barrier Case 3
Whenever Chu-chih was asked a question, he simply raised one finger. One day a visitor asked Chu-chih's attendant what his master preached. The boy raised a finger. Hearing of this, Chu-chih cut off the boy's finger with a knife. As he ran from the room, screaming with pain, Chu-chih called to him. When he turned his head, Chu-chih raised a finger. The boy was suddenly enlightened.
When Chu-chih was about to die, he said to his assembled monks: "I received this one-finger Zen from T'ien-lung. I used it all my life but never used it up." With this he entered his eternal rest.

In the beginning...

Here is the argument for Intention-To-Treat analysis in its purest form:

At the conclusion of a research study, researchers will often argue for the exclusion of subjects who haven't followed the protocol. At first blush, this may seem reasonable, especially in a randomized, controlled trial where the expectation is that randomized controls will ensure the validity of the study. However, if dropouts and nonadherent subjects are ignored, there is the possibility that bias will be introduced or, to put it in a less technical way with a bit of hyperbole:

Everyone loses weight in a weight loss study!

Consider two weight loss diets, one of which is effective while the other isn't.

People on the effective diet lose weight and stay in the study.
On the ineffective diet, only those who happen to lose weight stay in the study; the rest drop out.

This will make the ineffective diet look better than it really is--and, by comparison, the effective diet look worse than it really is--because the only subjects who remain in the study following the ineffective diet are those losing weight!
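The mechanism can be sketched with a small simulation. Everything numeric here is an assumption made for illustration: the true mean losses (5 kg and 0 kg), the spread, and the rule that anyone who has not lost weight drops out. In a simulation the dropouts' final weights are known, which is exactly what a real study has to work to obtain.

```python
import random
from statistics import mean


def simulate_diet(true_mean_loss, n=10_000, sd=3.0, seed=0):
    """Return (mean loss of everyone assigned, mean loss of completers), in kg.

    Assumed dropout rule (hypothetical): a subject whose weight loss is
    not positive leaves the study; everyone else stays.
    """
    rng = random.Random(seed)
    losses = [rng.gauss(true_mean_loss, sd) for _ in range(n)]
    completers = [loss for loss in losses if loss > 0]
    return mean(losses), mean(completers)


effective_all, effective_completers = simulate_diet(true_mean_loss=5.0, seed=1)
ineffective_all, ineffective_completers = simulate_diet(true_mean_loss=0.0, seed=2)

print(f"Everyone assigned: {effective_all:.2f} vs {ineffective_all:.2f} kg")
print(f"Completers only:   {effective_completers:.2f} vs {ineffective_completers:.2f} kg")
```

Under these assumptions the ineffective diet shows a clear average loss among completers even though its true effect is zero, and the gap between the two diets shrinks when only completers are compared.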

In response to this concern, it is now commonplace, if not standard practice, to see study sponsors and funding agencies specify that study data be subjected to an Intention-To-Treat (ITT) analysis with "followup and case ascertainment continued regardless of whether participants continued in the trial". Regardless means

regardless of adherence,
regardless of change in regimens,
regardless of the reason for withdrawal [accidental death is death]...

A popular phrase used to describe ITT analyses is "Analyze as randomized!" Once subjects are randomized, their data must be used for the ITT analysis! There are some exceptions. Many researchers would exclude subjects who are randomized but drop out before starting treatment IF the time between randomization and starting treatment is the same for all treatments in a blinded study. For example, it would be acceptable to ignore subjects who withdrew prior to the visit at which they first received one of two apparently identical pills.

ITT requires investigators to do everything they can to ensure that their data are complete, especially with regard to the primary outcome measure.

Subjects who are nonadherent or who even stop following the protocol continue to be followed. This may include cajoling and pleading with dropouts (insofar as the Institutional Review Board will allow) to return so that final outcome measurements can be made.
Values that cannot be obtained directly are estimated.

--- Thus spake the proponents of Intention-To-Treat

Prologue

If you think about a nagging question long enough, say 25 years, one of three things will happen:

You'll stop thinking about it and get on with your life.
Your head will continue to hurt.
In a blinding flash of insight, the Gordian knot will be unloosed, the problem will melt away, the answer will be clear, and you'll wonder what the fuss was all about.

I've finally reached the last of these states with Intention-To-Treat analysis. It's not that I have anything new to say this time around. It's more that I now see what I've been saying all along! As a trip to the Wayback Machine demonstrates, there has been one constant throughout this entire series of notes, which I quote from the earliest version captured, back in 2002:

The proper approach is to ignore labels, understand the research question, and perform the proper analysis whatever it's called!

So, here it is in a nutshell:

Intention-To-Treat Analysis, as it is often used, is a FRAUD!

I was somewhat hesitant when I first wrote those words. Hyperbole is one thing, but I wondered if this was a bit over the top. I don't wonder anymore.

The reason ITT is a fraud and the reason I no longer wonder whether my language is too strong is simple. Investigators start out with what seems like a straightforward question: Does something work? The investigators will be told that they cannot simply look at everyone who followed the rules of the study. The Intention-To-Treat principle says that they must consider everyone who took part in the study, regardless of adherence! However, the advocates of Intention-To-Treat often fail to acknowledge that this changes the research question! The Intention-To-Treat analysis is often presented as the proper way to evaluate the original question and therein lies the FRAUD.

Here's an example: Subjects are put on a diet. Everyone who sticks with the diet achieves the desired effect. However, as is often the case with diet studies, people stop following the diet for one reason or another. Let's suppose that half of the participants stop following the diet and that none of those who stop achieves the desired effect. There are two ways to view the data:

The diet is 100% effective for those who stick with it. (The effect of adherence.)
Only 50% of those attempting the diet achieve the desired effect. (The effect of assignment.)

Both of these statements are true and contain valuable information, but they answer two different questions!

The first statement describes what happens to people who adhere to the diet.
The second statement describes what happens to people who are placed on the diet.
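The two figures come from the same data with different denominators. A minimal sketch of the arithmetic, assuming a hypothetical trial of 100 subjects (the example gives only the proportions):

```python
n_assigned = 100                 # hypothetical trial size
n_adherent = n_assigned // 2     # half of the participants stay on the diet
successes_adherent = n_adherent  # everyone who sticks with the diet succeeds
successes_nonadherent = 0        # no one who stops achieves the effect

# Question of adherence: what happens to people who follow the diet?
effect_of_adherence = successes_adherent / n_adherent

# Question of assignment: what happens to people who are placed on the diet?
effect_of_assignment = (successes_adherent + successes_nonadherent) / n_assigned

print(f"Adherence:  {effect_of_adherence:.0%}")   # 100%
print(f"Assignment: {effect_of_assignment:.0%}")  # 50%
```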

The FRAUD occurs when the answer to the question of assignment is given as though it were the answer to the question of adherence! This is invariably what happens when emphasis is placed on choosing between types of analyses rather than the questions they answer. The problems go away if those who insist on ITT analysis would recognize that they've changed the question under investigation.

Put this way, it's hard to get worked up over ITT because it now becomes so...lame. ITT proponents argue why an ITT analysis is better ("Here's why you do an ITT analysis..."), but once we recognize that an ITT analysis changes the research question, we find ourselves discussing not why one analysis is preferred to another but why the research question should be changed! Keeping in mind the quotation from John Tukey that opens this note, it becomes incumbent on those who would insist on an ITT analysis to either

explain
- why the original adherence question cannot be answered properly with data from the adherent subjects and
- why the assignment question approximates the adherence question well enough to let it stand in its place, or
acknowledge that they are changing the question because they feel the adherence question cannot be answered!

There are cases where an ITT type of analysis is appropriate because the original question is one of assignment. (What happens if you give someone this stuff?) Such questions demand an ITT type of analysis. In those cases, I require and perform analyses that would be called Intention-To-Treat...but not because they are called Intention-To-Treat. Rather, it's because that's what the research question demands.

Intention-To-Treat

Let's start with some terminology. There are generally two broad types of analyses:

Per Protocol in which only data from adherent subjects are analyzed.
Intention-To-Treat (ITT) in which all subjects are followed regardless of adherence.

In previous versions of this note, I made the mistake of buying into the ITT proponents' characterization of the issue in terms of which analysis should be performed. They then had me fussing over which type of analysis was "better". But, when the start of the discussion changes from changing the method of analysis to changing the research question, it becomes almost too ludicrous to continue. Your hypothesis is your hypothesis!

We can (and should!) talk about why a per protocol analysis might have problems and whether the appropriate inferences can be drawn, but that has nothing to do with whether an Intention-To-Treat analysis is appropriate for the question at hand.

The discussion of Intention-To-Treat is often complicated by the early introduction of issues involving missing data. It's natural to want to do this. ITT analyses often involve missing data because subjects drop out of the study or are lost to followup. However, missing data complicate the discussion and have the potential to confuse the issues. Subjects can be nonadherent without having any of their data missing, so we'll begin by assuming that the data are complete and only adherence is at issue. The topic of Missing Data will be discussed in its own section later.

With that out of the way...

There are four major ways in which proponents of intention-to-treat analysis claim ITT analysis is sound where per protocol (PP) analysis can be faulty (actually, I'm being kind; typically, ITT proponents argue why ITT is the proper form of analysis, but it can't be, because it answers a different question from the PP analysis):

Intention-to-treat simplifies the task of dealing with suspicious outcomes, that is, it guards against conscious or unconscious attempts to influence the results of the study by excluding odd outcomes.
Intention-to-treat guards against bias introduced when dropping out is related to the outcome.
Intention-to-treat preserves the baseline comparability between treatment groups achieved by randomization.
Intention-to-treat reflects the way treatments will perform in the population by ignoring adherence when the data are analyzed.

Dealing with questionable outcomes and guarding against conscious or unconscious introductions of bias

Paul Meier (of Kaplan-Meier fame), then of the University of Chicago, once offered an example involving a subject in a heart disease study where there is a question of whether his death should be counted against his treatment or set aside. The subject disappeared after falling off his boat. He had been observed carrying two six-packs of beer on board before setting off alone. Meier argues that most researchers would set this event aside as unrelated to the treatment, while intention-to-treat would require the death be counted against the treatment. But suppose, Meier continues, that the beer is eventually recovered and every can is unopened. Intention-to-treat does the right thing in any case. By treating all events the same way, deaths unrelated to treatment should be equally likely to occur in all groups and the worst that can happen is that the treatment effects will be watered down by the occasional, randomly occurring outcome unrelated to treatment. If we pick and choose which events should count, we risk introducing bias into our estimates of treatment effects.
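The "watering down" can be made concrete with a little arithmetic. The rates below are invented for illustration and do not come from Meier's example; the sketch also assumes that unrelated events strike both groups with the same probability and independently of the treatment-related events.

```python
def total_event_probability(p_related, p_unrelated):
    """Probability of at least one event, assuming the two kinds are independent."""
    return 1 - (1 - p_related) * (1 - p_unrelated)


# Hypothetical treatment-related death rates.
p_treated, p_control = 0.10, 0.20
# Hypothetical rate of deaths unrelated to treatment, equal in both groups.
p_unrelated = 0.05

risk_ratio_related_only = p_treated / p_control
risk_ratio_all_events = (total_event_probability(p_treated, p_unrelated)
                         / total_event_probability(p_control, p_unrelated))

print(f"Risk ratio, treatment-related deaths only: {risk_ratio_related_only:.2f}")
print(f"Risk ratio, all deaths counted:            {risk_ratio_all_events:.2f}")
```

With these numbers the risk ratio moves from 0.50 to about 0.60: closer to 1, but still favoring the treatment. Counting every event dilutes the estimate without changing its direction, whereas picking and choosing events has no such guarantee.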

Guarding against informative dropouts

This was illustrated by the introductory example involving two weight loss diets, where the effective diet looked worse than it really was because the only subjects following the ineffective diet who remained in the study were those losing weight. ITT would demand the inclusion of everyone who started on the diet.

Preserving baseline comparability between treatment groups achieved by randomization.

There have been studies where outcome was unrelated to treatment but was related to adherence. That is, success was determined not by the treatment the subject was given, but by how well the subject adhered to instructions, whatever they were. In many cases, potentially nonadherent subjects may be more likely to quit a particular treatment. For example, a nonadherent subject might be more likely to quit when assigned to strenuous exercise than to stretching exercises. In a per protocol or on treatment analysis, the balance in adherence achieved at baseline will be lost and the resulting bias might make one of two equivalent treatments appear to be better than it truly is simply because one group of subjects is, on the whole, more adherent.
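A simulation sketch of this situation follows. The setup is entirely hypothetical: each subject has an "adherence" trait between 0 and 1, the outcome depends only on that trait plus noise (there is no treatment effect at all), and the quitting thresholds for the two arms are invented to make the strenuous arm lose far more of its less adherent subjects.

```python
import random
from statistics import mean


def simulate(n_per_arm=5000, seed=0):
    rng = random.Random(seed)
    # Assumed thresholds: subjects whose adherence trait falls below the
    # threshold quit the arm they were assigned to.
    quit_below = {"strenuous": 0.5, "stretching": 0.1}
    itt, pp = {}, {}
    for arm, threshold in quit_below.items():
        everyone, stayers = [], []
        for _ in range(n_per_arm):
            adherence = rng.random()
            # Outcome depends on adherence only; no treatment term.
            outcome = 10 * adherence + rng.gauss(0, 1)
            everyone.append(outcome)
            if adherence >= threshold:
                stayers.append(outcome)
        itt[arm] = mean(everyone)   # analyzed as randomized
        pp[arm] = mean(stayers)     # only those who stayed on treatment
    return itt, pp


itt, pp = simulate()
itt_difference = itt["strenuous"] - itt["stretching"]
pp_difference = pp["strenuous"] - pp["stretching"]
print(f"As randomized: difference = {itt_difference:.2f}")
print(f"Per protocol:  difference = {pp_difference:.2f}")
```

Under these assumptions the as-randomized comparison shows essentially no difference, while the per protocol comparison favors strenuous exercise simply because its remaining subjects are the more adherent ones.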

In the spirit of Paul Meier's example, consider a study in which severely ill subjects are randomly assigned to surgery or drug therapy. There will be early deaths in both groups. It would be tempting to exclude the early deaths of those in the surgery group who died before getting the surgery on the grounds that they never got the surgery. However, those who died prior to surgery were presumably among the least healthy subjects. Excluding them has the effect of making the drug therapy group much less healthy on average at baseline.

Sometimes, what appears to be a problem with maintaining baseline comparability is something quite different. The real issue with the medication/surgery example is recognizing that the treatment is not only what happens during and after the surgical procedure. It includes what happens during the time spent waiting for the procedure to take place! Those subjects who died awaiting surgery might have survived if they were given the medication immediately.

Reflecting performance in the population

Intention-to-treat analysis is said to be more realistic because it reflects what might be observed in actual clinical practice. In practice, patients may not adhere, they may change treatments, they may die accidentally. ITT factors this into its analysis. It answers the public health question of what happens when a recommendation is made to the general public and the public decides how to implement it. The results of an intention-to-treat analysis can be quite different from the treatment effect observed when adherence is perfect.

My own views

IF we keep in mind that the proponents of ITT are asking us to consider a different research question, their claims regarding the problems of conducting a valid per protocol study are true to a degree (* break my heart; good science isn't easy!). If all you want to do is find out what happens when subjects are assigned to treatment, then all you have to do is assign them to treatment. Then, it's just a matter of collecting the data and the question is answered. (Or is it? I think not! More about this later.)

However, researchers are often NOT interested in the assignment question. They care about the adherence question only. The two questions are NOT the same. Also, I believe that the arguments against per protocol analyses are not as strong as the ITT proponents suggest.

Let's consider once again the supposed 4 advantages of ITT after noting that in the absence of bias, the only thing ITT does is add noise to the signal from a PP analysis. If there's a moderate signal coming out of the PP analysis, an ITT analysis sprinkles on some random non-adherent subjects to attenuate it.

Intention-to-treat simplifies the task of dealing with suspicious outcomes, that is, it guards against conscious or unconscious attempts to influence the results of the study by excluding odd outcomes.
This strikes me as a weak argument. In Meier's example, the decision to exclude could be made by someone blinded to treatment. ITT resolves the issue by including everyone and everything so that any noise would affect all treatments equally. Blinded decisions can often do much the same to eliminate bias. The major difference is that ITT includes the noise while blinded assessment tries to exclude it.
Intention-to-treat guards against bias introduced when dropping out is related to the outcome.
This is true. Dropouts and nonadherence will always be a concern because one can never be certain of what biases they might introduce. Completers and noncompleters can be examined to see whether they differ in any identifiable way, while the treatments themselves can be examined for differential dropout rates. While it would be encouraging to have such checks come up negative, they are not a guarantee that no bias was introduced. But, no one ever said good science was easy.
However, in nutrition studies, almost all dropouts are noninformative, that is, unrelated to the outcome. My colleagues study treatments that are easily tolerated (eat this, drink that, take this pill). In addition, our volunteers are extremely health and diet conscious. When there are dropouts, it is invariably because of loss of interest, moving out of the area, changing jobs, or a change in family situation. The only effect of an intention-to-treat analysis in such cases is to add noise to the data.
Intention-to-treat preserves the baseline comparability between treatment groups achieved by randomization.
In many cases, baseline comparability can be preserved by using statistical adjustments. Adjusting for adherence can eliminate some confounding due to differential dropout. However, this presumes we know what to adjust for, which is not always the case.
Intention-to-treat reflects the way treatments will perform in the population by ignoring adherence when the data are analyzed.
There is a bit of truth to this claim. However, it is more complicated than it first appears. Adherence during a trial might be quite different from adherence once a treatment has been proven effective. In such cases, ITT will NOT reflect what will happen in practice. We see time and again where a health claim is reported and the next thing you know it seems as though everyone is acting on it. The explosion in the consumption of soy and blueberries and the avoidance of hormone replacement therapy are just a few recent examples. People could always have done this, but they chose not to until presented with some preliminary evidence that is often weak and contradictory. There is no reason to think that changes in attitude would be any less dramatic once treatments have actually been shown to be effective.
Imagine a controlled trial where some participants take a small pill daily while others are required to undergo a daily series of uncomfortable injections. It almost goes without saying that the dropout rate in the injection group will vastly outpace that of the pill group. Because of the high dropout rate, an Intention-To-Treat analysis is not likely to show the superiority of injections even if they are effective while the pill is not.
That's okay, ITT's proponents would argue. What's the point of a superior injection if no one will inject? Yet somewhere between 300,000 and 500,000 insulin-dependent diabetics put themselves through this discomfort daily because they KNOW the benefits of taking insulin! Knowing that something works can often be a great motivator.
There may be cases where an intention-to-treat analysis will truly reflect the way the treatments will behave in practice because adherence during the trial will reflect adherence after the treatment is proven effective. I have been told that this is true in the field of mental health. I suspect this is the rare exception rather than the rule.
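The claim that, in the absence of bias, ITT merely attenuates the per protocol signal can be sketched with a simulation. The assumptions are mine: nonadherence occurs at random with the same probability in both arms, nonadherent subjects in the treatment arm get no benefit, and the effect size and adherence rate are arbitrary.

```python
import random
from statistics import mean


def simulate(effect=2.0, p_adhere=0.7, n_per_arm=20_000, seed=0):
    """Return (ITT estimate, per protocol estimate) of the treatment effect."""
    rng = random.Random(seed)
    arms = {}
    for arm in ("treatment", "control"):
        rows = []
        for _ in range(n_per_arm):
            # Noninformative nonadherence: unrelated to arm or outcome.
            adherent = rng.random() < p_adhere
            benefit = effect if (arm == "treatment" and adherent) else 0.0
            rows.append((adherent, benefit + rng.gauss(0, 1)))
        arms[arm] = rows
    itt = (mean(y for _, y in arms["treatment"])
           - mean(y for _, y in arms["control"]))
    pp = (mean(y for a, y in arms["treatment"] if a)
          - mean(y for a, y in arms["control"] if a))
    return itt, pp


itt_estimate, pp_estimate = simulate()
print(f"Per protocol estimate:      {pp_estimate:.2f}")
print(f"Intention-to-treat estimate: {itt_estimate:.2f}")
```

With these settings the per protocol estimate sits near the assumed effect of 2, while the ITT estimate sits near 2 x 0.7 = 1.4. Neither is biased for its own question; they are answers to the adherence and assignment questions respectively.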

There are some circumstances that demand an intention-to-treat type of analysis. If the question is, "What happens once a treatment is started or recommended?" subjects must be followed once a treatment is started or recommended, regardless of what else happens. This is typical of the studies I see as a member of my institution's Scientific Review Committee. Most involve comparing two medical treatments. The research question is invariably whether subjects starting on one treatment fare better than subjects starting on another. I invariably insist that an ITT analysis be performed because a health care provider needs to know what happens when subjects are prescribed (started on) a particular treatment. ITT is the appropriate form of analysis because it is dictated by the research question!

What I object to is not the analysis but the way the term "Intention-To-Treat" is often used as a magical incantation

without any understanding of its implications and
under the mistaken impression that an ITT analysis automatically makes any issues surrounding adherence, dropouts, and missing values vanish when answering the adherence question. (One can hear the Great Oz: "There are no problems here! An Intention-To-Treat analysis was performed!")

When used this way--without thinking about the underlying research question and the proper way to answer it--"Intention-To-Treat Analysis" is no different from the way Chu-chih's assistant (mis)used his master's one-finger Zen.

Whenever I evaluate a study, I don't care one bit what the investigators call the analysis. I invariably examine the study to learn what question prompted the research and assure myself that the particular analysis is appropriate for providing the answer. I ask myself "What is the research question?" and perform the proper analysis whatever it's called. Sometimes I perform both an intention-to-treat analysis and an on treatment analysis, using the results from the different analyses to answer different research questions.

David Salsburg once asked what to do about an intention-to-treat analysis if at the end of a trial it was learned that everyone assigned treatment A was given treatment B and vice-versa. I got to live his joke. In a placebo-controlled vitamin E supplementation study conducted in nursing homes, the packager delivered the pills just as the trial was scheduled to start. Treatments were given to the first few dozen subjects. As part of the protocol, random samples of the packaged pills were analyzed to ensure the vitamin E did not lose potency during packaging. We discovered the pills were mislabeled--E as placebo and placebo as E. Since this was discovered a few weeks into the trial and no one had received refills, there was no possibility of anyone receiving something different from what was originally dispensed. We relabeled existing stores properly and I switched the assignment codes for those who had already been given pills to reflect what they actually received. How shall I handle the intention-to-treat analysis?

This slip-up aside, this is an interesting study because it argues both for and against an ITT type of analysis to answer the adherence question.

For: Because the study pill is administered by a nurse along with a subject's medications, it is hard to imagine how adherence might change, even if the results of the trial were overwhelmingly positive. The adherence achieved in this study is the adherence that would be achieved if nursing homes adopted a policy of supplementing with vitamin E.
Against: It is likely that there will be many drop outs unrelated to treatment in any study of a frail population. Should they be allowed to water down any treatment effect? Some subjects will leave the study because they cannot tolerate taking the pill, irrespective of whether it is active or inactive, or because their physicians decide, after enrollment, that they should not be in a study in which they might receive a vitamin E supplement. If the study had resulted in a recommendation that supplements be given, such subjects would not be able to follow it, so perhaps it is inappropriate to use their data to evaluate vitamin E's efficacy.

The bottom line is that an ITT type of analysis may be appropriate in some cases, but it's not a magic charm. A good analyst performs an ITT analysis not to do ITT but because the analysis demanded by the research question just happens to be ITT. The analysis would be performed whether or not there were something called ITT. The one good thing about ITT is that it forced some people to think about the issues behind the recommendation, but I believe the good is more than offset by thoughtless use.

Missing Data

ITT is typically used as an umbrella to cover two distinct issues--adherence and missing data. They are distinct because even subjects who drop out may return for final measurements, while even adherent subjects may be missing data.

Common sense says, "The data are missing! How can they possibly be filled in?!" Common sense is right! They can't!

Many imputation methods have been suggested by some very bright people, but there's nothing much that can be done without making lots of critical unverifiable assumptions.

Consider a longitudinal study that some subjects fail to complete. To keep things simple, think of a weight loss study. ITT says that the investigator should do everything possible to persuade the subjects to return for their final weighing. This is in keeping with the "how people will follow the recommendation" function of ITT analyses. In this case, ITT would be correct, not because it's ITT, but rather because it is in keeping with the goals of the study.

What about missing data? Some subjects will have dropped out and refused to have their final measurements taken, while the investigator may have lost contact with others. ITT mandates that we "fill in the blanks". The final values should be replaced with a "best guess". But how?!

One approach is to use a subject's last measurement as the final measurement. This is called Last Observation Carried Forward (LOCF).
Another approach fits a function, such as a straight line, to the available data to see what might have happened had the trend continued.
A third approach ("the hot hand", more commonly known as hot deck imputation) looks for subjects who are just like the ones with missing data (same age, sex, height, weight,...) and substitutes their data for the missing values.
Yet another approach ("propensity scores") looks to see which variables predict having a missing value. Those variables are then used to develop an equation in those with no missing data that can be used to predict values for those missing them.

All of the approaches are merely different ways of forecasting what the final measurement might have been, and analyzing the data with those imputed values. Those who believe in imputation say that it is often good practice to try more than one way to "fill in the blanks" to see whether conclusions change with the different methods.

I find imputation, like ITT, to be smoke and mirrors. Imputation is another kind of forecasting. If it could be done reliably, those who claim to be able to do it would have made a fortune in the stock market and retired ages ago. It hasn't happened. It is easy to show that any imputation technique may hurt rather than help depending on the circumstances, by constructing examples like the weight loss study. Consider the four methods listed above:

LOCF says there was no further change since the last measurement. This might not be too outrageous for a weight loss study. It says that subjects' weights did not change once they left. Typically, people regain any weight loss once they quit the program, so LOCF may make a program look better than it actually is.
Extrapolating from a straight line fitted to the data says that those who quit a program continue to lose weight at the same rate as when they were in the program. Once again, people typically regain any weight loss once they quit the program. If LOCF may make a program look better than it actually is, extrapolating from a straight line will make it look MUCH better!
The hot hand says it's okay to replicate data. If we're the same age, height, weight, and sex, then of course our weight loss is the same. Uhm...no.
The method of propensity scores is the most attractive on the surface.
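The first two items can be checked on a constructed example. The trajectory below is invented: a subject starts at 100 kg, loses 1 kg per week for four weeks, drops out, and (by assumption) is back at the starting weight when the study ends at week 12. The two functions are minimal sketches of LOCF and of extrapolation from a least-squares line, not any package's implementation.

```python
def locf(weights):
    """Last Observation Carried Forward: reuse the final observed value."""
    return weights[-1]


def linear_extrapolation(weeks, weights, target_week):
    """Fit a least-squares line to the observed data and project it forward."""
    n = len(weeks)
    mean_x = sum(weeks) / n
    mean_y = sum(weights) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, weights))
    sxx = sum((x - mean_x) ** 2 for x in weeks)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_week


baseline = 100.0
observed_weeks = [0, 1, 2, 3, 4]
observed_weights = [100.0, 99.0, 98.0, 97.0, 96.0]
final_week = 12
true_final_weight = 100.0   # assumption: all lost weight is regained

true_loss = baseline - true_final_weight
locf_loss = baseline - locf(observed_weights)
extrapolated_loss = baseline - linear_extrapolation(
    observed_weeks, observed_weights, final_week)

print(f"True loss at week 12:     {true_loss:.1f} kg")
print(f"LOCF says:                {locf_loss:.1f} kg")
print(f"Straight line says:       {extrapolated_loss:.1f} kg")
```

In this constructed case the true loss is 0 kg, LOCF credits the program with 4 kg, and the straight line credits it with 12 kg. A different assumed trajectory would give different errors, which is the point: the imputed value is only as good as the unverifiable assumption behind it.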

No matter how fancy the acronym or how elegant/confusing the mathematics, the bottom line remains the same: Subjects dropped out! Data are missing! There is no reason for any of these approaches to work other than an assumption that they will, which is no reason at all.

In summary, the Intention-To-Treat approach is a tool. In some circumstances, it may be the right tool, but a slavish devotion to ITT is as bad as a slavish devotion to any other approach or method. One size does not fit all. The proper approach, as I've asserted since the earliest version of this note, is to ignore labels, understand the research question, and perform the proper analysis whatever it's called!

There remains one question begging to be asked, so I'll ask it: If missing data and the elimination of nonadherent subjects bias a per protocol analysis and ITT is not the panacea some make it out to be, what happens when both approaches are suspect? The answer is simple, if unpleasant: Who knows? Not every problem is amenable to a solution that can be summarized in a catch-phrase. If enough data are missing or enough subjects are lost to followup, the results may be suspect whatever one does. Situations like this can only be handled with great care and attention to detail on a case-by-case basis.
