Here's an age-old question: What works in medical care? What will really work, once the patient goes home and tries to live a normal life? Hssh. Don't tell anyone, because it's too early to say for sure--but it looks as if much better answers may become available through the growing field of outcomes research.
These days, medical methods proliferate like rabbits in a corner. There are ever new theories, procedures, treatments, and medications to choose from--not to mention the old ones. Many are wonderful, but many are not, and the wisdom of Sir William Osler still obtains. Osler used to remind medical students at Hopkins, "Of course, 50 percent of what I'm teaching you is wrong. The difficulty is, I don't know which 50 percent."
Enter outcomes research, a way to assess medical methods that promises to augment traditional clinical trials with some real-world answers. Clinical trials, the so-called "gold standard," measure efficacy, says Earl Steinberg, director of the Johns Hopkins Program for Medical Technology and Practice Assessment. "You have to know whether something works in the best of circumstances," he says, "to know whether there's a potential value."
Outcomes research, however, measures effectiveness--how well a method works in practice. How does it work for a wide range of patients, some of whom are much sicker than others? In the hands of ordinary practitioners, rather than specialists with all the resources of a big medical center? To answer such questions, the "outcomes" this research looks at are broad ones, viewed over long periods of time, in which blood tests, surgical procedures, and other "medical" indicia are only part of the data.
It's not good enough, say outcomes researchers, to just take out a cataract and record that the patient went home with visual acuity improved X percent, therefore all is well. Rather, you'd go back a year later--two years, three years, five years after surgery, whatever--and ask what is the patient's quality of life? Of all the cataract patients that year, how many needed additional surgery? How impaired is their functioning? Are there any subpopulations who responded especially well or especially poorly? Do the patients think the benefits exceed any losses? Over years, does less expensive treatment A show up just as well as costly B? Only slightly less well? Less well for which specific groups of patients?
Using computer power, often on pre-existing records (like insurance claims), outcomes research may compile and analyze results for thousands of patients, not hundreds. It can look at cost-effectiveness. It can compare results between different physicians, different institutions, and different subsets of patients.
Perhaps most unusual, outcomes research factors in the patient's preferences and quality of life, using the methods of "qualitative" research--traditional in anthropology and other social sciences, but a newcomer to medicine.
No longer do hard data reign supreme. Outcomes research, says Steinberg, "goes beyond the usual clinical endpoints to ask how much improved are the patients. Not just 'Did they die?' or 'Did they have another heart attack?' but what is their quality of life after this procedure, as perceived by the patients." Are they satisfied with their medical care?
The method of finding out is straightforward, he says: "We ask them."
For patients, the resulting information can make all the difference. Outcomes researcher Albert Wu, for example, an assistant professor in Health Policy and Management, has something of a specialty in chronic disease. In chronic disease any side effects of treatment are especially important, for if a disease is long-term, so is its treatment--and so are any side effects. He says, "You wouldn't necessarily want to put someone on a drug for the rest of their life if it eradicated the bug but they felt worse."
"For many illnesses--and AIDS is a very good example," explains Wu, "the illness, which is chronic and progressive, makes you feel worse. Treatments make you feel better, though they also have their own adverse effects. The patient just cares, for the most part, about the net effect," he says with emphasis. This can be tricky to figure in a drug like ZDV, say, that causes headache and anorexia but relieves fatigue and makes people more able to work.
"What quality-of-life measures do," Wu says, "is inquire about common aspects of life--mental health, energy, ability to work, social function, general well-being." So in addition to questions about physical function, patients are asked questions like, "Have you felt downhearted and blue during the past four weeks?" and "Have you felt full of pep?" (The range of six answers includes "a good bit of the time" and "a little of the time.") The answers transform into a score ranging from 0 (worst possible impairment) to 100; scores can be compared to see whether, overall, a given treatment makes patients feel better.
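For the curious, that transformation can be sketched in a few lines of code. This is an illustration only, assuming a simple frequency-to-points mapping; the labels and point values are invented, not the instrument's actual scoring.

```python
# Illustrative sketch: mapping six frequency answers onto a 0-100
# well-being score. Labels and point values are assumptions for
# illustration, not the actual instrument's scoring.

ANSWER_SCORES = {
    "all of the time": 100,
    "most of the time": 80,
    "a good bit of the time": 60,
    "some of the time": 40,
    "a little of the time": 20,
    "none of the time": 0,
}

def score_responses(answers, negatively_worded=()):
    """Average item scores onto a 0-100 scale (0 = worst impairment).

    Negatively worded items ("downhearted and blue") are reversed so
    that a higher score always means better well-being.
    """
    total = 0
    for item, answer in answers.items():
        value = ANSWER_SCORES[answer]
        if item in negatively_worded:
            value = 100 - value  # reverse the scale for negative items
        total += value
    return total / len(answers)

patient = {
    "full of pep": "a good bit of the time",         # 60 points
    "downhearted and blue": "a little of the time",  # 20, reversed to 80
}
print(score_responses(patient, negatively_worded={"downhearted and blue"}))  # 70.0
```

The reversal step matters: feeling "downhearted and blue" only "a little of the time" is good news, so it must count toward a higher score, not a lower one.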
Wu has conducted several studies of treatments for the opportunistic infections that dog patients with HIV infection. Clarithromycin, for example, turns out to kill mycobacteria, and the patients felt better. Fine. But not so--at least at doses of 1,200 mg--the drug ZDV, zidovudine (formerly called AZT; usual doses are now much lower). A study Wu conducted in the late '80s found that after 24 weeks of treatment, 27 patients on 1,200 mg felt worse and had significantly less energy, compared with their own beginning condition, than did 25 patients taking placebos.
The results of such outcomes research, ideally discussed with patients by their health professionals, can help patients see their options clearly, and Wu says it is becoming more and more clear that clinically based answers and "outcomes" answers are not always the same. ZDV at 1,200 mg a day does slow progression to AIDS by almost a month, yes: that is clinically true. But were the months of feeling lousy worth it?
That depends on who you ask. Since there is currently no cure for AIDS, says Wu, some patients with HIV consider quality of life all-important. They want to preserve comfort even at the cost of longevity. Other patients choose the other way, and appropriate treatment can depend on the patient's values--which outcomes researchers also ask about.
As well as its uses in counseling individual patients, outcomes research can also serve to issue report cards, as it were, comparing results for different practitioners or hospitals, and it can look at results in relation to cost. To do that, researchers often make use of large data pools that were collected for other purposes--Medicare claims data, say, or the abstracts every state hospital makes of patient charts when patients leave the hospital.
These databases may not be perfect research tools, proponents admit. Just for starters, the abstracts seldom tell which patients were more or less sick to start with, which naturally affects outcome. (In the jargon of the trade, there's no measure of "severity.") It would also be useful to know the patient's "comorbidity," any other ailments besides the primary diagnosis. And are the results totally comparable? Were they gathered in the same way and do they record uniform sets of data? Not by a long shot. Those information gaps are huge.
Still, mountains of data are there. They exist. To accumulate all that information, a great deal of work has already been done. So why waste it? Why not, say, compare all the U.S. patients who have had various different treatments for renal failure in the last five years and see if some subset of patients does notably worse or better? For any given treatment, are unexpected side effects showing up? How do costs compare?
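A toy sketch suggests what such a claims-database comparison might look like: group records by treatment and compare average cost and complication rates across the groups. Every number and field name below is fabricated for illustration.

```python
# Toy sketch of a claims-database comparison like the one proposed
# above: group invented "claims records" by treatment, then compare
# average cost and complication rate. All data are fabricated.

from collections import defaultdict

claims = [
    {"treatment": "dialysis-A", "cost": 52000, "complication": False},
    {"treatment": "dialysis-A", "cost": 48000, "complication": True},
    {"treatment": "dialysis-B", "cost": 61000, "complication": False},
    {"treatment": "dialysis-B", "cost": 59000, "complication": False},
]

# Bucket the records by treatment.
groups = defaultdict(list)
for record in claims:
    groups[record["treatment"]].append(record)

# Summarize each treatment subset.
for treatment, records in sorted(groups.items()):
    avg_cost = sum(r["cost"] for r in records) / len(records)
    comp_rate = sum(r["complication"] for r in records) / len(records)
    print(treatment, avg_cost, comp_rate)
```

Real studies of this kind face exactly the gaps described above--no severity, no comorbidity--so a grouping like this is only as good as the fields the claims happen to record.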
If such data are used with extreme care, say outcomes researchers, there's gold in them thar hills. The report card aspect has special value, says Steinberg. It is not only useful to consumers, but also forces both hospitals and clinicians to focus on long-term results and patient satisfaction in a way that just hasn't been the case. A bad report card, presumably, would drive improvement.
At the least, the results of outcomes research can be suggestive, yielding good questions for future, more rigorous research.
At the best, says Steinberg, outcomes data will become "a part of routine clinical care. The results should become part of a feedback loop, to help define what ought to be done in the future." In that way guidelines and standards for national practice could be continuously refined.
Though epidemiologists have been working along these lines for years, "outcomes research" as such is comparatively new. Steinberg's program at Hopkins was established in 1986, while the federal agency that funds such research (AHCPR, for Agency for Health Care Policy and Research) was established by Congress only in late 1989.
Nevertheless, some promising results are already in hand, from Hopkins and elsewhere, that show a good deal about the hows and whys of this new methodology--including a few surprises.
For example, the Hopkins outcomes program, with its 14 professional researchers, is a national center for the study of cataracts. In a series of studies, Steinberg and his colleagues examined 6,000 previously published reports, culling out all but the 90 that dealt with the most relevant and sophisticated techniques; each of the 90 studies was then painstakingly analyzed and graded on a scale of 1 to 100 for reliability. Then, adjusted according to reliability, results were pooled to look at overall incidence of 18 different complications. And several surprises emerged:
First, removing a cataract is not the cakewalk many have thought. In the short term, 95 percent of the operations succeed. But over the long term, explains Steinberg, vision clouds up again in 25 to 40 percent of cases (see February 1994 Archives of Ophthalmology). That's because when the eye's lens is removed, the capsule is left in place to help anchor the new, artificial lens. If the capsule opacifies, as it sometimes does, the patient's symptoms return.
The capsule can be treated by using what is called a YAG laser. "Basically," says Steinberg, "the surgeon takes this laser and blows a hole in the capsule, so light passes through and the symptoms are resolved."
Fine, except that a subsequent study of 60,000 Medicare claims found evidence that this procedure--an increasingly common one--may quadruple the risk of retinal detachment, which is blinding. If the quadrupled risk turns out to be true, people who can live with some loss of vision in one eye may not care to risk losing all their vision. They might not choose to undergo the YAG procedure.
It may not be true, however, because there remains a problem with the data--the kind that critics of outcomes research like to point to: Medicare claims do not say whether the eye treated was the right eye or the left one. "So we can identify cataract surgery in a patient, and a laser in the same patient, and retinal detachment," says Steinberg, "but we don't know that they all occurred in the same eye." (The same problem holds with any paired organ.)
His team's solution, Steinberg explains, has been to obtain more recent data and use it to identify patients for further study. In other words, they backtracked, learning from the government who the patients and insurance carriers were. From the 50 insurance companies, they were able to learn who the doctors were. And from the doctors (having promised to keep each patient's records confidential), the Hopkins team is requesting the clinical data needed for a proper case control study.
This project is still in the works, but Steinberg is excited already, because the method is a breakthrough. "We're going to get clinically sound data," he gloats, "clinically meaningful data--from using the claims data to direct our data collection, as opposed to being the source of the data. I think that's very powerful, very exciting."
Though cumbersome, the method is still far cheaper and easier than such a comprehensive study would be if done from scratch, he says. Eventually, if money can be found and the confidentiality of computer records guaranteed, he hopes there will be no need to backtrack. Then all insurance claims and medical records could routinely include the clinical information needed to track important aspects of patient care.
Meanwhile, Earl Steinberg is virtually bouncing in his chair (in a very quiet and professional way). He looks as if he can hardly wait to get the computers whirring. He thinks it is clear already--given the degree of risk now known--that cataract surgery should be done only if the patient's quality of life, as the patient sees it, is substantially impaired. And he and his team have developed a way of measuring that, published in the May 1994 Archives of Ophthalmology.
The key to this recent study is a questionnaire, the VF-14 (VF for visual function), that the Steinberg team has developed and tested. This questionnaire asks patients to rate how much trouble they're having with daily activities like driving, writing checks, watching television, reading, and so on through 14 items. Again, the answers emerge as "a score between zero and 100--the lower the score, the greater your impairment."
As part of testing, for the same 766 patients, the team also collected data on patient satisfaction with their vision, as well as the traditional Snellen Visual Acuity ("you know, the eye chart with the big E"). "Well," says Steinberg. He leans forward. The correlation between VF-14 and a patient's general satisfaction with vision is excellent. But "it turns out that the correlation between visual acuity and patient satisfaction with their vision is--take a guess--" He waits while the interviewer fumbles.
Obviously a surprise is expected, but--well, surely there must be some correlation.
Steinberg is delighted. "The correlation is ZERO."
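For readers curious about the statistic itself, a Pearson correlation of the kind quoted can be computed as below. The two data series here are invented, chosen only to show what a zero correlation looks like; they are not the study's data.

```python
# Minimal sketch of the Pearson correlation used to compare two
# patient measures (e.g., an acuity score against a satisfaction
# rating). The data are fabricated to yield a zero correlation.

import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

acuity       = [20, 40, 60, 80, 100]   # invented acuity scores
satisfaction = [50, 90, 30, 70, 60]    # invented satisfaction ratings
print(round(pearson(acuity, satisfaction), 2))  # 0.0
```

A correlation of zero means that knowing a patient's acuity score tells you nothing at all about how satisfied that patient is with his or her vision.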
He thinks the main reason is that people have different needs. A duck hunter, a voracious reader, or an accountant will be quite bothered by a small loss of acuity, whereas people who use their eyes less will not. Also, the good old Snellen Acuity fails to pick up crucial factors. An eye that cannot handle glare, for example--the bane of night drivers--will not be spotted by the Snellen, which is given in cool darkness.
In April, even before the article was published, there was already considerable inquiry from ophthalmologists about the VF-14. It's quick, it's easy, and it does the job. Using the questionnaire, doctors can do better at tailoring their advice to the needs of each individual patient.
Valid and reliable questionnaires--meaning ones that really do measure what they say they measure, in a way that allows legitimate comparison--also exist that measure a patient's overall well-being and ability to perform the activities of daily life. More such tools are needed, however, for other specific conditions. The VF-14 equivalents for low back pain and congestive heart failure, for instance, would be very useful.
Such questionnaires are coming into use in many ways. At Hopkins Hospital, patients are now randomly sampled each year, to keep track of patient satisfaction with various aspects of their care. The instrument was developed by Haya Rubin, assistant professor of medicine and also the hospital's director of quality of care research, while she was at the California think tank, RAND.
Do patient assessments matter? Very much, says Rubin. They are helping Hopkins improve. And clinically, even raw data have some value, because it's been learned that patients who assess their care as less than excellent are less likely to return to that doctor for follow-up. They are also less likely to follow instructions.
Asked whether patients are, quote unquote, "right" in their ratings of medical care, Rubin stops to think. "Patients are not discriminating in a technical respect," she says at last. "When patients think all is well, they're not necessarily correct. But patients are usually right when they say something is wrong. On average, at least in primary care, doctors with higher patient satisfaction ratings also get better clinical results, across the board."
The reason, she thinks, is that patients know when they've been listened to, "and they know you can only achieve good effects with medicine if someone has taught you how to use it. They don't know if it was the right antibiotic. They know if you listened. They know if you conveyed to them what they needed to know."
In other words, the physician trait that patients notice as "listening" may not be good medicine in itself. But it may well make technically excellent treatment more useful to patients, and it may correlate with more accurate diagnoses and more appropriate treatment. "I would strongly suspect it does," says Steinberg.
These ideas are explosive. Many physicians are upset at the notion of "report cards" and "guidelines," because they do not want their skills (and their livelihood) judged by popularity contest. Even Steinberg, committed though he is, flares up briefly at the idea that patients might evaluate physician care correctly. "Patients cannot usually assess technical care," he snaps.
It would be all too easy to abuse outcomes research, using it to justify what doctors contemptuously call "cookbook medicine." Many fear to see insurance companies and administrative hoo-ha calling the shots.
"Doctors are concerned," says Rubin, "that many people think there is an easy answer," which there is not. "This is hard," she says, "and no computer is going to help you. The problem is not the lack of computer power. It's the fact that human beings are all unique."
To show why, Rubin hauls out a fat book of data, page after page of charts. She goes on to describe the study the charts are about, a recent massive project by 11 members of the Academic Medical Center Consortium to develop standards for several procedures used to treat coronary artery disease. As a specialist in blending clinical and statistical requirements of outcomes research, she was the Hopkins representative on the study.
To begin this study, various specialist societies nominated outstanding physicians from all over the country, who painstakingly evolved a list of key clinical indications for cardiac patients (age, symptoms, stable, chronic, two vessel disease, three vessel disease, ejection fraction >35%, normal risk, high risk, etc. etc. etc.). The final total was 996 items. Then RAND consultants took treatment options (bypass surgery if angioplasty contraindicated, bypass if angioplasty possible, angioplasty compared to medical therapy), and assembled them all into a massive table that seemed to take in every permutation of coronary patient one could find. "You might say, it was hard, but we did it," Rubin says. They seemed to be well on their way to establishing standards.
As a last step, after weeks and weeks of effort, the nine physicians then set out to test the chart on real cases, the idea being to combine their collective wisdom, some 200 years' worth of clinical experience and study. ("These are big names," says Rubin. "Remember, they were nominated by specialist societies.") Now for each cell of the chart, this council of experts would develop a consensus on which treatment would be appropriate for which precise condition, starting with 100 actual case charts from each medical center. Cases were randomly selected, says Rubin, and "we tried to classify each one into one of the cells--and there's the rub. We tried."
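The classification step Rubin describes amounts to a table lookup: a case's key indications form a key, and the key either lands in a cell or it doesn't. The sketch below uses invented attribute names and a two-cell table; the real table had hundreds of cells.

```python
# Sketch of classifying a case into one cell of an appropriateness
# table. Attribute names, cells, and ratings are invented for
# illustration; the study's actual table had hundreds of cells.

INDICATION_TABLE = {
    ("stable", "two-vessel", "ef>35", "normal-risk"): "angioplasty appropriate",
    ("chronic", "three-vessel", "ef<=35", "high-risk"): "bypass appropriate",
}

def classify(case):
    """Return the cell's rating, or None when the case fits no cell --
    the 'uncertain' problem the expert panel ran into."""
    key = (case["symptoms"], case["vessels"], case["ejection"], case["risk"])
    return INDICATION_TABLE.get(key)

case = {"symptoms": "stable", "vessels": "two-vessel",
        "ejection": "ef>35", "risk": "normal-risk"}
print(classify(case))  # this case fits a cell
```

The trouble, as the panel found, comes from the `None` branch: real cases carry history--like four prior angioplasties--that no cell of the table anticipates.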
Independently, then again after group discussion, each clinician rated each actual case as to whether the treatment given had been necessary--and in 37 percent of cases, even after discussion, the consensus was: uncertain. Puzzled, Rubin brought the Hopkins "uncertains" back to Baltimore and met with the cardiac surgeons. "Why did you do the bypass graft?" she would ask. "Why not just drugs or angioplasty?" "Oh," said the surgeon. "...Yes, I remember this one. Well, the abstract doesn't show it, but the patient had already had four angioplasties."
"I don't know about you," says Rubin, "but that sounds reasonable to me...." Her shoulders slump. "This is hard," she says again. "And even with some of these cases cleared up, there may still be as many as 10 or 20 percent who don't fit into those cells." Doctors fear that "somebody is going to tell that 10 or 20 percent, 'We won't pay,' or 'We won't authorize that surgery.'"
Rubin sighs. "I'm not being nihilistic, I think this can be done. But what doctors fear is a very appropriate concern. We'll need to give a benefit of the doubt." In any pool of cases, she says with certainty, "there will be exceptions. Legitimate exceptions. So outcomes research will be good for general guidance. It will be good for policy decisions. But it is not good for payment decisions. There will always need to be a review."
"It's frightening," Steinberg agrees. "If this research is done wrong, so that what's being compared is apples and oranges, some providers may be unfairly penalized, classified as 'less effective' or 'more costly' providers than they really are. That would be extremely damaging."
He prefers to look, however, at "the exciting part--that it offers the prospect of tremendous insights into the value of what we do. Because the fact is that while we're spending $800 to $900 billion a year on providing health care in this country, we're spending precious little to evaluate what we do. It doesn't make sense to run an operation that way."
So much need for the research, so much to be done, so many ways to go wrong--where should researchers in this fledgling discipline start? The 80-20 rule, one of the world's all-time great rules of thumb, states that 80 percent of the results come from 20 percent of the effort. In other words, 20 percent of the effort gets you 80 percent of the way (like tidying as opposed to cleaning a kid's bedroom); 20 percent of the people do 80 percent of the work (alas, too true); and so on. Knowing this, it's often useful to ask, Where's the 20 percent? If I can only do part, which part will get me furthest?
Several Hopkins outcomes researchers were asked the 80-20 question: In their field, where's the 20 percent? Where should effort be focused?
"Follow-up," says Donald Steinwachs, director of the Health Services Research and Development Center of the School of Public Health. "Frequently people focus on what happens during the time that the patient is seeing the physician. I think we have to remember that a lot of quality problems have to do with what happens outside of that time. Do the patients come back? Can the patients manage their own treatment and medication when the doctor is not there?" Steinwachs would like to see some kind of systematic effort to make sure that adequate follow-up occurs.
"I'd put my money on procedures," says Steinberg, without hesitation. "Both surgical and technical. They're the easiest to study--and there's a lot of money at stake, either because they're high-cost procedures or because there's a lot of volume." Within that group, he'd start with procedures "where there's a lot of variation in practice, so that you have a good likelihood of distinguishing the difference in outcome as a function of difference in practice."
Eric Bass, an assistant professor of medicine and an associate of Steinberg's at the Outcomes Program, sees the biggest gains in a focus on data. "We need easier ways to measure key outcomes, ways that fit into routine practice," he says. In its present form, outcomes research is time-intensive, therefore expensive. Making it just part of day-to-day medical care, so that ordinary practice generates constant feedback, will require that all the right data get into the databases. How? Most outcomes researchers foresee a computer interface, where physician (or patient) would answer questions by touching a screen, to systematically zip off machine-readable data--data that are truly comparable with data for other patients.
What would be the key elements of that data? Bass's answer comes terse and easy, with no groping--he's thought about this list: "Mortality. Some measure of morbidity. Some measure of quality of life. Some measure of cost."
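Those four elements could be captured in a record simple enough to zip off a touch screen. The field names and types below are assumptions for illustration, not a proposed standard.

```python
# Sketch of the minimal outcome record Bass describes: mortality,
# morbidity, quality of life, and cost, in a machine-comparable form.
# Field names and types are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class OutcomeRecord:
    patient_id: str
    died: bool                # mortality
    complications: int        # a crude morbidity measure
    quality_of_life: float    # 0-100 questionnaire score
    cost_usd: float           # total cost of the episode of care

record = OutcomeRecord("p001", died=False, complications=1,
                       quality_of_life=72.5, cost_usd=8400.0)
print(record)
```

The point of so spare a record is comparability: if every encounter produced these same four fields, gathered the same way, the "huge information gaps" in today's claims data would start to close.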
That's what outcomes research is all about, and Steinberg sees it as a necessary next step in medicine. In the jungle of new methodology, outcomes research could help doctors sort out "the comparative benefits and weaknesses of ways of managing a particular category of patient."
As he goes on, his words start tumbling out, losing their measured cadence. "And that's very exciting. There's tremendous power in that." Advocates of clinical trials, he says, "would say there's no power in that because you haven't controlled for all the differences between the patients. And my response is, yes, there's trade-offs. We're not controlling for a thousand other things. But I think there is more to be gained than to be lost. And I don't think inaction is an option."
Medicine has no choice, says Steinberg. "Meaning, payers are gonna insist that different providers provide them feedback on their outcomes." He urges that physicians be "pro-active--because it's gonna be done. So step up to the plate, and let's do it."
Elise Hancock is the magazine's senior editor.