How to REALLY Answer a Question: Designing a Study from Scratch

SUPPOSE that you are interested in answering a simple question: how effective is aspirin at relieving headaches? If you want to have conviction in the answer, you’ll need to think surprisingly carefully about your methods. Your first idea might simply be to take aspirin the next time you get a headache, and see if the headache goes away. But upon reflection, things won’t be quite so easy.

First of all, since all headaches go away eventually, whether yours goes away or not isn’t really the relevant question. It would be better to ask how quickly the headache goes away. But even this question is not necessarily good enough, because even if aspirin doesn’t relieve a headache fully, a significant reduction in severity is still worthwhile.

In light of this, you may decide that when you next notice having a headache, you’ll make a record of how you feel every half hour for the next two hours. You’ll write things like “dull, throbbing pain of low intensity” or “sharp, searing pain over one eye”.

Unfortunately, just examining how you feel after taking aspirin a single time probably won’t be adequate, since the aspirin may be more helpful some times and less helpful other times. For example, perhaps it works on moderate headaches but not on really severe ones, so if your next headache happened to be really severe, it would look like aspirin was useless. To solve this problem and give yourself more data, you might resolve to make these records of how you’re feeling for each of the next 20 headaches you get.

There is still a problem though, because these subjective descriptions of headaches are difficult to compare to each other. If you take aspirin and your headache goes from a sharp pain over one eye to an intense ache over the entire head, have you made things better or worse? It would be difficult to aggregate the information from these varied descriptions over 20 different headaches to make a final assessment of how well aspirin is working.

Analysis would be a lot easier if you scored how unpleasant each headache was on a simple scale from 1 to 5 (1 meaning slight unpleasantness, 3, moderate unpleasantness, and 5, extreme unpleasantness). That way, you can simply look at all the scores you got just before taking aspirin and average them together. You can then compare this to the average of the scores 30 minutes after taking the aspirin and 60 minutes after taking it. That way, you can see if the amount of headache unpleasantness you feel really does drop substantially after taking aspirin.
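To make the averaging concrete, here is a minimal Python sketch. The scores are made-up numbers used purely for illustration, and the name average_scores is hypothetical, not part of any library.

```python
from statistics import mean

# Hypothetical records, one tuple per headache:
# (score just before aspirin, score after 30 minutes, score after 60 minutes),
# each on the 1-to-5 unpleasantness scale. These numbers are invented.
headaches = [
    (4, 3, 2),
    (3, 3, 1),
    (5, 4, 4),
    (2, 1, 1),
]

def average_scores(records):
    """Return the mean score at each of the three time points."""
    before, after_30, after_60 = zip(*records)
    return mean(before), mean(after_30), mean(after_60)

avg_before, avg_30, avg_60 = average_scores(headaches)
print(f"before: {avg_before:.2f}, 30 min: {avg_30:.2f}, 60 min: {avg_60:.2f}")
# prints "before: 3.50, 30 min: 2.75, 60 min: 2.00"
```

With real records you would have 20 tuples rather than 4, but the comparison of the three averages works the same way.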

You are interested in determining how effective aspirin is at relieving headaches, but all you’ve done so far is measure how good it is at relieving your own headaches. Perhaps you are more or less sensitive to aspirin than other people, or perhaps your headaches are more severe and harder to treat than most other people’s. To solve this problem, you enlist 20 people who are frequent headache sufferers. You get them to agree that, over the next 6 months, any time they begin to notice that they have a headache they will record how they feel on your 1 to 5 unpleasantness scale. They will then take aspirin and record how they feel again 30 and 60 minutes later.

But what if people take different doses? You might think that the aspirin isn’t working for some of them, but it’s only because they haven’t taken enough. To fix this problem, you hand them each an identical bottle of pills and tell them to take two whenever they get a headache (the maximum recommended dose). This also has the added benefit that everyone will be taking the same exact brand. That way if you find out that the aspirin really does work, other people can try to replicate your results by using the same brand that you did. On second thought, you also provide everyone with a timer that measures 30 minute intervals, to help them be more accurate about making records of their pain at almost exactly 30 and 60 minutes.

There is still a problem though. You know that headaches often become less severe within an hour or so even when you don’t take aspirin. That means that even if someone’s pain score tends to have fallen 60 minutes after taking the aspirin, you don’t really know whether it is the aspirin that caused the reduction in pain or if the reduction would have occurred regardless. To remedy this, you come up with the idea of having only half of the people take aspirin when they have a headache, though everyone will still keep a record of their headache’s progression. Then, to see how well the aspirin worked, you can compare the average 30-minute and 60-minute scores of the 10 people who took the aspirin each time with the scores of the 10 people who didn’t take any aspirin. If the aspirin group’s pain fell a lot more than the non-aspirin group’s, then the aspirin probably was the cause.
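The comparison between the two groups can be sketched in a few lines of Python. Everything here is an illustrative assumption: the data are invented, only three people per group are shown to keep it short, and the function names mean_drop and extra_drop are hypothetical.

```python
from statistics import mean

# Hypothetical data: for each participant, (average score at headache onset,
# average score 60 minutes later). These numbers are invented.
aspirin_group = [(4.0, 2.0), (3.5, 2.5), (4.5, 2.0)]
no_aspirin_group = [(4.0, 3.5), (3.0, 2.5), (4.5, 3.5)]

def mean_drop(group):
    """Average fall in pain score (before minus after) across a group."""
    return mean([before - after for before, after in group])

def extra_drop(treated, untreated):
    """How much further pain fell in the treated group than in the untreated one."""
    return mean_drop(treated) - mean_drop(untreated)

print(f"aspirin group drop:     {mean_drop(aspirin_group):.2f}")
print(f"non-aspirin group drop: {mean_drop(no_aspirin_group):.2f}")
print(f"extra drop with aspirin: {extra_drop(aspirin_group, no_aspirin_group):.2f}")
```

The point of subtracting the untreated group's drop is that it removes the improvement that would have happened anyway.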

But is it possible that the pain levels people record could be influenced by the act of taking a pill, independent of the chemical effect of the active ingredients? For example, what if, because they expect the aspirin to work, the people in the group taking the pills are more aware of signs of improvement? In that case, the aspirin would seem to work better than it really does. Or perhaps people’s expectations of improving could even influence how much pain they experience. Fortunately, these problems are easily remedied. Instead of giving half the group no aspirin, you instead give them pills in an aspirin bottle that look just like aspirin, but which have no effect on headaches. Sugar pills are a reasonable choice, because pretty much everyone has sugar in their diet anyway, and small amounts of it (like the amount in two little pills) won’t have any noticeable effect on a person.

This raises ethical considerations, however. You got people to agree to take aspirin, not to take sugar pills. That means that beforehand you’ll need to inform everyone that they might be getting aspirin, but they also might be getting sugar pills instead. You can’t let them know which they are getting during the six months that they are recording their results, but afterwards, you can let them know which they were on. You’ll also have to get them to agree to not take any other headache medication during the experiment, and to record any medication that they do happen to take, or else it might throw off the results.

So half of your group will be taking aspirin, and the other half will get sugar pills. But who should get which? If, for example, the 10 people getting the aspirin have headaches that naturally (without treatment) last much longer than those of the 10 people getting the sugar pills, then the aspirin may seem less effective than it really is. Hence, you don’t want there to be any substantial difference between the two groups. A simple way to help ensure this is to assign individuals to the two groups (i.e. the aspirin group or the sugar pill group) at random. It is even better if someone else does the randomization (secretly recording which person is assigned to which type of pill). That way when you talk to the subjects about the experiment, there is no chance that you accidentally tip them off (with body language, or otherwise) to which type of pill they are getting. Furthermore, when you analyze the final results, you won’t have any temptation (subconscious or otherwise) to make the data come out a particular way (since you won’t know until you are done which subject was taking the aspirin and which was taking the sugar pill).
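Random assignment itself is simple to do. Below is a minimal Python sketch of one way the person handling the randomization might split 20 participants into two equal groups. The participant IDs, the group labels, and the function name assign_groups are all hypothetical.

```python
import random

def assign_groups(participant_ids, seed=None):
    """Shuffle the participants and split them into two equal-sized groups.

    Passing a seed makes the assignment reproducible, so whoever holds the
    secret record can regenerate it later; leave it as None for a fresh draw.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "aspirin": sorted(shuffled[:half]),
        "sugar_pill": sorted(shuffled[half:]),
    }

# Hypothetical participant IDs 1 through 20.
groups = assign_groups(range(1, 21), seed=2024)
print(groups)
```

Because the shuffle ignores everything about the participants, any trait that affects headaches (age, severity, and so on) has no systematic reason to end up concentrated in one group.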

Unfortunately, if you carried out this experiment multiple times, you should expect to get slightly different results. After all, the people you would be able to recruit might be different and so might respond differently to the medicine. What’s more, even if you used the same people each time, the intensity of their headaches might vary from one 6 month period to the next, which could also influence how the results turn out.
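You can get a feel for this run-to-run variation with a small simulation. The Python sketch below rests on a loud assumption: every participant's average pain drop is drawn from the same normal distribution (mean 1.0, standard deviation 0.8, both numbers invented), so in this simulated world aspirin has no real effect at all. The function name simulate_difference is hypothetical.

```python
import random
from statistics import mean

def simulate_difference(rng, n_per_group=10):
    """Run one imaginary experiment and return the difference in mean pain drop.

    Assumption: both groups draw from the SAME distribution, i.e. the pill
    does nothing, so any difference between groups is pure chance.
    """
    aspirin = [rng.gauss(1.0, 0.8) for _ in range(n_per_group)]
    sugar_pill = [rng.gauss(1.0, 0.8) for _ in range(n_per_group)]
    return mean(aspirin) - mean(sugar_pill)

rng = random.Random(42)
differences = [simulate_difference(rng) for _ in range(1000)]

print(f"smallest difference: {min(differences):.2f}")
print(f"largest difference:  {max(differences):.2f}")
print(f"average difference:  {mean(differences):.2f}")
```

The average difference sits near zero, as it should, but individual experiments land on both sides of it, sometimes by a noticeable margin.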

But if the results fluctuate randomly, that implies that sometimes, just by luck alone, the aspirin might seem to be effective even if it is not. Likewise, it might seem to be ineffective, even if it does work. So whatever it is that your experiment shows, how can you be sure you are really getting the right answer? Well, since chance is involved, total certainty is not possible. But a statistician could easily calculate for you the probability that you would get results (in favor of aspirin working) that are at least as strong as the ones that you got, if in fact aspirin is no more effective than the sugar pill. If this probability is large, then based on your experiment you do not have sufficient evidence to conclude that aspirin is an effective treatment for headaches. If this probability is small (say, less than 5%), then results like yours would be unlikely if aspirin did nothing, which counts as good evidence that aspirin is effective. (Note that this is not the same thing as a 95% chance that aspirin works; the probability describes the data under the assumption of no effect, not the hypothesis itself.) The most direct way to increase the likelihood that the results of your test are conclusive is to increase the number of participants involved.
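One way to compute such a probability is a permutation test: if aspirin were no better than the sugar pill, the group labels would be arbitrary, so you can reshuffle them many times and count how often chance alone produces a difference at least as large as the one you observed. The Python sketch below is only one of several methods a statistician might choose. The data are invented and the function name permutation_p_value is hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(aspirin_drops, sugar_drops, n_shuffles=10_000, seed=0):
    """Estimate the one-sided probability of a result at least this favorable
    to aspirin, assuming aspirin is no more effective than the sugar pill."""
    rng = random.Random(seed)
    observed = mean(aspirin_drops) - mean(sugar_drops)
    pooled = list(aspirin_drops) + list(sugar_drops)
    n_aspirin = len(aspirin_drops)
    at_least_as_strong = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = mean(pooled[:n_aspirin]) - mean(pooled[n_aspirin:])
        if diff >= observed:
            at_least_as_strong += 1
    return at_least_as_strong / n_shuffles

# Hypothetical average pain drops (onset score minus 60-minute score),
# one number per participant. These values are invented.
aspirin_drops = [2.0, 2.5, 1.5, 2.0, 3.0, 2.5, 2.0, 1.5, 2.5, 2.0]
sugar_drops = [0.5, 1.0, 0.5, 0.0, 1.0, 0.5, 1.0, 0.5, 0.0, 0.5]

p = permutation_p_value(aspirin_drops, sugar_drops)
print(f"estimated probability under 'no effect': {p}")
```

In this made-up data set the two groups do not overlap at all, so almost no reshuffling matches the observed difference and the estimated probability comes out very small. With data where the groups look alike, the same function returns a large value.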

Hence, we see that in order to answer questions with a high degree of certainty, a well-thought-out methodology is necessary. Most elements of good study design become obvious when we reflect logically on the ways that data may mislead us. Whenever possible, experiments should be double-blind with a placebo control. They should have large sample sizes, standardized dosages, a standardized (and predetermined) method for outcome measurement, and careful statistical analysis. Without all of these in place, experiments simply cannot be trusted (for reasons that become obvious, with just a little thought).
