Research · 9 min read

Working on rival hypotheses, not a single guess

The oldest move in the philosophy of science: list two or three explanations that could fit the data before testing one. Sixty-seven years after Chamberlin, Platt made the same case; it is still the most reliably under-used habit in working research.


In 1897, the geologist T.C. Chamberlin published an essay called "The Method of Multiple Working Hypotheses." His complaint was that researchers tend to fall in love with a single explanation and quietly spend the rest of their careers protecting it from evidence. Sixty-seven years later, the molecular biologist John Platt made the same case in "Strong Inference." Platt noticed that the fastest-moving fields in science were the ones where researchers habitually listed two or more hypotheses and designed experiments to discriminate between them.

The advice is old, which is usually a sign that it is still relevant. What follows are some practical notes on how to put it to work in actual research.

The classic failure mode

An experiment is run. The result is consistent with the favored hypothesis. The hypothesis is treated as supported. The paper is submitted.

The problem: most experimental results are consistent with several hypotheses. If the only comparison is between the favored hypothesis and the null, very little has been learned. A result that says "something is going on" is different in kind from a result that says "option A is going on, not option B."

This failure is common. It shows up in underpowered social science (a significant effect that could be noise, selection bias, or the actual claim). It shows up in machine-learning research (a new architecture performs better, but the authors did not rule out that the gain came from a hyperparameter change). It shows up in applied science (a clinical finding consistent with a mechanism, a confounder, and a reporting artifact, all three).

The move

Before running the experiment, list at least three explanations that could produce the expected result. Write them down. For each, ask: "If this explanation were true, what would I see that is different from what the other explanations predict?"

That last step is the entire trick. The work is not just naming rivals. It is naming the differential predictions. A hypothesis that cannot be differentiated from the others is not yet useful; it is a label for a cluster of explanations the data cannot tell apart.

A worked example

Suppose the question is why some graduate students finish their PhDs faster than others. The favored hypothesis: strong early publications predict faster completion.

That is fine, but as it stands the claim is not testable against rivals. The candidate list, favored hypothesis included:

  1. Strong early publications predict faster completion. Mechanism: early publications build confidence and committee trust, reducing late-stage friction.
  2. Strong advisor support predicts faster completion. Mechanism: good advising accelerates both early publications and completion, so the correlation between publications and completion is spurious.
  3. Funding stability predicts faster completion. Mechanism: students with stable funding write more, publish more, and finish on time. The correlation is driven by a third variable.
  4. Field effect. Mechanism: some fields publish more and finish faster on average, so the correlation is a composition artifact.

Now the differential predictions:

  • If (1) is true, the publication-completion correlation should survive controls for advisor quality.
  • If (2) is true, controlling for advisor quality should shrink the correlation to near zero.
  • If (3) is true, funding stability should do most of the work.
  • If (4) is true, within-field correlations should be much weaker than the pooled correlation.

Now there is a research design instead of a hunch.
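
To make that design concrete, here is a minimal sketch of how the four predictions could be read off a student-level dataset. Everything in it is hypothetical: the file name, the column names (years_to_phd, early_pubs, advisor_quality, funding_stability, field), and the idea that simple regression controls settle the matter; treat it as a first pass at discrimination, not an identification strategy.

```python
# Minimal sketch: one regression per differential prediction, all hypothetical
# column names. The comparison of interest is how the early_pubs coefficient
# moves across specifications, not whether any single result is "significant".
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical student-level dataset

# Pooled result: the "consistent with the favored hypothesis" finding.
pooled = smf.ols("years_to_phd ~ early_pubs", data=df).fit()

# (1) vs (2): does the coefficient survive a control for advisor quality?
vs_advisor = smf.ols("years_to_phd ~ early_pubs + advisor_quality", data=df).fit()

# (3): does funding stability absorb most of the effect?
vs_funding = smf.ols("years_to_phd ~ early_pubs + funding_stability", data=df).fit()

# (4): field fixed effects — is the pooled correlation a composition artifact?
within_field = smf.ols("years_to_phd ~ early_pubs + C(field)", data=df).fit()

for name, model in [("pooled", pooled), ("advisor control", vs_advisor),
                    ("funding control", vs_funding), ("field fixed effects", within_field)]:
    coef, se = model.params["early_pubs"], model.bse["early_pubs"]
    print(f"{name:>20}: early_pubs coefficient = {coef:.3f} (SE {se:.3f})")
```

Reading the four coefficients side by side is the point: each rival corresponds to a specific pattern of shrinkage, which is a sharper question than "is the effect there or not."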

Why this stays rare in practice

Listing rivals is cognitively expensive. It requires holding multiple stories in mind simultaneously and treating each as potentially true. The default human mode is confirmatory: generate one story, collect evidence, update only weakly when the evidence could go either way. Naming the bias does not fix it. What helps is externalizing the move.

Three habits that force it:

1. The three-explanations exercise before any experiment

Before finalizing a design, write down three explanations for the expected result. If only one can be generated, the design is not ready. A collaborator's first useful contribution is often a rival hypothesis the lead author had not considered.

2. A pre-registered prediction for each hypothesis

Pre-registration is not only about stopping p-hacking. It is a forcing function for the differential-predictions step. Stating in advance "under hypothesis A I expect outcome X, under hypothesis B I expect outcome Y" surfaces the cases where A and B predict the same thing. When they do, the experiment cannot distinguish them and the design needs more work.
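
One low-tech way to make that collision check mechanical is to write the predicted outcome for each hypothesis into a small table and flag any two hypotheses that predict the same thing. The sketch below uses illustrative entries from the worked example above; the wording of the outcomes is hypothetical.

```python
# Toy collision check for pre-registered predictions. Hypotheses that map to
# the same predicted outcome cannot be discriminated by the planned study.
from collections import defaultdict

predictions = {
    "H1: early publications drive completion": "early_pubs coefficient survives advisor control",
    "H2: advisor support drives both": "early_pubs coefficient vanishes under advisor control",
    "H3: funding stability drives both": "early_pubs coefficient vanishes under funding control",
    "H4: field composition artifact": "within-field early_pubs coefficient near zero",
}

by_outcome = defaultdict(list)
for hypothesis, outcome in predictions.items():
    by_outcome[outcome].append(hypothesis)

for outcome, rivals in by_outcome.items():
    if len(rivals) > 1:
        print(f"Cannot discriminate {rivals}: all predict '{outcome}'. Redesign.")
```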

3. A devil's-advocate pass on one's own work

After drafting an analysis, spend 20 minutes trying to explain the result without the favorite hypothesis. Which confounder could produce it? Which selection effect? Which reporting artifact? If nothing plausible surfaces, the result is genuinely strong. If several alternatives surface, the next experiment has a target list.

What the move looks like across fields

The pattern is field-agnostic, but the rivals change:

  • Machine learning. The new model beats a baseline. Rivals: data leakage, evaluation artifact, hyperparameter tuning, compute asymmetry, seed variance.
  • Social science. A correlation appears. Rivals: selection into the sample, asymmetric measurement error, reverse causation, omitted variables, aggregation artifact.
  • Biology. An intervention produces an effect. Rivals: off-target effect of the intervention, secondary pathway, handling or placebo effect, drift in the model organism.
  • History. A primary source suggests a causal claim. Rivals: selection of surviving sources, author bias, retrospective reframing, translation artifact.

The structural point is constant: the favored explanation is one of several, and the experimental, archival, or analytical design has to do real work to isolate it.
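
For the machine-learning bullet, one of the cheapest rivals to rule out is seed variance. A minimal sketch of the paired check, assuming some training-and-evaluation routine already exists; train_and_eval below is a placeholder for that routine, not a real API.

```python
# Checks whether a reported gain over the baseline exceeds run-to-run seed
# variance. train_and_eval is a placeholder to be wired to the real pipeline.
import statistics

def train_and_eval(model_name: str, seed: int) -> float:
    """Placeholder: train `model_name` with `seed`, return held-out accuracy."""
    raise NotImplementedError  # connect to the project's actual pipeline

seeds = [0, 1, 2, 3, 4]
baseline = [train_and_eval("baseline", s) for s in seeds]
candidate = [train_and_eval("new_model", s) for s in seeds]

paired_gains = [c - b for c, b in zip(candidate, baseline)]
mean_gain = statistics.mean(paired_gains)
spread = statistics.stdev(paired_gains)
print(f"mean gain {mean_gain:.3f}, seed-to-seed spread {spread:.3f}")
# If the mean gain is comparable to the spread, "seed variance" is still a
# live rival; leakage, tuning, and compute asymmetry need separate checks.
```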

A small process change

A single habit, repeated, picks up most of the gains. In each active project's workspace, keep a running section titled "What else could explain this?" Drop a bullet into it every time a result lands. Keep it pedestrian: "maybe the effect is just seasonality," "maybe the improvement came from the preprocessor, not the model." After a month, the section is a library of ruled-out and un-ruled-out alternatives. The un-ruled-out ones become the next experiments.

The researchers who move fastest are not the ones who think of better ideas. They are the ones who notice rival ideas earlier and design the next experiment to discriminate. Chamberlin and Platt were right. Pass it on.

Ready to try this with your own papers?

Academe is the research and writing workspace these guides were written for.