
Reading across vocabularies: how relevant papers stay invisible

The most interesting papers in a field are often the ones nobody has cited yet, not because they are hidden, but because they live under a different vocabulary in a neighboring field.

Academe

The scientific literature contains findings that nobody has noticed yet. Not because they are secret, but because the people who would care about them work in different fields, use different vocabulary, and read different journals. A paper in neuroscience might quietly resolve a question that has been open in philosophy for 30 years. A dataset sitting in economics might settle a long-running debate in education policy. The evidence is in the open. The connection has not been drawn.

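The phenomenon has a name. The information scientist Don Swanson called it "undiscovered public knowledge" in the 1980s, and the practice of deliberately hunting for it is now called literature-based discovery. Swanson's canonical example, published in 1986, was a link between dietary fish oil and Raynaud's syndrome that was implicit in print but had never been stated. He found it by connecting two literatures that did not cite each other: one reported that patients with Raynaud's syndrome show elevated blood viscosity and platelet aggregation, and the other reported that dietary fish oil lowers both.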
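Swanson's reasoning is often summarized as the ABC model: if term A co-occurs with B in one literature, and B co-occurs with C in another, but A and C never appear together, then the pair A and C is a candidate hidden link. The sketch below runs that pattern over a hypothetical toy corpus in which each paper is reduced to a set of index terms. It illustrates the pattern only. Swanson's own work relied on reading titles by hand, and real systems rank candidates statistically rather than listing every path.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toy corpus: each "paper" is reduced to a set of index terms.
# A real system would extract these from titles or abstracts.
CORPUS = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"raynaud", "vasoconstriction"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "cholesterol"},
]


def cooccurring(term: str, corpus: list[set[str]]) -> set[str]:
    """Return every term that shares at least one paper with `term`."""
    found: set[str] = set()
    for paper in corpus:
        if term in paper:
            found |= paper - {term}
    return found


def abc_candidates(a: str, corpus: list[set[str]]) -> dict[str, set[str]]:
    """Open discovery: map each candidate C term to the B terms linking it to A.

    Terms that already co-occur with A are skipped, because that link is
    already explicit in the literature.
    """
    b_terms = cooccurring(a, corpus)
    already_linked = b_terms | {a}
    links: dict[str, set[str]] = defaultdict(set)
    for b in b_terms:
        for c in cooccurring(b, corpus):
            if c not in already_linked:
                links[c].add(b)
    return dict(links)


for c, bridges in sorted(abc_candidates("raynaud", CORPUS).items()):
    print(c, "via", sorted(bridges))
# fish oil via ['blood viscosity', 'platelet aggregation']
```

"Cholesterol" never appears as a candidate: it co-occurs only with fish oil, so no bridge term connects it back to Raynaud's.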

Forty years later, the search tooling has changed. Much of the searching that once took months of specialist work can now be done in an afternoon. Some notes on how.

Why useful papers stay hidden

A paper does not need to be secret to be invisible. It only needs to live in a neighborhood that the right readers do not enter. The main mechanisms:

Vocabulary mismatch. The same phenomenon goes by different technical names in different fields. A biochemist's "substrate" is, loosely, an organic chemist's "reactant." A search for one term rarely surfaces papers that use the other.
Venue silos. Most physicists do not read Cell. Most biologists do not read Physical Review Letters. Work that crosses the boundary exists, but it falls outside the default reading habits of both groups.
Citation circles. A paper's citations tend to stay within the neighborhood the paper came from. Few early citations mean little weight in recommender systems, so even readers who would care never see it.
Time lag. A paper that answers a question the field has not yet asked is not wrong, and it is not being deliberately ignored. It is early. Its moment arrives later, if at all.

Papers at the intersection of these effects are where literature-based discovery tends to find purchase. A forgotten 1978 paper on mitochondrial stress responses, cited by almost nobody, might be the missing input for current aging research.

Four practical moves

1. Read across vocabularies, not within them

When a claim looks interesting, try to express it in the technical language of two adjacent fields. "Attention collapse in transformers" becomes "representational bottleneck in hierarchical encoders" becomes "degeneracy in feature-selection layers." Run each phrasing as a separate search. Most results will be familiar. The few that are not are usually the interesting ones. Keep them.
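The bookkeeping for this move is simple enough to sketch. The function below assumes some `search` callable that takes a query string and returns paper IDs; `fake_search` and its canned results are hypothetical stand-ins for whatever search tool you actually use. It runs each phrasing separately, drops papers already on your reading list, and records which phrasing surfaced each remaining paper, so you can see which field's vocabulary found it.

```python
from __future__ import annotations

from typing import Callable, Iterable


def unfamiliar_hits(
    phrasings: Iterable[str],
    search: Callable[[str], list[str]],
    known: set[str],
) -> dict[str, list[str]]:
    """Run each phrasing as its own search and keep only papers not already known.

    Maps each unfamiliar paper ID to the phrasings that surfaced it.
    """
    hits: dict[str, list[str]] = {}
    for query in phrasings:
        for paper_id in search(query):
            if paper_id not in known:
                hits.setdefault(paper_id, []).append(query)
    return hits


# Hypothetical stand-in for a real search backend: canned results per query.
FAKE_INDEX = {
    "attention collapse in transformers": ["p1", "p2"],
    "representational bottleneck in hierarchical encoders": ["p2", "p7"],
    "degeneracy in feature-selection layers": ["p7", "p9"],
}


def fake_search(query: str) -> list[str]:
    return FAKE_INDEX.get(query, [])


known = {"p1", "p2"}  # papers already on the reading list
print(unfamiliar_hits(FAKE_INDEX, fake_search, known))
```

In this toy run, p7 is surfaced by both borrowed phrasings and by neither familiar result set, which would make it a natural first read.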

Semantic search, which compares meaning rather than keyword overlap, makes this kind of cross-vocabulary search much faster. A semantic query for "papers on attention collapse in transformers" can surface papers that describe the same phenomenon without ever using the phrase.
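Under the hood, semantic search usually means comparing embedding vectors, most often by cosine similarity. The sketch below assumes the vectors already exist; the three-dimensional values and the paper titles are made up for illustration, whereas a real embedding model produces vectors with hundreds of dimensions. Note that the top-ranked title shares no content words with the query.

```python
from __future__ import annotations

import math

# Hypothetical embeddings. In practice these come from an embedding model
# and have hundreds of dimensions rather than three.
QUERY = "attention collapse in transformers"
QUERY_VEC = [0.9, 0.1, 0.0]
DOCS = {
    "Feature degeneracy in hierarchical encoders": [0.8, 0.3, 0.0],
    "Representational bottlenecks in deep networks": [0.7, 0.4, 0.1],
    "Soil nitrogen cycling under drought": [0.0, 0.1, 0.95],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def rank(query_vec: list[float], docs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k titles whose vectors point most nearly in the query's direction."""
    ordered = sorted(docs, key=lambda title: cosine(query_vec, docs[title]), reverse=True)
    return ordered[:k]


print(f"Query: {QUERY}")
for title in rank(QUERY_VEC, DOCS):
    print(f"{cosine(QUERY_VEC, DOCS[title]):.3f}  {title}")
```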

2. Follow citations backward, not just forward

Forward citation chains return the papers that built on a given paper. To find overlooked predecessors, go backward. Pick a recent paper worth admiring and read its reference list in full. Then, for each reference, read its reference list. Two hops back, the references in many fields reach into the 1970s and 1980s, to papers the field has nearly forgotten. Some are dated. Some make exactly the conceptual move the field is currently trying to rederive.
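The two-hop walk is a breadth-first traversal of the reference graph. In the sketch below the graph is a hypothetical toy dictionary; in practice the reference lists would come from the papers themselves or from a bibliographic database. Breadth-first order records each paper at its shortest distance from the starting paper, and the hop limit keeps the walk from fanning out indefinitely.

```python
from __future__ import annotations

from collections import deque

# Hypothetical toy citation graph: paper ID -> (publication year, reference IDs).
PAPERS: dict[str, tuple[int, list[str]]] = {
    "recent2023": (2023, ["a2019", "b2015"]),
    "a2019": (2019, ["c1984", "b2015"]),
    "b2015": (2015, ["d1979"]),
    "c1984": (1984, ["e1962"]),
    "d1979": (1979, []),
    "e1962": (1962, []),
}


def backward_chain(seed: str, papers: dict[str, tuple[int, list[str]]], max_hops: int = 2) -> dict[str, int]:
    """Walk reference lists breadth-first; return {paper ID: hops back from seed}."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand past the hop limit
        _, refs = papers.get(current, (0, []))  # unknown IDs have no known references
        for ref in refs:
            if ref not in hops:
                hops[ref] = hops[current] + 1
                queue.append(ref)
    del hops[seed]
    return hops


# Oldest first, since that is where the forgotten predecessors tend to sit.
chain = backward_chain("recent2023", PAPERS)
for pid in sorted(chain, key=lambda p: PAPERS[p][0]):
    print(PAPERS[pid][0], pid, f"({chain[pid]} hops back)")
```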

3. Ask what the paper does not cite

This is the move that separates experienced researchers from junior ones. For every paper read closely, ask: "What relevant work is not cited here?" Three categories are common.

The author does not know it exists. Usually because it lives in a neighboring field.
The author knows but treats it as outdated. Sometimes correctly. Often "outdated" is shorthand for "inconvenient."
The author knows but disagrees. Then the disagreement itself is the next thing to read.

In each case, the missing citation is a pointer to a paper that may be worth reading. The pattern of non-citations across multiple papers in the field is a rough map of the field's blind spots.
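Tallying non-citations across a field is mechanical once you have the reference lists. The judgment lies in choosing which works count as relevant, and the sketch below simply takes that list as given. The paper IDs, work IDs, and reference sets are hypothetical. Works that most papers in the field omit rise to the top of the result.

```python
from __future__ import annotations


def blind_spots(field_refs: dict[str, set[str]], relevant: list[str]) -> list[tuple[str, float]]:
    """Return (work, fraction of field papers that do not cite it), most-omitted first."""
    if not field_refs:
        return []
    n = len(field_refs)
    rates = [
        (work, sum(1 for refs in field_refs.values() if work not in refs) / n)
        for work in relevant
    ]
    # sorted() is stable even with reverse=True, so ties keep the order of `relevant`.
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Hypothetical reference sets for four papers in one field.
FIELD_REFS = {
    "paper1": {"w1", "w2"},
    "paper2": {"w1"},
    "paper3": {"w1", "w3"},
    "paper4": {"w1", "w2"},
}
RELEVANT = ["w1", "w2", "w3", "w4"]  # the reader's own judgment of what belongs

for work, rate in blind_spots(FIELD_REFS, RELEVANT):
    print(f"{work}: omitted by {rate:.0%} of field papers")
```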

4. Use the unfamiliar names

Every field has papers with strong results whose authors are not famous, whose institutions are not central, or whose venue is not Nature or Science. These papers tend to be systematically under-read relative to their content. A practical scan: take ten recent high-impact papers in the subfield, list the relatively obscure works they cite, and read those. Roughly half will be padding. The rest will often contain an observation the famous paper did not have space to emphasize.
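The scan above can be sketched in a few lines, assuming you already have the reference lists of the high-impact papers and a citation count for each reference. All IDs and counts below are hypothetical, three papers stand in for ten, and the cutoff of 50 citations is an arbitrary illustration; what counts as obscure varies widely between subfields. References cited by several of the high-impact papers come first, on the assumption that a quiet paper cited repeatedly is more likely to matter.

```python
from __future__ import annotations

from collections import Counter


def obscure_references(
    top_papers: dict[str, list[str]],
    citation_counts: dict[str, int],
    max_citations: int = 50,
) -> list[tuple[str, int]]:
    """Return (reference ID, number of top papers citing it) for low-citation references.

    References missing from citation_counts are skipped rather than guessed.
    """
    shared: Counter[str] = Counter()
    for refs in top_papers.values():
        shared.update(set(refs))  # count each top paper at most once per reference
    result = [
        (ref, n)
        for ref, n in shared.items()
        if ref in citation_counts and citation_counts[ref] <= max_citations
    ]
    # Most widely shared first; ties broken alphabetically for a stable order.
    return sorted(result, key=lambda item: (-item[1], item[0]))


# Hypothetical data: reference lists of high-impact papers and citation counts.
TOP_PAPERS = {
    "t1": ["famous1", "obscureA"],
    "t2": ["famous1", "obscureA", "obscureB"],
    "t3": ["famous2", "obscureB", "obscureA"],
}
CITATION_COUNTS = {"famous1": 4200, "famous2": 1800, "obscureA": 12, "obscureB": 37}

print(obscure_references(TOP_PAPERS, CITATION_COUNTS))
# [('obscureA', 3), ('obscureB', 2)]
```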

What AI-assisted search actually changes

Literature-based discovery used to be a specialist skill because the search tooling was primitive. The work required expert familiarity with several fields' controlled vocabularies (MeSH headings, PsycINFO descriptors, the ACM Computing Classification System) plus enough time to chase citations through physical libraries. Swanson, Weeber, and other early practitioners spent years developing the taste.

Two things have shifted:

Semantic search finds papers that describe the same phenomenon in different vocabularies. Fluency in each field's controlled vocabulary is no longer a prerequisite.
Personal-corpus reasoning allows targeted questions across a library of thousands of papers. "Which of my saved papers has ever mentioned a mechanism that could explain the result in Chen et al. 2023?" is the kind of query that used to be impractical and is now a one-liner.

What has not shifted:

Good questions still matter. Literature-based discovery on a vague prompt returns vague papers. The sharper the question, the more surprising the match can be.
Reading still matters. Finding a candidate paper is 5 percent of the work; evaluating whether it actually resolves the question is the other 95.
Translation still matters. A paper from an adjacent field uses conventions the reader does not know. Translating its claims into the reader's field is where the research contribution lives.

A weekly habit

Once a week, for 20 minutes, search the active research question in the vocabulary of two or three adjacent fields. Keep a single running file titled "Interesting, from elsewhere." Most entries will go nowhere. A few may seed the best-cited papers of a career.

The papers everyone missed are still in the literature. They are waiting for someone to read across the vocabulary they were written in.