Context Graph
Academe connects your drafts, notes, citations, and imported papers to the scholarly corpus. The research agent uses those links when it answers a question, suggests a source, or helps revise a claim.
At a glance
What the graph does in the product
Keeps project context together
Draft paragraphs, notes, citations, comments, and imported files can be retrieved as one connected workspace.
Connects to nearby literature
The agent can move from your project into related works, methods, datasets, and claims in the scholarly corpus.
Shows evidence paths
Answers can point back to the paper, passage, note, or draft section that shaped the response.
Respects project boundaries
Project data stays scoped to your account. Isolation controls let a project avoid global-corpus retrieval when needed.
Why a graph
Research isn’t a bag of keywords. A single paper connects to dozens of other ideas: the authors it builds on, the datasets it uses, the methods it shares with nearby fields, and the claims it supports or challenges. A graph captures those relationships directly.
Keyword search is a function of strings. Graph search is a function of structure: who cites whom, which methods appear in which fields, where ideas travel. Once your project sits inside that structure, questions can start from the work you already have instead of from a blank global lookup.
Academe maintains two graphs that work together:
The global corpus graph
Your project graph
Agentic search, gap detection, reviewer-style critique, and inline citations rely on those cross-links. Your project can be compared with nearby papers, methods, and objections instead of treating each query as isolated text.
Scale
The numbers below describe the public scholarly record Academe starts from before we add your project. Most figures grow over time; we publish snapshots as they change.
What lives in the graph (nodes)
A node is anything the graph can reason about. Nodes have a stable ID, a type, canonical metadata, and any embeddings, labels, or classifications Academe has generated for them.
Works
Authors
Institutions
Venues
Concepts
Datasets & software
Claims
Your notes
Your drafts
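The node model above — a stable ID, a type, canonical metadata, plus any generated enrichments — can be sketched as a small data structure. This is an illustrative sketch, not Academe's actual schema; the field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """One graph node: a work, author, venue, concept, claim, note, or draft."""
    node_id: str                 # stable canonical ID, e.g. "doi:10.1234/abcd"
    node_type: str               # "work", "author", "concept", "claim", "note", ...
    metadata: dict = field(default_factory=dict)  # canonical bibliographic metadata
    embedding: tuple = ()        # dense vector added by the enrichment layer
    labels: tuple = ()           # concept labels / classifications

paper = Node(
    node_id="doi:10.1234/abcd",
    node_type="work",
    metadata={"title": "An Example Paper", "year": 2024},
    labels=("graph-based retrieval",),
)
```

Keeping the record immutable (`frozen=True`) mirrors the idea that a node's canonical identity is stable while enrichments are versioned alongside it.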
How nodes are connected (edges)
Edges encode the relationships that make the graph useful for reasoning. Academe preserves the edges we can observe directly (citations, authorship) and infers the ones we can measure reliably (topical similarity, method overlap, contradiction).
- Cites / cited-by: Reference lists are preserved when available. A citation edge records the citing work, the cited work, the section it appears in, and, when we can extract it, the sentence that justifies the citation.
- Authored-by: Links authors to works. When a paper is retracted or corrected, the edge can carry that status into author and project views.
- Co-authored-with: Author ↔ author edges weighted by paper count and recency. Powers collaboration suggestions and disambiguates authors with common names.
- Affiliated-with: Author ↔ institution edges, timestamped so a move from MIT to Stanford is a new edge rather than a replaced one.
- Published-in: Work ↔ venue, with issue, volume, and page metadata so citations generated by Academe are bibliographically complete.
- Funded-by: Work ↔ funder via grant acknowledgement. Useful for tracking research agendas and for compliance reporting.
- References-dataset / implements-method: Explicit links from papers to the datasets they use and the methods they implement. Makes "find me papers that used this dataset" a one-hop query.
- Topical similarity: Dense semantic similarity between works (and between works and your notes) via a scientific-text embedding. Similarity edges are thresholded so the graph stays sparse.
- Concept assignment: Work ↔ concept edges with confidence scores, so a paper sits in multiple concept neighbourhoods at once.
- Supports / contradicts: Claim ↔ claim edges when Academe’s reviewer pipeline detects an agreement or a disagreement between two extracted claims. Drives contradiction surfacing.
- Version-of: Preprint ↔ published version, or v1 ↔ v2 of the same preprint. Your citations stay consistent as versions replace each other.
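The observed-versus-inferred distinction above can be sketched as a typed edge with a confidence score. This is a minimal illustration, assuming hypothetical field names and a made-up similarity threshold; the real edge schema and thresholds are Academe's own:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edge:
    """A typed, directed relationship between two node IDs."""
    source: str
    target: str
    relation: str            # "cites", "authored_by", "topical_similarity", ...
    confidence: float = 1.0  # 1.0 for directly observed edges; lower for inferred ones
    attrs: dict = field(default_factory=dict)  # e.g. the citing section and sentence

# Observed: a citation with its in-text location preserved.
cites = Edge("doi:10.1/a", "doi:10.1/b", "cites",
             attrs={"section": "Related Work"})

# Inferred: a similarity edge, kept only above a sparsity threshold.
SIMILARITY_THRESHOLD = 0.80  # illustrative value
score = 0.87
similar = ([Edge("doi:10.1/a", "doi:10.1/c", "topical_similarity", confidence=score)]
           if score >= SIMILARITY_THRESHOLD else [])
```

Thresholding inferred edges at write time is what keeps the graph sparse enough to walk quickly at query time.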
How the project graph is built
Your project graph starts empty when you create a new project. Everything you add after that becomes a node, an edge, or usually both.
Import a PDF
Write a paragraph
Add a citation
Take a note
Branch or version
Run the agent
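Each of the actions above maps onto a graph update. A toy sketch of that mapping, with event names and payload fields that are assumptions for illustration, not Academe's API:

```python
def apply_project_event(graph, event_type, payload):
    """Map one workspace action onto project-graph updates (illustrative only)."""
    nodes, edges = graph["nodes"], graph["edges"]
    if event_type == "import_pdf":
        nodes.append({"id": payload["doc_id"], "type": "work"})
    elif event_type == "write_paragraph":
        nodes.append({"id": payload["para_id"], "type": "draft"})
    elif event_type == "add_citation":
        # A citation is an edge from the draft paragraph to the cited work.
        edges.append({"relation": "cites",
                      "source": payload["para_id"], "target": payload["work_id"]})
    elif event_type == "take_note":
        nodes.append({"id": payload["note_id"], "type": "note"})
        edges.append({"relation": "annotates",
                      "source": payload["note_id"], "target": payload["anchor_id"]})
    return graph

g = {"nodes": [], "edges": []}
apply_project_event(g, "import_pdf", {"doc_id": "pdf-1"})
apply_project_event(g, "write_paragraph", {"para_id": "para-1"})
apply_project_event(g, "add_citation", {"para_id": "para-1", "work_id": "pdf-1"})
```

The point of the sketch: a single citation insert produces both a node (the imported work, if new) and an edge (draft → work), which is why the graph accretes from ordinary writing rather than from a separate curation step.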
How the global graph is built
The global graph is assembled from a curated set of primary sources, fused into a single normalised index, and enriched with Academe’s own semantic and claim layers.
- Ingest: Publisher metadata feeds, preprint servers, OA full-text corpora, dataset registries, institutional repositories, funder APIs, and retraction databases. Each source runs on its own schedule, such as hourly for preprints, daily for journals, and weekly for patents.
- Reconcile: A multi-stage matcher reconciles DOI, title, author, venue, and year across sources so the same paper appearing in multiple places collapses to a single node with its known identifiers attached.
- Normalise: Author names are resolved against ORCID, institutions against ROR, venues against ISSN registries, and funders against the Crossref Funder Registry. This reduces duplicate records and string-spelling drift.
- Enrich: We add concept labels, a dense embedding, a full-text segmentation (sections, figures, tables, equations), a retraction status, and the list of other works that co-cite this one or are co-cited with it.
- Extract claims: A claim-extraction model reads full text (where the licence permits) and emits atomic claim nodes along with their evidence spans. Claims get their own embedding so they can be retrieved independently of the papers they come from.
- Evaluate: Retrieval and extraction releases are checked against public scholarly benchmarks and internal regression questions. We track deltas so regressions are visible before release.
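The reconcile step can be illustrated with a deliberately simplified matcher: exact DOI match first, then a normalised title-plus-year key. The real pipeline is multi-stage and partly learned; everything below is a hypothetical sketch:

```python
def reconcile(records):
    """Collapse records of the same paper into one canonical node (simplified)."""
    def match_key(rec):
        # Stage 1: an exact DOI is the strongest signal.
        if rec.get("doi"):
            return ("doi", rec["doi"].lower())
        # Stage 2: fall back to a normalised title + year.
        title = "".join(ch for ch in rec["title"].lower() if ch.isalnum())
        return ("title-year", title, rec.get("year"))

    canonical = {}
    for rec in records:
        node = canonical.setdefault(match_key(rec), {"ids": set(), "sources": []})
        for id_field in ("doi", "arxiv_id"):
            if rec.get(id_field):
                node["ids"].add(rec[id_field])  # keep every known identifier
        node["sources"].append(rec["source"])
    return list(canonical.values())

merged = reconcile([
    {"source": "crossref", "doi": "10.1/x", "title": "Example Paper", "year": 2024},
    {"source": "arxiv", "doi": "10.1/x", "arxiv_id": "2401.00001",
     "title": "Example paper", "year": 2024},
])
```

The same paper arriving from two sources collapses to one node that retains both identifiers, which is what lets exported citations keep the best available ID later on.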
Update cadence
The graph is not static. Different node and edge types refresh on different schedules, matched to how fast their source of truth moves.
Quality guardrails
- Deduplication: Same-work collapse across sources using a hybrid deterministic and learned matcher. We track false-merge and false-split rates as part of release quality.
- Canonicalisation: Records have a canonical ID surfaced in the interface, with alternate IDs preserved as redirects where possible. Exported citations keep the best available identifier.
- Retraction-aware: Retractions and corrections flow into the graph from retraction sources. Academe flags retracted papers you’ve cited when the signal is available.
- Versioned: Preprint-to-published edges are explicit. Your citations can resolve to the current version unless you pin a specific one.
- Provenance: Enrichments such as concepts, embeddings, claims, and similarities carry model IDs and timestamps. Reproducibility is part of the data model.
- Licence-aware: Full-text access is gated by publisher and repository terms. Paywalled full text is only used when your institutional subscription or BYOK credentials grant the right to use it.
- Bench-tested: Releases run against BEIR, SciDocs, TREC-COVID, SciRepEval, plus an internal scholarly QA regression set. Regressions are tracked before release.
How Academe uses the graph at query time
Substantive interactions with Academe, including chat questions, inline suggestions, write-mode edits, citation inserts, and research-digest entries, can use graph traversal.
- Anchor: The agent identifies the nodes in your project graph that are most relevant to the request: the paragraph you are writing, the section of a PDF you are reading, or the claim you are defending.
- Expand: From each anchor, the agent walks outward along citation, similarity, and concept edges into the global corpus. Walk depth and breadth scale with the question’s difficulty.
- Filter: Candidates are filtered by your project’s constraints, including date range, field, open-access status, language, retraction status, and anything else you have scoped.
- Rank: A learned reranker combines semantic similarity, citation centrality, topical specificity, recency, and your project’s implicit priorities into a single relevance score.
- Cite: The strongest candidates are returned with provenance: paper, section, and sentence where available. Answers are designed to be auditable.
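The five steps can be sketched as one toy loop. Real anchoring and ranking are embedding-based and learned; here anchoring is stubbed as keyword overlap and ranking as citation count, purely for illustration:

```python
def run_query(question_terms, project_nodes, corpus_edges, corpus_meta,
              constraints, top_k=3):
    """Anchor → expand → filter → rank → cite, as a simplified sketch."""
    # Anchor: project nodes that share vocabulary with the question.
    anchors = {nid for nid, terms in project_nodes.items()
               if question_terms & terms}
    # Expand: a one-hop walk along outgoing edges into the global corpus.
    candidates = {dst for src, dst in corpus_edges if src in anchors}
    # Filter: apply project-scoped constraints (here, just a date floor).
    kept = [c for c in candidates
            if corpus_meta[c]["year"] >= constraints["min_year"]]
    # Rank: a single hand-made signal standing in for the learned reranker.
    kept.sort(key=lambda c: corpus_meta[c]["citations"], reverse=True)
    # Cite: return candidates with provenance (the anchors that led to them).
    return [{"work": c,
             "via": sorted(s for s, d in corpus_edges if d == c and s in anchors)}
            for c in kept[:top_k]]

results = run_query(
    question_terms={"graph"},
    project_nodes={"draft-1": {"graph", "retrieval"}},
    corpus_edges=[("draft-1", "w1"), ("draft-1", "w2")],
    corpus_meta={"w1": {"year": 2023, "citations": 120},
                 "w2": {"year": 2015, "citations": 500}},
    constraints={"min_year": 2020},
)
```

Even in this toy form, each result carries the anchor that produced it, which is the mechanism behind "answers are designed to be auditable."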
How this compares to other tools
Many research tools now ship an "AI search" feature. The short version of what makes Academe different: chat-with-PDF tools, semantic search engines, AI literature tools, and bibliographic search usually hand back results that match your query. Academe can reason from your project, connect it to the scholarly corpus, and surface contradictions, themes, and provenance in the same workflow.
Read the full comparison
Privacy model
Your project graph is private by default. Nothing you add flows back into the public corpus graph, and nothing you do influences any other user’s results.
Connected
Isolated
Example traversals
- "Find me the papers I should have cited in §3": Anchor at §3 draft node → expand via concept + similarity edges → filter to works not already in your bibliography → rank by reviewer-likelihood → return five candidates with evidence spans.
- "Has anyone contradicted the claim in my abstract?": Anchor at abstract draft node → extract atomic claim → traverse contradicts edges → return contradicting claims with their sources and the strongest counter-evidence.
- "What methods do adjacent fields use for this problem?": Anchor at your methods section → extract method nodes → walk method ↔ work ↔ concept edges into neighbouring concept clusters → rank by transfer likelihood → return method options with representative papers.
- "Who should I collaborate with on follow-up work?": Anchor at your whole-project embedding → walk to author nodes whose work is most similar → filter by affiliation availability → rank by topical overlap and publication recency → return authors with the papers that placed them in the neighbourhood.
- "Summarise what my project says about X": Anchor at concept X → walk inward into your project graph → gather notes, paragraphs, and claims touching X → return a synthesis with links back to each source.
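The contradiction traversal is the simplest of these to show concretely. A minimal sketch, assuming edges stored as `(source, target, relation)` triples with made-up claim IDs:

```python
def contradicting_claims(claim_id, claim_edges):
    """Follow 'contradicts' edges touching one extracted claim (sketch)."""
    hits = []
    for src, dst, rel in claim_edges:
        if rel != "contradicts":
            continue  # ignore 'supports' and other relations
        if src == claim_id:
            hits.append(dst)
        elif dst == claim_id:
            hits.append(src)  # contradiction runs both ways
    return hits

edges = [
    ("claim-abstract", "claim-smith-2021", "contradicts"),
    ("claim-jones-2019", "claim-abstract", "contradicts"),
    ("claim-abstract", "claim-lee-2020", "supports"),
]
found = contradicting_claims("claim-abstract", edges)
```

Because claims are nodes in their own right, this query never needs to re-read the papers: the reviewer pipeline already paid that cost at ingest time.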
What it looks like from the outside
You don’t have to think about the graph. You write, read, cite, and chat; the graph updates in the background and quietly makes the product more useful.
While writing
When searching
When the agent takes initiative
When citing
Under the hood
- Storage: The canonical graph lives in a sharded Postgres layout with optimised adjacency tables. Embeddings are stored in a dedicated vector index. Full-text sits in an OpenSearch cluster for BM25 retrieval.
- Embeddings: Works, authors, concepts, claims, and your notes are embedded with a scientific-text encoder tuned on scholarly QA pairs. Re-indexing is incremental, so adding a single paper does not trigger a global rebuild.
- Query planner: Retrieval is a hybrid plan: BM25 for literal recall, dense retrieval for semantic recall, graph walks for neighbourhood expansion, and a learned reranker for the final ordering.
- Caching: Common neighbourhoods are cached per project; rarely traversed regions are materialised on demand. Project updates invalidate the relevant cache entries so new work can be picked up promptly.
- Observability: Every graph traversal writes a provenance row you can inspect. Ask "why this paper?" and Academe can show the edges it used.
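One common way to merge several candidate lists like the planner's BM25, dense, and graph-walk results is reciprocal-rank fusion. Academe's final ordering comes from a learned reranker, so the fixed formula below is only a stand-in to show the shape of the fusion step:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal-rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["w-a", "w-b", "w-c"]   # literal recall
dense = ["w-b", "w-c", "w-a"]  # semantic recall
walk = ["w-b", "w-a"]          # graph-neighbourhood expansion
fused = rrf_fuse([bm25, dense, walk])
```

A document that appears near the top of several independent retrievers beats one that tops a single list, which is the intuition a learned reranker refines with richer signals.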
What the graph is not
- Not a fixed ontology for all research: Academe does not attempt to encode "all of science" as a fixed ontology. Concepts are soft labels, claims are extracted, and edges carry confidences. The graph is calibrated, not declarative.
- Not a replacement for reading: The graph is a map. It tells you where to look. Reading, interpreting, and judging papers is still your job. Academe helps keep relevant work visible.
- Not opinion-free: Our ranking and claim-extraction models have been trained and tuned by us; they make choices. We publish the benchmark results and edge-extraction accuracy so those choices are visible.
- Not the public internet: News articles, blog posts, social media, and unvetted grey sources are not in the graph unless a scholarly work explicitly references them. The index stays scholarly on purpose.
FAQ
Does Academe see my project data?
Can I use the graph without connecting my project?
How up-to-date is the graph?
What happens when a paper I’ve cited is retracted?
Does the graph work for non-English research?
Can I query the graph directly?
What if I don’t want Academe to ingest a specific paper?
How do you handle confidential pre-publication work?
vs. vector RAG, semantic search, and Google Scholar
A side-by-side breakdown of where Academe is stronger and where other tools still fit.
Global corpus graph
How Academe maps the scholarly corpus so relevant work is easier to find.
Project graph
How Academe tracks your research: what you’ve read, what you’re claiming, what’s still open.
Connecting the two
What becomes possible when the two graphs talk to each other.