
Context Graph

Academe connects your drafts, notes, citations, and imported papers to the scholarly corpus. The research agent uses those links when it answers a question, suggests a source, or helps revise a claim.

At a glance

What the graph does in the product

Keeps project context together

Draft paragraphs, notes, citations, comments, and imported files can be retrieved as one connected workspace.

Connects to nearby literature

The agent can move from your project into related works, methods, datasets, and claims in the scholarly corpus.

Shows evidence paths

Answers can point back to the paper, passage, note, or draft section that shaped the response.

Respects project boundaries

Project data stays scoped to your account. Isolation controls let a project avoid global-corpus retrieval when needed.

Why a graph

Research isn’t a bag of keywords. A single paper connects to dozens of other ideas: the authors it builds on, the datasets it uses, the methods it shares with nearby fields, and the claims it supports or challenges. A graph captures those relationships directly.

Keyword search is a function of strings. Graph search is a function of structure: who cites whom, which methods appear in which fields, where ideas travel. Once your project sits inside that structure, questions can start from the work you already have instead of from a blank global lookup.

Academe maintains two graphs that work together:

The global corpus graph

Scholarly works, authors, venues, methods, concepts, and datasets we can legally index, linked by citation, co-authorship, topical similarity, and shared data.

Your project graph

The papers you’ve imported, the notes you’ve written, the claims you’re defending, the questions you’re still answering, and how they connect. Private by default, with optional retrieval against the global corpus.

Agentic search, gap detection, reviewer-style critique, and inline citations rely on those cross-links, so the agent can compare your project with nearby papers, methods, and objections instead of treating each query as isolated text.

Scale

The numbers below describe the public scholarly record Academe starts from before we add your project. Most figures grow over time; we publish snapshots as they change.

  • 400M+ scholarly works
  • 85M+ authors
  • 52k+ venues
  • 130k+ institutions
  • 65k concepts
  • 3B+ citation edges
  • 12M+ datasets
  • 32k+ funders

What lives in the graph (nodes)

A node is anything the graph can reason about. Nodes have a stable ID, a type, canonical metadata, and any embeddings, labels, or classifications Academe has generated for them.

Works

Peer-reviewed papers, preprints, books, chapters, theses, conference papers, standards, reports, patents, datasets, and software. Each work carries the identifiers we can reconcile across scholarly metadata systems.

Authors

Normalised to ORCID where possible. Includes affiliations through time, co-authorship history, and a person-level embedding so you can search by research area and authorship patterns, not just a name string.

Institutions

Universities, labs, hospitals, companies, and government agencies, all normalised to ROR IDs. Captures parent/child relationships (a department inside a university, a lab inside a department).

Venues

Journals, conferences, repositories, preprint servers. Carries ISSN, publisher, open-access policy, impact metrics, and the field classifications the venue covers.

Concepts

A 65,000-concept taxonomy, plus domain-specific ontologies for fields such as math, computer science, and physics. Concepts have hierarchy and aliases.

Datasets & software

DOI-identified datasets (Zenodo, Dryad, OSF, figshare) and code repositories (GitHub with Zenodo DOIs, CodeOcean capsules) cross-linked to the publications that describe them.

Claims

Atomic factual statements extracted from papers, linked back to evidence spans where available. Claims support reviewer-style critique and contradiction detection.

Your notes

Every markdown note, margin comment, and highlight in your project becomes a node. They inherit the permissions of the project they belong to.

Your drafts

Paragraphs in your LaTeX or markdown drafts are indexed as nodes while you write, so the agent knows what you are claiming in real time.
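
To make that concrete, a node can be pictured as a small typed record. Here is a minimal sketch in Python; the field names are hypothetical and only illustrate the shape described above, not Academe's production schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch only: these field names are hypothetical,
# not Academe's production schema.
@dataclass
class Node:
    node_id: str                      # stable ID, e.g. "work:10.1234/abc"
    node_type: str                    # "work", "author", "concept", "claim", "note", ...
    metadata: dict                    # canonical metadata (title, year, ORCID, ...)
    embedding: Optional[list] = None  # dense vector, if one has been generated
    labels: list = field(default_factory=list)  # concept labels, classifications

note = Node("note:42", "note", {"project": "proj-7", "anchor": "pdf:3#p2"})
```
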
How nodes are connected (edges)

Edges encode the relationships that make the graph useful for reasoning. Academe preserves the edges we can observe directly (citations, authorship) and infers the ones we can measure reliably (topical similarity, method overlap, contradiction).

  • Cites / cited-by: Reference lists are preserved when available. A citation edge records the citing work, the cited work, the section it appears in, and, when we can extract it, the sentence that justifies the citation.
  • Authored-by: Links authors to works. When a paper is retracted or corrected, the edge can carry that status into author and project views.
  • Co-authored-with: Author ↔ author edges weighted by paper count and recency. Powers collaboration suggestions and disambiguates authors with common names.
  • Affiliated-with: Author ↔ institution edges, timestamped so a move from MIT to Stanford is a new edge rather than a replaced one.
  • Published-in: Work ↔ venue, with issue, volume, and page metadata so citations generated by Academe are bibliographically complete.
  • Funded-by: Work ↔ funder via grant acknowledgement. Useful for tracking research agendas and for compliance reporting.
  • References-dataset / implements-method: Explicit links from papers to the datasets they use and the methods they implement. Makes "find me papers that used this dataset" a one-hop query.
  • Topical similarity: Dense semantic similarity between works (and between works and your notes) via a scientific-text embedding. Similarity edges are thresholded so the graph stays sparse.
  • Concept assignment: Work ↔ concept edges with confidence scores, so a paper sits in multiple concept neighbourhoods at once.
  • Supports / contradicts: Claim ↔ claim edges when Academe’s reviewer pipeline detects an agreement or a disagreement between two extracted claims. Drives contradiction surfacing.
  • Version-of: Preprint ↔ published version, or v1 ↔ v2 of the same preprint. Your citations stay consistent as versions replace each other.
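
Edges can be pictured the same way: a typed, directional link that may carry context and a confidence. A minimal sketch, again with hypothetical names; note how an observed citation differs from an inferred, thresholded similarity edge.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical edge record; names are illustrative, not a documented schema.
@dataclass
class Edge:
    source: str              # node ID of the origin, e.g. "work:A"
    target: str              # node ID of the destination, e.g. "work:B"
    edge_type: str           # "cites", "similar_to", "contradicts", ...
    confidence: float = 1.0  # 1.0 when observed directly, <1.0 when inferred
    context: Optional[dict] = None

# Observed: preserved straight from a reference list, with its citing sentence.
cites = Edge("work:A", "work:B", "cites",
             context={"section": "Related Work", "sentence": "We build on [12]..."})

# Inferred: similarity edges are kept only above a threshold, keeping the graph sparse.
SIM_THRESHOLD = 0.8  # hypothetical cut-off
candidate = Edge("work:A", "work:C", "similar_to", confidence=0.91)
keep = candidate.confidence >= SIM_THRESHOLD  # True -> edge is materialised
```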

How the project graph is built

Your project graph starts empty when you create a new project. The work you add after that becomes a node or an edge, usually both.

Import a PDF

Academe extracts text, metadata, references, figures, and tables. The work joins the graph and is auto-reconciled with its global-corpus counterpart if it already exists there.

Write a paragraph

Each paragraph is streamed into the graph as a draft node and re-embedded after significant edits. The agent can compare what you are writing against relevant literature while you work.

Add a citation

A citation in your draft creates an edge into either the global corpus or your project library, or both if you’ve imported the paper.

Take a note

Notes attach to whatever you are reading: a PDF page, a paragraph range, a figure. That anchor becomes an edge, so the note resurfaces when you revisit the spot.

Branch or version

Creating a version snapshot duplicates the project graph state. You can diff two snapshots to see which nodes and edges changed.

Run the agent

Agentic actions create provenance edges. Generated suggestions are linked to the evidence they drew on, so you can audit claims back to their source.
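
To illustrate the first flow above, here is a toy sketch of what importing a PDF could add to the project graph: a work node, a reconciliation edge, and citation edges. Every name and structure here is hypothetical; it only mirrors the described behaviour, not a public Academe API.

```python
# Hypothetical sketch of the PDF-import flow; not a public Academe API.

def import_work(project_graph: dict, global_index: dict, work: dict) -> str:
    """Add extracted work metadata to the project graph and reconcile it.

    `project_graph` maps node IDs to records; `global_index` maps DOIs to
    global-corpus node IDs; `work` stands in for already-extracted metadata.
    """
    node_id = f"work:{work['doi']}"
    project_graph[node_id] = {"type": "work", "metadata": work, "edges": []}
    edges = project_graph[node_id]["edges"]

    # Auto-reconcile with the global corpus if the work already exists there.
    if work["doi"] in global_index:
        edges.append(("same_as", global_index[work["doi"]]))

    # Each extracted reference becomes a citation edge into the corpus.
    for ref_doi in work.get("references", []):
        if ref_doi in global_index:
            edges.append(("cites", global_index[ref_doi]))
    return node_id

# Toy usage:
corpus = {"10.1/a": "g:100", "10.1/b": "g:200"}
project = {}
import_work(project, corpus, {"doi": "10.1/a", "references": ["10.1/b"]})
```
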
How the global graph is built

The global graph is assembled from a curated set of primary sources, fused into a single normalised index, and enriched with Academe’s own semantic and claim layers.

  • Ingest: Publisher metadata feeds, preprint servers, OA full-text corpora, dataset registries, institutional repositories, funder APIs, and retraction databases. Each source runs on its own schedule, such as hourly for preprints, daily for journals, and weekly for patents.
  • Reconcile: A multi-stage matcher reconciles DOI, title, author, venue, and year across sources, so the same paper appearing in multiple places collapses to a single node with its known identifiers attached.
  • Normalise: Author names are resolved against ORCID, institutions against ROR, venues against ISSN registries, and funders against the Crossref Funder Registry. This reduces duplicate records and spelling drift.
  • Enrich: We add concept labels, a dense embedding, a full-text segmentation (sections, figures, tables, equations), retraction status, and the list of other works that co-cite or are co-cited with this one.
  • Extract claims: A claim-extraction model reads full text (where the licence permits) and emits atomic claim nodes along with their evidence spans. Claims get their own embedding so they can be retrieved independently of the papers they come from.
  • Evaluate: Retrieval and extraction releases are checked against public scholarly benchmarks and internal regression questions. We track deltas so regressions are visible before release.
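
The reconcile step is, at heart, entity resolution. Below is a minimal two-stage sketch (deterministic DOI match, then fuzzy title-plus-year match); the real matcher is multi-stage and learned, and the threshold here is invented for illustration.

```python
from difflib import SequenceMatcher

def same_work(a: dict, b: dict, title_threshold: float = 0.95) -> bool:
    """Illustrative two-stage matcher; the production pipeline has more stages."""
    # Stage 1: deterministic. Matching DOIs collapse to one node immediately.
    if a.get("doi") and a.get("doi") == b.get("doi"):
        return True
    # Stage 2: fuzzy. Same year plus near-identical normalised titles.
    if a.get("year") != b.get("year"):
        return False
    ta, tb = a.get("title", "").lower(), b.get("title", "").lower()
    return SequenceMatcher(None, ta, tb).ratio() >= title_threshold

crossref_record = {"doi": "10.1/x", "title": "Graphs of Science", "year": 2024}
preprint_record = {"doi": None, "title": "Graphs of science", "year": 2024}
print(same_work(crossref_record, preprint_record))  # True: collapses to one node
```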

Update cadence

The graph is not static. Different node and edge types refresh on different schedules, matched to how fast their source of truth moves.

  • Hourly: Preprint servers (arXiv, bioRxiv, medRxiv, ChemRxiv, OSF)
  • Daily: Published journal articles (Crossref + publisher feeds)
  • Daily: Citation graph (cited-by, references, bibliographic coupling)
  • Daily: Retractions (Retraction Watch + publisher notices)
  • Daily: Author & institution IDs (ORCID, ROR, Crossref funders)
  • Weekly: Full-text ingestion (where the licence permits)
  • Weekly: Patent linkage (USPTO, EPO, WIPO ↔ publications)
  • Live: Your project graph (updates after meaningful project edits)

Quality guardrails
  • Deduplication: Same-work collapse across sources using a hybrid deterministic and learned matcher. We track false-merge and false-split rates as part of release quality.
  • Canonicalisation: Records have a canonical ID surfaced in the interface, with alternate IDs preserved as redirects where possible. Exported citations keep the best available identifier.
  • Retraction-aware: Retractions and corrections flow into the graph from retraction sources. Academe flags retracted papers you’ve cited when the signal is available.
  • Versioned: Preprint-to-published edges are explicit. Your citations can resolve to the current version unless you pin a specific one.
  • Provenance: Enrichments such as concepts, embeddings, claims, and similarities carry model IDs and timestamps. Reproducibility is part of the data model.
  • Licence-aware: Full-text access is gated by publisher and repository terms. Paywalled full text is only used when your institutional subscription or BYOK credentials grant the right to use it.
  • Bench-tested: Releases run against BEIR, SciDocs, TREC-COVID, and SciRepEval, plus an internal scholarly QA regression set. Regressions are tracked before release.

How Academe uses the graph at query time

Substantive interactions with Academe, including chat questions, inline suggestions, write-mode edits, citation inserts, and research-digest entries, can use graph traversal.

  • Anchor: The agent identifies the nodes in your project graph that are most relevant to the request: the paragraph you are writing, the section of a PDF you are reading, or the claim you are defending.
  • Expand: From each anchor, the agent walks outward along citation, similarity, and concept edges into the global corpus. Walk depth and breadth scale with the question’s difficulty.
  • Filter: Candidates are filtered by your project’s constraints, including date range, field, open-access status, language, retraction status, and anything else you have scoped.
  • Rank: A learned reranker combines semantic similarity, citation centrality, topical specificity, recency, and your project’s implicit priorities into a single relevance score.
  • Cite: The strongest candidates are returned with provenance: paper, section, and sentence where available. Answers are designed to be auditable.
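
Sketched as code, the five steps chain into one loop. Everything below is a toy stand-in: the graph is an adjacency dict and a dot product stands in for the learned reranker.

```python
# Toy sketch of anchor -> expand -> filter -> rank -> cite; not production code.

def answer(query_vec, anchors, corpus_edges, candidates_meta, top_k=3):
    # Expand: one-hop walk from each anchor along citation/similarity edges.
    candidates = set()
    for anchor in anchors:
        candidates.update(corpus_edges.get(anchor, ()))

    # Filter: drop anything excluded by the project's constraints.
    allowed = [c for c in candidates if not candidates_meta[c].get("retracted")]

    # Rank: dot product as a stand-in for the learned reranker's score.
    def score(c):
        return sum(q * x for q, x in zip(query_vec, candidates_meta[c]["vec"]))
    ranked = sorted(allowed, key=score, reverse=True)

    # Cite: return the strongest candidates with their provenance.
    return [(c, candidates_meta[c]["provenance"]) for c in ranked[:top_k]]

edges = {"draft:para7": ["work:A", "work:B"]}
meta = {
    "work:A": {"vec": [0.9, 0.1], "provenance": "§2, sentence 4"},
    "work:B": {"vec": [0.1, 0.9], "provenance": "§5, Table 2", "retracted": True},
}
print(answer([1.0, 0.0], ["draft:para7"], edges, meta))
# -> [('work:A', '§2, sentence 4')]
```
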
Why this matters
Because retrieval starts from your project, the agent is less likely to default to the same popular papers. Two users asking the same question from different projects can get different, project-specific answers without crafting elaborate prompts.

How this compares to other tools

Many research tools now ship an "AI search" feature. The short version of what makes Academe different: chat-with-PDF tools, semantic search engines, AI literature tools, and bibliographic search usually hand back results that match your query. Academe can reason from your project, connect it to the scholarly corpus, and surface contradictions, themes, and provenance in the same workflow.

Read the full comparison

A capability-by-capability breakdown across five tool classes: Google Scholar, chat-with-PDF, semantic search, AI literature tools, and Academe.

Privacy model

Your project graph is private by default. Nothing you add flows back into the public corpus graph, and nothing you do influences any other user’s results.

Connected

The default. The global corpus is visible to the agent as it reasons about your project, but your project data stays private to your account. Agentic search and contradiction detection require this mode.

Isolated

Per-project switch. The agent reasons only against the papers and notes you have imported into that project. The global corpus is excluded from retrieval entirely. Useful for confidential work or pre-publication drafts.

What isolation actually does

Isolation is scoped to retrieval. Academe still stores your project data on our infrastructure under row-level security; isolation stops that data from being mixed with the global graph at query time. For institution-specific deployments, talk to us about tenant-specific graph options.
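
In pseudocode terms, isolation is a one-line change to the retrieval scope, not to storage. A hypothetical sketch:

```python
GLOBAL_CORPUS = "global-corpus"  # stand-in ID for the shared scholarly graph

def retrieval_scope(project: dict) -> list:
    """Hypothetical: isolation narrows retrieval only; storage and
    row-level security are unchanged either way."""
    scope = [project["graph_id"]]    # your own nodes are always in scope
    if not project.get("isolated", False):
        scope.append(GLOBAL_CORPUS)  # connected mode adds the corpus
    return scope

print(retrieval_scope({"graph_id": "proj-42", "isolated": True}))  # ['proj-42']
print(retrieval_scope({"graph_id": "proj-42"}))  # ['proj-42', 'global-corpus']
```
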
Example traversals

  • "Find me the papers I should have cited in §3": Anchor at §3 draft node → expand via concept + similarity edges → filter to works not already in your bibliography → rank by reviewer-likelihood → return five candidates with evidence spans.
  • "Has anyone contradicted the claim in my abstract?": Anchor at abstract draft node → extract atomic claim → traverse contradicts edges → return contradicting claims with their sources and the strongest counter-evidence.
  • "What methods do adjacent fields use for this problem?": Anchor at your methods section → extract method nodes → walk method ↔ work ↔ concept edges into neighbouring concept clusters → rank by transfer likelihood → return method options with representative papers.
  • "Who should I collaborate with on follow-up work?": Anchor at your whole-project embedding → walk to author nodes whose work is most similar → filter by affiliation availability → rank by topical overlap and publication recency → return authors with the papers that placed them in the neighbourhood.
  • "Summarise what my project says about X": Anchor at concept X → walk inward into your project graph → gather notes, paragraphs, and claims touching X → return a synthesis with links back to each source.
What it looks like from the outside

You don’t have to think about the graph. You write, read, cite, and chat; the graph updates in the background and quietly makes the product more useful.

While writing

Autocomplete suggestions, citation nudges, and inline counter-evidence all come from graph traversals off the paragraph you are on.

When searching

Ranking is shaped by your project graph, not just the raw query text. Two identical queries from different projects get different answers.

When the agent takes initiative

The weekly digest, red-team reviews, and proactive gap suggestions are all scheduled graph walks.

When citing

Insert a citation and Academe resolves identifiers across the namespaces it knows, so bibliography exports need less manual fixing.

Under the hood

  • Storage: The canonical graph lives in a sharded Postgres layout with optimised adjacency tables. Embeddings are stored in a dedicated vector index. Full text sits in an OpenSearch cluster for BM25 retrieval.
  • Embeddings: Works, authors, concepts, claims, and your notes are embedded with a scientific-text encoder tuned on scholarly QA pairs. Re-indexing is incremental, so adding a single paper does not trigger a global rebuild.
  • Query planner: Retrieval is a hybrid plan: BM25 for literal recall, dense retrieval for semantic recall, graph walks for neighbourhood expansion, and a learned reranker for the final ordering.
  • Caching: Common neighbourhoods are cached per project; rarely traversed regions are materialised on demand. Project updates invalidate the relevant cache entries so new work can be picked up promptly.
  • Observability: Every graph traversal writes a provenance row you can inspect. Ask "why this paper?" and Academe can show the edges it used.
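
As a toy illustration of that hybrid plan, lexical and dense scores can be fused into a single relevance value before the reranker sees the candidates. The weight below is invented, not a production value.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def hybrid_score(bm25: float, query_vec, doc_vec, alpha: float = 0.4):
    # alpha balances literal recall (BM25) against semantic recall (dense).
    return alpha * bm25 + (1 - alpha) * cosine(query_vec, doc_vec)

print(hybrid_score(0.7, [1.0, 0.0], [0.8, 0.6]))  # 0.76 for this candidate
```
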
What the graph is not

  • Not a fixed ontology for all research: Academe does not attempt to encode "all of science" as a fixed ontology. Concepts are soft labels, claims are extracted, and edges carry confidences. The graph is calibrated, not declarative.
  • Not a replacement for reading: The graph is a map. It tells you where to look. Reading, interpreting, and judging papers is still your job. Academe helps keep relevant work visible.
  • Not opinion-free: Our ranking and claim-extraction models have been trained and tuned by us; they make choices. We publish the benchmark results and edge-extraction accuracy so those choices are visible.
  • Not the public internet: News articles, blog posts, social media, and unvetted grey sources are not in the graph unless a scholarly work explicitly references them. The index stays scholarly on purpose.

FAQ

Does Academe see my project data?
Your project data is stored on Academe infrastructure under row-level security scoped to your account. It is used to power the agent for your project only, and never joins the public corpus or informs another user’s results. You can delete a project at any time, and deletion is propagated through the graph layer as part of the project cleanup flow.
Can I use the graph without connecting my project?
Yes. An isolated project reasons only against the works you import. You keep the agent, the writing tools, and the workspace experience, but agentic search against the global corpus stays off. You can flip a project from isolated to connected at any time, and back.
How up-to-date is the graph?
New preprints are indexed hourly. New journal articles are indexed daily, as they are released through Crossref and publisher feeds. Retractions propagate daily from retraction sources. Your project graph updates live, after meaningful project edits.
What happens when a paper I’ve cited is retracted?
When a retraction signal lands in the graph, Academe surfaces retracted or corrected citations so you can decide what to do. Export formats include retraction status where the style guide allows it.
Does the graph work for non-English research?
Yes. The index covers 65+ languages. Embeddings use a multilingual scientific encoder so a Japanese paper and an English paper on the same topic land near each other in the semantic neighbourhood. Translations are provided when available, and original-language titles are preserved.
Can I query the graph directly?
Most users do not need to query the graph directly. Chat, search, and inline suggestions are graph queries under the hood. For power users, there is a project-scoped query API on paid plans: Cypher-like syntax, rate-limited, read-only, and scoped to your project and the public corpus.
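
For a feel of what such a call might look like, here is a hypothetical request; the endpoint, headers, and query syntax are invented for illustration, not documented API details.

```python
import json
import urllib.request

# Hypothetical request shape; the endpoint, headers, and query syntax are
# invented for illustration, not documented API details.
query = """
MATCH (w:Work)-[:CITES]->(cited:Work)
WHERE w.project = $project
RETURN cited.title
LIMIT 10
"""
req = urllib.request.Request(
    "https://api.academe.example/v1/graph/query",  # placeholder URL
    data=json.dumps({"query": query, "params": {"project": "proj-42"}}).encode(),
    headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # read-only and rate-limited per plan quota
```
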
What if I don’t want Academe to ingest a specific paper?
Per-document exclusions are available on projects. Excluded works are dropped from retrieval for that project and are not re-surfaced by agentic search. You can also exclude whole venues or authors if you need to.
How do you handle confidential pre-publication work?
Isolate the project. Your drafts, notes, and imports stay in your tenant and are not compared against the global corpus. For customer-managed deployments, talk to us about tenant-specific graph options.