
Extract data

Turn a folder of PDFs into a structured table. Define the columns you need, let Academe fill them from the full text, and click any cell to open the supporting passage in the original PDF.

When to use it

Data extraction is the right tool when you already have a set of papers and need to compare them on a fixed set of dimensions: methodology, sample size, effect size, outcomes measured, risk-of-bias items, or any custom schema. It is faster than reading each paper end to end and more auditable than copy-pasting into a spreadsheet, because every cell carries the model’s supporting quote alongside the value.

Defining columns

Start from a blank table or one of the built-in presets (PICO, study characteristics, risk-of-bias, intervention details). Each column is a typed extraction instruction:

  • Text: a paragraph or sentence, verbatim where possible.
  • Number: parsed into a numeric cell for sorting and filtering.
  • One-of: one of a fixed set of labels you supply (e.g. RCT / cohort / case-control).
  • List: multiple values separated by semicolons when the paper reports several.
  • Yes / No: a boolean with a supporting quote attached.
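One way to picture a typed column is as a small schema entry plus a parser that turns the model's raw answer into a cell value. The sketch below assumes this shape. `ColumnKind`, `Column`, and `parse_value` are hypothetical names, not Academe's internal API. The parsing rules are also assumptions: the first number wins, labels match case-insensitively, and empty list items are dropped.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class ColumnKind(Enum):
    TEXT = "text"
    NUMBER = "number"
    ONE_OF = "one_of"
    LIST = "list"
    YES_NO = "yes_no"


@dataclass
class Column:
    name: str
    kind: ColumnKind
    instruction: str
    labels: list[str] = field(default_factory=list)  # only used by ONE_OF


_NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_value(column: Column, raw: str):
    """Turn the model's raw answer into a typed cell value (None if unusable)."""
    text = raw.strip()
    if column.kind is ColumnKind.NUMBER:
        # Take the first number in the answer, ignoring thousands separators.
        match = _NUMBER_RE.search(text.replace(",", ""))
        return float(match.group()) if match else None
    if column.kind is ColumnKind.ONE_OF:
        # Accept a supplied label case-insensitively; anything else is rejected.
        for label in column.labels:
            if text.lower() == label.lower():
                return label
        return None
    if column.kind is ColumnKind.LIST:
        # Semicolon-separated values, with blanks dropped.
        return [item.strip() for item in text.split(";") if item.strip()]
    # TEXT is kept as-is; YES_NO would be normalized in a separate step.
    return text
```

Keeping the parse step separate from the model call means a Number cell can be sorted and filtered as a real number, and an off-list One-of answer shows up as empty instead of silently creating a new label.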

Column types steer the model’s output: a Number column gets a numeric-only instruction, a One-of column gets the list of allowed labels, and Yes / No answers are coerced to the canonical values “Yes”, “No”, or “N/A”.
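The steering described above can be sketched as a per-type instruction appended to each column's prompt, followed by a normalization step for Yes / No answers. This is an illustration under assumptions: `type_instruction` and `coerce_yes_no` are hypothetical, and the wording of the Text and List instructions and the synonym sets are guesses. The docs only state that Number, One-of, and Yes / No columns are steered this way. This sketch maps anything unrecognized to “N/A”.

```python
from typing import Optional, Sequence

# Hypothetical synonym sets; anything else becomes "N/A".
_YES = {"yes", "y", "true"}
_NO = {"no", "n", "false"}


def type_instruction(kind: str, labels: Optional[Sequence[str]] = None) -> str:
    """Extra instruction appended to a column's prompt, based on its type."""
    if kind == "number":
        return "Answer with a single number only, no units or words."
    if kind == "one_of":
        if not labels:
            raise ValueError("one_of columns need a list of allowed labels")
        return "Answer with exactly one of: " + ", ".join(labels) + "."
    if kind == "yes_no":
        return "Answer Yes, No, or N/A."
    if kind == "list":
        return "List every value the paper reports, separated by semicolons."
    return "Answer in a sentence or short paragraph, verbatim where possible."


def coerce_yes_no(raw: str) -> str:
    """Map a free-form answer onto the canonical "Yes" / "No" / "N/A"."""
    text = raw.strip().lower().rstrip(".")
    if text in _YES:
        return "Yes"
    if text in _NO:
        return "No"
    return "N/A"
```

For example, `coerce_yes_no(" Yes. ")` returns `"Yes"`, while a hedged answer such as `"unclear"` falls through to `"N/A"`.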

Running extraction

Select the papers to extract from the project library, a folder, or a bulk selection in the file tree. Academe processes papers in parallel and makes one focused model call per column for each paper, so every call fills a single cell and the model’s attention stays on one field. The table fills in as results stream back, and a progress indicator shows which papers are done.
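The scheduling pattern can be sketched with `asyncio`: one task per (paper, column) pair, a concurrency cap, and results yielded in completion order so the table can fill as they arrive. `extract_cell` is a hypothetical stand-in for the model call, and the cap of 8 is an arbitrary assumption. This sketches the pattern, not Academe's scheduler.

```python
import asyncio
from typing import AsyncIterator


async def extract_cell(paper: str, column: str) -> str:
    """Hypothetical stand-in for one focused model call (one paper, one column)."""
    await asyncio.sleep(0.01)  # simulate model latency
    return f"{column} value for {paper}"


async def extract_table(
    papers: list[str], columns: list[str], max_concurrency: int = 8
) -> AsyncIterator[tuple[str, str, str]]:
    """Yield (paper, column, value) as each cell finishes, in completion order."""
    limit = asyncio.Semaphore(max_concurrency)

    async def run(paper: str, column: str) -> tuple[str, str, str]:
        async with limit:  # cap the number of simultaneous model calls
            return paper, column, await extract_cell(paper, column)

    tasks = [asyncio.create_task(run(p, c)) for p in papers for c in columns]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main() -> dict[str, dict[str, str]]:
    papers, columns = ["paper-a", "paper-b"], ["Design", "Sample size"]
    table: dict[str, dict[str, str]] = {p: {} for p in papers}
    async for paper, column, value in extract_table(papers, columns):
        table[paper][column] = value  # the cell appears as soon as it is ready
        if len(table[paper]) == len(columns):
            print(f"{paper} done")  # progress: every column for this paper is filled
    return table


table = asyncio.run(main())
```

Because results arrive in completion order, the "done" messages may print in either order.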

Every cell is click-through: clicking it opens the source PDF in a tab and scrolls to the supporting passage. The page is located approximately, by searching the PDF text for the model’s quote. A book icon next to the value indicates that a source quote is stored.
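A rough quote-to-page lookup can be sketched against per-page text extracted from the PDF. It normalizes whitespace and case, tries an exact substring match, and falls back to the page sharing the longest common run of characters with the quote. `find_quote_page` and the 0.5 threshold are hypothetical, and `difflib` is a stand-in for whatever matching Academe actually uses.

```python
import difflib
import re
from typing import Optional


def _normalize(text: str) -> str:
    # Collapse whitespace and case so PDF line breaks don't defeat the match.
    return re.sub(r"\s+", " ", text).strip().lower()


def find_quote_page(quote: str, pages: list[str], min_ratio: float = 0.5) -> Optional[int]:
    """Return the 1-based page most likely to contain `quote`, or None."""
    needle = _normalize(quote)
    if not needle:
        return None
    normalized = [_normalize(p) for p in pages]
    # 1. Exact match after normalization.
    for i, page in enumerate(normalized, start=1):
        if needle in page:
            return i
    # 2. Fuzzy fallback: the share of the quote covered by its longest
    #    contiguous match on each page.
    best_page, best_ratio = None, 0.0
    for i, page in enumerate(normalized, start=1):
        matcher = difflib.SequenceMatcher(None, needle, page, autojunk=False)
        match = matcher.find_longest_match(0, len(needle), 0, len(page))
        ratio = match.size / len(needle)
        if ratio > best_ratio:
            best_page, best_ratio = i, ratio
    return best_page if best_ratio >= min_ratio else None
```

The fuzzy step only guesses the page. It tolerates a model quote that differs slightly from the PDF text, which is why the resolved page is described as rough.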

The model gives you a quote, not just a value
Cells store both the value and the verbatim supporting passage from the paper. That quote is what powers the cell-click jump and the Excel cell comments.

Editing cells

Click the pencil icon next to any value to edit it. Edits keep the model’s original supporting quote and recorded page, so click-to-source still works after a manual correction. Each row records who last edited it and when, shown as an “edited 3m ago” badge under the paper name.
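The editing behaviour can be sketched as a data model in which a manual edit replaces only a cell's value, while the row records the editor and time. `Cell`, `Row`, `edit_cell`, and `edited_badge` are hypothetical names, and the badge's minute/hour/day cutoffs are assumptions.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: str
    quote: str           # verbatim supporting passage from the model
    page: Optional[int]  # page resolved when the quote was located


@dataclass
class Row:
    paper: str
    cells: dict[str, Cell] = field(default_factory=dict)
    edited_by: Optional[str] = None
    edited_at: Optional[datetime] = None


def edit_cell(row: Row, column: str, new_value: str, user: str, now: datetime) -> None:
    # Only the value changes; quote and page survive so click-to-source still works.
    row.cells[column] = replace(row.cells[column], value=new_value)
    row.edited_by, row.edited_at = user, now


def edited_badge(edited_at: datetime, now: datetime) -> str:
    """Format an 'edited 3m ago'-style badge."""
    seconds = int((now - edited_at).total_seconds())
    if seconds < 60:
        return "edited just now"
    if seconds < 3600:
        return f"edited {seconds // 60}m ago"
    if seconds < 86400:
        return f"edited {seconds // 3600}h ago"
    return f"edited {seconds // 86400}d ago"
```

For example, editing a Sample size cell from "120" to "118" leaves its quote and page untouched. Three minutes later the row shows `edited 3m ago`.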

Exporting

Tables export three ways:

  • CSV for spreadsheets and stats tools.
  • Excel with each cell’s supporting quote attached as a cell comment, so reviewers reading the spreadsheet can audit each value back to the source passage.
  • Markdown as a Pandoc-friendly table with optional bracketed citation keys, ready to drop into a manuscript appendix.
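The CSV and Markdown exports can be sketched with the standard library alone. `to_csv` and `to_markdown` are hypothetical helpers. The sketch assumes citation keys are emitted in Pandoc's `[@key]` citation syntax in a leading "Study" column, which is a guess at what "bracketed citation keys" looks like in Academe's output.

```python
import csv
import io
from typing import Optional


def to_csv(columns: list[str], rows: list[dict[str, str]]) -> str:
    """Plain CSV: one header row, then one row per paper."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown(
    columns: list[str],
    rows: list[dict[str, str]],
    citekeys: Optional[list[str]] = None,
) -> str:
    """Pandoc pipe table, optionally with a leading [@key] citation column."""

    def esc(text: str) -> str:
        # A bare pipe would split the cell, and a newline would break the row.
        return text.replace("|", r"\|").replace("\n", " ")

    header = (["Study"] if citekeys else []) + columns
    lines = [
        "| " + " | ".join(map(esc, header)) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for i, row in enumerate(rows):
        cells = [esc(row.get(c, "")) for c in columns]
        if citekeys:
            cells.insert(0, f"[@{citekeys[i]}]")
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"
```

For the Excel variant, a library such as openpyxl can attach each stored quote to its cell with `cell.comment = Comment(quote, author)` from `openpyxl.comments`.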