The Reading Garden ontology is a small, opinionated slice of the knowledge graph (eight node types and nine edges) designed for one job: turn raw highlights into atomic, queryable, idempotently-publishable Concept pages plus the navigable evidence around them (the Sources and Highlights they came from), without forcing every reader to commit to a single PKM methodology. Since v0.5.2 the published artefact is a three-database mirror in Notion (Sources / Highlights / Concepts, linked via RELATION columns), not a flat dump of Concept pages; see the “Publishing layer” section below.
The slice lives at internal/schema/embedfs/graph-schema/knowledge-garden/ in the fracta repo, which is the loader-validated source of truth, and is resolved at runtime as embed://graph-schema/knowledge-garden. It plugs into the broader fracta schema via the cross-family CAPTURED_FROM edge to core/DomainSource, so Reading Garden nodes show up in the same 4-tier resolution chain (DomainSource -> DataStore -> MCPServer -> MCPTool) as every other family.

The shape

Every later section reads against this diagram. The two anchor node types are Highlight (the raw unit of capture) and Concept (the atomic, publishable unit of distilled thought). Everything else exists to connect those two through the CODE flow (Capture -> Organize -> Distill -> Express).

Layers

The schema family follows fracta’s two-layer convention:
  • Universal layer (nodes/): one node — Topic. Universal nodes are user-curated and re-used across particulars.
  • Particular layer (particulars/): seven nodes — Highlight, Document, Concept, Entity, Claim, Question, Publication. Particulars are discovered by strategies and tied to a writer-of-record.
The loader enforces layer membership; you cannot register a particular node under nodes/.
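As a rough illustration of the layer rule above, the check can be sketched as a lookup from directory to permitted node types. The directory and node names come from the text; the function name `check_layer` and the error type are assumptions, not the loader's actual code.

```python
# Layer membership as described above: one universal node, seven particulars.
LAYERS = {
    "nodes": {"Topic"},
    "particulars": {
        "Highlight", "Document", "Concept", "Entity",
        "Claim", "Question", "Publication",
    },
}


def check_layer(directory: str, node_type: str) -> None:
    """Hypothetical validator: raise if node_type is registered under the wrong layer."""
    allowed = LAYERS.get(directory)
    if allowed is None:
        raise ValueError(f"unknown layer directory: {directory!r}")
    if node_type not in allowed:
        raise ValueError(f"{node_type} may not be registered under {directory}/")
```

Calling `check_layer("nodes", "Highlight")` raises, which mirrors the rule that a particular node cannot be registered under nodes/.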

Nodes

Topic

A user-curated theme that groups Concepts and Questions. The only scaffold-authority node in this family — populated by the user (or a future curation strategy), never by the ingest pipeline.
Property | Type | Required | Source | Description
name | string | yes | user | Unique identifier: the topic name.
description | string | no | user | One-line description of the theme.

Highlight

A single highlight pulled from a source — typically a Kindle/article highlight in Readwise. The atomic unit of capture.
Property | Type | Required | Source | Description
id | string | yes | readwise_list_highlights | Stable identifier, prefixed readwise:<id>.
text | string | yes | readwise_list_highlights | The highlight body.
note | string | no | readwise_list_highlights | The Readwise highlight-note field (your own annotation, if any). Extractors scan this too.
book_id | string | no | readwise_list_highlights | Foreign key to the source Document.
location | string | no | readwise_list_highlights | Position in the source (page, percent, timestamp).
tags | string | no | readwise_list_highlights | Comma-joined Readwise tags.
highlighted_at | string | no | readwise_list_highlights | ISO-8601 timestamp of the original highlight.
captured_at | string | yes | highlight_distill | ISO-8601 timestamp of when fracta ingested it.
Populated by highlight-distill. Every Highlight must be wired to a DomainSource via CAPTURED_FROM (enforced by checkpoint highlight_missing_captured_from).
Denormalised book fields. Since v0.5.2 the Readwise binding pulls a set of book_* fields (book_title, book_author, book_category, book_source, book_source_url, book_cover_image_url, book_document_note) alongside each highlight via the explicit response_fields argument. These are NOT stored on the Highlight node — they’re used only to MERGE the parent Document node (one per book_id). Resolve denormalised metadata by walking Highlight -[:PART_OF]-> Document. Closes Bug 13: the v1 reader-documents namespace mismatch meant PART_OF never fired; v3 derives Documents from highlights themselves.

Document

The container a Highlight comes from — a book, article, podcast, paper, or web page.
Property | Type | Required | Source | Description
id | string | yes | highlight-distill | Stable identifier; v3 form is readwise:book:<book_id> (derived from highlights, see the note below).
title | string | yes | highlight-distill | Title of the work. Sourced from Highlight.book_title (denormalised).
author | string | no | highlight-distill | Author name. Sourced from Highlight.book_author (denormalised).
url | string | no | highlight-distill | Canonical URL, from Highlight.book_source_url.
cover_url | string | no | highlight-distill | Cover image URL, from Highlight.book_cover_image_url.
category | string | no | highlight-distill | Document category, from Highlight.book_category (e.g. book, article).
source_kind | string | no | highlight-distill | Upstream source kind, from Highlight.book_source (e.g. kindle, airr).
document_note | string | no | highlight-distill | The Readwise document-note field (book-level annotation), from Highlight.book_document_note.
captured_at | string | yes | highlight-distill | ISO-8601 ingest timestamp.
v3 Document derivation (Bug 13 fix). v1/v2 attempted to MERGE Documents from the separate reader_list_documents MCP call, which lives in a different ID namespace from Readwise highlights — so Highlight -[:PART_OF]-> Document edges never fired. v3 derives Documents directly from the highlights themselves (MERGE (d:Document {id: 'readwise:book:' + book_id})) using the denormalised book_* properties listed above. The Readwise binding’s response_fields argument is what makes those fields non-null on every highlight.
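The v3 derivation can be pictured as a group-by over highlights. The sketch below is not the strategy's code: it only mirrors the field mapping from the Document table, and the function name `derive_documents`, the in-memory dict standing in for the graph, and the "first non-null value wins" rule are all assumptions.

```python
from typing import Any

# Document property -> denormalised highlight field, per the Document table.
FIELD_MAP = {
    "title": "book_title",
    "author": "book_author",
    "url": "book_source_url",
    "cover_url": "book_cover_image_url",
    "category": "book_category",
    "source_kind": "book_source",
    "document_note": "book_document_note",
}


def derive_documents(
    highlights: list[dict[str, Any]], captured_at: str
) -> dict[str, dict[str, Any]]:
    """Build one Document per book_id, keyed by 'readwise:book:<book_id>'."""
    documents: dict[str, dict[str, Any]] = {}
    for highlight in highlights:
        book_id = highlight.get("book_id")
        if book_id is None:
            continue  # no parent Document can be derived for this highlight
        doc_id = f"readwise:book:{book_id}"
        # setdefault plays the role of MERGE: create once, reuse afterwards.
        doc = documents.setdefault(doc_id, {"id": doc_id, "captured_at": captured_at})
        for target, source in FIELD_MAP.items():
            value = highlight.get(source)
            if value is not None and target not in doc:
                doc[target] = value  # assumption: first non-null value wins
    return documents
```

Two highlights sharing a book_id collapse into a single Document entry, which is the property that lets PART_OF resolve without a second ID namespace.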

Concept

The atomic, publishable unit — a distilled idea that recurs across highlights. The other anchor of this ontology.
Property | Type | Required | Source | Description
name | string | yes | highlight_distill | Unique canonical name (lower-case, normalised).
display_name | string | no | highlight_distill | Cased version used in renders.
description | string | no | future / agent | Optional one-line gloss.
confidence | float | no | cross_source_concepts | Graph-aware confidence in [0, 1]. Exclusive writer: cross_source_concepts.
extraction_score | float | no | highlight_distill | Rolling max of MENTIONS.extraction_score. Exclusive writer: highlight_distill.
epistemic_status | string | no | notion_publish | One of seedling, budding, evergreen. Derived from confidence at publish time.
mention_count | int | no | cross_source_concepts | Count of inbound MENTIONS.
_status | string | no | strategies | speculative / confirmed / stale lifecycle marker.
first_seen_at | string | no | highlight_distill | ISO-8601 of first ingest.
last_seen_at | string | no | highlight_distill | ISO-8601 of most recent ingest.
extraction_score vs confidence is intentional. They are separate properties with distinct writers-of-record. extraction_score reflects what the NLP extractors saw at ingest time (per-extractor agreement and signal, see Strategies — highlight-distill). confidence reflects graph-wide signal computed later (recency × frequency × source diversity, folded with mean extraction_score). The checkpoint rule concept_low_extraction_high_confidence flags drift between them as alias-suspicion — the canonical surfacing of “this Concept is probably an alias to another one.” Mirror this seam if you author a strategy that touches Concept scoring.
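One way to mirror this seam in a new strategy is a small write guard. The sketch below is illustrative only: the writer names and the rolling-max behaviour come from the Concept table, while `write_score`, the dict-based Concept, and the choice of `PermissionError` are assumptions.

```python
from typing import Optional

# Writers-of-record as stated in the Concept table.
EXCLUSIVE_WRITERS = {
    "extraction_score": "highlight_distill",
    "confidence": "cross_source_concepts",
}


def write_score(concept: dict, prop: str, value: float, strategy: str) -> None:
    """Hypothetical guard: only the writer-of-record may set a scoring property."""
    owner = EXCLUSIVE_WRITERS.get(prop)
    if owner is not None and owner != strategy:
        raise PermissionError(f"{prop} is written only by {owner}, not {strategy}")
    if prop == "extraction_score":
        current: Optional[float] = concept.get(prop)
        # Rolling max: a weaker later mention never lowers the stored score.
        value = value if current is None else max(current, value)
    concept[prop] = value
```

Keeping the two properties behind separate writers is what lets a later rule compare them meaningfully.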

Entity

A typed real-world referent — a person, place, organisation, work, or product — mentioned in Highlights or Documents. Typed entities ground Concepts; “Karl Popper” the Person is distinct from “falsifiability” the Concept.
Property | Type | Required | Source | Description
id | string | yes | highlight_distill | Stable identifier.
name | string | yes | highlight_distill | Display name.
kind | string | yes | highlight_distill | One of person, place, org, work, product.
canonical_url | string | no | future | Optional canonical reference (Wikipedia, etc.).
extraction_score | float | no | highlight_distill | Per-entity extraction confidence. Mirrors Concept.extraction_score.
Authoritative typer is concept-gliner (the only extractor with calibrated per-span probability and caller-supplied taxonomy). concept-spacy NER acts as a fallback typer; concept-keybert is untyped and never produces Entity routes.
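The typer precedence described above can be sketched as an ordered lookup. The short extractor keys (`gliner`, `spacy`, `keybert`) and the function `resolve_kind` are assumptions made for illustration; only the precedence order and the set of kinds come from the text.

```python
from typing import Optional

ENTITY_KINDS = {"person", "place", "org", "work", "product"}

# gliner is authoritative, spacy is the fallback; keybert is untyped and
# therefore deliberately absent from this list.
TYPER_PRECEDENCE = ("gliner", "spacy")


def resolve_kind(labels: dict[str, Optional[str]]) -> Optional[str]:
    """Return the Entity kind from the highest-precedence typer, or None."""
    for extractor in TYPER_PRECEDENCE:
        kind = labels.get(extractor)
        if kind in ENTITY_KINDS:
            return kind
    return None  # no typed label: the span is not routed to Entity
```

For example, `resolve_kind({"gliner": "person", "spacy": "org"})` returns `"person"`, while a span seen only by keybert yields `None`.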

Claim

A statement asserted or questioned in the source material. Forward-looking — highlight_distill does not produce Claims in v1; the node type exists for future strategies that perform claim extraction.
Property | Type | Required | Source | Description
id | string | yes | future strategies | Stable identifier.
text | string | yes | future strategies | The claim verbatim.
stance | string | no | future strategies | One of asserted, questioned.
confidence | float | no | future strategies | Strength of the claim’s support in the graph.

Question

An open inquiry the reader (or a future strategy) wants to track. Like Topic, primarily user-curated in v1.
Property | Type | Required | Source | Description
id | string | yes | user / future | Stable identifier.
text | string | yes | user / future | The question text.
status | string | no | user / future | One of open, exploring, answered, abandoned.

Publication

A record of a Concept (or Topic / Document) published to an external sink. The idempotency anchor — notion_publish reads Publication.content_hash before deciding whether to write.
Property | Type | Required | Source | Description
id | string | yes | notion_publish | Stable identifier.
sink | string | yes | notion_publish | v3 distinguishes per-database sinks: notion:source, notion:highlight, notion:concept. Future sinks: mintlify, quartz, obsidian, ghost.
external_id | string | yes | notion_publish | Sink-side identifier (e.g. the Notion page UUID).
content_hash | string | yes | notion_publish | Hash of the rendered content; basis for skip-vs-update decisions.
url | string | no | notion_publish | Direct URL to the published page.
published_at | string | yes | notion_publish | ISO-8601 of first publish.
last_updated_at | string | yes | notion_publish | ISO-8601 of most recent update.
Publication.sink is the future-proofing seam. A future mintlify_publish strategy populates the same node type with sink: "mintlify". The graph stays canonical; sinks multiply. Indexed on both id and external_id for fast idempotent lookup.
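The skip-vs-update decision that content_hash supports can be sketched as follows. The text does not say which hash function is used, so SHA-256 here is an assumption, as are the function names and the dict standing in for a Publication node.

```python
import hashlib
from typing import Optional


def content_hash(rendered: str) -> str:
    """Assumed hashing scheme: SHA-256 over the UTF-8 rendered content."""
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


def publish_action(rendered: str, publication: Optional[dict]) -> str:
    """Decide what a publish step should do for one page."""
    if publication is None:
        return "create"  # never published to this sink
    if publication.get("content_hash") == content_hash(rendered):
        return "skip"    # rendered content unchanged: no write needed
    return "update"      # content drifted since the last publish
```

Because the decision depends only on the stored hash and the freshly rendered content, the same logic would apply to any future sink that writes Publication nodes.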

Edges

Edge | From | To | Cardinality | Properties | Meaning
MENTIONS | Highlight, Document | Concept, Entity | many-to-many | weight, extracted_by, extraction_score, agreement_n | The most-traversed edge. extracted_by is pipe-joined (e.g. keybert|gliner); agreement_n is 1, 2, or 3.
EVIDENCES | Highlight, Document | Claim | many-to-many | strength | Source-of-evidence link.
CONTRADICTS | Highlight, Document, Claim | Claim | many-to-many | (none) | Refutation chain; reflexive Claim -> Claim allowed.
REFINES | Concept, Claim | Concept, Claim | many-to-many | (none) | Hierarchy without forcing a tree.
PART_OF | Highlight, Concept, Question, Claim | Document, Topic | many-to-one | (none) | Composition. Single edge type by design.
AUTHORED_BY | Document | Entity | many-to-many | role (author/editor/translator/host) | Authorship and contribution.
CAPTURED_FROM | Highlight, Document | DomainSource | many-to-one | captured_at | Cross-family edge to core/DomainSource. The hook into fracta’s 4-tier resolution chain.
RELATES_TO | Concept, Entity, Topic, Question, Claim, Document, Highlight | (same set) | many-to-many | weight, reason, computed_by | Generic associative: the free Zettelkasten link.
PUBLISHED_AS | Concept, Topic, Document | Publication | many-to-many | first_published_at, last_published_at | Idempotency partner.
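To make the MENTIONS encoding concrete, here is a minimal sketch of building and reading the extracted_by and agreement_n properties. The helper names are hypothetical, and the ordering of extractor names (insertion order) is an assumption; the text only shows that the value is pipe-joined and that agreement_n ranges over 1 to 3.

```python
def encode_agreement(extractors: list[str]) -> dict:
    """Build the extracted_by / agreement_n pair for one MENTIONS edge."""
    unique = list(dict.fromkeys(extractors))  # drop duplicates, keep order
    if not 1 <= len(unique) <= 3:
        raise ValueError("agreement_n must be 1, 2, or 3")
    return {"extracted_by": "|".join(unique), "agreement_n": len(unique)}


def decode_extractors(extracted_by: str) -> list[str]:
    """Split a pipe-joined extracted_by value back into extractor names."""
    return extracted_by.split("|")
```

With these helpers, `encode_agreement(["keybert", "gliner"])` yields `extracted_by` equal to `"keybert|gliner"` and `agreement_n` equal to 2.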

Provenance

Every node written by this pattern carries _source = 'strategy:<name>', matching fracta’s convention. Cross-family writes follow the same rule: when highlight_distill MERGEs the Readwise Highlights DomainSource, it sets _source = 'strategy:highlight_distill' on creation only — subsequent runs do not overwrite the originating attribution. Encourage the same convention for any new pattern: one writer-of-record per node, declared in the strategy name, traceable via _source. The checkpoint rules in internal/schema/embedfs/graph-schema/knowledge-garden/checkpoint.yaml lean on this — for example, concept_low_extraction_high_confidence only makes sense when extraction_score and confidence have separate authors.
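The "set on creation only" behaviour can be sketched with an in-memory store standing in for the graph. This is an illustration under assumptions: `merge_node` and the dict store are hypothetical, and the real strategies presumably express the same idea in their graph queries.

```python
def merge_node(store: dict, key: str, props: dict, strategy: str) -> dict:
    """Upsert a node; _source is stamped on creation and never overwritten."""
    node = store.get(key)
    if node is None:
        # First writer becomes the writer-of-record.
        node = {"_source": f"strategy:{strategy}"}
        store[key] = node
    # Later runs may refresh ordinary properties but not the attribution.
    node.update({k: v for k, v in props.items() if k != "_source"})
    return node
```

A second call for the same key from a different strategy updates the node's properties while leaving the original `_source` intact.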

Checkpoint rules

Eight validation rules ship with the family. The two most consequential:
  • highlight_missing_captured_from (error): every Highlight must be wired to a core/DomainSource. Without this, the 4-tier resolution chain breaks.
  • concept_low_extraction_high_confidence (warning): flags Concepts where confidence > 0.7 but extraction_score < 0.4 (or null) — the alias-drift detector. A graph-corroborated Concept that no NLP extractor strongly endorsed is almost always an alias to another Concept, surfaced for hand-merging until a future strategy (embedder MCP or LLM-based resolution) can do it automatically.
The remaining six (orphaned_concept_no_mentions, publication_missing_parent_concept, high_confidence_concept_without_topic, publication_missing_required_props, claim_without_evidence, concept_confidence_status_mismatch) live in internal/schema/embedfs/graph-schema/knowledge-garden/checkpoint.yaml. Run graph_checkpoint(mcp_servers='notion,readwise,concept-keybert,concept-gliner,concept-spacy') after any ingest to surface them.
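The alias-drift condition is simple enough to state as a predicate. The thresholds (0.7 and 0.4) and the null handling come from the rule description above; the function name and signature are assumptions, and the actual rule lives in checkpoint.yaml rather than in Python.

```python
from typing import Optional


def is_alias_suspect(
    confidence: Optional[float], extraction_score: Optional[float]
) -> bool:
    """Sketch of concept_low_extraction_high_confidence as a boolean check."""
    if confidence is None or confidence <= 0.7:
        return False  # not graph-corroborated strongly enough to flag
    # High confidence with a weak or missing extractor endorsement.
    return extraction_score is None or extraction_score < 0.4
```

A Concept with confidence 0.9 and no extraction_score is flagged; one with confidence 0.9 and extraction_score 0.8 is not.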

Publishing layer (Notion three-database mirror)

Since v0.5.2 the published artefact is a navigable Notion structure, not a flat dump of Concept pages. The publishing layer maps three graph tiers (Document, Highlight, Concept) onto three Notion databases connected by RELATION columns:
Graph tier | Notion database | Sink | Page-per | RELATION
Document (from Highlight.book_id) | Sources DB | notion:source | Readwise book / article | (none)
Highlight | Highlights DB | notion:highlight | Readwise highlight | source -> Sources DB
Concept | Concepts DB | notion:concept | Atomic concept | highlights -> Highlights DB
Each tier is idempotent independently: a Publication node is MERGEd per page with sink: 'notion:<tier>' and a tier-specific external_id (readwise_book_id, readwise_highlight_id, concept_name). Re-running notion-publish after no graph changes is a no-op (content-hash skip) at all three levels. The three RELATION values are written as JSON-stringified arrays of page IDs, not native arrays — that is the on-the-wire convention the hosted Notion MCP expects for properties of type RELATION. See Strategies — notion-publish for the call shape and the 7-step DAG (load_target_concepts -> load_supporting_highlights -> load_sources -> publish_sources -> publish_highlights -> render_concepts -> publish_concepts).
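The JSON-stringified RELATION convention is easy to get wrong, so a small sketch may help. The column names `source` and `highlights` are taken from the table above; the helper names and the shape of the returned dicts are assumptions and do not reproduce the notion-publish call shape.

```python
import json


def relation_value(page_ids: list[str]) -> str:
    """Encode page IDs as a JSON string rather than a native array."""
    return json.dumps(page_ids)


def highlight_relations(source_page_id: str) -> dict[str, str]:
    """Hypothetical RELATION payload for one Highlights DB page."""
    return {"source": relation_value([source_page_id])}


def concept_relations(highlight_page_ids: list[str]) -> dict[str, str]:
    """Hypothetical RELATION payload for one Concepts DB page."""
    return {"highlights": relation_value(highlight_page_ids)}
```

Note that each value is a string such as `'["id-1", "id-2"]'`, so a consumer has to call `json.loads` to recover the list.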

What’s next