strategies/knowledge-garden/ (promoted out of _example/ so a fresh deploy discovers them automatically) and follow the standard fracta strategy shape: contract.yaml + strategy.py + binding.yaml.
Strategies are deterministic Python DAGs run by the fracta strategy runner — not LLM calls. The pipeline prompt on First Run is the only place an LLM holds the framing; each strategy_run is a Python pipeline.
What ships
| Strategy | Category | Purpose | Source |
|---|---|---|---|
| highlight-distill | enrichment | Pull Readwise highlights and documents, extract Concept and Entity candidates via three NLP MCPs, write to graph. | strategies/knowledge-garden/enrichment/highlight_distill |
| cross-source-concepts | correlation | Rescore every Concept by graph-wide signal; update Concept.confidence, Concept.mention_count, and MENTIONS.weight. | strategies/knowledge-garden/correlation/cross_source_concepts |
| notion-publish | traversal | Render a Concept (or top-N batch) as a Notion page; create-or-update idempotently via Publication.content_hash. | strategies/knowledge-garden/traversal/notion_publish |
highlight-distill
Purpose. Pull recent Readwise highlights and documents through the gateway, group them by source document, fan out three parallel NLP extractor calls per highlight, merge the results with an asymmetric scoring algorithm, and write Highlight, Document, Concept, and Entity nodes (plus MENTIONS / PART_OF / CAPTURED_FROM edges) into the graph.
This is the Capture + Organize stage. It is the only strategy in the pattern that calls upstream MCPs for data ingest.
Inputs
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| watermark_iso | string | no | "1970-01-01T00:00:00Z" | Pull only highlights with updated > watermark. The contract default is the documented “backfill from the beginning” timestamp; pass an explicit recent ISO (e.g. 2026-04-01T00:00:00Z) on incremental runs to keep the Readwise call count bounded. The strategy also accepts a rolling sentinel -7d (resolved at runtime for DuckDB filtering); see contract.yaml for the current behaviour. |
| page_size | int | no | 100 | Readwise list_highlights page size. The hosted MCP caps at 20 req/min, so larger pages mean fewer round-trips. |
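As a rough illustration of how a watermark_iso value might be resolved before filtering, the sketch below handles both an explicit ISO timestamp and the rolling sentinel. The function name resolve_watermark is hypothetical, and generalising -7d to any -Nd form is an assumption; contract.yaml remains the reference for the actual behaviour.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the rolling sentinel has the form "-<N>d"; the source only documents "-7d".
_ROLLING = re.compile(r"^-(\d+)d$")


def resolve_watermark(value: str, now: datetime) -> datetime:
    """Turn a watermark_iso value into an aware UTC datetime (hypothetical helper)."""
    match = _ROLLING.match(value)
    if match:
        # Rolling window: N days before the supplied "now".
        return now - timedelta(days=int(match.group(1)))
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Passing now explicitly keeps the helper deterministic, which fits the "deterministic Python DAG" framing of the strategies.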
Outputs
Graph mutations only, no tabular return payload. After a run, expect:
- One Highlight node per ingested Readwise highlight.
- One Document node per source book / article / podcast.
- Concept nodes for high-signal keyphrases (asymmetric score >= 0.40).
- Entity nodes for gliner-typed spans (Person / Organisation / Place / Work / Product).
- MENTIONS edges with weight, extracted_by (pipe-joined extractor IDs), extraction_score, and agreement_n (1, 2, or 3).
- CAPTURED_FROM edges to the Readwise Highlights DomainSource.
Candidates scoring in [0.30, 0.40) land in a DuckDB pending_extractions table, not the graph, and graduate on second appearance.
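The thresholds above can be sketched as a small routing function. This is a minimal illustration, not the strategy's code: the name route_candidate and the in-memory pending set (standing in for the DuckDB pending_extractions table) are hypothetical, and dropping candidates below 0.30 is an assumption the text does not state.

```python
GRAPH_THRESHOLD = 0.40  # from the text: score >= 0.40 goes to the graph
PENDING_FLOOR = 0.30    # from the text: [0.30, 0.40) is held as pending


def route_candidate(key: str, score: float, pending: set) -> str:
    """Return "graph", "pending", or "drop" for one extracted candidate."""
    if score >= GRAPH_THRESHOLD:
        return "graph"
    if score >= PENDING_FLOOR:
        if key in pending:
            # Second appearance: graduate out of the pending set.
            pending.discard(key)
            return "graph"
        pending.add(key)
        return "pending"
    # Assumption: anything below the pending floor is discarded.
    return "drop"
```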
Extractor fan-out
Each highlight triggers three parallel MCP calls via ThreadPoolExecutor(max_workers=8):
| Extractor | Tool | Score semantics | Role |
|---|---|---|---|
| concept-keybert | keybert_extract_tool | Within-document MMR-reranked cosine; not a probability. | Concept candidates when GLiNER doesn’t fire; consumed as a rank tier (top-3 = 0.70, 4–10 = 0.50, > 10 = 0.30), never as raw cosine. |
| concept-gliner | gliner_extract_tool | DeBERTa-v3 sigmoid in [0, 1]; comparable across spans. | Authoritative typer and score-of-record where it fires. Routes spans to Entity.kind (Person/Org/Place/Work/Product) or Concept (Theory/Method/Concept/Tool). |
| concept-spacy | spacy_extract_tool | OntoNotes-5 NER + noun chunks; categorical, no score field. | Fallback typer when GLiNER doesn’t fire; noun chunks (filtered root_pos != PRON) are last-resort candidates; contributes to agreement_n but not to extraction_score. |
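A minimal sketch of the fan-out and of two details from the table: the KeyBERT rank tier and agreement_n. The extractor callables here are hypothetical stand-ins for the MCP tool calls, the 1-based rank is an assumption, and the real strategy may share one pool across highlights rather than creating one per call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical signature: an extractor takes highlight text and returns phrases.
Extractor = Callable[[str], List[str]]


def keybert_rank_tier(rank: int) -> float:
    """Map a 1-based KeyBERT rank to its tier score instead of the raw cosine."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    if rank <= 3:
        return 0.70
    if rank <= 10:
        return 0.50
    return 0.30


def fan_out(text: str, extractors: Dict[str, Extractor]) -> Dict[str, List[str]]:
    """Run every extractor on the same highlight in parallel threads."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in extractors.items()}
        return {name: future.result() for name, future in futures.items()}


def agreement_n(phrase: str, results: Dict[str, List[str]]) -> int:
    """Count how many extractors produced the phrase (1, 2, or 3 in practice)."""
    return sum(phrase in spans for spans in results.values())
```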
Parameters: lambda=0.7, K=8 per highlight, Jaccard similarity over canonical tokens (KeyBERT does not expose per-keyphrase embeddings, so embedding cosine is unavailable in v1).
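The parameters above (lambda=0.7, K=8, Jaccard similarity) read like a maximal-marginal-relevance style selection over candidate phrases. The sketch below shows that technique under that assumption; mmr_select is a hypothetical name, and lowercased whitespace tokens stand in for the "canonical tokens" the strategy actually uses.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(candidates: Dict[str, float], lam: float = 0.7, k: int = 8) -> List[str]:
    """Greedily pick up to k phrases, trading relevance against redundancy."""
    # Assumption: lowercase whitespace tokens approximate canonical tokens.
    tokens = {phrase: frozenset(phrase.lower().split()) for phrase in candidates}
    remaining = dict(candidates)
    selected: List[str] = []
    while remaining and len(selected) < k:
        def mmr_score(phrase: str) -> float:
            redundancy = max(
                (jaccard(tokens[phrase], tokens[s]) for s in selected),
                default=0.0,
            )
            return lam * remaining[phrase] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected
```

With lambda at 0.7 the relevance score dominates, but a near-duplicate such as "spaced repetition system" can still lose to a less relevant, non-overlapping phrase once "spaced repetition" has been selected.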
Call it
cross-source-concepts
Purpose. Read the graph; rescore every Concept using recency × frequency × source diversity, folded with the mean of MENTIONS.extraction_score; write back Concept.confidence, Concept.mention_count, and MENTIONS.weight. Surface high-confidence Concepts not yet linked to a Topic via the high_confidence_concept_without_topic checkpoint rule.
This is the Distill stage. It is a pure-DuckDB-over-graph computation — no MCP fetch.
Inputs
This strategy takes no params in v1.
Outputs
Graph mutations only:
- Concept.confidence: updated for every Concept the rescore touched.
- Concept.mention_count: refreshed from the actual inbound MENTIONS count.
- MENTIONS.weight: derived from extraction_score × recency_decay.
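To make the MENTIONS.weight derivation concrete, here is a small sketch. The text gives only the product extraction_score × recency_decay; the exponential half-life form of recency_decay and the 90-day default are assumptions chosen for illustration, not the strategy's actual decay curve.

```python
def recency_decay(age_days: float, half_life_days: float = 90.0) -> float:
    """Hypothetical decay: 1.0 for a fresh mention, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def mention_weight(extraction_score: float, age_days: float,
                   half_life_days: float = 90.0) -> float:
    """MENTIONS.weight as the product named in the text."""
    return extraction_score * recency_decay(age_days, half_life_days)
```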
Scoring formula
{freq: 0.30, recency: 0.15, diversity: 0.40, extract: 0.15}. The w_extract term is what folds highlight_distill’s per-extraction signal into the graph-aware confidence — without cross_source_concepts writing extraction_score itself. The two strategies never overwrite each other’s field.
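One plausible reading of these weights is a weighted sum over four components, each normalised to [0, 1]. The sketch below assumes exactly that; the function name, the normalisation, and the treatment of a Concept with no mentions are all assumptions, so treat it as an illustration of how the extract term folds in rather than as the strategy's formula.

```python
from statistics import mean
from typing import Sequence

# Weights as listed in the text; they sum to 1.0.
WEIGHTS = {"freq": 0.30, "recency": 0.15, "diversity": 0.40, "extract": 0.15}


def concept_confidence(freq: float, recency: float, diversity: float,
                       extraction_scores: Sequence[float]) -> float:
    """Weighted sum of normalised signals (assumed form)."""
    components = {
        "freq": freq,
        "recency": recency,
        "diversity": diversity,
        # The extract term is the mean of MENTIONS.extraction_score;
        # assumption: a Concept with no mentions contributes 0.0 here.
        "extract": mean(extraction_scores) if extraction_scores else 0.0,
    }
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

The function only reads extraction scores, which mirrors the ownership rule: the rescore consumes extraction_score but never writes it.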
Call it
notion-publish
Purpose. Mirror the knowledge-garden into Notion as a three-database structure (Sources, Highlights, and Concepts) linked via RELATION columns so a reader can navigate from a published Concept back into the highlights and books that produced it. Idempotent across all three sinks via Publication.content_hash keyed by sink + external_id. Computes Concept.epistemic_status from Concept.confidence and writes it back to the graph before rendering.
This is the Express stage. Since v0.5.2 the published artefact is the three-DB mirror, not a flat dump of Concept pages.
Inputs
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| concept_name | string | no | - | Publish a single Concept by canonical name. Omit for batch mode. |
| notion_concepts_database_id | string | yes | - | data_source_id of the Concepts DB. (Legacy: notion_database_id is still accepted.) |
| notion_highlights_database_id | string | yes | - | data_source_id of the Highlights DB. |
| notion_sources_database_id | string | yes | - | data_source_id of the Sources DB. |
| batch_size | int | no | 10 | In batch mode, publish the top-N Concepts by confidence. |
Outputs
For each Concept published:
- A new or updated Notion page in the Concepts database, with a highlights RELATION column pointing at one or more rows in the Highlights database.
- One Notion page in the Highlights database per supporting highlight (de-duped across concepts), with a source RELATION column pointing at the Sources database row for its Readwise book.
- One Notion page in the Sources database per distinct Readwise book/article/podcast referenced.
- A Publication node MERGEd per page with sink: notion:source | notion:highlight | notion:concept, external_id, and content_hash.
- A PUBLISHED_AS edge wiring the source graph node to its Publication.
Pipeline
The strategy is a 7-step DAG:
1. load_target_concepts: top-N Concepts by confidence (or a single concept by concept_name).
2. load_supporting_highlights: Highlights mentioning each target Concept, with their denormalised book_* properties.
3. load_sources: distinct Documents reachable from the loaded Highlights.
4. publish_sources: idempotent per readwise_book_id; sink notion:source.
5. publish_highlights: idempotent per readwise_highlight_id; sink notion:highlight; carries a source RELATION to its Source page.
6. render_concepts: produces the Markdown body + properties dict per Concept.
7. publish_concepts: idempotent per Concept; sink notion:concept; carries the highlights RELATION assembled from step 5’s id_to_url map.
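The seven steps can be laid out as a dependency graph to see which ones must precede others. The step names come from the list above, but the edges are inferred from the step descriptions and are an assumption; the real wiring lives in strategy.py. The standard-library graphlib module (Python 3.9+) is used here only to validate and order the graph.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it is assumed to depend on.
DEPENDS_ON = {
    "load_target_concepts": set(),
    "load_supporting_highlights": {"load_target_concepts"},
    "load_sources": {"load_supporting_highlights"},
    "publish_sources": {"load_sources"},
    # Highlights need their Source page to exist for the source RELATION.
    "publish_highlights": {"load_supporting_highlights", "publish_sources"},
    "render_concepts": {"load_target_concepts", "load_supporting_highlights"},
    # Concepts need published highlight pages for the highlights RELATION.
    "publish_concepts": {"render_concepts", "publish_highlights"},
}

# static_order() raises CycleError if the graph is not a DAG.
EXECUTION_ORDER = list(TopologicalSorter(DEPENDS_ON).static_order())
```

The point of the ordering is that RELATION targets are published before the pages that reference them: Sources, then Highlights, then Concepts.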
Notion MCP tool surface (v0.5.2)
The strategy calls four hosted Notion MCP tools:
- notion-search + notion-fetch: locate-or-adopt path. After a notion-search hit, the strategy calls notion-fetch and requires an exact match on properties.concept_name (or equivalent) before adopting. Avoids the v1 cross-overwrite trap.
- notion-create-pages: creates new pages. The payload shape is {parent: {data_source_id: ...}, pages: [{properties: {...}, content: "markdown..."}]}. Properties are flat scalars (text, number, date), not the v1 wrapped objects. content is enhanced-Markdown, not block JSON.
- notion-update-page: updates properties and/or content. Use command: replace_content for a content-hash-driven full refresh; command: insert_content for appends.
- RELATION properties are sent as JSON-stringified arrays of page IDs, not native arrays.
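As an illustration of the payload shape described above, the sketch below builds a notion-create-pages argument dict for a single Concept. The helper name and the confidence property are hypothetical; only the outer shape, the flat scalar properties, and the JSON-stringified RELATION come from the text.

```python
import json
from typing import Any, Dict, List


def build_create_pages_payload(data_source_id: str, concept_name: str,
                               confidence: float, markdown: str,
                               highlight_page_ids: List[str]) -> Dict[str, Any]:
    """Assemble arguments for notion-create-pages (hypothetical helper)."""
    return {
        "parent": {"data_source_id": data_source_id},
        "pages": [
            {
                "properties": {
                    # Flat scalars, not wrapped property objects.
                    "concept_name": concept_name,
                    "confidence": confidence,  # assumed property name
                    # RELATION values travel as a JSON string, not a native list.
                    "highlights": json.dumps(highlight_page_ids),
                },
                # Enhanced Markdown body, not block JSON.
                "content": markdown,
            }
        ],
    }
```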
Idempotency
The Publication node is the source of truth: local lookup first, then API. The publish sequence per page (Source, Highlight, or Concept):
1. MATCH (p:Publication {sink: <sink>, external_id: <book_id|highlight_id|concept_name>}).
2. If found and p.content_hash == new_hash -> skip; no API calls.
3. If found and hash differs -> notion-update-page with command: replace_content.
4. If not found -> notion-search + notion-fetch strict-match; on hit, adopt; otherwise notion-create-pages.
5. Update Publication.content_hash, last_updated_at. Wire PUBLISHED_AS.
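The decision logic in this sequence can be sketched as a pure function. The text does not say how content_hash is computed, so the SHA-256 over canonical JSON below is an assumption, as are the function names and the action labels.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def content_hash(properties: Dict[str, Any], content: str) -> str:
    """Assumed hash: SHA-256 over key-sorted JSON of properties plus body."""
    canonical = json.dumps(
        {"properties": properties, "content": content},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_publish(stored_hash: Optional[str], new_hash: str,
                 remote_match_found: bool) -> str:
    """Return "skip", "update", "adopt", or "create" for one page.

    stored_hash is the hash on the local Publication node, or None when no
    Publication exists. remote_match_found is the outcome of the
    notion-search + notion-fetch strict match, which is only consulted
    when there is no local Publication.
    """
    if stored_hash is not None:
        return "skip" if stored_hash == new_hash else "update"
    return "adopt" if remote_match_found else "create"
```

Sorting keys before hashing keeps the hash stable when the properties dict is built in a different order, which is what makes the skip branch reliable.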
epistemic_status mapping at publish time:
- confidence < 0.4 -> seedling
- 0.4 <= confidence <= 0.8 -> budding
- confidence > 0.8 -> evergreen
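The mapping above translates directly into a small function; the name epistemic_status_for is hypothetical, while the thresholds and labels are the ones listed.

```python
def epistemic_status_for(confidence: float) -> str:
    """Map Concept.confidence to an epistemic_status label."""
    if confidence < 0.4:
        return "seedling"
    if confidence <= 0.8:
        return "budding"
    return "evergreen"
```

Both boundary values (0.4 and 0.8) fall in the budding band, matching the inclusive range in the mapping.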
Call it
Why three strategies, not one
Splitting capture, distill, and express into three strategies lets you run each independently: re-publish without re-ingesting, re-score without re-extracting, and iterate on one stage at a time. It also enforces the ownership seam: highlight-distill owns extraction_score; cross-source-concepts owns confidence; notion-publish owns epistemic_status and Publication.*. No two strategies write the same field, which keeps the checkpoint rules meaningful and the graph honest.

