Note Pipeline Design

Published: 2026-01-01

Overview

A single-source, multi-target publishing system that maintains notes in plain markdown with YAML frontmatter and routes them to different outputs based on metadata.

Architecture

```text
Source Repository
├── posts/
│   ├── session-with-claude.md
│   ├── project-notes.md
│   └── research-paper.md
├── img/
│   └── diagrams/
├── data/
│   ├── tables.csv
│   └── figures.tex
└── pipeline/
    ├── Makefile
    ├── build.kt (or build.py)
    └── templates/
        ├── pdf.tex
        ├── web.html
        └── slides.html
```

Metadata-Driven Routing

Your frontmatter determines the pipeline:

    title: Session with claude
    author: jensse
    category: private      # routes to: private blog, encrypted PDF
    level: low             # affects: visibility, indexing
    tags: ai code explore  # used for: cross-linking, search
    lang: en
    date: 2025-02-21
    output:
      - web-private        # target: personal site (authenticated)
      - pdf-archive        # target: versioned PDF in archive/
    format: article        # template: article vs slides vs paper

Pipeline Stages

Stage 1: Parse & Route

```kotlin
// build.kt (pseudo-code)
// parseMarkdown and buildForTarget are assumed to be defined elsewhere.
fun processNote(file: Path) {
    val (frontmatter, content) = parseMarkdown(file)

    // Branches are checked top to bottom; the first match wins.
    val targets = when {
        frontmatter.category == "private" && frontmatter.level == "low" ->
            listOf(Target.PrivateWeb, Target.PDFArchive)

        frontmatter.category == "public" ->
            listOf(Target.PublicWeb, Target.PDF)

        frontmatter.format == "paper" ->
            listOf(Target.LaTeXPaper, Target.PDF)

        else -> listOf(Target.DefaultWeb)
    }

    targets.forEach { buildForTarget(it, file, frontmatter) }
}
```
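For readers taking the `build.py` route, the same first-match routing could be sketched in Python as follows. The enum values `web-private` and `pdf-archive` come from the frontmatter example; the other string values and the name `select_targets` are assumptions made for illustration.

```python
from enum import Enum


class Target(Enum):
    PRIVATE_WEB = "web-private"
    PUBLIC_WEB = "web-public"    # assumed identifier
    DEFAULT_WEB = "web-default"  # assumed identifier
    PDF = "pdf"                  # assumed identifier
    PDF_ARCHIVE = "pdf-archive"
    LATEX_PAPER = "latex-paper"  # assumed identifier


def select_targets(meta: dict) -> list[Target]:
    """Mirror the Kotlin `when` expression: the first matching rule wins."""
    category = meta.get("category")
    if category == "private" and meta.get("level") == "low":
        return [Target.PRIVATE_WEB, Target.PDF_ARCHIVE]
    if category == "public":
        return [Target.PUBLIC_WEB, Target.PDF]
    if meta.get("format") == "paper":
        return [Target.LATEX_PAPER, Target.PDF]
    return [Target.DEFAULT_WEB]
```

One consequence of the rule order is worth knowing: a note with `category: public` and `format: paper` is routed by the category rule and never reaches the paper rule. If papers should always take the LaTeX route, the format check would need to move to the top.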

Stage 2: Data Processing

For notes with embedded data/figures:

    posts/research-paper.md
        ↓ (references data/experiment.csv)
    pipeline/processors/csv-to-table.kt
        ↓ (generates LaTeX table or SVG chart)
    build/intermediates/experiment-table.tex
        ↓
    [included in final document]
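As an illustration of the CSV step, the sketch below turns CSV text into a LaTeX `tabular` environment using only the standard library. It is a Python stand-in for the processor named above, not its actual implementation; the function names, the all-left-aligned column spec, and the `\hline` rules are assumptions.

```python
import csv
import io

# Characters that have special meaning in LaTeX and must be escaped in cells.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(cell: str) -> str:
    """Escape one cell character by character, so nothing is escaped twice."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in cell)


def csv_to_latex_table(csv_text: str) -> str:
    """Render CSV text (first row = header) as a LaTeX tabular."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV input is empty")
    header, *body = rows
    lines = [r"\begin{tabular}{" + "l" * len(header) + "}", r"\hline"]
    lines.append(" & ".join(escape_latex(c) for c in header) + r" \\")
    lines.append(r"\hline")
    for row in body:
        lines.append(" & ".join(escape_latex(c) for c in row) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)
```

Writing the result to `build/intermediates/experiment-table.tex` keeps generated files out of the source tree, so the final document can pull it in with `\input`.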

Stage 3: Format Conversion

Using Pandoc as the universal converter:

```bash
# For web output
pandoc post.md \
  --template=templates/web.html \
  --filter=process-images.py \
  --css=styles.css \
  -o output/web/post.html

# For PDF via LaTeX
pandoc post.md \
  --template=templates/pdf.tex \
  --filter=process-latex.py \
  --pdf-engine=xelatex \
  -o output/pdf/post.pdf

# For slides
pandoc post.md \
  --template=templates/slides.html \
  -t revealjs \
  -o output/slides/post.html
```
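If the build script drives Pandoc itself, the three invocations above can be kept in one table instead of three shell snippets. The sketch below only builds the argument list and does not run anything; `pandoc_command` and the `out_dir` parameter are hypothetical names, while the options are copied from the commands above.

```python
from pathlib import PurePosixPath

# Per-target pandoc options and output extension, taken from the commands above.
_TARGET_OPTIONS = {
    "web": (["--template=templates/web.html",
             "--filter=process-images.py",
             "--css=styles.css"], "html"),
    "pdf": (["--template=templates/pdf.tex",
             "--filter=process-latex.py",
             "--pdf-engine=xelatex"], "pdf"),
    "slides": (["--template=templates/slides.html",
                "-t", "revealjs"], "html"),
}


def pandoc_command(source: str, target: str, out_dir: str = "output") -> list[str]:
    """Return the pandoc argument list for one source file and target."""
    if target not in _TARGET_OPTIONS:
        raise ValueError(f"unknown target: {target!r}")
    options, extension = _TARGET_OPTIONS[target]
    stem = PurePosixPath(source).stem  # "posts/post.md" -> "post"
    return ["pandoc", source, *options,
            "-o", f"{out_dir}/{target}/{stem}.{extension}"]
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with file names that contain spaces.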

Stage 4: Post-Processing

  • Copy images to output directories
  • Generate index/TOC pages
  • Create RSS feeds
  • Update search index
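Two of these steps are small enough to sketch directly. The code below copies an image tree into an output directory and renders a bare-bones index from (title, link) pairs. Both function names, the `img/` subfolder in the output, and the `<ul>` markup are assumptions chosen for illustration; RSS and search indexing are left out.

```python
import html
import shutil
from pathlib import Path


def copy_images(img_dir: Path, out_dir: Path) -> int:
    """Copy every file under img_dir into out_dir/img, keeping subfolders.

    Returns the number of files copied.
    """
    copied = 0
    for src in img_dir.rglob("*"):
        if src.is_file():
            dest = out_dir / "img" / src.relative_to(img_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied += 1
    return copied


def render_index(posts: list[tuple[str, str]]) -> str:
    """Build a minimal HTML list from (title, href) pairs, escaping both."""
    items = "\n".join(
        f'  <li><a href="{html.escape(href, quote=True)}">{html.escape(title)}</a></li>'
        for title, href in posts
    )
    return f"<ul>\n{items}\n</ul>"
```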

Makefile Structure

```makefile
# Variables
SRC_DIR   := posts
IMG_DIR   := img
OUT_DIR   := output
BUILD_DIR := build

# Targets
all: web pdf

web: $(patsubst $(SRC_DIR)/%.md,$(OUT_DIR)/web/%.html,$(wildcard $(SRC_DIR)/*.md))

pdf: $(patsubst $(SRC_DIR)/%.md,$(OUT_DIR)/pdf/%.pdf,$(wildcard $(SRC_DIR)/*.md))

# Pattern rules
$(OUT_DIR)/web/%.html: $(SRC_DIR)/%.md
	@mkdir -p $(OUT_DIR)/web
	kotlin build.kt web $<

$(OUT_DIR)/pdf/%.pdf: $(SRC_DIR)/%.md
	@mkdir -p $(OUT_DIR)/pdf
	kotlin build.kt pdf $<

clean:
	rm -rf $(OUT_DIR) $(BUILD_DIR)

watch:
	fswatch -o $(SRC_DIR) $(IMG_DIR) | xargs -n1 -I{} make all

.PHONY: all web pdf clean watch
```
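The pattern rules rely on Make's staleness check: a target is rebuilt when it is missing or when any prerequisite has a newer modification time. A rough Python equivalent, with the hypothetical name `needs_rebuild`, looks like this:

```python
from pathlib import Path


def needs_rebuild(output: Path, *prerequisites: Path) -> bool:
    """Return True when output is missing or older than any prerequisite."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(p.stat().st_mtime > out_mtime for p in prerequisites)
```

In the Makefile above the only prerequisite of each output is its `.md` file, so editing a template or the build script does not trigger a rebuild. Listing those files as additional prerequisites would close that gap.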

Example Build Script (Kotlin)

```kotlin
// build.kt
// parseMarkdownFile, preprocessForLaTeX, executeCommand and outputPath are
// assumed to be defined elsewhere in the pipeline.
import java.nio.file.Path
import kotlin.io.path.*

data class Frontmatter(
    val title: String,
    val category: String,
    val level: String,
    val output: List<String>,
    val format: String = "article"
)

// Covers every target referenced by the Stage 1 routing logic.
enum class Target { PrivateWeb, PublicWeb, DefaultWeb, PDF, PDFArchive, LaTeXPaper }

fun main(args: Array<String>) {
    require(args.size == 2) { "usage: build.kt <web|pdf> <source.md>" }
    val command = args[0] // "web" or "pdf"
    val sourceFile = Path(args[1])

    val (frontmatter, content) = parseMarkdownFile(sourceFile)

    when (command) {
        "web" -> buildWebOutput(sourceFile, frontmatter, content)
        "pdf" -> buildPDFOutput(sourceFile, frontmatter, content)
        else -> error("Unknown command: $command")
    }
}

fun buildWebOutput(source: Path, meta: Frontmatter, content: String) {
    val templatePath = selectTemplate(meta.category, meta.format, "web")

    executeCommand(
        "pandoc", source.toString(),
        "--template=$templatePath",
        "--filter=filters/web-images.py",
        "-o", outputPath(source, "web", "html")
    )
}

fun buildPDFOutput(source: Path, meta: Frontmatter, content: String) {
    // Preprocess: extract LaTeX snippets, CSV data
    val processed = preprocessForLaTeX(source, content)

    // Convert to PDF via LaTeX
    executeCommand(
        "pandoc", processed.toString(),
        "--template=templates/pdf.tex",
        "--pdf-engine=xelatex",
        "-o", outputPath(source, "pdf", "pdf")
    )
}

// The paper format takes precedence over the category when picking a template.
fun selectTemplate(category: String, format: String, outputType: String): String =
    when {
        format == "paper" -> "templates/academic-paper.$outputType"
        category == "private" -> "templates/private-note.$outputType"
        else -> "templates/default.$outputType"
    }
```
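The script calls `outputPath` and `executeCommand` without defining them. One plausible shape for those two helpers, written in Python for the `build.py` variant, is sketched below. The `output/<target>/<name>.<ext>` layout is inferred from the Makefile; the signatures and the choice to return captured stdout are assumptions.

```python
import subprocess
from pathlib import Path


def output_path(source: Path, target: str, extension: str,
                out_dir: str = "output") -> str:
    """Map posts/<name>.md to <out_dir>/<target>/<name>.<extension>."""
    return (Path(out_dir) / target / f"{source.stem}.{extension}").as_posix()


def execute_command(*args: str) -> str:
    """Run an external tool and return its stdout.

    check=True raises subprocess.CalledProcessError on a non-zero exit code,
    so a failed pandoc run stops the build instead of passing silently.
    """
    result = subprocess.run(list(args), check=True,
                            capture_output=True, text=True)
    return result.stdout
```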

Data Processing Pipeline

For your LaTeX/CSV workflow:

```
data/experiment.csv
    ↓
pipeline/processors/CsvToLatexTable.kt
    ↓
build/intermediates/experiment-table.tex

data/diagram.tex (TikZ)
    ↓
pipeline/processors/LatexToSvg.kt
    ↓
build/intermediates/diagram.svg
    ↓ (for web)
img/generated/diagram.svg
    ↓ (for PDF, keep as .tex)
[embedded directly in LaTeX]
```
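The diagram branch boils down to one decision per target: web output references the generated SVG, while PDF output keeps the TikZ source. A small sketch of that decision, using the paths from the diagram above and the hypothetical name `diagram_artifact`:

```python
from pathlib import PurePosixPath


def diagram_artifact(tex_source: str, target: str) -> str:
    """Return the file a document should reference for a TikZ diagram."""
    if target == "web":
        # Web pages cannot embed TikZ, so they point at the converted SVG.
        stem = PurePosixPath(tex_source).stem
        return f"img/generated/{stem}.svg"
    if target == "pdf":
        # The LaTeX build can include the TikZ source as-is.
        return tex_source
    raise ValueError(f"unknown target: {target!r}")
```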

Key Advantages

  1. Single source of truth: All notes in posts/*.md
  2. Version-control friendly: Plain text diffs and merges cleanly in Git
  3. Metadata-driven: Routing logic in frontmatter, not file locations
  4. Composable: Each processor is a small, testable function
  5. Incremental builds: Make rebuilds only the outputs whose source note is newer than the existing output
  6. Type-safe: Kotlin gives you compile-time safety for your build logic
  7. No hidden magic: You control every transformation

Migration Path

  1. Start simple: Get basic markdown → HTML working with Pandoc
  2. Add routing: Implement metadata parsing and target selection
  3. Add formats: One at a time (PDF, then slides, etc.)
  4. Add preprocessing: CSV conversion, LaTeX snippets as needed
  5. Optimize: Caching, parallel builds when it matters
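For the caching part of step 5, one common approach is to key on file content rather than timestamps, which stay unreliable after a fresh Git checkout. The sketch below stores SHA-256 digests in a JSON file and reports which sources changed since the last run. The cache format and the function names are assumptions, not part of the design above.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_sources(sources: list[Path], cache_file: Path) -> list[Path]:
    """Return sources whose digest differs from the cached one.

    The cache file is updated as a side effect.
    """
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    changed = []
    for src in sources:
        digest = file_digest(src)
        if cache.get(str(src)) != digest:
            changed.append(src)
            cache[str(src)] = digest
    cache_file.write_text(json.dumps(cache, indent=2))
    return changed
```

This version records the new digest before the build runs. A more careful pipeline would write the digest only after the corresponding output was built successfully, so a failed build is retried on the next run.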

Example Workflow

```bash
# Write a note
vim posts/new-idea.md

# Build everything
make all

# Or build a specific target
make web
make pdf

# Watch for changes
make watch

# Deploy to destinations
# (the deploy-* targets are not defined in the Makefile above and must be added)
make deploy-private   # rsync to private server
make deploy-public    # push to GitHub Pages
```

Tools You'll Need

  • Pandoc: Universal document converter
  • Make: Build orchestration
  • Kotlin (or Python): Custom processing scripts
  • XeLaTeX: PDF generation from LaTeX
  • ImageMagick/Inkscape: Image format conversions
  • fswatch: File watching for auto-rebuild