Note Pipeline Design

Published: 2026-01-01

Overview

This design describes a single-source, multi-target publishing system: notes are kept as plain Markdown files with YAML frontmatter and routed to different outputs based on that metadata.

Architecture


Source Repository
├── posts/
│   ├── session-with-claude.md
│   ├── project-notes.md
│   └── research-paper.md
├── img/
│   └── diagrams/
├── data/
│   ├── tables.csv
│   └── figures.tex
└── pipeline/
    ├── Makefile
    ├── build.kt (or build.py)
    └── templates/
        ├── pdf.tex
        ├── web.html
        └── slides.html

Metadata-Driven Routing

Your frontmatter determines the pipeline:

---
title: Session with Claude
author: jensse
category: private      # routes to: private blog, encrypted PDF
level: low             # affects: visibility, indexing
tags: [ai, code, explore]  # used for: cross-linking, search
lang: en
date: 2025-02-21
output:
  - web-private        # target: personal site (authenticated)
  - pdf-archive        # target: versioned PDF in archive/
format: article        # template: article vs slides vs paper
---
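To make the metadata step concrete, here is a minimal Python sketch of reading such a block. It assumes the frontmatter is delimited by `---` lines (the usual convention) and handles only flat `key: value` pairs, inline `[a, b]` lists, and `- item` lists; the function name `split_frontmatter` is hypothetical, and a real YAML library such as PyYAML would replace this hand-rolled subset in practice.

```python
def split_frontmatter(text):
    """Split a note into (metadata dict, body). Supports a small YAML subset only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        return {}, text  # opening delimiter without a closing one

    meta = {}
    list_key = None  # key currently collecting "- item" lines
    for raw in lines[1:end]:
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("- ") and list_key is not None:
            meta[list_key].append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        key, value = key.strip(), value.strip()
        if value == "":
            meta[key] = []       # block list follows on the next lines
            list_key = key
        elif value.startswith("[") and value.endswith("]"):
            meta[key] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
            list_key = None
        else:
            meta[key] = value    # everything stays a string, dates included
            list_key = None

    body = "\n".join(lines[end + 1:])
    return meta, body
```

All values stay strings (including `date`), so any type conversion is left to the routing code.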

Pipeline Stages

Stage 1: Parse & Route

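// build.kt (pseudo-code)
fun processNote(file: Path) {
    // The body is not needed for routing, so only the frontmatter is kept.
    val (frontmatter, _) = parseMarkdown(file)

    // `when` takes the first matching branch only, so order matters:
    // a public note with format "paper" is routed by the "public" branch
    // and never reaches the "paper" branch.
    val targets = when {
        frontmatter.category == "private" &&
        frontmatter.level == "low" ->
            listOf(Target.PrivateWeb, Target.PDFArchive)

        frontmatter.category == "public" ->
            listOf(Target.PublicWeb, Target.PDF)

        frontmatter.format == "paper" ->
            listOf(Target.LaTeXPaper, Target.PDF)

        // Private notes with any other level also land here.
        else -> listOf(Target.DefaultWeb)
    }

    targets.forEach { buildForTarget(it, file, frontmatter) }
}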
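The routing rules can also be sketched in Python, for a build.py variant. One assumption goes beyond the pseudo-code: an explicit `output` list in the frontmatter takes precedence, and the category/format rules only apply as a fallback. The target names `web-private` and `pdf-archive` come from the frontmatter example; the other string values and the function name `select_targets` are hypothetical.

```python
from enum import Enum


class Target(Enum):
    PRIVATE_WEB = "web-private"   # name taken from the frontmatter example
    PDF_ARCHIVE = "pdf-archive"   # name taken from the frontmatter example
    PUBLIC_WEB = "web-public"     # hypothetical name
    PDF = "pdf"                   # hypothetical name
    LATEX_PAPER = "latex-paper"   # hypothetical name
    DEFAULT_WEB = "web-default"   # hypothetical name


def select_targets(meta):
    """Return the list of build targets for one note's metadata dict."""
    explicit = meta.get("output") or []
    if explicit:
        # Assumption: an explicit list wins. Unknown names raise ValueError.
        return [Target(name) for name in explicit]

    # Fallback: same rules, same order as the pseudo-code (first match wins).
    if meta.get("category") == "private" and meta.get("level") == "low":
        return [Target.PRIVATE_WEB, Target.PDF_ARCHIVE]
    if meta.get("category") == "public":
        return [Target.PUBLIC_WEB, Target.PDF]
    if meta.get("format") == "paper":
        return [Target.LATEX_PAPER, Target.PDF]
    return [Target.DEFAULT_WEB]
```

Failing loudly on an unknown target name is a deliberate choice here: a typo in `output` should stop the build rather than silently publish to the default target.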

Stage 2: Data Processing

For notes with embedded data/figures:

posts/research-paper.md
  ↓ (references data/experiment.csv)
  ↓
pipeline/processors/CsvToLatexTable.kt
  ↓ (generates LaTeX table or SVG chart)
  ↓
build/intermediates/experiment-table.tex
  ↓
[included in final document]
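As an illustration of the CSV-to-table step, here is a minimal Python sketch that turns CSV text into a LaTeX `tabular`. It assumes the first row is a header and that every column is left-aligned; the function names are hypothetical and the layout (plain `\hline` rules) is just one reasonable choice.

```python
import csv
import io

# Characters with special meaning in LaTeX text mode and their escaped forms.
_LATEX_SPECIALS = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
    "\\": r"\textbackslash{}",
}


def escape_latex(cell):
    """Escape one cell character by character, so nothing is escaped twice."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in cell)


def csv_to_latex_table(csv_text):
    """Render CSV text as a LaTeX tabular; the first row is treated as the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    lines = [r"\begin{tabular}{" + "l" * len(header) + "}", r"\hline"]
    lines.append(" & ".join(escape_latex(c) for c in header) + r" \\")
    lines.append(r"\hline")
    for row in body:
        lines.append(" & ".join(escape_latex(c) for c in row) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)
```

The result would be written to a file under build/intermediates/ and pulled into the document with `\input`.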

Stage 3: Format Conversion

Using Pandoc as the universal converter:

# For web output
pandoc post.md \
  --template=templates/web.html \
  --filter=process-images.py \
  --css=styles.css \
  -o output/web/post.html

# For PDF via LaTeX
pandoc post.md \
  --template=templates/pdf.tex \
  --filter=process-latex.py \
  --pdf-engine=xelatex \
  -o output/pdf/post.pdf

# For slides
pandoc post.md \
  --template=templates/slides.html \
  -t revealjs \
  -o output/slides/post.html
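If the build script is written in Python, the three invocations above can be assembled as argument lists rather than shell strings. This is a sketch only: the function name `pandoc_command` and the `out_dir` parameter are hypothetical, and the flags are exactly the ones shown in the commands above.

```python
from pathlib import PurePosixPath


def pandoc_command(source, target, out_dir="output"):
    """Build the pandoc argument list for one source file and one target."""
    stem = PurePosixPath(source).stem  # "posts/post.md" -> "post"
    if target == "web":
        return [
            "pandoc", source,
            "--template=templates/web.html",
            "--filter=process-images.py",
            "--css=styles.css",
            "-o", f"{out_dir}/web/{stem}.html",
        ]
    if target == "pdf":
        return [
            "pandoc", source,
            "--template=templates/pdf.tex",
            "--filter=process-latex.py",
            "--pdf-engine=xelatex",
            "-o", f"{out_dir}/pdf/{stem}.pdf",
        ]
    if target == "slides":
        return [
            "pandoc", source,
            "--template=templates/slides.html",
            "-t", "revealjs",
            "-o", f"{out_dir}/slides/{stem}.html",
        ]
    raise ValueError(f"unknown target: {target}")
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with file names that contain spaces.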

Stage 4: Post-Processing

  • Copy images to output directories
  • Generate index/TOC pages
  • Create RSS feeds
  • Update search index
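As one example of a post-processing step, index generation could look roughly like this in Python. The sketch assumes each built note is described by a dict with `title`, `date` (ISO format, as in the frontmatter) and `href` keys; that record shape and the function name `render_index` are hypothetical.

```python
from html import escape


def render_index(notes):
    """Render an HTML list of notes, newest first."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    ordered = sorted(notes, key=lambda n: n["date"], reverse=True)
    items = [
        '<li><a href="{href}">{title}</a> <time>{date}</time></li>'.format(
            href=escape(n["href"], quote=True),
            title=escape(n["title"]),
            date=escape(n["date"]),
        )
        for n in ordered
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

The fragment would then be dropped into the web template; escaping the title matters because note titles come straight from frontmatter.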

Makefile Structure

# Variables
SRC_DIR := posts
IMG_DIR := img
OUT_DIR := output
BUILD_DIR := build

# Targets
all: web pdf

web: $(patsubst $(SRC_DIR)/%.md,$(OUT_DIR)/web/%.html,$(wildcard $(SRC_DIR)/*.md))

pdf: $(patsubst $(SRC_DIR)/%.md,$(OUT_DIR)/pdf/%.pdf,$(wildcard $(SRC_DIR)/*.md))

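# Build script: the `kotlin` runner does not execute .kt sources directly,
# so build.kt is compiled once into a runnable jar.
BUILD_JAR := $(BUILD_DIR)/build.jar

$(BUILD_JAR): build.kt
	@mkdir -p $(BUILD_DIR)
	kotlinc build.kt -include-runtime -d $@

# Pattern rules
$(OUT_DIR)/web/%.html: $(SRC_DIR)/%.md $(BUILD_JAR)
	@mkdir -p $(OUT_DIR)/web
	java -jar $(BUILD_JAR) web $<

$(OUT_DIR)/pdf/%.pdf: $(SRC_DIR)/%.md $(BUILD_JAR)
	@mkdir -p $(OUT_DIR)/pdf
	java -jar $(BUILD_JAR) pdf $<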

clean:
	rm -rf $(OUT_DIR) $(BUILD_DIR)

watch:
	fswatch -o $(SRC_DIR) $(IMG_DIR) | xargs -n1 -I{} $(MAKE) all

.PHONY: all web pdf clean watch
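The pattern rules rely on Make's timestamp comparison: a target is rebuilt when it is missing or older than any of its prerequisites. The same check, sketched in Python for illustration (the function name `needs_rebuild` is hypothetical):

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

Only files listed as prerequisites take part in this comparison, so anything else an output depends on, such as a template, triggers a rebuild only if it is added to the rule.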

Example Build Script (Kotlin)


// build.kt
import java.nio.file.Path
import kotlin.io.path.*

data class Frontmatter(
    val title: String,
    val category: String,
    val level: String,
    val output: List<String>,
    val format: String = "article"
)

enum class Target {
    // PDFArchive and DefaultWeb are used by the Stage 1 routing logic.
    PrivateWeb, PublicWeb, PDF, PDFArchive, LaTeXPaper, DefaultWeb
}

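// parseMarkdownFile, executeCommand, outputPath and preprocessForLaTeX
// are helpers assumed to be defined elsewhere in the build script.
fun main(args: Array<String>) {
    require(args.size == 2) { "usage: build <web|pdf> <source.md>" }
    val command = args[0] // "web" or "pdf"
    val sourceFile = Path(args[1])

    val (frontmatter, content) = parseMarkdownFile(sourceFile)

    when (command) {
        "web" -> buildWebOutput(sourceFile, frontmatter, content)
        "pdf" -> buildPDFOutput(sourceFile, frontmatter, content)
        else -> error("Unknown command: $command")
    }
}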

fun buildWebOutput(source: Path, meta: Frontmatter, content: String) {
    val templatePath = selectTemplate(meta.category, meta.format, "web")

    executeCommand(
        "pandoc", source.toString(),
        "--template=$templatePath",
        "--filter=filters/web-images.py",
        "-o", outputPath(source, "web", "html")
    )
}

fun buildPDFOutput(source: Path, meta: Frontmatter, content: String) {
    // Preprocess: extract LaTeX snippets, CSV data
    val processed = preprocessForLaTeX(source, content)

    // Convert to PDF via LaTeX
    executeCommand(
        "pandoc", processed.toString(),
        "--template=templates/pdf.tex",
        "--pdf-engine=xelatex",
        "-o", outputPath(source, "pdf", "pdf")
    )
}

fun selectTemplate(category: String, format: String, outputType: String): String {
    // Templates use the extension of the format they produce (web.html, pdf.tex).
    val ext = if (outputType == "pdf") "tex" else "html"
    return when {
        format == "paper" -> "templates/academic-paper.$ext"
        category == "private" -> "templates/private-note.$ext"
        else -> "templates/default.$ext"
    }
}
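The script above calls `outputPath` and `executeCommand` without defining them. A rough Python equivalent of both helpers, for a build.py variant, might look like this; the names, the `out_dir` default, and the exact path layout are assumptions chosen to match the Makefile's `$(OUT_DIR)/web/%.html` and `$(OUT_DIR)/pdf/%.pdf` patterns.

```python
import subprocess
from pathlib import PurePosixPath


def output_path(source, kind, ext, out_dir="output"):
    """Map posts/<name>.md to <out_dir>/<kind>/<name>.<ext>."""
    stem = PurePosixPath(source).stem
    return str(PurePosixPath(out_dir) / kind / f"{stem}.{ext}")


def execute_command(*argv):
    """Run an external tool and raise CalledProcessError if it fails."""
    subprocess.run(list(argv), check=True)
```

The computed path has to agree with the Makefile target name; otherwise Make never sees the file it expects and rebuilds the note on every run.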

Data Processing Pipeline

For your LaTeX/CSV workflow:

data/experiment.csv
  ↓
pipeline/processors/CsvToLatexTable.kt
  ↓
build/intermediates/experiment-table.tex

data/diagram.tex (TikZ)
  ↓ (for web)
pipeline/processors/LatexToSvg.kt
  ↓
build/intermediates/diagram.svg
  ↓
img/generated/diagram.svg

data/diagram.tex (TikZ)
  ↓ (for PDF, keep as .tex)
[embedded directly in LaTeX]
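The per-target handling of a diagram can be sketched as a small Python function that returns the snippet to place in the document: a Markdown image pointing at the generated SVG for web, or an `\input` of the TikZ source for PDF. The function name `figure_include` is hypothetical, and the paths are the ones from the flow above, taken relative to the repository root.

```python
def figure_include(name, target):
    """Return the markup that embeds diagram `name` for the given target."""
    if target == "web":
        # Web output uses the SVG produced by the LaTeX-to-SVG processor.
        return f"![{name}](img/generated/{name}.svg)"
    if target == "pdf":
        # PDF output keeps the TikZ source and lets LaTeX render it.
        return f"\\input{{data/{name}.tex}}"
    raise ValueError(f"unknown target: {target}")
```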

Key Advantages

  1. Single source of truth: All notes in posts/*.md
  2. Version control friendly: Plain text diffs and merges cleanly in Git
  3. Metadata-driven: Routing logic in frontmatter, not file locations
  4. Composable: Each processor is a small, testable function
  5. Incremental builds: Make rebuilds only the outputs whose source note has changed
  6. Type-safe: Kotlin gives you compile-time safety for your build logic
  7. No hidden magic: You control every transformation

Migration Path

  1. Start simple: Get basic markdown → HTML working with Pandoc
  2. Add routing: Implement metadata parsing and target selection
  3. Add formats: One at a time (PDF, then slides, etc.)
  4. Add preprocessing: CSV conversion, LaTeX snippets as needed
  5. Optimize: Caching, parallel builds when it matters

Example Workflow

# Write a note
vim posts/new-idea.md

# Build everything
make all

# Or build specific target
make web
make pdf

# Watch for changes
make watch

# Deploy to destinations (these targets are not defined in the Makefile above and must be added)
make deploy-private  # rsync to private server
make deploy-public   # push to GitHub Pages

Tools You'll Need

  • Pandoc: Universal document converter
  • Make: Build orchestration
  • Kotlin (or Python): Custom processing scripts
  • XeLaTeX: PDF generation from LaTeX
  • ImageMagick/Inkscape: Image format conversions
  • fswatch: File watching for auto-rebuild