Skip to content

Performance & Cost

All numbers below are measured, not estimated, using tiktoken (cl100k_base) for token counts and tools/discovery_tax.py for the discovery model. Reproduce any of them with the commands shown.

Extraction (real conversions)

Measured with pdftotext (PDF) and ebooklib (EPUB):

Book Format Pages Tokens Chapters auto-detected
Think Python 2 PDF 244 119K 19
Working Backwards PDF 371 175K 10
Pro Git PDF 501 229K — †
Moby-Dick EPUB 301K 133

† Pro Git heads chapters with section titles (no Chapter N), so it does not auto-segment. Moby-Dick's bodies use bare titles, but its Roman-numeral table of contents is detected (133) — see Known limitations in the README.

Extraction method matters for technical books. On a 103-page technical PDF:

Method Time Tables Code blocks
pdftotext 0.1s 0 0
Docling (technical mode) 164s 48 36

pdftotext is instant but flattens structure; Docling is ~1.5s/page but preserves tables and code as markdown. Pick text mode for prose, technical mode for code/tables.

The Discovery Loop Tax

Tokens entering context to answer one targeted question. book-to-skill loads a resident core (~4K) plus one compiled chapter (~1K) ≈ 5,000 tokens.

Book (chapter size) Context-dump Discovery loop book-to-skill vs dump / loop
Think Python 2 (small) 119,264 12,152 ~5,000 24× / 2.4×
Working Backwards (medium) 175,253 33,444 ~5,000 35× / 6.7×
AI Engineering (large) 256,287 77,866 ~5,000 51× / 15.6×
python3 tools/discovery_tax.py --full-text /tmp/book_skill_work/full_text.txt --target-chapter 5
  • The context-dump advantage (24–51×) is the strongest claim: that cost recurs on every conversation turn.
  • The discovery-loop advantage (2.4–15.6×) is a one-time cost and a model using the book's real ToC/chapter sizes; it scales with chapter size.

Generation cost

One-pass full conversion, estimated from measured tokens (Claude Sonnet 4.5, \$3 / \$15 per MTok input/output):

Book Input Output ~Cost
Think Python 2 155K 28K \$0.88
Working Backwards 228K 19K \$0.96
Pro Git 298K 23K \$1.23
Moby-Dick 391K 17K \$1.42

Roughly \$1 per book for a full skill — paid once. Re-reading the same PDF into context every session costs far more over time (see the Discovery Loop Tax above).

Generated-skill output quality

A before/after of the adaptive-depth change (v1.0.0, #20) on one chapter:

Artifact Old spec New spec
Chapter file (tokens) 473 1,219
Worked example present no yes
Cheatsheet decision rules 0 32
Cheatsheet keyword/definition lines 9 0

The new spec turns the cheatsheet from a glossary into a decision layer and gives study-depth chapters a reproduced worked example.