Cut Claude API Costs 94% With HTML Comment Tiers

What It Is

Cortex TMS introduces a document tiering system that dramatically reduces Claude API token consumption through strategic HTML comment tags. The approach works by categorizing documentation into three tiers: HOT (actively needed), WARM (occasionally referenced), and COLD (rarely accessed). By default, Claude only processes HOT-tier documents, ignoring archived changelogs, deprecated guides, and other low-priority content that typically inflates context windows.

The implementation requires adding simple HTML comments to documentation files: <!-- @cortex-tms-tier HOT -->, <!-- @cortex-tms-tier WARM -->, or <!-- @cortex-tms-tier COLD -->. These markers tell the system which documents to include in each API session. In the creator’s own project, token usage dropped from 66,834 per session to 3,647 (a reduction of about 94%), and the cost per Claude call fell from $0.11 to $0.01.
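As a rough illustration of how such a marker could be detected, here is a minimal Python sketch. The marker text comes from the examples above; the regular expression, the `read_tier` helper, and the choice to return `None` for untagged files are assumptions for illustration, not the tool's actual parser.

```python
import re
from typing import Optional

# Marker format taken from the article; the tolerance for extra
# whitespace is an assumption.
TIER_PATTERN = re.compile(r"<!--\s*@cortex-tms-tier\s+(HOT|WARM|COLD)\s*-->")


def read_tier(text: str) -> Optional[str]:
    """Return the tier named by the first marker in text, or None if untagged."""
    match = TIER_PATTERN.search(text)
    return match.group(1) if match else None


doc = "<!-- @cortex-tms-tier HOT -->\n# Current API Documentation\n"
print(read_tier(doc))
```

Because the marker is an ordinary HTML comment, it stays invisible in rendered Markdown or HTML while remaining easy to find with a plain text search.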

Why It Matters

Large language model API costs scale directly with token consumption, making context window management a critical concern for teams running frequent Claude sessions. Documentation-heavy projects face particularly steep bills because traditional approaches send entire knowledge bases with every request, regardless of relevance to the current task.
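To make the scaling concrete, the short calculation below reproduces the headline figure from the per-session numbers reported for the creator's project. The `percent_reduction` helper is a hypothetical name; the inputs are the only values taken from the article.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a quantity shrinks going from before to after."""
    return (before - after) / before * 100


# Figures reported for the creator's project.
tokens_before, tokens_after = 66_834, 3_647
cost_before, cost_after = 0.11, 0.01

print(f"tokens: {percent_reduction(tokens_before, tokens_after):.1f}% fewer")
print(f"cost:   {percent_reduction(cost_before, cost_after):.1f}% lower")
```

The token figure works out to roughly 94.5%. The dollar figure comes out slightly lower, presumably because the per-call costs are rounded to the nearest cent.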

This tiering system addresses a fundamental inefficiency in how developers typically structure AI-assisted workflows. Most codebases accumulate documentation over time - installation guides, migration notes, historical decisions, deprecated features - that remains valuable for reference but rarely applies to active development work. Sending this material through the API on every call wastes both money and processing time.

The 94% cost reduction represents more than just savings. Smaller prompts return faster, which improves developer experience, while the explicit tiering forces teams to think critically about information architecture. Projects with hundreds of documentation files benefit most, though even smaller codebases see meaningful improvements. With over 1,000 npm downloads since release, the tool has found traction among teams managing Claude API budgets.

Getting Started

Install Cortex TMS from the repository at https://github.com/cortex-tms/cortex-tms or through npm. Setup then consists of tagging existing documentation files with tier markers based on how often each file is actually needed.

Start by identifying frequently referenced documents - API specifications for active features, current architecture decisions, and ongoing project guidelines. Tag these as HOT:

<!-- @cortex-tms-tier HOT -->
# Current API Documentation

Mark occasionally needed references as WARM - these might include integration guides for optional features or historical context documents. Archive old changelogs, deprecated feature docs, and completed migration guides as COLD.
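Once files are tagged, choosing what to send becomes a simple filter. The sketch below assumes documents are already loaded into a dictionary of name to text; the file names, the `select_docs` helper, and the decision to treat untagged files as WARM are all illustrative assumptions, not documented Cortex TMS behavior.

```python
import re

TIER_PATTERN = re.compile(r"<!--\s*@cortex-tms-tier\s+(HOT|WARM|COLD)\s*-->")


def select_docs(docs: dict[str, str], tiers: set[str],
                default_tier: str = "WARM") -> list[str]:
    """Return the names of documents whose tier is in the requested set."""
    selected = []
    for name, text in sorted(docs.items()):
        match = TIER_PATTERN.search(text)
        # Untagged files fall back to default_tier (an assumption).
        tier = match.group(1) if match else default_tier
        if tier in tiers:
            selected.append(name)
    return selected


# Hypothetical documentation set.
docs = {
    "api.md": "<!-- @cortex-tms-tier HOT -->\n# Current API Documentation",
    "integrations.md": "<!-- @cortex-tms-tier WARM -->\n# Optional integrations",
    "changelog-old.md": "<!-- @cortex-tms-tier COLD -->\n# Archived changelog",
}

print(select_docs(docs, {"HOT"}))
print(select_docs(docs, {"HOT", "WARM"}))
```

Passing a wider set of tiers is one way to model a session that needs occasional reference material without pulling in the archive.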

Cortex TMS can also report how tokens are distributed across tier levels, which helps teams identify opportunities for further optimization. Adjust tier assignments based on actual usage patterns rather than assumptions about document importance.
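The idea of a per-tier token breakdown can be approximated independently of the tool. The sketch below is not the Cortex TMS command: it uses a crude heuristic of about four characters per token instead of a real tokenizer, and the file names, the `UNTAGGED` bucket, and the helper names are hypothetical.

```python
import re
from collections import Counter

TIER_PATTERN = re.compile(r"<!--\s*@cortex-tms-tier\s+(HOT|WARM|COLD)\s*-->")


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return len(text) // 4


def tokens_by_tier(docs: dict[str, str]) -> Counter:
    """Sum estimated tokens per tier; untagged files go in their own bucket."""
    totals = Counter()
    for text in docs.values():
        match = TIER_PATTERN.search(text)
        tier = match.group(1) if match else "UNTAGGED"
        totals[tier] += estimate_tokens(text)
    return totals


# Hypothetical documents with filler text standing in for real content.
docs = {
    "api.md": "<!-- @cortex-tms-tier HOT -->\n" + "x" * 400,
    "changelog.md": "<!-- @cortex-tms-tier COLD -->\n" + "x" * 4000,
    "notes.md": "x" * 200,
}

totals = tokens_by_tier(docs)
grand_total = sum(totals.values())
for tier, count in totals.most_common():
    print(f"{tier:9s} {count:6d} tokens ({count / grand_total:.0%})")
```

A breakdown like this makes it obvious when a single COLD archive dominates the total, or when the HOT tier has quietly grown larger than intended.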

Context

The tiering concept isn’t unique to Cortex TMS - retrieval-augmented generation (RAG) systems and vector databases tackle similar problems by selecting relevant context dynamically. However, those approaches require infrastructure setup, embedding generation, and semantic search capabilities. HTML comment tags offer a simpler alternative that works immediately without additional dependencies.

Manual tiering does introduce maintenance overhead. As projects evolve, teams must update tier assignments to reflect changing priorities. A document marked COLD today might become critical during a feature revival or bug investigation. The system also assumes developers can accurately predict which information Claude will need, which may not hold for exploratory or debugging sessions.

Alternative approaches include prompt engineering to request specific documentation sections, using Claude’s extended context windows more selectively, or implementing dynamic context selection based on query analysis. Each method involves tradeoffs between complexity, accuracy, and cost savings.
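As one possible shape for dynamic selection based on query analysis, the sketch below always includes HOT documents and pulls in a lower-tier document only when it shares enough keywords with the query. This is an illustrative assumption rather than anything Cortex TMS is described as doing; the keyword rule, the `min_overlap` threshold, and the file names are hypothetical.

```python
import re

TIER_PATTERN = re.compile(r"<!--\s*@cortex-tms-tier\s+(HOT|WARM|COLD)\s*-->")
WORD_PATTERN = re.compile(r"[a-z0-9]+")


def keywords(text: str) -> set[str]:
    """Lowercased words longer than three characters (a crude stop-word filter)."""
    return {w for w in WORD_PATTERN.findall(text.lower()) if len(w) > 3}


def select_for_query(docs: dict[str, str], query: str,
                     min_overlap: int = 2) -> list[str]:
    """Keep HOT docs, plus any other doc sharing min_overlap keywords with query."""
    query_words = keywords(query)
    chosen = []
    for name, text in sorted(docs.items()):
        match = TIER_PATTERN.search(text)
        tier = match.group(1) if match else "WARM"
        body = TIER_PATTERN.sub("", text)  # ignore the marker's own words
        if tier == "HOT" or len(query_words & keywords(body)) >= min_overlap:
            chosen.append(name)
    return chosen


docs = {
    "api.md": "<!-- @cortex-tms-tier HOT -->\n# Current API Documentation",
    "migration-v1.md": "<!-- @cortex-tms-tier COLD -->\n"
                       "Steps for the completed database migration from v1 schema",
    "changelog-2019.md": "<!-- @cortex-tms-tier COLD -->\nOld release notes",
}

print(select_for_query(docs, "Why did the database migration change the schema?"))
```

Keyword overlap is far cruder than embedding-based retrieval, but it shows the tradeoff: more accurate context for exploratory sessions in exchange for extra logic to maintain.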

The 94% reduction comes from a single project and should be treated as a best case rather than a typical result: savings vary with documentation structure and usage patterns. Projects with well-organized, frequently updated docs may see smaller gains than those with extensive historical archives. Still, even modest token reductions compound significantly over thousands of API calls, making the tagging investment worthwhile for most teams managing Claude integration costs.