API Documentation
Compress prompts programmatically via our REST API or npm SDK.
Quick start
1. Get your API key
Sign up (free), then generate an API key from your dashboard.
2. Compress via API
curl -X POST https://tokenshrink.com/api/compress \
-H "Content-Type: application/json" \
-H "x-api-key: ts_live_your_key_here" \
-d '{
"text": "Your long prompt text here...",
"domain": "auto"
}'

3. Or use the SDK (v2.0)
npm install tokenshrink
import { compress } from 'tokenshrink';
// Compress a prompt — runs locally, no API call needed
const result = compress('Your long prompt...');
console.log(result.compressed);
console.log(result.stats.tokensSaved); // Real token savings
console.log(result.stats.originalTokens); // Original token count
console.log(result.stats.totalCompressedTokens); // Compressed token count
// Optional: plug in a real tokenizer for exact counts
import { encode } from 'gpt-tokenizer';
const result2 = compress('Your long prompt...', {
tokenizer: (text) => encode(text).length
});
// Use with any LLM provider
import OpenAI from 'openai';
const openai = new OpenAI();
const res = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'system', content: result.compressed }],
});

API Reference
POST /api/compress
Headers
Content-Type: application/json
x-api-key: ts_live_... (optional for anonymous use; required for programmatic API access)
Request body
{
"text": "string (required) — the text to compress",
"domain": "string (optional) — auto|code|medical|legal|business"
}

Response
{
"compressed": "string — full compressed text with Rosetta header",
"rosetta": "string — just the decoder header",
"stats": {
"originalWords": 150,
"compressedWords": 42,
"rosettaWords": 18,
"totalCompressedWords": 60,
"originalTokens": 168,
"compressedTokens": 45,
"rosettaTokens": 22,
"totalCompressedTokens": 67,
"ratio": 2.5,
"tokensSaved": 101,
"dollarsSaved": 0.05,
"strategy": "domain",
"domain": "code",
"tokenizerUsed": "built-in"
}
}

GET /api/usage
Returns your current usage stats, monthly history, and recent compressions. Requires authentication (session or API key).
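The endpoint works with any HTTP client. A minimal sketch (the `usageRequest` helper name is ours; only the URL and the x-api-key header are documented):

```javascript
// Hypothetical helper that builds the fetch arguments for GET /api/usage.
// Only the endpoint URL and the x-api-key header come from the docs above;
// the helper itself is illustrative.
function usageRequest(apiKey) {
  return {
    url: 'https://tokenshrink.com/api/usage',
    options: { method: 'GET', headers: { 'x-api-key': apiKey } },
  };
}

// Usage (requires network access and a valid key):
// const { url, options } = usageRequest('ts_live_your_key_here');
// const usage = await fetch(url, options).then((r) => r.json());
```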
Rate limits
Requests per minute: 10
Words per request: 100,000
Monthly limit: Unlimited
Price: Free
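If you may exceed 10 requests per minute, retry with backoff on the client. The docs do not specify the error returned when the limit is hit, so this sketch assumes a standard HTTP 429 status; the `compressWithRetry` helper and its injectable `doFetch` parameter are ours:

```javascript
// Client-side backoff sketch for the 10 requests/minute limit.
// ASSUMPTION: the API returns HTTP 429 when rate limited (not confirmed
// by the docs). `doFetch` is injectable so the retry logic is testable.
async function compressWithRetry(body, apiKey, { doFetch = fetch, retries = 3, delayMs = 6000 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await doFetch('https://tokenshrink.com/api/compress', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey },
      body: JSON.stringify(body),
    });
    if (res.status !== 429) return res.json();
    // Wait longer before each successive retry
    await new Promise((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
  }
  throw new Error('Still rate limited after retries');
}
```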
Token counting
v2.0 uses real token counts instead of word estimates. By default, TokenShrink uses a precomputed lookup table based on cl100k_base (GPT-4). For exact counts with your specific model, pass a custom tokenizer:
import { compress } from 'tokenshrink';
import { encode } from 'gpt-tokenizer';
const result = compress(text, {
tokenizer: (text) => encode(text).length
});

Compression domains
Set domain to optimize compression for specific content types. Default is auto.
auto: Automatically detects the best strategy
code: Programming and technical documentation
medical: Medical records, clinical notes
legal: Contracts, legal documents
business: Business communications, reports
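When calling the REST endpoint directly, the domain goes in the request body alongside the text. A sketch using only the two documented fields (the sample text is illustrative):

```javascript
// Request body for domain-specific compression. Only the documented
// "text" and "domain" fields are used.
const body = {
  text: 'WHEREAS the parties agree to the following terms and conditions...',
  domain: 'legal', // auto | code | medical | legal | business
};

// Send it (requires network access and a valid key):
// const result = await fetch('https://tokenshrink.com/api/compress', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json', 'x-api-key': 'ts_live_...' },
//   body: JSON.stringify(body),
// }).then((r) => r.json());
```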