A transparent HTTP proxy between your IDE and LLM providers. Lossless compression. Zero config. 50% savings.
The problem
Every API call ships repeated system prompts, bloated tool schemas, and stale conversation history. You’re paying per token for data the model doesn’t need.
As context windows fill, model quality degrades — the “lost in the middle” effect. More tokens doesn’t mean better results. Often it means worse.
Agentic workflows burn through context limits fast, forcing premature truncation or expensive summarization passes that lose information.
Token pricing is simple: more tokens, more cost. Without compression, every optimization you make elsewhere is undermined by raw token waste.
Pipeline
Each request passes through five independent stages. Every layer is toggleable. Together, they compound.
Replaces repeated token subsequences with dictionary-backed placeholders using LZ77-style meta-token compression. Fully lossless.
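The idea can be sketched in a few lines of Python. This is an illustrative character-level toy, not the proxy's implementation (which operates on token sequences); all names and parameters here are assumptions. The round trip is exact.

```python
def compress(text: str, min_len: int = 12, max_entries: int = 32):
    """Replace repeated substrings with dictionary-backed placeholders."""
    dictionary: dict[str, str] = {}
    for n in range(max_entries):
        # Count every substring of length min_len (O(n^2) toy scan).
        counts: dict[str, int] = {}
        for i in range(len(text) - min_len + 1):
            sub = text[i:i + min_len]
            if "\x00" not in sub:  # never re-count placeholder bytes
                counts[sub] = counts.get(sub, 0) + 1
        best = max(counts, key=counts.get, default=None)
        if best is None or counts[best] < 2:
            break  # nothing left that repeats
        key = f"\x00{n}\x00"  # sentinel placeholder, assumed absent from input
        dictionary[key] = best
        text = text.replace(best, key)
    return text, dictionary

def decompress(text: str, dictionary: dict[str, str]) -> str:
    # Placeholders never nest (sentinel-bearing substrings are skipped
    # during counting), so expansion order is arbitrary.
    for key, sub in dictionary.items():
        text = text.replace(key, sub)
    return text
```

A repeated system prompt compresses and reconstructs exactly:

```python
prompt = ("You are an expert coding assistant. " * 4) + "Summarize the diff."
small, d = compress(prompt)
assert decompress(small, d) == prompt and len(small) < len(prompt)
```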
Two-tool architecture: find_tool discovers relevant tools, call_tool loads schemas on demand. No more sending all schemas upfront.
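A minimal sketch of the two-tool pattern. find_tool and call_tool come from the description above; the registry contents and helper shapes are assumptions. The model only ever sees these two tools, and full schemas surface on demand.

```python
REGISTRY = {
    "get_weather": {
        "description": "Current weather for a city",
        "schema": {"type": "object",
                   "properties": {"city": {"type": "string"}}},
        "fn": lambda args: {"city": args["city"], "temp_c": 21},
    },
    "search_code": {
        "description": "Search the repository for a pattern",
        "schema": {"type": "object",
                   "properties": {"pattern": {"type": "string"}}},
        "fn": lambda args: ["src/main.rs:42"],
    },
}

def find_tool(query: str) -> list[dict]:
    """Stage 1: return only names and one-line descriptions, no schemas."""
    q = query.lower()
    return [{"name": name, "description": t["description"]}
            for name, t in REGISTRY.items()
            if q in name or q in t["description"].lower()]

def call_tool(name: str, arguments: dict):
    """Stage 2: the full schema is loaded only for the tool actually used."""
    return REGISTRY[name]["fn"](arguments)
```

Discovery costs a handful of tokens per match instead of a full schema per registered tool.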
Embeds each request and matches against cached responses using vector similarity. On a hit, the LLM is bypassed entirely.
Compresses multi-turn agent conversation histories through heuristic deduplication and tool result truncation.
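The two heuristics can be sketched as a single pass over the message list. This is a simplified illustration; the parameter names and message shape are assumptions.

```python
def compress_history(messages: list[dict], keep_last: int = 4,
                     tool_budget: int = 200) -> list[dict]:
    seen: set[tuple[str, str]] = set()
    out = []
    for i, msg in enumerate(messages):
        key = (msg["role"], msg["content"])
        if key in seen:
            continue  # drop exact duplicates (repeated boilerplate)
        seen.add(key)
        recent = i >= len(messages) - keep_last
        if (msg["role"] == "tool" and not recent
                and len(msg["content"]) > tool_budget):
            # Stale tool output rarely matters in full; keep a prefix.
            msg = {**msg, "content": msg["content"][:tool_budget] + "[truncated]"}
        out.append(msg)
    return out
```

Recent turns pass through untouched, so the model keeps full fidelity where it matters most.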
Negotiates a compact tab-separated output format instead of verbose JSON. Same data, dramatically fewer tokens out.
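To see the format gap concretely, compare the two encodings of the same rows (the row data here is invented for illustration):

```python
import json

rows = [
    {"file": "src/main.rs", "lines": 120, "status": "ok"},
    {"file": "src/lib.rs", "lines": 340, "status": "ok"},
    {"file": "tests/e2e.rs", "lines": 85, "status": "skip"},
]

verbose = json.dumps(rows)  # default: keys repeated on every row

# Same data as one header line plus tab-separated rows.
cols = list(rows[0])
compact = "\n".join(
    ["\t".join(cols)] +
    ["\t".join(str(r[c]) for c in cols) for r in rows])

assert len(compact) < len(verbose)
```

JSON repeats every key on every row; the tab-separated form states the keys once, and the savings grow with row count.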
Benefits
Your agent takes more steps before hitting context limits. No premature truncation, no forced summarization mid-task.
Register hundreds of MCP tools without burning context. Schemas load on demand, not upfront.
LLMs degrade as context fills. Lower effective token usage means the model performs better, even with the same real information.
You pay per token. Fitting more into less means the same work costs less. Typical savings of 50%+ on API spend.
Quickstart
Clients
Providers
Stats
$ lessloss stats
Architecture
Request-side compression only: responses stream through untouched.
No TLS termination: plain HTTP on localhost, HTTPS to upstream.
Tower middleware architecture: each layer independently toggleable.
Connection pooling: HTTP/2 multiplexing to upstream providers.
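The Tower-style layering can be sketched in a few lines (Python for brevity; the toy layers and handler are assumptions, not the proxy's real stack). A disabled layer drops out of the composed stack entirely rather than being checked per request.

```python
from typing import Callable

Handler = Callable[[str], str]

def layer(transform: Callable[[str], str], enabled: bool):
    """Wrap a handler with an optional request transform."""
    def wrap(inner: Handler) -> Handler:
        if not enabled:
            return inner  # disabled layers vanish from the stack
        return lambda request: inner(transform(request))
    return wrap

def upstream(request: str) -> str:
    # Stand-in for the HTTPS call to the provider.
    return f"response-to:{request}"

# Compose the stack; each stage is independently toggleable.
handler = upstream
for wrap in reversed([
    layer(str.strip, enabled=True),   # e.g. whitespace normalization
    layer(str.lower, enabled=False),  # disabled: skipped at zero cost
]):
    handler = wrap(handler)

print(handler("  Hello  "))  # prints "response-to:Hello"
```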