TRUST & AUDIT

How billing works — computed, inspectable, idempotent

Our billing is post-settlement: we forward the request, **read the upstream's actual usage**, then deduct credits. The charged amount is always tied to real upstream usage — never an estimate. Here's every step.

1. Formula

Every model has a context_tiers config in cc_models: each tier pairs an up_tokens threshold with a credit cost. BillingHandler reads the upstream's prompt_tokens and completion_tokens, classifies the request by its input token count alone, and deducts the credits for the matching tier:

// internal/service/model_registry.go
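func (m ModelConfig) CreditsForTokens(inputTokens int) int {
    // Guard: a model with no tiers configured would otherwise panic below.
    if len(m.ContextTiers) == 0 {
        return 0
    }
    // Tiers are assumed to be ordered by ascending up_tokens; the first fit wins.
    for _, tier := range m.ContextTiers {
        // up_tokens == 0 marks the terminal tier, which matches any size.
        if tier.UpTokens == 0 || inputTokens <= tier.UpTokens {
            return tier.Credits
        }
    }
    // No terminal tier configured: fall back to the last tier.
    return m.ContextTiers[len(m.ContextTiers)-1].Credits
}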

// Example — claude-sonnet-4-6 with 18,000 input tokens:
//   tier 1: up_tokens=32000  credits=12  ← matches
//   tier 2: up_tokens=200000 credits=36
//   tier 3: up_tokens=0      credits=84  (terminal "anything bigger")
// → 12 credits deducted, regardless of completion length.

Completion tokens don't change which tier you land in — so a long answer doesn't surprise-bill you.
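
To make the tier boundaries concrete, here is a minimal runnable sketch of the same lookup using the three example tiers. The ContextTier struct, the creditsFor function, and the sonnetTiers helper are illustrative names, not the production types; the zero-credit result for an empty tier list is an assumption made only for this sketch.

```go
package main

import "fmt"

// ContextTier is an assumed shape inferred from the field names used in
// CreditsForTokens; the real struct may carry more fields.
type ContextTier struct {
	UpTokens int // inclusive upper bound on input tokens; 0 marks the terminal tier
	Credits  int // flat credit cost for a request landing in this tier
}

// creditsFor is a hypothetical free-function restatement of the tier lookup.
func creditsFor(tiers []ContextTier, inputTokens int) int {
	if len(tiers) == 0 {
		return 0 // nothing configured; assumed behaviour for this sketch only
	}
	for _, t := range tiers {
		if t.UpTokens == 0 || inputTokens <= t.UpTokens {
			return t.Credits
		}
	}
	return tiers[len(tiers)-1].Credits
}

// sonnetTiers copies the three example tiers from the comment above.
func sonnetTiers() []ContextTier {
	return []ContextTier{
		{UpTokens: 32000, Credits: 12},
		{UpTokens: 200000, Credits: 36},
		{UpTokens: 0, Credits: 84},
	}
}

func main() {
	for _, n := range []int{18000, 32000, 32001, 250000} {
		fmt.Printf("%6d input tokens -> %d credits\n", n, creditsFor(sonnetTiers(), n))
	}
}
```

The thresholds are inclusive: 32,000 input tokens still costs 12 credits, while 32,001 moves to the 36-credit tier.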

2. Full request lifecycle

01 Your client hits api.clawfeeder.ai with Authorization: Bearer cf-sk-...
02 After JWT/API Key middleware, BillingHandler reads the request body to extract the model field
03 Looks up cc_models registry; rejects with 400 unsupported_model if not present
04 Balance + trial gate check: insufficient or unauthorized → 402 immediately
05 Forward to a real upstream via the channel chain; 5xx auto-falls-back to the next channel
06 **After upstream responds**, the streaming usage scanner extracts the usage field on the fly
07 Once usage is known, CreditsForTokens computes the deduction
08 Write one row to cc_credits_ledger (reason=api_use, delta=negative, model, latency_ms, status_code, ref_id=trace_id)
09 The ref_id UNIQUE constraint guarantees idempotency: the same trace_id can't be settled twice
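
Step 06 is the least obvious part of the lifecycle, so here is a rough sketch of what a streaming usage scanner could look like. It assumes an OpenAI-style SSE stream (lines starting with "data:", a final "[DONE]" marker, and a chunk carrying a usage object); the real scanner and upstream formats may differ. Usage, scanUsage, scanUsageString, and sampleStream are hypothetical names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Usage holds the two counters named in the text.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
}

// scanUsage reads an SSE stream line by line and returns the last usage
// object it sees. Only the usage field is decoded; content is never kept.
func scanUsage(r io.Reader) (Usage, bool, error) {
	var last Usage
	found := false
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long chunks
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "" || payload == "[DONE]" {
			continue
		}
		var chunk struct {
			Usage *Usage `json:"usage"`
		}
		if err := json.Unmarshal([]byte(payload), &chunk); err != nil {
			continue // unknown chunk format: skip it rather than fail the stream
		}
		if chunk.Usage != nil {
			last, found = *chunk.Usage, true
		}
	}
	return last, found, sc.Err()
}

// scanUsageString is a convenience wrapper for in-memory examples.
func scanUsageString(s string) (Usage, bool, error) {
	return scanUsage(strings.NewReader(s))
}

// sampleStream is made-up data in the assumed format.
const sampleStream = `data: {"choices":[{"delta":{"content":"Hel"}}],"usage":null}

data: {"choices":[{"delta":{"content":"lo"}}],"usage":null}

data: {"choices":[],"usage":{"prompt_tokens":18000,"completion_tokens":420}}

data: [DONE]
`

func main() {
	u, ok, err := scanUsageString(sampleStream)
	fmt.Println(u.PromptTokens, u.CompletionTokens, ok, err)
}
```

If no usage object is ever seen, the sketch reports found=false; how the real handler treats that case is not specified here.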

3. Where to inspect each charge

Every request can be cross-checked across 4 independent surfaces:

✓ X-Request-ID response header: Server returns a UUID per request; same value as the ledger ref_id and the log trace_id. If anything goes wrong, send us this ID and we can find it.
✓ X-Clawfeeder-Model response header: The actual settled model. With model=auto, this header shows which concrete model was picked. Billing follows this header, never the literal 'auto'.
✓ Dashboard usage page: /dashboard/usage shows every charge (model, tokens, credits, ref_id, status). Full 30-day history.
✓ GET /api/credits/history: Programmatic access to the same data as JSON, for your reconciliation scripts.
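
A reconciliation script might match the X-Request-ID you logged against the history output. The sketch below only shows the matching step on an in-memory payload. The JSON shape (a top-level array with ref_id, model, credits, and status fields) is hypothetical, as are HistoryEntry, findCharges, and sampleHistory; check the real response of GET /api/credits/history before relying on any of these names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HistoryEntry is a HYPOTHETICAL shape for one item in the history response.
type HistoryEntry struct {
	RefID   string `json:"ref_id"`
	Model   string `json:"model"`
	Credits int    `json:"credits"`
	Status  int    `json:"status"`
}

// findCharges returns every history entry whose ref_id equals requestID.
// More than one match would indicate a double charge.
func findCharges(historyJSON []byte, requestID string) ([]HistoryEntry, error) {
	var entries []HistoryEntry
	if err := json.Unmarshal(historyJSON, &entries); err != nil {
		return nil, fmt.Errorf("decode history: %w", err)
	}
	var matches []HistoryEntry
	for _, e := range entries {
		if e.RefID == requestID {
			matches = append(matches, e)
		}
	}
	return matches, nil
}

// sampleHistory is made-up data in the assumed shape.
const sampleHistory = `[
  {"ref_id": "req-aaa", "model": "claude-sonnet-4-6", "credits": 12, "status": 200},
  {"ref_id": "req-bbb", "model": "claude-sonnet-4-6", "credits": 36, "status": 200}
]`

func main() {
	matches, err := findCharges([]byte(sampleHistory), "req-aaa")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range matches {
		fmt.Printf("%s charged %d credits on %s (status %d)\n", m.RefID, m.Credits, m.Model, m.Status)
	}
}
```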

4. Idempotency

cc_credits_ledger.ref_id has a UNIQUE partial index. Each request has exactly one trace_id; if BillingHandler retries a settle (graceful shutdown mid-flight, SSE reconnect, etc.), the second write violates the unique index and the database rejects it. You cannot be double-charged.

-- migration 013 (deployed 2026-04-20):
CREATE UNIQUE INDEX cc_credits_ledger_ref_id_idx
  ON cc_credits_ledger(ref_id)
  WHERE ref_id IS NOT NULL;

Async video billing uses the same mechanism — pre-deduct ref_id=video:<task_id>, settle ref_id=video-settle:<task_id>, refund ref_id=video-refund:<task_id>. A task can't be settled twice.
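
The effect of the unique index can be illustrated with an in-memory stand-in. This is a sketch, not the production ledger: Ledger, Write, and ErrDuplicateRef are hypothetical names, a map plays the role of the database index, and an empty ref_id stands in for NULL (which the partial index leaves unconstrained). The sketch is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDuplicateRef mimics the database rejecting a duplicate ref_id.
var ErrDuplicateRef = errors.New("ref_id already in ledger")

// Ledger is an in-memory stand-in for cc_credits_ledger.
type Ledger struct {
	refs    map[string]struct{}
	Balance int
}

func NewLedger(opening int) *Ledger {
	return &Ledger{refs: make(map[string]struct{}), Balance: opening}
}

// Write appends one row. The balance changes only if the row is accepted,
// so a rejected retry has no effect on the balance.
func (l *Ledger) Write(refID string, delta int) error {
	if refID != "" { // empty plays the role of NULL: not constrained
		if _, exists := l.refs[refID]; exists {
			return ErrDuplicateRef
		}
		l.refs[refID] = struct{}{}
	}
	l.Balance += delta
	return nil
}

func main() {
	l := NewLedger(100)
	fmt.Println("first settle:", l.Write("trace-abc", -12))
	fmt.Println("retry:", l.Write("trace-abc", -12))
	fmt.Println("video settle:", l.Write("video-settle:task-1", -40))
	fmt.Println("video settle retry:", l.Write("video-settle:task-1", -40))
	fmt.Println("balance:", l.Balance)
}
```

Because the video flow uses a distinct prefix per phase, the pre-deduct, settle, and refund rows for one task never collide with each other, while a repeat of the same phase does.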

5. What we store and what we don't

✓ We store

· Request time, model, token counts
· Credits charged, ref_id, upstream HTTP status
· Upstream latency (latency_ms)
· User ID and API key fingerprint

✗ We do not store

· Prompt content
· Response content
· Any request / response body excerpt
· (Except Playground threads — those you opt-in to save)

This is a technical invariant, not a promise. BillingHandler has no code path that writes bodies to DB — it streams through a usage scanner that finds the usage field and discards the rest.

6. What happens at zero balance

Post-settlement billing means deduction happens after the request. We allow a soft floor of -100 credits, so a request that arrives at exactly zero balance is still served and its usage is still recorded. Once the balance is ≤ -100, new requests return 402 until you top up.

Floor of -100 credits ≈ $0.30 — the worst-case overrun is always tiny.
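
The gate rule can be written down in a few lines. This is a minimal sketch under the rule stated above (402 once balance ≤ -100); softFloor, admits, gateStatus, and settle are hypothetical names, and returning 200 for an admitted request is only a placeholder for "continue to the upstream".

```go
package main

import (
	"fmt"
	"net/http"
)

// softFloor is the balance at or below which new requests are refused.
const softFloor = -100

// admits reports whether a new request may proceed at this balance.
func admits(balance int) bool {
	return balance > softFloor
}

// gateStatus maps the gate decision to an HTTP status.
func gateStatus(balance int) int {
	if admits(balance) {
		return http.StatusOK // placeholder: the request would be forwarded
	}
	return http.StatusPaymentRequired
}

// settle applies a post-settlement deduction to an already admitted request.
func settle(balance, credits int) int {
	return balance - credits
}

func main() {
	for _, b := range []int{50, 0, -99, -100} {
		fmt.Printf("balance %4d -> status %d\n", b, gateStatus(b))
	}
	fmt.Println("admitted at 5, charged 12 ->", settle(5, 12))
}
```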

7. Credit validity & renewal

Credits granted with a subscription (plan top-ups, redeem codes) remain valid for the subscription period; at expiry, whatever is unspent in that batch is cleared. That's standard for subscriptions; what matters is that renewal works in your favour.

· Renew early, days stack — the new period is added onto your current expiry, not restarted from today. Renewing early never costs you the days you have left.
· Unused credits roll forward — on renewal, any credits you haven't spent are extended to the new expiry date. Nothing is lost.
· Gifted credits never expire — invite, referral, and promotional credits carry no expiry and stay valid indefinitely.

Example: your membership expires on July 15 with 3,200 credits left; you renew for 30 days on July 1 → validity stacks to August 14, and those 3,200 credits roll forward to August 14 too.
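
The date arithmetic in the example can be checked with a short sketch. renewedExpiry and day are hypothetical helpers, the year 2026 is chosen arbitrarily, and the lapsed-subscription branch (starting from the renewal date) is an assumption that the text above does not cover.

```go
package main

import (
	"fmt"
	"time"
)

// day builds a UTC midnight timestamp for readability in examples.
func day(year int, month time.Month, d int) time.Time {
	return time.Date(year, month, d, 0, 0, 0, 0, time.UTC)
}

// renewedExpiry adds the purchased days onto the current expiry when the
// subscription is still active, so early renewal never loses remaining days.
func renewedExpiry(currentExpiry, now time.Time, days int) time.Time {
	base := currentExpiry
	if now.After(currentExpiry) {
		base = now // assumption: a lapsed subscription restarts from today
	}
	return base.AddDate(0, 0, days)
}

func main() {
	expiry := day(2026, time.July, 15)
	renewedOn := day(2026, time.July, 1)
	fmt.Println(renewedExpiry(expiry, renewedOn, 30).Format("2006-01-02"))
}
```

Under the roll-forward rule, the 3,200 unspent credits would simply take this new date as their expiry.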

8. Data consistency checks

A reconcile job runs daily at 03:17 UTC with 4 read-only checks:

· duplicate_ref_ids — Scan ledger for duplicate ref_ids
· no_usage_trend — Check zero-charge ledger trend (possible usage parse failure)
· tier_distribution — Watch for unusual concentration in a tier
· balance_drift — User balance must equal SUM(delta)

Results retained in Redis for 30 days. Any anomaly writes WARN/ERROR logs that downstream ops pipelines can alert on.
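
Two of these checks are simple enough to sketch in memory. The code below is an illustration of duplicate_ref_ids and balance_drift, not the actual job: LedgerRow, duplicateRefIDs, balanceDrift, and the sample data are hypothetical, and the real checks presumably run as queries against the database.

```go
package main

import "fmt"

// LedgerRow is a reduced, assumed view of one cc_credits_ledger row.
type LedgerRow struct {
	UserID string
	RefID  string // empty stands in for NULL
	Delta  int
}

// duplicateRefIDs returns each non-empty ref_id that appears more than once,
// in order of first repeat.
func duplicateRefIDs(rows []LedgerRow) []string {
	counts := make(map[string]int)
	var dups []string
	for _, r := range rows {
		if r.RefID == "" {
			continue
		}
		counts[r.RefID]++
		if counts[r.RefID] == 2 {
			dups = append(dups, r.RefID)
		}
	}
	return dups
}

// balanceDrift returns stored balance minus SUM(delta) for every user in
// balances whose two values disagree. Users absent from balances are ignored.
func balanceDrift(rows []LedgerRow, balances map[string]int) map[string]int {
	sums := make(map[string]int)
	for _, r := range rows {
		sums[r.UserID] += r.Delta
	}
	drift := make(map[string]int)
	for user, bal := range balances {
		if d := bal - sums[user]; d != 0 {
			drift[user] = d
		}
	}
	return drift
}

// sampleRows and sampleBalances are made-up data with one planted anomaly each.
func sampleRows() []LedgerRow {
	return []LedgerRow{
		{UserID: "u1", RefID: "t-1", Delta: 100},
		{UserID: "u1", RefID: "t-2", Delta: -12},
		{UserID: "u1", RefID: "t-2", Delta: -12}, // planted duplicate
		{UserID: "u2", RefID: "t-3", Delta: -36},
	}
}

func sampleBalances() map[string]int {
	return map[string]int{"u1": 76, "u2": -31} // u2 is off by 5
}

func main() {
	fmt.Println("duplicates:", duplicateRefIDs(sampleRows()))
	fmt.Println("drift:", balanceDrift(sampleRows(), sampleBalances()))
}
```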

Want to inspect a charge yourself?

Open usage →