02 — API Cost Control & Rate Limiting

Ring: 1 (Launch Blocker)
Dependency: None (can work independently from auth, upgrades to user-based after auth)
Handbook: Ch. 47 (observability), Ch. 204 (error handling, API rate limits), Ch. 207 (security: rate limiting)
Related: Hetzner migration plan (docs/claude-hetzner-vps-plan.md), Kong gateway + Redis

Problem

AI API calls cost money (Gemini, Perplexity, OpenAI, Anthropic).
No rate limiting exists. A loop such as while(true) { fetch("/api/discover") } risks a billing explosion.
withAiJobTracking() is active on only 2 of 6+ AI routes, so most AI calls are untracked.
We don’t know how much we’re spending, and we can’t see it until the invoice arrives.
Until multi-tenancy exists, one user can run up costs that are attributed to another user.

Decisions

D1: 3-Layer Architecture

Request arrives
    |
    v
[LAYER 1: Rate Limiter]      ← Abuse prevention (SEC-C)
    |                            IP-based (pre-auth) → user-based (post-auth)
    |                            429 + Retry-After header
    v
[LAYER 2: Cost Tracker]      ← Observability
    |                            EVERY AI call is written to ai_job_runs
    |                            estimated_cost is calculated
    |                            Daily aggregation table
    v
[LAYER 3: Usage Gate]        ← Business logic (to be built in Category I)
                                 Requires auth
                                 Plan-based limits
                                 Credit balance check
                                 Pre-operation confirmation dialog

Layers 1 and 2 are built now. Layer 3 comes after auth + billing.

D2: Rate Limit — 2-Phase Strategy

Phase	Period	Approach	Rationale
Phase 1	Dev environment (local / pre-deploy)	In-memory sliding window	Fast, no dependencies. Sufficient before deploy.
Phase 2	Production (Hetzner — deploy day)	Kong rate limiting + Redis	Kong gateway and Redis already exist in the Hetzner stack. Upgrade with a single file change.

IMPORTANT DECISION: This project will NOT be deployed to Vercel in production. On the day it goes live, the Hetzner migration plan is pulled forward. Vercel may be used only for dev/preview; production = Hetzner. The in-memory limiter is therefore only for the development environment, and Kong/Redis will be active in production.

The interface stays clean: an IRateLimiter abstraction, implemented by MemoryRateLimiter in Phase 1 and by KongRateLimiter or RedisRateLimiter in Phase 2.

D3: Cost Visibility

Who	What They See
`super_admin`	Entire platform: total cost, per-org breakdown, per-route distribution
Org `owner`	Their org’s cost: daily/weekly/monthly, per-route
Org `member`	Cost of their own triggered operations (in the pre-operation confirmation dialog)
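
The visibility rules in this table can be read as a filter over cost rows. The following is a minimal sketch under stated assumptions: the roles come from the table, but the `CostRow` and `Viewer` shapes, the `triggeredBy` field, and the `visibleCosts` name are hypothetical and not taken from the codebase.

```typescript
// Roles come from the D3 table; every other name here is illustrative.
type Role = "super_admin" | "owner" | "member";

interface CostRow {
  organizationId: string;
  triggeredBy: string; // user id that started the AI job (assumed to be recorded)
  endpoint: string;
  cost: number; // USD
}

interface Viewer {
  role: Role;
  organizationId: string;
  userId: string;
}

// Returns only the rows the viewer is allowed to see under D3.
function visibleCosts(rows: CostRow[], viewer: Viewer): CostRow[] {
  switch (viewer.role) {
    case "super_admin":
      return rows; // entire platform
    case "owner":
      return rows.filter((r) => r.organizationId === viewer.organizationId);
    case "member":
      return rows.filter(
        (r) => r.organizationId === viewer.organizationId && r.triggeredBy === viewer.userId,
      );
  }
}
```

In practice this scoping would more likely be enforced in the database (for example through RLS policies) than in application code; the function only illustrates the intended rule.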

D4: Alert System

Level	Mechanism	When
Dashboard	Visual alert in admin panel (red banner)	When daily cost exceeds threshold
In-app popup	Toast/notification — “Today’s cost exceeded $X”	Real-time, owner + super_admin
Email	Daily cost report	Later (Ring 2+), not now

D5: Rate Limit Configuration

Route Group	Limit	Window	Rationale
`POST /api/discover`	10 req	60 sec	Most expensive — web search + AI
`POST /api/headhunt`	10 req	60 sec	Web search + AI
`POST /api/ai/complete`	20 req	60 sec	Batch scripts may call frequently
`POST /api/ai/classify`	10 req	60 sec	Each handles 25 companies → 10 req = 250 companies
`POST /api/score`	10 req	60 sec	Batch rescore
`POST /api/scraper/run`	3 req	60 sec	Heavy operation, subprocess
`GET /api/companies` etc.	No limit	—	DB query, no AI cost

After auth, these limits will be user-based + plan-based (Free: 3 discover/day, Pro: 50/day).
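
One way to hold the limits above is a small lookup table that the middleware consults per request. This is a sketch only: the `RouteLimit` shape, the `ROUTE_LIMITS` constant, and `findRouteLimit` are hypothetical names, while the limit and window values are copied from the D5 table.

```typescript
// Hypothetical config shape for the D5 limits.
interface RouteLimit {
  method: string;
  prefix: string; // path to match (exact, or as a parent segment)
  limit: number; // max requests per window
  windowSec: number; // window length in seconds
}

const ROUTE_LIMITS: RouteLimit[] = [
  { method: "POST", prefix: "/api/discover", limit: 10, windowSec: 60 },
  { method: "POST", prefix: "/api/headhunt", limit: 10, windowSec: 60 },
  { method: "POST", prefix: "/api/ai/complete", limit: 20, windowSec: 60 },
  { method: "POST", prefix: "/api/ai/classify", limit: 10, windowSec: 60 },
  { method: "POST", prefix: "/api/score", limit: 10, windowSec: 60 },
  { method: "POST", prefix: "/api/scraper/run", limit: 3, windowSec: 60 },
];

// Returns the matching limit, or undefined for unlimited routes such as GET /api/companies.
function findRouteLimit(method: string, path: string): RouteLimit | undefined {
  return ROUTE_LIMITS.find(
    (r) => r.method === method && (path === r.prefix || path.startsWith(r.prefix + "/")),
  );
}
```

Matching on an exact path or a parent segment avoids accidentally limiting an unrelated route that merely shares a prefix (for example a hypothetical `/api/discovery`).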

Cost Estimation Model

Provider Token Prices (`lib/api/costs.ts`)

Provider	Model	Input ($/1M tokens)	Output ($/1M tokens)
Gemini	2.5 Flash	$0.15	$0.60
Anthropic	Claude Sonnet	$3.00	$15.00
Perplexity	Sonar Pro	$1.00	$1.00
OpenAI	GPT-4o	$2.50	$10.00

These prices are updated periodically and kept in a single file, `lib/api/costs.ts`.

Estimated Cost per Operation

Operation	Estimated Tokens	Estimated Cost (Gemini)
1 Discovery (single country)	~2K input + ~1K output	~$0.0009
1 Headhunt	~1.5K input + ~0.5K output	~$0.0005
1 Classify batch (25 companies)	~3K input + ~1K output	~$0.001
1 Enrichment	~2K input + ~1K output	~$0.0009

Actual values will be calculated from the ai_job_runs table. The above are initial estimates.
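
The per-operation estimates follow directly from the price table: cost = (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below shows one possible shape for `estimateCost()`, assuming the prices listed above. The lookup keys such as `"gemini:2.5-flash"` and the rounding to six decimals (chosen to match the `DECIMAL(10,6)` column) are assumptions, not the project's actual identifiers.

```typescript
// Prices in USD per 1M tokens, copied from the provider table above.
type Price = { inputPerM: number; outputPerM: number };

// Key format "provider:model" is hypothetical.
const PRICES: Record<string, Price> = {
  "gemini:2.5-flash": { inputPerM: 0.15, outputPerM: 0.6 },
  "anthropic:claude-sonnet": { inputPerM: 3.0, outputPerM: 15.0 },
  "perplexity:sonar-pro": { inputPerM: 1.0, outputPerM: 1.0 },
  "openai:gpt-4o": { inputPerM: 2.5, outputPerM: 10.0 },
};

function estimateCost(
  provider: string,
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICES[`${provider}:${model}`];
  if (!price) throw new Error(`No price configured for ${provider}:${model}`);
  const usd = (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1e6;
  return Math.round(usd * 1e6) / 1e6; // round to 6 decimal places
}
```

For one Discovery on Gemini this gives (2,000 × 0.15 + 1,000 × 0.60) / 1,000,000 = $0.0009, and for one Headhunt (1,500 × 0.15 + 500 × 0.60) / 1,000,000 = $0.000525, consistent with the rounded figures in the table.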

Data Model

Existing Table Update

-- Add 2 columns to export_ai_ai_job_runs
ALTER TABLE export_ai_ai_job_runs
  ADD COLUMN estimated_cost DECIMAL(10,6),  -- estimated cost in $
  ADD COLUMN endpoint TEXT;                  -- which route triggered it (discover, headhunt, ai/complete...)

New Table: Daily Aggregation

CREATE TABLE export_ai_usage_daily (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES export_ai_organizations(id),
  usage_date DATE NOT NULL,
  endpoint TEXT NOT NULL,
  request_count INT DEFAULT 0,
  total_tokens INT DEFAULT 0,
  total_cost DECIMAL(10,6) DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(organization_id, usage_date, endpoint)
);

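-- RLS: no policies are defined here yet. Until they are added, only the table owner and roles that bypass RLS can access rows.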
ALTER TABLE export_ai_usage_daily ENABLE ROW LEVEL SECURITY;

-- Index
CREATE INDEX idx_usage_daily_org_date
  ON export_ai_usage_daily(organization_id, usage_date DESC);

This table is for monitoring. incrementDailyUsage() is called on every AI job completion. The admin dashboard reads this table.
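
Because of the UNIQUE(organization_id, usage_date, endpoint) constraint, `incrementDailyUsage()` can be written as a single upsert. The following is a rough sketch rather than the project's implementation: the `SqlExecutor` interface, the `UsageEvent` shape, the `$1`-style placeholders, and the choice of the UTC calendar day are all assumptions. It also assumes an organization id is available at call time.

```typescript
// Hypothetical minimal DB interface; the real project may use a different client.
interface SqlExecutor {
  query(text: string, params: unknown[]): Promise<unknown>;
}

interface UsageEvent {
  organizationId: string;
  endpoint: string;
  tokens: number;
  cost: number; // USD, e.g. the value returned by estimateCost()
  at: Date;
}

// Insert the first row of the day, or add to the existing counters.
const UPSERT_DAILY_USAGE = `
  INSERT INTO export_ai_usage_daily
    (organization_id, usage_date, endpoint, request_count, total_tokens, total_cost)
  VALUES ($1, $2, $3, 1, $4, $5)
  ON CONFLICT (organization_id, usage_date, endpoint)
  DO UPDATE SET
    request_count = export_ai_usage_daily.request_count + 1,
    total_tokens  = export_ai_usage_daily.total_tokens + EXCLUDED.total_tokens,
    total_cost    = export_ai_usage_daily.total_cost + EXCLUDED.total_cost
`;

function buildDailyUsageParams(e: UsageEvent): unknown[] {
  const usageDate = e.at.toISOString().slice(0, 10); // UTC day as YYYY-MM-DD
  return [e.organizationId, usageDate, e.endpoint, e.tokens, e.cost];
}

async function incrementDailyUsage(db: SqlExecutor, e: UsageEvent): Promise<void> {
  await db.query(UPSERT_DAILY_USAGE, buildDailyUsageParams(e));
}
```

Doing the increment inside the database keeps concurrent job completions from overwriting each other's counts, which a read-then-write in application code could do.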

Architecture

Rate Limiter (`lib/api/rateLimit.ts`)

IRateLimiter interface
  ├── check(key: string, limit: number, windowSec: number): IRateLimitResult
  └── IRateLimitResult: { allowed: boolean, remaining: number, resetAt: Date }

MemoryRateLimiter implements IRateLimiter  ← Phase 1 (now)
RedisRateLimiter implements IRateLimiter   ← Phase 2 (after Hetzner)
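
A minimal sketch of how the Phase 1 `MemoryRateLimiter` could implement this interface with a sliding-window log is shown below. The interface and result shape follow the outline above; the injectable `now` clock is an addition made here for testability, and the internals are one possible approach rather than the project's actual code.

```typescript
interface IRateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: Date;
}

interface IRateLimiter {
  check(key: string, limit: number, windowSec: number): IRateLimitResult;
}

// Sliding-window log: stores the timestamps of accepted requests per key.
class MemoryRateLimiter implements IRateLimiter {
  private hits = new Map<string, number[]>();
  private now: () => number;

  // `now` is injectable so the window can be exercised without real timers.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  check(key: string, limit: number, windowSec: number): IRateLimitResult {
    const nowMs = this.now();
    const windowMs = windowSec * 1000;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => nowMs - t < windowMs);
    const allowed = recent.length < limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    // The oldest remaining hit determines when the next slot frees up.
    const resetAt = new Date((recent[0] ?? nowMs) + windowMs);
    return { allowed, remaining: Math.max(0, limit - recent.length), resetAt };
  }
}
```

Two caveats apply to this sketch. Keys are never evicted, so a long-running process would need periodic cleanup, and callers should include the route in the key (for example `ip + ":" + path`) so that routes with different limits do not share a counter. A Redis-backed implementation would probably need `check` to return a Promise, which the synchronous signature above does not yet allow for.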

Cost Tracker (`lib/api/costs.ts` + `lib/db/track-ai-job.ts` update)

estimateCost(provider, model, inputTokens, outputTokens): number
  → provider + model based price table
  → result in $

withAiJobTracking() update:
  → call estimateCost() inside finishAiJob
  → save estimated_cost + endpoint
  → call incrementDailyUsage()

Middleware (`middleware.ts`)

Request → path match (discover, headhunt, ai/*, score, scraper/*)
  → extract IP (x-forwarded-for or request.ip)
  → rateLimiter.check(ip, routeLimit, windowSec)
  → Allowed: add X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset headers, continue
  → Blocked: 429 { success: false, error: "Rate limit exceeded", retryAfter: N }
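
The flow above can be sketched as two framework-agnostic helpers that a Next.js middleware could call. This is an illustration under assumptions: `clientIp`, `toDecision`, and `LimitDecision` are hypothetical names, and expressing `X-RateLimit-Reset` as Unix seconds is a choice made here, since the source does not fix the format. `Retry-After` is given in seconds.

```typescript
interface IRateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: Date;
}

interface LimitDecision {
  status: 200 | 429;
  headers: Record<string, string>;
  body?: { success: false; error: string; retryAfter: number };
}

// Uses the first address in x-forwarded-for, else the fallback (e.g. request.ip).
function clientIp(forwardedFor: string | null, fallbackIp: string | undefined): string {
  const first = forwardedFor?.split(",")[0]?.trim();
  return first || fallbackIp || "unknown";
}

// Turns a limiter result into the status, headers, and body described above.
function toDecision(result: IRateLimitResult, limit: number, nowMs: number): LimitDecision {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(result.remaining),
    "X-RateLimit-Reset": String(Math.ceil(result.resetAt.getTime() / 1000)), // Unix seconds (assumed)
  };
  if (result.allowed) return { status: 200, headers };

  // Seconds until the window frees a slot, never less than 1.
  const retryAfter = Math.max(1, Math.ceil((result.resetAt.getTime() - nowMs) / 1000));
  headers["Retry-After"] = String(retryAfter);
  return {
    status: 429,
    headers,
    body: { success: false, error: "Rate limit exceeded", retryAfter },
  };
}
```

Note that `x-forwarded-for` is only trustworthy when a proxy under your control sets it; a client talking to the app directly can send any value, which matters for an IP-keyed limiter.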

Current Code Impact

Files to Change

File	Change
`lib/db/track-ai-job.ts`	Add `estimated_cost` + `endpoint` parameter, call `incrementDailyUsage()`
`app/api/ai/complete/route.ts`	Add `withAiJobTracking()` — CURRENTLY UNTRACKED
`app/api/ai/classify/route.ts`	Add `withAiJobTracking()` — CURRENTLY UNTRACKED
`app/api/score/route.ts`	Check if it contains AI calls, add tracking if so

New Files

File	Content
`lib/api/rateLimit.ts`	IRateLimiter + MemoryRateLimiter
`lib/api/costs.ts`	Provider/model price table + estimateCost()
`lib/db/usage-daily.ts`	incrementDailyUsage() — called on AI job completion
`middleware.ts`	Rate limit middleware (for costly routes)
`app/api/admin/usage/route.ts`	GET — daily/weekly/monthly cost report

Future Decisions (not now, but not forgotten)

FD-1: Kong Rate Limiting (After Hetzner Migration)

Kong gateway is already part of the Supabase stack on Hetzner, so rate limiting can move from the Next.js middleware to Kong. The IRateLimiter interface keeps the application-side change minimal.

FD-2: Redis-Backed Limiter (After Hetzner Migration)

Use Coolify’s internal Redis or a separate Redis container for distributed rate limiting across multiple Next.js instances. This is not needed on a single VPS, but it keeps the design ready for horizontal scaling.

FD-3: Email Alert (Ring 2+)

Send the daily cost report by email, using Resend or a Supabase Edge Function + SMTP.

FD-4: Anomaly Detection (Ring 3+)

Learn normal usage patterns and auto-alert on deviations. For example, “today’s costs are 10x yesterday’s” would trigger an instant notification.

Atomic Tasks

#	Task	Size
COST-1	`lib/api/rateLimit.ts` — IRateLimiter interface + MemoryRateLimiter	Medium
COST-2	`middleware.ts` — Rate limit middleware, route config, 429 response + headers	Medium
COST-3	`lib/api/costs.ts` — Provider/model price table + estimateCost()	Small
COST-4	DB migration: `ai_job_runs` + 2 columns, `usage_daily` table	Migration
COST-5	Update `track-ai-job.ts` — estimated_cost + endpoint + incrementDailyUsage	Medium
COST-6	`/api/ai/complete` + `/api/ai/classify` → add withAiJobTracking()	Small
COST-7	`lib/db/usage-daily.ts` — incrementDailyUsage()	Small
COST-8	`GET /api/admin/usage` — cost report endpoint	Medium
COST-9	Admin dashboard UI — cost chart + alert banner	Medium (with Ring 2)
COST-10	In-app popup — toast notification on threshold breach	Small
