
02 — API Cost Control & Rate Limiting

Ring: 1 (Launch Blocker)
Dependency: None (can work independently from auth; upgrades to user-based after auth)
Handbook: Ch. 47 (observability), Ch. 204 (error handling, API rate limits), Ch. 207 (security: rate limiting)
Related: Hetzner migration plan (docs/claude-hetzner-vps-plan.md), which covers the Kong gateway + Redis

Problem

  • AI API calls cost money (Gemini, Perplexity, OpenAI, Anthropic).
  • No rate limiting exists. while(true) { fetch("/api/discover") } → risk of billing explosion.
  • withAiJobTracking() is active on only 2/6+ routes → most AI calls are untracked.
  • We don’t know how much we’re spending. Can’t see it until the invoice arrives.
  • Pre multi-tenant, nothing stops one user from racking up costs that fall on another user.

Decisions

D1: 3-Layer Architecture

Request arrives
    |
    v
[LAYER 1: Rate Limiter]      ← Abuse prevention (SEC-C)
    |                            IP-based (pre-auth) → user-based (post-auth)
    |                            429 + Retry-After header
    v
[LAYER 2: Cost Tracker]      ← Observability
    |                            EVERY AI call is written to ai_job_runs
    |                            estimated_cost is calculated
    |                            Daily aggregation table
    v
[LAYER 3: Usage Gate]        ← Business logic (to be built in Category I)
                                 Requires auth
                                 Plan-based limits
                                 Credit balance check
                                 Pre-operation confirmation dialog
Layers 1 and 2 are built now. Layer 3 comes after auth + billing.

D2: Rate Limit — 2-Phase Strategy

| Phase | Period | Approach | Rationale |
|---|---|---|---|
| Phase 1 | Dev environment (local / pre-deploy) | In-memory sliding window | Fast, no dependencies. Sufficient before deploy. |
| Phase 2 | Production (Hetzner, deploy day) | Kong rate limiting + Redis | Kong gateway and Redis already exist in the Hetzner stack. Upgrade with a single file change. |

IMPORTANT DECISION: This project will NOT be DEPLOYED to Vercel. The day it goes live, the Hetzner migration plan is pulled forward. Vercel can only be used for dev/preview. Production = Hetzner. Therefore, the in-memory limiter is only for the development environment; Kong/Redis will be active in production.
Interface will be kept clean: IRateLimiter abstraction → Phase 1 MemoryRateLimiter, Phase 2 KongRateLimiter or RedisRateLimiter.

D3: Cost Visibility

| Who | What They See |
|---|---|
| super_admin | Entire platform: total cost, per-org breakdown, per-route distribution |
| Org owner | Their org’s cost: daily/weekly/monthly, per-route |
| Org member | Cost of their own triggered operations (in the pre-operation confirmation dialog) |

D4: Alert System

| Level | Mechanism | When |
|---|---|---|
| Dashboard | Visual alert in admin panel (red banner) | When daily cost exceeds threshold |
| In-app popup | Toast/notification: “Today’s cost exceeded $X” | Real-time, owner + super_admin |
| Email | Daily cost report | Later (Ring 2+), not now |

D5: Rate Limit Configuration

| Route Group | Limit | Window | Rationale |
|---|---|---|---|
| POST /api/discover | 10 req | 60 sec | Most expensive: web search + AI |
| POST /api/headhunt | 10 req | 60 sec | Web search + AI |
| POST /api/ai/complete | 20 req | 60 sec | Batch scripts may call frequently |
| POST /api/ai/classify | 10 req | 60 sec | Each request handles 25 companies → 10 req = 250 companies |
| POST /api/score | 10 req | 60 sec | Batch rescore |
| POST /api/scraper/run | 3 req | 60 sec | Heavy operation, subprocess |
| GET /api/companies etc. | No limit | n/a | DB query, no AI cost |

After auth, these limits will be user-based + plan-based (Free: 3 discover/day, Pro: 50/day).
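
The table above could be expressed as a small lookup that the middleware consults per request. This is a minimal sketch, not the project's actual config: the names RouteLimit, ROUTE_LIMITS and findRouteLimit are hypothetical, and matching a path exactly or by sub-path is an assumption.

```typescript
// Hypothetical shape for one rate-limit rule (not taken from the codebase).
interface RouteLimit {
  method: string;
  path: string;      // route root, e.g. "/api/discover"
  limit: number;     // max requests per window
  windowSec: number; // window length in seconds
}

// Values copied from the D5 table.
const ROUTE_LIMITS: RouteLimit[] = [
  { method: "POST", path: "/api/discover", limit: 10, windowSec: 60 },
  { method: "POST", path: "/api/headhunt", limit: 10, windowSec: 60 },
  { method: "POST", path: "/api/ai/complete", limit: 20, windowSec: 60 },
  { method: "POST", path: "/api/ai/classify", limit: 10, windowSec: 60 },
  { method: "POST", path: "/api/score", limit: 10, windowSec: 60 },
  { method: "POST", path: "/api/scraper/run", limit: 3, windowSec: 60 },
];

// Returns the matching rule, or null for routes that are not limited
// (for example GET /api/companies).
function findRouteLimit(method: string, pathname: string): RouteLimit | null {
  const match = ROUTE_LIMITS.find(
    (r) =>
      r.method === method &&
      (pathname === r.path || pathname.startsWith(r.path + "/")),
  );
  return match ?? null;
}
```

Keeping the limits in one array means the later move to user-based and plan-based limits only has to change where the numbers come from, not how routes are matched.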

Cost Estimation Model

Provider Token Prices (lib/api/costs.ts)

| Provider | Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|
| Gemini | 2.5 Flash | $0.15 | $0.60 |
| Anthropic | Claude Sonnet | $3.00 | $15.00 |
| Perplexity | Sonar Pro | $1.00 | $1.00 |
| OpenAI | GPT-4o | $2.50 | $10.00 |

Provider prices change over time, so this table must be updated periodically. All prices are kept in a single file, lib/api/costs.ts.

Estimated Cost per Operation

| Operation | Estimated Tokens | Estimated Cost (Gemini) |
|---|---|---|
| 1 Discovery (single country) | ~2K input + ~1K output | ~$0.0009 |
| 1 Headhunt | ~1.5K input + ~0.5K output | ~$0.0005 |
| 1 Classify batch (25 companies) | ~3K input + ~1K output | ~$0.001 |
| 1 Enrichment | ~2K input + ~1K output | ~$0.0009 |

Actual values will be calculated from the ai_job_runs table. The above are initial estimates.
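
The per-operation figures follow directly from the token prices: cost = (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below shows one way estimateCost() might compute this. The "provider/model" key strings and the PRICES constant are assumptions made for illustration; only the function signature and the prices come from this document.

```typescript
// USD per 1M tokens, copied from the provider price table.
interface ModelPrice {
  inputPerM: number;
  outputPerM: number;
}

// Key format "provider/model" is a hypothetical convention for this sketch.
const PRICES: Record<string, ModelPrice> = {
  "gemini/2.5-flash": { inputPerM: 0.15, outputPerM: 0.6 },
  "anthropic/claude-sonnet": { inputPerM: 3.0, outputPerM: 15.0 },
  "perplexity/sonar-pro": { inputPerM: 1.0, outputPerM: 1.0 },
  "openai/gpt-4o": { inputPerM: 2.5, outputPerM: 10.0 },
};

// Returns the estimated cost in USD for one AI call.
function estimateCost(
  provider: string,
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICES[`${provider}/${model}`];
  if (!price) {
    throw new Error(`No price configured for ${provider}/${model}`);
  }
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1e6;
}

// One Discovery on Gemini: (2000 * 0.15 + 1000 * 0.60) / 1e6, about $0.0009.
const discoveryCost = estimateCost("gemini", "2.5-flash", 2000, 1000);
// One Headhunt on Gemini: (1500 * 0.15 + 500 * 0.60) / 1e6, about $0.0005.
const headhuntCost = estimateCost("gemini", "2.5-flash", 1500, 500);
```

Throwing on an unknown provider/model pair is a design choice for this sketch; returning 0 would silently hide untracked spend.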

Data Model

Existing Table Update

-- Add 2 columns to export_ai_ai_job_runs
ALTER TABLE export_ai_ai_job_runs
  ADD COLUMN estimated_cost DECIMAL(10,6),  -- estimated cost in $
  ADD COLUMN endpoint TEXT;                  -- which route triggered it (discover, headhunt, ai/complete...)

New Table: Daily Aggregation

CREATE TABLE export_ai_usage_daily (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES export_ai_organizations(id),
  usage_date DATE NOT NULL,
  endpoint TEXT NOT NULL,
  request_count INT DEFAULT 0,
  total_tokens INT DEFAULT 0,
  total_cost DECIMAL(10,6) DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(organization_id, usage_date, endpoint)
);

-- RLS
ALTER TABLE export_ai_usage_daily ENABLE ROW LEVEL SECURITY;

-- Index
CREATE INDEX idx_usage_daily_org_date
  ON export_ai_usage_daily(organization_id, usage_date DESC);
This table is for monitoring. incrementDailyUsage() is called on every AI job completion. The admin dashboard reads this table.
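
Because the table has UNIQUE(organization_id, usage_date, endpoint), incrementDailyUsage() could be a single upsert rather than a read followed by a write. The sketch below only builds the parameterized statement. The function name buildIncrementDailyUsageQuery, the input shape, and the $1-style placeholders (as used by node-postgres) are assumptions; how the query is executed is left open.

```typescript
// Hypothetical input for one completed AI job.
interface UsageIncrement {
  organizationId: string;
  usageDate: string; // "YYYY-MM-DD"
  endpoint: string;  // e.g. "discover"
  tokens: number;
  cost: number;      // estimated cost in USD
}

interface ParameterizedQuery {
  text: string;
  values: Array<string | number>;
}

// Inserts the first row of the day, or atomically adds to the existing row.
function buildIncrementDailyUsageQuery(u: UsageIncrement): ParameterizedQuery {
  const text = `
    INSERT INTO export_ai_usage_daily
      (organization_id, usage_date, endpoint, request_count, total_tokens, total_cost)
    VALUES ($1, $2, $3, 1, $4, $5)
    ON CONFLICT (organization_id, usage_date, endpoint) DO UPDATE SET
      request_count = export_ai_usage_daily.request_count + 1,
      total_tokens  = export_ai_usage_daily.total_tokens + EXCLUDED.total_tokens,
      total_cost    = export_ai_usage_daily.total_cost + EXCLUDED.total_cost`;
  return {
    text,
    values: [u.organizationId, u.usageDate, u.endpoint, u.tokens, u.cost],
  };
}
```

Doing the increment inside the database avoids lost updates when two AI jobs for the same org and endpoint finish at the same time.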

Architecture

Rate Limiter (lib/api/rateLimit.ts)

IRateLimiter interface
  ├── check(key: string, limit: number, windowSec: number): IRateLimitResult
  └── IRateLimitResult: { allowed: boolean, remaining: number, resetAt: Date }

MemoryRateLimiter implements IRateLimiter  ← Phase 1 (now)
RedisRateLimiter implements IRateLimiter   ← Phase 2 (after Hetzner)
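
A minimal sketch of what the Phase 1 MemoryRateLimiter could look like, using a sliding-window log (one timestamp per accepted request). The interface matches the outline above; the injectable clock is an addition made here so the limiter can be tested without waiting, and the internal Map layout is an assumption.

```typescript
interface IRateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: Date;
}

interface IRateLimiter {
  check(key: string, limit: number, windowSec: number): IRateLimitResult;
}

class MemoryRateLimiter implements IRateLimiter {
  // key -> timestamps (ms) of accepted requests still inside the window
  private readonly hits = new Map<string, number[]>();
  private readonly now: () => number;

  // `now` defaults to the real clock; tests can pass a fake one.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  check(key: string, limit: number, windowSec: number): IRateLimitResult {
    const nowMs = this.now();
    const windowMs = windowSec * 1000;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > nowMs - windowMs);
    const allowed = recent.length < limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    // A slot frees up when the oldest recorded hit leaves the window.
    const oldest = recent[0] ?? nowMs;
    return {
      allowed,
      remaining: Math.max(0, limit - recent.length),
      resetAt: new Date(oldest + windowMs),
    };
  }
}
```

This sketch never evicts idle keys, which is acceptable for a dev-only limiter but would grow without bound in a long-running process. Note also that check() is synchronous here; a Redis-backed implementation would most likely need the interface to return a Promise.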

Cost Tracker (lib/api/costs.ts + lib/db/track-ai-job.ts update)

estimateCost(provider, model, inputTokens, outputTokens): number
  → provider + model based price table
  → result in $

withAiJobTracking() update:
  → call estimateCost() inside finishAiJob
  → save estimated_cost + endpoint
  → call incrementDailyUsage()

Middleware (middleware.ts)

Request → path match (discover, headhunt, ai/*, score, scraper/*)
  → extract IP (x-forwarded-for or request.ip)
  → rateLimiter.check(ip, routeLimit, windowSec)
  → Allowed: add X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset headers, continue
  → Blocked: 429 { success: false, error: "Rate limit exceeded", retryAfter: N }
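
The flow above can be sketched as two pure helpers that the middleware would call: one to pick the client IP, one to turn a limiter result into a status, headers and body. This is a sketch under assumptions: the helper names, the "unknown" fallback key, and reporting X-RateLimit-Reset in Unix seconds are not specified in this document.

```typescript
interface IRateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: Date;
}

// Minimal view of a headers object; the standard Headers class satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

// x-forwarded-for may carry a comma-separated proxy chain; the first entry
// is the original client. The header is only trustworthy when it is set by
// a proxy you control, since clients can send it themselves.
function extractClientIp(headers: HeaderReader, fallbackIp?: string): string {
  const forwarded = headers.get("x-forwarded-for");
  const first = forwarded ? forwarded.split(",")[0]?.trim() : undefined;
  return first || fallbackIp || "unknown";
}

interface LimitDecision {
  status: 200 | 429;
  headers: Record<string, string>;
  body?: { success: false; error: string; retryAfter: number };
}

function toDecision(result: IRateLimitResult, limit: number, nowMs: number): LimitDecision {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(result.remaining),
    // Unix seconds is an assumed convention for this sketch.
    "X-RateLimit-Reset": String(Math.ceil(result.resetAt.getTime() / 1000)),
  };
  if (result.allowed) return { status: 200, headers };
  // Seconds until the window frees a slot, never less than 1.
  const retryAfter = Math.max(1, Math.ceil((result.resetAt.getTime() - nowMs) / 1000));
  return {
    status: 429,
    headers: { ...headers, "Retry-After": String(retryAfter) },
    body: { success: false, error: "Rate limit exceeded", retryAfter },
  };
}
```

If the limiter key is the bare IP, all limited routes share one counter per client; including the route in the key (for example ip + ":" + path) would keep the per-route limits from D5 independent. Which of the two is intended is not stated here.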

Current Code Impact

Files to Change

| File | Change |
|---|---|
| lib/db/track-ai-job.ts | Add estimated_cost + endpoint parameter, call incrementDailyUsage() |
| app/api/ai/complete/route.ts | Add withAiJobTracking() (CURRENTLY UNTRACKED) |
| app/api/ai/classify/route.ts | Add withAiJobTracking() (CURRENTLY UNTRACKED) |
| app/api/score/route.ts | Check if it contains AI calls, add tracking if so |

New Files

| File | Content |
|---|---|
| lib/api/rateLimit.ts | IRateLimiter + MemoryRateLimiter |
| lib/api/costs.ts | Provider/model price table + estimateCost() |
| lib/db/usage-daily.ts | incrementDailyUsage(), called on AI job completion |
| middleware.ts | Rate limit middleware (for costly routes) |
| app/api/admin/usage/route.ts | GET: daily/weekly/monthly cost report |

Future Decisions (not now, but not forgotten)

FD-1: Kong Rate Limiting (After Hetzner Migration)

Kong gateway is already part of the Supabase stack on Hetzner, so rate limiting can move from middleware.ts to Kong. The change is minimal thanks to the IRateLimiter interface.

FD-2: Redis-Backed Limiter (After Hetzner Migration)

Coolify’s internal Redis or a separate Redis container. For distributed rate limiting (multiple Next.js instances). Not needed on a single VPS but ready for horizontal scaling.

FD-3: Email Alert (Ring 2+)

Send daily cost report via email. Resend or Supabase Edge Function + SMTP.

FD-4: Anomaly Detection (Ring 3+)

Learn normal usage patterns and auto-alert on deviations. For example, “today’s costs are 10x yesterday’s” → instant notification.

Atomic Tasks

| # | Task | Size |
|---|---|---|
| COST-1 | lib/api/rateLimit.ts: IRateLimiter interface + MemoryRateLimiter | Medium |
| COST-2 | middleware.ts: rate limit middleware, route config, 429 response + headers | Medium |
| COST-3 | lib/api/costs.ts: provider/model price table + estimateCost() | Small |
| COST-4 | DB migration: ai_job_runs + 2 columns, usage_daily table | Migration |
| COST-5 | Update track-ai-job.ts: estimated_cost + endpoint + incrementDailyUsage | Medium |
| COST-6 | /api/ai/complete + /api/ai/classify → add withAiJobTracking() | Small |
| COST-7 | lib/db/usage-daily.ts: incrementDailyUsage() | Small |
| COST-8 | GET /api/admin/usage: cost report endpoint | Medium |
| COST-9 | Admin dashboard UI: cost chart + alert banner | Medium (with Ring 2) |
| COST-10 | In-app popup: toast notification on threshold breach | Small |