02 — API Cost Control & Rate Limiting
Ring: 1 (Launch Blocker)
Dependency: None (can work independently from auth, upgrades to user-based after auth)
Handbook: Ch. 47 (observability), Ch. 204 (error handling, API rate limits), Ch. 207 (security — rate limiting)
Related: Hetzner migration plan (docs/claude-hetzner-vps-plan.md) — Kong gateway + Redis
Problem
- AI API calls cost money (Gemini, Perplexity, OpenAI, Anthropic).
- No rate limiting exists.
- while(true) { fetch("/api/discover") } → risk of billing explosion.
- withAiJobTracking() is active on only 2/6+ routes → most AI calls are untracked.
- We don’t know how much we’re spending. Can’t see it until the invoice arrives.
- Pre multi-tenant, nothing stops one user from racking up costs that fall on another user.
Decisions
D1: 3-Layer Architecture
Layers 1 and 2 are built now. Layer 3 comes after auth + billing.
D2: Rate Limit — 2-Phase Strategy
| Phase | Period | Approach | Rationale |
|---|---|---|---|
| Phase 1 | Dev environment (local / pre-deploy) | In-memory sliding window | Fast, no dependencies. Sufficient before deploy. |
| Phase 2 | Production (Hetzner — deploy day) | Kong rate limiting + Redis | Kong gateway and Redis already exist in the Hetzner stack. Upgrade with a single file change. |
IMPORTANT DECISION: This project will NOT be DEPLOYED to Vercel. The day it goes live, the Hetzner migration plan is pulled forward. Vercel can only be used for dev/preview. Production = Hetzner. Therefore, the in-memory limiter is only for the development environment; Kong/Redis will be active in production.
The interface will be kept clean: IRateLimiter abstraction → Phase 1 MemoryRateLimiter, Phase 2 KongRateLimiter or RedisRateLimiter.
D3: Cost Visibility
| Who | What They See |
|---|---|
| super_admin | Entire platform: total cost, per-org breakdown, per-route distribution |
| Org owner | Their org’s cost: daily/weekly/monthly, per-route |
| Org member | Cost of their own triggered operations (in the pre-operation confirmation dialog) |
D4: Alert System
| Level | Mechanism | When |
|---|---|---|
| Dashboard | Visual alert in admin panel (red banner) | When daily cost exceeds threshold |
| In-app popup | Toast/notification — “Today’s cost exceeded $X” | Real-time, owner + super_admin |
| Email | Daily cost report | Later (Ring 2+), not now |
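The threshold rule in the table above can be sketched as a small pure function. This is a minimal illustration only: the names CostAlert and buildCostAlerts, and the idea of returning one entry per alert surface, are assumptions and not part of the existing code.

```typescript
// Hypothetical alert shape; "dashboard" is the red banner, "toast" the in-app popup.
interface CostAlert {
  level: "dashboard" | "toast";
  message: string;
}

// Returns the alerts to raise for today's running total.
// No alert is produced while spend is at or below the threshold.
function buildCostAlerts(todayUsd: number, thresholdUsd: number): CostAlert[] {
  if (todayUsd <= thresholdUsd) return [];
  const message = `Today's cost exceeded $${thresholdUsd.toFixed(2)}`;
  return [
    { level: "dashboard", message },
    { level: "toast", message },
  ];
}
```

The function would be called after each update of the daily total, with the threshold read from configuration.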
D5: Rate Limit Configuration
| Route Group | Limit | Window | Rationale |
|---|---|---|---|
| POST /api/discover | 10 req | 60 sec | Most expensive: web search + AI |
| POST /api/headhunt | 10 req | 60 sec | Web search + AI |
| POST /api/ai/complete | 20 req | 60 sec | Batch scripts may call frequently |
| POST /api/ai/classify | 10 req | 60 sec | Each handles 25 companies → 10 req = 250 companies |
| POST /api/score | 10 req | 60 sec | Batch rescore |
| POST /api/scraper/run | 3 req | 60 sec | Heavy operation, subprocess |
| GET /api/companies etc. | No limit | - | DB query, no AI cost |
After auth, these limits will be user-based + plan-based (Free: 3 discover/day, Pro: 50/day).
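One way to hold the D5 table in code is a lookup keyed by method and path. The sketch below is an assumption about shape, not the actual config: RouteLimit, ROUTE_LIMITS, and getRouteLimit are hypothetical names, and only the numbers come from the table.

```typescript
// Hypothetical config shape for the D5 table.
interface RouteLimit {
  limit: number;     // max requests per window
  windowSec: number; // window length in seconds
}

// Keys are "METHOD path"; values are copied from the D5 table.
const ROUTE_LIMITS: Record<string, RouteLimit> = {
  "POST /api/discover": { limit: 10, windowSec: 60 },
  "POST /api/headhunt": { limit: 10, windowSec: 60 },
  "POST /api/ai/complete": { limit: 20, windowSec: 60 },
  "POST /api/ai/classify": { limit: 10, windowSec: 60 },
  "POST /api/score": { limit: 10, windowSec: 60 },
  "POST /api/scraper/run": { limit: 3, windowSec: 60 },
};

// Returns the limit for a request, or null for routes without a limit
// (for example GET /api/companies, which has no AI cost).
function getRouteLimit(method: string, path: string): RouteLimit | null {
  return ROUTE_LIMITS[`${method.toUpperCase()} ${path}`] ?? null;
}
```

When limits become user-based and plan-based after auth, the lookup could take the plan as an extra argument without changing its callers' handling of the null case.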
Cost Estimation Model
Provider Token Prices (lib/api/costs.ts)
| Provider | Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|
| Gemini | 2.5 Flash | $0.15 | $0.60 |
| Anthropic | Claude Sonnet | $3.00 | $15.00 |
| Perplexity | Sonar Pro | $1.00 | $1.00 |
| OpenAI | GPT-4o | $2.50 | $10.00 |
These prices are updated periodically. They are kept in a single file, lib/api/costs.ts.
Estimated Cost per Operation
| Operation | Estimated Tokens | Estimated Cost (Gemini) |
|---|---|---|
| 1 Discovery (single country) | ~2K input + ~1K output | ~$0.0009 |
| 1 Headhunt | ~1.5K input + ~0.5K output | ~$0.0005 |
| 1 Classify batch (25 companies) | ~3K input + ~1K output | ~$0.001 |
| 1 Enrichment | ~2K input + ~1K output | ~$0.0009 |
Actual values will be calculated from the ai_job_runs table. The above are initial estimates.
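The estimates above follow from the price table by simple arithmetic: tokens times price per million, divided by one million. Here is a minimal sketch of that calculation. The estimateCost() signature, the PRICES constant, and the "provider/model" key format are assumptions; only the prices are taken from the table.

```typescript
interface TokenPrice {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Prices copied from the provider table; key format is hypothetical.
const PRICES: Record<string, TokenPrice> = {
  "gemini/2.5-flash": { inputPerMillion: 0.15, outputPerMillion: 0.6 },
  "anthropic/claude-sonnet": { inputPerMillion: 3.0, outputPerMillion: 15.0 },
  "perplexity/sonar-pro": { inputPerMillion: 1.0, outputPerMillion: 1.0 },
  "openai/gpt-4o": { inputPerMillion: 2.5, outputPerMillion: 10.0 },
};

// Estimated USD cost of one call, given its token counts.
function estimateCost(modelKey: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[modelKey];
  if (!price) throw new Error(`Unknown model: ${modelKey}`);
  return (
    (inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion) / 1_000_000
  );
}
```

For one Discovery on Gemini (2K input, 1K output) this gives (2000 × 0.15 + 1000 × 0.60) / 1,000,000 = $0.0009, matching the table. One Headhunt (1.5K input, 0.5K output) gives $0.000525, which the table rounds to ~$0.0005.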
Data Model
Existing Table Update
New Table: Daily Aggregation
This table is for monitoring. incrementDailyUsage() is called on every AI job completion. The admin dashboard reads this table.
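The aggregation step can be illustrated with an in-memory stand-in for the daily table. This is a sketch under stated assumptions: the row fields (day, endpoint, jobCount, estimatedCostUsd), the arguments of incrementDailyUsage(), and the use of UTC days are all hypothetical, and a real implementation would perform an upsert against the database instead of a Map.

```typescript
// Hypothetical row shape for the daily aggregation table.
interface UsageDailyRow {
  day: string;              // UTC date, YYYY-MM-DD (assumed bucketing)
  endpoint: string;
  jobCount: number;
  estimatedCostUsd: number;
}

// In-memory stand-in for the table, keyed by day + endpoint.
const usageDaily = new Map<string, UsageDailyRow>();

// Adds one completed AI job to the (day, endpoint) bucket and returns the bucket.
function incrementDailyUsage(
  endpoint: string,
  estimatedCostUsd: number,
  now: Date = new Date(),
): UsageDailyRow {
  const day = now.toISOString().slice(0, 10);
  const key = `${day}|${endpoint}`;
  const row = usageDaily.get(key) ?? { day, endpoint, jobCount: 0, estimatedCostUsd: 0 };
  row.jobCount += 1;
  row.estimatedCostUsd += estimatedCostUsd;
  usageDaily.set(key, row);
  return row;
}
```

Because the dashboard reads pre-aggregated rows, its queries stay cheap even when the job table grows.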
Architecture
Rate Limiter (lib/api/rateLimit.ts)
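A minimal sketch of the IRateLimiter abstraction with an in-memory sliding window follows. The method name check(), its parameters, and the RateLimitResult fields are assumptions made for illustration. The method is async so that a Redis-backed implementation could satisfy the same interface.

```typescript
// Hypothetical result shape returned by every limiter implementation.
interface RateLimitResult {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAtMs: number; // epoch ms when the oldest counted request leaves the window
}

interface IRateLimiter {
  check(key: string, limit: number, windowMs: number, nowMs?: number): Promise<RateLimitResult>;
}

// Sliding-window log: stores one timestamp per accepted request, per key.
class MemoryRateLimiter implements IRateLimiter {
  private readonly hits = new Map<string, number[]>();

  async check(
    key: string,
    limit: number,
    windowMs: number,
    nowMs: number = Date.now(),
  ): Promise<RateLimitResult> {
    const cutoff = nowMs - windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < limit;
    if (allowed) recent.push(nowMs); // rejected requests are not counted
    this.hits.set(key, recent);
    const oldest = recent[0] ?? nowMs;
    return {
      allowed,
      limit,
      remaining: Math.max(0, limit - recent.length),
      resetAtMs: oldest + windowMs,
    };
  }
}
```

State lives in process memory, so it resets on restart and is not shared between instances, which is why this variant is limited to the development environment. Idle keys are never purged in this sketch; a periodic cleanup would be needed for long-running processes.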
Cost Tracker (lib/api/costs.ts + lib/db/track-ai-job.ts update)
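The tracking flow described in this plan (run the AI call, estimate its cost, update the daily aggregate) can be sketched as a wrapper. This is not the existing withAiJobTracking(), whose signature is not shown here; trackAiCall, TrackingDeps, and AiUsage are hypothetical names, and the dependencies are injected so the sketch stays self-contained.

```typescript
// Token counts reported by the provider for one call (assumed to be available).
interface AiUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical dependencies; real versions would live in costs.ts and usage-daily.ts.
interface TrackingDeps {
  estimateCost(inputTokens: number, outputTokens: number): number;
  incrementDailyUsage(endpoint: string, costUsd: number): void;
}

// Runs an AI call, estimates its cost, and records it against the endpoint.
async function trackAiCall<T>(
  endpoint: string,
  deps: TrackingDeps,
  call: () => Promise<{ result: T; usage: AiUsage }>,
): Promise<{ result: T; estimatedCostUsd: number }> {
  const { result, usage } = await call();
  const estimatedCostUsd = deps.estimateCost(usage.inputTokens, usage.outputTokens);
  deps.incrementDailyUsage(endpoint, estimatedCostUsd);
  return { result, estimatedCostUsd };
}
```

In this sketch a failed call records nothing, because the usage is only known after the call resolves. Whether failed calls should still be counted is left open.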
Middleware (middleware.ts)
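The 429 response and its headers could be built as below. This is a framework-neutral sketch: it returns a plain object rather than a framework response, and LimitDecision, PlainResponse, and both function names are assumptions. Retry-After is a standard HTTP header; the X-RateLimit-* names are a widely used convention rather than a standard.

```typescript
// Hypothetical outcome of a limiter check.
interface LimitDecision {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAtMs: number; // epoch ms at which the window frees up
}

// Plain description of a response; the middleware would map it to a real one.
interface PlainResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Headers attached to every rate-limited route; Retry-After only on rejection.
function rateLimitHeaders(d: LimitDecision, nowMs: number): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(d.limit),
    "X-RateLimit-Remaining": String(d.remaining),
    "X-RateLimit-Reset": String(Math.ceil(d.resetAtMs / 1000)), // epoch seconds
  };
  if (!d.allowed) {
    const retryAfterSec = Math.max(0, Math.ceil((d.resetAtMs - nowMs) / 1000));
    headers["Retry-After"] = String(retryAfterSec);
  }
  return headers;
}

// Builds the 429 Too Many Requests response for a rejected request.
function tooManyRequests(d: LimitDecision, nowMs: number): PlainResponse {
  return {
    status: 429,
    headers: { ...rateLimitHeaders(d, nowMs), "Content-Type": "application/json" },
    body: JSON.stringify({ error: "rate_limited" }),
  };
}
```

Until auth exists, the limiter key would have to come from something like the client IP plus the route. After auth it can switch to the user ID.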
Current Code Impact
Files to Change
| File | Change |
|---|---|
| lib/db/track-ai-job.ts | Add estimated_cost + endpoint parameter, call incrementDailyUsage() |
| app/api/ai/complete/route.ts | Add withAiJobTracking() (CURRENTLY UNTRACKED) |
| app/api/ai/classify/route.ts | Add withAiJobTracking() (CURRENTLY UNTRACKED) |
| app/api/score/route.ts | Check if it contains AI calls, add tracking if so |
New Files
| File | Content |
|---|---|
| lib/api/rateLimit.ts | IRateLimiter + MemoryRateLimiter |
| lib/api/costs.ts | Provider/model price table + estimateCost() |
| lib/db/usage-daily.ts | incrementDailyUsage(), called on AI job completion |
| middleware.ts | Rate limit middleware (for costly routes) |
| app/api/admin/usage/route.ts | GET: daily/weekly/monthly cost report |
Future Decisions (not now, but not forgotten)
FD-1: Kong Rate Limiting (After Hetzner Migration)
Kong gateway is already part of the Supabase stack on Hetzner, so rate limiting can move out of the Next.js middleware and into Kong. The change is minimal thanks to the IRateLimiter interface.
FD-2: Redis-Backed Limiter (After Hetzner Migration)
Use Coolify’s internal Redis or a separate Redis container for distributed rate limiting across multiple Next.js instances. This is not needed on a single VPS, but it keeps the design ready for horizontal scaling.
FD-3: Email Alert (Ring 2+)
Send daily cost report via email. Resend or Supabase Edge Function + SMTP.
FD-4: Anomaly Detection (Ring 3+)
Learn normal usage patterns, auto-alert on deviations. For example “today’s costs are 10x yesterday’s” → instant notification.
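The "10x yesterday" example can be expressed as a simple ratio check, which is far short of learning usage patterns but shows the shape of the rule. The function name, the default factor, and the minUsd noise floor are all assumptions added for illustration.

```typescript
// Flags today's spend as anomalous when it is at least `factor` times yesterday's.
// `minUsd` is a hypothetical noise floor so that tiny absolute amounts do not alert.
function isCostAnomaly(
  todayUsd: number,
  yesterdayUsd: number,
  factor: number = 10,
  minUsd: number = 1,
): boolean {
  if (todayUsd < minUsd) return false;  // too small to matter
  if (yesterdayUsd <= 0) return true;   // meaningful spend after a zero-cost day
  return todayUsd >= factor * yesterdayUsd;
}
```

A pattern-learning version would replace the single "yesterday" value with a baseline such as a rolling average over recent days.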
Atomic Tasks
| # | Task | Size |
|---|---|---|
| COST-1 | lib/api/rateLimit.ts — IRateLimiter interface + MemoryRateLimiter | Medium |
| COST-2 | middleware.ts — Rate limit middleware, route config, 429 response + headers | Medium |
| COST-3 | lib/api/costs.ts — Provider/model price table + estimateCost() | Small |
| COST-4 | DB migration: ai_job_runs + 2 columns, usage_daily table | Migration |
| COST-5 | Update track-ai-job.ts — estimated_cost + endpoint + incrementDailyUsage | Medium |
| COST-6 | /api/ai/complete + /api/ai/classify → add withAiJobTracking() | Small |
| COST-7 | lib/db/usage-daily.ts — incrementDailyUsage() | Small |
| COST-8 | GET /api/admin/usage — cost report endpoint | Medium |
| COST-9 | Admin dashboard UI — cost chart + alert banner | Medium (with Ring 2) |
| COST-10 | In-app popup — toast notification on threshold breach | Small |