06 — Batch Operations UI
Ring: 2 (Retention); also usable by super_admin in Ring 1
Dependency: R1-1 (Auth), R1-2 (Cost Control)
Handbook: Ch. 229 (pipeline architecture); the batch scripts already run via the CLI
Problem
- The 4 batch scripts (cleaner, finder, bulk_fixer, processor) can only be run from the CLI.
- They require terminal access and cannot be triggered from the web UI.
- There is no progress tracking: the user has to wait until the script finishes.
- Error handling is weak: if a script crashes, it is unclear what happened.
- File upload (scraper) has only minimal security limits.
Decisions
D1: Access — 2-Phase Rollout
| Phase | Who Uses It | Reason |
|---|---|---|
| Phase 1 (Ring 1-2) | super_admin only | Not opened to users until security protocols are tested |
| Phase 2 (Ring 3+) | Enterprise plan, with feature-based roles | Roles with the can_run_batch permission, after security is proven |
Batch operations are DISABLED on the Free, Pro, and Team plans (batch_operations: false in plan_limits).
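The access rules above can be collapsed into a single predicate. This is a minimal sketch, assuming hypothetical context fields (role, plan, planLimits, permissions) and assuming super_admin is always allowed; the real auth and plan types from R1-1 may be shaped differently.

```typescript
// Hypothetical shapes: the real user/org/plan types live in the auth layer (R1-1).
type Plan = "free" | "pro" | "team" | "enterprise";
type RolloutPhase = 1 | 2;

interface BatchAccessContext {
  role: string; // e.g. "super_admin"
  plan: Plan;
  planLimits: { batch_operations: boolean }; // mirrors plan_limits.batch_operations
  permissions: string[]; // feature-based role permissions, e.g. "can_run_batch"
}

// Phase 1: super_admin only.
// Phase 2: super_admin, plus Enterprise members whose role grants can_run_batch,
// provided batch_operations is enabled in the plan limits.
export function canRunBatch(ctx: BatchAccessContext, phase: RolloutPhase): boolean {
  if (ctx.role === "super_admin") return true;
  if (phase === 1) return false;
  return (
    ctx.plan === "enterprise" &&
    ctx.planLimits.batch_operations &&
    ctx.permissions.includes("can_run_batch")
  );
}
```

Checking plan_limits and the role permission separately means a plan downgrade revokes access even if a stale role still carries can_run_batch.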
D2: 4 Batch Operations → Web UI
| Script | API Endpoint | What It Does |
|---|---|---|
| cleaner.js | POST /api/batch/clean | Segment audit + IRRELEVANT cleanup |
| finder.js | POST /api/batch/headhunt | Bulk contact discovery |
| bulk_fixer.js | POST /api/batch/enrich | Bulk company enrichment |
| processor.js | POST /api/batch/discover | Keywords + countries → bulk discovery |
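Because the UI cards, the route handlers, and the job tracking in D5 all refer to the same four operations, one option is a single registry so the names cannot drift apart. A hypothetical sketch: BATCH_OPERATIONS and operationForEndpoint are illustrative names, not existing code; the script, endpoint, and job_type values are the ones this document defines.

```typescript
// Hypothetical registry: one entry per batch operation, linking the legacy CLI
// script, the new endpoint, and the job_type value recorded per D5.
type BatchJobType = "batch_clean" | "batch_headhunt" | "batch_enrich" | "batch_discover";
type OperationKey = "clean" | "headhunt" | "enrich" | "discover";

interface BatchOperation {
  script: string;
  endpoint: string;
  jobType: BatchJobType;
  description: string;
}

export const BATCH_OPERATIONS: Record<OperationKey, BatchOperation> = {
  clean: { script: "cleaner.js", endpoint: "/api/batch/clean", jobType: "batch_clean", description: "Segment audit + IRRELEVANT cleanup" },
  headhunt: { script: "finder.js", endpoint: "/api/batch/headhunt", jobType: "batch_headhunt", description: "Bulk contact discovery" },
  enrich: { script: "bulk_fixer.js", endpoint: "/api/batch/enrich", jobType: "batch_enrich", description: "Bulk company enrichment" },
  discover: { script: "processor.js", endpoint: "/api/batch/discover", jobType: "batch_discover", description: "Keywords + countries → bulk discovery" },
};

// Looks up the registry entry for a request path, e.g. when writing the job row.
export function operationForEndpoint(path: string): BatchOperation | undefined {
  return Object.values(BATCH_OPERATIONS).find((op) => op.endpoint === path);
}
```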
D3: SSE Streaming Progress
```text
Client                         Server
  | POST /api/batch/clean         |
  |------------------------------>|
  | SSE stream starts             |
  |<------------------------------|
  | data: { progress: 5/120 }     |
  |<------------------------------|
  | data: { progress: 6/120 }     |
  |<------------------------------|
  | ...                           |
  | data: { done: true,           |
  |   summary: { cleaned: 8,      |
  |     reclassified: 15 } }      |
  |<------------------------------|
  | Stream closes                 |
```
Implementation: the route handler returns a ReadableStream whose chunks are encoded with TextEncoder; Next.js App Router route handlers support streaming Response bodies natively.
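A minimal sketch of the ReadableStream + TextEncoder pattern, assuming hypothetical event shapes (BatchEvent) and helper names (encodeSSE, streamBatch); it is not the project's stream-helpers.ts. Each event is one SSE message (a data: line plus a blank line). The cancel() hook stops the loop if the consumer cancels the stream; whether a client disconnect or the Stop button reaches it depends on the runtime.

```typescript
// Hypothetical event shapes; the real progress payload may carry more fields.
export type BatchEvent =
  | { type: "progress"; processed: number; total: number }
  | { type: "done"; summary: Record<string, number> }
  | { type: "error"; message: string };

const encoder = new TextEncoder();

// One SSE message: a single "data:" line followed by a blank line.
export function encodeSSE(event: BatchEvent): Uint8Array {
  return encoder.encode(`data: ${JSON.stringify(event)}\n\n`);
}

// Processes items one at a time, streams a progress event after each one,
// then a done (or error) event, then closes the stream.
export function streamBatch<T>(
  items: T[],
  processOne: (item: T) => Promise<void>,
  summarize: () => Record<string, number>,
): Response {
  let cancelled = false;
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      // Enqueueing after cancellation throws, so every send checks the flag.
      const send = (event: BatchEvent) => {
        if (!cancelled) controller.enqueue(encodeSSE(event));
      };
      try {
        for (let i = 0; i < items.length && !cancelled; i++) {
          await processOne(items[i]);
          send({ type: "progress", processed: i + 1, total: items.length });
        }
        send({ type: "done", summary: summarize() });
      } catch (err) {
        send({ type: "error", message: err instanceof Error ? err.message : String(err) });
      } finally {
        if (!cancelled) controller.close();
      }
    },
    // Runs when the consumer cancels the stream; the loop stops before the next item.
    cancel() {
      cancelled = true;
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache, no-transform" },
  });
}
```

In a route handler this would be returned directly, e.g. `return streamBatch(companies, cleanOne, buildSummary)`, where cleanOne and buildSummary stand in for the migrated cleaner.js logic.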
D4: File Upload Security (Scraper)
Now (super_admin only):
| Rule | Value | Current |
|---|---|---|
| Max file size | 50MB | ✅ Exists (FIX-10) |
| Timeout | 60s | ✅ Exists (FIX-10) |
| Allowed file types | .pdf, .xlsx, .xls, .docx, .csv | ⚠️ CSV missing |
| Concurrent job limit | 1 (sufficient for super_admin) | ❌ Missing |
Later (additional security once Enterprise access opens):
| Rule | Value | When |
|---|---|---|
| File type whitelist (MIME + extension) | Strict check | Phase 2 |
| Virus/malware scan (ClamAV) | Every upload scanned | Phase 2 |
| Row limit per file | Max 10,000 records | Phase 2 |
| Concurrent job limit (per org) | 1 active job | Phase 2 |
| Quarantine folder | Suspicious files quarantined | Phase 2 |
In Phase 1 super_admin is the only user, so heavy security is unnecessary. The Phase 2 security measures will be implemented before the Enterprise launch.
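The Phase 1 rules (size limit plus an extension whitelist that now includes .csv) fit in one check. A minimal sketch: checkUpload is a hypothetical name, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption. MIME sniffing, ClamAV scanning, and row limits are Phase 2 and not shown.

```typescript
// Phase 1 limits from the table above. 50MB is interpreted as binary megabytes (assumption).
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024;
const ALLOWED_EXTENSIONS = new Set([".pdf", ".xlsx", ".xls", ".docx", ".csv"]);

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Extension-only check (case-insensitive); it does not inspect file contents.
export function checkUpload(fileName: string, sizeBytes: number): UploadCheck {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `File type not allowed: ${ext || "(none)"}` };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "File exceeds the 50MB limit" };
  }
  return { ok: true };
}
```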
D5: Batch Job Tracking
Every batch job is recorded in the export_ai_ai_job_runs table:
- job_type: 'batch_clean' | 'batch_headhunt' | 'batch_enrich' | 'batch_discover'
- status: 'running' → 'done' | 'failed'
- Progress is updated from within the SSE stream.
- Cost: the estimated_cost of every AI call is summed into the job's total.
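The tracking rules above (a one-way status transition and a running cost total) can be sketched as pure functions over a row object. This assumes a hypothetical JobRun shape: job_type, status, and estimated_cost come from this section, while processed and total are assumed columns.

```typescript
// Hypothetical row shape; processed/total are assumed columns.
type BatchJobType = "batch_clean" | "batch_headhunt" | "batch_enrich" | "batch_discover";
type JobStatus = "running" | "done" | "failed";

interface JobRun {
  job_type: BatchJobType;
  status: JobStatus;
  processed: number;
  total: number;
  estimated_cost: number; // running sum of per-call estimates (USD assumed)
}

// Records one processed item and adds the estimated_cost of every AI call it made.
export function recordItem(run: JobRun, callCosts: number[]): JobRun {
  if (run.status !== "running") throw new Error(`Cannot update a ${run.status} job`);
  const itemCost = callCosts.reduce((sum, cost) => sum + cost, 0);
  return { ...run, processed: run.processed + 1, estimated_cost: run.estimated_cost + itemCost };
}

// Enforces the one-way transition 'running' → 'done' | 'failed'.
export function finishRun(run: JobRun, outcome: "done" | "failed"): JobRun {
  if (run.status !== "running") throw new Error(`Job already ${run.status}`);
  return { ...run, status: outcome };
}
```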
Architecture
Operations Page
/operations (super_admin only)
```text
┌─────────────────────────────────────────────────────┐
│ Batch Operations                                    │
│                                                     │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐   │
│ │ Clean    │ │ Discover │ │ Enrich   │ │ Hunt   │   │
│ │          │ │          │ │          │ │        │   │
│ └──────────┘ └──────────┘ └──────────┘ └────────┘   │
│                                                     │
│ [Selected operation parameter form]                 │
│                                                     │
│ ┌─ Progress ──────────────────────────────────────┐ │
│ │ ████████████░░░░░░░░ 62/120 companies processed │ │
│ │ Succeeded: 55 | Errors: 3 | Skipped: 4          │ │
│ │ Estimated cost: $0.058                          │ │
│ │ Elapsed: 2:34                                   │ │
│ └─────────────────────────────────────────────────┘ │
│                                                     │
│ [Stop]                                              │
└─────────────────────────────────────────────────────┘
```
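The progress panel values in the mockup can be derived from the progress events. A small sketch of hypothetical formatting helpers: a 20-cell text bar, an m:ss elapsed label, and a cost label with three decimals as shown in the mockup. The names and widths are illustrative.

```typescript
// Text progress bar; clamps to [0, width] and treats total = 0 as empty.
export function progressBar(processed: number, total: number, width = 20): string {
  const ratio = total > 0 ? Math.min(processed / total, 1) : 0;
  const filled = Math.round(ratio * width);
  return "█".repeat(filled) + "░".repeat(width - filled);
}

// Milliseconds → "m:ss" (minutes are not wrapped into hours).
export function formatElapsed(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// USD cost label with three decimals, matching "$0.058" in the mockup.
export function formatCost(usd: number): string {
  return `$${usd.toFixed(3)}`;
}
```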
API Route Structure
```typescript
// app/api/batch/clean/route.ts
import type { NextRequest } from "next/server";

export async function POST(request: NextRequest): Promise<Response> {
  // 1. Auth: super_admin check
  // 2. Concurrent job check (is one already running?)
  // 3. Create ReadableStream
  // 4. For each company: process → send SSE event
  // 5. On completion: summary event + close stream
  // 6. Save to ai_job_runs
}
```
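Step 2 of the outline could be a small guard in lib/batch/job-guard.ts. A sketch, assuming a hypothetical injected countRunningJobs function (for example, counting the org's job-run rows whose status is 'running'), which keeps the guard independent of the database client; the 409 status code is also a choice made here.

```typescript
// Hypothetical data-access function, injected so the guard stays storage-agnostic.
export type CountRunningJobs = (orgId: string) => Promise<number>;

export const MAX_CONCURRENT_JOBS_PER_ORG = 1;

// Returns a 409 Response if the org already has a running job, otherwise null.
export async function checkJobGuard(
  orgId: string,
  countRunningJobs: CountRunningJobs,
): Promise<Response | null> {
  const running = await countRunningJobs(orgId);
  if (running < MAX_CONCURRENT_JOBS_PER_ORG) return null;
  return new Response(
    JSON.stringify({ error: "A batch job is already running for this organization" }),
    { status: 409, headers: { "Content-Type": "application/json" } },
  );
}

// In the route handler (step 2):
//   const blocked = await checkJobGuard(orgId, countRunningJobs);
//   if (blocked) return blocked;
```

A count-then-insert check is not atomic: two requests arriving together could both pass. If the database is PostgreSQL, a partial unique index on the org column restricted to status = 'running' would make the database reject the second insert.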
Current Code Impact
Status of Existing Scripts
- scripts/cleaner.js → logic to be moved to /api/batch/clean
- scripts/finder.js → logic to be moved to /api/batch/headhunt
- scripts/bulk_fixer.js → logic to be moved to /api/batch/enrich
- scripts/processor.js → logic to be moved to /api/batch/discover
- scripts/lib/api.js → shared helpers (sleep, callAI)
The CLI scripts WILL NOT be removed. The API routes will be built on the scripts' logic, but the CLI versions are kept for development and debugging. The scripts may be deprecated later.
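One way to keep the CLI scripts and the API routes on the same logic is to extract the per-company loop into a shared function that reports progress through a callback: the CLI passes a console logger, the API route passes a callback that sends SSE events. A hypothetical sketch; runClean, cleanOne, and ProgressUpdate are illustrative names, not existing code.

```typescript
export interface ProgressUpdate {
  processed: number;
  total: number;
  company: string;
  ok: boolean;
}

// Shared loop: one failing company is counted as an error and does not stop the batch.
export async function runClean(
  companies: string[],
  cleanOne: (company: string) => Promise<void>, // stands in for the cleaner.js per-company logic
  onProgress: (update: ProgressUpdate) => void,
): Promise<{ succeeded: number; errors: number }> {
  let succeeded = 0;
  let errors = 0;
  for (let i = 0; i < companies.length; i++) {
    const company = companies[i];
    let ok = true;
    try {
      await cleanOne(company);
      succeeded++;
    } catch {
      ok = false;
      errors++;
    }
    onProgress({ processed: i + 1, total: companies.length, company, ok });
  }
  return { succeeded, errors };
}

// CLI usage:  await runClean(companies, cleanOne, (u) => console.log(`${u.processed}/${u.total} ${u.company}`));
// API usage:  the callback enqueues an SSE progress event instead.
```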
New Files
| File | Content |
|---|---|
| app/api/batch/clean/route.ts | SSE streaming batch cleanup |
| app/api/batch/discover/route.ts | SSE streaming bulk discovery |
| app/api/batch/enrich/route.ts | SSE streaming bulk enrichment |
| app/api/batch/headhunt/route.ts | SSE streaming bulk contact discovery |
| lib/batch/stream-helpers.ts | SSE encoder, progress event builder |
| lib/batch/job-guard.ts | Concurrent job check (1 per org) |
| app/operations/page.tsx | Operations UI (4 cards + form + progress) |
| app/operations/components/BatchProgressPanel.tsx | SSE consumer + progress bar |
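BatchProgressPanel.tsx has to consume POST endpoints. The browser's EventSource API only issues GET requests, so a sketch like the following reads the stream with fetch() and a stream reader instead, with an AbortSignal for the Stop button. consumeBatchStream and readSSE are hypothetical names, React wiring is omitted, and the parser only handles the data: lines this server emits (no event:, id:, or retry: fields, \n line endings only).

```typescript
// Reads "data: ..." messages separated by blank lines; handles chunks that split a message.
export async function readSSE(
  stream: ReadableStream<Uint8Array>,
  onEvent: (event: unknown) => void,
): Promise<void> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    buffer += decoder.decode(result.value, { stream: true });
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const message = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      const data = message
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")) // drop the optional space after "data:"
        .join("\n");
      if (data) onEvent(JSON.parse(data));
    }
  }
}

// POSTs the operation parameters and forwards each parsed event; abort via `signal` (Stop button).
export async function consumeBatchStream(
  url: string,
  body: unknown,
  onEvent: (event: unknown) => void,
  signal?: AbortSignal,
): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
    signal,
  });
  if (!res.ok || !res.body) throw new Error(`Batch request failed: ${res.status}`);
  await readSSE(res.body, onEvent);
}
```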
Scraper Change
| File | Change |
|---|---|
| app/api/scraper/run/route.ts | Add CSV file type support |
Future Decisions
FD-1: Batch Queue (Ring 4)
A background job queue (BullMQ/Redis) instead of SSE, so a job keeps running even if the user closes the page. The workers run on a Hetzner VPS.
FD-2: Scheduled Batches (Ring 3+)
Scheduled batch operations (e.g. “run the cleaner every Monday”), managed through a cron job UI.
FD-3: Batch History & Replay (Ring 3)
A list of past batch operations and their results, with the ability to re-run them.
Atomic Tasks
| # | Task | Ring | Size |
|---|---|---|---|
| BATCH-1 | lib/batch/stream-helpers.ts: SSE encoder + progress builder | R2 | Small |
| BATCH-2 | lib/batch/job-guard.ts: concurrent job check | R2 | Small |
| BATCH-3 | POST /api/batch/clean: cleaner.js logic + SSE | R2 | Large |
| BATCH-4 | POST /api/batch/discover: processor.js logic + SSE | R2 | Large |
| BATCH-5 | POST /api/batch/enrich: bulk_fixer.js logic + SSE | R2 | Large |
| BATCH-6 | POST /api/batch/headhunt: finder.js logic + SSE | R2 | Large |
| BATCH-7 | app/operations/page.tsx: 4 cards + parameter form | R2 | Medium |
| BATCH-8 | BatchProgressPanel.tsx: SSE consumer + progress bar + cost display | R2 | Medium |
| BATCH-9 | Add CSV support to scraper | R2 | Small |
| BATCH-10 | Phase 2 security (ClamAV + MIME check + row limit), before Enterprise | R3 | Medium |