
06 — Batch Operations UI

Ring: 2 (Retention); usable for super_admin in Ring 1 as well
Dependency: R1-1 (Auth), R1-2 (Cost Control)
Handbook: Ch. 229 (pipeline architecture); batch scripts already running (CLI)

Problem

  • 4 batch scripts (cleaner, finder, bulk_fixer, processor) only run from CLI.
  • They require terminal access and cannot be used from the web UI.
  • No progress tracking: the operator must wait until the script finishes.
  • Weak error handling: if a script crashes, it is unclear what happened.
  • File upload (scraper) has minimal security limits.

Decisions

D1: Access — 2-Phase Rollout

| Phase | Who Uses It | Reason |
|---|---|---|
| Phase 1 (Ring 1-2) | super_admin only | Not opened to users until security protocols are tested |
| Phase 2 (Ring 3+) | Enterprise plan, with feature-based roles | Roles with can_run_batch permission. After security is proven. |

Batch operations DISABLED on Free/Pro/Team plans (batch_operations: false in plan_limits).
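
The access rule in D1 could be sketched as a single predicate. This is only an illustration: the `BatchUser` shape, the `phase` argument, and the `canRunBatch` function name are hypothetical, and the assumption that super_admin keeps access in Phase 2 is not stated in the spec. Only `super_admin`, `can_run_batch`, and `batch_operations` come from the text above.

```typescript
// Hypothetical shapes; only the identifiers super_admin, can_run_batch and
// batch_operations are taken from the spec.
type RolloutPhase = 1 | 2;

interface PlanLimits {
  batch_operations: boolean; // false on Free/Pro/Team plans
}

interface BatchUser {
  role: string;          // e.g. "super_admin"
  permissions: string[]; // e.g. ["can_run_batch"]
  planLimits: PlanLimits;
}

function canRunBatch(user: BatchUser, phase: RolloutPhase): boolean {
  // Assumption: super_admin may run batch operations in every phase.
  if (user.role === "super_admin") return true;
  // Phase 1: nobody else has access.
  if (phase === 1) return false;
  // Phase 2: the plan must enable batch operations AND the role must
  // carry the can_run_batch permission.
  return (
    user.planLimits.batch_operations &&
    user.permissions.includes("can_run_batch")
  );
}
```

Keeping the check in one function would let the four API routes and the /operations page share the same rule.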

D2: 4 Batch Operations → Web UI

| Script | API Endpoint | What It Does |
|---|---|---|
| cleaner.js | POST /api/batch/clean | Segment audit + IRRELEVANT cleanup |
| finder.js | POST /api/batch/headhunt | Bulk contact discovery |
| bulk_fixer.js | POST /api/batch/enrich | Bulk company enrichment |
| processor.js | POST /api/batch/discover | Keywords + countries → bulk discovery |

D3: SSE Streaming Progress

Client                          Server
  |  POST /api/batch/clean        |
  |------------------------------>|
  |  SSE stream starts            |
  |<------------------------------|
  |  data: { progress: 5/120 }    |
  |<------------------------------|
  |  data: { progress: 6/120 }    |
  |<------------------------------|
  |  ...                          |
  |  data: { done: true,          |
  |    summary: { cleaned: 8,     |
  |    reclassified: 15 } }       |
  |<------------------------------|
  |  Stream closes                |
Implementation: ReadableStream + TextEncoder (Next.js App Router natively supports this).
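
A minimal sketch of the ReadableStream + TextEncoder approach follows. The event shape (`progress.current` / `progress.total`), the `formatSseEvent` and `createSseStream` names, and the header set are assumptions made for illustration; the diagram above only fixes the general idea of `data:` frames ending in a `done` event.

```typescript
// Assumed event shapes, modelled on the diagram above.
type BatchEvent =
  | { progress: { current: number; total: number } }
  | { done: true; summary: Record<string, number> };

// Serialize one event as an SSE frame: "data: <json>" followed by a blank line.
function formatSseEvent(event: BatchEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Wrap an async source of events in a byte stream usable as a Response body.
function createSseStream(
  events: AsyncIterable<BatchEvent>
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = events[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    // Called whenever the consumer is ready for more data.
    async pull(controller) {
      const result = await iterator.next();
      if (result.done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(formatSseEvent(result.value)));
    },
    // Called when the client disconnects; lets the generator run its cleanup.
    async cancel() {
      await iterator.return?.();
    },
  });
}

// Headers commonly sent with an SSE response.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache, no-transform",
};
```

A route handler could then return `new Response(createSseStream(run()), { headers: SSE_HEADERS })`, where `run()` is an async generator that processes one company per iteration and yields a progress event after each.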

D4: File Upload Security (Scraper)

Now (super_admin only):

| Rule | Value | Current |
|---|---|---|
| Max file size | 50MB | ✅ Exists (FIX-10) |
| Timeout | 60s | ✅ Exists (FIX-10) |
| Allowed file types | .pdf, .xlsx, .xls, .docx, .csv | ⚠️ CSV missing |
| Concurrent job limit | 1 (sufficient for super_admin) | ❌ Missing |

Later (additional security when Enterprise opens):

| Rule | Value | When |
|---|---|---|
| File type whitelist (MIME + extension) | Strict check | Phase 2 |
| Virus/malware scan (ClamAV) | Every upload scanned | Phase 2 |
| Row limit per file | Max 10,000 records | Phase 2 |
| Concurrent job limit (per org) | 1 active job | Phase 2 |
| Quarantine folder | Suspicious files quarantined | Phase 2 |

In Phase 1 super_admin is the only user, so heavy security is unnecessary. Phase 2 security is implemented before Enterprise launch.
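
The Phase 1 size and extension rules might be expressed as a small pure check like the one below. It is a sketch, not the existing FIX-10 code: the `checkUpload` name and result shape are hypothetical, 50MB is assumed to mean 50 × 1024 × 1024 bytes, and the list already includes .csv (the target state, not the current one). The Phase 2 MIME and ClamAV checks are out of scope here.

```typescript
// Limits taken from the Phase 1 rules; the byte interpretation of "50MB"
// is an assumption.
const MAX_FILE_BYTES = 50 * 1024 * 1024;
const ALLOWED_EXTENSIONS = [".pdf", ".xlsx", ".xls", ".docx", ".csv"];

// Hypothetical result shape for illustration.
type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkUpload(fileName: string, sizeBytes: number): UploadCheck {
  // Take everything from the last dot, case-insensitively ("A.CSV" -> ".csv").
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `File type "${ext || "(none)"}" is not allowed` };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { ok: false, reason: "File exceeds the 50MB limit" };
  }
  return { ok: true };
}
```

An extension check alone is easy to bypass by renaming a file, which is why the Phase 2 list adds a MIME check on top of it.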

D5: Batch Job Tracking

Every batch job is recorded in the export_ai_ai_job_runs table:
  • job_type: 'batch_clean' | 'batch_headhunt' | 'batch_enrich' | 'batch_discover'
  • status: 'running' → 'done' | 'failed'
  • Progress updated from within the SSE stream
  • Cost: each AI call’s estimated_cost is summed
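
The job record and its lifecycle could be modelled as below. The `job_type` and `status` values and the `estimated_cost` name come from the list above; the `processed` and `total` fields and the three helper functions are hypothetical, and persistence to the jobs table is deliberately left out.

```typescript
type BatchJobType =
  | "batch_clean"
  | "batch_headhunt"
  | "batch_enrich"
  | "batch_discover";
type BatchJobStatus = "running" | "done" | "failed";

interface BatchJobRun {
  job_type: BatchJobType;
  status: BatchJobStatus;
  processed: number;      // hypothetical progress column
  total: number;          // hypothetical progress column
  estimated_cost: number; // running sum of each AI call's estimated_cost
}

// A job starts in the 'running' state with no progress and no cost.
function startJob(job_type: BatchJobType, total: number): BatchJobRun {
  return { job_type, status: "running", processed: 0, total, estimated_cost: 0 };
}

// After one item is processed, bump progress and add the cost of the
// AI calls made for that item.
function recordProgress(job: BatchJobRun, callCosts: number[]): BatchJobRun {
  const added = callCosts.reduce((sum, cost) => sum + cost, 0);
  return {
    ...job,
    processed: job.processed + 1,
    estimated_cost: job.estimated_cost + added,
  };
}

// 'running' is the only state that may transition to 'done' or 'failed'.
function finishJob(job: BatchJobRun, succeeded: boolean): BatchJobRun {
  if (job.status !== "running") {
    throw new Error(`Job is already ${job.status}`);
  }
  return { ...job, status: succeeded ? "done" : "failed" };
}
```

Because costs are summed as floating-point numbers, a display layer would likely round the total (for example to three decimals) before showing it.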

Architecture

Operations Page

/operations (super_admin only)
  ┌─────────────────────────────────────────────────────┐
  │  Batch Operations                                    │
  │                                                      │
  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ │
  │  │ Clean    │ │ Discover │ │ Enrich   │ │ Hunt   │ │
  │  │          │ │          │ │          │ │        │ │
  │  └──────────┘ └──────────┘ └──────────┘ └────────┘ │
  │                                                      │
  │  [Selected operation parameter form]                 │
  │                                                      │
  │  ┌─ Progress ──────────────────────────────────────┐ │
  │  │ ████████████░░░░░░░░ 62/120 companies processed │ │
  │  │ Succeeded: 55 | Errors: 3 | Skipped: 4         │ │
  │  │ Estimated cost: $0.058                          │ │
  │  │ Elapsed: 2:34                                   │ │
  │  └─────────────────────────────────────────────────┘ │
  │                                                      │
  │  [Stop]                                              │
  └─────────────────────────────────────────────────────┘

API Route Structure

// app/api/batch/clean/route.ts
import { NextRequest } from "next/server";

export async function POST(request: NextRequest): Promise<Response> {
  // 1. Auth: super_admin check
  // 2. Concurrent job check (is one already running?)
  // 3. Create ReadableStream
  // 4. For each company: process → send SSE event
  // 5. On completion: summary event + close stream
  // 6. Save to ai_job_runs
}
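
Step 2 of the outline above (the concurrent job check) might look like the following in-memory guard. This is an assumption-heavy sketch: the `JobGuard` class and its method names are hypothetical, and an in-memory map only works within a single server process. A deployment with several instances would more likely query the jobs table for a row with status 'running' instead.

```typescript
// Hypothetical guard allowing one active batch job per organisation.
class JobGuard {
  // orgId -> id of the job currently holding the slot
  private active = new Map<string, string>();

  // Returns true and reserves the slot if no job is running for the org.
  tryAcquire(orgId: string, jobId: string): boolean {
    if (this.active.has(orgId)) return false;
    this.active.set(orgId, jobId);
    return true;
  }

  // Frees the slot, but only if it is still held by the same job, so a
  // late release from an old job cannot unlock a newer one.
  release(orgId: string, jobId: string): void {
    if (this.active.get(orgId) === jobId) this.active.delete(orgId);
  }

  isBusy(orgId: string): boolean {
    return this.active.has(orgId);
  }
}
```

In a route handler, a failed `tryAcquire` would typically be answered with an error response before any stream is opened, and `release` would be called in a `finally` block so that a crashed job does not block the organisation indefinitely.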

Current Code Impact

Status of Existing Scripts

scripts/cleaner.js    → Logic to be ported to /api/batch/clean
scripts/finder.js     → Logic to be ported to /api/batch/headhunt
scripts/bulk_fixer.js → Logic to be ported to /api/batch/enrich
scripts/processor.js  → Logic to be ported to /api/batch/discover
scripts/lib/api.js   → Shared helpers (sleep, callAI)
CLI scripts WILL NOT be removed. API routes will be built with the scripts’ logic, but CLI versions are kept for dev/debug. Scripts may be deprecated later.

New Files

| File | Content |
|---|---|
| app/api/batch/clean/route.ts | SSE streaming batch cleanup |
| app/api/batch/discover/route.ts | SSE streaming bulk discovery |
| app/api/batch/enrich/route.ts | SSE streaming bulk enrichment |
| app/api/batch/headhunt/route.ts | SSE streaming bulk contact discovery |
| lib/batch/stream-helpers.ts | SSE encoder, progress event builder |
| lib/batch/job-guard.ts | Concurrent job check (1 per org) |
| app/operations/page.tsx | Operations UI (4 cards + form + progress) |
| app/operations/components/BatchProgressPanel.tsx | SSE consumer + progress bar |
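
Because the batch endpoints are POST routes and the browser's EventSource API only issues GET requests, the SSE consumer in BatchProgressPanel.tsx would probably read the response body with fetch instead. The sketch below shows one way to do that; `SseParser` and `consumeBatchStream` are hypothetical names, the parser assumes "\n" line endings and JSON payloads, and it ignores SSE fields other than `data:`.

```typescript
// Incremental parser: feed decoded text chunks, get back complete payloads.
class SseParser {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const events: unknown[] = [];
    let boundary: number;
    // A blank line ("\n\n") terminates each SSE frame.
    while ((boundary = this.buffer.indexOf("\n\n")) !== -1) {
      const frame = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      const data = frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trimStart())
        .join("\n");
      if (data) events.push(JSON.parse(data));
    }
    return events;
  }
}

// Start a batch job and forward each parsed event to the caller.
async function consumeBatchStream(
  url: string,
  params: unknown,
  onEvent: (event: unknown) => void
): Promise<void> {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(params),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Batch request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const parser = new SseParser();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact.
    for (const event of parser.push(decoder.decode(value, { stream: true }))) {
      onEvent(event);
    }
  }
}
```

The buffering matters because network chunks do not align with event boundaries: a single frame may arrive split across two reads, or several frames may arrive in one.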

Scraper Change

| File | Change |
|---|---|
| app/api/scraper/run/route.ts | CSV file type support to be added |

Future Decisions

FD-1: Batch Queue (Ring 4)

Background job queue instead of SSE (BullMQ/Redis). Job continues even if the user closes the page. Runs on a Hetzner worker VPS.

FD-2: Scheduled Batches (Ring 3+)

Scheduled batch operations — “run cleaner every Monday”. Cron job UI.

FD-3: Batch History & Replay (Ring 3)

List of past batch operations, results, re-run capability.

Atomic Tasks

| # | Task | Ring | Size |
|---|---|---|---|
| BATCH-1 | lib/batch/stream-helpers.ts: SSE encoder + progress builder | R2 | Small |
| BATCH-2 | lib/batch/job-guard.ts: concurrent job check | R2 | Small |
| BATCH-3 | POST /api/batch/clean: cleaner.js logic + SSE | R2 | Large |
| BATCH-4 | POST /api/batch/discover: processor.js logic + SSE | R2 | Large |
| BATCH-5 | POST /api/batch/enrich: bulk_fixer.js logic + SSE | R2 | Large |
| BATCH-6 | POST /api/batch/headhunt: finder.js logic + SSE | R2 | Large |
| BATCH-7 | app/operations/page.tsx: 4 cards + parameter form | R2 | Medium |
| BATCH-8 | BatchProgressPanel.tsx: SSE consumer + progress bar + cost display | R2 | Medium |
| BATCH-9 | Add CSV support to scraper | R2 | Small |
| BATCH-10 | Phase 2 security (ClamAV + MIME check + row limit), before Enterprise | R3 | Medium |