03 — AI Pipeline Quality Improvements
Ring: 2 (Retention), but foundational pieces are built in Ring 1
Dependency: R1-2 (Cost Control: extending the pipeline without tracking is dangerous)
Handbook: Ch. 11-26 (pipeline), Ch. 60-64 (market intelligence), Ch. 170-188 (prompt architecture)
Problem
- Current pipeline has 3 steps: AI Call → JSON Parse → DB Upsert.
- Handbook defines a 10-stage pipeline; we target a 12-stage advanced version.
- No product analysis — user’s text goes directly to AI.
- Single-language search — queries are not generated in the target country’s language.
- No market context — blind search with no country knowledge.
- Feedback buttons collect data but DO NOT AFFECT ranking (dead feature).
- Dedup is domain-based only — no fuzzy name matching.
- Contact discovery is a separate operation — not part of the pipeline.
Decisions
D1: 5-Phase, 12-Stage Pipeline
D2: AI Call Grouping (Cost Optimization)
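Each of the 12 stages will NOT be a separate AI call. Logical groupings (as named in Pipeline Performance Targets):
- CALL 1: product analysis + market context + query generation
- CALL 2: search
- CALL 3: entity extraction + classification + enrichment
- CALL 4: contact discovery (optional)
- Deterministic stages (dedup, confidence, FitScore, feedback) make no AI call.
D3: Market Context — Hybrid Cache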
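A minimal sketch of the cache-aside flow that lib/discovery/market-context.ts is described as owning (cache lookup, AI call, DB save). The `IMarketContext` fields, the 30-day freshness window, the in-memory `Map` standing in for the market context table, and the `fetchFromAi` callback are all assumptions for illustration. This document does not say what makes the cache "hybrid", so the sketch covers only the plain lookup-then-fetch path.

```typescript
// Hypothetical shape; the real IMarketContext fields are not specified here.
interface IMarketContext {
  country: string;
  productCategory: string;
  summary: string;
  fetchedAt: number; // epoch milliseconds
}

// Stand-in for the AI call that produces market context text.
type MarketContextFetcher = (country: string, productCategory: string) => Promise<string>;

// Assumed freshness window; the real value is a product decision.
const MARKET_CONTEXT_TTL_MS = 30 * 24 * 60 * 60 * 1000;

class MarketContextCache {
  // A Map stands in for the database table in this sketch.
  private store = new Map<string, IMarketContext>();
  private fetchFromAi: MarketContextFetcher;
  private now: () => number;

  constructor(fetchFromAi: MarketContextFetcher, now: () => number = Date.now) {
    this.fetchFromAi = fetchFromAi;
    this.now = now;
  }

  async get(country: string, productCategory: string): Promise<IMarketContext> {
    const key = `${country.toLowerCase()}::${productCategory.toLowerCase()}`;
    const cached = this.store.get(key);
    if (cached && this.now() - cached.fetchedAt < MARKET_CONTEXT_TTL_MS) {
      return cached; // fresh hit: no AI call, no cost
    }
    // Miss or stale entry: call the AI once, then save for later runs.
    const summary = await this.fetchFromAi(country, productCategory);
    const entry: IMarketContext = { country, productCategory, summary, fetchedAt: this.now() };
    this.store.set(key, entry);
    return entry;
  }
}
```

Injecting the clock and the fetcher keeps the lookup logic testable without a database or a live AI provider.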
D4: Feedback Loop — Prompt Injection
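A sketch of how past ratings could reach both the prompt and the score, matching the responsibilities listed for lib/discovery/feedback.ts (feedback lookup, prompt injection, FitScore adjustment). The `IFeedbackEntry` shape, the two rating values, the 0 to 100 score range, and the ±10 adjustment are assumptions; the actual feedback schema and weights are not given in this document.

```typescript
// Hypothetical feedback record; the real schema is not shown in this document.
type FeedbackRating = "relevant" | "irrelevant";

interface IFeedbackEntry {
  companyName: string;
  rating: FeedbackRating;
}

// Builds a prompt section from past ratings.
// Returns an empty string when there is nothing to inject.
function buildFeedbackPromptSection(entries: IFeedbackEntry[], maxPerGroup = 5): string {
  const liked = entries.filter((e) => e.rating === "relevant").slice(0, maxPerGroup);
  const disliked = entries.filter((e) => e.rating === "irrelevant").slice(0, maxPerGroup);
  if (liked.length === 0 && disliked.length === 0) return "";

  const lines = ["Past user feedback for similar searches:"];
  if (liked.length > 0) {
    lines.push(`- Rated relevant: ${liked.map((e) => e.companyName).join(", ")}`);
  }
  if (disliked.length > 0) {
    lines.push(`- Rated irrelevant: ${disliked.map((e) => e.companyName).join(", ")}`);
  }
  lines.push("Prefer companies similar to the relevant ones; avoid ones similar to the irrelevant ones.");
  return lines.join("\n");
}

// Deterministic score adjustment, clamped to an assumed 0..100 range.
function adjustFitScore(base: number, rating?: FeedbackRating): number {
  const delta = rating === "relevant" ? 10 : rating === "irrelevant" ? -10 : 0;
  return Math.min(100, Math.max(0, base + delta));
}
```

Capping each group with `maxPerGroup` keeps the injected section small, which matters because every injected line adds tokens to a paid AI call.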
D5: Multi-Language Query Generation
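One way to address the single-language problem is to resolve the target country's languages before building the query-generation prompt. The sketch below assumes a small hard-coded `COUNTRY_LANGUAGES` map and an English fallback; both are illustrative, and a real mapping might instead come from configuration or from the market context.

```typescript
// Illustrative subset only; not a complete or authoritative mapping.
const COUNTRY_LANGUAGES: Record<string, string[]> = {
  DE: ["de"],
  FR: ["fr"],
  TR: ["tr"],
  CH: ["de", "fr", "it"],
};

// Returns the local languages for a country, optionally followed by English.
function targetLanguages(countryCode: string, includeEnglish = true): string[] {
  const local = COUNTRY_LANGUAGES[countryCode.toUpperCase()] ?? [];
  const langs = includeEnglish ? [...local, "en"] : [...local];
  // Remove duplicates while preserving order.
  return langs.filter((lang, i) => langs.indexOf(lang) === i);
}

// Builds the instruction fragment that asks for one query set per language.
function buildQueryInstruction(product: string, countryCode: string): string {
  const langs = targetLanguages(countryCode);
  return (
    `Generate search queries for companies relevant to "${product}" in ` +
    `${countryCode.toUpperCase()}, one set per language: ${langs.join(", ")}.`
  );
}
```

Unknown country codes fall back to English only, so the pipeline degrades to today's behavior instead of failing.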
D6: Contact Discovery — Configurable
Data Model
New Table: Market Context Cache
Existing Table Update
Current Code Impact
To Be Rewritten (Large)
| File | Reason |
|---|---|
| lib/discovery/run-discovery.ts | 9-step orchestrator → 12-step pipeline. Core logic changes. |
| lib/discovery/types.ts | New interfaces: IProductAnalysis, IMarketContext, IQuerySet, IPipelineResult |
| lib/prompts.ts → buildDiscoveryPrompt() | 3 separate prompt builders: CALL 1, CALL 2 config, CALL 3 |
| app/api/discover/route.ts | Delegates to pipeline, only handles request/response itself |
New Files
| File | Content |
|---|---|
| lib/discovery/stages/product-analysis.ts | Phase 1: product analysis + market context |
| lib/discovery/stages/search.ts | Phase 2: query generation + multi-source search |
| lib/discovery/stages/enrich.ts | Phase 3: entity extraction + classification + enrichment |
| lib/discovery/stages/evaluate.ts | Phase 4: dedup + confidence + FitScore + feedback (deterministic) |
| lib/discovery/stages/activate.ts | Phase 5: contact discovery (optional) |
| lib/discovery/pipeline.ts | Orchestrator: runs 5 phases sequentially, passes each phase’s output to the next |
| lib/discovery/feedback.ts | Feedback lookup + prompt injection + FitScore adjustment |
| lib/discovery/dedup.ts | Domain match + fuzzy name match + DB cross-ref |
| lib/discovery/market-context.ts | Cache lookup + AI call + DB save |
| lib/prompts/discovery-prompts.ts | 3-group prompt builder (CALL 1, 2, 3) |
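The orchestrator described for lib/discovery/pipeline.ts (run five phases in order, hand each phase's output to the next) could look roughly like the sketch below. `IPhase`, `PipelineState`, `IPhaseTiming`, and the `enabledOptional` parameter are hypothetical names, not the interfaces planned for lib/discovery/types.ts; the per-phase timing is included only because latency is meant to be tracked.

```typescript
// Accumulated state: each phase reads what earlier phases produced.
type PipelineState = Record<string, unknown>;

interface IPhase {
  id: string;
  optional?: boolean; // e.g. Phase 5 contact discovery
  run: (state: PipelineState) => Promise<PipelineState>;
}

interface IPhaseTiming {
  phase: string;
  ms: number;
  skipped: boolean;
}

// Runs phases sequentially and merges each phase's output into the state.
// Optional phases run only when their id is listed in enabledOptional.
async function runPipeline(
  phases: IPhase[],
  input: PipelineState,
  enabledOptional: string[] = [],
): Promise<{ state: PipelineState; timings: IPhaseTiming[] }> {
  let state: PipelineState = { ...input };
  const timings: IPhaseTiming[] = [];

  for (const phase of phases) {
    if (phase.optional && enabledOptional.indexOf(phase.id) === -1) {
      timings.push({ phase: phase.id, ms: 0, skipped: true });
      continue;
    }
    const start = Date.now();
    const output = await phase.run(state);
    state = { ...state, ...output };
    timings.push({ phase: phase.id, ms: Date.now() - start, skipped: false });
  }
  return { state, timings };
}
```

In this shape the five stage files would each export one `IPhase`, and app/api/discover/route.ts would only call `runPipeline` and serialize the result. A phase that throws aborts the run; whether later phases should still execute on partial failure is a design choice this sketch leaves open.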
To Change (Medium)
| File | Change |
|---|---|
| lib/scoring/fitScore.ts | New factors: market context score, feedback score |
| app/api/headhunt/route.ts | Can be called from Pipeline Phase 5 (auto-headhunt) |
| lib/db/save-companies.ts | Extended upsert for enrichment data |
Pipeline Performance Targets
| Metric | Target | Handbook Ref |
|---|---|---|
| Total pipeline duration | <20 seconds (3-4 AI calls) | Ch. 25: <15s (we add +5s for market context) |
| CALL 1 (Product + Market + Query) | <4 seconds | — |
| CALL 2 (Search) | <8 seconds | — |
| CALL 3 (Extract + Classify + Enrich) | <6 seconds | — |
| CALL 4 (Contact, optional) | <5 seconds | — |
| Deterministic stages | <2 seconds total | — |
| Accuracy (relevant company ratio) | >70% | Ch. 24 |
| Noise (irrelevant company ratio) | <15% | Ch. 24 |
These targets will be tracked via the ai_job_runs table (integrated with 02-api-cost-control).
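The latency targets in the table can be checked mechanically once per-stage durations are recorded. The sketch below hard-codes the table's numbers; the key names (`call1` ... `deterministic`) and the `findBudgetViolations` helper are hypothetical, and how durations are read back from ai_job_runs is not covered.

```typescript
// Targets from the table above, in milliseconds. Key names are illustrative.
const LATENCY_TARGETS_MS = {
  call1: 4000, // Product + Market + Query
  call2: 8000, // Search
  call3: 6000, // Extract + Classify + Enrich
  call4: 5000, // Contact (optional)
  deterministic: 2000,
} as const;

const TOTAL_TARGET_MS = 20000;

type StageKey = keyof typeof LATENCY_TARGETS_MS;

// Returns one message per exceeded target; an empty array means the run is within budget.
// Targets are strict upper bounds ("<4 seconds"), so hitting the limit counts as a violation.
function findBudgetViolations(measured: Partial<Record<StageKey, number>>): string[] {
  const violations: string[] = [];
  let total = 0;
  for (const key of Object.keys(measured) as StageKey[]) {
    const ms = measured[key] ?? 0;
    total += ms;
    if (ms >= LATENCY_TARGETS_MS[key]) {
      violations.push(`${key}: ${ms}ms >= ${LATENCY_TARGETS_MS[key]}ms`);
    }
  }
  if (total >= TOTAL_TARGET_MS) {
    violations.push(`total: ${total}ms >= ${TOTAL_TARGET_MS}ms`);
  }
  return violations;
}
```

Note that the per-stage limits add up to 20 s without CALL 4 and 25 s with it, so a run can pass every stage target and still miss the total; the check reports that case separately.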
Handbook Alignment
| Handbook Item | Status |
|---|---|
| Ch. 11: 10-stage pipeline | ✅ 12 stages (2 additions: Market Context, Multi-source) |
| Ch. 17: Segment classification | ✅ Existing + confidence added |
| Ch. 18: Deduplication | ✅ Domain + fuzzy name + DB cross-ref |
| Ch. 19-20: FitScore | ✅ v2 + market context + feedback factors |
| Ch. 23: Feedback loop | ✅ Prompt injection + FitScore adjustment |
| Ch. 24: Accuracy framework | ✅ Targets defined, tracked via ai_job_runs |
| Ch. 25: Latency target | ✅ <20s (handbook <15s, +5s market context) |
| Ch. 60: Market Discovery | ⏳ Market context in Phase 1. Full discovery (country suggestion) in Ring 3 |
| Ch. 62: Country Playbooks | ⏳ Market context collects base data. Full playbooks in Ring 3 |
| Ch. 170-188: Prompt architecture | ✅ 3-group prompts, structured JSON, confidence, anti-hallucination |
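For the Ch. 18 row above, a rough sketch of the first two dedup checks (domain match, then fuzzy name match). Levenshtein similarity, the 0.85 threshold, the legal-suffix list, and the `ICompanyCandidate` shape are assumptions chosen for illustration; the document does not name a fuzzy-matching technique, and the DB cross-reference step is not shown.

```typescript
// Hypothetical candidate shape for this sketch.
interface ICompanyCandidate {
  companyName: string;
  website?: string;
}

// "https://www.acme.com/about" and "acme.com" normalize to the same value.
function normalizeDomain(url: string): string {
  const stripped = url.trim().toLowerCase().replace(/^https?:\/\//, "").replace(/^www\./, "");
  return stripped.split(/[/?#]/)[0] ?? "";
}

// Lowercases, drops common punctuation and a few legal suffixes (illustrative list).
function normalizeName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[.,'"()&\-]/g, " ")
    .replace(/\b(gmbh|ltd|llc|inc|corp|ag)\b/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Classic two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means identical after normalization.
function nameSimilarity(a: string, b: string): number {
  const x = normalizeName(a);
  const y = normalizeName(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

function isDuplicate(a: ICompanyCandidate, b: ICompanyCandidate, threshold = 0.85): boolean {
  if (a.website && b.website && normalizeDomain(a.website) === normalizeDomain(b.website)) {
    return true; // exact domain match
  }
  return nameSimilarity(a.companyName, b.companyName) >= threshold;
}

// Keeps the first occurrence of each company.
function dedupe(candidates: ICompanyCandidate[]): ICompanyCandidate[] {
  const kept: ICompanyCandidate[] = [];
  for (const c of candidates) {
    if (!kept.some((k) => isDuplicate(k, c))) kept.push(c);
  }
  return kept;
}
```

The pairwise comparison is quadratic in the number of candidates, which is acceptable for a single search result set but would need an index (for example on the normalized name) for the DB cross-reference.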
Future Decisions (not now, but not forgotten)
FD-1: Adaptive Pipeline (Ring 3+)
Simple searches use a short pipeline (CALL 1+2+deterministic), complex searches use the full pipeline (4 calls). Decision: automatic based on product analysis output.
FD-2: ML-Based Feedback (Ring 4+)
Train an ML model instead of relying on prompt injection. Once enough feedback data accumulates (1000+ ratings), ranking improves automatically.
FD-3: Full Market Discovery (Ring 3)
Ch. 60: Without specifying a country, suggest the best markets based on product alone. Market context table supports this.
FD-4: Country Playbooks (Ring 3)
Ch. 62: Structured country guides. Extended version of the market context table.
FD-5: Import Data Integration (Ring 3+)
Ch. 61: Real import statistics integration. Requires an external data source.
FD-6: Vector DB (Ring 4)
Embedding-based search for company similarity. Enhances dedup and “similar companies” features.
Atomic Tasks
| # | Task | Ring | Size |
|---|---|---|---|
| PIPE-1 | Extend lib/discovery/types.ts — IProductAnalysis, IMarketContext, IQuerySet, IPipelineConfig | R2 | Small |
| PIPE-2 | export_ai_market_context table + RLS + index | R2 | Migration |
| PIPE-3 | lib/discovery/market-context.ts — cache lookup + AI call + DB save | R2 | Medium |
| PIPE-4 | lib/prompts/discovery-prompts.ts — 3-group prompt builder | R2 | Large |
| PIPE-5 | lib/discovery/stages/product-analysis.ts — CALL 1 | R2 | Medium |
| PIPE-6 | lib/discovery/stages/search.ts — CALL 2 (multi-source + multi-lang) | R2 | Large |
| PIPE-7 | lib/discovery/stages/enrich.ts — CALL 3 (extract + classify + enrich) | R2 | Large |
| PIPE-8 | lib/discovery/dedup.ts — domain + fuzzy name + DB cross-ref | R2 | Medium |
| PIPE-9 | lib/discovery/stages/evaluate.ts — confidence + FitScore + ranking | R2 | Medium |
| PIPE-10 | lib/discovery/feedback.ts — feedback lookup + prompt inject + score | R2 | Medium |
| PIPE-11 | lib/discovery/stages/activate.ts — configurable auto-headhunt | R2 | Medium |
| PIPE-12 | lib/discovery/pipeline.ts — 5-phase orchestrator | R2 | Large |
| PIPE-13 | Update lib/scoring/fitScore.ts — market context + feedback factors | R2 | Medium |
| PIPE-14 | Update app/api/discover/route.ts — delegate to new pipeline | R2 | Small |
| PIPE-15 | Pipeline latency + accuracy tracking (ai_job_runs integration) | R2 | Medium |
| PIPE-16 | Index on lower(name) in companies + feedback index | R2 | Migration |