Cernio — System Architecture
This document explains how the system works, the relationships between components, and data flow.
Last updated: 2026-03-25
Handbook References:
docs/handbook/03-system-architecture.md — 5-layer architecture, API design, multi-tenant, caching (Ch. 27-50)
docs/handbook/11-infrastructure-devops.md — Docker, worker, deployment, monitoring (Ch. 189-210)
docs/handbook/13-discovery-code-architecture.md — Orchestrator pattern, API route, scoring (Ch. 226-252)
docs/handbook/09-database-schema.md — 22 table details, indexing, RLS (Ch. 136-168)
1. Overview
Cernio is a B2B SaaS product that discovers, classifies, enriches, and scores export distributors. DOSE Chemicals is Customer #1 (super admin and first tenant). The system consists of two sub-projects:
| Sub-Project | Technology | Execution Model | Purpose |
|---|---|---|---|
| Root | Next.js 16 (TypeScript, React 19) | Web app (browser) | UI + API routes |
| scripts/ | Node.js + Python | Terminal (node script.js) | Batch operations |
C# analogy: Root Next.js app = ASP.NET MVC web app, scripts/ = Console App batch jobs. Both write to the same database and use the same AI service.
2. Component Map
+-----------------------------------------------------------+
| USER (Browser) |
| Companies | Contacts | Discovery | Leads | Scraper | Admin |
+---------------------------+-------------------------------+
| HTTP
+---------------------------v-------------------------------+
| Next.js 16 App (root) |
| |
| app/ |
| +-- companies/ -> Company list, filter, detail |
| +-- contacts/ -> Contact directory |
| +-- discovery/ -> AI company discovery |
| +-- leads/ -> Lead pipeline + detail |
| +-- scraper/ -> File upload -> review -> DB push |
| +-- admin/segments/ -> Segment CRUD management |
| +-- api/ |
| +-- ai/ |
| | +-- complete/ -> Single AI prompt call |
| | +-- classify/ -> Batch classification |
| | +-- providers/ -> Available provider list |
| +-- discover/ -> Company discovery + DB save |
| +-- headhunt/ -> Contact finding + DB save |
| +-- score/ -> FitScore v2 batch scoring |
| +-- leads/ -> Lead CRUD + [id] PATCH |
| +-- interactions/ -> Communication log |
| +-- tasks/ -> Task CRUD + [id] PATCH |
| +-- feedback/ -> Search result feedback |
| +-- segments/ -> Segment CRUD |
| +-- prompts/ -> Prompt builder access |
| +-- scraper/ |
| +-- run/ -> Trigger Python scraper |
| +-- push/ -> Write approved records to DB |
| |
| lib/ |
| +-- ai/ -> Central AI Client |
| | +-- types.ts -> Interfaces, AIProviderName |
| | +-- errors.ts -> Custom error classes |
| | +-- parseJson.ts -> Parse JSON from AI output |
| | +-- client.ts -> complete(), classifyBatch() |
| | +-- providers/ -> Gemini, Anthropic, Perplexity, OpenAI |
| +-- db/ -> DB operation layer |
| | +-- save-companies.ts -> Company upsert + domain update |
| | +-- save-contacts.ts -> Contact upsert (single + batch) |
| | +-- save-scores.ts -> FitScore breakdown save |
| | +-- save-search.ts -> Search history cache |
| | +-- save-search-queries.ts -> Sub-query save |
| | +-- save-search-results.ts -> Result save |
| | +-- track-ai-job.ts -> AI job observability |
| +-- discovery/ -> Discovery orchestrator |
| | +-- types.ts -> IDiscoveryInput/Result |
| | +-- run-discovery.ts -> 9-step pipeline |
| +-- scoring/ -> FitScore v2 |
| | +-- fitScore.ts -> 6-factor score calculation |
| +-- prompts.ts -> Central prompt builders |
| +-- types/ -> ICompany, IContact, ILead etc. |
| +-- constants.ts -> SEGMENT_MAP, SEGMENT_COLOR |
| +-- api/validateApiKey.ts -> Internal key validation |
+---------------------------+-------------------------------+
| Supabase Client
+---------------------------v-------------------------------+
| Supabase (PostgreSQL) - 18 tables |
| |
| export_ai_organizations -> SaaS tenants |
| export_ai_profiles -> Users |
| export_ai_companies -> Company records (1591) |
| export_ai_contacts -> Contact records (1522) |
| export_ai_leads -> Lead pipeline |
| export_ai_lead_contacts -> Lead-contact junction |
| export_ai_interactions -> Communication history |
| export_ai_tasks -> Follow-up tasks |
| export_ai_products -> Product portfolio |
| export_ai_quotes -> Quotes |
| export_ai_quote_items -> Quote line items |
| export_ai_segments -> Segment definitions (DB-driven) |
| export_ai_company_scores -> FitScore v2 breakdown |
| export_ai_search_history -> API cache |
| export_ai_search_results -> Discovery results |
| export_ai_search_queries -> Discovery sub-queries |
| export_ai_search_feedback-> Result feedback |
| export_ai_ai_job_runs -> AI job observability |
+---------------------------^-------------------------------+
| HTTP POST -> /api/ai/*
+---------------------------+-------------------------------+
| scripts/ (Node.js + Python CLI) |
| |
| processor.js -> Read companies from CSV -> AI discovery + analysis |
| finder.js -> Batch contact finding for DB companies |
| bulk_fixer.js -> Enrich missing company information |
| cleaner.js -> Segment audit + IRRELEVANT cleanup |
| scraper/ -> PDF/Excel/Word -> parse -> normalize |
| |
| lib/api.js -> callAI() + sleep() shared helpers |
+-----------------------------------------------------------+
3. scripts/ Batch Scripts — What They Do, How They Work
3.1 processor.js — Batch Company Processing from CSV
Input: egyptcompany.csv (or another CSV) — list of company domains.
What it does:
- Reads the CSV file
- For each domain, sends an “analyze this company + find competitors” prompt to AI (/api/ai/complete, webSearch: true)
- Writes the company information returned by AI to the export_ai_companies table in Supabase (upsert)
Usage: node processor.js (from terminal)
UI equivalent: Discovery page (one at a time). Batch discovery button from Operations page (planned).
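The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual processor.js: the CSV layout (one domain per row, optional header), the `prompt` field name, the prompt wording, and the helper names `parseDomains` and `buildAnalyzeRequest` are all assumptions. Only the `webSearch: true` flag comes from the description above.

```typescript
// Hypothetical sketch of the processor.js flow, not the project's actual code.

interface CompleteRequest {
  prompt: string; // assumed field name
  webSearch: boolean; // flag named in the steps above
}

// Assumption: one domain per row in the first column, with an optional
// header row whose first cell is "domain".
function parseDomains(csv: string): string[] {
  const cells = csv
    .split(/\r?\n/)
    .map((line) => (line.split(",")[0] ?? "").trim().toLowerCase())
    .filter((cell) => cell !== "" && cell !== "domain");
  // Drop duplicates while keeping first-seen order.
  return cells.filter((d, i) => cells.indexOf(d) === i);
}

// The prompt wording is illustrative only.
function buildAnalyzeRequest(domain: string): CompleteRequest {
  return {
    prompt: `Analyze this company and find its competitors: ${domain}`,
    webSearch: true,
  };
}

const domains = parseDomains("domain\nexample.com\nExample.com\n\nacme.test\n");
const requests = domains.map(buildAnalyzeRequest);
console.log(requests);
```

Deduplicating before the AI call keeps a repeated CSV row from triggering a second paid request for the same domain.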
3.2 finder.js — Batch Contact Finding
Input: Companies in the DB (those with no contacts found yet).
What it does:
- Pulls companies from the export_ai_companies table where last_headhunt_at IS NULL
- For each company, sends a “find decision makers for this company” prompt to AI (/api/ai/complete, webSearch: true)
- Writes found contacts to the export_ai_contacts table
- Updates the company’s last_headhunt_at field
Usage: node finder.js (from terminal)
UI equivalent: “Headhunt” button on the Companies page (one at a time). Batch headhunt from Operations page (planned).
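The selection and bookkeeping in this flow can be illustrated on an in-memory list. The sketch below is an assumption-laden stand-in for the real database query: `CompanyRow`, `selectPendingHeadhunt`, and `markHeadhunted` are hypothetical names, and only the `last_headhunt_at` column comes from the description above.

```typescript
// Hypothetical in-memory model of the finder.js selection logic.

interface CompanyRow {
  id: string;
  name: string;
  last_headhunt_at: string | null; // column named in the steps above
}

// Mirrors the "last_headhunt_at IS NULL" condition.
function selectPendingHeadhunt(companies: CompanyRow[]): CompanyRow[] {
  return companies.filter((c) => c.last_headhunt_at === null);
}

// Returns a copy with last_headhunt_at stamped, as the final step does.
function markHeadhunted(company: CompanyRow, now: Date): CompanyRow {
  return { ...company, last_headhunt_at: now.toISOString() };
}

const companies: CompanyRow[] = [
  { id: "c1", name: "Alpha", last_headhunt_at: null },
  { id: "c2", name: "Beta", last_headhunt_at: "2026-01-10T00:00:00.000Z" },
];

const pending = selectPendingHeadhunt(companies);
const updated = pending.map((c) => markHeadhunted(c, new Date("2026-03-25T00:00:00Z")));
console.log(updated);
```

Stamping the timestamp is what makes the script safe to re-run: a company that has been processed no longer matches the selection.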
3.3 bulk_fixer.js — Batch Company Enrichment
Input: Companies in the DB (those with missing information — no website, no description, etc.).
What it does:
- Pulls companies with incomplete data from the export_ai_companies table
- For each company, sends an “enrich this company’s information” prompt to AI (/api/ai/complete, webSearch: true)
- Updates the company record with returned information (website, description, segment, city, etc.)
Usage: node bulk_fixer.js (from terminal)
UI equivalent: None (planned — Operations page).
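One way to picture the enrichment step is a merge that fills only the blanks. The sketch below assumes that policy (the source does not say whether existing values may be overwritten), and the names `missingFields` and `mergeEnrichment` are hypothetical. The four fields are the ones listed above.

```typescript
// Hypothetical sketch of a "fill the blanks" enrichment merge.

const FIELDS = ["website", "description", "segment", "city"] as const;
type Field = (typeof FIELDS)[number];
type CompanyInfo = Record<Field, string | null>;

function isBlank(value: string | null | undefined): boolean {
  return value === null || value === undefined || value.trim() === "";
}

// Lists which of the tracked fields still need enrichment.
function missingFields(company: CompanyInfo): Field[] {
  return FIELDS.filter((f) => isBlank(company[f]));
}

// Assumption: enrichment fills blank fields and never overwrites existing values.
function mergeEnrichment(current: CompanyInfo, enriched: Partial<CompanyInfo>): CompanyInfo {
  const merged: CompanyInfo = { ...current };
  for (const f of FIELDS) {
    const candidate = enriched[f];
    if (isBlank(merged[f]) && !isBlank(candidate)) {
      merged[f] = candidate as string;
    }
  }
  return merged;
}

const current: CompanyInfo = { website: null, description: "Dye supplier", segment: "S2", city: "" };
const fixed = mergeEnrichment(current, {
  website: "https://example.com",
  description: "Other text",
  city: "Cairo",
});
console.log(missingFields(current), fixed);
```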
3.4 cleaner.js — Segment Audit & Cleanup
Input: All companies in the DB.
What it does:
- Pulls all companies from the export_ai_companies table (organization-scoped)
- Sends companies to AI in batches of 25 (/api/ai/classify)
- AI assigns each company a segment: S1 (Machinery), S2 (Chemicals), S3 (Spare Parts), or IRRELEVANT
- Reports those marked as IRRELEVANT (deletion decision is up to the user)
- Writes segment changes to the DB
Usage: node cleaner.js (from terminal)
UI equivalent: None (planned — Operations page).
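The audit step, comparing AI-assigned segments with stored ones, could look roughly like this. It is a sketch under assumptions: `auditSegments` and its result shape are hypothetical, and it treats IRRELEVANT as both a reported item and an ordinary segment change, which the source does not spell out.

```typescript
// Hypothetical sketch of turning classification output into an audit report.

type Segment = "S1" | "S2" | "S3" | "IRRELEVANT";

interface Classified {
  id: string;
  segment: Segment;
}

interface AuditResult {
  changes: { id: string; from: string | null; to: Segment }[];
  irrelevant: string[]; // reported only; deletion stays a user decision
}

function auditSegments(
  current: Record<string, string | null>,
  classified: Classified[],
): AuditResult {
  const result: AuditResult = { changes: [], irrelevant: [] };
  for (const row of classified) {
    // Ignore ids that are not in the current snapshot.
    if (!Object.prototype.hasOwnProperty.call(current, row.id)) continue;
    const from = current[row.id] ?? null;
    if (row.segment === "IRRELEVANT") result.irrelevant.push(row.id);
    if (from !== row.segment) result.changes.push({ id: row.id, from, to: row.segment });
  }
  return result;
}

const audit = auditSegments(
  { c1: "S1", c2: "S2", c3: null },
  [
    { id: "c1", segment: "S1" },
    { id: "c2", segment: "IRRELEVANT" },
    { id: "c3", segment: "S3" },
    { id: "c9", segment: "S1" },
  ],
);
console.log(audit);
```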
3.5 scraper/ (Python) — File Parse & Normalize
Input: PDF, Excel, or Word files (trade fair catalogs, distributor lists, etc.).
What it does:
- scrape.py runs from the CLI and takes a file path
- parsers/ — Selects the appropriate parser by file type (pdf, excel, word)
- core/detector.py — Extracts company information from raw lines
- core/keyword_filter.py — Filters by keywords defined in config.yaml
- core/llm_normalizer.py — Normalizes ambiguous lines via AI (/api/ai/classify)
- core/exporter.py — Outputs results as JSON
UI equivalent: Scraper page — upload file → approve on review screen → “Push to DB” to write to Supabase. This flow is already available in the UI.
4. AI Client Architecture
All AI calls go through a central client. Direct SDK calls are prohibited.
+----------------------+
| lib/ai/client.ts |
| complete() |
| classifyBatch() |
+------+---------------+
| resolveProvider()
+------------+----------------+
v v v v
+--------+ +----------+ +-----------+ +--------+
| Gemini | |Anthropic | |Perplexity | | OpenAI |
| 2.5 | | Claude | | Sonar | | GPT-4o |
| Flash | | Sonnet | | Pro | | |
+--------+ +----------+ +-----------+ +--------+
webSearch webSearch webSearch webSearch
YES NO YES NO
Provider selection priority (top to bottom):
- Explicit provider parameter in body (from UI dropdown or script)
- AI_PROVIDER env variable
- Default: gemini
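The priority order above can be expressed as a small pure function. This is a sketch, not the code in lib/ai/client.ts: the names `resolveProvider` and `AIProviderName` appear in this document, but the signature, the lowercase identifiers other than gemini, and the choice to ignore unknown values instead of raising an error are assumptions.

```typescript
// Hypothetical sketch of the provider selection priority.

type AIProviderName = "gemini" | "anthropic" | "perplexity" | "openai";

const PROVIDERS: readonly AIProviderName[] = ["gemini", "anthropic", "perplexity", "openai"];

function isProvider(value: unknown): value is AIProviderName {
  return typeof value === "string" && (PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

// Priority: request body, then AI_PROVIDER env variable, then "gemini".
// Assumption: unrecognized values are skipped rather than rejected.
function resolveProvider(bodyProvider: unknown, envProvider: string | undefined): AIProviderName {
  if (isProvider(bodyProvider)) return bodyProvider;
  if (isProvider(envProvider)) return envProvider;
  return "gemini";
}

console.log(resolveProvider("perplexity", "openai"));
console.log(resolveProvider(undefined, "openai"));
console.log(resolveProvider(undefined, undefined));
```

Passing the env value in as a parameter keeps the function testable without touching the process environment.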
Tasks requiring webSearch: Discovery, Headhunt, bulk_fixer → uses Perplexity or Gemini grounding.
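The webSearch row of the diagram can be encoded as a capability map. The sketch below is illustrative: the fallback to a web-capable provider is an assumption (the source only states which providers are used for webSearch tasks), and `pickWebSearchProvider` is a hypothetical name.

```typescript
// Capability flags taken from the diagram above (YES / NO per provider).
const SUPPORTS_WEB_SEARCH = {
  gemini: true,
  anthropic: false,
  perplexity: true,
  openai: false,
} as const;

type ProviderName = keyof typeof SUPPORTS_WEB_SEARCH;

// Assumption: if the preferred provider cannot search the web,
// fall back to a web-capable one instead of failing immediately.
function pickWebSearchProvider(
  preferred: ProviderName,
  fallback: ProviderName = "gemini",
): ProviderName {
  if (SUPPORTS_WEB_SEARCH[preferred]) return preferred;
  if (SUPPORTS_WEB_SEARCH[fallback]) return fallback;
  throw new Error(`neither "${preferred}" nor "${fallback}" supports webSearch`);
}

console.log(pickWebSearchProvider("anthropic"));
console.log(pickWebSearchProvider("perplexity"));
```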
Batch classification: companies are sent in groups of 25 with a 2-second wait between groups, which protects against provider rate limits.
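A minimal sketch of that grouping and pacing follows. The defaults (25 and 2000 ms) come from the text; `chunk`, `classifyInGroups`, and the injectable `wait` parameter are hypothetical and are not the signature of classifyBatch() in lib/ai/client.ts.

```typescript
// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical runner: classifies group by group and pauses between groups.
async function classifyInGroups<T, R>(
  items: T[],
  classifyGroup: (group: T[]) => Promise<R[]>,
  groupSize = 25,
  waitMs = 2000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<R[]> {
  const results: R[] = [];
  const groups = chunk(items, groupSize);
  let done = 0;
  for (const group of groups) {
    results.push(...(await classifyGroup(group)));
    done += 1;
    if (done < groups.length) await wait(waitMs); // no wait after the last group
  }
  return results;
}

// Usage with a stub classifier and no real delay.
classifyInGroups(["a", "b", "c"], async (g) => g.map((x) => x.toUpperCase()), 2, 0)
  .then((labels) => console.log(labels));
```

Processing groups sequentially instead of with Promise.all is deliberate here: parallel requests would defeat the pause between groups.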
5. Data Flow Diagram
FILE (PDF/Excel) --> Scraper --> Review UI --> DB push
|
CSV file --> processor.js --> AI discovery --> DB upsert
|
Keyword + Country --> Discovery UI --> AI discovery --> DB upsert
|
v
+-------------+
| companies |
| table |
+------+------+
|
+---------------+---------------+
v v v
+----------+ +-----------+ +----------+
| cleaner | | bulk_fixer| | finder/ |
| (audit) | | (enrich) | | headhunt |
+----+-----+ +-----+-----+ +----+-----+
| | |
v v v
Segment update Info update contacts table
6. Environment Variables
.env.local
NEXT_PUBLIC_SUPABASE_URL=... # Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEY=... # Supabase anon key
GOOGLE_API_KEY=... # Gemini API key
PPLX_API_KEY=... # Perplexity API key
OPENAI_API_KEY=... # OpenAI API key (optional)
ANTHROPIC_API_KEY=... # Anthropic API key (optional)
AI_PROVIDER=gemini # Default AI provider
AI_BATCH_SIZE=25 # Batch classification group size
AI_INTERNAL_API_KEY=... # Password for scripts/ batch jobs to access the API
scripts/.env
SUPABASE_URL=... # Supabase project URL
SUPABASE_KEY=... # Supabase service key
NEXTJS_API_URL=http://localhost:3000 # Next.js API base URL
AI_INTERNAL_API_KEY=... # Must be SAME as in .env.local
AI_PROVIDER=gemini # Default provider on this side
What is AI_INTERNAL_API_KEY? A password you generate yourself. scripts/ batch jobs send HTTP requests to Next.js API routes — this key prevents random external callers from hitting those routes. Both sides (.env.local + scripts/.env) must have the same value.
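Both halves of this handshake can be sketched as pure functions. Everything below is illustrative: the header name `x-internal-api-key` is hypothetical (the source does not say how the key is transmitted), as are `buildAIRequest`, `keysMatch`, and the request body shape. The real logic lives in scripts/lib/api.js and lib/api/validateApiKey.ts and may differ.

```typescript
// Hypothetical header name; the source does not specify one.
const INTERNAL_KEY_HEADER = "x-internal-api-key";

interface ScriptEnv {
  NEXTJS_API_URL: string;
  AI_INTERNAL_API_KEY: string;
  AI_PROVIDER?: string;
}

interface OutgoingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// scripts/ side: describe the POST that a callAI()-style helper would send.
function buildAIRequest(
  env: ScriptEnv,
  path: string,
  payload: Record<string, unknown>,
): OutgoingRequest {
  return {
    url: env.NEXTJS_API_URL.replace(/\/+$/, "") + path,
    method: "POST",
    headers: {
      "content-type": "application/json",
      [INTERNAL_KEY_HEADER]: env.AI_INTERNAL_API_KEY,
    },
    body: JSON.stringify({ provider: env.AI_PROVIDER, ...payload }),
  };
}

// Next.js side: compare keys without exiting early on the first mismatch.
function keysMatch(provided: string | undefined, expected: string | undefined): boolean {
  if (!provided || !expected || provided.length !== expected.length) return false;
  let diff = 0;
  for (let i = 0; i < expected.length; i++) {
    diff |= provided.charCodeAt(i) ^ expected.charCodeAt(i);
  }
  return diff === 0;
}

const req = buildAIRequest(
  { NEXTJS_API_URL: "http://localhost:3000/", AI_INTERNAL_API_KEY: "secret", AI_PROVIDER: "gemini" },
  "/api/ai/complete",
  { prompt: "hello", webSearch: true },
);
console.log(req.url, keysMatch(req.headers[INTERNAL_KEY_HEADER], "secret"));
```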
7. Segment System
Handbook: docs/handbook/02-ai-discovery-pipeline.md Ch. 17 (segment definitions — abstract in Founder Bible)
Handbook: docs/handbook/09-database-schema.md Ch. 151 (company_segments table)
Companies are divided into 4 segments:
| Code | Label | Description |
|---|---|---|
| S1 | Machinery | Sewing, cutting, embroidery, ironing/pressing machines |
| S2 | Chemicals | Dyes, washing agents, adhesives |
| S3 | Spare Parts | Needles, bobbins, folders, accessories |
| IRRELEVANT | Irrelevant | To be deleted — non-textile companies |
Current state: Segment definitions have been moved to the export_ai_segments table (SEG-A/B/C/D completed). Managed from the admin panel (/admin/segments). Automatically injected into AI prompts (lib/prompts.ts + fetchSegmentRules()).
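To illustrate what injecting DB-driven segments into a prompt might look like, here is a minimal sketch. The row shape (`code`, `label`, `description`), the function name `buildSegmentRules`, and the prompt wording are assumptions. The actual format produced by lib/prompts.ts and fetchSegmentRules() is not shown in this document.

```typescript
// Hypothetical row shape for export_ai_segments; column names are assumed.
interface SegmentRow {
  code: string;
  label: string;
  description: string;
}

// Renders segment rows into a rules block that can be appended to a prompt.
function buildSegmentRules(segments: SegmentRow[]): string {
  const lines = segments.map((s) => `- ${s.code} (${s.label}): ${s.description}`);
  const codes = segments.map((s) => s.code).join(", ");
  return [
    "Assign exactly one segment code to each company.",
    ...lines,
    `Allowed codes: ${codes}`,
  ].join("\n");
}

// Sample rows copied from the segment table above.
const rules = buildSegmentRules([
  { code: "S1", label: "Machinery", description: "Sewing, cutting, embroidery, ironing/pressing machines" },
  { code: "S2", label: "Chemicals", description: "Dyes, washing agents, adhesives" },
]);
console.log(rules);
```

Because the rules are generated from rows, an admin edit in /admin/segments changes the next prompt without a code deploy.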
Founder Bible note: Segments are kept abstract. In the SaaS transition, each org will define its own segments.
7.1 Company Type (Customer Type — Phase 1.5)
While Segment defines “what the company sells,” Company Type defines “what kind of player it is”:
| Code | Description | DOSE Chemicals | DOSE Home Textiles |
|---|---|---|---|
| distributor | Wholesale distributor, importer | Primary target | Secondary |
| reseller | Retailer, dealer | Secondary | Important |
| end_user | End user (factory, workshop) | Indirect | Primary target |
| manufacturer | Producer | Competitor/partner | Competitor/partner |
| unknown | Not yet classified | Default | Default |
Current state: Company type is stored in the export_ai_companies.company_type column (CTYPE-A completed). Discovery, enrichment, and cleaner prompts perform company_type classification (CTYPE-B/C completed). distributor weight is active in FitScore v2 (SCORE-A). Color-coded badge display on CompanyCard is live (CTYPE-D/E completed).
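Since company_type is filled in by AI prompts, free-form output has to be mapped onto the five codes somehow. The sketch below shows one possible normalizer. The function name and the normalization rules (trim, lowercase, spaces and hyphens to underscores) are assumptions; only the codes and the unknown default come from the table above.

```typescript
// The five codes from the table above.
const COMPANY_TYPES = ["distributor", "reseller", "end_user", "manufacturer", "unknown"] as const;
type CompanyType = (typeof COMPANY_TYPES)[number];

function isCompanyType(value: string): value is CompanyType {
  return (COMPANY_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical normalizer: anything outside the known codes becomes "unknown".
function normalizeCompanyType(raw: unknown): CompanyType {
  if (typeof raw !== "string") return "unknown";
  const key = raw.trim().toLowerCase().replace(/[\s-]+/g, "_");
  return isCompanyType(key) ? key : "unknown";
}

console.log(normalizeCompanyType("End User"));
console.log(normalizeCompanyType("wholesaler"));
```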
8. Multi-Tenant Architecture
Handbook: docs/handbook/03-system-architecture.md Ch. 39 (multi-tenant model)
Handbook: docs/handbook/09-database-schema.md Ch. 166 (RLS strategy)
Strategy: docs/strategy/01-auth-multi-tenant.md (R1-1 — 14 atomic tasks)
The system currently has a single organization (DOSE Chemicals): cd3c0336-da2b-4cb6-a35d-ad43563b87f2
Every DB query must include an organization_id filter. Currently hardcoded; after R1-1 (Auth & Multi-Tenant) is completed, it will be sourced from session.activeOrgId.
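The transition from a hardcoded ID to a session-sourced one can be pictured with a small sketch. It is illustrative only: `resolveOrganizationId` and `scopeToOrganization` are hypothetical names, the filter runs on an in-memory array where the real code would filter in the database query, and falling back to the hardcoded ID when no session exists reflects the current pre-R1-1 state, not the target design.

```typescript
// ID quoted above for DOSE Chemicals.
const HARDCODED_ORGANIZATION_ID = "cd3c0336-da2b-4cb6-a35d-ad43563b87f2";

interface Session {
  activeOrgId: string; // field named in the text above
}

// Transitional assumption: use the session when present, else the hardcoded ID.
function resolveOrganizationId(session?: Session | null): string {
  return session?.activeOrgId ?? HARDCODED_ORGANIZATION_ID;
}

// In-memory stand-in for an organization_id filter on a DB query.
function scopeToOrganization<T extends { organization_id: string }>(
  rows: T[],
  organizationId: string,
): T[] {
  return rows.filter((row) => row.organization_id === organizationId);
}

const rows = [
  { id: "c1", organization_id: HARDCODED_ORGANIZATION_ID },
  { id: "c2", organization_id: "another-org" },
];
console.log(scopeToOrganization(rows, resolveOrganizationId()));
```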
Changes coming with R1-1:
- Supabase Auth (email + Google + Apple OAuth)
- export_ai_organization_members table (owner/admin/member/viewer roles)
- JWT-based RLS policies (temp_dev_anon_access will be removed)
- lib/auth/session.ts — getSession(), requireAuth(), requireRole()
- All hardcoded ORGANIZATION_ID → sourced from session
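A role check along the lines of requireRole() might look like the sketch below. The four role names come from the list above. The ranking (owner above admin above member above viewer), the function signatures, and `ForbiddenError` are assumptions, since lib/auth/session.ts is planned work and its API is not documented here.

```typescript
// Assumed ranking of the four roles; higher numbers grant more access.
const ROLE_RANK = { viewer: 0, member: 1, admin: 2, owner: 3 } as const;
type Role = keyof typeof ROLE_RANK;

// Hypothetical error type for failed role checks.
class ForbiddenError extends Error {
  constructor(required: Role, actual: Role) {
    super(`requires role "${required}" or higher, got "${actual}"`);
    this.name = "ForbiddenError";
  }
}

function hasRole(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}

// Throws when the caller's role is below the required one.
function requireRole(actual: Role, required: Role): void {
  if (!hasRole(actual, required)) {
    throw new ForbiddenError(required, actual);
  }
}

requireRole("admin", "member"); // passes silently
console.log(hasRole("viewer", "member"));
```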
In the SaaS model, each organization:
- Sees only its own companies
- Defines its own segments
- Selects its own target company_types (distributor, end_user, etc.)
- Uses its own API keys
DB Schema Status
Current: 18 tables (RLS active). After Ring 1 completion, ~28 tables.
| Status | Tables |
|---|---|
| Current (18) | organizations, profiles, companies, contacts, leads, lead_contacts, interactions, tasks, products, quotes, quote_items, segments, company_scores, search_history, search_results, search_queries, search_feedback, ai_job_runs |
| R1-1 Auth | organization_members (profiles deprecated) |
| R1-2 Cost | usage_daily (ai_job_runs update) |
| R1-4 Billing | plan_limits, credit_wallets, credit_transactions, usage_monthly, subscriptions |
| R1-6 Monitor | activity_log |
| R2-1 Pipeline | market_context |
| R2-3 Dashboard | api_metrics |
All tables use the export_ai_ prefix. Details: docs/handbook/09-database-schema.md.
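The prefix convention maps the short names in the table above to physical table names. A tiny helper makes the rule explicit. `tableName` is a hypothetical function, not one named in this document.

```typescript
const TABLE_PREFIX = "export_ai_";

// Maps a short name from the status table to its physical table name.
// Names that already carry the prefix are returned unchanged.
function tableName(shortName: string): string {
  return shortName.indexOf(TABLE_PREFIX) === 0 ? shortName : TABLE_PREFIX + shortName;
}

console.log(tableName("companies"));
console.log(tableName("ai_job_runs"));
```

Note that ai_job_runs becomes export_ai_ai_job_runs, with ai appearing twice, which matches the table list in the component map.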