Cernio — System Architecture
This document explains how the system works, the relationships between components, and data flow.
Last updated: 2026-03-25
Handbook References:
- docs/handbook/03-system-architecture.md: 5-layer architecture, API design, multi-tenant, caching (Ch. 27-50)
- docs/handbook/11-infrastructure-devops.md: Docker, worker, deployment, monitoring (Ch. 189-210)
- docs/handbook/13-discovery-code-architecture.md: Orchestrator pattern, API route, scoring (Ch. 226-252)
- docs/handbook/09-database-schema.md: 22 table details, indexing, RLS (Ch. 136-168)
1. Overview
Cernio is a B2B SaaS product that discovers, classifies, enriches, and scores export distributors. DOSE Chemicals is Customer #1 (super admin and first tenant). It consists of two sub-projects:
| Sub-Project | Technology | Execution Model | Purpose |
|---|---|---|---|
| Root | Next.js 16 (TypeScript, React 19) | Web app (browser) | UI + API routes |
| scripts/ | Node.js + Python | Terminal (node script.js) | Batch operations |
scripts/ contains console-app batch jobs. Both sub-projects write to the same database and use the same AI service.
2. Component Map
3. scripts/ Batch Scripts — What They Do, How They Work
3.1 processor.js — Batch Company Processing from CSV
Input: egyptcompany.csv (or another CSV), a list of company domains.
What it does:
- Reads the CSV file
- For each domain, sends an “analyze this company + find competitors” prompt to AI (/api/ai/complete, webSearch: true)
- Writes the company information returned by AI to the export_ai_companies table in Supabase (upsert)
node processor.js (from terminal)
UI equivalent: Discovery page (one company at a time). A batch discovery button on the Operations page is planned.
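A minimal TypeScript sketch of the per-domain loop in processor.js (the script itself is plain Node.js). It assumes a CSV whose first column holds the domain, with an optional "domain" header row. The helper names parseDomains and buildCompletionRequest and the prompt field in the request body are hypothetical; only the endpoint /api/ai/complete and webSearch: true come from this document.

```typescript
// Hypothetical request body for /api/ai/complete. Only webSearch: true is
// documented; the prompt field name is an assumption.
interface CompletionRequest {
  prompt: string;
  webSearch: boolean;
}

// Read domains from the first CSV column, skipping blank lines,
// an optional "domain" header row, and duplicate domains.
function parseDomains(csv: string): string[] {
  const seen = new Set<string>();
  const domains: string[] = [];
  for (const line of csv.split(/\r?\n/)) {
    const [head = ""] = line.split(",");
    const domain = head.trim().toLowerCase();
    if (domain === "" || domain === "domain" || seen.has(domain)) continue;
    seen.add(domain);
    domains.push(domain);
  }
  return domains;
}

// Build the "analyze this company + find competitors" request for one domain.
function buildCompletionRequest(domain: string): CompletionRequest {
  return {
    prompt: `Analyze the company at ${domain} and find its competitors.`,
    webSearch: true,
  };
}

const csv = "domain\nexample.com\n\nacme.eg,extra\nexample.com\n";
const requests = parseDomains(csv).map(buildCompletionRequest);
console.log(requests.length); // 2
```

In the script, each request would then be POSTed to /api/ai/complete and the returned company fields upserted into export_ai_companies; those steps are left out because the response shape is not documented here. Deduplicating domains up front keeps a repeated CSV row from triggering a second AI call.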
3.2 finder.js — Batch Contact Finding (Headhunt)
Input: Companies in the DB (those without contacts found yet).
What it does:
- Pulls companies from the export_ai_companies table where last_headhunt_at IS NULL
- For each company, sends a “find decision makers for this company” prompt to AI (/api/ai/complete, webSearch: true)
- Writes found contacts to the export_ai_contacts table
- Updates the company’s last_headhunt_at field
node finder.js (from terminal)
UI equivalent: “Headhunt” button on the Companies page (one company at a time). Batch headhunt from the Operations page is planned.
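The selection and bookkeeping in finder.js can be sketched in memory as follows. The row shapes are assumptions: the contact fields full_name and title are hypothetical, and selectPending and recordHeadhunt are illustrative names, not functions from the codebase.

```typescript
interface CompanyRow {
  id: string;
  name: string;
  last_headhunt_at: string | null; // ISO timestamp; null means never headhunted
}

// Hypothetical contact shape written to export_ai_contacts.
interface ContactRow {
  company_id: string;
  full_name: string;
  title: string;
}

// In-memory equivalent of: WHERE last_headhunt_at IS NULL
function selectPending(companies: CompanyRow[]): CompanyRow[] {
  return companies.filter((c) => c.last_headhunt_at === null);
}

// Attach found contacts to the company and stamp last_headhunt_at.
// Assumption: the stamp is written even when no contacts are found,
// so the company is not picked up again on the next run.
function recordHeadhunt(
  company: CompanyRow,
  found: Omit<ContactRow, "company_id">[],
  now: Date,
): { company: CompanyRow; contacts: ContactRow[] } {
  return {
    company: { ...company, last_headhunt_at: now.toISOString() },
    contacts: found.map((f) => ({ ...f, company_id: company.id })),
  };
}
```

With supabase-js, the same selection would be expressed with the .is("last_headhunt_at", null) filter. Stamping the field regardless of the result (an assumption of this sketch) prevents companies with no discoverable contacts from being re-queued on every run.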
3.3 bulk_fixer.js — Batch Company Enrichment
Input: Companies in the DB (those with missing information: no website, no description, etc.).
What it does:
- Pulls companies with incomplete data from the export_ai_companies table
- For each company, sends an “enrich this company’s information” prompt to AI (/api/ai/complete, webSearch: true)
- Updates the company record with returned information (website, description, segment, city, etc.)
node bulk_fixer.js (from terminal)
UI equivalent: None yet (planned for the Operations page).
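A sketch of the gap detection and update step of bulk_fixer.js. It checks only the four fields this section names (the real script may check more) and assumes existing values are never overwritten by AI output; missingFields and buildPatch are hypothetical names.

```typescript
// Fields named in this section; the real script may check more ("etc.").
const ENRICHABLE = ["website", "description", "segment", "city"] as const;
type EnrichableField = (typeof ENRICHABLE)[number];

type CompanyRecord = { id: string } & Partial<Record<EnrichableField, string | null>>;

function isBlank(v: string | null | undefined): boolean {
  return v === undefined || v === null || v.trim() === "";
}

// Which of the documented fields are still empty for this company.
function missingFields(c: CompanyRecord): EnrichableField[] {
  return ENRICHABLE.filter((f) => isBlank(c[f]));
}

// Build the update patch: fill only empty fields from the AI response
// (assumption: existing values are kept), trimming whitespace.
function buildPatch(
  c: CompanyRecord,
  ai: Partial<Record<EnrichableField, string>>,
): Partial<Record<EnrichableField, string>> {
  const patch: Partial<Record<EnrichableField, string>> = {};
  for (const f of missingFields(c)) {
    const value = ai[f];
    if (value !== undefined && value.trim() !== "") patch[f] = value.trim();
  }
  return patch;
}
```

Sending only the patch keeps manual edits intact; if bulk_fixer.js instead overwrites fields, buildPatch would simply copy every non-empty AI value.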
3.4 cleaner.js — Segment Audit & Cleanup
Input: All companies in the DB.
What it does:
- Pulls all companies from the export_ai_companies table (organization-scoped)
- Sends companies to AI in batches of 25 (/api/ai/classify)
- AI assigns each company a segment: S1 (Machinery), S2 (Chemicals), S3 (Spare Parts), or IRRELEVANT
- Reports those marked as IRRELEVANT (deletion decision is up to the user)
- Writes segment changes to the DB
node cleaner.js (from terminal)
UI equivalent: None yet (planned for the Operations page).
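The batching and reporting steps of cleaner.js, sketched in TypeScript. The batch size of 25 and the four segment codes come from this section; chunk and planCleanup are hypothetical, and the rules that only changed segments are written and that IRRELEVANT is persisted like any other segment are assumptions.

```typescript
type Segment = "S1" | "S2" | "S3" | "IRRELEVANT";

interface Classification {
  companyId: string;
  segment: Segment;
}

// Split companies into fixed-size batches (25 per /api/ai/classify call).
function chunk<T>(items: T[], size = 25): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Split AI results into DB updates (changed segments only) and a report of
// IRRELEVANT companies. Nothing is deleted; that decision stays with the user.
function planCleanup(
  current: Map<string, Segment | null>,
  results: Classification[],
): { updates: Classification[]; irrelevant: string[] } {
  const updates = results.filter((r) => current.get(r.companyId) !== r.segment);
  const irrelevant = results
    .filter((r) => r.segment === "IRRELEVANT")
    .map((r) => r.companyId);
  return { updates, irrelevant };
}
```

Batching keeps each classify prompt bounded in size, and returning the IRRELEVANT list separately matches the rule that deletion is a user decision.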
3.5 scraper/ (Python) — File Parse & Normalize
Input: PDF, Excel, or Word files (trade fair catalogs, distributor lists, etc.).
What it does:
- scrape.py runs from the CLI and takes a file path
- parsers/ selects the appropriate parser by file type (pdf, excel, word)
- core/detector.py extracts company information from raw lines
- core/keyword_filter.py filters by keywords defined in config.yaml
- core/llm_normalizer.py normalizes ambiguous lines via AI (/api/ai/classify)
- core/exporter.py outputs results as JSON
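The scraper is Python; to keep all examples in one language, this TypeScript sketch only illustrates two of its stages: choosing a parser by file extension (the role of parsers/) and keyword filtering (the role of core/keyword_filter.py). The accepted extensions and the case-insensitive, match-any filter semantics are assumptions, and pickParser and keywordFilter are hypothetical names.

```typescript
type ParserKind = "pdf" | "excel" | "word";

// Choose a parser by file extension. The extension list is an assumption.
function pickParser(path: string): ParserKind {
  const ext = path.slice(path.lastIndexOf(".") + 1).toLowerCase();
  switch (ext) {
    case "pdf":
      return "pdf";
    case "xls":
    case "xlsx":
      return "excel";
    case "doc":
    case "docx":
      return "word";
    default:
      throw new Error(`Unsupported file type: ${path}`);
  }
}

// Keep raw lines that contain at least one configured keyword
// (case-insensitive substring match; assumed semantics).
function keywordFilter(lines: string[], keywords: string[]): string[] {
  const needles = keywords.map((k) => k.toLowerCase());
  return lines.filter((line) => {
    const hay = line.toLowerCase();
    return needles.some((n) => hay.includes(n));
  });
}
```

Filtering before the LLM normalizer means only plausibly relevant lines reach /api/ai/classify, which keeps AI calls proportional to useful content rather than catalog size.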
4. AI Client Architecture
All AI calls go through a central client; direct SDK calls are prohibited. The provider is chosen in this order of precedence:
- Explicit provider parameter in the request body (from the UI dropdown or a script)
- AI_PROVIDER environment variable
- Default: gemini
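The provider precedence can be expressed as a small pure function. resolveProvider is a hypothetical name, and treating an empty or whitespace-only value as unset is an assumption.

```typescript
// Resolve the AI provider: explicit body parameter, then AI_PROVIDER,
// then the "gemini" default. Empty strings count as unset (assumption).
function resolveProvider(
  bodyProvider: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const explicit = bodyProvider?.trim();
  if (explicit) return explicit;
  const fromEnv = env.AI_PROVIDER?.trim();
  if (fromEnv) return fromEnv;
  return "gemini";
}
```

In an API route this would be called as resolveProvider(body.provider, process.env). Passing the environment in as a parameter keeps the precedence testable without mutating real environment variables.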
5. Data Flow Diagram
6. Environment Variables
- .env.local (root Next.js app)
- scripts/.env (batch scripts)
7. Segment System
Handbook: docs/handbook/02-ai-discovery-pipeline.md Ch. 17 (segment definitions; abstract in the Founder Bible)
Handbook: docs/handbook/09-database-schema.md Ch. 151 (company_segments table)
Companies are divided into 4 segments:
| Code | Label | Description |
|---|---|---|
| S1 | Machinery | Sewing, cutting, embroidery, ironing/pressing machines |
| S2 | Chemicals | Dyes, washing agents, adhesives |
| S3 | Spare Parts | Needles, bobbins, folders, accessories |
| IRRELEVANT | Irrelevant | To be deleted — non-textile companies |
Segments are stored in the export_ai_segments table (SEG-A/B/C/D completed), managed from the admin panel (/admin/segments), and automatically injected into AI prompts (lib/prompts.ts + fetchSegmentRules()).
Founder Bible note: segments are kept abstract; in the SaaS transition, each org will define its own segments.
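A sketch of how segment rows might be rendered into prompt text before injection. The SegmentRule shape, the header wording, and formatSegmentRules are hypothetical; the real lib/prompts.ts and fetchSegmentRules() may use different fields and phrasing.

```typescript
// Hypothetical shape of one row from export_ai_segments.
interface SegmentRule {
  code: string;        // e.g. "S1" or "IRRELEVANT"
  label: string;       // e.g. "Machinery"
  description: string; // what belongs in the segment
}

// Render segment rows as a prompt block, one rule per line.
function formatSegmentRules(rules: SegmentRule[]): string {
  const lines = rules.map((r) => `- ${r.code} (${r.label}): ${r.description}`);
  return ["Classify each company into exactly one segment:", ...lines].join("\n");
}

const segmentPrompt = formatSegmentRules([
  { code: "S1", label: "Machinery", description: "Sewing, cutting, embroidery, ironing/pressing machines" },
  { code: "IRRELEVANT", label: "Irrelevant", description: "Non-textile companies" },
]);
console.log(segmentPrompt);
```

Keeping the rules as data rather than hard-coded prompt text is what lets each org define its own segments without a code change.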
7.1 Company Type (Customer Type — Phase 1.5)
While Segment defines “what the company sells,” Company Type defines “what kind of player it is”:
| Code | Description | DOSE Chemicals | DOSE Home Textiles |
|---|---|---|---|
| distributor | Wholesale distributor, importer | Primary target | Secondary |
| reseller | Retailer, dealer | Secondary | Important |
| end_user | End user (factory, workshop) | Indirect | Primary target |
| manufacturer | Producer | Competitor/partner | Competitor/partner |
| unknown | Not yet classified | Default | Default |
Company type is stored in the export_ai_companies.company_type column (CTYPE-A completed). Discovery, enrichment, and cleaner prompts perform company_type classification (CTYPE-B/C completed). The distributor weight is active in FitScore v2 (SCORE-A). A color-coded badge on CompanyCard is live (CTYPE-D/E completed).
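Because company_type is produced by AI prompts, the model's output has to be mapped onto the five allowed codes before it is written. This sketch (toCompanyType is a hypothetical name) normalizes case, spaces, and hyphens, and falls back to unknown, the documented default, for anything else.

```typescript
const COMPANY_TYPES = ["distributor", "reseller", "end_user", "manufacturer", "unknown"] as const;
type CompanyType = (typeof COMPANY_TYPES)[number];

function isCompanyType(v: string): v is CompanyType {
  return (COMPANY_TYPES as readonly string[]).includes(v);
}

// Map a raw model label such as "End User" or "end-user" to a valid
// company_type; anything unrecognized becomes "unknown".
function toCompanyType(raw: unknown): CompanyType {
  if (typeof raw !== "string") return "unknown";
  const v = raw.trim().toLowerCase().replace(/[\s-]+/g, "_");
  return isCompanyType(v) ? v : "unknown";
}
```

Validating at the write boundary keeps the column limited to known codes, so scoring weights and badge colors never receive an unexpected value.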
8. Multi-Tenant Architecture
Handbook: docs/handbook/03-system-architecture.md Ch. 39 (multi-tenant model)
Handbook: docs/handbook/09-database-schema.md Ch. 166 (RLS strategy)
Strategy: docs/strategy/01-auth-multi-tenant.md (R1-1, 14 atomic tasks)
Currently a single organization (DOSE Chemicals): cd3c0336-da2b-4cb6-a35d-ad43563b87f2
Every DB query must include an organization_id filter. The organization ID is currently hardcoded; after R1-1 (Auth & Multi-Tenant) is completed, it will be sourced from session.activeOrgId.
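A sketch of the organization-scoping rule. ORGANIZATION_ID, its value, and session.activeOrgId come from this section; getOrganizationId, the authEnabled flag, and scopeToOrg are hypothetical, and the in-memory filter stands in for a database query.

```typescript
interface Session {
  activeOrgId?: string;
}

// Today's hardcoded organization (DOSE Chemicals).
const ORGANIZATION_ID = "cd3c0336-da2b-4cb6-a35d-ad43563b87f2";

// Hypothetical helper: before R1-1 it falls back to the hardcoded ID;
// once auth is enabled, a missing activeOrgId is an error, not a fallback.
function getOrganizationId(session: Session | null, authEnabled: boolean): string {
  if (session?.activeOrgId) return session.activeOrgId;
  if (authEnabled) throw new Error("No active organization in session");
  return ORGANIZATION_ID;
}

// Every read is scoped to one organization (in-memory stand-in for a query).
function scopeToOrg<T extends { organization_id: string }>(rows: T[], orgId: string): T[] {
  return rows.filter((r) => r.organization_id === orgId);
}
```

With supabase-js, the scoping corresponds to adding .eq("organization_id", orgId) to every query; once JWT-based RLS is in place, it would act as a second line of defense if a filter is forgotten.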
Changes coming with R1-1:
- Supabase Auth (email + Google + Apple OAuth)
- export_ai_organization_members table (owner/admin/member/viewer roles)
- JWT-based RLS policies (temp_dev_anon_access will be removed)
- lib/auth/session.ts: getSession(), requireAuth(), requireRole()
- All hardcoded ORGANIZATION_ID values sourced from the session
In the multi-tenant model, each organization:
- Sees only its own companies
- Defines its own segments
- Selects its own target company_types (distributor, end_user, etc.)
- Uses its own API keys
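The requireRole() helper planned for lib/auth/session.ts needs some notion of role ordering. This sketch assumes the four membership roles form a strict hierarchy (owner > admin > member > viewer); hasRole and assertRole are hypothetical names, and the real requireRole() signature is not documented here.

```typescript
type Role = "owner" | "admin" | "member" | "viewer";

// Assumed hierarchy: each role includes the permissions of those below it.
const ROLE_RANK: Record<Role, number> = { viewer: 0, member: 1, admin: 2, owner: 3 };

function hasRole(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}

// Hypothetical counterpart to requireRole(): throws when the member's
// role is below the one the action requires.
function assertRole(actual: Role, required: Role): void {
  if (!hasRole(actual, required)) {
    throw new Error(`Requires ${required}, have ${actual}`);
  }
}
```

A numeric rank keeps each check a single comparison; if the real model turns out to need non-hierarchical permissions, a per-role permission set would replace it.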
DB Schema Status
Current: 18 tables (RLS active). After Ring 1 completion, ~28 tables.
| Status | Tables |
|---|---|
| Current (18) | organizations, profiles, companies, contacts, leads, lead_contacts, interactions, tasks, products, quotes, quote_items, segments, company_scores, search_history, search_results, search_queries, search_feedback, ai_job_runs |
| R1-1 Auth | organization_members (profiles deprecated) |
| R1-2 Cost | usage_daily (ai_job_runs update) |
| R1-4 Billing | plan_limits, credit_wallets, credit_transactions, usage_monthly, subscriptions |
| R1-6 Monitor | activity_log |
| R2-1 Pipeline | market_context |
| R2-3 Dashboard | api_metrics |
All tables use the export_ai_ prefix. Details: docs/handbook/09-database-schema.md.
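The table counts above can be checked with a little arithmetic. This sketch maps logical names to physical ones using the export_ai_ prefix and sums the planned additions from the table; tableName, PLANNED, and countAdded are illustrative names. It counts additions only and does not subtract profiles, which R1-1 deprecates.

```typescript
const PREFIX = "export_ai_";

// Logical table name (as listed above) -> physical Postgres table name.
function tableName(logical: string): string {
  return logical.startsWith(PREFIX) ? logical : PREFIX + logical;
}

// Planned additions per milestone, copied from the table above.
const PLANNED: Record<string, string[]> = {
  "R1-1 Auth": ["organization_members"],
  "R1-2 Cost": ["usage_daily"],
  "R1-4 Billing": ["plan_limits", "credit_wallets", "credit_transactions", "usage_monthly", "subscriptions"],
  "R1-6 Monitor": ["activity_log"],
  "R2-1 Pipeline": ["market_context"],
  "R2-3 Dashboard": ["api_metrics"],
};

const CURRENT_TABLE_COUNT = 18;

// Sum the planned tables whose milestone label passes the filter.
function countAdded(filter: (milestone: string) => boolean): number {
  return Object.entries(PLANNED)
    .filter(([milestone]) => filter(milestone))
    .reduce((n, [, tables]) => n + tables.length, 0);
}

const ringOne = CURRENT_TABLE_COUNT + countAdded((m) => m.startsWith("R1"));
const allListed = CURRENT_TABLE_COUNT + countAdded(() => true);
console.log(ringOne, allListed); // 26 28
console.log(tableName("company_scores")); // export_ai_company_scores
```

By this count, the Ring 1 rows alone add 8 tables (26 in total); the ~28 figure matches when the R2-1 and R2-3 additions are included as well.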