
Cernio — System Architecture

This document explains how the system works, the relationships between its components, and the data flow.

Last updated: 2026-03-25

Handbook references:
  • docs/handbook/03-system-architecture.md — 5-layer architecture, API design, multi-tenant, caching (Ch. 27-50)
  • docs/handbook/11-infrastructure-devops.md — Docker, worker, deployment, monitoring (Ch. 189-210)
  • docs/handbook/13-discovery-code-architecture.md — Orchestrator pattern, API route, scoring (Ch. 226-252)
  • docs/handbook/09-database-schema.md — 22 table details, indexing, RLS (Ch. 136-168)

1. Overview

Cernio is a B2B SaaS product that discovers, classifies, enriches, and scores export distributors. DOSE Chemicals is Customer #1 (super admin and first tenant). The system consists of two sub-projects:

Sub-Project | Technology                        | Execution Model           | Purpose
Root        | Next.js 16 (TypeScript, React 19) | Web app (browser)         | UI + API routes
scripts/    | Node.js + Python                  | Terminal (node script.js) | Batch operations
C# analogy: Root Next.js app = ASP.NET MVC web app, scripts/ = Console App batch jobs. Both write to the same database and use the same AI service.

2. Component Map

+-----------------------------------------------------------+
|                    USER (Browser)                           |
|  Companies | Contacts | Discovery | Leads | Scraper | Admin |
+---------------------------+-------------------------------+
                            | HTTP
+---------------------------v-------------------------------+
|              Next.js 16 App (root)                         |
|                                                            |
|  app/                                                      |
|  +-- companies/        -> Company list, filter, detail     |
|  +-- contacts/         -> Contact directory                |
|  +-- discovery/        -> AI company discovery             |
|  +-- leads/            -> Lead pipeline + detail           |
|  +-- scraper/          -> File upload -> review -> DB push |
|  +-- admin/segments/   -> Segment CRUD management          |
|  +-- api/                                                  |
|      +-- ai/                                               |
|      |   +-- complete/   -> Single AI prompt call          |
|      |   +-- classify/   -> Batch classification           |
|      |   +-- providers/  -> Available provider list        |
|      +-- discover/       -> Company discovery + DB save    |
|      +-- headhunt/       -> Contact finding + DB save      |
|      +-- score/          -> FitScore v2 batch scoring      |
|      +-- leads/          -> Lead CRUD + [id] PATCH         |
|      +-- interactions/   -> Communication log              |
|      +-- tasks/          -> Task CRUD + [id] PATCH         |
|      +-- feedback/       -> Search result feedback         |
|      +-- segments/       -> Segment CRUD                   |
|      +-- prompts/        -> Prompt builder access          |
|      +-- scraper/                                          |
|          +-- run/        -> Trigger Python scraper         |
|          +-- push/       -> Write approved records to DB   |
|                                                            |
|  lib/                                                      |
|  +-- ai/               -> Central AI Client                |
|  |   +-- types.ts        -> Interfaces, AIProviderName     |
|  |   +-- errors.ts       -> Custom error classes           |
|  |   +-- parseJson.ts    -> Parse JSON from AI output      |
|  |   +-- client.ts       -> complete(), classifyBatch()    |
|  |   +-- providers/      -> Gemini, Anthropic, Perplexity, OpenAI |
|  +-- db/               -> DB operation layer               |
|  |   +-- save-companies.ts -> Company upsert + domain update |
|  |   +-- save-contacts.ts  -> Contact upsert (single + batch) |
|  |   +-- save-scores.ts    -> FitScore breakdown save      |
|  |   +-- save-search.ts    -> Search history cache         |
|  |   +-- save-search-queries.ts -> Sub-query save          |
|  |   +-- save-search-results.ts -> Result save             |
|  |   +-- track-ai-job.ts  -> AI job observability          |
|  +-- discovery/        -> Discovery orchestrator           |
|  |   +-- types.ts        -> IDiscoveryInput/Result         |
|  |   +-- run-discovery.ts -> 9-step pipeline               |
|  +-- scoring/          -> FitScore v2                      |
|  |   +-- fitScore.ts     -> 6-factor score calculation     |
|  +-- prompts.ts        -> Central prompt builders          |
|  +-- types/            -> ICompany, IContact, ILead etc.   |
|  +-- constants.ts      -> SEGMENT_MAP, SEGMENT_COLOR       |
|  +-- api/validateApiKey.ts -> Internal key validation      |
+---------------------------+-------------------------------+
                            | Supabase Client
+---------------------------v-------------------------------+
|              Supabase (PostgreSQL) - 18 tables             |
|                                                            |
|  export_ai_organizations  -> SaaS tenants                  |
|  export_ai_profiles       -> Users                         |
|  export_ai_companies      -> Company records (1591)        |
|  export_ai_contacts       -> Contact records (1522)        |
|  export_ai_leads          -> Lead pipeline                 |
|  export_ai_lead_contacts  -> Lead-contact junction         |
|  export_ai_interactions   -> Communication history         |
|  export_ai_tasks          -> Follow-up tasks               |
|  export_ai_products       -> Product portfolio             |
|  export_ai_quotes         -> Quotes                        |
|  export_ai_quote_items    -> Quote line items              |
|  export_ai_segments       -> Segment definitions (DB-driven) |
|  export_ai_company_scores -> FitScore v2 breakdown         |
|  export_ai_search_history -> API cache                     |
|  export_ai_search_results -> Discovery results             |
|  export_ai_search_queries -> Discovery sub-queries         |
|  export_ai_search_feedback -> Result feedback              |
|  export_ai_ai_job_runs   -> AI job observability           |
+---------------------------^-------------------------------+
                            | HTTP POST -> /api/ai/*
+---------------------------+-------------------------------+
|              scripts/ (Node.js + Python CLI)                |
|                                                            |
|  processor.js   -> Read companies from CSV -> AI discovery + analysis |
|  finder.js      -> Batch contact finding for DB companies  |
|  bulk_fixer.js  -> Enrich missing company information      |
|  cleaner.js     -> Segment audit + IRRELEVANT cleanup      |
|  scraper/       -> PDF/Excel/Word -> parse -> normalize    |
|                                                            |
|  lib/api.js     -> callAI() + sleep() shared helpers       |
+-----------------------------------------------------------+

3. scripts/ Batch Scripts — What They Do, How They Work

3.1 processor.js — Batch Company Processing from CSV

Input: egyptcompany.csv (or another CSV) — a list of company domains.

What it does:
  1. Reads the CSV file
  2. For each domain, sends an “analyze this company + find competitors” prompt to AI (/api/ai/complete, webSearch: true)
  3. Writes the company information returned by AI to the export_ai_companies table in Supabase (upsert)

Usage: node processor.js (from terminal)
UI equivalent: Discovery page (one at a time). A batch discovery button on the Operations page is planned.
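The CSV step above can be sketched in TypeScript. This is illustrative only: the column layout and the domain-cleanup rules are assumptions, not taken from the real processor.js.

```typescript
// Hypothetical sketch of processor.js's input step: extract unique,
// normalized domains from raw CSV text. Assumes the domain sits in the
// first column and the first row is a header.
function extractDomains(csvText: string): string[] {
  const seen = new Set<string>();
  for (const line of csvText.split("\n").slice(1)) { // skip header row
    const raw = line.split(",")[0]?.trim().toLowerCase();
    if (!raw) continue;
    // strip protocol, "www." and any path: "https://www.acme.com/about" -> "acme.com"
    const domain = raw
      .replace(/^https?:\/\//, "")
      .replace(/^www\./, "")
      .split("/")[0];
    if (domain.includes(".")) seen.add(domain); // drop rows that are not domains
  }
  return [...seen];
}
```

Deduplicating before the AI loop keeps each domain to a single /api/ai/complete call.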

3.2 finder.js — Batch Contact Finding (Headhunt)

Input: companies in the DB that have no contacts yet.

What it does:
  1. Pulls companies from the export_ai_companies table where last_headhunt_at IS NULL
  2. For each company, sends a “find decision makers for this company” prompt to AI (/api/ai/complete, webSearch: true)
  3. Writes found contacts to the export_ai_contacts table
  4. Updates the company’s last_headhunt_at field
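The selection rule in step 1 can be expressed as a plain filter. The real script queries Supabase directly; the row shape and the batch limit here are assumptions for illustration.

```typescript
// Minimal sketch of finder.js's selection step: companies that have never
// been headhunted are the ones with a NULL last_headhunt_at.
interface CompanyRow {
  id: string;
  name: string;
  last_headhunt_at: string | null; // ISO timestamp, or null if never run
}

// Hypothetical limit parameter; the real script's batch size may differ.
function pendingHeadhunt(rows: CompanyRow[], limit = 50): CompanyRow[] {
  return rows.filter((r) => r.last_headhunt_at === null).slice(0, limit);
}
```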

3.3 bulk_fixer.js — Batch Company Enrichment

Input: companies in the DB with missing information (no website, no description, etc.).

What it does:
  1. Pulls companies with incomplete data from the export_ai_companies table
  2. For each company, sends an “enrich this company’s information” prompt to AI (/api/ai/complete, webSearch: true)
  3. Updates the company record with the returned information (website, description, segment, city, etc.)

Usage: node bulk_fixer.js (from terminal)
UI equivalent: None (planned — Operations page).
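A sensible update rule for step 3 is "fill only what is empty", so enrichment never clobbers existing data. Whether bulk_fixer.js actually follows this rule is an assumption; the field list below is also illustrative.

```typescript
// Hedged sketch of an enrichment merge: AI-returned values fill only the
// fields that are currently missing (null, undefined, or empty string).
type Enrichable = {
  website?: string | null;
  description?: string | null;
  city?: string | null;
};

function fillMissing(existing: Enrichable, enriched: Enrichable): Enrichable {
  const out = { ...existing };
  for (const key of ["website", "description", "city"] as const) {
    if (!out[key] && enriched[key]) out[key] = enriched[key]; // keep non-empty values
  }
  return out;
}
```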

3.4 cleaner.js — Segment Audit & Cleanup

Input: all companies in the DB.

What it does:
  1. Pulls all companies from the export_ai_companies table (organization-scoped)
  2. Sends companies to AI in batches of 25 (/api/ai/classify)
  3. AI assigns each company a segment: S1 (Machinery), S2 (Chemicals), S3 (Spare Parts), or IRRELEVANT
  4. Reports those marked IRRELEVANT (the deletion decision is left to the user)
  5. Writes segment changes to the DB

Usage: node cleaner.js (from terminal)
UI equivalent: None (planned — Operations page).
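The batching rhythm in step 2 (groups of 25 with a pause between calls, matching the rate-limit note in section 4) can be sketched as follows. The classify callback is a hypothetical stand-in for POST /api/ai/classify.

```typescript
// Split a list into fixed-size groups.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Run classification batch-by-batch with a delay between batches,
// as provider rate-limit protection. 25 / 2000ms mirror AI_BATCH_SIZE
// and the 2-second wait documented in section 4.
async function classifyAll<T, R>(
  items: T[],
  classify: (batch: T[]) => Promise<R[]>,
  batchSize = 25,
  delayMs = 2000,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, batchSize)) {
    results.push(...(await classify(batch)));
    await sleep(delayMs);
  }
  return results;
}
```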

3.5 scraper/ (Python) — File Parse & Normalize

Input: PDF, Excel, or Word files (trade fair catalogs, distributor lists, etc.).

What it does:
  1. scrape.py runs from the CLI and takes a file path
  2. parsers/ selects the appropriate parser by file type (pdf, excel, word)
  3. core/detector.py extracts company information from raw lines
  4. core/keyword_filter.py filters by keywords defined in config.yaml
  5. core/llm_normalizer.py normalizes ambiguous lines via AI (/api/ai/classify)
  6. core/exporter.py outputs results as JSON

UI equivalent: Scraper page — upload a file → approve on the review screen → “Push to DB” to write to Supabase. This flow is already available in the UI.
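The keyword-filter idea in step 4 is simple enough to show inline. The real filter lives in Python (core/keyword_filter.py, driven by config.yaml); this is a rough TypeScript rendering of the same idea with an illustrative keyword list.

```typescript
// Keep only the raw lines that mention at least one configured keyword,
// case-insensitively. Matching rules in the real Python filter may differ.
function keywordFilter(lines: string[], keywords: string[]): string[] {
  const lowered = keywords.map((k) => k.toLowerCase());
  return lines.filter((line) => {
    const l = line.toLowerCase();
    return lowered.some((k) => l.includes(k));
  });
}
```

Lines that survive the filter but remain ambiguous are what core/llm_normalizer.py then hands to the AI in step 5.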

4. AI Client Architecture

All AI calls go through a central client. Direct SDK calls are prohibited.
                    +------------------------+
                    |  lib/ai/client.ts      |
                    |  complete()            |
                    |  classifyBatch()       |
                    +-----------+------------+
                                | resolveProvider()
         +-------------+--------+-----+-------------+
         v             v              v             v
    +--------+   +-----------+  +------------+  +--------+
    | Gemini |   | Anthropic |  | Perplexity |  | OpenAI |
    |  2.5   |   |  Claude   |  |   Sonar    |  | GPT-4o |
    | Flash  |   |  Sonnet   |  |    Pro     |  |        |
    +--------+   +-----------+  +------------+  +--------+
    webSearch      webSearch      webSearch     webSearch
       YES            NO             YES            NO
Provider selection priority (top to bottom):
  1. Explicit provider parameter in body (from UI dropdown or script)
  2. AI_PROVIDER env variable
  3. Default: gemini
Tasks requiring webSearch (Discovery, Headhunt, bulk_fixer) use Perplexity or Gemini grounding.
Batch classification runs in groups of 25 with a 2-second wait between groups, as provider rate-limit protection.
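The three-step priority chain above can be sketched as a small pure function. This is illustrative; the actual resolveProvider() in lib/ai/client.ts may differ in signature and validation.

```typescript
type AIProviderName = "gemini" | "anthropic" | "perplexity" | "openai";

const PROVIDERS: AIProviderName[] = ["gemini", "anthropic", "perplexity", "openai"];

// Resolve the provider in documented priority order:
// 1. explicit provider in the request body, 2. AI_PROVIDER env, 3. gemini.
function resolveProvider(
  requested: string | undefined,
  env: Record<string, string | undefined> = process.env,
): AIProviderName {
  if (requested && PROVIDERS.includes(requested as AIProviderName)) {
    return requested as AIProviderName; // 1. explicit body parameter
  }
  const fromEnv = env.AI_PROVIDER;
  if (fromEnv && PROVIDERS.includes(fromEnv as AIProviderName)) {
    return fromEnv as AIProviderName; // 2. AI_PROVIDER env variable
  }
  return "gemini"; // 3. default
}
```

Unknown names fall through to the next level rather than erroring, so a typo in AI_PROVIDER degrades to the default instead of breaking every call.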

5. Data Flow Diagram

FILE (PDF/Excel)  -->  Scraper  -->  Review UI  -->  DB push
                                                        |
CSV file  -->  processor.js  -->  AI discovery  -->  DB upsert
                                                        |
Keyword + Country  -->  Discovery UI  -->  AI discovery  -->  DB upsert
                                                        |
                                                        v
                                                 +-------------+
                                                 |  companies   |
                                                 |  table       |
                                                 +------+------+
                                                        |
                          +---------------+---------------+
                          v               v               v
                    +----------+   +-----------+   +----------+
                    | cleaner  |   | bulk_fixer|   | finder/  |
                    | (audit)  |   | (enrich)  |   | headhunt |
                    +----+-----+   +-----+-----+   +----+-----+
                         |               |              |
                         v               v              v
                    Segment update  Info update   contacts table

6. Environment Variables

.env.local

NEXT_PUBLIC_SUPABASE_URL=...        # Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEY=...   # Supabase anon key
GOOGLE_API_KEY=...                  # Gemini API key
PPLX_API_KEY=...                    # Perplexity API key
OPENAI_API_KEY=...                  # OpenAI API key (optional)
ANTHROPIC_API_KEY=...               # Anthropic API key (optional)
AI_PROVIDER=gemini                  # Default AI provider
AI_BATCH_SIZE=25                    # Batch classification group size
AI_INTERNAL_API_KEY=...             # Password for scripts/ batch jobs to access the API

scripts/.env

SUPABASE_URL=...                    # Supabase project URL
SUPABASE_KEY=...                    # Supabase service key
NEXTJS_API_URL=http://localhost:3000 # Next.js API base URL
AI_INTERNAL_API_KEY=...             # Must be SAME as in .env.local
AI_PROVIDER=gemini                  # Default provider on this side
What is AI_INTERNAL_API_KEY? A shared secret you generate yourself. scripts/ batch jobs send HTTP requests to Next.js API routes; this key prevents random external callers from hitting those routes. Both sides (.env.local and scripts/.env) must have the same value.
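The check this key enables can be sketched as follows. The header name and the exact shape of lib/api/validateApiKey.ts are assumptions; only the shared-secret comparison is taken from the text above.

```typescript
// Hedged sketch of internal-key validation for the scripts/ -> API path.
// Assumes the key arrives in an "x-api-key" request header.
function isAuthorized(
  headers: Record<string, string | undefined>,
  env: Record<string, string | undefined> = process.env,
): boolean {
  const expected = env.AI_INTERNAL_API_KEY;
  if (!expected) return false; // misconfigured server: deny everything
  return headers["x-api-key"] === expected;
}
```

A route handler would return 401 when this is false, before doing any AI or DB work.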

7. Segment System

Handbook references:
  • docs/handbook/02-ai-discovery-pipeline.md Ch. 17 — segment definitions (kept abstract in the Founder Bible)
  • docs/handbook/09-database-schema.md Ch. 151 — company_segments table
Companies are divided into 4 segments:
Code       | Label       | Description
S1         | Machinery   | Sewing, cutting, embroidery, ironing/pressing machines
S2         | Chemicals   | Dyes, washing agents, adhesives
S3         | Spare Parts | Needles, bobbins, folders, accessories
IRRELEVANT | Irrelevant  | To be deleted — non-textile companies
Current state: segment definitions have been moved to the export_ai_segments table (SEG-A/B/C/D completed). They are managed from the admin panel (/admin/segments) and automatically injected into AI prompts (lib/prompts.ts + fetchSegmentRules()).
Founder Bible note: segments are kept abstract; in the SaaS transition, each organization will define its own segments.
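Injecting DB-driven rules into a prompt might look like the sketch below. The row shape and the wording are illustrative, not the real lib/prompts.ts builder.

```typescript
// Hypothetical shape of a row from the export_ai_segments table.
interface SegmentRow {
  code: string;        // e.g. "S1"
  label: string;       // e.g. "Machinery"
  description: string; // classification rule for the AI
}

// Turn DB rows into a prompt fragment, so each org's own segment
// definitions drive classification without code changes.
function buildSegmentRules(rows: SegmentRow[]): string {
  const lines = rows.map((r) => `- ${r.code} (${r.label}): ${r.description}`);
  return ["Assign each company exactly one segment:", ...lines].join("\n");
}
```

Because the rules come from the table, the admin panel at /admin/segments effectively edits the prompt.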

7.1 Company Type (Customer Type — Phase 1.5)

While Segment defines “what the company sells,” Company Type defines “what kind of player it is”:
Code         | Description                     | DOSE Chemicals     | DOSE Home Textiles
distributor  | Wholesale distributor, importer | Primary target     | Secondary
reseller     | Retailer, dealer                | Secondary          | Important
end_user     | End user (factory, workshop)    | Indirect           | Primary target
manufacturer | Producer                        | Competitor/partner | Competitor/partner
unknown      | Not yet classified              | Default            | Default
Current state: company type is stored in the export_ai_companies.company_type column (CTYPE-A completed). Discovery, enrichment, and cleaner prompts classify company_type (CTYPE-B/C completed). The distributor weight is active in FitScore v2 (SCORE-A), and the color-coded badge on CompanyCard is live (CTYPE-D/E completed).

8. Multi-Tenant Architecture

Handbook references:
  • docs/handbook/03-system-architecture.md Ch. 39 — multi-tenant model
  • docs/handbook/09-database-schema.md Ch. 166 — RLS strategy
  • docs/strategy/01-auth-multi-tenant.md — R1-1 (14 atomic tasks)

Currently there is a single organization (DOSE Chemicals): cd3c0336-da2b-4cb6-a35d-ad43563b87f2. Every DB query must include an organization_id filter. The filter is currently hardcoded; after R1-1 (Auth & Multi-Tenant) is completed, it will be sourced from session.activeOrgId.

Changes coming with R1-1:
  • Supabase Auth (email + Google + Apple OAuth)
  • export_ai_organization_members table (owner/admin/member/viewer roles)
  • JWT-based RLS policies (temp_dev_anon_access will be removed)
  • lib/auth/session.ts — getSession(), requireAuth(), requireRole()
  • All hardcoded ORGANIZATION_ID → sourced from session
In the SaaS model, each organization:
  • Sees only its own companies
  • Defines its own segments
  • Selects its own target company_types (distributor, end_user, etc.)
  • Uses its own API keys
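The "every query carries an organization_id filter" rule can be shown as a guard. This is illustrative: the real code applies the filter in Supabase queries (and, after R1-1, via RLS), not on in-memory arrays.

```typescript
// Sketch of the org-scoping invariant: no orgId, no data.
interface ScopedRow {
  organization_id: string;
}

function scopeToOrg<T extends ScopedRow>(rows: T[], orgId: string): T[] {
  if (!orgId) throw new Error("organization_id is required on every query");
  return rows.filter((r) => r.organization_id === orgId);
}
```

After R1-1, orgId would come from session.activeOrgId instead of a hardcoded constant, and JWT-based RLS enforces the same invariant at the database layer.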

DB Schema Status

Current: 18 tables (RLS active). After Ring 1 completion, ~28 tables.
Status         | Tables
Current (18)   | organizations, profiles, companies, contacts, leads, lead_contacts, interactions, tasks, products, quotes, quote_items, segments, company_scores, search_history, search_results, search_queries, search_feedback, ai_job_runs
R1-1 Auth      | organization_members (profiles deprecated)
R1-2 Cost      | usage_daily (ai_job_runs update)
R1-4 Billing   | plan_limits, credit_wallets, credit_transactions, usage_monthly, subscriptions
R1-6 Monitor   | activity_log
R2-1 Pipeline  | market_context
R2-3 Dashboard | api_metrics
All tables use the export_ai_ prefix. Details: docs/handbook/09-database-schema.md.