
Cernio — System Architecture

This document explains how the system works, the relationships between its components, and the data flow.

Last updated: 2026-03-25

Handbook references:
  • docs/handbook/03-system-architecture.md — 5-layer architecture, API design, multi-tenant, caching (Ch. 27-50)
  • docs/handbook/11-infrastructure-devops.md — Docker, worker, deployment, monitoring (Ch. 189-210)
  • docs/handbook/13-discovery-code-architecture.md — Orchestrator pattern, API route, scoring (Ch. 226-252)
  • docs/handbook/09-database-schema.md — 22 table details, indexing, RLS (Ch. 136-168)

1. Overview

Cernio is a B2B SaaS product that discovers, classifies, enriches, and scores export distributors. DOSE Chemicals is Customer #1 (super admin and first tenant). The system consists of two sub-projects:

Sub-Project | Technology                        | Execution Model           | Purpose
Root        | Next.js 16 (TypeScript, React 19) | Web app (browser)         | UI + API routes
scripts/    | Node.js + Python                  | Terminal (node script.js) | Batch operations
C# analogy: Root Next.js app = ASP.NET MVC web app, scripts/ = Console App batch jobs. Both write to the same database and use the same AI service.

2. Component Map

+-----------------------------------------------------------+
|                    USER (Browser)                           |
|  Companies | Contacts | Discovery | Leads | Scraper | Admin |
+---------------------------+-------------------------------+
                            | HTTP
+---------------------------v-------------------------------+
|              Next.js 16 App (root)                         |
|                                                            |
|  app/                                                      |
|  +-- companies/        -> Company list, filter, detail     |
|  +-- contacts/         -> Contact directory                |
|  +-- discovery/        -> AI company discovery             |
|  +-- leads/            -> Lead pipeline + detail           |
|  +-- scraper/          -> File upload -> review -> DB push |
|  +-- admin/segments/   -> Segment CRUD management          |
|  +-- api/                                                  |
|      +-- ai/                                               |
|      |   +-- complete/   -> Single AI prompt call          |
|      |   +-- classify/   -> Batch classification           |
|      |   +-- providers/  -> Available provider list        |
|      +-- discover/       -> Company discovery + DB save    |
|      +-- headhunt/       -> Contact finding + DB save      |
|      +-- score/          -> FitScore v2 batch scoring      |
|      +-- leads/          -> Lead CRUD + [id] PATCH         |
|      +-- interactions/   -> Communication log              |
|      +-- tasks/          -> Task CRUD + [id] PATCH         |
|      +-- feedback/       -> Search result feedback         |
|      +-- segments/       -> Segment CRUD                   |
|      +-- prompts/        -> Prompt builder access          |
|      +-- scraper/                                          |
|          +-- run/        -> Trigger Python scraper         |
|          +-- push/       -> Write approved records to DB   |
|                                                            |
|  lib/                                                      |
|  +-- ai/               -> Central AI Client                |
|  |   +-- types.ts        -> Interfaces, AIProviderName     |
|  |   +-- errors.ts       -> Custom error classes           |
|  |   +-- parseJson.ts    -> Parse JSON from AI output      |
|  |   +-- client.ts       -> complete(), classifyBatch()    |
|  |   +-- providers/      -> Gemini, Anthropic, Perplexity, OpenAI |
|  +-- db/               -> DB operation layer               |
|  |   +-- save-companies.ts -> Company upsert + domain update |
|  |   +-- save-contacts.ts  -> Contact upsert (single + batch) |
|  |   +-- save-scores.ts    -> FitScore breakdown save      |
|  |   +-- save-search.ts    -> Search history cache         |
|  |   +-- save-search-queries.ts -> Sub-query save          |
|  |   +-- save-search-results.ts -> Result save             |
|  |   +-- track-ai-job.ts  -> AI job observability          |
|  +-- discovery/        -> Discovery orchestrator           |
|  |   +-- types.ts        -> IDiscoveryInput/Result         |
|  |   +-- run-discovery.ts -> 9-step pipeline               |
|  +-- scoring/          -> FitScore v2                      |
|  |   +-- fitScore.ts     -> 6-factor score calculation     |
|  +-- prompts.ts        -> Central prompt builders          |
|  +-- types/            -> ICompany, IContact, ILead etc.   |
|  +-- constants.ts      -> SEGMENT_MAP, SEGMENT_COLOR       |
|  +-- api/validateApiKey.ts -> Internal key validation      |
+---------------------------+-------------------------------+
                            | Supabase Client
+---------------------------v-------------------------------+
|              Supabase (PostgreSQL) - 18 tables             |
|                                                            |
|  export_ai_organizations  -> SaaS tenants                  |
|  export_ai_profiles       -> Users                         |
|  export_ai_companies      -> Company records (1591)        |
|  export_ai_contacts       -> Contact records (1522)        |
|  export_ai_leads          -> Lead pipeline                 |
|  export_ai_lead_contacts  -> Lead-contact junction         |
|  export_ai_interactions   -> Communication history         |
|  export_ai_tasks          -> Follow-up tasks               |
|  export_ai_products       -> Product portfolio             |
|  export_ai_quotes         -> Quotes                        |
|  export_ai_quote_items    -> Quote line items              |
|  export_ai_segments       -> Segment definitions (DB-driven) |
|  export_ai_company_scores -> FitScore v2 breakdown         |
|  export_ai_search_history -> API cache                     |
|  export_ai_search_results -> Discovery results             |
|  export_ai_search_queries -> Discovery sub-queries         |
|  export_ai_search_feedback -> Result feedback              |
|  export_ai_ai_job_runs   -> AI job observability           |
+---------------------------^-------------------------------+
                            | HTTP POST -> /api/ai/*
+---------------------------+-------------------------------+
|              scripts/ (Node.js + Python CLI)                |
|                                                            |
|  processor.js   -> Read companies from CSV -> AI discovery + analysis |
|  finder.js      -> Batch contact finding for DB companies  |
|  bulk_fixer.js  -> Enrich missing company information      |
|  cleaner.js     -> Segment audit + IRRELEVANT cleanup      |
|  scraper/       -> PDF/Excel/Word -> parse -> normalize    |
|                                                            |
|  lib/api.js     -> callAI() + sleep() shared helpers       |
+-----------------------------------------------------------+

3. scripts/ Batch Scripts — What They Do, How They Work

3.1 processor.js — Batch Company Processing from CSV

Input: egyptcompany.csv (or another CSV) — a list of company domains.

What it does:
  1. Reads the CSV file
  2. For each domain, sends an “analyze this company + find competitors” prompt to AI (/api/ai/complete, webSearch: true)
  3. Writes the company information returned by AI to the export_ai_companies table in Supabase (upsert)

Usage: node processor.js (from terminal)
UI equivalent: Discovery page (one at a time). A batch discovery button on the Operations page is planned.
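The CSV step above can be sketched in TypeScript. This is illustrative only: the column layout and the domain-cleanup rules are assumptions, not taken from the real processor.js.

```typescript
// Hypothetical sketch of processor.js's input step: extract unique,
// normalized domains from raw CSV text. Assumes the domain sits in the
// first column and the first row is a header.
function extractDomains(csvText: string): string[] {
  const seen = new Set<string>();
  for (const line of csvText.split("\n").slice(1)) { // skip header row
    const raw = line.split(",")[0]?.trim().toLowerCase();
    if (!raw) continue;
    // strip protocol, "www." and any path: "https://www.acme.com/about" -> "acme.com"
    const domain = raw
      .replace(/^https?:\/\//, "")
      .replace(/^www\./, "")
      .split("/")[0];
    if (domain.includes(".")) seen.add(domain); // drop rows that are not domains
  }
  return [...seen];
}
```

Deduplicating before the AI loop keeps each domain to a single /api/ai/complete call.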

3.2 finder.js — Batch Contact Finding (Headhunt)

Input: companies in the DB that have no contacts yet.

What it does:
  1. Pulls companies from the export_ai_companies table where last_headhunt_at IS NULL
  2. For each company, sends a “find decision makers for this company” prompt to AI (/api/ai/complete, webSearch: true)
  3. Writes found contacts to the export_ai_contacts table
  4. Updates the company’s last_headhunt_at field
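The selection rule in step 1 can be expressed as a plain filter. The real script queries Supabase directly; the row shape and the batch limit here are assumptions for illustration.

```typescript
// Minimal sketch of finder.js's selection step: companies that have never
// been headhunted are the ones with a NULL last_headhunt_at.
interface CompanyRow {
  id: string;
  name: string;
  last_headhunt_at: string | null; // ISO timestamp, or null if never run
}

// Hypothetical limit parameter; the real script's batch size may differ.
function pendingHeadhunt(rows: CompanyRow[], limit = 50): CompanyRow[] {
  return rows.filter((r) => r.last_headhunt_at === null).slice(0, limit);
}
```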

3.3 bulk_fixer.js — Batch Company Enrichment

Input: companies in the DB with missing information (no website, no description, etc.).

What it does:
  1. Pulls companies with incomplete data from the export_ai_companies table
  2. For each company, sends an “enrich this company’s information” prompt to AI (/api/ai/complete, webSearch: true)
  3. Updates the company record with the returned information (website, description, segment, city, etc.)

Usage: node bulk_fixer.js (from terminal)
UI equivalent: None (planned — Operations page).
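A sensible update rule for step 3 is "fill only what is empty", so enrichment never clobbers existing data. Whether bulk_fixer.js actually follows this rule is an assumption; the field list below is also illustrative.

```typescript
// Hedged sketch of an enrichment merge: AI-returned values fill only the
// fields that are currently missing (null, undefined, or empty string).
type Enrichable = {
  website?: string | null;
  description?: string | null;
  city?: string | null;
};

function fillMissing(existing: Enrichable, enriched: Enrichable): Enrichable {
  const out = { ...existing };
  for (const key of ["website", "description", "city"] as const) {
    if (!out[key] && enriched[key]) out[key] = enriched[key]; // keep non-empty values
  }
  return out;
}
```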

3.4 cleaner.js — Segment Audit & Cleanup

Input: all companies in the DB.

What it does:
  1. Pulls all companies from the export_ai_companies table (organization-scoped)
  2. Sends companies to AI in batches of 25 (/api/ai/classify)
  3. AI assigns each company a segment: S1 (Machinery), S2 (Chemicals), S3 (Spare Parts), or IRRELEVANT
  4. Reports those marked IRRELEVANT (the deletion decision is left to the user)
  5. Writes segment changes to the DB

Usage: node cleaner.js (from terminal)
UI equivalent: None (planned — Operations page).
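The batching rhythm in step 2 (groups of 25 with a pause between calls, matching the rate-limit note in section 4) can be sketched as follows. The classify callback is a hypothetical stand-in for POST /api/ai/classify.

```typescript
// Split a list into fixed-size groups.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Run classification batch-by-batch with a delay between batches,
// as provider rate-limit protection. 25 / 2000ms mirror AI_BATCH_SIZE
// and the 2-second wait documented in section 4.
async function classifyAll<T, R>(
  items: T[],
  classify: (batch: T[]) => Promise<R[]>,
  batchSize = 25,
  delayMs = 2000,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, batchSize)) {
    results.push(...(await classify(batch)));
    await sleep(delayMs);
  }
  return results;
}
```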

3.5 scraper/ (Python) — File Parse & Normalize

Input: PDF, Excel, or Word files (trade fair catalogs, distributor lists, etc.).

What it does:
  1. scrape.py runs from the CLI and takes a file path
  2. parsers/ selects the appropriate parser by file type (pdf, excel, word)
  3. core/detector.py extracts company information from raw lines
  4. core/keyword_filter.py filters by keywords defined in config.yaml
  5. core/llm_normalizer.py normalizes ambiguous lines via AI (/api/ai/classify)
  6. core/exporter.py outputs results as JSON

UI equivalent: Scraper page — upload a file → approve on the review screen → “Push to DB” to write to Supabase. This flow is already available in the UI.
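The keyword-filter idea in step 4 is simple enough to show inline. The real filter lives in Python (core/keyword_filter.py, driven by config.yaml); this is a rough TypeScript rendering of the same idea with an illustrative keyword list.

```typescript
// Keep only the raw lines that mention at least one configured keyword,
// case-insensitively. Matching rules in the real Python filter may differ.
function keywordFilter(lines: string[], keywords: string[]): string[] {
  const lowered = keywords.map((k) => k.toLowerCase());
  return lines.filter((line) => {
    const l = line.toLowerCase();
    return lowered.some((k) => l.includes(k));
  });
}
```

Lines that survive the filter but remain ambiguous are what core/llm_normalizer.py then hands to the AI in step 5.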

4. AI Client Architecture

All AI calls go through a central client. Direct SDK calls are prohibited.
                    +------------------------+
                    |  lib/ai/client.ts      |
                    |  complete()            |
                    |  classifyBatch()       |
                    +-----------+------------+
                                | resolveProvider()
         +-------------+--------+-----+-------------+
         v             v              v             v
    +--------+   +-----------+  +------------+  +--------+
    | Gemini |   | Anthropic |  | Perplexity |  | OpenAI |
    |  2.5   |   |  Claude   |  |   Sonar    |  | GPT-4o |
    | Flash  |   |  Sonnet   |  |    Pro     |  |        |
    +--------+   +-----------+  +------------+  +--------+
    webSearch      webSearch      webSearch     webSearch
       YES            NO             YES            NO
Provider selection priority (top to bottom):
  1. Explicit provider parameter in body (from UI dropdown or script)
  2. AI_PROVIDER env variable
  3. Default: gemini
Tasks requiring webSearch (Discovery, Headhunt, bulk_fixer) use Perplexity or Gemini grounding.
Batch classification runs in groups of 25 with a 2-second wait between groups, as provider rate-limit protection.
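The three-step priority chain above can be sketched as a small pure function. This is illustrative; the actual resolveProvider() in lib/ai/client.ts may differ in signature and validation.

```typescript
type AIProviderName = "gemini" | "anthropic" | "perplexity" | "openai";

const PROVIDERS: AIProviderName[] = ["gemini", "anthropic", "perplexity", "openai"];

// Resolve the provider in documented priority order:
// 1. explicit provider in the request body, 2. AI_PROVIDER env, 3. gemini.
function resolveProvider(
  requested: string | undefined,
  env: Record<string, string | undefined> = process.env,
): AIProviderName {
  if (requested && PROVIDERS.includes(requested as AIProviderName)) {
    return requested as AIProviderName; // 1. explicit body parameter
  }
  const fromEnv = env.AI_PROVIDER;
  if (fromEnv && PROVIDERS.includes(fromEnv as AIProviderName)) {
    return fromEnv as AIProviderName; // 2. AI_PROVIDER env variable
  }
  return "gemini"; // 3. default
}
```

Unknown names fall through to the next level rather than erroring, so a typo in AI_PROVIDER degrades to the default instead of breaking every call.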

5. Data Flow Diagram

FILE (PDF/Excel)  -->  Scraper  -->  Review UI  -->  DB push
                                                        |
CSV file  -->  processor.js  -->  AI discovery  -->  DB upsert
                                                        |
Keyword + Country  -->  Discovery UI  -->  AI discovery  -->  DB upsert
                                                        |
                                                        v
                                                 +-------------+
                                                 |  companies   |
                                                 |  table       |
                                                 +------+------+
                                                        |
                          +---------------+---------------+
                          v               v               v
                    +----------+   +-----------+   +----------+
                    | cleaner  |   | bulk_fixer|   | finder/  |
                    | (audit)  |   | (enrich)  |   | headhunt |
                    +----+-----+   +-----+-----+   +----+-----+
                         |               |              |
                         v               v              v
                    Segment update  Info update   contacts table

6. Environment Variables

.env.local

NEXT_PUBLIC_SUPABASE_URL=...        # Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEY=...   # Supabase anon key
GOOGLE_API_KEY=...                  # Gemini API key
PPLX_API_KEY=...                    # Perplexity API key
OPENAI_API_KEY=...                  # OpenAI API key (optional)
ANTHROPIC_API_KEY=...               # Anthropic API key (optional)
AI_PROVIDER=gemini                  # Default AI provider
AI_BATCH_SIZE=25                    # Batch classification group size
AI_INTERNAL_API_KEY=...             # Password for scripts/ batch jobs to access the API

scripts/.env

SUPABASE_URL=...                    # Supabase project URL
SUPABASE_KEY=...                    # Supabase service key
NEXTJS_API_URL=http://localhost:3000 # Next.js API base URL
AI_INTERNAL_API_KEY=...             # Must be SAME as in .env.local
AI_PROVIDER=gemini                  # Default provider on this side
What is AI_INTERNAL_API_KEY? A shared secret you generate yourself. scripts/ batch jobs send HTTP requests to Next.js API routes; this key prevents random external callers from hitting those routes. Both sides (.env.local and scripts/.env) must have the same value.
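The check this key enables can be sketched as follows. The header name and the exact shape of lib/api/validateApiKey.ts are assumptions; only the shared-secret comparison is taken from the text above.

```typescript
// Hedged sketch of internal-key validation for the scripts/ -> API path.
// Assumes the key arrives in an "x-api-key" request header.
function isAuthorized(
  headers: Record<string, string | undefined>,
  env: Record<string, string | undefined> = process.env,
): boolean {
  const expected = env.AI_INTERNAL_API_KEY;
  if (!expected) return false; // misconfigured server: deny everything
  return headers["x-api-key"] === expected;
}
```

A route handler would return 401 when this is false, before doing any AI or DB work.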

7. Segment System

Handbook references:
  • docs/handbook/02-ai-discovery-pipeline.md Ch. 17 — segment definitions (kept abstract in the Founder Bible)
  • docs/handbook/09-database-schema.md Ch. 151 — company_segments table
Companies are divided into 4 segments:
Code       | Label       | Description
S1         | Machinery   | Sewing, cutting, embroidery, ironing/pressing machines
S2         | Chemicals   | Dyes, washing agents, adhesives
S3         | Spare Parts | Needles, bobbins, folders, accessories
IRRELEVANT | Irrelevant  | To be deleted — non-textile companies
Current state: segment definitions have been moved to the export_ai_segments table (SEG-A/B/C/D completed). They are managed from the admin panel (/admin/segments) and automatically injected into AI prompts (lib/prompts.ts + fetchSegmentRules()).
Founder Bible note: segments are kept abstract; in the SaaS transition, each organization will define its own segments.
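Injecting DB-driven rules into a prompt might look like the sketch below. The row shape and the wording are illustrative, not the real lib/prompts.ts builder.

```typescript
// Hypothetical shape of a row from the export_ai_segments table.
interface SegmentRow {
  code: string;        // e.g. "S1"
  label: string;       // e.g. "Machinery"
  description: string; // classification rule for the AI
}

// Turn DB rows into a prompt fragment, so each org's own segment
// definitions drive classification without code changes.
function buildSegmentRules(rows: SegmentRow[]): string {
  const lines = rows.map((r) => `- ${r.code} (${r.label}): ${r.description}`);
  return ["Assign each company exactly one segment:", ...lines].join("\n");
}
```

Because the rules come from the table, the admin panel at /admin/segments effectively edits the prompt.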

7.1 Company Type (Customer Type — Phase 1.5)

While Segment defines “what the company sells,” Company Type defines “what kind of player it is”:
Code         | Description                     | DOSE Chemicals     | DOSE Home Textiles
distributor  | Wholesale distributor, importer | Primary target     | Secondary
reseller     | Retailer, dealer                | Secondary          | Important
end_user     | End user (factory, workshop)    | Indirect           | Primary target
manufacturer | Producer                        | Competitor/partner | Competitor/partner
unknown      | Not yet classified              | Default            | Default
Current state: company type is stored in the export_ai_companies.company_type column (CTYPE-A completed). Discovery, enrichment, and cleaner prompts classify company_type (CTYPE-B/C completed). The distributor weight is active in FitScore v2 (SCORE-A), and the color-coded badge on CompanyCard is live (CTYPE-D/E completed).

8. Multi-Tenant Architecture

Handbook references:
  • docs/handbook/03-system-architecture.md Ch. 39 — multi-tenant model
  • docs/handbook/09-database-schema.md Ch. 166 — RLS strategy
  • docs/strategy/01-auth-multi-tenant.md — R1-1 (14 atomic tasks)

Currently there is a single organization (DOSE Chemicals): cd3c0336-da2b-4cb6-a35d-ad43563b87f2. Every DB query must include an organization_id filter. The filter is currently hardcoded; after R1-1 (Auth & Multi-Tenant) is completed, it will be sourced from session.activeOrgId.

Changes coming with R1-1:
  • Supabase Auth (email + Google + Apple OAuth)
  • export_ai_organization_members table (owner/admin/member/viewer roles)
  • JWT-based RLS policies (temp_dev_anon_access will be removed)
  • lib/auth/session.ts — getSession(), requireAuth(), requireRole()
  • All hardcoded ORGANIZATION_ID → sourced from session
In the SaaS model, each organization:
  • Sees only its own companies
  • Defines its own segments
  • Selects its own target company_types (distributor, end_user, etc.)
  • Uses its own API keys
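The "every query carries an organization_id filter" rule can be shown as a guard. This is illustrative: the real code applies the filter in Supabase queries (and, after R1-1, via RLS), not on in-memory arrays.

```typescript
// Sketch of the org-scoping invariant: no orgId, no data.
interface ScopedRow {
  organization_id: string;
}

function scopeToOrg<T extends ScopedRow>(rows: T[], orgId: string): T[] {
  if (!orgId) throw new Error("organization_id is required on every query");
  return rows.filter((r) => r.organization_id === orgId);
}
```

After R1-1, orgId would come from session.activeOrgId instead of a hardcoded constant, and JWT-based RLS enforces the same invariant at the database layer.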

DB Schema Status

Current: 18 tables (RLS active). After Ring 1 completion, ~28 tables.
Status         | Tables
Current (18)   | organizations, profiles, companies, contacts, leads, lead_contacts, interactions, tasks, products, quotes, quote_items, segments, company_scores, search_history, search_results, search_queries, search_feedback, ai_job_runs
R1-1 Auth      | organization_members (profiles deprecated)
R1-2 Cost      | usage_daily (ai_job_runs update)
R1-4 Billing   | plan_limits, credit_wallets, credit_transactions, usage_monthly, subscriptions
R1-6 Monitor   | activity_log
R2-1 Pipeline  | market_context
R2-3 Dashboard | api_metrics
All tables use the export_ai_ prefix. Details: docs/handbook/09-database-schema.md.