UCAR-OS is a multi-tenant university intelligence platform built for the Hack4UCAR hackathon (organised by Université de Carthage and ACM ENSTAB), where it placed 2nd with a 1500 DT prize. I couldn't make it on-site; the team held it down on the ground for 36 hours straight while I built from home, shipping features end-to-end, the entire UI, the platform deployment, and the live demo, all remote. The product is designed to turn non-digitalized universities into fully data-driven institutions, giving rectors, deans, and admins one operational pane of glass over academic, financial, HR, research, infrastructure, and ESG data. The architecture is a 9-service FastAPI mesh behind a Kong DB-less gateway, sharing PostgreSQL 16 (with Row-Level Security for per-tenant isolation), ClickHouse 24.8 for OLAP/KPI time-series, MinIO for uploaded files, and Apache Kafka 3.8 (KRaft) as the event bus. The headline AI feature, "Le Cabinet de la Présidence," orchestrates 6 specialist agents — Academic, Finance, HR, Research, Infrastructure, and ESG — with a Router that picks who weighs in and a Synthesis layer that produces a single recommendation with agreements, tensions, and second-order effects. Every consultation is auditable; every decision is explainable. Live demo: https://hack4ucar.ahmedxsaad.me/.
Tunisian university leadership runs on disconnected spreadsheets and PDFs. There is no single system that can answer a rector's question like "why did the dropout rate spike at one of the constituent schools this semester, and what should we do?" — that requires academic data, HR signals, infrastructure load, and budget context all at once. Existing solutions are single-institution, single-domain, or offer no AI reasoning over the underlying data. The platform also has to support multiple tenants (faculties / institutes / partner universities) under one control plane without leaking data between them.
- 01
Designed a 9-microservice backend behind a Kong 3.8 (DB-less, declarative) API gateway: iam-service (:9010) for JWT auth + RBAC + tenant resolution + Kafka audit trail; ingestion-service (:9011) for PDF/Excel/CSV upload and JSON ingestion; data-preprocessor (:8005) for column mapping, cleaning, and normalization; core_management_service (:8008) for CRUD over students/courses/faculty/finance; analytics_service (:8007) for anomaly detection, OLS-regression predictions, and cross-institution KPI comparison; chatbot_service (:8006) hosting Le Cabinet (6 specialist agents + Router + Synthesis); report_generation_service (:8004) for LaTeX → pdflatex PDF reports on a cron schedule; notification_service (:8003) for email + in-app alerts; and learning_service (:8009) for the anomaly feedback loop and weekly LLM insights.
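For a sense of the shared boilerplate, here is a minimal sketch of one service skeleton (FastAPI + pydantic-settings + structlog, the combination every service uses); the names, env prefix, and defaults are illustrative rather than the actual service code:

```python
# Minimal service skeleton sketch: typed settings from env vars, a structured
# logger bound to the service name, and a health endpoint. Illustrative only.
import structlog
from fastapi import FastAPI
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="UCAR_")  # e.g. UCAR_DATABASE_URL

    service_name: str = "core-management-service"          # hypothetical defaults
    database_url: str = "postgresql://ucar:ucar@postgres:5432/ucar"
    kafka_bootstrap_servers: str = "kafka:9092"


settings = Settings()
log = structlog.get_logger().bind(service=settings.service_name)
app = FastAPI(title=settings.service_name)


@app.get("/health")
async def health() -> dict:
    log.info("health_check")
    return {"status": "ok", "service": settings.service_name}
```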
- 02
Implemented true pooled multi-tenancy via PostgreSQL 16 Row-Level Security. Every row in the `university.*` schema carries a `tenant_id`, and RLS policies enforce isolation at the database level — application code cannot accidentally leak across tenants. Super-admin roles carry the `BYPASSRLS` attribute for cross-institution aggregations (e.g. averaging success rates across all 6 UCAR universities). The IAM middleware resolves `tenant_id` in a deterministic order: JWT claim → `X-Tenant-ID` header → subdomain (`enit.ucar.edu` → slug → DB lookup) → `GLOBAL` fallback, then issues `SET LOCAL app.current_tenant` on every request so RLS engages automatically.
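A minimal sketch of the pattern, assuming asyncpg as the driver and illustrative table names and column types; the real policies and middleware live in the migrations and the IAM service:

```python
# Sketch: an RLS policy keyed on a session setting, plus a per-request
# dependency that pins the tenant with a transaction-scoped SET LOCAL.
import asyncpg
from fastapi import Depends, FastAPI, Request

# Illustrative DDL, applied per table at migration time (tenant_id assumed uuid).
RLS_DDL = """
ALTER TABLE university.students ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON university.students
    USING (tenant_id = current_setting('app.current_tenant')::uuid);
"""

app = FastAPI()


async def tenant_connection(request: Request):
    """Pin the resolved tenant on a pooled connection for the life of one request."""
    tenant_id = request.state.tenant_id          # set earlier by the tenant-resolution middleware
    pool: asyncpg.Pool = request.app.state.pool  # assumed created at startup
    conn = await pool.acquire()
    tx = conn.transaction()
    await tx.start()
    # set_config(..., is_local=true) behaves like SET LOCAL: scoped to this transaction only.
    await conn.execute("SELECT set_config('app.current_tenant', $1, true)", str(tenant_id))
    try:
        yield conn
        await tx.commit()
    except Exception:
        await tx.rollback()
        raise
    finally:
        await pool.release(conn)


@app.get("/students/count")
async def count_students(conn=Depends(tenant_connection)):
    # RLS guarantees this only ever sees the current tenant's rows.
    return {"students": await conn.fetchval("SELECT count(*) FROM university.students")}
```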
- 03
Built "Le Cabinet de la Présidence" as a 6-agent AI council in chatbot_service: Academic, Finance, HR, Research, Infrastructure, and ESG agents each speak from one domain. A Router classifies the rector's question and picks who is in the room; the agents respond in parallel; a Synthesis layer ("Le Cabinet") fuses their answers into a single recommendation with explicit reasoning, agreements, tensions, and second-order effects. The full consultation is logged for governance via `GET /api/v1/cabinet/audit`, streamed live over SSE on `/api/v1/cabinet/stream` (routing → agent → synthesis events), and a deterministic demo cache replays hero questions in <50 ms with zero LLM calls — judges never hit a rate limit even with the network down.
- 04
Wired an end-to-end event-driven data flow over Apache Kafka (KRaft mode, no ZooKeeper). Files uploaded to ingestion-service publish to `raw.data.ingested`; data-preprocessor consumes, cleans, and emits batches of 500 records on `tenant.data.normalized`; core_management_service upserts into `university.students/courses/enrollments`; analytics_service runs anomaly detection (dropout spikes, budget inconsistencies, ESG lags) and emits `analytics.anomaly.detected.v1`; notification_service sends emails + in-app alerts; learning_service consumes both detection and review events, aggregates feedback per institution/domain, and runs a weekly Gemini-backed insight job.
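The preprocessor's consume → normalize → batch → re-emit step reduces to roughly this sketch, assuming aiokafka and JSON payloads; the field names and the `normalize()` body are illustrative:

```python
# Sketch: consume raw.data.ingested, normalize records, re-emit in batches of
# 500 on tenant.data.normalized. (A real worker would also flush partial
# batches on a timer and handle the MinIO file fetch.)
import asyncio
import json
from aiokafka import AIOKafkaConsumer, AIOKafkaProducer

BATCH_SIZE = 500


def normalize(record: dict) -> dict:
    # Column mapping / cleaning would live here; this only lowercases keys.
    return {k.strip().lower(): v for k, v in record.items()}


async def run(bootstrap: str = "kafka:9092") -> None:
    consumer = AIOKafkaConsumer("raw.data.ingested", bootstrap_servers=bootstrap,
                                value_deserializer=json.loads)
    producer = AIOKafkaProducer(bootstrap_servers=bootstrap,
                                value_serializer=lambda v: json.dumps(v).encode())
    await consumer.start()
    await producer.start()
    batch: list[dict] = []
    try:
        async for msg in consumer:
            batch.append(normalize(msg.value))
            if len(batch) >= BATCH_SIZE:
                # core_management_service consumes this and upserts downstream.
                await producer.send_and_wait("tenant.data.normalized",
                                             {"tenant_id": msg.value.get("tenant_id"),
                                              "records": batch})
                batch = []
    finally:
        await consumer.stop()
        await producer.stop()


if __name__ == "__main__":
    asyncio.run(run())
```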
- 05
Engineered the analytics + reporting layer on top of ClickHouse 24.8 (OLAP) for KPI time-series (success rate, dropout, budget trajectory, enrollment forecasts via OLS regression), alongside a report_generation_service that reads Postgres + ClickHouse, renders Jinja2 LaTeX templates, and shells out to `pdflatex` for branded PDF reports — both on-demand via `POST /api/v1/reports/generate` and on a cron schedule via APScheduler.
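The render step boils down to "fill a Jinja2 LaTeX template, shell out to pdflatex". A sketch, with an illustrative template name and custom Jinja2 delimiters so the templating doesn't fight LaTeX braces (the real templates and KPI queries are more involved):

```python
# Sketch: render a Jinja2 LaTeX template, then compile it with pdflatex
# (pdflatex must be on PATH). Template and KPI field names are illustrative.
import subprocess
from pathlib import Path
from jinja2 import Environment, FileSystemLoader

# Custom delimiters, a common pattern so Jinja2 syntax doesn't clash with LaTeX.
env = Environment(
    loader=FileSystemLoader("templates"),
    variable_start_string="<<", variable_end_string=">>",
    block_start_string="<%", block_end_string="%>",
)


def render_report(kpis: dict, out_dir: Path = Path("/tmp/reports")) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    tex = env.get_template("institution_report.tex.j2").render(**kpis)
    tex_path = out_dir / "report.tex"
    tex_path.write_text(tex)
    # Run twice so LaTeX resolves cross-references and the table of contents.
    for _ in range(2):
        subprocess.run(["pdflatex", "-interaction=nonstopmode", tex_path.name],
                       cwd=out_dir, check=True)
    return out_dir / "report.pdf"
```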
- 06
Shipped a Next.js 15 (App Router, TypeScript) frontend with TailwindCSS, Tremor, and Apache ECharts dashboards plus a chatbot UI consuming the cabinet SSE stream. Productionised the whole stack: every service exposes Prometheus metrics; a Helm chart in `infra/helm/ucar-platform` deploys to Kubernetes with ServiceMonitors, Grafana dashboards, and PrometheusRule alerts. Local dev runs as a single `make infra-up` over docker-compose with Kong, Kafka, MinIO, ClickHouse, PostgreSQL, and all 9 services.
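Exposing metrics from each FastAPI service is cheap with the stock prometheus_client ASGI app; a sketch with an illustrative counter (the real services track their own request and domain metrics, and the Helm chart's ServiceMonitors scrape the `/metrics` path):

```python
# Sketch: mount the prometheus_client ASGI app under /metrics and increment
# a custom counter per request. Metric and endpoint names are illustrative.
from fastapi import FastAPI
from prometheus_client import Counter, make_asgi_app

app = FastAPI()
app.mount("/metrics", make_asgi_app())

CONSULTATIONS = Counter("cabinet_consultations_total",
                        "Cabinet consultations served", ["tenant"])


@app.post("/api/v1/cabinet/consult")
async def consult(payload: dict):
    CONSULTATIONS.labels(tenant=payload.get("tenant_id", "GLOBAL")).inc()
    return {"status": "queued"}
```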
A Next.js 15 frontend (TailwindCSS + Tremor + Apache ECharts) talks to Kong 3.8 (DB-less, declarative) which fans out to 9 FastAPI services. The IAM service (:9010) issues JWTs, enforces RBAC, and publishes to the `iam.audit.events` Kafka topic; its TenantResolutionMiddleware sets `app.current_tenant` on every Postgres connection so RLS isolates data per institution. Ingestion (:9011) → data-preprocessor (:8005) → core_management (:8008) is a Kafka pipeline (`raw.data.ingested` → `tenant.data.normalized`) that upserts into PostgreSQL 16. From core, analytics_service (:8007) reads KPIs, runs anomaly detection + OLS-regression predictions, and emits `analytics.anomaly.detected.v1`; notification_service (:8003) consumes that and sends email + in-app alerts; learning_service (:8009) aggregates feedback and generates weekly Gemini insight reports. report_generation_service (:8004) reads Postgres + ClickHouse and renders Jinja2 LaTeX → pdflatex PDFs. chatbot_service (:8006) runs Le Cabinet — 6 specialist agents + Router + Synthesis — over Google Gemini with a deterministic demo cache. PostgreSQL 16 (OLTP, RLS), ClickHouse 24.8 (OLAP), MinIO (S3-compatible object storage), and Kafka 3.8 (KRaft) form the data layer. Every service exposes Prometheus metrics; the Helm chart ships ServiceMonitors, Grafana dashboards, and PrometheusRule alerts.
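As a sketch of the streaming contract, the cabinet SSE endpoint pushes named `routing` → `agent` → `synthesis` events as each phase completes; the router, agent, and synthesis bodies below are stubs standing in for the Gemini-backed calls:

```python
# Sketch of an SSE endpoint shaped like /api/v1/cabinet/stream.
# Stubs only: the real service routes via the LLM and consults the demo cache.
import asyncio
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
AGENTS = ["academic", "finance", "hr", "research", "infrastructure", "esg"]


async def ask_agent(domain: str, question: str) -> str:
    await asyncio.sleep(0)                       # stand-in for the Gemini call
    return f"{domain} perspective on: {question}"


def sse(event: str, data: dict) -> str:
    # One Server-Sent Event frame: named event + JSON payload.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


@app.get("/api/v1/cabinet/stream")
async def cabinet_stream(question: str):
    async def events():
        picked = AGENTS[:2]                      # Router stub
        yield sse("routing", {"agents": picked})
        opinions = []
        for domain in picked:                    # one event per specialist
            opinions.append({"domain": domain, "answer": await ask_agent(domain, question)})
            yield sse("agent", opinions[-1])
        yield sse("synthesis", {"recommendation": "fused recommendation (stub)",
                                "based_on": [o["domain"] for o in opinions]})

    return StreamingResponse(events(), media_type="text/event-stream")
```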
Declarative API gateway in front of all 9 services — auth enforcement, rate limiting, and routing from one config file
IAM, Ingestion, Preprocessor, Core Management, Analytics, Chatbot, Reports, Notifications, Learning — Python 3.11/3.12 with structlog + pydantic-settings
Pooled multi-tenancy — every `university.*` row carries `tenant_id`; RLS policies + `SET LOCAL app.current_tenant` enforce isolation in the database, not the app
KPI time-series store powering anomaly detection, dropout/budget/ESG comparisons, and OLS-regression predictions across institutions
Event bus — `raw.data.ingested`, `tenant.data.normalized`, `analytics.anomaly.detected.v1`, `iam.audit.events` — driving ingestion, analytics, notifications, and the learning loop
Academic, Finance, HR, Research, Infrastructure, ESG specialists; Router picks who weighs in; Synthesis fuses agreements, tensions, and second-order effects with full audit trail and SSE streaming
S3-compatible storage for uploaded PDFs/Excels/CSVs; report_generation_service renders Jinja2 LaTeX templates → pdflatex on a cron schedule
App Router frontend with Tailwind, Tremor dashboard primitives, Apache ECharts visualisations, and an SSE-driven chatbot UI for Le Cabinet
Helm chart in `infra/helm/ucar-platform` with ServiceMonitors, Grafana dashboards, and PrometheusRule alerts — every service exposes `/metrics`
Hack4UCAR — organised by Université de Carthage and ACM ENSTAB, April 2026
FastAPI services behind Kong — IAM, Ingestion, Preprocessor, Core, Analytics, Chatbot, Reports, Notifications, Learning
PostgreSQL Row-Level Security across faculties / institutes / partner universities — strict per-tenant isolation with cross-tenant aggregations for super-admins
Academic / Finance / HR / Research / Infrastructure / ESG specialists with Router + Synthesis and full audit trail
End-to-end Kafka flow from upload → normalize → core → anomaly detection → notifications → weekly learning insights
Helm chart, Prometheus metrics on every service, Grafana dashboards, and PrometheusRule alerts

🥈 2nd Place — Hack4UCAR (Université de Carthage × ACM ENSTAB), April 26, 2026. Couldn't make it in person — they brought me anyway. 📱 Team Sa7aBsisa held it down on-site (Omar, Ghassen, Louay) while I shipped features, the UI, the platform deployment, and the live demo — all remote.
Building a hackathon project remotely while three teammates ran the floor on-site for 36 hours forced the cleanest split of responsibility I've ever shipped: I built features end-to-end, owned the UI, the deployment pipeline, and the live demo path; they owned the data, the agents, and the judges. We had to make every interface (API contracts, the demo script, the K8s manifests) good enough that we never needed a hallway conversation — because there was no hallway.
Pushing tenant isolation down into PostgreSQL Row-Level Security instead of guarding it in application code was the single highest-leverage decision — it makes leaks structurally impossible and turns `SET LOCAL app.current_tenant` into the only thing the IAM middleware has to get right.
"Le Cabinet" worked as a demo because the Synthesis layer surfaces *disagreements* between agents, not just consensus — judges immediately understood the value when the Finance and ESG agents disagreed on a campus expansion and the synthesis explained the trade-off.
The deterministic demo cache (hero questions replaying in <50 ms with zero LLM calls and `SIMULATE_AGENT_LATENCY_MS` choreography) was a hackathon-saving call — venue Wi-Fi was unreliable and rate limits would have killed the live cabinet stream during judging.
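The mechanism itself is tiny; a sketch under the names used above (`SIMULATE_AGENT_LATENCY_MS` read from the environment, a prebuilt answer dict), with illustrative cached payloads:

```python
# Sketch of the deterministic demo cache: hero questions answered from a
# prebuilt dict, with artificial latency so the stream still reads like a
# deliberation. Cache contents here are placeholders.
import asyncio
import os

SIMULATE_AGENT_LATENCY_MS = int(os.getenv("SIMULATE_AGENT_LATENCY_MS", "400"))

DEMO_CACHE: dict[str, dict] = {
    "why did the dropout rate spike this semester?": {
        "agents": ["academic", "hr", "infrastructure"],
        "synthesis": "precomputed recommendation text",
    },
}


async def consult_cached(question: str) -> dict | None:
    hit = DEMO_CACHE.get(question.strip().lower())
    if hit is None:
        return None                              # fall through to the live LLM path
    # Choreographed pause per agent so the replay doesn't feel instantaneous.
    await asyncio.sleep(SIMULATE_AGENT_LATENCY_MS / 1000 * len(hit["agents"]))
    return hit
```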
Splitting OLTP (PostgreSQL with RLS) from OLAP (ClickHouse) early let analytics_service answer cross-institution KPI queries fast without ever touching the transactional tables — and it kept the schema boundaries clean enough that 9 services could share the data layer without stepping on each other.
Kong DB-less + a Helm chart with ServiceMonitors and PrometheusRule alerts looked like overkill for a hackathon, but it turned the demo into a real K8s deployment story instead of a pile of docker-compose services — and judges noticed.