Overview
Agent IA BH Assurance: Le Conseiller Augmenté — a complete AI-powered insurance advisory platform that I built solo for BH Assurance's NEXT Challenge, winning 1st place against teams of 3 over the month-long competition (August–September 2025). The architecture follows a Mixture-of-Experts (MoE) pattern: one central receptionist AI agent classifies user intent and dynamically routes to specialized expert agents (RAG expert, client data expert, scenario expert), like a gating mechanism that only activates the right expert per query. The system runs as 4 services: a Next.js 14 frontend + Express.js chat backend (Prisma, PostgreSQL, JWT), a FastAPI RAG service (Qdrant with Qwen3-Embedding-8B at 4096-dim, FlashRank re-ranker, Qwen3-4B-Instruct via Ollama), a FastAPI client management API (Oracle DB integration), and a FastAPI quote generation service (SQLAlchemy + Alembic). The critical differentiator was meticulous data engineering: I manually extracted, translated (French/Arabic/mixed), and restructured BH Assurance's entire product catalog from dozens of unstructured PDFs and CSVs using Gemini 2.5 Pro, because RAG's power comes from data quality, not model size.
The Problem
BH Assurance's insurance advisors spend excessive time searching through hundreds of product documents across 6 insurance branches (automobile, vie, santé, engineering, transport, IARD) to answer client questions. The source material is scattered across multilingual PDFs and CSVs — some in French, some in Arabic, some in mixed formats — with no consistent structure. Existing chatbot solutions fail because they can't handle this data quality issue: garbage in, garbage out. The challenge required a production-grade advisory system that could accurately answer questions about all BH Assurance products.
Solution
The single most important engineering decision was investing heavily in data quality before touching any AI. I used Gemini 2.5 Pro (the strongest model for this task at the time; Gemini 2.5 Flash could not reliably parse the unstructured source data) to extract, translate, and restructure BH Assurance's entire product catalog from dozens of PDFs and CSVs. The source data was multilingual (French, Arabic, mixed), unstructured, and inconsistent. I formatted every product into structured JSON with normalized fields: `product_category`, `product_name`, `branche`, `garantie`, `keywords`, `key_points`, and `references`. This augmented dataset is what made the RAG system actually useful — without it, vector search returns noise.
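As an illustration, the normalized record shape can be modeled as a small dataclass. The field names come from the schema above; the class itself and the example values are illustrative, not the actual ingestion code:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ProductRecord:
    """One normalized entry in the augmented product catalog.

    Field names match the JSON schema described above; this class is a
    sketch, not the real ingestion pipeline.
    """
    product_category: str
    product_name: str
    branche: str                  # one of the 6 branches, e.g. "automobile"
    garantie: str                 # coverage/guarantee description
    keywords: list[str] = field(default_factory=list)
    key_points: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)

    def to_payload(self) -> dict:
        # Flat dict suitable as a Qdrant payload stored alongside the vector
        return asdict(self)

# Hypothetical example record (values invented for illustration)
record = ProductRecord(
    product_category="vehicule",
    product_name="Assurance Auto Tous Risques",
    branche="automobile",
    garantie="Dommages tous accidents",
    keywords=["auto", "tous risques"],
    key_points=["Couverture des dommages au véhicule assuré"],
)
```

Storing the whole record as a Qdrant payload is what enables the category-aware filtering described below: the `branche` field becomes a filterable payload key.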
Built the RAG service (FastAPI on port 8000) with a 4-phase pipeline: (1) Enhanced semantic search using Qwen3-Embedding-8B (4096-dim vectors via Ollama) stored in Qdrant with category-aware payload filtering across 6 insurance categories (automobile, vie, santé, engineering, transport, IARD), (2) FlashRank re-ranking of candidate documents with score breakdown (semantic + domain + transport-specific bonuses), (3) intelligent context construction that groups results by product/article type and detects comparison queries, (4) Qwen3-4B-Instruct-2507 (GGUF Q4_K_M via Ollama) for answer generation with strict factuality prompts — the system prompt enforces 100% fidelity to retrieved context and explicitly refuses to hallucinate.
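To make the phase-2 score breakdown concrete, here is a pure-Python sketch of how a base semantic score could combine with domain and transport-specific bonuses. The bonus weights and combination logic are assumptions for illustration; in the real pipeline, FlashRank produces the relevance scores:

```python
def hybrid_score(candidate: dict, query_terms: set[str]) -> float:
    """Combine the semantic score with domain and transport bonuses.

    The bonus values (0.10, 0.15) are illustrative assumptions, not the
    weights used in the actual service.
    """
    score = candidate["semantic_score"]
    if query_terms & set(candidate.get("keywords", [])):
        score += 0.10  # domain-expertise bonus: query overlaps product keywords
    if candidate.get("branche") == "transport" and "transport" in query_terms:
        score += 0.15  # transport-specific bonus
    return score

def rerank(candidates: list[dict], query_terms: set[str]) -> list[dict]:
    # Phase 2: reorder semantic-search candidates by the hybrid score
    return sorted(candidates, key=lambda c: hybrid_score(c, query_terms),
                  reverse=True)
```

The point of the breakdown is explainability: each result can report how much of its final score came from semantic match versus domain bonuses, which made retrieval failures easy to debug.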
Built the chat frontend (Next.js 14 + TypeScript + Tailwind 3.4 + Zustand + Framer Motion + react-markdown) with Express.js backend (Prisma ORM + PostgreSQL for conversation persistence, JWT/bcrypt auth, user memory management). The FastAPI client management API integrates with Oracle DB for real client data lookup, and the quote generation service (FastAPI + SQLAlchemy + Alembic) handles insurance quote sessions with BH Assurance's pricing API.
Orchestrated the multi-agent conversation flow using a MoE-inspired architecture with 4 n8n workflow definitions: `Agent bh assurance.json` acts as the central receptionist/router — it classifies the user's intent and dynamically dispatches to the right expert agent, like a gating mechanism. `rag expert bh assurance.json` handles product knowledge queries via the RAG pipeline, `call client data expert bh assurance.json` looks up real client data from Oracle DB, and `scenario_expert.json` analyzes complex insurance scenarios. Only the relevant expert is activated per query — the others stay idle, keeping responses fast and focused.
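The receptionist's gating step can be sketched as a simple keyword router. The production router is an LLM classification node inside the n8n workflow; the expert names mirror the workflows above, but the trigger terms below are hypothetical:

```python
# Hypothetical trigger terms per expert; the real receptionist uses an LLM
# classification step, not keyword matching.
EXPERTS = {
    "rag_expert": ["garantie", "produit", "couverture", "contrat"],
    "client_data_expert": ["client", "dossier", "compte"],
    "scenario_expert": ["scenario", "simulation", "que se passe"],
}

def route(query: str) -> str:
    """Pick the single expert whose trigger terms best match the query.

    Only the winner is activated (MoE-style gating); everything else stays
    idle. Falls back to the RAG expert when nothing matches.
    """
    q = query.lower()
    scores = {name: sum(term in q for term in terms)
              for name, terms in EXPERTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "rag_expert"
```

Falling back to the RAG expert is a deliberate default: an unclassifiable question is most likely a product question, and the RAG expert fails safe by refusing to answer outside its retrieved context.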
Architecture
MoE-inspired multi-agent architecture: Next.js 14 frontend (port 3000) communicates with an Express.js chat backend (Prisma + PostgreSQL, JWT auth). The backend routes all queries to the central receptionist agent in n8n, which classifies intent and dispatches to the appropriate expert: (1) the RAG expert calls the FastAPI RAG service (port 8000) using Qdrant (Qwen3-Embedding-8B, 4096-dim) + FlashRank re-ranker + Qwen3-4B via Ollama for insurance Q&A, (2) the client data expert calls the FastAPI client API for Oracle DB lookups, (3) the scenario expert runs complex analysis, and (4) the quote expert calls the FastAPI quote service (SQLAlchemy + Alembic) for pricing. All services are Dockerized with individual docker-compose.yml files.
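A minimal sketch of the strict-factuality generation step in the RAG expert's path, assuming the `ollama` Python client; the system-prompt wording and the model tag are illustrative approximations, not the actual prompt shipped in the service:

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> list[dict]:
    """Assemble strict-factuality chat messages for Qwen3-4B via Ollama.

    The French system prompt here paraphrases the 100%-fidelity rule
    described above; the production prompt differs in wording.
    """
    system = (
        "Tu es un conseiller BH Assurance. Reponds UNIQUEMENT a partir du "
        "contexte fourni. Si l'information n'y figure pas, dis-le clairement "
        "au lieu d'inventer une reponse."
    )
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Contexte:\n{context}\n\nQuestion: {question}"},
    ]

# With an Ollama server running (model tag is an assumption):
# import ollama
# resp = ollama.chat(
#     model="qwen3-4b-instruct-2507",
#     messages=build_rag_prompt(question, chunks),
#     options={"temperature": 0.3, "num_ctx": 15360},
# )
```

Numbering the context chunks (`[1]`, `[2]`, …) lets the model ground each claim in a specific retrieved document, which pairs naturally with the `references` field in the augmented catalog.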
Tech Stack
Qwen3-Embedding-8B (4096-dim)
Ollama-served embedding model for high-dimensional insurance document vectors — stored in Qdrant collection `bh_assurance`
Qwen3-4B-Instruct-2507
GGUF Q4_K_M quantized LLM via Ollama — strict factuality prompts, 15K context window, 0.3 temperature default
FlashRank Re-ranker
Advanced hybrid re-ranking with score breakdown: semantic match + domain expertise + transport-specific bonuses
Qdrant Vector Database
Category-aware payload filtering across 6 insurance branches — automobile, vie, santé, engineering, transport, IARD
Gemini 2.5 Pro (Data Pipeline)
Used for bulk extraction, translation (French/Arabic/mixed), and structuring of unstructured PDFs/CSVs into augmented JSON — Gemini 2.5 Flash couldn't cope with the messy source data
Next.js 14 + Express.js + Prisma
Chat frontend (Zustand, react-markdown, Framer Motion) with PostgreSQL-backed conversation persistence and JWT auth
n8n MoE Workflows (4 Agents)
Receptionist router + 3 experts (RAG, client data, scenario) — only the relevant expert is activated per query, like a MoE gating mechanism
Docker Compose
Individual docker-compose.yml per service — one-command deployment for each component
Results
Won 1st place solo against teams of 3 at BH Assurance's NEXT Challenge
Engineered the entire 4-service platform, data pipeline, and n8n workflows alone
Qwen3-Embedding-8B vectors in Qdrant — high-dimensional insurance domain search
Complete coverage: automobile, vie, santé, engineering, transport, IARD
All LLM processing via Ollama — zero data leaves BH Assurance infrastructure
Extracted, translated, and structured entire product catalog using Gemini 2.5 Pro
Achievement Gallery
Moments that made it all worth it

🥇 1st Place — NEXT Challenge by BH Assurance, September 2025 (Solo vs teams of 3)
Key Takeaways
RAG's power comes from data quality, not model size. Investing a significant portion of the month into data augmentation — extracting from unstructured PDFs/CSVs, translating between French and Arabic, normalizing product categories, and structuring into rich JSON with keywords and key_points — was what made the system actually useful. The Gemini 2.5 Pro model was essential for this: the Flash model couldn't reliably parse the messy, multilingual source documents.
Going solo against teams of 3 forced absolute prioritization. I chose to build 4 focused services rather than one monolithic app — this let me test each service independently and iterate faster without the coordination overhead that teams face.
Qwen3-4B, served via Ollama, is a beast for its size — the GGUF Q4_K_M quantization runs on consumer hardware while maintaining strong instruction-following. Combined with the 15K context window and strict factuality prompts, it answers insurance questions accurately without hallucinating.
FlashRank re-ranking was the biggest accuracy win after data quality. The difference between naive vector search and re-ranked results was dramatic — it eliminated the 'right category, wrong product' failure mode that plagued early iterations.
Docker Compose per service is the solo developer's secret weapon for hackathons. Each service has its own docker-compose.yml, which means I could demo any subset of the system even if other parts were still in development.