🥇 1st Place — NEXT Challenge (Solo) · August–September 2025 · Month-long challenge · Solo (vs teams of 3)

Agent IA BH Assurance — Le Conseiller Augmenté

Role: Solo Engineer — Full Stack & AI

FastAPI · Qdrant · Ollama · Qwen3 · FlashRank · n8n

Overview

Agent IA BH Assurance: Le Conseiller Augmenté — a complete AI-powered insurance advisory platform built solo for BH Assurance's NEXT Challenge, winning 1st Place against teams of 3 over a month-long competition (August–September 2025). The architecture follows a Mixture-of-Experts (MoE) pattern: one central receptionist AI agent classifies user intent and dynamically routes to specialized expert agents (RAG expert, client data expert, scenario expert) — like a gating mechanism that only activates the right expert per query. The system runs as 4 services: a Next.js 14 frontend + Express.js chat backend (Prisma, PostgreSQL, JWT), a FastAPI RAG service (Qdrant with Qwen3-Embedding-8B at 4096-dim, FlashRank re-ranker, Qwen3-4B-Instruct via Ollama), a FastAPI client management API (Oracle DB integration), and a FastAPI quote generation service (SQLAlchemy + Alembic). The critical differentiator was meticulous data engineering: I manually extracted, translated (French/Arabic/mixed), and restructured BH Assurance's entire product catalog from dozens of unstructured PDFs and CSVs using Gemini 2.5 Pro — because RAG's power comes from data quality, not model size.

The Problem

BH Assurance's insurance advisors spend excessive time searching through hundreds of product documents across 6 insurance branches (automobile, vie, santé, engineering, transport, IARD) to answer client questions. The source material is scattered across multilingual PDFs and CSVs — some in French, some in Arabic, some in mixed formats — with no consistent structure. Existing chatbot solutions fail because they can't handle this data quality issue: garbage in, garbage out. The challenge required a production-grade advisory system that could accurately answer questions about all BH Assurance products.

Solution

1

The single most important engineering decision was investing heavily in data quality before touching any AI. I used Gemini 2.5 Pro (the best model at the time for this task — the Flash model couldn't reliably read the unstructured data) to extract, translate, and restructure BH Assurance's entire product catalog from dozens of PDFs and CSVs. The source data was in different languages (French, Arabic, mixed), unstructured, and inconsistent. I formatted every product into structured JSON with normalized fields: `product_category`, `product_name`, `branche`, `garantie`, `keywords`, `key_points`, and `references`. This augmented dataset is what made the RAG system actually useful — without it, vector search returns noise.
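The target record shape can be sketched as follows. The field names come from the pipeline described above; the sample values and the `validate_record` helper are illustrative only, not the project's actual tooling:

```python
# Illustrative sketch of one normalized product record produced by the
# Gemini-assisted data pipeline. Field names match the schema described
# above; the sample values and the validation helper are hypothetical.
REQUIRED_FIELDS = [
    "product_category", "product_name", "branche",
    "garantie", "keywords", "key_points", "references",
]

sample_record = {
    "product_category": "automobile",
    "product_name": "Assurance Auto Tous Risques",
    "branche": "automobile",
    "garantie": "Dommages tous accidents",  # coverage: all-accident damage
    "keywords": ["auto", "tous risques", "collision"],
    "key_points": ["Covers own-vehicle damage regardless of fault"],
    "references": ["conditions_generales_auto.pdf"],
}

def validate_record(record: dict) -> list[str]:
    """Return the list of required fields missing or empty in a record."""
    return [f for f in REQUIRED_FIELDS if f not in record or not record[f]]

print(validate_record(sample_record))  # → []
```

A schema check like this is what lets every downstream stage (embedding, payload filtering, context construction) assume consistent fields.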

2

Built the RAG service (FastAPI on port 8000) with a 4-phase pipeline: (1) Enhanced semantic search using Qwen3-Embedding-8B (4096-dim vectors via Ollama) stored in Qdrant with category-aware payload filtering across 6 insurance categories (automobile, vie, santé, engineering, transport, IARD), (2) FlashRank re-ranking of candidate documents with score breakdown (semantic + domain + transport-specific bonuses), (3) intelligent context construction that groups results by product/article type and detects comparison queries, (4) Qwen3-4B-Instruct-2507 (GGUF Q4_K_M via Ollama) for answer generation with strict factuality prompts — the system prompt enforces 100% fidelity to retrieved context and explicitly refuses to hallucinate.
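The phase-2 score breakdown can be sketched in pure Python. The real service uses FlashRank for the semantic component; the term sets, weights, and bonus values below are invented for illustration:

```python
# Hedged sketch of the hybrid re-ranking score breakdown: semantic match
# plus domain-expertise and transport-specific bonuses. All weights and
# term lists are illustrative, not the values used in the actual service.
DOMAIN_TERMS = {"garantie", "franchise", "prime", "sinistre"}
TRANSPORT_TERMS = {"transport", "marchandises", "fret"}

def score_candidate(query: str, doc: dict) -> dict:
    """Return a per-document score breakdown for re-ranking."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc["text"].lower().split())
    semantic = doc["vector_score"]                 # from phase-1 search
    domain = 0.1 * len(d_tokens & DOMAIN_TERMS)    # domain expertise bonus
    transport = 0.2 if (q_tokens & TRANSPORT_TERMS and
                        doc["category"] == "transport") else 0.0
    return {"semantic": semantic, "domain": domain,
            "transport": transport,
            "total": semantic + domain + transport}

def rerank(query: str, docs: list[dict]) -> list[dict]:
    """Sort candidates by total hybrid score, highest first."""
    return sorted(docs, key=lambda d: score_candidate(query, d)["total"],
                  reverse=True)

docs = [
    {"text": "garantie transport de marchandises", "category": "transport",
     "vector_score": 0.70},
    {"text": "assurance vie epargne", "category": "vie",
     "vector_score": 0.75},
]
top = rerank("assurance transport marchandises", docs)[0]
print(top["category"])  # → transport
```

Note how the transport document wins despite a lower raw vector score — exactly the kind of correction the bonuses are there to make.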

3

Built the chat frontend (Next.js 14 + TypeScript + Tailwind 3.4 + Zustand + Framer Motion + react-markdown) with Express.js backend (Prisma ORM + PostgreSQL for conversation persistence, JWT/bcrypt auth, user memory management). The FastAPI client management API integrates with Oracle DB for real client data lookup, and the quote generation service (FastAPI + SQLAlchemy + Alembic) handles insurance quote sessions with BH Assurance's pricing API.

4

Orchestrated the multi-agent conversation flow using a MoE-inspired architecture with 4 n8n workflow definitions: `Agent bh assurance.json` acts as the central receptionist/router — it classifies the user's intent and dynamically dispatches to the right expert agent, like a gating mechanism. `rag expert bh assurance.json` handles product knowledge queries via the RAG pipeline, `call client data expert bh assurance.json` looks up real client data from Oracle DB, and `scenario_expert.json` analyzes complex insurance scenarios. Only the relevant expert is activated per query — the others stay idle, keeping responses fast and focused.
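The gating step can be illustrated with a toy router. The real receptionist is an n8n workflow driven by LLM intent classification; this keyword version only shows the "activate exactly one expert per query" pattern, and the trigger terms are assumptions:

```python
# Toy sketch of the receptionist's gating mechanism. The actual router
# is an LLM-driven n8n workflow; these keyword triggers are illustrative.
EXPERTS = {
    "rag_expert": {"garantie", "produit", "couverture", "contrat"},
    "client_data_expert": {"client", "dossier", "compte"},
    "scenario_expert": {"scenario", "simulation"},
}

def route(query: str) -> str:
    """Pick the single expert whose trigger terms best match the query."""
    tokens = set(query.lower().split())
    scores = {name: len(tokens & terms) for name, terms in EXPERTS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the RAG (product knowledge) expert when nothing matches.
    return best if scores[best] > 0 else "rag_expert"

print(route("quelle garantie couvre ce produit ?"))  # → rag_expert
print(route("dossier du client 4521"))               # → client_data_expert
```

Because only the winning expert runs, the idle experts cost nothing per query — the MoE-style efficiency described above.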

Architecture

MoE-inspired multi-agent architecture: Next.js 14 frontend (port 3000) communicates with an Express.js chat backend (Prisma + PostgreSQL, JWT auth). The backend routes all queries to the central receptionist agent in n8n, which classifies intent and dispatches to the appropriate expert: (1) the RAG expert calls the FastAPI RAG service (port 8000) using Qdrant (Qwen3-Embedding-8B, 4096-dim) + FlashRank re-ranker + Qwen3-4B via Ollama for insurance Q&A, (2) the client data expert calls the FastAPI client API for Oracle DB lookups, (3) the scenario expert runs complex analysis, and (4) the quote expert calls the FastAPI quote service (SQLAlchemy + Alembic) for pricing. All services are Dockerized with individual docker-compose.yml files.

Tech Stack

Qwen3-Embedding-8B (4096-dim)

Ollama-served embedding model for high-dimensional insurance document vectors — stored in Qdrant collection `bh_assurance`

Qwen3-4B-Instruct-2507

GGUF Q4_K_M quantized LLM via Ollama — strict factuality prompts, 15K context window, 0.3 temperature default

FlashRank Re-ranker

Advanced hybrid re-ranking with score breakdown: semantic match + domain expertise + transport-specific bonuses

Qdrant Vector Database

Category-aware payload filtering across 6 insurance branches — automobile, vie, santé, engineering, transport, IARD

Gemini 2.5 Pro (Data Pipeline)

Used for bulk extraction, translation (French/Arabic/mixed), and structuring of unstructured PDFs/CSVs into augmented JSON — Gemini Flash couldn't reliably parse the messy source documents

Next.js 14 + Express.js + Prisma

Chat frontend (Zustand, react-markdown, Framer Motion) with PostgreSQL-backed conversation persistence and JWT auth

n8n MoE Workflows (4 Agents)

Receptionist router + 3 experts (RAG, client data, scenario) — only the relevant expert is activated per query, like a MoE gating mechanism

Docker Compose

Individual docker-compose.yml per service — one-command deployment for each component
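A hedged sketch of what one per-service compose file might look like — service names, image tags, and environment variables here are assumptions, not the project's actual configuration; only the RAG service's port 8000 and the Qdrant/Ollama default ports come from known facts:

```yaml
# Hypothetical docker-compose.yml for the RAG service alone; names and
# versions are illustrative, not the project's actual files.
services:
  rag-api:
    build: .
    ports:
      - "8000:8000"        # FastAPI RAG service
    environment:
      - QDRANT_URL=http://qdrant:6333
      - OLLAMA_URL=http://ollama:11434
    depends_on:
      - qdrant
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"        # Qdrant default HTTP port
```

Running `docker compose up` in any one service directory brings up just that slice of the system, which is what makes subset demos possible.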

Results

🥇 1st Place

Won solo against teams of 3 at BH Assurance's NEXT Challenge

Solo vs Teams of 3

Engineered the entire 4-service platform, data pipeline, and n8n workflows alone

4096-dim Embeddings

Qwen3-Embedding-8B vectors in Qdrant — high-dimensional insurance domain search

6 Insurance Categories

Complete coverage: automobile, vie, santé, engineering, transport, IARD

100% Local Inference

All LLM processing via Ollama — zero data leaves BH Assurance infrastructure

Augmented Data Pipeline

Extracted, translated, and structured entire product catalog using Gemini 2.5 Pro

🏆 Achievement Gallery

Moments that made it all worth it

🥇 1st Place — NEXT Challenge by BH Assurance, September 2025 (Solo vs teams of 3)


Key Takeaways

RAG's power comes from data quality, not model size. Investing a significant portion of the month into data augmentation — extracting from unstructured PDFs/CSVs, translating between French and Arabic, normalizing product categories, and structuring into rich JSON with keywords and key_points — was what made the system actually useful. The Gemini 2.5 Pro model was essential for this: the Flash model couldn't reliably parse the messy, multilingual source documents.

Going solo against teams of 3 forced absolute prioritization. I chose to build 4 focused services rather than one monolithic app — this let me test each service independently and iterate faster without the coordination overhead that teams face.

Qwen3-4B, served via Ollama, is a beast for its size — the GGUF Q4_K_M quantization runs on consumer hardware while maintaining strong instruction-following. Combined with the 15K context window and strict factuality prompts, it answers insurance questions accurately without hallucinating.

FlashRank re-ranking was the biggest accuracy win after data quality. The difference between naive vector search and re-ranked results was dramatic — it eliminated the 'right category, wrong product' failure mode that plagued early iterations.

Docker Compose per service is the solo developer's secret weapon for hackathons. Each service has its own docker-compose.yml, which means I could demo any subset of the system even if other parts were still in development.