Moufida (Tunisian Arabic for 'useful/helpful') is a fully local-first, privacy-preserving AI desktop copilot built during the AI Minds hackathon in a 24-hour sprint. It runs entirely on local hardware with no mandatory cloud dependencies. A Tauri + Next.js transparent desktop overlay connects to 5 independent Python microservices: a multimodal retrieval engine that generates CLIP/BLIP/Whisper embeddings locally, a semantic search service using a custom XQdrant fork for explainable similarity scores, a natural-language file organisation service using DBSCAN clustering, a voice copilot with local STT and TTS, and an autonomous knowledge-gap detection engine backed by MongoDB and APScheduler.
Most modern AI tools require cloud connectivity and send user data to external servers. For sensitive domains (medical, legal, research), this is a hard blocker. The challenge was to build a fully featured, multimodal AI assistant that operates entirely on local hardware within 24 hours, with no privacy trade-offs, no vendor lock-in, and no latency penalty from cloud round-trips.
- 01
Built 5 independent FastAPI microservices (ports 8000–8400) behind a Tauri + Next.js transparent desktop overlay providing 6 panels: Agents, Copilot, Files, Graph, Insights, and Settings. Services communicate directly without a central gateway.
- 02
Engineered the Retrieval Service (port 8100): a multimodal ingestion engine that detects file modality (text/PDF/DOCX/image/audio/video/HTML), generates embeddings in shared CLIP space (with BLIP for image captioning and Whisper transcription for audio/video), stores vectors in Qdrant/XQdrant and metadata in MongoDB, and builds a dynamic similarity graph with timeline events. Includes a filesystem watcher and weekly digest generation.
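A minimal sketch of that ingestion flow, assuming sentence-transformers' `clip-ViT-B-32` checkpoint and a local Qdrant instance; `detect_modality`, the extension sets, and the collection name are illustrative, not the service's actual API:

```python
import uuid
from pathlib import Path

from PIL import Image
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

clip = SentenceTransformer("clip-ViT-B-32")  # shared text/image space, 512-d
qdrant = QdrantClient(host="localhost", port=6333)

IMAGE_EXT = {".png", ".jpg", ".jpeg", ".webp"}
TEXT_EXT = {".txt", ".md", ".html"}

def detect_modality(path: Path) -> str:
    """Crude extension-based routing (hypothetical helper)."""
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXT:
        return "image"
    if suffix in TEXT_EXT:
        return "text"
    return "other"  # PDF/DOCX need extraction; audio/video go to Whisper

def ingest(path: Path) -> None:
    modality = detect_modality(path)
    if modality == "image":
        vector = clip.encode(Image.open(path))  # CLIP image tower
    elif modality == "text":
        vector = clip.encode(path.read_text(errors="ignore")[:2000])
    else:
        return  # extract or transcribe first, then embed the resulting text
    qdrant.upsert(
        collection_name="moufida_files",  # assumes a 512-d collection exists
        points=[PointStruct(id=str(uuid.uuid4()),
                            vector=vector.tolist(),
                            payload={"path": str(path), "modality": modality})],
    )
```

Because text and images land in the same CLIP space, a text query can retrieve screenshots and a photo can retrieve related notes with no extra machinery.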
- 03
Implemented explainable semantic search via the Search Service (port 8400) using the custom XQdrant fork — every result can include a `score_explanation` field decomposing the vector similarity score for mathematical transparency. An agent search mode adds Qwen3 LLM query reformulation, per-result reasoning, and summary insights.
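A hypothetical client call showing what an explained result looks like; the `/search` route, request fields, and response shape are assumptions based on the description above, with only the `score_explanation` field taken from the XQdrant fork:

```python
import requests

resp = requests.post(
    "http://localhost:8400/search",  # assumed route on the Search Service
    json={"query": "invoices from last quarter", "limit": 5, "explain": True},
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json()["results"]:  # assumed payload shape
    print(hit["path"], round(hit["score"], 3))
    # which embedding dimensions contributed most to the similarity score
    print("  explanation:", hit.get("score_explanation"))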
- 04
Built the Organisation Service (port 8200) for natural-language filesystem planning: groups files using DBSCAN clustering (eps=0.35, min_samples=2) over their vector embeddings, generates LLM-assisted folder plans, executes a preview-then-apply dry-run workflow, and updates MongoDB metadata/timeline collections.
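The clustering step is easy to sketch with the parameters cited above (eps=0.35, min_samples=2); the cosine metric, placeholder data source, and variable names are assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# CLIP embeddings for the candidate files, fetched from XQdrant
# (placeholder source and paths for the sketch)
vectors = np.load("file_embeddings.npy")
paths = [f"file_{i}" for i in range(len(vectors))]

labels = DBSCAN(eps=0.35, min_samples=2, metric="cosine").fit_predict(vectors)

clusters: dict[int, list[str]] = {}
for path, label in zip(paths, labels):
    if label == -1:
        continue  # DBSCAN noise: the file is left where it is
    clusters.setdefault(label, []).append(path)

# Each cluster becomes a candidate folder; the LLM names it, and the plan
# is previewed as a dry run before any file is actually moved.
for label, members in clusters.items():
    print(f"cluster {label}: {len(members)} files")
```

DBSCAN suits this job because it never forces a file into a folder: anything that doesn't sit near a dense cluster is labelled noise and stays put.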
- 05
Integrated a full voice pipeline in the Copilot Service (port 8300): faster-whisper (CTranslate2) for offline STT, Piper TTS for local text-to-speech, and a streaming LLM chat loop calling Qwen3 4B through an OpenAI-compatible ngrok endpoint. Added the Knowledge Gap Service (port 8000) with 5 APScheduler-driven detection strategies: topic sparsity, incomplete plans, unresolved questions, abandoned topics, and decisions without justification.
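A condensed sketch of one voice turn, assuming the standard faster-whisper and openai-python APIs plus Piper's CLI; the model names, ngrok URL, and voice file are placeholders:

```python
import subprocess

from faster_whisper import WhisperModel
from openai import OpenAI

stt = WhisperModel("small", device="cpu", compute_type="int8")  # CTranslate2
llm = OpenAI(base_url="https://example.ngrok.app/v1", api_key="ollama")

def voice_turn(wav_path: str) -> str:
    # 1. Offline STT
    segments, _info = stt.transcribe(wav_path)
    user_text = " ".join(seg.text.strip() for seg in segments)

    # 2. Chat completion against the shared Qwen3 host (tag is a placeholder)
    reply = llm.chat.completions.create(
        model="qwen3:4b",
        messages=[{"role": "user", "content": user_text}],
    ).choices[0].message.content

    # 3. Local TTS: piper reads text on stdin and writes a wav file
    subprocess.run(
        ["piper", "--model", "en_US-lessac-medium.onnx",
         "--output_file", "reply.wav"],
        input=reply.encode(), check=True,
    )
    return reply
```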
The Tauri desktop overlay (Next.js 6-panel UI) calls the 5 FastAPI services directly. Retrieval (8100) is the data backbone: it ingests files/URLs, generates CLIP/BLIP/Whisper embeddings locally, upserts vectors to XQdrant and metadata to MongoDB, and maintains a similarity graph. Search (8400) queries the shared XQdrant index with CLIP-embedded queries and optionally runs Qwen3 agent reasoning. Organisation (8200) fetches vectors and metadata, runs DBSCAN clustering, presents a dry-run plan, then applies confirmed moves. Copilot (8300) runs faster-whisper STT → Qwen3 chat → Piper TTS locally. Knowledge Gap (8000) uses APScheduler to scan MongoDB every 720 minutes (12 hours) for sparse, abandoned, or unresolved topics. Qwen3 4B runs via Ollama on a local machine and is exposed through an OpenAI-compatible endpoint over ngrok, so all services share one model host without moving data off-device. Prometheus metrics and custom observability hooks are wired into every service.
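A sketch of the scheduled gap scan using APScheduler and pymongo; the collection names and query are illustrative guesses at the schema, with only the 720-minute interval and the strategy names taken from the design above:

```python
from apscheduler.schedulers.blocking import BlockingScheduler
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["moufida"]  # assumed DB name

def scan_for_gaps() -> None:
    # One of the five strategies: unresolved questions, i.e. timeline
    # entries flagged as questions that never received an answer event.
    open_questions = db.timeline.count_documents(
        {"type": "question", "resolved": False}  # assumed schema
    )
    if open_questions:
        db.knowledge_gaps.insert_one(
            {"strategy": "unresolved_questions", "count": open_questions}
        )

# A standalone script blocks here; inside the FastAPI service a
# BackgroundScheduler would run alongside the request handlers instead.
scheduler = BlockingScheduler()
scheduler.add_job(scan_for_gaps, "interval", minutes=720)  # every 12 hours
scheduler.start()
```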
Transparent desktop overlay with 6 panels: Agents, Copilot, Files, Graph, Insights, Settings
Multimodal embedding pipeline — text/image/audio/video all embedded in shared CLIP space, locally
Modified Qdrant with `score_explanation` field for explainable vector similarity decomposition
Scikit-learn DBSCAN (eps=0.35) over vector embeddings for natural-language file organisation plans
Local STT via CTranslate2 Whisper and local TTS via Piper — fully offline voice pipeline
Local Ollama model exposed through an OpenAI-compatible endpoint over ngrok for search reasoning, organisation, and chat
MongoDB stores multimodal metadata, the similarity graph, timeline events, chat history, and knowledge gaps
Knowledge Gap service runs 5 gap detection strategies every 720 minutes (12 hours) via APScheduler
Full working prototype delivered in a single hackathon sprint
Retrieval, Search, Organisation, Copilot, Knowledge Gap — all independent
All embeddings, STT, and TTS run on-device — zero cloud data transfer
Text, PDF, DOCX, image, audio, video, and HTML ingested in shared CLIP space
XQdrant `score_explanation` gives per-dimension vector similarity breakdown
Topic sparsity, incomplete plans, unresolved questions, abandoned topics, unjustified decisions
Building 5 independent services in 24 hours works only if each service has a clearly bounded responsibility from the start — we defined the port map and API contracts in the first 30 minutes, then each engineer worked in parallel.
XQdrant's score_explanation field is a major UX differentiator: showing users *why* a document was retrieved (which dimensions contributed to the score) builds trust that 'smart' keyword search never could.
Qwen3 4B punches well above its weight class: it holds its own on benchmarks against models 2–3× its size on reasoning, instruction-following, and multilingual tasks. Running it locally via Ollama gave us near-instant cold-start and effectively zero marginal cost per query, which matters when 5 services are all calling it in parallel.
We initially tried to upgrade to Qwen3.5 VL for native vision-language understanding (directly describing or reasoning about ingested images without needing separate BLIP captioning). It turned out Ollama doesn't yet support the VL architecture introduced in the Qwen3.5 visual series — the multimodal projector layers aren't mapped in the GGUF backend yet — so we fell back to our BLIP-in-CLIP pipeline, which worked well enough for the 24h scope.
faster-whisper + Piper TTS is the fastest path we found to a fully local voice pipeline: faster-whisper runs on CTranslate2, Piper runs on ONNX, neither needs an internet connection, and combined latency (STT + TTS round-trip) was under 2 seconds on commodity hardware.
Knowledge gap detection via scheduled MongoDB analysis is underrated — flagging 'abandoned topics' and 'unresolved questions' automatically surfaces blind spots users didn't even know they had.