Overview
QDesign is an AI-driven collaborative biological design workbench built for the Vectors in Orbit hackathon (Qdrant × InstaDeep × YellowSys), where it won 1st Place in the Multimodal Biological Design & Discovery category. Scientists upload PDFs, PDB/CIF structures, FASTA sequences, and images into unified project workspaces. XQdrant — our custom Rust fork of Qdrant — generates explainable vector search results that tell researchers not just that two proteins are similar, but why (e.g. 'α-helix propensity: 87.8%, surface charge: 5.9%'). A custom SciAgents-inspired 5-agent Co-Scientist (Planner → Ontologist → Scientist → Scientist² → Critic via OpenRouter) assists with hypothesis generation and auto-drafts IEEE-formatted research papers.
The Problem
Standard vector databases are black boxes for scientists — they output a similarity score but give no reasoning. In drug discovery and protein engineering, 'why are these similar?' is as important as 'how similar are they?'. Existing tools also force researchers to work across fragmented formats (PDB, FASTA, PDF, images) with no unified semantic layer.
Solution
Forked Qdrant's Rust codebase into XQdrant, extending the core with an `explainability.rs` module that identifies the top contributing dimensions of a 1280-dim ESM-2 cosine similarity score and maps them to biological properties via an `esm2_dim_to_biological_property.json` mapping derived from linear probing experiments. Every search result includes a `score_explanation` field breaking down α-helix propensity, surface accessibility, and charge distribution contributions.
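The idea behind the decomposition can be sketched in a few lines: cosine similarity is a sum of per-dimension terms, so ranking those terms identifies which dimensions drive the score. This is a minimal Python illustration of that logic, not the actual Rust code in `explainability.rs`; the `DIM_TO_PROPERTY` table stands in for the real JSON mapping.

```python
import numpy as np

# Hypothetical stand-in for esm2_dim_to_biological_property.json
DIM_TO_PROPERTY = {0: "alpha-helix propensity", 1: "surface charge", 2: "flexibility"}

def explain_cosine(a: np.ndarray, b: np.ndarray, top_k: int = 3) -> list[dict]:
    """Cosine similarity = sum_i a_i*b_i / (|a||b|); each term is one
    dimension's contribution, and the terms sum to the overall score."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    contrib = (a * b) / denom
    top = np.argsort(contrib)[::-1][:top_k]  # dimensions with largest contributions
    return [
        {"dim": int(i),
         "property": DIM_TO_PROPERTY.get(int(i), f"dim_{i}"),
         "contribution": float(contrib[i])}
        for i in top
    ]

a = np.array([0.9, 0.1, 0.4])
b = np.array([0.8, 0.2, 0.5])
print(explain_cosine(a, b))
```

In the real system the vectors are 1280-dim ESM-2 embeddings and the result is attached to each hit as `score_explanation`.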
Engineered a multimodal vector pipeline across 4 XQdrant collections: `qdesign_structures` and `qdesign_sequences` using ESM-2 (1280-dim) for PDB/CIF structures and FASTA homology, `qdesign_text` using MiniLM-L6-v2 (384-dim) for PDFs and research papers, and `qdesign_images` using CLIP ViT-B-32 (512-dim) for diagrams and microscopy images.
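The collection layout above implies a simple routing rule: each file type maps to one collection and therefore one embedding model and dimension. A minimal sketch of that routing (the extension table and `route` helper are illustrative, not the actual service code; the collection names and dims follow the pipeline described above):

```python
import os

COLLECTIONS = {
    "qdesign_structures": {"model": "ESM-2", "dim": 1280},
    "qdesign_sequences":  {"model": "ESM-2", "dim": 1280},
    "qdesign_text":       {"model": "MiniLM-L6-v2", "dim": 384},
    "qdesign_images":     {"model": "CLIP ViT-B-32", "dim": 512},
}

EXT_TO_COLLECTION = {
    ".pdb": "qdesign_structures", ".cif": "qdesign_structures",
    ".fasta": "qdesign_sequences", ".fa": "qdesign_sequences",
    ".pdf": "qdesign_text",
    ".png": "qdesign_images", ".jpg": "qdesign_images",
}

def route(filename: str) -> str:
    """Pick the target collection (and thus embedding model) by extension."""
    ext = os.path.splitext(filename.lower())[1]
    return EXT_TO_COLLECTION[ext]

print(route("lysozyme.pdb"), COLLECTIONS[route("lysozyme.pdb")]["dim"])
```

Separate named collections keep the incompatible vector dimensions (1280 / 384 / 512) cleanly apart, which is also the lesson noted in Key Takeaways.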
Built a FastAPI Knowledge Service (port 8001) that generates auto-wired semantic knowledge graphs from uploaded files — each edge includes a `biological_explanation` field derived from XQdrant explainability metadata. Visualized with @xyflow/react in the Next.js frontend.
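A sketch of what one auto-wired edge looks like, using networkx (which the service uses for path-finding); the node and edge fields here are illustrative examples, not the service's actual schema:

```python
import networkx as nx

g = nx.Graph()
g.add_node("paper:kinase_review", kind="pdf")
g.add_node("struct:1ATP", kind="structure")
# Each edge carries a biological_explanation derived from the
# XQdrant score_explanation metadata of the underlying search hit.
g.add_edge(
    "paper:kinase_review", "struct:1ATP",
    score=0.91,
    biological_explanation="alpha-helix propensity: 87.8%, surface charge: 5.9%",
)
print(g["paper:kinase_review"]["struct:1ATP"]["biological_explanation"])
```

Edges of this shape are what the @xyflow/react frontend renders as biologically labeled links between PDF, protein, and image nodes.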
Implemented the HITL Co-Scientist (FastAPI :8000 + Streamlit :8501) — a fully custom 5-agent pipeline inspired by the SciAgents paper (Ghafarollahi & Buehler 2024): Planner (networkx KG path-finding) → Ontologist (semantic interpretation) → Scientist (7-point hypothesis framework) → Scientist² (quantitative expansion with predictions & protocols) → Critic (scientific scoring). Each agent calls OpenRouter (default: `anthropic/claude-3.5-sonnet`) via httpx. HITL checkpoints via `/v2/hitl/run` — Approve / Modify / Reject at each stage. Results export to `workflow_outputs/`.
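Each agent stage boils down to one OpenRouter chat-completion call. A hedged sketch of the request shape (the stage prompts and `run_stage` wiring are illustrative; only the endpoint and payload format follow OpenRouter's documented OpenAI-compatible API — the real service adds retries and per-agent JSON parsing):

```python
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
STAGES = ["Planner", "Ontologist", "Scientist", "Scientist2", "Critic"]

def build_payload(stage: str, context: str,
                  model: str = "anthropic/claude-3.5-sonnet") -> dict:
    """Assemble the request body for one agent stage."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"You are the {stage} agent in a SciAgents-style pipeline."},
            {"role": "user", "content": context},
        ],
    }

def run_stage(stage: str, context: str, api_key: str) -> str:
    import httpx  # imported here so the sketch runs without network use
    resp = httpx.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_payload(stage, context),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(build_payload("Planner", "Find KG paths between kinase X and ligand Y")["model"])
```

In the HITL flow, each stage's output is shown at a `/v2/hitl/run` checkpoint for Approve / Modify / Reject before the next stage's context is built.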
Built the NestJS 11 Core (MongoDB + Socket.io 4.8 + JWT) for auth, project workspaces with git-like objective versioning, real-time collaboration sync, and file storage. Frontend in Next.js 16.1 + React 19, with NGL for interactive 3D molecular structure visualization.
Architecture
Next.js 16.1 + React 19 frontend communicates with a NestJS 11 Core (MongoDB, JWT, Socket.io WebSockets for real-time sync). The Core delegates to two Python microservices: Knowledge Service (FastAPI, port 8001) for graph generation and retrieval, and Co-Scientist Service (FastAPI :8000 + Streamlit :8501) for HITL inference and JSON export. The Co-Scientist runs a 5-agent SciAgents pipeline (Planner → Ontologist → Scientist → Scientist² → Critic) — each agent calls OpenRouter via httpx, uses networkx for KG path-finding, scispaCy for biomedical NER, and pdfplumber/PyMuPDF for paper parsing. Both Python services query XQdrant (our Rust fork of Qdrant, port 6333) for explainable vector search across 4 collections (ESM-2, MiniLM-L6-v2, CLIP ViT-B-32). Biopython parses CIF/FASTA; PyTorch runs ESM-2 locally.
Tech Stack
XQdrant (Rust fork)
Custom Qdrant fork with `explainability.rs` — decomposes 1280-dim ESM-2 cosine similarity into biological property contributions
ESM-2 (1280-dim)
Meta AI protein language model powering structure and sequence similarity across PDB/CIF and FASTA files
MiniLM-L6-v2 + CLIP ViT-B-32
Text embeddings (384-dim) for PDFs/papers and image embeddings (512-dim) for diagrams/microscopy
SciAgents Pipeline (5 Agents)
Planner → Ontologist → Scientist → Scientist² → Critic calling OpenRouter via httpx; HITL via /v2/hitl/run; Streamlit UI on :8501
NestJS 11 + MongoDB
Core backend — auth (JWT/Passport), project workspaces with git-like versioning, Socket.io real-time sync
Next.js 16.1 + React 19
Collaborative UI — @xyflow/react knowledge graph, NGL 3D molecular viewer, Framer Motion animations
FastAPI (Knowledge + Co-Scientist)
Knowledge Service on :8001 for graph generation; Co-Scientist on :8000 (API) + :8501 (Streamlit HITL UI) for inference and export
Results
Vectors in Orbit — Qdrant × InstaDeep × YellowSys hackathon
ESM-2 structures/sequences (1280-dim), MiniLM text (384-dim), CLIP images (512-dim)
XQdrant score_explanation maps similarity to α-helix, surface charge, and flexibility
5-agent SciAgents pipeline: Planner → Ontologist → Scientist → Scientist² → Critic via OpenRouter
Auto-drafts IEEE-formatted research papers from project findings and the knowledge graph
Deployed at qdesign.moetezfradi.me with real-time collaboration
Achievement Gallery
Moments that made it all worth it

Achievement 01
🏆 1st Place — Vectors in Orbit award ceremony (Qdrant × InstaDeep × YellowSys), February 2026

Achievement 02
Team Seleçaooooo celebrating 1st Place with the 3000 DT prize at SUP'COM
Key Takeaways
Forking Qdrant's Rust core to add the explainability module was the single highest-risk bet of the hackathon — but the `score_explanation` field it unlocked was also what won us first place. Judges asked 'can you show why?' and we could.
The dimension-to-biological-property mapping required linear probing experiments run offline before the hackathon sprint, not during it — that upfront research investment is what made XQdrant's explanations scientifically meaningful rather than arbitrary.
Running three embedding models (ESM-2, MiniLM-L6-v2, and CLIP ViT-B-32) in a unified search pipeline across different vector dimensions required careful collection architecture. Keeping them in separate named collections with type filters was simpler than a single mixed collection.
Building the Co-Scientist as a custom 5-agent SciAgents pipeline (Planner → Ontologist → Scientist → Scientist² → Critic), rather than using a framework like LangGraph, meant full control over prompt structure, retry logic, and JSON parsing per agent — no abstraction layer fighting us under hackathon time pressure. The Scientist² step (quantitative expansion with predictions and experimental protocols) was the detail that convinced the judges the system produced actionable science, not just chatbot output.
Biologists and hackathon judges both responded far more strongly to the knowledge graph visualization than to any other feature — seeing PDF nodes, protein nodes, and image nodes connected by biologically-labeled edges made the platform's value immediately legible.