Architecture & Pipeline Flow
A technical walkthrough of how queries flow through the system — from intent classification to result delivery.
1. Request Flow
User Query
User enters a natural-language search query
e.g. "What PPE do I need for metal grinding?" or "3M N95 respirator"
Query Classification (Claude Haiku)
Claude 3.5 Haiku analyzes intent and classifies the query
Routes the query to HYBRID (product lookups, SKU searches) or LLM_AUGMENTED (advisory questions, recommendations), or flags it as OFF_TOPIC. Also produces a refined query optimized for the search index.
Query Embedding (Azure OpenAI)
The refined query is converted into a 1536-dimension vector
Uses text-embedding-3-small deployed on Azure OpenAI to capture semantic meaning for vector similarity search.
Hybrid Search (Azure AI Search)
Keyword and vector retrieval run in parallel, then semantic reranking refines the fused results
BM25 keyword matching and HNSW vector similarity run in parallel, and their result lists are fused using Reciprocal Rank Fusion (RRF). Semantic reranking then reorders the top fused results. Faceted counts are returned for filtering.
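Azure AI Search performs the fusion internally, but the idea behind RRF is easy to illustrate. The following is a minimal sketch, not the service's implementation: each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed. The function name, the SKU identifiers, and the choice of `k = 60` (a commonly used constant) are assumptions for illustration.

```typescript
// Document IDs ordered best-first, as returned by one retrieval strategy.
type RankedList = string[];

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// where rank starts at 1. Documents missing from a list contribute nothing.
function reciprocalRankFusion(
  lists: RankedList[],
  k = 60, // assumed smoothing constant
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists from the keyword and vector legs.
const keywordHits: RankedList = ["sku-101", "sku-205", "sku-309"];
const vectorHits: RankedList = ["sku-205", "sku-412", "sku-101"];
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
```

In this example `sku-205` ranks first because it sits near the top of both lists, ahead of `sku-101`, which is first in one list but third in the other.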
AI Synthesis (Claude Sonnet 4)
Conditional (LLM_AUGMENTED only): Claude generates an expert answer
The top 8 retrieved products are passed as context to Claude Sonnet 4, which acts as a senior technical advisor. It produces structured recommendations, safety notes, and pro tips grounded in actual catalog data.
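As a rough sketch of how the top 8 products might be turned into prompt context, the helper below numbers each product and truncates the list. The `Product` fields and the function name are hypothetical; the actual catalog schema and prompt format are not described here.

```typescript
// Hypothetical product shape; the real catalog schema may differ.
interface Product {
  sku: string;
  name: string;
  brand: string;
  category: string;
  description: string;
}

const MAX_CONTEXT_PRODUCTS = 8; // matches the "top 8" described above

// Formats the highest-ranked products as a numbered context block
// that can be placed in the synthesis prompt.
function buildSynthesisContext(
  products: Product[],
  limit = MAX_CONTEXT_PRODUCTS,
): string {
  return products
    .slice(0, limit)
    .map(
      (p, i) =>
        `[${i + 1}] ${p.name} (SKU ${p.sku}, ${p.brand}, ${p.category})\n${p.description}`,
    )
    .join("\n\n");
}
```

Numbering the entries gives the model a stable way to refer back to specific catalog items in its answer.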
Response Delivery
Unified response returned with full transparency
Results, AI answer (if applicable), facets, latency timings, and complete RAG debug info are returned to the frontend.
2. Intelligent Query Routing
HYBRID Mode
Direct product retrieval using keyword + vector search. Fast, precise, and best for queries with clear product intent.
Triggers on
- Specific product lookups
- Brand + product searches
- SKU or part number searches
- Category browsing
- Attribute-based filtering
Example query
"3M N95 respirator"
LLM_AUGMENTED Mode
Retrieves products, then Claude synthesizes an expert answer with recommendations, safety notes, and trade tips.
Triggers on
- Advisory / "what do I need" questions
- Multi-product recommendation requests
- Project-based queries
- Comparison questions
- Questions requiring domain expertise
Example query
"What PPE do I need for metal grinding?"
Off-Topic Guardrail
The classifier also rejects queries unrelated to industrial supply (food, entertainment, general knowledge, etc.), which saves embedding and search costs. Off-topic queries short-circuit the pipeline immediately after classification and return a friendly redirect message with example queries.
Example rejected query
"best pizza near me" → OFF_TOPIC
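The three routing outcomes can be summarized as a mapping from classification to the pipeline steps that still need to run. This is a minimal sketch: the `Classification` shape, the step names, and `planSteps` are assumptions for illustration, not the system's actual code.

```typescript
type Route = "HYBRID" | "LLM_AUGMENTED" | "OFF_TOPIC";

// Hypothetical classifier output: a route plus the refined query.
interface Classification {
  route: Route;
  refinedQuery: string;
}

type PipelineStep = "embed" | "search" | "synthesize" | "redirect";

// Decides which downstream steps run for a given classification.
function planSteps(c: Classification): PipelineStep[] {
  switch (c.route) {
    case "OFF_TOPIC":
      // Short-circuit: no embedding or search cost is incurred.
      return ["redirect"];
    case "HYBRID":
      return ["embed", "search"];
    case "LLM_AUGMENTED":
      return ["embed", "search", "synthesize"];
    default: {
      // Compile-time exhaustiveness check for future routes.
      const unreachable: never = c.route;
      throw new Error(`Unhandled route: ${String(unreachable)}`);
    }
  }
}
```

Keeping the decision in one function makes it straightforward to confirm that off-topic queries never reach the embedding or search services.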
3. Technology Stack
Next.js 14
Frontend & API Routes
React-based framework with server-side API routes handling the entire RAG pipeline. App Router with Suspense for streaming UI.
Azure AI Search
Hybrid Retrieval Engine
Runs BM25 keyword matching and an HNSW vector index in parallel, fuses the two result lists with Reciprocal Rank Fusion (RRF), then applies semantic reranking to the top results. Supports faceted navigation.
Azure OpenAI
Query Embedding
text-embedding-3-small model generates 1536-dimension vectors for semantic similarity search against the product catalog.
Claude 3.5 Haiku
Query Classification
Fast, cost-efficient model that classifies query intent and produces an optimized search query. About 3.75x cheaper than Claude Sonnet 4 on input tokens ($0.80/M vs. $3.00/M) for this simple structured task.
Claude Sonnet 4
Answer Synthesis
Generates expert-level, structured answers grounded in retrieved products. Acts as a senior technical advisor with safety awareness.
TypeScript
End-to-End Type Safety
Full type coverage from API response schemas to UI component props. Discriminated unions for state management, strict mode enabled.
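To illustrate what a discriminated union for state management can look like, here is a small sketch. The state names and fields are hypothetical; they only show the pattern of switching on a shared `status` tag so the compiler narrows each branch.

```typescript
// Hypothetical UI search state; each variant carries only the data it needs.
type SearchState =
  | { status: "idle" }
  | { status: "loading"; query: string }
  | { status: "success"; query: string; resultCount: number; aiAnswer?: string }
  | { status: "off_topic"; message: string }
  | { status: "error"; message: string };

// The switch narrows `state` per branch, so accessing `resultCount`
// outside the "success" case would be a compile-time error.
function describeState(state: SearchState): string {
  switch (state.status) {
    case "idle":
      return "Enter a query to begin.";
    case "loading":
      return `Searching for "${state.query}"...`;
    case "success":
      return state.aiAnswer
        ? `${state.resultCount} results with an AI answer`
        : `${state.resultCount} results`;
    case "off_topic":
    case "error":
      return state.message;
  }
}
```

With strict mode enabled, adding a new variant without handling it in `describeState` surfaces as a type error rather than a runtime bug.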
4. Search Features
Hybrid Retrieval
BM25 keyword matching and HNSW vector similarity run in parallel, fused via RRF for optimal recall and precision.
Semantic Reranking
Azure AI Search applies a cross-encoder reranker on top of initial retrieval to improve result ordering.
Faceted Navigation
Dynamic facets for category, subcategory, brand, and price range — computed server-side with each query.
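A hybrid query with facets can be expressed as a single request body for the Azure AI Search "Search Documents" REST operation. The sketch below only builds that body and assumes an API version that supports `vectorQueries`. The index field names (`embedding`, `category`, `subcategory`, `brand`, `price`), the semantic configuration name, the price interval, and the `k` and `top` values are all assumptions for illustration.

```typescript
interface HybridSearchOptions {
  refinedQuery: string; // refined query text from the classifier
  queryVector: number[]; // 1536-dimension embedding of the refined query
  top?: number;
}

// Builds a request body combining keyword search, vector search,
// semantic reranking, and facet counts in one call.
function buildHybridSearchBody({
  refinedQuery,
  queryVector,
  top = 20,
}: HybridSearchOptions) {
  return {
    search: refinedQuery, // BM25 keyword leg
    vectorQueries: [
      // HNSW vector leg against a hypothetical "embedding" field
      { kind: "vector", vector: queryVector, fields: "embedding", k: 50 },
    ],
    queryType: "semantic", // enables the semantic reranker
    semanticConfiguration: "products-semantic", // hypothetical config name
    facets: ["category", "subcategory", "brand", "price,interval:50"],
    top,
    count: true,
  };
}
```

The `price,interval:50` entry asks the service to bucket a numeric field into fixed-width ranges, which is one way to produce a price-range facet.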
RAG Pipeline Inspector
Full transparency into every pipeline step: prompts sent to Claude, raw responses, and per-step latency timings.
Grounded AI Answers
Synthesis is constrained to reference only products present in the search results, which guards against hallucinated recommendations.
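How the grounding constraint is enforced is not specified here; it could be done purely through the prompt. One possible complementary safeguard is a post-check that drops any recommendation whose SKU was not in the retrieved set. The `Recommendation` shape and function name below are hypothetical.

```typescript
// Hypothetical structured recommendation produced by the synthesis step.
interface Recommendation {
  sku: string;
  reason: string;
}

// Splits recommendations into those backed by retrieved products
// and those that reference SKUs the search never returned.
function keepGroundedRecommendations(
  recommendations: Recommendation[],
  retrievedSkus: string[],
): { grounded: Recommendation[]; dropped: Recommendation[] } {
  const allowed = new Set(retrievedSkus);
  const grounded: Recommendation[] = [];
  const dropped: Recommendation[] = [];
  for (const rec of recommendations) {
    (allowed.has(rec.sku) ? grounded : dropped).push(rec);
  }
  return { grounded, dropped };
}
```

Returning the dropped items separately lets them be surfaced in debug output instead of silently disappearing.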
Graceful Degradation
Router parse errors fall back to HYBRID mode. Synthesis parse errors return a structured fallback built from the raw results.
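The router fallback can be sketched as a defensive parser. This example assumes the classifier replies with JSON containing `route` and `refinedQuery` fields, which is a hypothetical format. Anything that fails to parse or validate is treated as a HYBRID query using the user's original text.

```typescript
type Route = "HYBRID" | "LLM_AUGMENTED" | "OFF_TOPIC";

interface Classification {
  route: Route;
  refinedQuery: string;
}

function isRoute(value: unknown): value is Route {
  return value === "HYBRID" || value === "LLM_AUGMENTED" || value === "OFF_TOPIC";
}

// Parses the classifier's raw text. On malformed JSON or unexpected
// fields, falls back to HYBRID with the original query.
function parseClassification(raw: string, originalQuery: string): Classification {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const { route, refinedQuery } = parsed as Record<string, unknown>;
      if (isRoute(route) && typeof refinedQuery === "string" && refinedQuery.trim() !== "") {
        return { route, refinedQuery };
      }
    }
  } catch {
    // Malformed JSON: fall through to the default below.
  }
  return { route: "HYBRID", refinedQuery: originalQuery };
}
```

Falling back to HYBRID is a reasonable default because it still returns product results without incurring the cost of a synthesis call.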
5. Cost Optimization
The pipeline is designed for production cost efficiency by using the right model for each task:
| Task | Model | Why | Input cost (USD per 1M tokens) |
|---|---|---|---|
| Classification | Claude 3.5 Haiku | Simple structured output; speed and cost matter most | $0.80 |
| Synthesis | Claude Sonnet 4 | Complex reasoning; answer quality is critical | $3.00 |
| Embedding | text-embedding-3-small | 1536-dim vectors; best cost/quality for retrieval | $0.02 |
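To make the table concrete, the sketch below estimates input-token cost per task from the listed prices. Only input prices are covered because output-token prices are not listed here, and the token counts in the example are hypothetical.

```typescript
// Input prices in USD per 1M tokens, taken from the table above.
const INPUT_USD_PER_MILLION = {
  classification: 0.8, // Claude 3.5 Haiku
  synthesis: 3.0, // Claude Sonnet 4
  embedding: 0.02, // text-embedding-3-small
} as const;

type Task = keyof typeof INPUT_USD_PER_MILLION;

// Input-side cost of one call, in USD.
function inputCostUsd(task: Task, inputTokens: number): number {
  return (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION[task];
}

// Hypothetical token counts for a single LLM_AUGMENTED query.
const perQueryInputCost =
  inputCostUsd("classification", 300) +
  inputCostUsd("embedding", 20) +
  inputCostUsd("synthesis", 2500);

// Ratio behind the "cheaper than Sonnet" claim: 3.00 / 0.80 = 3.75.
const haikuSavingsFactor =
  INPUT_USD_PER_MILLION.synthesis / INPUT_USD_PER_MILLION.classification;
```

Under these assumed token counts the synthesis call dominates the input cost, which is why routing plain product lookups to HYBRID mode avoids most of the spend.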