Architecture & Pipeline Flow

A technical walkthrough of how queries flow through the system — from intent classification to result delivery.

1. Request Flow

User Query

User enters a natural-language search query

e.g. "What PPE do I need for metal grinding?" or "3M N95 respirator"

Query Classification (Claude Haiku)

Claude 3.5 Haiku analyzes intent and classifies the query

Routes to HYBRID (product lookups, SKU searches) or LLM_AUGMENTED (advisory questions, recommendations), or rejects the query as OFF_TOPIC. Also produces a refined query optimized for the search index.

Query Embedding (Azure OpenAI)

The refined query is converted into a 1536-dimension vector

Uses text-embedding-3-small deployed on Azure OpenAI to capture semantic meaning for vector similarity search.
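To make "vector similarity" concrete, here is a minimal sketch of comparing two embedding vectors. It assumes cosine similarity as the metric, which the text above does not state; the function name is illustrative and not part of the actual codebase.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Assumption: the index compares vectors by cosine similarity.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

In the real pipeline both vectors would have 1536 dimensions: one from the refined query and one stored per product in the index.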

Hybrid Search (Azure AI Search)

Keyword and vector retrieval execute in parallel, followed by semantic reranking

BM25 keyword matching and HNSW vector similarity run in parallel, and their rankings are fused using Reciprocal Rank Fusion (RRF). Semantic reranking is then applied on top of the fused results. Faceted counts are returned for filtering.
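Azure AI Search performs the fusion internally, but the idea behind Reciprocal Rank Fusion is easy to sketch. The example below is illustrative only: the function name is hypothetical, and the smoothing constant k = 60 is a commonly used default assumed here rather than a value taken from this system.

```typescript
// Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
// across all ranked lists in which it appears (rank is 1-based).
function reciprocalRankFusion(
  rankings: readonly (readonly string[])[],
  k = 60, // assumed default, not confirmed by the source
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical document IDs from the keyword and vector retrievers.
const keywordRanking = ["A", "B", "C"];
const vectorRanking = ["B", "D", "A"];
const fused = reciprocalRankFusion([keywordRanking, vectorRanking]);
console.log(fused.map((hit) => hit.id).join(", ")); // B, A, D, C
```

Documents that rank well in both lists (B and A here) rise above documents that appear in only one, which is why the fusion helps both recall and precision.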

AI Synthesis (Claude Sonnet 4)

Conditional

LLM_AUGMENTED only — Claude generates an expert answer

The top 8 retrieved products are passed as context to Claude Sonnet 4, which acts as a senior technical advisor. It produces structured recommendations, safety notes, and pro tips grounded in actual catalog data.
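One way to pass the top 8 products as context is to render them into a numbered text block for the prompt. This is a sketch under assumptions: the `Product` fields and the line format are hypothetical, since the actual prompt template is not shown here.

```typescript
// Hypothetical shape of a retrieved catalog item.
interface Product {
  sku: string;
  name: string;
  brand: string;
  description: string;
}

// Render the first `limit` products as numbered lines for the synthesis prompt.
function buildSynthesisContext(products: readonly Product[], limit = 8): string {
  return products
    .slice(0, limit)
    .map((p, i) => `[${i + 1}] ${p.name} (${p.brand}, SKU ${p.sku}): ${p.description}`)
    .join("\n");
}
```

Numbering the entries gives the model a stable way to refer back to specific catalog items in its answer.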

Response Delivery

Unified response returned with full transparency

Results, AI answer (if applicable), facets, latency timings, and complete RAG debug info are returned to the frontend.
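The unified response could be typed roughly as follows. All field names here are assumptions made for illustration; only the list of contents (results, optional AI answer, facets, timings, debug info) comes from the description above.

```typescript
type Route = "HYBRID" | "LLM_AUGMENTED";

interface ProductHit { sku: string; name: string; score: number }
interface FacetValue { value: string; count: number }

// Hypothetical response shape returned to the frontend.
interface SearchResponse {
  route: Route;
  results: ProductHit[];
  answer?: string; // present only for LLM_AUGMENTED queries
  facets: Record<string, FacetValue[]>;
  timingsMs: Record<string, number>;
  debug: { refinedQuery: string; prompts: string[]; rawResponses: string[] };
}

// Drop any answer unless the query was actually routed to LLM_AUGMENTED.
function buildResponse(parts: SearchResponse): SearchResponse {
  const { answer, ...rest } = parts;
  if (parts.route === "LLM_AUGMENTED" && answer !== undefined) {
    return { ...rest, answer };
  }
  return rest;
}
```

Keeping `answer` optional lets the frontend branch on its presence instead of checking the route separately.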

2. Intelligent Query Routing

HYBRID Mode

Direct product retrieval using keyword + vector search. Fast, precise, and best for queries with clear product intent.

Triggers on

  • Specific product lookups
  • Brand + product searches
  • SKU or part number searches
  • Category browsing
  • Attribute-based filtering

Example query

"3M N95 respirator"

LLM_AUGMENTED Mode

Retrieves products, then Claude synthesizes an expert answer with recommendations, safety notes, and trade tips.

Triggers on

  • Advisory / "what do I need" questions
  • Multi-product recommendation requests
  • Project-based queries
  • Comparison questions
  • Questions requiring domain expertise

Example query

"What PPE do I need for metal grinding?"

Off-Topic Guardrail

The classifier also rejects queries unrelated to industrial supply (food, entertainment, general knowledge, etc.), which saves embedding and search costs. Off-topic queries short-circuit the pipeline at the classification step and return a friendly redirect message with example queries.

Example rejected query

"best pizza near me" → OFF_TOPIC
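The three routing outcomes can be summarized as a plan of which pipeline steps run. This is a simplified sketch: the step names and the `planSteps` function are hypothetical, and the real pipeline presumably executes these steps asynchronously rather than returning a list.

```typescript
type Route = "HYBRID" | "LLM_AUGMENTED" | "OFF_TOPIC";
type Step = "classify" | "embed" | "search" | "synthesize" | "respond";

// Which steps execute for each classifier outcome.
function planSteps(route: Route): Step[] {
  switch (route) {
    case "OFF_TOPIC":
      // Short-circuit: no embedding, search, or synthesis cost.
      return ["classify", "respond"];
    case "HYBRID":
      return ["classify", "embed", "search", "respond"];
    case "LLM_AUGMENTED":
      return ["classify", "embed", "search", "synthesize", "respond"];
  }
}

console.log(planSteps("OFF_TOPIC").join(" -> ")); // classify -> respond
```

Only the LLM_AUGMENTED route pays for the synthesis call, and an off-topic query pays only for classification.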

3. Technology Stack

Next.js 14

Frontend & API Routes

React-based framework with server-side API routes handling the entire RAG pipeline. App Router with Suspense for streaming UI.

Azure AI Search

Hybrid Retrieval Engine

Runs BM25 keyword matching and an HNSW vector index in parallel, fuses the two rankings via Reciprocal Rank Fusion (RRF), and applies semantic reranking to the fused results. Supports faceted navigation.

Azure OpenAI

Query Embedding

text-embedding-3-small model generates 1536-dimension vectors for semantic similarity search against the product catalog.

Claude 3.5 Haiku

Query Classification

Fast, cost-efficient model that classifies query intent and produces an optimized search query. About 3.75x cheaper than Sonnet 4 on input tokens ($0.80 vs. $3.00 per million) for this simple structured task.

Claude Sonnet 4

Answer Synthesis

Generates expert-level, structured answers grounded in retrieved products. Acts as a senior technical advisor with safety awareness.

TypeScript

End-to-End Type Safety

Full type coverage from API response schemas to UI component props. Discriminated unions for state management, strict mode enabled.
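As an illustration of discriminated unions for state management, the sketch below models a search view's state. The state names and fields are hypothetical and not taken from the codebase; the point is that the `status` tag lets the compiler verify that every case is handled.

```typescript
// Each variant carries only the data that is valid in that state.
type SearchState =
  | { status: "idle" }
  | { status: "loading"; query: string }
  | { status: "success"; query: string; resultCount: number }
  | { status: "error"; message: string };

function describeState(state: SearchState): string {
  switch (state.status) {
    case "idle":
      return "Enter a query to search the catalog.";
    case "loading":
      return `Searching for "${state.query}"...`;
    case "success":
      return `${state.resultCount} results for "${state.query}"`;
    case "error":
      return `Search failed: ${state.message}`;
    default: {
      // Compile-time exhaustiveness check: adding a new variant
      // without handling it above makes this assignment an error.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

With strict mode enabled, accessing `state.query` in the `idle` or `error` branch is a type error rather than a runtime `undefined`.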

4. Search Features

Hybrid Retrieval

BM25 keyword matching and HNSW vector similarity run in parallel, fused via RRF for optimal recall and precision.

Semantic Reranking

Azure AI Search applies a cross-encoder reranker on top of initial retrieval to improve result ordering.

Faceted Navigation

Dynamic facets for category, subcategory, brand, and price range — computed server-side with each query.
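When a user selects facet values, the selection has to be turned into a filter on the next query. The sketch below builds an OData-style filter string of the kind Azure AI Search accepts. The index field names (`category`, `subcategory`, `brand`, `price`) and the `FacetSelection` type are assumptions for illustration.

```typescript
// Hypothetical facet selection coming from the UI.
interface FacetSelection {
  category?: string;
  subcategory?: string;
  brand?: string;
  minPrice?: number;
  maxPrice?: number;
}

// OData string literals use single quotes; embedded quotes are doubled.
const quote = (value: string): string => `'${value.replace(/'/g, "''")}'`;

function buildFilter(selection: FacetSelection): string | undefined {
  const clauses: string[] = [];
  if (selection.category !== undefined) clauses.push(`category eq ${quote(selection.category)}`);
  if (selection.subcategory !== undefined) clauses.push(`subcategory eq ${quote(selection.subcategory)}`);
  if (selection.brand !== undefined) clauses.push(`brand eq ${quote(selection.brand)}`);
  if (selection.minPrice !== undefined) clauses.push(`price ge ${selection.minPrice}`);
  if (selection.maxPrice !== undefined) clauses.push(`price lt ${selection.maxPrice}`);
  // No selection means no filter at all.
  return clauses.length > 0 ? clauses.join(" and ") : undefined;
}

console.log(buildFilter({ brand: "3M", minPrice: 10, maxPrice: 50 }));
// brand eq '3M' and price ge 10 and price lt 50
```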

RAG Pipeline Inspector

Full transparency into every pipeline step: prompts sent to Claude, raw responses, and per-step latency timings.
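Per-step latency timings can be collected with a small helper like the one below. This is a sketch, not the inspector's actual implementation: `createStepTimer` and the step names are hypothetical, and the clock is injectable so the example stays deterministic.

```typescript
// Records how long each named pipeline step took, in milliseconds.
function createStepTimer(now: () => number = () => Date.now()) {
  const timingsMs: Record<string, number> = {};
  return {
    // Call start() before a step; call the returned function when it ends.
    start(step: string): () => void {
      const startedAt = now();
      return () => {
        timingsMs[step] = now() - startedAt;
      };
    },
    timingsMs,
  };
}

// Usage with a fake clock standing in for real elapsed time.
let fakeClock = 0;
const timer = createStepTimer(() => fakeClock);
const stopClassify = timer.start("classify");
fakeClock = 120; // pretend the classifier call took 120 ms
stopClassify();
console.log(timer.timingsMs); // { classify: 120 }
```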

Grounded AI Answers

Synthesis is constrained to reference only products present in the search results, which guards against hallucinated recommendations.
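The source does not say how the grounding constraint is enforced (it may be purely prompt-based). One possible complement is a post-hoc check that drops any recommended SKU that was not in the retrieved set, sketched below with hypothetical types and names.

```typescript
// Hypothetical shape of one recommendation parsed from the model's answer.
interface Recommendation {
  sku: string;
  reason: string;
}

// Keep only recommendations whose SKU appears in the retrieved results.
function keepGroundedRecommendations(
  recommendations: readonly Recommendation[],
  retrievedSkus: readonly string[],
): { grounded: Recommendation[]; dropped: string[] } {
  const allowed = new Set(retrievedSkus);
  const grounded = recommendations.filter((r) => allowed.has(r.sku));
  const dropped = recommendations.filter((r) => !allowed.has(r.sku)).map((r) => r.sku);
  return { grounded, dropped };
}
```

Returning the dropped SKUs as well makes it possible to surface them in debug output rather than discarding them silently.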

Graceful Degradation

Router parse errors fall back to HYBRID mode. Synthesis parse errors return a structured fallback built from the raw results.
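The router fallback can be sketched as a defensive parser. The JSON field names (`route`, `refinedQuery`) are assumptions about the classifier's output format, which is not shown here; the behavior illustrated is only the fallback to HYBRID on any parse or validation failure.

```typescript
type Route = "HYBRID" | "LLM_AUGMENTED" | "OFF_TOPIC";

interface Classification {
  route: Route;
  refinedQuery: string;
}

function isRoute(value: unknown): value is Route {
  return value === "HYBRID" || value === "LLM_AUGMENTED" || value === "OFF_TOPIC";
}

// Parse the classifier's raw text; on any problem, fall back to HYBRID
// with the user's original query so search still runs.
function parseClassification(raw: string, originalQuery: string): Classification {
  const fallback: Classification = { route: "HYBRID", refinedQuery: originalQuery };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return fallback;
    const { route, refinedQuery } = parsed as Record<string, unknown>;
    if (!isRoute(route)) return fallback;
    const hasRefined = typeof refinedQuery === "string" && refinedQuery.trim() !== "";
    return { route, refinedQuery: hasRefined ? (refinedQuery as string) : originalQuery };
  } catch {
    // Malformed JSON from the model.
    return fallback;
  }
}
```

Falling back to HYBRID is the cheaper failure mode: the user still gets search results, and no synthesis call is made on an unreliable classification.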

5. Cost Optimization

The pipeline is designed for production cost efficiency by using the right model for each task:

Task | Model | Why | Input Cost (per 1M tokens)
Classification | Haiku 3.5 | Simple structured output: speed & cost matter most | $0.80
Synthesis | Sonnet 4 | Complex reasoning: answer quality is critical | $3.00
Embedding | text-embedding-3-small | 1536-dim vectors: best cost/quality for retrieval | $0.02
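The input prices listed above translate into per-query cost as follows. This sketch covers input tokens only (output-token pricing is not listed here), and the token counts in the usage example are hypothetical.

```typescript
// Input prices in USD per million tokens, as listed above.
const INPUT_COST_PER_MILLION_USD = {
  classification: 0.8, // Haiku 3.5
  synthesis: 3.0, // Sonnet 4
  embedding: 0.02, // text-embedding-3-small
} as const;

type Task = keyof typeof INPUT_COST_PER_MILLION_USD;

function inputCostUsd(task: Task, inputTokens: number): number {
  return (inputTokens / 1_000_000) * INPUT_COST_PER_MILLION_USD[task];
}

// Hypothetical token counts for one LLM_AUGMENTED query.
const perQueryUsd =
  inputCostUsd("classification", 500) +
  inputCostUsd("embedding", 20) +
  inputCostUsd("synthesis", 2000);
console.log(perQueryUsd.toFixed(6));
```

Under these assumed token counts the synthesis call dominates the input cost, so routing product lookups to HYBRID and skipping synthesis is where most of the saving comes from.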