RAGAS Evaluation
Systematic quality measurement of our RAG pipeline using RAGAS (Retrieval Augmented Generation Assessment) metrics, scored with the official RAGAS Python framework and Claude as the evaluator LLM.
Last evaluated: April 6, 2026 • 15 test cases
What is RAGAS?
Traditional Evaluation
- Manual spot-checking of search results
- Vibes-based quality assessment
- No systematic measurement framework
- Hard to detect regressions between changes
RAGAS Framework
- Four standardized metrics covering retrieval and generation
- LLM-as-judge for automated, scalable evaluation
- Curated test dataset with ground-truth answers
- Reproducible scores for regression detection
Implementation note: Scored using the official RAGAS Python library (v0.4) with Claude Haiku as the evaluator LLM. A two-stage pipeline runs queries through the live RAG system (TypeScript), then scores the outputs using the industry-standard RAGAS framework (Python).
Overall Scores
- Routing: query classification accuracy
- Faithfulness: answers grounded in retrieved context
- Relevancy: answers address the actual question
- Context Precision: retrieved docs are relevant and well-ranked
- Context Recall: retrieved docs cover the ground truth
Scores by Query Type
- Simple Lookup: 5 queries
- Advisory: 3 queries
- Comparison: 2 queries
- Off-Topic: 3 queries
- Edge Case: 2 queries

Individual Test Results
01 ✓ 3M N95 respirator • Hybrid • 75%
Retrieved Products (20)
Generated Answer
We have two excellent 3M N95 respirator options available. The 3M 8210 offers basic NIOSH-approved N95 protection, while the 3M 8511 features a Cool Flow valve for improved comfort during extended wear. Recommended products: 3M 8210 N95 Particulate Respirator (20 pack): NIOSH-approved N95 disposabl...
Latency: 8078ms
02 ✓ 20-amp GFCI circuit breaker • Hybrid • 93%
Retrieved Products (20)
Generated Answer
For 20-amp GFCI protection, I recommend the Square D QO 20-Amp GFCI Circuit Breaker which provides Class A 5mA GFCI protection with self-test functionality. If you need both arc-fault and ground-fault protection for areas like bedrooms or kitchens, consider the Square D dual-function AFCI/GFCI break...
Latency: 8848ms
03 ✓ Klein Tools wire stripper • Hybrid • 82%
Retrieved Products (20)
Generated Answer
Based on your search for Klein Tools wire strippers, I found the Klein Tools 11055 Wire Stripper/Cutter in our catalog. This is a precision tool designed for both solid and stranded wire applications with hardened steel construction. Recommended products: Klein Tools 11055 Wire Stripper/Cutter: Thi...
Latency: 7030ms
04 ✓ copper pipe fittings push to connect • Hybrid • 87%
Retrieved Products (20)
Generated Answer
For copper pipe push-to-connect fittings, the SharkBite push-fit products are your best option from our available inventory. These fittings eliminate the need for soldering and work directly with copper, PEX, or CPVC pipe. Recommended products: SharkBite 1/2" Push-to-Connect Coupling: Direct answer...
Latency: 7907ms
05 ✓ HVAC air filter 20x25 • Hybrid • 94%
Retrieved Products (20)
Generated Answer
For a 20x25 HVAC air filter, I found one exact match in our catalog. The Honeywell FC100A1037 is a 20x25x5 media filter, but note this is a 5-inch thick filter designed for whole-house air cleaner systems, not standard 1-inch furnace filters. Recommended products: Honeywell FC100A1037 20x25x5 Media...
Latency: 11283ms
06 ✓ What PPE do I need for metal grinding? • Agentic • 63%
Retrieved Products (22)
Generated Answer
Metal grinding requires comprehensive PPE to protect against sparks, flying particles, noise, and respiratory hazards. Essential equipment includes eye/face protection, respiratory protection, hearing protection, hand protection, and head protection. Recommended products: Lincoln Electric Auto-Dark...
Latency: 18535ms
07 ✓ What do I need to wire a 20-amp outdoor circuit? • Agentic • 58%
Retrieved Products (17)
Generated Answer
For a 20-amp outdoor circuit, you'll need GFCI protection, weather-resistant components, and proper wire rated for outdoor use. The total project cost for materials is approximately $193. Recommended products: Square D QO 20-Amp GFCI Circuit Breaker: Provides required GFCI protection at the panel f...
Latency: 21258ms
08 ✓ Help me set up a basic plumbing repair kit • Agentic • 64%
Retrieved Products (26)
Generated Answer
A basic plumbing repair kit requires adjustable pliers for gripping pipes and fittings, plus a screwdriver set for accessing fixtures and components. The search results include excellent water pump pliers and tongue & groove pliers specifically designed for plumbing work, along with a quality screwd...
Latency: 66951ms
09 ✓ DeWalt vs Milwaukee drill • Agentic • 59%
Retrieved Products (12)
Generated Answer
Based on my search results, I can only find one actual drill in the catalog - the DeWalt DCD791D2. Unfortunately, there don't appear to be any Milwaukee drills available in this catalog's current inventory. Here's what I found: **DeWalt Available:** - **DeWalt DCD791D2 20V MAX XR Brushless Drill/...
Latency: 17796ms
10 ✓ Which is better for home use: GFCI or AFCI breaker? • Agentic • 63%
Retrieved Products (12)
Generated Answer
GFCI breakers protect against ground faults (shock hazards), while AFCI breakers protect against arc faults (fire hazards). For modern homes, dual-function AFCI/GFCI breakers are often the best choice as they provide both protections and meet current NEC requirements for bedrooms, kitchens, and laun...
Latency: 30161ms
11 ✓ What is the best pizza recipe? • Off-Topic • 100%
Latency: 875ms
12 ✓ How do I train for a marathon? • Off-Topic • 100%
Latency: 820ms
13 ✓ Explain quantum computing • Off-Topic • 100%
Latency: 952ms
14 ✓ safety • Hybrid • 16%
Retrieved Products (20)
Generated Answer
Based on your general safety inquiry, I can recommend a comprehensive range of personal protective equipment (PPE) from our catalog covering respiratory, head, eye, hand, hearing, and fall protection. The selection includes both basic safety essentials and specialized protection for specific work en...
Latency: 12938ms
15 ✓ I need to fix a leaky faucet and also replace a circuit breaker • Agentic • 58%
Retrieved Products (28)
Generated Answer
For your leaky faucet repair, I found a Moen replacement cartridge that includes O-rings, which are the most common cause of faucet leaks. For the circuit breaker replacement, I have several options including standard 20-amp breakers and GFCI/AFCI breakers depending on your specific needs and electr...
Latency: 19503ms
Methodology
01 Faithfulness
Extracts atomic factual claims from the generated answer, then verifies each claim against the retrieved contexts. A claim must be directly supported — not merely plausible. Score = supported claims / total claims.
Scored by: RAGAS + Claude Haiku
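Once the judge has produced a verdict for each extracted claim, the final faithfulness ratio is simple arithmetic. A minimal sketch, assuming claim extraction and verification (the Claude Haiku steps) have already produced one boolean per claim; `faithfulness_score` is a hypothetical name, and returning NaN for an answer with no extractable claims is a choice made here, not necessarily RAGAS's exact behavior.

```python
import math
from typing import Sequence


def faithfulness_score(claim_verdicts: Sequence[bool]) -> float:
    """Share of the answer's atomic claims that the judge found directly supported.

    claim_verdicts holds one entry per claim extracted from the generated answer:
    True if the retrieved contexts directly support it, False otherwise.
    """
    if not claim_verdicts:
        # No claims to check: the ratio is undefined, so return NaN
        # instead of silently reporting 0 or 1.
        return math.nan
    return sum(claim_verdicts) / len(claim_verdicts)


# Four claims extracted, three supported by the retrieved products.
print(faithfulness_score([True, True, True, False]))  # 0.75
```

A single unsupported claim in a short answer moves the score a lot, which is one reason short, hedged answers and long, detailed ones can score quite differently.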
02 Answer Relevancy
Generates hypothetical questions that the answer would address, embeds each one along with the original query, and averages the cosine similarity between each generated question and the query. High similarity means the answer stays on topic.
Scored by: RAGAS + Azure OpenAI Embeddings
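A minimal sketch of the scoring step, assuming the generated questions and the original query have already been embedded (here, by the Azure OpenAI embedding deployment). Plain float lists stand in for real embeddings; `cosine` and `answer_relevancy` are hypothetical names, and the sketch omits any extra handling RAGAS applies internally.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer_relevancy(query_vec: Sequence[float],
                     question_vecs: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between the original query and each generated question."""
    if not question_vecs:
        return math.nan
    sims = [cosine(query_vec, q) for q in question_vecs]
    return sum(sims) / len(sims)
```

Averaging over several generated questions smooths out a single off-target question; an answer that drifts from the query produces questions that drift too, which pulls the mean down.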
03 Context Precision
For each retrieved document (in search rank order), the judge decides whether it is relevant to the query. The score averages precision@k over the rank positions k that hold a relevant document, so relevant documents ranked near the top raise it. Measures retrieval ranking quality.
Scored by: RAGAS + Claude Haiku
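A minimal sketch of the rank-weighted calculation, assuming the judge's per-document relevance verdicts are already available as booleans in rank order. `context_precision` is a hypothetical name; the zero-relevant case returns 0.0 here as an assumption.

```python
from typing import Sequence


def context_precision(relevance: Sequence[bool]) -> float:
    """Rank-aware precision over the retrieved list.

    relevance[i] is the judge's verdict for the document at rank i + 1.
    Precision@k is accumulated only at ranks holding a relevant document,
    then divided by the number of relevant documents retrieved.
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    weighted = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted += hits / k  # precision@k at this relevant rank
    return weighted / total_relevant


# Same two relevant documents, different ranking:
print(context_precision([True, True, False]))  # 1.0
print(context_precision([False, True, True]))  # ≈ 0.583
```

The two example lists contain the same relevant documents, yet pushing them down one rank drops the score from 1.0 to about 0.58, which is exactly the ranking sensitivity this metric is meant to capture.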
04 Context Recall
Extracts factual claims from the ground truth answer, then checks whether the retrieved contexts contain supporting information. Measures how completely the retrieval covers the ideal answer. This is the only metric requiring ground truth.
Scored by: RAGAS + Claude Haiku
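Context recall applies a claim check in the opposite direction from faithfulness: the claims come from the ground-truth answer, and the question is whether the retrieved contexts support them. A minimal sketch with the judge passed in as a callable; in this pipeline that judgment is made by Claude Haiku, and the substring judge below is only a toy stand-in. All names are hypothetical.

```python
from typing import Callable, Sequence

# Judge signature: (claim, contexts) -> True if some context supports the claim.
Judge = Callable[[str, Sequence[str]], bool]


def context_recall(ground_truth_claims: Sequence[str],
                   contexts: Sequence[str],
                   is_supported: Judge) -> float:
    """Share of ground-truth claims that the retrieved contexts support."""
    if not ground_truth_claims:
        # Without ground truth there is nothing to recall, so the metric is undefined.
        raise ValueError("context recall requires at least one ground-truth claim")
    attributed = sum(1 for claim in ground_truth_claims if is_supported(claim, contexts))
    return attributed / len(ground_truth_claims)


# Toy stand-in for the LLM judge: case-insensitive substring match.
def substring_judge(claim: str, contexts: Sequence[str]) -> bool:
    return any(claim.lower() in ctx.lower() for ctx in contexts)
```

The explicit error for an empty claim list reflects why this metric needs ground truth: without reference claims there is no denominator.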
Evaluation Pipeline
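The implementation note near the top describes a two-stage pipeline: a TypeScript stage that runs each query through the live RAG system, and a Python stage that scores the outputs with RAGAS. A minimal sketch of the Python side of that handoff, assuming stage 1 writes one JSON object per line. The record layout, `EvalRecord`, and `load_stage1_records` are hypothetical; the field names mirror the single-turn sample fields in recent RAGAS versions and should be checked against the installed v0.4 release.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EvalRecord:
    """Hypothetical per-query record written by the TypeScript stage."""
    user_input: str                # the test query
    retrieved_contexts: list[str]  # text of each retrieved product, in rank order
    response: str                  # the generated answer
    reference: str | None          # ground-truth answer, if the test case has one


REQUIRED_FIELDS = ("user_input", "retrieved_contexts", "response")


def load_stage1_records(lines: Iterable[str]) -> list[EvalRecord]:
    """Parse JSON Lines output from stage 1, skipping blank lines and rejecting incomplete rows."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        raw = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in raw]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(EvalRecord(
            user_input=raw["user_input"],
            retrieved_contexts=list(raw["retrieved_contexts"]),
            response=raw["response"],
            reference=raw.get("reference"),
        ))
    return records
```

Validating the handoff before scoring keeps a malformed stage-1 row from surfacing later as a confusing metric failure inside the evaluator.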
Retrieval Ablation Study
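Head-to-head comparison of five retrieval configurations (four base strategies plus a field-boosted variant of the production mode) on 12 labeled queries, the RAGAS test set minus the three off-topic queries, using standard IR metrics. Unlike RAGAS, which scores the pipeline end to end, these metrics isolate retrieval quality from generation. Among the four base strategies, the production mode (Hybrid + Semantic Reranking) has the best or tied-best score on every metric, at a cost of roughly 30 ms of extra latency over the non-reranked modes; adding field boosting left every metric unchanged.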
| Retrieval Mode | Description | R@5 | R@10 | P@5 | P@10 | MRR | NDCG@10 | Latency |
|---|---|---|---|---|---|---|---|---|
| BM25 Only | Pure keyword search using BM25 scoring. No vectors, no reranking. | 70% | 76% | 37% | 26% | 84% | 74% | 49ms |
| Vector Only | Pure vector search using cosine similarity. No keyword matching. | 77% | 87% | 47% | 30% | 96% | 86% | 45ms |
| Hybrid (no rerank) | BM25 + vector fusion. No semantic reranking layer. | 77% | 84% | 47% | 28% | 92% | 83% | 45ms |
| Hybrid + Semantic Reranking (Production) | BM25 + vector + Microsoft semantic reranker. | 79% | 91% | 50% | 33% | 96% | 90% | 76ms |
| Hybrid + Semantic + Field Boosting (Enhanced) | Production mode plus a scoring profile with field weights name×5, sku×4, brand×3, category×1.5. | 79% | 91% | 50% | 33% | 96% | 90% | 78ms |
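The hybrid rows fuse the BM25 and vector result lists before any reranking. Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method it uses to merge keyword and vector results in a hybrid query. A minimal sketch of RRF, assuming the constant k = 60 from the original RRF formulation; the function name and document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one.

    Each list adds 1 / (k + rank) to a document's score (rank starts at 1);
    documents are returned in order of descending fused score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


bm25_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "d"]
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))  # ['b', 'c', 'a', 'd']
```

Because RRF uses only ranks, BM25 and cosine scores never need to be put on a common scale; a document that both retrievers rank reasonably well (`b` above) can overtake one that only a single retriever ranks first (`a`).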
Per-Query Results
| # | Query | Type | Ground-truth items (GT) |
|---|---|---|---|
| 01 | 3M N95 respirator | simple lookup | 3 |
| 02 | 20-amp GFCI circuit breaker | simple lookup | 2 |
| 03 | Klein Tools wire stripper | simple lookup | 1 |
| 04 | copper pipe fittings push to connect | simple lookup | 2 |
| 05 | HVAC air filter 20x25 | simple lookup | 2 |
| 06 | What PPE do I need for metal grinding? | advisory | 8 |
| 07 | What do I need to wire a 20-amp outdoor circuit? | advisory | 6 |
| 08 | Help me set up a basic plumbing repair kit | advisory | 7 |
| 09 | DeWalt vs Milwaukee drill | comparison | 2 |
| 10 | Which is better for home use: GFCI or AFCI breaker? | comparison | 2 |
| 11 | safety | edge case | 7 |
| 12 | I need to fix a leaky faucet and also replace a circuit breaker | edge case | 4 |
IR Metrics
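A minimal sketch of how the IR metrics in the ablation table above are conventionally computed with binary relevance labels (a product is either in a query's ground-truth set or not). The function names are hypothetical, and whether the evaluation harness uses exactly these definitions (for example, binary rather than graded gains for NDCG) is an assumption. MRR is the mean of `reciprocal_rank` over all queries.

```python
import math
from typing import AbstractSet, Sequence


def recall_at_k(ranked: Sequence[str], relevant: AbstractSet[str], k: int) -> float:
    """Fraction of the labeled relevant items that appear in the top k."""
    if not relevant:
        raise ValueError("recall is undefined for a query with no relevant items")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: AbstractSet[str], k: int) -> float:
    """Fraction of the top k slots filled by relevant items."""
    return len(set(ranked[:k]) & relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: AbstractSet[str]) -> float:
    """1 / rank of the first relevant item (0.0 if none is retrieved)."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: AbstractSet[str], k: int) -> float:
    """Binary-gain NDCG: DCG of the top k divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked[:k], start=1) if doc_id in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

These definitions also explain why the precision columns look low. Assuming the per-query GT counts listed above are the relevant sets and that metrics are averaged per query, a query with one relevant product (Klein Tools) caps P@10 at 10%. Across all 12 queries the best achievable mean P@10 is 46/120 ≈ 38%, and the best achievable mean P@5 is 38/60 ≈ 63%, against 33% and 50% for the production mode.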
Embedding Model Comparison
Comparison of embedding models using Hybrid + Semantic Reranking mode (production configuration). Each model was used to embed both documents and queries, with separate Azure AI Search indexes per model.
| Embedding Model | Dims | R@5 | R@10 | P@5 | MRR | NDCG@10 |
|---|---|---|---|---|---|---|
| text-embedding-3-small (Production) | 1,536 | 79% | 91% | 50% | 96% | 90% |
| text-embedding-3-large | 3,072 | 78% | 92% | 48% | 96% | 90% |
| text-embedding-3-large (reduced) | 1,536 | 79% | 92% | 50% | 96% | 90% |
| Cohere Embed v4 | 1,536 | 79% | 91% | 50% | 96% | 90% |
Methodology: The full product catalog was indexed once per embedding model, each into its own Azure AI Search index, and the same test queries were run against every index. BM25 scores are identical across models because keyword search does not use embeddings; differences can only arise in the vector and hybrid components, where embedding quality directly affects retrieval. In the production configuration shown above, the four models are within 2 percentage points of each other on every metric.
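OpenAI documents that text-embedding-3 vectors can be shortened, either through the embeddings API's `dimensions` parameter or by truncating the vector and re-normalizing it to unit length. The results above do not say which route produced the 1,536-dimension "reduced" text-embedding-3-large row; a minimal sketch of the manual route, with `shorten_embedding` as a hypothetical helper and toy values in place of real embeddings.

```python
import math
from typing import Sequence


def shorten_embedding(vec: Sequence[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale them to unit L2 norm."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}")
    head = [float(x) for x in vec[:dims]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


# Real usage would shorten a 3,072-dim text-embedding-3-large vector to 1,536.
print(shorten_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

Re-normalizing matters when an index compares vectors by dot product; cosine similarity is scale-invariant, so a cosine-configured index would rank truncated vectors the same way with or without this step.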