Ben Novak

RAGAS Evaluation

Systematic quality measurement of our RAG pipeline using RAGAS (Retrieval Augmented Generation Assessment) metrics, scored using the official RAGAS Python framework with Claude as the evaluator LLM.

Last evaluated: April 26, 2026 · 15 test cases

1

What is RAGAS?

Previous

Traditional Evaluation

  • Manual spot-checking of search results
  • Vibes-based quality assessment
  • No systematic measurement framework
  • Hard to detect regressions between changes
Current

RAGAS Framework

  • 4 standardized metrics covering retrieval + generation
  • LLM-as-judge for automated, scalable evaluation
  • Curated test dataset with ground truth answers
  • Reproducible scores for regression detection

Implementation note: Scored using the official RAGAS Python library (v0.4) with Claude Haiku as the evaluator LLM. A two-stage pipeline runs queries through the live RAG system (TypeScript), then scores the outputs using the industry-standard RAGAS framework (Python).

2

Overall Scores

Routing: 100% (query classification accuracy)
Faithfulness: 71% (answers grounded in retrieved context)
Relevancy: 78% (answers address the actual question)
Ctx Precision: 46% (retrieved docs are relevant and well-ranked)
Ctx Recall: 76% (retrieved docs cover the ground truth)

3

Scores by Query Type

Simple Lookup (5 queries): Faith 88%, Relev 77%, CPrec 90%, CRec 75%
🧠 Advisory (3 queries): Faith 60%, Relev 90%, CPrec 17%, CRec 100%
⚖️ Comparison (2 queries): Faith 57%, Relev 79%, CPrec 25%, CRec 67%
🚫 Off-Topic (3 queries): Faith 100%, Relev 100%, CPrec 100%, CRec 100%
🧪 Edge Case (2 queries): Faith 56%, Relev 60%, CPrec 0%, CRec 50%

4

Individual Test Results

01 · 3M N95 respirator · Hybrid · 53%
Faithfulness 75% · Relevancy 62% · Ctx Precision 50% · Ctx Recall 25%

Retrieved Products (20)

SF-001, SF-002, SF-003, HV-002, SF-012, US-008, SF-014, SF-010, SF-009, SF-016 (+10 more)

Generated Answer

We have two excellent 3M N95 respirator options available. Both are NIOSH-approved and filter at least 95% of airborne particles, with the key difference being whether you need extended comfort features for longer wear periods. Recommended products: 3M 8210 N95 Particulate Respirator (20 pack): Sta...

Latency: 9444ms

02 · 20-amp GFCI circuit breaker · Hybrid · 93%
Faithfulness 85% · Relevancy 86% · Ctx Precision 100% · Ctx Recall 100%

Retrieved Products (20)

EL-003, EL-007, EL-004, EL-001, EL-008, EL-009, EL-005, US-009, EL-013, EL-006 (+10 more)

Generated Answer

For a 20-amp GFCI circuit breaker, I recommend the Square D QO 20-Amp GFCI Circuit Breaker which provides Class A 5mA GFCI protection with self-test functionality. If you need dual protection for areas like bedrooms or kitchens, consider the AFCI/GFCI combination breaker that meets current NEC requi...

Latency: 8862ms

03 · Klein Tools wire stripper · Hybrid · 78%
Faithfulness 81% · Relevancy 80% · Ctx Precision 100% · Ctx Recall 50%

Retrieved Products (20)

TL-001, TL-002, TL-003, DC-007, PL-001, US-006, US-011, EL-010, US-005, EL-011 (+10 more)

Generated Answer

For Klein Tools wire strippers, we have the Klein Tools 11055 Wire Stripper/Cutter available. This precision tool handles both solid and stranded wire in the most common AWG ranges with hardened steel construction to prevent conductor damage. Recommended products: Klein Tools 11055 Wire Stripper/Cu...

Latency: 10334ms

04 · Belden Cat6 plenum cable 1000 ft · Hybrid · 95%
Faithfulness 100% · Relevancy 80% · Ctx Precision 100% · Ctx Recall 100%

Retrieved Products (20)

DC-001, DC-002, DC-003, DC-007, DC-011, DC-004, US-007, DC-006, DC-005, EL-006 (+10 more)

Generated Answer

I found the Belden Cat6 plenum cable you're looking for. The Belden 2413 DataTwist is available in 1000 ft length with CMP plenum rating, perfect for air-handling spaces without conduit. Recommended products: Belden 2413 Cat6 Plenum-Rated Cable, 1000 ft (Blue): This is exactly what you requested - ...

Latency: 8334ms

05 · Axis indoor IP dome camera · Hybrid · 94%
Faithfulness 100% · Relevancy 78% · Ctx Precision 100% · Ctx Recall 100%

Retrieved Products (20)

US-001, US-002, US-004, US-003, DC-006, DC-009, DC-012, HV-004, EL-013, DC-003 (+10 more)

Generated Answer

For Axis indoor IP dome cameras, I can recommend the Axis P3245-LV 2 MP Network Dome Camera with IR. This indoor fixed dome camera offers 1080p HDTV resolution with Lightfinder 2.0 low-light technology and built-in OptimizedIR up to 40 m range. Recommended products: Axis P3245-LV 2 MP Network Dome ...

Latency: 9013ms

06 · What PPE do I need for metal grinding? · Agentic · 72%
Faithfulness 42% · Relevancy 95% · Ctx Precision 50% · Ctx Recall 100%

Retrieved Products (29)

SF-009, SF-002, SF-012, SF-008, SF-010, SF-001, SF-006, SF-015, TL-007, SF-003 (+19 more)

Generated Answer

For metal grinding, you need comprehensive PPE to protect against sparks, metal particles, noise, and potential cuts. Essential equipment includes safety glasses, respiratory protection, cut-resistant or welding gloves, and hearing protection for prolonged work. Recommended products: Honeywell Uvex...

Latency: 20464ms

07 · What do I need to wire a 20-amp outdoor circuit? · Agentic · 62%
Faithfulness 60% · Relevancy 87% · Ctx Precision 0% · Ctx Recall 100%

Retrieved Products (33)

EL-003, EL-007, EL-012, EL-005, EL-001, EL-009, EL-006, EL-010, US-011, EL-008 (+23 more)

Generated Answer

For a 20-amp outdoor circuit, you'll need a GFCI circuit breaker for electrical panel protection, weather-resistant GFCI outlet, appropriate wire, and weatherproof cover. The circuit requires GFCI protection due to outdoor moisture exposure and specific components rated for wet locations. Recommend...

Latency: 56329ms

08 · What do I need to install a Cat6A network drop end-to-end? · Agentic · 66%
Faithfulness 79% · Relevancy 86% · Ctx Precision 0% · Ctx Recall 100%

Retrieved Products (23)

DC-002, DC-007, DC-003, DC-001, DC-010, DC-004, EL-011, DC-011, US-007, EL-008 (+13 more)

Generated Answer

For a complete Cat6A network drop installation, you'll need bulk cable, keystone jacks for termination, wall faceplates, patch panel ports, and patch cords to connect equipment. This setup provides 10 Gigabit capability with proper shielding for industrial environments. Recommended products: Belden...

Latency: 71982ms

09 · DeWalt vs Milwaukee drill · Agentic · 47%
Faithfulness 46% · Relevancy 75% · Ctx Precision 0% · Ctx Recall 67%

Retrieved Products (13)

TL-020, TL-004, TL-018, TL-006, TL-017, TL-016, TL-019, TL-005, TL-007, EL-015 (+3 more)

Generated Answer

Both DeWalt and Milwaukee offer excellent brushless drills with similar performance. Milwaukee drills typically deliver higher torque and include larger batteries in kits, while DeWalt offers more compact options and the unique FLEXVOLT ADVANTAGE technology for increased power with compatible batter...

Latency: 53263ms

10 · Which is better for home use: GFCI or AFCI breaker? · Agentic · 67%
Faithfulness 69% · Relevancy 83% · Ctx Precision 50% · Ctx Recall 67%

Retrieved Products (18)

EL-003, EL-004, EL-001, EL-007, EL-013, EL-002, EL-009, EL-008, EL-005, SF-009 (+8 more)

Generated Answer

For home use, neither GFCI nor AFCI alone is "better" - they protect against different hazards and current NEC code requires both types of protection in different areas. GFCI protects against ground faults (shock/electrocution), while AFCI protects against arc faults (fires). The best solution is a ...

Latency: 30895ms

11 · What is the best pizza recipe? · Off-Topic · 100%
Faithfulness 100% · Relevancy 100% · Ctx Precision 100% · Ctx Recall 100%

Latency: 1257ms

12 · How do I train for a marathon? · Off-Topic · 100%
Faithfulness 100% · Relevancy 100% · Ctx Precision 100% · Ctx Recall 100%

Latency: 1341ms

13 · Explain quantum computing · Off-Topic · 100%
Faithfulness 100% · Relevancy 100% · Ctx Precision 100% · Ctx Recall 100%

Latency: 1314ms

14 · safety · Hybrid · 20%
Faithfulness 36% · Relevancy 43% · Ctx Precision 0% · Ctx Recall 0%

Retrieved Products (20)

SF-001, SF-003, SF-002, SF-013, SF-005, SF-008, SF-004, SF-011, SF-007, SF-006 (+10 more)

Generated Answer

Safety equipment is essential for workplace protection across multiple hazard categories. Based on our product catalog, we offer comprehensive safety solutions including respiratory protection, fall protection, head and eye protection, hand protection, hearing protection, and specialized welding saf...

Latency: 11677ms

15 · I need to install an IP camera and replace a circuit breaker · Agentic · 63%
Faithfulness 77% · Relevancy 76% · Ctx Precision 0% · Ctx Recall 100%

Retrieved Products (23)

EL-001, US-002, US-001, EL-003, EL-004, EL-013, EL-002, DC-009, US-004, US-003 (+13 more)

Generated Answer

For your IP camera installation, the Axis P3245-LV offers excellent value at $689 with 1080p resolution, low-light capabilities, and 40m infrared range for indoor use. For circuit breaker replacement, the Square D QO 20-Amp Standard breaker ($8.49) is the most economical choice, while GFCI ($42.99) ...

Latency: 17589ms
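
The per-test figures above can be cross-checked against the headline numbers. The sketch below is an observation from the data on this page, not a documented aggregation rule: each test's composite score matches the unweighted mean of its four metrics, and the figures under Overall Scores match the mean over the 12 cases that were not routed as Off-Topic. The tuple layout and function names are illustrative, and because the inputs are already rounded, a recomputed value can differ from the displayed one by a point.

```python
from statistics import mean

METRICS = ("faithfulness", "relevancy", "ctx_precision", "ctx_recall")

# (route, faithfulness, relevancy, ctx_precision, ctx_recall) in percent,
# copied from tests 01-15 in Individual Test Results.
RESULTS = [
    ("Hybrid", 75, 62, 50, 25),
    ("Hybrid", 85, 86, 100, 100),
    ("Hybrid", 81, 80, 100, 50),
    ("Hybrid", 100, 80, 100, 100),
    ("Hybrid", 100, 78, 100, 100),
    ("Agentic", 42, 95, 50, 100),
    ("Agentic", 60, 87, 0, 100),
    ("Agentic", 79, 86, 0, 100),
    ("Agentic", 46, 75, 0, 67),
    ("Agentic", 69, 83, 50, 67),
    ("Off-Topic", 100, 100, 100, 100),
    ("Off-Topic", 100, 100, 100, 100),
    ("Off-Topic", 100, 100, 100, 100),
    ("Hybrid", 36, 43, 0, 0),
    ("Agentic", 77, 76, 0, 100),
]


def composite(result):
    """Unweighted mean of the four metric scores for one test case."""
    return mean(result[1:])


def overall(results, exclude_route="Off-Topic"):
    """Mean of each metric over the cases not routed as `exclude_route`."""
    kept = [r for r in results if r[0] != exclude_route]
    return {
        name: round(mean(r[i + 1] for r in kept))
        for i, name in enumerate(METRICS)
    }


print(overall(RESULTS))
```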

5

Methodology

01 Faithfulness

Extracts atomic factual claims from the generated answer, then verifies each claim against the retrieved contexts. A claim must be directly supported — not merely plausible. Score = supported claims / total claims.

Scored by: RAGAS + Claude Haiku
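
As a minimal sketch of the final ratio only: the claim extraction and verification steps are performed by the evaluator LLM and are not reproduced here. The `ClaimVerdict` type, the zero score for an answer with no claims, and the example claims are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClaimVerdict:
    claim: str
    supported: bool  # True only if the retrieved context directly backs the claim


def faithfulness_score(verdicts: list[ClaimVerdict]) -> float:
    """Supported claims divided by total claims."""
    if not verdicts:
        return 0.0  # assumption: an answer with no extractable claims scores zero
    return sum(v.supported for v in verdicts) / len(verdicts)


# Hypothetical verdicts for one generated answer.
verdicts = [
    ClaimVerdict("The respirator is NIOSH-approved", True),
    ClaimVerdict("It filters at least 95% of airborne particles", True),
    ClaimVerdict("It ships in a 20 pack", True),
    ClaimVerdict("It is the best-selling model", False),  # plausible, but not in context
]
print(faithfulness_score(verdicts))
```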

02 Answer Relevancy

Generates hypothetical questions the answer would address, embeds them alongside the original query, and computes average cosine similarity. High similarity = answer stays on-topic.

Scored by: RAGAS + Azure OpenAI Embeddings
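
The averaging step can be sketched in plain Python. This assumes the embeddings have already been computed elsewhere; the two-dimensional vectors below are toy values chosen for readability, not real embeddings, and the function names are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer_relevancy(query_vec: list[float], generated_question_vecs: list[list[float]]) -> float:
    """Average similarity between the query and each question generated from the answer."""
    if not generated_question_vecs:
        return 0.0
    sims = [cosine_similarity(query_vec, q) for q in generated_question_vecs]
    return sum(sims) / len(sims)


# One generated question points the same way as the query, one is orthogonal.
score = answer_relevancy([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(score)
```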

03 Context Precision

For each retrieved document (in search rank order), judges relevance to the query. Computes precision at each rank position, rewarding relevant documents appearing at the top. Measures retrieval ranking quality.

Scored by: RAGAS + Claude Haiku
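
One common way to turn per-rank relevance judgments into a single number is to average precision@k over the ranks that hold a relevant document. The sketch below follows that formulation; it is an illustration of the idea and may not match the RAGAS library's implementation in every detail. The relevance flags are assumed to come from the LLM judge.

```python
def context_precision(relevance: list[bool]) -> float:
    """Mean of precision@k taken at each rank k where the document is relevant.

    `relevance[i]` is the judge's verdict for the document at rank i + 1.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision@rank at this position
    return precision_sum / hits if hits else 0.0


# The same two relevant documents score higher when ranked first.
print(context_precision([True, True, False, False]))
print(context_precision([False, False, True, True]))
```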

04 Context Recall

Extracts factual claims from the ground truth answer, then checks whether the retrieved contexts contain supporting information. Measures how completely the retrieval covers the ideal answer. This is the only metric requiring ground truth.

Scored by: RAGAS + Claude Haiku

Evaluation Pipeline

15 test cases → Haiku classifies → Search + Retrieve → Sonnet generates → RAGAS scores × 4 metrics → Scores
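
The first stage of this flow can be sketched as a function that chains the classifier, retriever, and generator and records latency. Everything here is a stand-in: the stage callables, the `CaseOutput` type, and the route names are hypothetical, and skipping retrieval for off-topic queries is an assumption based on the off-topic cases listing no retrieved products. The RAGAS scoring stage is not shown.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CaseOutput:
    query: str
    route: str
    contexts: list[str]
    answer: str
    latency_ms: float


def run_case(
    query: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str, str], list[str]],
    generate: Callable[[str, list[str]], str],
) -> CaseOutput:
    """Classify, retrieve, and generate for one test query, timing the whole call."""
    start = time.perf_counter()
    route = classify(query)
    # Assumption: off-topic queries skip retrieval entirely.
    contexts = [] if route == "off_topic" else retrieve(query, route)
    answer = generate(query, contexts)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return CaseOutput(query, route, contexts, answer, latency_ms)


# Stub stages so the sketch runs without any model or search backend.
def fake_classify(query: str) -> str:
    return "off_topic" if "pizza" in query else "hybrid"


def fake_retrieve(query: str, route: str) -> list[str]:
    return [f"document text matching: {query}"]


def fake_generate(query: str, contexts: list[str]) -> str:
    return "Sorry, that is outside the catalog." if not contexts else f"Answer using {len(contexts)} document(s)."


result = run_case("3M N95 respirator", fake_classify, fake_retrieve, fake_generate)
print(result.route, result.answer)
```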
6

Retrieval Ablation Study

Head-to-head comparison of five retrieval modes on 12 labeled queries using standard IR metrics. Unlike RAGAS, which scores the pipeline end to end, these metrics isolate retrieval quality from generation. The production mode (Hybrid + Semantic Reranking) is highlighted; this ablation justifies that architectural choice.

Retrieval Mode                                 R@5   R@10  P@5   P@10  MRR   NDCG@10  Latency
BM25 Only                                      65%   79%   30%   24%   87%   76%      53ms
Vector Only                                    79%   90%   47%   29%   90%   85%      44ms
Hybrid (no rerank)                             72%   88%   40%   28%   92%   84%      65ms
Hybrid + Semantic Reranking (Production)       77%   91%   45%   30%   92%   87%      115ms
Hybrid + Semantic + Field Boosting (Enhanced)  77%   91%   45%   30%   92%   87%      102ms

  • BM25 Only: Pure keyword search using BM25 scoring. No vectors, no reranking.
  • Vector Only: Pure vector search using cosine similarity. No keyword matching.
  • Hybrid (no rerank): BM25 + vector fusion. No semantic reranking layer.
  • Hybrid + Semantic Reranking: Production: BM25 + vector + Microsoft semantic reranker.
  • Hybrid + Semantic + Field Boosting: Production + scoring profile: name×5, sku×4, brand×3, category×1.5 weights.
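
For readers unfamiliar with field boosting, the weights quoted for the Enhanced mode could be expressed as an Azure AI Search scoring profile along these lines. This is a sketch of the index-definition fragment only: the profile name is hypothetical, and it assumes the index fields are literally named `name`, `sku`, `brand`, and `category`.

```python
import json

# Text weights for the Enhanced mode, taken from the description above.
scoring_profile = {
    "name": "field-boost",  # hypothetical profile name
    "text": {
        "weights": {
            "name": 5,
            "sku": 4,
            "brand": 3,
            "category": 1.5,
        }
    },
}

# Scoring profiles are declared as a list on the index definition.
index_fragment = {"scoringProfiles": [scoring_profile]}
print(json.dumps(index_fragment, indent=2))
```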

Per-Query Results

01 · 3M N95 respirator · simple lookup · 3 GT
Ground Truth: SF-001, SF-002, SF-003
BM25 Only
R@5=100% · P@5=60% · MRR=1.00 · NDCG@10=100% · 202ms
1. SF-001, 2. SF-002, 3. SF-003, 4. US-008, 5. SF-012, 6. HV-002
Vector Only
R@5=100% · P@5=60% · MRR=1.00 · NDCG@10=100% · 91ms
1. SF-001, 2. SF-002, 3. SF-003, 4. HV-002, 5. SF-012, 6. SF-016, 7. HV-001, 8. SF-014, 9. SF-005, 10. SF-006
Hybrid (no rerank)
R@5=100% · P@5=60% · MRR=1.00 · NDCG@10=100% · 125ms
1. SF-001, 2. SF-002, 3. SF-003, 4. HV-002, 5. SF-012, 6. US-008, 7. SF-016, 8. HV-001, 9. SF-014, 10. SF-005
Hybrid + Semantic Reranking
R@5=100% · P@5=60% · MRR=1.00 · NDCG@10=100% · 220ms
1. SF-001, 2. SF-002, 3. SF-003, 4. HV-002, 5. SF-012, 6. US-008, 7. SF-014, 8. SF-016, 9. SF-006, 10. SF-005
Hybrid + Semantic + Field Boosting
R@5=100% · P@5=60% · MRR=1.00 · NDCG@10=100% · 129ms
1. SF-001, 2. SF-002, 3. SF-003, 4. HV-002, 5. SF-012, 6. US-008, 7. SF-014, 8. SF-016, 9. SF-006, 10. SF-005
02 · 20-amp GFCI circuit breaker · simple lookup · 2 GT
Ground Truth: EL-003, EL-004
BM25 Only
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 45ms
1. EL-003, 2. EL-004, 3. EL-001, 4. EL-013, 5. EL-002, 6. EL-007, 7. EL-008, 8. SF-001, 9. EL-005, 10. TL-001
Vector Only
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 42ms
1. EL-003, 2. EL-004, 3. EL-001, 4. EL-007, 5. EL-002, 6. EL-013, 7. EL-008, 8. EL-009, 9. EL-005, 10. US-009
Hybrid (no rerank)
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 54ms
1. EL-003, 2. EL-004, 3. EL-001, 4. EL-007, 5. EL-013, 6. EL-002, 7. EL-008, 8. EL-005, 9. EL-009, 10. SF-001
Hybrid + Semantic Reranking
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=92% · 95ms
1. EL-003, 2. EL-007, 3. EL-004, 4. EL-001, 5. EL-008, 6. EL-009, 7. EL-005, 8. US-009, 9. EL-013, 10. EL-006
Hybrid + Semantic + Field Boosting
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=92% · 134ms
1. EL-003, 2. EL-007, 3. EL-004, 4. EL-001, 5. EL-008, 6. EL-009, 7. EL-005, 8. US-009, 9. EL-013, 10. EL-006
03 · Klein Tools wire stripper · simple lookup · 1 GT
Ground Truth: TL-001
BM25 Only
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 37ms
1. TL-001, 2. TL-002, 3. TL-003, 4. EL-011, 5. DC-007, 6. TL-019, 7. EL-005, 8. TL-005, 9. TL-014, 10. TL-006
Vector Only
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 41ms
1. TL-001, 2. TL-002, 3. TL-003, 4. TL-011, 5. TL-013, 6. EL-011, 7. TL-009, 8. US-006, 9. US-011, 10. DC-007
Hybrid (no rerank)
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 66ms
1. TL-001, 2. TL-002, 3. TL-003, 4. EL-011, 5. DC-007, 6. US-006, 7. US-011, 8. TL-013, 9. TL-011, 10. TL-009
Hybrid + Semantic Reranking
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 171ms
1. TL-001, 2. TL-002, 3. TL-003, 4. DC-007, 5. PL-001, 6. US-006, 7. US-011, 8. EL-010, 9. US-005, 10. EL-011
Hybrid + Semantic + Field Boosting
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 101ms
1. TL-001, 2. TL-002, 3. TL-003, 4. DC-007, 5. PL-001, 6. US-006, 7. US-011, 8. EL-010, 9. US-005, 10. EL-011
04 · Belden Cat6 plenum cable 1000 ft · simple lookup · 2 GT
Ground Truth: DC-001, DC-002
BM25 Only
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 32ms
1. DC-001, 2. DC-002, 3. EL-006, 4. EL-005, 5. US-007, 6. DC-003, 7. DC-010, 8. DC-006, 9. TL-015, 10. HV-005
Vector Only
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 35ms
1. DC-001, 2. DC-002, 3. DC-011, 4. DC-006, 5. EL-006, 6. US-007, 7. EL-005, 8. DC-007, 9. DC-005, 10. DC-003
Hybrid (no rerank)
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 68ms
1. DC-001, 2. DC-002, 3. EL-006, 4. EL-005, 5. US-007, 6. DC-006, 7. DC-003, 8. DC-011, 9. DC-005, 10. DC-007
Hybrid + Semantic Reranking
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 106ms
1. DC-001, 2. DC-002, 3. DC-003, 4. DC-007, 5. DC-011, 6. DC-004, 7. US-007, 8. DC-006, 9. DC-005, 10. EL-006
Hybrid + Semantic + Field Boosting
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 86ms
1. DC-001, 2. DC-002, 3. DC-003, 4. DC-007, 5. DC-011, 6. DC-004, 7. US-007, 8. DC-006, 9. DC-005, 10. EL-006
05 · Axis indoor IP dome camera · simple lookup · 1 GT
Ground Truth: US-001
BM25 Only
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 31ms
1. US-001, 2. US-002, 3. EL-013, 4. DC-006, 5. US-004, 6. DC-009
Vector Only
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 73ms
1. US-001, 2. US-002, 3. US-003, 4. US-004, 5. DC-012, 6. TL-015, 7. DC-009, 8. DC-008, 9. DC-006, 10. DC-004
Hybrid (no rerank)
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 51ms
1. US-001, 2. US-002, 3. US-004, 4. DC-006, 5. DC-009, 6. EL-013, 7. US-003, 8. DC-012, 9. TL-015, 10. DC-008
Hybrid + Semantic Reranking
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 81ms
1. US-001, 2. US-002, 3. US-004, 4. US-003, 5. DC-006, 6. DC-009, 7. DC-012, 8. EL-013, 9. DC-004, 10. DC-008
Hybrid + Semantic + Field Boosting
R@5=100% · P@5=20% · MRR=1.00 · NDCG@10=100% · 72ms
1. US-001, 2. US-002, 3. US-004, 4. US-003, 5. DC-006, 6. DC-009, 7. DC-012, 8. EL-013, 9. DC-004, 10. DC-008
06 · What PPE do I need for metal grinding? · advisory · 8 GT
Ground Truth: SF-009, SF-010, SF-012, SF-001, SF-002, SF-008, SF-006, TL-007
BM25 Only
R@5=0% · P@5=0% · MRR=0.13 · NDCG@10=15% · 31ms
1. EL-010, 2. US-010, 3. EL-005, 4. SF-011, 5. TL-020, 6. TL-019, 7. EL-004, 8. TL-007, 9. HV-005, 10. SF-008
Vector Only
R@5=38% · P@5=60% · MRR=1.00 · NDCG@10=77% · 40ms
1. TL-007, 2. SF-002, 3. SF-011, 4. SF-012, 5. SF-014, 6. SF-006, 7. SF-008, 8. TL-012, 9. SF-016, 10. SF-001
Hybrid (no rerank)
R@5=38% · P@5=60% · MRR=0.50 · NDCG@10=48% · 64ms
1. SF-011, 2. TL-007, 3. SF-002, 4. SF-008, 5. EL-010, 6. US-010, 7. EL-005, 8. SF-012, 9. SF-014, 10. TL-020
Hybrid + Semantic Reranking
R@5=50% · P@5=80% · MRR=1.00 · NDCG@10=81% · 99ms
1. SF-002, 2. SF-008, 3. SF-006, 4. SF-001, 5. SF-011, 6. SF-014, 7. SF-012, 8. TL-007, 9. SF-016, 10. US-010
Hybrid + Semantic + Field Boosting
R@5=50% · P@5=80% · MRR=1.00 · NDCG@10=81% · 112ms
1. SF-002, 2. SF-008, 3. SF-006, 4. SF-001, 5. SF-011, 6. SF-014, 7. SF-012, 8. TL-007, 9. SF-016, 10. US-010
07 · What do I need to wire a 20-amp outdoor circuit? · advisory · 6 GT
Ground Truth: EL-003, EL-006, EL-007, EL-009, EL-012, EL-010
BM25 Only
R@5=17% · P@5=20% · MRR=1.00 · NDCG@10=39% · 33ms
1. EL-003, 2. TL-001, 3. EL-001, 4. EL-005, 5. EL-008, 6. EL-004, 7. EL-002, 8. EL-013, 9. EL-007, 10. SF-001
Vector Only
R@5=50% · P@5=60% · MRR=1.00 · NDCG@10=74% · 36ms
1. EL-007, 2. EL-003, 3. EL-006, 4. EL-005, 5. EL-001, 6. EL-008, 7. EL-013, 8. EL-002, 9. EL-009, 10. EL-004
Hybrid (no rerank)
R@5=33% · P@5=40% · MRR=1.00 · NDCG@10=62% · 59ms
1. EL-003, 2. EL-001, 3. EL-005, 4. EL-007, 5. EL-008, 6. EL-002, 7. EL-013, 8. EL-006, 9. EL-004, 10. EL-009
Hybrid + Semantic Reranking
R@5=50% · P@5=60% · MRR=1.00 · NDCG@10=70% · 115ms
1. EL-007, 2. EL-006, 3. EL-008, 4. EL-005, 5. EL-003, 6. DC-005, 7. EL-001, 8. EL-004, 9. EL-011, 10. EL-009
Hybrid + Semantic + Field Boosting
R@5=50% · P@5=60% · MRR=1.00 · NDCG@10=70% · 102ms
1. EL-007, 2. EL-006, 3. EL-008, 4. EL-005, 5. EL-003, 6. DC-005, 7. EL-001, 8. EL-004, 9. EL-011, 10. EL-009
08 · What do I need to install a Cat6A network drop end-to-end? · advisory · 6 GT
Ground Truth: DC-002, DC-004, DC-007, DC-011, DC-012, DC-010
BM25 Only
R@5=33% · P@5=40% · MRR=0.33 · NDCG@10=38% · 66ms
1. US-003, 2. PL-002, 3. DC-002, 4. DC-009, 5. DC-011, 6. DC-004, 7. DC-001, 8. EL-010, 9. HV-005, 10. US-001
Vector Only
R@5=67% · P@5=80% · MRR=1.00 · NDCG@10=80% · 45ms
1. DC-002, 2. DC-001, 3. DC-007, 4. DC-011, 5. DC-004, 6. DC-003, 7. DC-010, 8. DC-006, 9. DC-005, 10. EL-006
Hybrid (no rerank)
R@5=50% · P@5=60% · MRR=1.00 · NDCG@10=78% · 82ms
1. DC-002, 2. DC-001, 3. DC-011, 4. DC-004, 5. DC-003, 6. DC-007, 7. DC-006, 8. DC-005, 9. DC-010, 10. US-003
Hybrid + Semantic Reranking
R@5=50% · P@5=60% · MRR=1.00 · NDCG@10=84% · 119ms
1. DC-002, 2. DC-004, 3. DC-011, 4. DC-001, 5. DC-003, 6. DC-007, 7. DC-009, 8. DC-006, 9. DC-012, 10. DC-005
Hybrid + Semantic + Field Boosting
R@5=50% · P@5=60% · MRR=1.00 · NDCG@10=84% · 129ms
1. DC-002, 2. DC-004, 3. DC-011, 4. DC-001, 5. DC-003, 6. DC-007, 7. DC-009, 8. DC-006, 9. DC-012, 10. DC-005
09 · DeWalt vs Milwaukee drill · comparison · 2 GT
Ground Truth: TL-004, TL-006
BM25 Only
R@5=50% · P@5=20% · MRR=1.00 · NDCG@10=81% · 33ms
1. TL-004, 2. TL-017, 3. TL-020, 4. TL-018, 5. TL-016, 6. TL-019, 7. TL-005, 8. TL-006, 9. TL-007, 10. TL-021
Vector Only
R@5=50% · P@5=20% · MRR=0.25 · NDCG@10=46% · 35ms
1. TL-019, 2. TL-018, 3. TL-020, 4. TL-004, 5. TL-017, 6. TL-016, 7. TL-007, 8. TL-006, 9. TL-005, 10. TL-012
Hybrid (no rerank)
R@5=50% · P@5=20% · MRR=1.00 · NDCG@10=80% · 50ms
1. TL-004, 2. TL-018, 3. TL-020, 4. TL-019, 5. TL-017, 6. TL-016, 7. TL-005, 8. TL-007, 9. TL-006, 10. TL-012
Hybrid + Semantic Reranking
R@5=50% · P@5=20% · MRR=1.00 · NDCG@10=82% · 76ms
1. TL-004, 2. TL-019, 3. TL-020, 4. TL-017, 5. TL-016, 6. TL-018, 7. TL-006, 8. TL-021, 9. TL-007, 10. TL-012
Hybrid + Semantic + Field Boosting
R@5=50% · P@5=20% · MRR=1.00 · NDCG@10=82% · 93ms
1. TL-004, 2. TL-019, 3. TL-020, 4. TL-017, 5. TL-016, 6. TL-018, 7. TL-006, 8. TL-021, 9. TL-007, 10. TL-012
10 · Which is better for home use: GFCI or AFCI breaker? · comparison · 2 GT
Ground Truth: EL-003, EL-004
BM25 Only
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 72ms
1. EL-004, 2. EL-003, 3. EL-001, 4. EL-013, 5. EL-009, 6. EL-002, 7. EL-007, 8. SF-003, 9. US-010, 10. EL-015
Vector Only
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 30ms
1. EL-004, 2. EL-003, 3. EL-001, 4. EL-002, 5. EL-007, 6. EL-013, 7. EL-009, 8. US-009, 9. EL-008, 10. EL-005
Hybrid (no rerank)
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 59ms
1. EL-004, 2. EL-003, 3. EL-001, 4. EL-002, 5. EL-013, 6. EL-007, 7. EL-009, 8. SF-003, 9. US-009, 10. EL-008
Hybrid + Semantic Reranking
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 111ms
1. EL-004, 2. EL-003, 3. EL-007, 4. EL-009, 5. EL-001, 6. EL-013, 7. EL-002, 8. US-009, 9. EL-005, 10. EL-008
Hybrid + Semantic + Field Boosting
R@5=100% · P@5=40% · MRR=1.00 · NDCG@10=100% · 103ms
1. EL-004, 2. EL-003, 3. EL-007, 4. EL-009, 5. EL-001, 6. EL-013, 7. EL-002, 8. US-009, 9. EL-005, 10. EL-008
11 · safety · edge case · 7 GT
Ground Truth: SF-001, SF-004, SF-009, SF-006, SF-012, SF-013, SF-011
BM25 Only
R@5=14% · P@5=20% · MRR=1.00 · NDCG@10=44% · 28ms
1. SF-009, 2. SF-010, 3. HV-008, 4. TL-007, 5. SF-015, 6. SF-007, 7. SF-016, 8. SF-005, 9. SF-001, 10. SF-012
Vector Only
R@5=43% · P@5=60% · MRR=0.50 · NDCG@10=50% · 31ms
1. SF-010, 2. SF-013, 3. SF-004, 4. SF-005, 5. SF-009, 6. US-009, 7. SF-016, 8. SF-008, 9. SF-014, 10. SF-012
Hybrid (no rerank)
R@5=29% · P@5=40% · MRR=0.50 · NDCG@10=46% · 42ms
1. SF-010, 2. SF-009, 3. SF-005, 4. SF-016, 5. SF-013, 6. SF-014, 7. SF-012, 8. SF-004, 9. SF-008, 10. HV-008
Hybrid + Semantic Reranking
R@5=29% · P@5=40% · MRR=0.50 · NDCG@10=55% · 81ms
1. SF-010, 2. SF-009, 3. SF-005, 4. SF-001, 5. TL-007, 6. SF-012, 7. SF-003, 8. SF-017, 9. SF-013, 10. SF-006
Hybrid + Semantic + Field Boosting
R@5=29% · P@5=40% · MRR=0.50 · NDCG@10=55% · 73ms
1. SF-010, 2. SF-009, 3. SF-005, 4. SF-001, 5. TL-007, 6. SF-012, 7. SF-003, 8. SF-017, 9. SF-013, 10. SF-006
12 · I need to install an IP camera and replace a circuit breaker · edge case · 3 GT
Ground Truth: US-001, EL-001, EL-003
BM25 Only
R@5=67% · P@5=40% · MRR=1.00 · NDCG@10=92% · 31ms
1. EL-001, 2. EL-003, 3. EL-013, 4. EL-002, 5. EL-004, 6. US-002, 7. US-001, 8. US-003, 9. EL-010, 10. EL-005
Vector Only
R@5=100% · P@5=60% · MRR=1.00 · NDCG@10=91% · 30ms
1. EL-003, 2. US-002, 3. EL-001, 4. US-001, 5. EL-004, 6. EL-002, 7. EL-013, 8. EL-009, 9. EL-007, 10. US-003
Hybrid (no rerank)
R@5=67% · P@5=40% · MRR=1.00 · NDCG@10=92% · 62ms
1. EL-003, 2. EL-001, 3. US-002, 4. EL-013, 5. EL-002, 6. EL-004, 7. US-001, 8. US-003, 9. EL-009, 10. EL-007
Hybrid + Semantic Reranking
R@5=100% · P@5=60% · MRR=0.50 · NDCG@10=73% · 105ms
1. US-002, 2. EL-003, 3. US-001, 4. EL-001, 5. EL-004, 6. EL-002, 7. EL-013, 8. EL-005, 9. DC-009, 10. EL-007
Hybrid + Semantic + Field Boosting
R@5=100% · P@5=60% · MRR=0.50 · NDCG@10=73% · 84ms
1. US-002, 2. EL-003, 3. US-001, 4. EL-001, 5. EL-004, 6. EL-002, 7. EL-013, 8. EL-005, 9. DC-009, 10. EL-007

IR Metrics

Recall@K — fraction of ground truth in the top K results
Precision@K — fraction of top K results that are relevant
MRR — Mean Reciprocal Rank (1/rank of first relevant)
NDCG@10 — rank-weighted relevance over the top 10 results
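
The four definitions above can be sketched with binary relevance as follows. The function names are illustrative, and dividing Precision@K by K even when fewer than K results come back is an assumption. The example uses the ranked list and ground truth shown for query 06 (Vector Only) and reproduces that row's R@5 of 38%, P@5 of 60%, MRR of 1.00, and NDCG@10 of 77%.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the ground truth that appears in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are in the ground truth."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-gain DCG over the top k, normalized by the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Query 06, Vector Only, as listed in Per-Query Results.
ranked = ["TL-007", "SF-002", "SF-011", "SF-012", "SF-014",
          "SF-006", "SF-008", "TL-012", "SF-016", "SF-001"]
relevant = {"SF-009", "SF-010", "SF-012", "SF-001",
            "SF-002", "SF-008", "SF-006", "TL-007"}

print(recall_at_k(ranked, relevant, 5))
print(precision_at_k(ranked, relevant, 5))
print(reciprocal_rank(ranked, relevant))
print(round(ndcg_at_k(ranked, relevant, 10), 3))
```

MRR in the summary tables is the mean of the per-query reciprocal ranks.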
7

Embedding Model Comparison

Comparison of embedding models using Hybrid + Semantic Reranking mode (production configuration). Each model was used to embed both documents and queries, with separate Azure AI Search indexes per model.

Embedding Model                      Dims   R@5   R@10  P@5   MRR   NDCG@10
text-embedding-3-small (Production)  1,536  77%   91%   45%   92%   87%
text-embedding-3-large               3,072  76%   92%   43%   92%   87%
text-embedding-3-large (reduced)     1,536  77%   92%   45%   92%   87%
Cohere Embed v4                      1,536  77%   91%   45%   92%   86%

Methodology: Each embedding model was used to index the full product catalog into a separate Azure AI Search index, then the same test queries were run against each index. BM25 scores are identical across models (keyword search is embedding-agnostic); differences appear in the vector and hybrid modes where embedding quality directly impacts retrieval.

8

Platform Comparison: Azure AI Search vs Apache Solr vs Elasticsearch vs Vertex AI

Same 12 labeled queries. Three of the four engines share the same Azure OpenAI embeddings (text-embedding-3-small, 1,536 dims); Vertex AI uses its own internal embedder. Solr and Elasticsearch are both self-hosted on Azure Container Instances; try them live at /solr-search, /es-search, and /vertex-search.

How each engine does hybrid:

  • Azure fuses BM25 + vector natively with its own Reciprocal Rank Fusion.
  • Elasticsearch uses the retriever.rrf API (gated behind a Platinum license; trial active for this demo, see /es-search).
  • Solr does app-level RRF in our code (SOLR-17319 ships first-party RRF in 10.1).
  • Vertex AI is a managed black box: Google fuses semantic + keyword internally with its own embedder, with no public API to disable either side, so it gets one row (hybrid only) and no separable bm25/vector breakdown.

The first three implement the same formula; results differ primarily because of engine-level BM25 and KNN tuning. Only Azure has a cross-encoder semantic reranker, so the hybrid_semantic row below is the reference ceiling.
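
For reference, Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) across the input rankings. The sketch below is a generic version, not the code used for the Solr integration; the constant k = 60 is a commonly used default and an assumption here, and the document IDs are placeholders.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists into one using Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["A", "B", "C"]    # placeholder keyword results
vector_ranking = ["B", "C", "D"]  # placeholder vector results
fused = rrf_fuse([bm25_ranking, vector_ranking])
print(fused)
```

Documents that appear in both lists (B and C here) rise above documents that rank first in only one of them, which is the behavior hybrid search relies on.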

Mode                                      Platform       R@5   R@10  P@5   MRR   NDCG@10  Latency
BM25 Only                                 Azure          65%   79%   30%   87%   76%      53ms
BM25 Only                                 Solr           70%   79%   37%   79%   73%      44ms
BM25 Only                                 Elasticsearch  70%   82%   37%   79%   75%      40ms
Vector Only                               Azure          79%   90%   47%   90%   85%      44ms
Vector Only                               Solr           79%   90%   47%   90%   85%      39ms
Vector Only                               Elasticsearch  79%   90%   47%   90%   85%      38ms
Hybrid (no rerank)                        Azure          72%   88%   40%   92%   84%      65ms
Hybrid (no rerank)                        Solr           76%   85%   42%   88%   79%      74ms
Hybrid (no rerank)                        Elasticsearch  78%   86%   45%   83%   79%      184ms
Hybrid (no rerank)                        Vertex AI      62%   66%   53%   83%   69%      510ms
Hybrid + Semantic Reranking (Azure only)  Azure          77%   91%   45%   92%   87%      115ms

  • BM25 Only: Pure keyword search using BM25 scoring. No vectors, no reranking.
  • Vector Only: Pure vector search using cosine similarity. No keyword matching.
  • Hybrid (no rerank): BM25 + vector fusion. No semantic reranking layer.
  • Hybrid + Semantic Reranking: Microsoft semantic reranker (no Solr equivalent).

Why you'd pick Azure AI Search

  • Managed service, no servers to patch
  • Semantic reranker out of the box (NDCG@10 rises from 84% to 87% over plain hybrid in our runs)
  • Azure-native auth, billing, compliance
  • Best retrieval quality in our measurements

Why you'd pick Apache Solr

  • Apache 2.0 — fully permissive, no restrictions
  • Largest ecosystem for custom scoring / analyzers
  • All features free at every scale
  • Mature cluster management (SolrCloud + ZooKeeper)

Why you'd pick Elasticsearch

  • Native RRF retriever (single-request hybrid)
  • Strong AI-native story: ELSER, inference APIs, cross-encoder reranker
  • Largest mindshare + vendor-backed tooling (Kibana, Elastic Cloud)
  • ⚠ RRF/reranker/ELSER gated behind Platinum license

Why you'd pick Vertex AI

  • Fully managed — no embedder, no infra, one API
  • Semantic + keyword fusion built-in (Google's Gecko-class embedder)
  • Free tier covers small catalogs (~10 GB)
  • ⚠ Black-box: no separable bm25/vector or scoring control