HR Knowledge Base Evaluation
Quality measurement of the HR policy RAG pipeline using RAGAS metrics. Evaluates semantic chunking, parent-document retrieval, and multi-country policy responses.
Last evaluated: April 14, 2026 • 15 test cases • Meridian Technologies HR Knowledge Base
Document RAG Evaluation
Structured Data RAG
- Fixed-size product records (~100-200 tokens)
- No chunking needed
- Product comparison & cost calculation tools
- Single-country catalog
Document RAG
- ✓ Semantically chunked policy documents (variable size)
- ✓ Parent-document retrieval (get_full_section)
- ✓ Policy comparison & eligibility checking tools
- ✓ Multi-country policies (US, UK, Germany)
Same evaluation framework: Both demos use the identical RAGAS pipeline — same 4 metrics, same Python scorer, same Claude Haiku evaluator. This enables apples-to-apples comparison across different RAG data models.
Overall Scores
Routing
Query classification accuracy
Faithfulness
Answers grounded in retrieved policies
Relevancy
Answers address the actual question
Ctx Precision
Retrieved chunks are relevant & well-ranked
Ctx Recall
Retrieved chunks cover the ground truth
Scores by Query Type
- ⚡ Simple Lookup: 4 queries
- 🧠 Advisory: 3 queries
- ⚖️ Comparison: 2 queries
- 📋 Procedural: 2 queries
- 🚫 Off-Topic: 2 queries
- 🧪 Edge Case: 2 queries
Individual Test Results
01 ✓ How many PTO days do new employees get? • Agentic • 84%
Retrieved Policy Chunks (15)
Generated Answer
New full-time employees at Meridian Technologies receive 15 business days of PTO annually. Employees with 0-2 years of tenure accrue 15 business days per year, which equals 1.25 days per month. PTO eligibility begins on the first day of employment and accrues monthly in equal increments. Referenced Policies: Paid Time Off (PTO) (US): Contains the complete PTO accrual schedule for new employees I...
Latency: 10485ms
02 ✓ What is the 401k match? • Agentic • 85%
Retrieved Policy Chunks (15)
Generated Answer
Meridian Technologies provides a 401(k) match equal to 6% of your eligible compensation when you contribute to your 401(k) account. To receive the full company match, you must contribute at least 6% of your gross salary to the plan. If you contribute less than 6%, you'll receive a proportional match. The company match is calculated on a pay period basis and deposited quarterly. Referenced Policie...
Latency: 11257ms
03 ✓ What's the password policy? • Agentic • 91%
Retrieved Policy Chunks (15)
Generated Answer
Meridian Technologies requires all employees to use passwords with a minimum of 14 characters that include at least one uppercase letter, lowercase letter, number, and special character. Passwords must be changed every 90 days, cannot contain your username or dictionary words, and cannot reuse any of your previous 12 passwords. Additionally, multi-factor authentication (MFA) must be enabled within...
Latency: 45220ms
04 ✓ What is the probationary period for new hires? • Agentic • 66%
Retrieved Policy Chunks (15)
Generated Answer
All new employees at Meridian Technologies are subject to a probationary period of 90 calendar days from their start date, applying to all full-time and part-time employees regardless of position level or department. During this period, HR facilitates check-in meetings at the 30-day and 60-day marks, and at the conclusion, the manager conducts a formal performance review to determine transition to...
Latency: 44430ms
05 ✓ What benefits am I eligible for as a part-time employee? • Agentic • 63%
Retrieved Policy Chunks (25)
Generated Answer
I found 25 relevant policy sections. Here are the most relevant results for your query. Referenced Policies: Employee Assistance Program (Global): Summary: All Meridian Technologies employees globally, including full-time, part-time, and contracto; Employee Assistance Program (Global): Summary: All global Meridian employees can access eight confidential EAP counseling sessions annuall; Sick Leave...
Latency: 47896ms
06 ✓ I need to take extended leave for a family medical issue. What are my options? • Agentic • 76%
Retrieved Policy Chunks (27)
Generated Answer
I found 27 relevant policy sections. Here are the most relevant results for your query. Referenced Policies: Family & Medical Leave (US): Summary: US employees working at locations with 50+ employees within 75 miles are eligible for up to; Family & Medical Leave (US): Summary: US employees must provide 30 days' advance notice for foreseeable FMLA leave and maintain h; Sick Leave (US): Summary: US...
Latency: 47788ms
07 ✓ What should I know as a new remote employee? • Agentic • 76%
Retrieved Policy Chunks (29)
Generated Answer
I found 29 relevant policy sections. Here are the most relevant results for your query. Referenced Policies: Remote Work Eligibility (Global): Summary: Global full-time employees at Meridian Technologies must complete their six-month probation; Remote Work Eligibility (Global): Summary: Global full-time employees must submit remote work requests to their manager at least four ; Remote Work Eligib...
Latency: 67090ms
08 ✓ What's the difference between FMLA and parental leave? • Agentic • 55%
Retrieved Policy Chunks (19)
Generated Answer
I found 19 relevant policy sections. Here are the most relevant results for your query. Referenced Policies: Family & Medical Leave (US): Summary: US employees working at locations with 50+ employees within 75 miles are eligible for up to; Family & Medical Leave (US): Summary: US employees may take Family and Medical Leave under FMLA protections with zero retaliation; Parental Leave (US): Summary...
Latency: 54691ms
09 ✓ Compare PTO policies across US, UK, and Germany • Agentic • 65%
Retrieved Policy Chunks (7)
Generated Answer
Meridian Technologies' PTO policies vary significantly by country: US employees receive 15-25 days annually based on tenure (0-2 years: 15 days, 3-5 years: 18 days, 6-10 years: 20 days, 11+ years: 25 days), UK employees receive 33 days annually (28 statutory + 5 company days), and German employees receive 35 days annually (30 statutory + 5 company days). All countries allow maximum 5-day carryover...
Latency: 33549ms
10 ✓ How do I report a workplace safety concern? • Agentic • 87%
Retrieved Policy Chunks (12)
Generated Answer
You can report workplace safety concerns in two ways: (1) Submit hazard reports to your direct manager, or (2) Email the Health and Safety Team at safety@meridiantech.com. Reports will be investigated within five business days, and you'll be notified of findings and corrective actions. Meridian prohibits retaliation against employees who report safety concerns in good faith. Referenced Policies: ...
Latency: 30913ms
11 ✓ What's the process for submitting an expense report? • Agentic • 57%
Retrieved Policy Chunks (15)
Generated Answer
To submit an expense report, use Meridian's expense management system (preferred) or submit to your manager with documentation. You must obtain manager approval before incurring expenses (except meal per diem), submit within 30 days of the expense date, and provide itemized receipts for expenses over $75. Finance will process approved requests within 15 business days via direct deposit. Reference...
Latency: 41917ms
12 ✓ What's the best pizza near the office? • Off-Topic • 100%
Generated Answer
This question is not related to HR policies.
Latency: 0ms
13 ✓ How do I change my car's oil? • Off-Topic • 100%
Generated Answer
This question is not related to HR policies.
Latency: 0ms
14 ✓ Can I bring my dog to work? • Agentic • 25%
Retrieved Policy Chunks (40)
Generated Answer
I was unable to find any specific policy about bringing pets or dogs to work in Meridian Technologies' HR knowledge base. The company's policies do not appear to explicitly address pets in the workplace. Important Note: Since there is no specific pet policy documented, you should contact your local HR department to inquire about this topic directly. They can provide guidance on whether pets are p...
Latency: 73963ms
15 ✓ I'm a contractor in Germany, what leave do I get? • Agentic • 42%
Retrieved Policy Chunks (14)
Generated Answer
I found 14 relevant policy sections. Here are the most relevant results for your query. Referenced Policies: Sick Leave (DE): Summary: German employees at Meridian's Munich office are entitled to paid sick leave under the Wage; Sick Leave (DE): Summary: Germany requires all employees to maintain confidentiality obligations during sick leave, a; Sick Leave (DE): Summary: Germany provides all emplo...
Latency: 48350ms
Methodology
01 Faithfulness
Extracts atomic factual claims from the generated answer, then verifies each claim against the retrieved policy chunks. A claim must be directly supported — not merely plausible. Score = supported claims / total claims.
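As a rough illustration of the ratio described above, the sketch below assumes the claim extraction and per-claim verification have already happened (in this pipeline, by the LLM evaluator) and only shows the final scoring step. The function name and the choice to return 0.0 for an answer with no claims are assumptions, not details taken from the actual scorer.

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Return supported claims / total claims.

    `claim_supported` holds one verdict per atomic claim: True when the
    claim is directly supported by the retrieved chunks.
    """
    if not claim_supported:
        # Assumption: an answer with no extractable claims scores 0.0.
        return 0.0
    return sum(claim_supported) / len(claim_supported)


# Four claims extracted, one of them not supported by the retrieved chunks.
print(faithfulness_score([True, True, False, True]))  # 0.75
```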
02 Answer Relevancy
Generates hypothetical questions the answer would address, embeds them alongside the original query, and computes average cosine similarity. High similarity = answer stays on-topic.
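The averaging step can be sketched as follows, assuming the embeddings already exist as plain lists of floats. The embedding model, the number of generated questions, and the function names here are hypothetical; only the "average cosine similarity against the original query" logic comes from the description above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer_relevancy(query_vec: list[float],
                     generated_question_vecs: list[list[float]]) -> float:
    """Mean cosine similarity between the query and each generated question."""
    if not generated_question_vecs:
        return 0.0
    sims = [cosine_similarity(query_vec, v) for v in generated_question_vecs]
    return sum(sims) / len(sims)


# Toy 2-D embeddings: one generated question matches the query, one is orthogonal.
print(answer_relevancy([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```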
03 Context Precision
For each retrieved chunk (in search rank order), judges relevance to the query. Computes precision at each rank position, rewarding relevant chunks appearing at the top. Measures semantic chunking + retrieval ranking quality.
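One common way to turn per-rank relevance judgments into a single rank-aware number is average precision: take precision@k at every rank that holds a relevant chunk and average those values. The sketch below uses that formulation as an assumption; the actual scorer may weight or normalize differently.

```python
def context_precision(relevance: list[bool]) -> float:
    """Average of precision@k over the ranks that hold a relevant chunk.

    `relevance` is ordered by search rank: relevance[0] is the top result.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision@rank, counted only at relevant ranks
    return total / hits if hits else 0.0


# Same two relevant chunks, but ranked higher in the first list.
print(round(context_precision([True, False, True]), 3))  # 0.833
print(round(context_precision([False, True, True]), 3))  # 0.583
```

The two calls contain the same number of relevant chunks; the first scores higher only because a relevant chunk appears at rank 1.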
04 Context Recall
Extracts factual claims from the ground truth answer, then checks whether the retrieved chunks contain supporting information. Measures how completely the retrieval covers the ideal answer.
Evaluation Pipeline
Retrieval Ablation Study
Head-to-head comparison of 7 retrieval strategies on 13 labeled HR queries using standard IR metrics. Unlike RAGAS (end-to-end), these isolate pure retrieval quality from generation. The production mode (Hybrid + Semantic Reranking) is highlighted; the ablation shows how each retrieval component contributes to finding the right policy chunks.
Two granularities: Each HR policy is split into 2-4 chunks of ~1,500 characters. Chunk-level metrics check exact ID matches; the subsection-level variant credits retrieval for finding any chunk in a ground-truth policy, which matches how the agent actually consumes results.
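For readers less familiar with the MRR and NDCG@10 columns reported in this study, here is a minimal sketch of both metrics. It assumes binary relevance (a retrieved ID is either in the ground-truth set or not) and the standard log2 discount; the function names are illustrative and not taken from the evaluation code.

```python
import math


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """MRR: mean of the reciprocal rank over all (retrieved, relevant) queries."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, gt) for r, gt in runs) / len(runs)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """NDCG@k with binary gains and a log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk_id in enumerate(retrieved[:k], start=1)
        if chunk_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# One relevant chunk, found at rank 2 of 3.
print(reciprocal_rank(["a", "b", "c"], {"b"}))         # 0.5
print(round(ndcg_at_k(["a", "b", "c"], {"b"}, 10), 3))  # 0.631
```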
Chunk-level (exact chunk_id match)
| Retrieval Mode | Description | R@5 | R@10 | P@5 | P@10 | MRR | NDCG@10 | Latency |
|---|---|---|---|---|---|---|---|---|
| BM25 Only | Pure keyword search using BM25 scoring. No vectors, no reranking. | 62% | 72% | 37% | 23% | 73% | 62% | 98ms |
| Vector Only | Pure vector search using cosine similarity. No keyword matching. | 61% | 67% | 39% | 22% | 65% | 60% | 132ms |
| Hybrid (no rerank) | BM25 + vector fusion. No semantic reranking layer. | 57% | 67% | 35% | 22% | 66% | 61% | 141ms |
| Hybrid + Semantic Reranking (Production) | Production: BM25 + vector + Microsoft semantic reranker. | 58% | 76% | 35% | 25% | 65% | 63% | 132ms |
| Hybrid + Semantic + Field Boosting (Enhanced) | Production + scoring profile: subsection_title×4, topic×2, section_title×1.5 weights. | 55% | 74% | 34% | 24% | 65% | 61% | 134ms |
| Hybrid + Semantic + Query Rewriter | Production + Claude Haiku extracts country filters (fan-out for comparison queries). | 62% | 76% | 37% | 25% | 71% | 68% | 153ms |
| Hybrid + Semantic + Parent-Doc (MMR) | Production + over-fetch 40, dedupe by subsection, keep top 10 distinct policies. | 25% | 34% | 17% | 10% | 57% | 33% | 146ms |
Why these look low: the ground truth uses specific chunk IDs (e.g., pto-us-000), but retrieval often surfaces neighboring chunks from the same policy (e.g., pto-us-002). Those are still functionally correct — the subsection-level table below credits them.
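The difference between the two granularities can be sketched with a small recall@k function. This assumes chunk IDs follow the pattern shown above (a subsection key plus a numeric suffix, as in pto-us-000) and that subsection-level recall is the fraction of ground-truth subsections hit; both are assumptions about the scorer, and sick-us-001 is a made-up ID for illustration.

```python
def subsection_of(chunk_id: str) -> str:
    """Strip the trailing chunk index: 'pto-us-002' -> 'pto-us' (assumed ID format)."""
    return chunk_id.rsplit("-", 1)[0]


def recall_at_k(retrieved: list[str], ground_truth: list[str],
                k: int, level: str = "chunk") -> float:
    """Recall@k at chunk level (exact ID match) or subsection level."""
    top_k = retrieved[:k]
    if level == "subsection":
        found = {subsection_of(c) for c in top_k}
        wanted = {subsection_of(c) for c in ground_truth}
    else:
        found = set(top_k)
        wanted = set(ground_truth)
    if not wanted:
        return 0.0
    return len(found & wanted) / len(wanted)


retrieved = ["pto-us-002", "sick-us-001"]  # neighbor of the labeled chunk, plus noise
ground_truth = ["pto-us-000"]

print(recall_at_k(retrieved, ground_truth, k=5, level="chunk"))       # 0.0
print(recall_at_k(retrieved, ground_truth, k=5, level="subsection"))  # 1.0
```

The same retrieval result scores 0% at chunk level and 100% at subsection level, which is the effect described in the paragraph above.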
Subsection-level (chunk counts if its policy/subsection is in ground truth)
| Retrieval Mode | R@5 | R@10 | P@5 | P@10 | MRR | NDCG@10 |
|---|---|---|---|---|---|---|
| BM25 Only | 67% | 73% | 52% | 35% | 81% | 43% |
| Vector Only | 72% | 74% | 52% | 34% | 89% | 45% |
| Hybrid (no rerank) | 64% | 74% | 48% | 34% | 80% | 43% |
| Hybrid + Semantic Reranking (Production) | 74% | 89% | 55% | 40% | 79% | 47% |
| Hybrid + Semantic + Field Boosting (Enhanced) | 72% | 86% | 54% | 39% | 79% | 47% |
| Hybrid + Semantic + Query Rewriter | 80% | 89% | 62% | 40% | 83% | 50% |
| Hybrid + Semantic + Parent-Doc (MMR) | 80% | 95% | 32% | 19% | 80% | 28% |
Interpretation: at the subsection level, the production pipeline lands the correct policy in the top-5 for most queries. The Query Rewriter (country filters + fan-out) raises precision and NDCG, and Parent-Doc (over-fetch then dedupe by subsection) pushes recall to its highest values by diversifying top-K across distinct policies — which especially helps multi-topic queries like “what should I know as a new remote employee?”. Parent-doc trades chunk-level P@K and NDCG for that diversity, so the rewriter wins on position-sensitive metrics while parent-doc wins on coverage.
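The "over-fetch then dedupe by subsection" step can be pictured as a single pass over the ranked list that keeps only the best-ranked chunk per subsection. This is a plain dedupe sketch, not a full MMR implementation, and it assumes chunk IDs of the form `<subsection>-<index>`; the IDs other than the pto-us ones are hypothetical.

```python
def dedupe_by_subsection(ranked_chunk_ids: list[str], keep: int = 10) -> list[str]:
    """Keep the highest-ranked chunk from each subsection, up to `keep` results."""
    seen: set[str] = set()
    result: list[str] = []
    for chunk_id in ranked_chunk_ids:
        subsection = chunk_id.rsplit("-", 1)[0]  # 'pto-us-002' -> 'pto-us'
        if subsection in seen:
            continue  # a better-ranked chunk from this subsection is already kept
        seen.add(subsection)
        result.append(chunk_id)
        if len(result) == keep:
            break
    return result


# In the study the ranked list would hold 40 over-fetched chunks and keep=10.
ranked = ["pto-us-000", "pto-us-002", "sick-us-001", "pto-uk-000"]
print(dedupe_by_subsection(ranked, keep=3))
# ['pto-us-000', 'sick-us-001', 'pto-uk-000']
```

Because each subsection contributes at most one chunk, neighboring chunks of a ground-truth policy are dropped, which is consistent with the lower chunk-level P@K and NDCG and the higher subsection-level recall reported above.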
Per-Query Results
01 • How many PTO days do new employees get? • simple lookup • 3 GT
02 • What is the 401k match? • simple lookup • 3 GT
03 • What's the password policy? • simple lookup • 3 GT
04 • What is the probationary period for new hires? • simple lookup • 2 GT
05 • What benefits am I eligible for as a part-time employee? • advisory • 3 GT
06 • I need to take extended leave for a family medical issue. What are my options? • advisory • 4 GT
07 • What should I know as a new remote employee? • advisory • 6 GT
08 • What's the difference between FMLA and parental leave? • comparison • 4 GT
09 • Compare PTO policies across US, UK, and Germany • comparison • 3 GT
10 • How do I report a workplace safety concern? • procedural • 3 GT
11 • What's the process for submitting an expense report? • procedural • 2 GT
12 • Can I bring my dog to work? • edge case • 1 GT
13 • I'm a contractor in Germany, what leave do I get? • edge case • 2 GT