Document RAG Architecture
Same RAG 3.0 agentic architecture, applied to long-form HR policy documents with semantic chunking, parent-document retrieval, and multi-country filtering.
Two Data Models, One Architecture
⚡ Structured Data RAG
Each product is a self-contained record (~100-200 tokens) with structured fields. No chunking required.
- Fixed-size records with typed attributes (price, brand, SKU)
- One record = one search result = one complete entity
- No document reconstruction needed
- Facets: category, brand, price range
- Single catalog, no geographic variants
📋 Document RAG
Long-form policy documents (400-800 words each) that must be chunked for retrieval and reassembled for complete answers.
- Variable-length documents requiring semantic chunking
- Chunks reference parent documents for full context
- Parent-document retrieval reconstructs complete policies
- Facets: topic, policy type, applicability, country
- Multi-country variants of the same policy (US, UK, DE)
Key insight: The same RAG 3.0 agent loop, Claude tool_use protocol, and Azure AI Search hybrid retrieval power both demos. Only the data model, index schema, and agent tools differ, which shows that the architecture carries over from structured product records to chunked policy documents.
Semantic Chunking Pipeline
- Handbook Generation: Claude Haiku
- Semantic Chunking: Claude Haiku
- Vector Embedding: text-embedding-3-small
- Index Upload: Azure AI Search
- Hybrid Retrieval: BM25 + Vector + Semantic
Why Semantic Chunking?
Fixed-size chunking (e.g., 300 tokens) splits text at arbitrary boundaries, often mid-sentence or mid-concept. Semantic chunking uses Claude Haiku to identify natural topic boundaries.
Each chunk covers a single coherent concept (e.g., “eligibility criteria” vs. “accrual rates” vs. “carryover rules”), producing higher-quality embeddings and more precise retrieval.
Variable chunk sizes (100-500 tokens) are fine — a short eligibility rule and a long procedure both work as self-contained retrieval units.
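To make the boundary problem concrete, here is a minimal sketch of fixed-size chunking. It assumes whitespace-separated words as a rough stand-in for tokens, and the sample policy text is invented for illustration; it is not taken from the handbook.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive windows of `size` words (a proxy for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Invented sample text: three sentences, each covering a different concept.
policy = (
    "Full-time employees are eligible for PTO from their start date. "
    "PTO accrues each pay period. "
    "Unused days may carry over into the next year."
)

chunks = fixed_size_chunks(policy, size=8)

# Chunks that do not end on a sentence boundary were cut mid-sentence.
mid_sentence = [c for c in chunks if not c.endswith(".")]

for c in chunks:
    print(repr(c))
```

With this window size, the first two chunks end mid-sentence and the second mixes eligibility, accrual, and carryover text, which is the kind of blended chunk that semantic chunking is meant to avoid.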
Chunking Prompt
"Split this policy text into logical chunks where each chunk covers a single coherent topic or concept."
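One way the prompt above might be turned into index-ready chunk records is sketched below. Several details are assumptions rather than facts from this page: the model is asked to reply with a JSON array of strings, `chunk_index` starts at 1, and the record field names are illustrative. The `llm` argument is a hypothetical callable standing in for the Claude Haiku call, so the sketch runs without network access.

```python
import json
from typing import Callable

CHUNKING_PROMPT = (
    "Split this policy text into logical chunks where each chunk covers "
    "a single coherent topic or concept."
)


def build_chunk_records(
    policy_text: str,
    subsection_id: str,
    country: str,
    llm: Callable[[str], str],
) -> list[dict]:
    """Ask the model for chunks and wrap each one in an ordered record."""
    # Assumption: the reply format (JSON array of strings) is our own convention.
    prompt = f"{CHUNKING_PROMPT}\nReply with a JSON array of strings.\n\n{policy_text}"
    pieces = json.loads(llm(prompt))
    if not isinstance(pieces, list) or not all(isinstance(p, str) for p in pieces):
        raise ValueError("expected a JSON array of strings")

    cleaned = [p.strip() for p in pieces if p.strip()]
    return [
        {
            # Mirrors the ID pattern shown on this page, e.g. pto-us-001.
            "chunk_id": f"{subsection_id}-{country.lower()}-{number:03d}",
            "subsection_id": subsection_id,
            "country": country,
            "chunk_index": number,  # assumed 1-based
            "content": piece,
        }
        for number, piece in enumerate(cleaned, start=1)
    ]


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model; returns a canned reply."""
    return json.dumps(["Eligibility criteria text.", "Accrual rates text."])


records = build_chunk_records("(policy text goes here)", "pto", "US", stub_llm)
```

Keeping `chunk_index` on every record is what later allows the chunks of a subsection to be put back in order.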
HR Agent Tools
search_policies
Hybrid search (BM25 + vector + semantic reranking) against the HR knowledge base with country-aware filtering.
{
  "query": "PTO accrual rates",
  "filters": { "country": "US" }
}
Returns: Scored chunks with facets (topic, country, policy type)
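As a rough illustration of what a hybrid query could look like on the wire, the sketch below builds a request body in the shape used by the Azure AI Search "Search Documents" REST API (keyword text, a vector query, semantic reranking, and a filter in one call). The vector field name `content_vector`, the semantic configuration name `hr-semantic-config`, and the values of `k` and `top` are assumptions; the page does not state them.

```python
def build_search_request(
    query: str,
    query_vector: list[float],
    odata_filter: str = "(country eq 'US' or country eq 'Global')",
    top: int = 5,
) -> dict:
    """Build a hybrid (BM25 + vector + semantic) search request body."""
    return {
        "search": query,  # keyword (BM25) leg
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,
                "fields": "content_vector",  # hypothetical field name
                "k": 50,  # assumed number of nearest neighbours
            }
        ],
        "queryType": "semantic",  # enables semantic reranking
        "semanticConfiguration": "hr-semantic-config",  # hypothetical name
        "filter": odata_filter,
        "facets": ["topic", "country", "policy_type"],
        "top": top,
    }


# A zero vector stands in for a real text-embedding-3-small embedding.
body = build_search_request("PTO accrual rates", [0.0] * 1536)
```

The body would be POSTed to the index's search endpoint; authentication and the API version are left out of this sketch.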
get_full_section (Key Tool)
Parent-document retrieval: fetches ALL chunks for a subsection and joins them in order. The signature tool for document RAG.
{
  "subsection_id": "pto",
  "country": "US"
}
Returns: Complete policy text reconstructed from ordered chunks
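The reconstruction step can be sketched with an in-memory list standing in for the search index. The record field names and the sample chunk text are illustrative assumptions; in the real system the chunks would come from a filtered query against the index.

```python
def get_full_section(chunks: list[dict], subsection_id: str, country: str) -> str:
    """Collect every chunk of one subsection and join them in document order."""
    matching = [
        c for c in chunks
        if c["subsection_id"] == subsection_id and c["country"] == country
    ]
    # chunk_index restores the original order, whatever order the index returns.
    matching.sort(key=lambda c: c["chunk_index"])
    return "\n\n".join(c["content"] for c in matching)


# Invented sample data, deliberately out of order and mixed across countries.
sample_chunks = [
    {"subsection_id": "pto", "country": "US", "chunk_index": 2, "content": "Accrual."},
    {"subsection_id": "pto", "country": "UK", "chunk_index": 1, "content": "UK rules."},
    {"subsection_id": "pto", "country": "US", "chunk_index": 1, "content": "Eligibility."},
]

full_text = get_full_section(sample_chunks, "pto", "US")
```

Sorting by `chunk_index` rather than by relevance score is the important design choice: search results arrive ranked by score, which is rarely the order in which the policy was written.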
compare_policies
Side-by-side comparison of 2-4 policies. Compare same policy across countries or different policies in one country.
{
  "comparisons": [
    {"subsection_id": "pto", "country": "US"},
    {"subsection_id": "pto", "country": "UK"}
  ]
}
Returns: Full text of each policy with metadata for comparison
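A minimal sketch of the comparison tool, assuming it validates the 2-4 limit and then fetches each policy through some section-retrieval function. The `fetch_section` callable and the shape of the returned entries are hypothetical; only the input shape comes from the example above.

```python
from typing import Callable


def compare_policies(
    comparisons: list[dict],
    fetch_section: Callable[[str, str], str],
) -> list[dict]:
    """Return the full text of each requested policy with identifying metadata."""
    if not 2 <= len(comparisons) <= 4:
        raise ValueError("compare_policies expects between 2 and 4 policies")
    return [
        {
            "subsection_id": c["subsection_id"],
            "country": c["country"],
            "text": fetch_section(c["subsection_id"], c["country"]),
        }
        for c in comparisons
    ]


# Invented stand-in for parent-document retrieval.
fake_store = {("pto", "US"): "US PTO policy text.", ("pto", "UK"): "UK PTO policy text."}


def fetch_from_store(subsection_id: str, country: str) -> str:
    return fake_store.get((subsection_id, country), "")


result = compare_policies(
    [{"subsection_id": "pto", "country": "US"}, {"subsection_id": "pto", "country": "UK"}],
    fetch_from_store,
)
```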
check_eligibility
Find policies that apply to a specific employee type and country. Filters by applicability and location.
{
  "employee_type": "Part-Time",
  "country": "UK",
  "topic": "Benefits"
}
Returns: Applicable policies filtered by employee type and country
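For context, tools like this one are exposed to Claude as a name, a description, and a JSON Schema under `input_schema`. The definition below is a sketch of how `check_eligibility` might be declared. The choice of required fields, the `enum` for country, and the helper `missing_fields` are assumptions for illustration, not the demo's actual definition.

```python
CHECK_ELIGIBILITY_TOOL = {
    "name": "check_eligibility",
    "description": (
        "Find policies that apply to a specific employee type and country."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "employee_type": {"type": "string", "description": "e.g. Part-Time"},
            # Assumed: only the three country variants named on this page.
            "country": {"type": "string", "enum": ["US", "UK", "DE"]},
            "topic": {"type": "string", "description": "e.g. Benefits"},
        },
        # Assumed: topic is optional, the other two are required.
        "required": ["employee_type", "country"],
    },
}


def missing_fields(tool: dict, tool_input: dict) -> list[str]:
    """List required fields absent from a tool call, in schema order."""
    return [k for k in tool["input_schema"]["required"] if k not in tool_input]


ok = missing_fields(
    CHECK_ELIGIBILITY_TOOL,
    {"employee_type": "Part-Time", "country": "UK", "topic": "Benefits"},
)
incomplete = missing_fields(CHECK_ELIGIBILITY_TOOL, {"topic": "Benefits"})
```

A check like `missing_fields` lets the server return a clear error to the agent instead of running a search with an incomplete filter.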
get_related_policies
Discover all subsections within a section. Helps the agent find related content after an initial search.
{
  "section_id": "time-off-leave",
  "country": "US"
}
Returns: List of subsection titles and IDs in the section
Parent-document retrieval is the key pattern that distinguishes document RAG from structured data RAG. When a chunk match provides partial context, get_full_section retrieves the complete policy — ensuring the agent never answers from a fragment.
Multi-Country Policy Architecture
Country-Aware Filtering
Many HR policies have country-specific implementations that comply with local employment law. The same policy (e.g., PTO, parental leave, termination) exists in up to 4 variants:
- US: FMLA, at-will employment, 401(k)
- UK: Statutory leave, ACAS, workplace pension
- DE: Betriebsrat, Elternzeit, Kündigungsschutz
- Global: Code of conduct, IT security, safety
How Filtering Works
When a user selects a country (default: US), every search automatically includes both the country-specific AND global policies. The OData filter ensures compliance-relevant local content always appears.
OData Filter Expression
(country eq 'US' or country eq 'Global')
Chunk IDs encode country: pto-us-001, pto-uk-001, pto-de-001
Agent behavior: When a user mentions a country, the agent passes it as a filter. When no country is specified, the system defaults to US + Global.
Cross-country comparison: The compare_policies tool can fetch the same subsection across multiple countries for side-by-side analysis.
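The filtering rule described above can be sketched as a small helper that produces the OData expression. The function name is hypothetical; the default of US and the inclusion of Global follow the behavior described in this section. Single quotes are doubled, which is how OData escapes them inside string literals.

```python
from typing import Optional


def country_filter(country: Optional[str] = None) -> str:
    """Build an OData filter matching one country plus Global policies."""
    selected = country or "US"  # default when the user names no country
    if selected == "Global":
        return "country eq 'Global'"
    escaped = selected.replace("'", "''")  # OData string-literal escaping
    return f"(country eq '{escaped}' or country eq 'Global')"


print(country_filter())
print(country_filter("DE"))
```

Building the expression in one place keeps every tool consistent, so a search for UK content cannot accidentally omit the global policies.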
Technology Stack
Next.js 14
App Framework
App Router with server components for architecture pages and SSE streaming for real-time agent events.
Claude Sonnet 4
Agent LLM
Powers the HR agent loop with tool_use for autonomous policy search, retrieval, comparison, and eligibility checking.
Claude Haiku
Chunking + Evaluation
Semantic chunking at index time. RAGAS evaluation judge at test time. Fast and cost-effective for both.
Azure AI Search
Hybrid Retrieval
BM25 keyword + HNSW vector + semantic reranking. Separate hr-knowledge-base index with 624 chunks.
Azure OpenAI
Embeddings
text-embedding-3-small (1536 dims) for both query and document embedding. Shared across both demos.
Server-Sent Events
Real-time Streaming
Agent reasoning, tool calls, results, and answers stream to the UI in real-time via SSE.
RAGAS v0.4
Quality Evaluation
4 standardized metrics (faithfulness, relevancy, context precision, context recall) for automated quality assessment.
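The SSE streaming listed in this stack relies on a simple text framing: each event is a block of `field: value` lines ending with a blank line. The sketch below shows that framing only. The event names and payloads are hypothetical, and the demo itself serves events from Next.js; Python is used here just to illustrate the wire format.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event with a named type and a JSON payload."""
    # json.dumps emits a single line by default, so one data: line is enough.
    payload = json.dumps(data)
    return f"event: {event}\ndata: {payload}\n\n"


# Hypothetical events an agent run might emit, in order.
stream = [
    format_sse("tool_call", {"name": "search_policies"}),
    format_sse("answer", {"text": "..."}),
]
frame = stream[0]
```

In the browser, an `EventSource` listener registered for a given event name receives the `data` string and can parse it as JSON.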
Index Schema Comparison
| Aspect | grainger-products | hr-knowledge-base |
|---|---|---|
| Key Field | product_id | chunk_id |
| Content | description + attributes_text | content (single field) |
| Title | name | subsection_title |
| Hierarchy | category > subcategory | section > subsection > chunk_index |
| Geographic | (none) | country (US, UK, DE, Global) |
| Facets | category, subcategory, brand | topic, policy_type, applies_to, country |
| Pricing | price (filterable, sortable) | (not applicable) |
| Reconstruction | N/A (each record is complete) | chunk_index orders chunks for reassembly |
| Vector Dims | 1536 (cosine) | 1536 (cosine) |
| Semantic Config | name + description + attributes | subsection_title + content + topic |
Same infrastructure, different schema: Both indexes live in the same Azure AI Search service, use the same embedding model, and the same HNSW vector search algorithm. The schema differences reflect the fundamental distinction between structured product records and chunked policy documents.
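To make the hr-knowledge-base column of the table more tangible, here is a sketch of an index definition in the shape accepted by the Azure AI Search REST API. Only the facets, the 1536-dimension cosine HNSW vector, and the semantic fields come from the table. The field names `section_id`, `subsection_id`, and `content_vector`, the profile and configuration names, the field types, and the exact attribute flags are assumptions.

```python
def field(name: str, edm_type: str, **flags) -> dict:
    """Small helper to keep field definitions readable."""
    return {"name": name, "type": edm_type, **flags}


HR_INDEX = {
    "name": "hr-knowledge-base",
    "fields": [
        field("chunk_id", "Edm.String", key=True, filterable=True),
        field("content", "Edm.String", searchable=True),
        field("subsection_title", "Edm.String", searchable=True),
        field("section_id", "Edm.String", filterable=True),      # assumed name
        field("subsection_id", "Edm.String", filterable=True),   # assumed name
        field("chunk_index", "Edm.Int32", filterable=True, sortable=True),
        field("country", "Edm.String", filterable=True, facetable=True),
        field("topic", "Edm.String", searchable=True, filterable=True, facetable=True),
        field("policy_type", "Edm.String", filterable=True, facetable=True),
        field("applies_to", "Edm.String", filterable=True, facetable=True),
        field(
            "content_vector",  # assumed name
            "Collection(Edm.Single)",
            searchable=True,
            dimensions=1536,
            vectorSearchProfile="hr-vector-profile",
        ),
    ],
    "vectorSearch": {
        "algorithms": [
            {"name": "hr-hnsw", "kind": "hnsw", "hnswParameters": {"metric": "cosine"}}
        ],
        "profiles": [{"name": "hr-vector-profile", "algorithm": "hr-hnsw"}],
    },
    "semantic": {
        "configurations": [
            {
                "name": "hr-semantic-config",
                "prioritizedFields": {
                    "titleField": {"fieldName": "subsection_title"},
                    "prioritizedContentFields": [{"fieldName": "content"}],
                    "prioritizedKeywordsFields": [{"fieldName": "topic"}],
                },
            }
        ]
    },
}

facet_fields = sorted(f["name"] for f in HR_INDEX["fields"] if f.get("facetable"))
```

The product index would follow the same outline with `product_id` as the key and no `chunk_index`, since each record is already complete.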