Document RAG Architecture

Same RAG 3.0 agentic architecture, applied to long-form HR policy documents with semantic chunking, parent-document retrieval, and multi-country filtering.

Two Data Models, One Architecture

Product Demo

⚡ Structured Data RAG

Each product is a self-contained record (~100-200 tokens) with structured fields. No chunking required.

→ Fixed-size records with typed attributes (price, brand, SKU)
→ One record = one search result = one complete entity
→ No document reconstruction needed
→ Facets: category, brand, price range
→ Single catalog, no geographic variants

HR Demo

📋 Document RAG

Long-form policy documents (400-800 words each) that must be chunked for retrieval and reassembled for complete answers.

✓ Variable-length documents requiring semantic chunking
✓ Chunks reference parent documents for full context
✓ Parent-document retrieval reconstructs complete policies
✓ Facets: topic, policy type, applicability, country
✓ Multi-country variants of the same policy (US, UK, DE)

Key insight: The same RAG 3.0 agent loop, Claude tool_use protocol, and Azure AI Search hybrid retrieval power both demos. Only the data model, index schema, and agent tools differ, which is evidence that the architecture generalizes across content types.

Semantic Chunking Pipeline

📝 Handbook Generation (Claude Haiku)
→ ✂️ Semantic Chunking (Claude Haiku)
→ 🧮 Vector Embedding (text-embedding-3-small)
→ 📤 Index Upload (Azure AI Search)
→ 🔍 Hybrid Retrieval (BM25 + Vector + Semantic)

Why Semantic Chunking?

Fixed-size chunking (e.g., 300 tokens) splits text at arbitrary boundaries, often mid-sentence or mid-concept. Semantic chunking uses Claude Haiku to identify natural topic boundaries.

Each chunk covers a single coherent concept (e.g., “eligibility criteria” vs. “accrual rates” vs. “carryover rules”), producing higher-quality embeddings and more precise retrieval.

Variable chunk sizes (100-500 tokens) are fine — a short eligibility rule and a long procedure both work as self-contained retrieval units.

Chunking Statistics

Total Chunks: 624

Policy Subsections

Policy Sections

Countries (US, UK, DE + Global): 3+1

Chunking Prompt

"Split this policy text into logical
chunks where each chunk covers a
single coherent topic or concept."
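
As a rough illustration, the chunking step could wrap this prompt as sketched below. The `complete` callable is a hypothetical stand-in for a Claude Haiku call, and the request for a JSON array of strings is an assumption; the source does not say what output format the model is asked for.

```python
import json
from typing import Callable, List

CHUNKING_PROMPT = (
    "Split this policy text into logical chunks where each chunk covers "
    "a single coherent topic or concept."
)


def semantic_chunk(policy_text: str, complete: Callable[[str], str]) -> List[str]:
    """Ask an LLM to split policy_text and return the non-empty chunks.

    `complete` is a hypothetical function that sends a prompt to the model
    and returns its text reply. The JSON-array reply format is an
    assumption, not something the source confirms.
    """
    prompt = (
        f"{CHUNKING_PROMPT}\n"
        "Return a JSON array of strings, one per chunk.\n\n"
        f"{policy_text}"
    )
    chunks = json.loads(complete(prompt))
    if not isinstance(chunks, list):
        raise ValueError("expected a JSON array of chunk strings")
    # Trim whitespace and drop empty entries so every chunk is usable as-is.
    return [c.strip() for c in chunks if isinstance(c, str) and c.strip()]
```

Injecting the model call as a function keeps the parsing logic testable without network access.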

HR Agent Tools

search_policies

Hybrid search (BM25 + vector + semantic reranking) against the HR knowledge base with country-aware filtering.

{
  "query": "PTO accrual rates",
  "filters": { "country": "US" }
}

Returns: Scored chunks with facets (topic, country, policy type)
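
One way the hybrid query behind search_policies might be assembled is sketched here as a request body for the Azure AI Search REST "Search Documents" operation. The vector field name `content_vector` and the semantic configuration name `hr-semantic` are hypothetical; the source names neither.

```python
from typing import Any, Dict, List, Optional


def build_search_request(
    query: str,
    query_vector: List[float],
    odata_filter: Optional[str] = None,
    top: int = 5,
) -> Dict[str, Any]:
    """Build a hybrid (keyword + vector + semantic reranking) request body."""
    body: Dict[str, Any] = {
        "search": query,  # BM25 keyword leg
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,
                "fields": "content_vector",  # hypothetical field name
                "k": top,
            }
        ],
        "queryType": "semantic",  # turn on semantic reranking
        "semanticConfiguration": "hr-semantic",  # hypothetical config name
        "facets": ["topic", "country", "policy_type"],
        "top": top,
    }
    if odata_filter:
        body["filter"] = odata_filter  # e.g. the country-aware filter
    return body
```

The query vector is assumed to come from the same text-embedding-3-small model used at index time, so it should have 1536 dimensions.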

get_full_section (Key Tool)

Parent-document retrieval: fetches ALL chunks for a subsection and joins them in order. The signature tool for document RAG.

{
  "subsection_id": "pto",
  "country": "US"
}

Returns: Complete policy text reconstructed from ordered chunks
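
The reassembly step can be sketched in memory as follows. The field names `subsection`, `country`, `chunk_index`, and `content` are taken from the index schema described on this page, but the exact name of the subsection field is an assumption, and a real implementation would query the index rather than a Python list.

```python
from typing import Dict, List


def get_full_section(chunks: List[Dict], subsection_id: str, country: str) -> str:
    """Rebuild a complete policy from its chunks (parent-document retrieval)."""
    matching = [
        c for c in chunks
        if c["subsection"] == subsection_id and c["country"] == country
    ]
    # chunk_index records each chunk's position within the original policy.
    ordered = sorted(matching, key=lambda c: c["chunk_index"])
    return "\n\n".join(c["content"] for c in ordered)
```

Sorting by `chunk_index` matters because search results arrive ranked by relevance, not by document order.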

compare_policies

Side-by-side comparison of 2-4 policies: either the same policy across countries, or different policies within one country.

{
  "comparisons": [
    {"subsection_id": "pto", "country": "US"},
    {"subsection_id": "pto", "country": "UK"}
  ]
}

Returns: Full text of each policy with metadata for comparison
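
A minimal sketch of the comparison handler, assuming it delegates to a `fetch` function that returns the full text for a subsection and country (a hypothetical stand-in for get_full_section):

```python
from typing import Callable, Dict, List


def compare_policies(
    comparisons: List[Dict[str, str]],
    fetch: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Return the full text of each requested policy, in request order."""
    # The tool description limits a comparison to 2-4 policies.
    if not 2 <= len(comparisons) <= 4:
        raise ValueError("compare_policies expects between 2 and 4 policies")
    return [
        {
            "subsection_id": c["subsection_id"],
            "country": c["country"],
            "text": fetch(c["subsection_id"], c["country"]),
        }
        for c in comparisons
    ]
```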

check_eligibility

Find policies that apply to a specific employee type and country. Filters by applicability and location.

{
  "employee_type": "Part-Time",
  "country": "UK",
  "topic": "Benefits"
}

Returns: Applicable policies filtered by employee type and country
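
The eligibility filter could be expressed in OData roughly as below. This sketch assumes `applies_to` is stored as a collection of strings (hence the `any` lambda) and that Global policies are included alongside the requested country; the source confirms neither detail for this tool.

```python
from typing import Optional


def odata_str(value: str) -> str:
    """Quote an OData string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def eligibility_filter(
    employee_type: str, country: str, topic: Optional[str] = None
) -> str:
    """Build a filter for policies that apply to an employee type and country."""
    clauses = [
        # Assumption: applies_to is a Collection(Edm.String) field.
        f"applies_to/any(t: t eq {odata_str(employee_type)})",
        # Assumption: Global policies apply in every country.
        f"(country eq {odata_str(country)} or country eq 'Global')",
    ]
    if topic:
        clauses.append(f"topic eq {odata_str(topic)}")
    return " and ".join(clauses)
```

Escaping quotes in `odata_str` keeps a value such as an employee type containing an apostrophe from breaking the filter expression.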

get_related_policies

Discover all subsections within a section. Helps the agent find related content after an initial search.

{
  "section_id": "time-off-leave",
  "country": "US"
}

Returns: List of subsection titles and IDs in the section
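
As an in-memory sketch, listing the subsections of a section amounts to de-duplicating chunk metadata. The field names `section`, `subsection`, and `subsection_title` follow the index schema on this page, and including Global chunks is an assumption.

```python
from typing import Dict, List


def get_related_policies(
    chunks: List[Dict], section_id: str, country: str
) -> List[Dict[str, str]]:
    """List the distinct subsections in a section, in first-seen order."""
    titles: Dict[str, str] = {}
    for c in chunks:
        if c["section"] == section_id and c["country"] in (country, "Global"):
            # Many chunks share one subsection; keep only the first title seen.
            titles.setdefault(c["subsection"], c["subsection_title"])
    return [{"subsection_id": sid, "title": t} for sid, t in titles.items()]
```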

Parent-document retrieval is the key pattern that distinguishes document RAG from structured data RAG. When a chunk match provides partial context, get_full_section retrieves the complete policy — ensuring the agent never answers from a fragment.
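
Because the tools are exposed through Claude tool_use, each one needs a definition with a `name`, `description`, and JSON Schema `input_schema`. The sketch below shows what the definition for get_full_section might look like; the description wording and the `enum` of countries are assumptions, and `missing_arguments` is a hypothetical helper for checking a tool call before it is dispatched.

```python
from typing import Any, Dict, List

GET_FULL_SECTION_TOOL: Dict[str, Any] = {
    "name": "get_full_section",
    "description": (
        "Fetch all chunks for a policy subsection and join them in order "
        "to return the complete policy text."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "subsection_id": {"type": "string"},
            # Assumption: these are the only accepted country values.
            "country": {"type": "string", "enum": ["US", "UK", "DE", "Global"]},
        },
        "required": ["subsection_id", "country"],
    },
}


def missing_arguments(tool: Dict[str, Any], tool_input: Dict[str, Any]) -> List[str]:
    """Return the required argument names absent from a tool call's input."""
    return [k for k in tool["input_schema"]["required"] if k not in tool_input]
```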

Multi-Country Policy Architecture

Country-Aware Filtering

Many HR policies have country-specific implementations that comply with local employment law. The same policy (e.g., PTO, parental leave, termination) exists in up to 4 variants:

🇺🇸 US: FMLA, at-will employment, 401(k)

🇬🇧 UK: Statutory leave, ACAS, workplace pension

🇩🇪 DE: Betriebsrat, Elternzeit, Kündigungsschutz

🌐 Global: Code of conduct, IT security, safety

How Filtering Works

When a user selects a country (default: US), every search automatically includes both the country-specific AND global policies. The OData filter ensures compliance-relevant local content always appears.

OData Filter Expression

(country eq 'US' or country eq 'Global')

Chunk IDs encode country: pto-us-001, pto-uk-001, pto-de-001
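
A small sketch of building and parsing IDs in this shape. It covers only the three forms shown above (a subsection, a two-letter lowercase country code, and a zero-padded index); how Global chunks are encoded is not shown in the source, so the parser does not handle them.

```python
import re
from typing import NamedTuple


class ChunkId(NamedTuple):
    subsection_id: str
    country: str
    chunk_index: int


# Greedy subsection match, so hyphenated subsection IDs still parse.
_CHUNK_ID = re.compile(r"^(?P<sub>.+)-(?P<country>[a-z]{2})-(?P<idx>\d+)$")


def make_chunk_id(subsection_id: str, country: str, chunk_index: int) -> str:
    """Format an ID like 'pto-us-001' from its parts."""
    return f"{subsection_id}-{country.lower()}-{chunk_index:03d}"


def parse_chunk_id(chunk_id: str) -> ChunkId:
    """Split an ID like 'pto-us-001' back into its parts."""
    m = _CHUNK_ID.match(chunk_id)
    if m is None:
        raise ValueError(f"unrecognized chunk id: {chunk_id!r}")
    return ChunkId(m["sub"], m["country"].upper(), int(m["idx"]))
```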

Agent behavior: When a user mentions a country, the agent passes it as a filter. When no country is specified, the system defaults to US + Global.
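
The defaulting rule can be sketched as a small helper that produces the filter expression shown above. The validation against a fixed set of countries is an assumption added for safety; the source only states the US default.

```python
from typing import Optional

SUPPORTED_COUNTRIES = ("US", "UK", "DE")
DEFAULT_COUNTRY = "US"


def country_filter(country: Optional[str] = None) -> str:
    """Return an OData filter for the selected country plus Global policies."""
    selected = (country or DEFAULT_COUNTRY).upper()
    # Rejecting unknown values also prevents arbitrary text in the filter.
    if selected not in SUPPORTED_COUNTRIES:
        raise ValueError(f"unsupported country: {country!r}")
    return f"(country eq '{selected}' or country eq 'Global')"
```

Calling `country_filter()` with no argument yields the US + Global expression, and `country_filter("UK")` swaps in the UK variant.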

Cross-country comparison: The compare_policies tool can fetch the same subsection across multiple countries for side-by-side analysis.

Technology Stack

Next.js 14

App Framework

App Router with server components for architecture pages and SSE streaming for real-time agent events.

Claude Sonnet 4

Agent LLM

Powers the HR agent loop with tool_use for autonomous policy search, retrieval, comparison, and eligibility checking.

Claude Haiku

Chunking + Evaluation

Semantic chunking at index time. RAGAS evaluation judge at test time. Fast and cost-effective for both.

Azure AI Search

Hybrid Retrieval

BM25 keyword + HNSW vector + semantic reranking. Separate hr-knowledge-base index with 624 chunks.

Azure OpenAI

Embeddings

text-embedding-3-small (1536 dims) for both query and document embedding. Shared across both demos.

Server-Sent Events

Real-time Streaming

Agent reasoning, tool calls, results, and answers stream to the UI in real-time via SSE.
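
For reference, a Server-Sent Events frame is plain text: an optional `event:` line, a `data:` line, and a blank line to end the frame. The helper below is a minimal sketch of serializing one agent event; the event names used by the demo are not given in the source, so `tool_call` in the usage note is hypothetical.

```python
import json
from typing import Any, Dict


def format_sse(event: str, data: Dict[str, Any]) -> str:
    """Serialize one event as an SSE frame with a JSON payload."""
    # Compact JSON never contains raw newlines, so one data line suffices.
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"
```

For example, `format_sse("tool_call", {"name": "search_policies"})` produces a frame the browser's `EventSource` can dispatch to a `tool_call` listener.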

RAGAS v0.4

Quality Evaluation

Four standardized metrics (faithfulness, answer relevancy, context precision, context recall) for automated quality assessment.

Index Schema Comparison

Aspect	product-catalog	hr-knowledge-base
Key Field	product_id	chunk_id
Content	description + attributes_text	content (single field)
Title	name	subsection_title
Hierarchy	category > subcategory	section > subsection > chunk_index
Geographic	(none)	country (US, UK, DE, Global)
Facets	category, subcategory, brand	topic, policy_type, applies_to, country
Pricing	price (filterable, sortable)	(not applicable)
Reconstruction	N/A (each record is complete)	chunk_index orders chunks for reassembly
Vector Dims	1536 (cosine)	1536 (cosine)
Semantic Config	name + description + attributes	subsection_title + content + topic

Same infrastructure, different schema: Both indexes live in the same Azure AI Search service and use the same embedding model and the same HNSW vector search algorithm. The schema differences reflect the fundamental distinction between structured product records and chunked policy documents.
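
To make the hr-knowledge-base column concrete, here is a sketch of its field list in the shape used by Azure AI Search index definitions. Field names come from the table above where available; the `section` and `subsection` field names, the vector field name `content_vector`, the profile name `hnsw-cosine`, the collection type for `applies_to`, and the individual attribute flags are all assumptions.

```python
from typing import Any, Dict, List


def hr_index_fields(vector_dims: int = 1536) -> List[Dict[str, Any]]:
    """Return a field list for a chunked policy index (illustrative only)."""

    def text(name: str, **flags: bool) -> Dict[str, Any]:
        return {"name": name, "type": "Edm.String", **flags}

    return [
        text("chunk_id", key=True, filterable=True),
        text("content", searchable=True),
        text("subsection_title", searchable=True),
        text("section", filterable=True),
        text("subsection", filterable=True),
        # chunk_index orders chunks when a policy is reassembled.
        {"name": "chunk_index", "type": "Edm.Int32",
         "filterable": True, "sortable": True},
        text("country", filterable=True, facetable=True),
        text("topic", searchable=True, filterable=True, facetable=True),
        text("policy_type", filterable=True, facetable=True),
        # Assumption: a policy can apply to several employee types.
        {"name": "applies_to", "type": "Collection(Edm.String)",
         "filterable": True, "facetable": True},
        {"name": "content_vector", "type": "Collection(Edm.Single)",
         "searchable": True, "dimensions": vector_dims,
         "vectorSearchProfile": "hnsw-cosine"},  # hypothetical profile name
    ]
```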
