Skip to content

Knowledge Base Introduction

The Datus Agent Knowledge Base is a multi-modal intelligence system that transforms scattered data assets into a unified, searchable repository. Think of it as "Google for your data" with deep understanding of SQL, business metrics, and data relationships.

Core Purpose

  • Data Discovery: Find relevant tables, columns, and patterns
  • Query Intelligence: Understand business intent and generate SQL
  • Knowledge Preservation: Capture and organize SQL expertise
  • Semantic Search: Find information by meaning, not keywords

Core Components

1. Schema Metadata

Purpose: Understand database structure and provide intelligent table recommendations.

  • Stores: Table definitions, column info, sample data, statistics
  • Capabilities: Find tables by business meaning, get table structures, semantic search
  • Use: Automatic table selection, data discovery, schema understanding

2. Semantic Models

Purpose: Enrich database schemas with semantic information for better SQL generation.

  • Stores: Table structures, dimensions, measures, entity relationships
  • Capabilities: Schema linking, column usage patterns, foreign key discovery
  • Use: Accurate ad-hoc SQL generation, smart filtering, proper JOIN construction

3. Business Metrics

Purpose: Manage and query standardized business KPIs.

  • Stores: Metric definitions, subject tree categorization
  • Capabilities: Direct metric queries via MetricFlow, metrics-first strategy
  • Use: Consistent reporting, eliminate duplicate SQL, standardized definitions

4. Reference SQL

Purpose: Capture, analyze, and make searchable SQL expertise.

  • Stores: Historical queries, LLM summaries, query patterns, best practices
  • Capabilities: Find queries by intent, get similar queries, learn patterns
  • Use: Knowledge sharing, optimization through examples, team onboarding

5. Reference Template

Purpose: Manage parameterized SQL templates for stable, repeatable query generation.

  • Stores: Jinja2 templates, parameter definitions, LLM summaries, subject tree classification
  • Capabilities: Search templates by intent, retrieve with parameter metadata, server-side rendering
  • Use: Stable SQL output for production scenarios, parameterized report queries, template-based SQL generation

6. External Knowledge

Purpose: Process and index domain-specific business knowledge for intelligent search.

  • Stores: Business terminology, rules, concepts, hierarchical categorization
  • Capabilities: Semantic search for business terms, context enrichment, term resolution
  • Use: Agent context enhancement, terminology standardization, knowledge onboarding

7. Platform Documentation

Purpose: Provide authoritative platform documentation for SQL generation and validation.

  • Stores: Official documentation chunks per platform and version
  • Capabilities: Navigation browsing, document retrieval, semantic search
  • Use: Verify platform-specific syntax and features before writing SQL

Storage Backends

All knowledge base components rely on a dual-track storage architecture:

  • Vector Database: Stores embedding vectors, powering semantic search (schema linking, document search, etc.)
  • Relational Database (RDB): Stores structured metadata (task, feedback, success story, etc.)

Datus Agent supports pluggable storage backends via a Registry + entry-point mechanism — switch backends without modifying business code.

Default: LanceDB + SQLite

  • Zero configuration, works out of the box
  • Data stored under data/datus_db_<namespace>/
  • Ideal for development and single-machine deployment

PostgreSQL (pgvector)

  • Production-grade backend, automatically registered after installing datus-storage-postgresql
  • Vector: pgvector extension provides vector search
  • RDB: Native PostgreSQL relational storage
  • Database isolation via PostgreSQL schemas

Database Isolation

  • Each namespace is stored independently with no cross-contamination
  • LanceDB: One directory per namespace
  • PostgreSQL: One schema per namespace

For detailed configuration, see Storage Configuration.

Key Features

  • Unified Search: Single interface across all knowledge domains
  • Semantic Search: Find by meaning using vector embeddings
  • Intelligent Classification: Automatic categorization and organization
  • Scalable: Lazy loading, batch processing, incremental updates