External Knowledge Intelligence¶
Overview¶
Bootstrap-KB External Knowledge is a component that processes, stores, and indexes domain-specific business knowledge to create an intelligent searchable knowledge base. It transforms business rules and concepts into a structured repository with semantic search capabilities.
Core Value¶
What Problem Does It Solve?¶
- Business Knowledge Silos: Domain knowledge scattered across teams without centralized access
- Terminology Ambiguity: Different interpretations of business terms across the organization
- Context Gap: SQL agents lacking understanding of business-specific concepts
- Knowledge Onboarding: New team members struggling to understand domain-specific terminology
What Value Does It Provide?¶
- Unified Knowledge Base: Centralized repository for business terminology and rules
- Semantic Search: Find relevant knowledge using natural language queries
- Agent Context Enhancement: Enriches SQL generation with domain understanding
- Knowledge Preservation: Captures business expertise in a structured, searchable format
Usage¶
Basic Command¶
# From CSV (direct import)
datus bootstrap-kb \
--namespace <your_namespace> \
--components ext_knowledge \
--ext_knowledge /path/to/knowledge.csv \
--kb_update_strategy overwrite
# From success story (AI generation)
datus bootstrap-kb \
--namespace <your_namespace> \
--components ext_knowledge \
--success_story /path/to/success_story.csv \
--kb_update_strategy overwrite
Key Parameters¶
| Parameter | Required | Description | Example |
|---|---|---|---|
--namespace |
✅ | Database namespace | analytics_db |
--components |
✅ | Components to initialize | ext_knowledge |
--ext_knowledge |
⚠️ | Path to knowledge CSV file (required if no --success_story) |
/data/knowledge.csv |
--success_story |
⚠️ | Path to success story CSV file (required if no --ext_knowledge) |
/data/success_story.csv |
--kb_update_strategy |
✅ | Update strategy | overwrite/incremental |
--subject_tree |
❌ | Predefined subject tree categories | Finance/Revenue,User/Engagement |
--pool_size |
❌ | Concurrent processing threads, default is 1 | 8 |
Data Source Formats¶
Direct Import¶
Import pre-defined knowledge entries directly from a CSV file.
CSV Format¶
| Column | Required | Description | Example |
|---|---|---|---|
subject_path |
Yes | Hierarchical category path | Finance/Revenue/Metrics |
name |
Yes | Knowledge entry name | GMV Definition |
search_text |
Yes | Searchable business term | GMV |
explanation |
Yes | Detailed description | Gross Merchandise Volume... |
Example CSV¶
subject_path,name,search_text,explanation
Finance/Revenue/Metrics,GMV Definition,GMV,"Gross Merchandise Volume (GMV) represents the total value of merchandise sold through the platform, including both paid and unpaid orders."
User/Engagement/DAU,DAU Definition,DAU,"Daily Active Users (DAU) counts unique users who performed at least one activity within a calendar day."
User/Engagement/Retention,Retention Rate,retention rate,"The percentage of users who return to the platform after their first visit, typically measured at Day 1, Day 7, and Day 30 intervals."
AI Generation¶
Generate knowledge automatically from question-SQL pairs using AI agent.
CSV Format¶
| Column | Required | Description | Example |
|---|---|---|---|
question |
Yes | Business question or query intent | What is the total GMV for last month? |
sql |
Yes | SQL query that answers the question | SELECT SUM(amount) FROM orders... |
subject_path |
No | Hierarchical category path (optional) | Finance/Revenue/Metrics |
Example CSV¶
question,sql,subject_path
"What is the total GMV for last month?","SELECT SUM(amount) as gmv FROM orders WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH)",Finance/Revenue/Metrics
"How many daily active users do we have?","SELECT COUNT(DISTINCT user_id) as dau FROM user_activity WHERE activity_date = CURDATE()",User/Engagement/DAU
"What is our 7-day retention rate?","SELECT COUNT(DISTINCT d7.user_id) / COUNT(DISTINCT d0.user_id) as retention FROM users d0 LEFT JOIN users d7 ON d0.user_id = d7.user_id",User/Engagement/Retention
How AI Generation Works¶
The success story mode uses GenExtKnowledgeAgenticNode to:
- Analyze Question-SQL Pairs: Understands the business intent behind each query
- Extract Business Concepts: Identifies key terminology, rules, and patterns
- Generate Knowledge Entries: Creates structured knowledge with search_text and explanation
- Classify Categories: Assigns appropriate subject paths for organization
Update Strategies¶
1. Overwrite Mode¶
Clears existing knowledge and loads fresh data:
datus bootstrap-kb \
--namespace analytics_db \
--components ext_knowledge \
--ext_knowledge /path/to/knowledge.csv \
--kb_update_strategy overwrite
2. Incremental Mode¶
Adds new knowledge entries while preserving existing ones (duplicates are skipped):
datus bootstrap-kb \
--namespace analytics_db \
--components ext_knowledge \
--success_story /path/to/success_story.csv \
--kb_update_strategy incremental
Subject Tree Categorization¶
Subject tree provides a hierarchical taxonomy for organizing knowledge entries.
1. Predefined Mode¶
datus bootstrap-kb \
--namespace analytics_db \
--components ext_knowledge \
--success_story /path/to/success_story.csv \
--kb_update_strategy overwrite \
--subject_tree "Finance/Revenue/Metrics,User/Engagement/DAU"
2. Learning Mode¶
When no subject_tree is provided, the system:
- Reuses existing categories from the Knowledge Base
- Creates new categories as needed based on content
- Builds taxonomy organically over time
Integration with SQL Agent¶
External knowledge integrates with the SQL generation agent through context search tools:
- Automatic Context Retrieval: When generating SQL, the agent queries relevant business knowledge
- Term Resolution: Ambiguous business terms are resolved using stored definitions
- Rule Application: Business rules stored in knowledge base guide SQL logic
Example Workflow¶
- User asks: "Calculate the GMV for last month"
- Agent searches knowledge for "GMV"
- Finds definition: "GMV = total value of merchandise including paid and unpaid orders"
- Generates SQL with correct business logic
Summary¶
The Bootstrap-KB External Knowledge component transforms scattered business knowledge into an intelligent, searchable knowledge base.
Key Features:
- Dual Import Modes: Direct CSV import or AI-driven generation from success stories
- Unified Repository: Centralized storage for business terminology and rules
- Semantic Search: Find knowledge using natural language queries
- Hierarchical Organization: Navigate knowledge through subject path taxonomy
- Flexible Classification: Support both predefined and learning modes
- Agent Integration: Enhances SQL generation with domain context
- Duplicate Detection: Automatic deduplication in incremental mode
By implementing external knowledge, teams can ensure consistent understanding of business concepts and enable intelligent SQL generation with domain awareness.