SQL Summary Generation Guide¶
Overview¶
The SQL Summary feature helps you analyze, classify, and catalog SQL queries for knowledge reuse. It automatically generates structured YAML summaries that are stored in a searchable Knowledge Base, making it easy to find and reuse similar queries in the future.
What is a SQL Summary?¶
A SQL summary is a structured YAML document that captures: - Query Text: The complete SQL query - Business Context: Domain, categories, and tags - Semantic Summary: Detailed explanation for vector search - Metadata: Name, comment, file path
Use Cases: - Build a searchable SQL query library - Share and reuse proven queries across teams - Document complex query patterns - Enable semantic search: "Find queries related to revenue analysis"
Quick Start¶
Launch the SQL summary generation subagent:
/gen_sql_summary Analyze this SQL: SELECT SUM(revenue) FROM sales GROUP BY region.(You can also add some description on this SQL)
How It Works¶
Generation Workflow¶
User provides SQL + description → Agent analyzes query → Automatically retrieves context (taxonomy + similar queries) →
Generates unique ID → Creates YAML → Saves file → User confirms → Syncs to Knowledge Base
Step-by-Step Process¶
- Understand SQL: The AI analyzes your query structure and business logic
- Get Context: Automatically retrieves from Knowledge Base:
- Existing subject trees (domain/layer1/layer2 combinations)
- Similar SQL summaries for classification reference
- Generate Unique ID: Uses
generate_sql_summary_id()tool based on SQL + comment - Create Unique Name: Generates a descriptive name (max 20 characters)
- Classify Query: Assigns domain, layer1, layer2, and tags following existing patterns
- Generate YAML: Creates structured summary document
- Save File: Writes YAML to workspace using
write_file()tool - User Confirmation: Shows the generated YAML and prompts for approval
- Sync to Knowledge Base: Stores in LanceDB for semantic search
Interactive Confirmation¶
After generation, you'll see:
==========================================================
Generated Reference SQL YAML
File: /path/to/sql_summary.yml
==========================================================
[YAML content with syntax highlighting]
SYNC TO KNOWLEDGE BASE?
1. Yes - Save to Knowledge Base
2. No - Keep file only
Please enter your choice: [1/2]
Configuration¶
Agent Configuration¶
Most configurations are built-in. In agent.yml, minimal setup is needed:
agentic_nodes:
gen_sql_summary:
model: deepseek # Optional: defaults to configured model
max_turns: 30 # Optional: defaults to 30
Built-in configurations (automatically enabled):
- Tools: Filesystem tools (read_file, write_file, edit_file, list_directory) and generate_sql_summary_id
- Hooks: User confirmation workflow in interactive mode
- System Prompt: Built-in template version 1.0
- Workspace: ~/.datus/data/{namespace}/reference_sql
- Context Retrieval: Automatically queries existing subject trees and similar SQLs from Knowledge Base
Configuration Options¶
| Parameter | Required | Description | Default |
|---|---|---|---|
model |
No | LLM model to use | Uses default configured model |
max_turns |
No | Maximum conversation turns | 30 |
Subject Tree Categorization¶
Subject tree allows organizing SQL summaries by domain and layers. In CLI mode, include it in your question:
Example with subject_tree:
/gen_sql_summary Analyze this SQL: SELECT SUM(revenue) FROM sales, subject_tree: sales/reporting/revenue_analysis
Example without subject_tree:
When subject_tree is provided, the SQL summary will be categorized accordingly (e.g., domain: sales, layer1: reporting, layer2: revenue_analysis). If not provided, the agent operates in learning mode and automatically suggests categories based on: - Existing subject trees in the Knowledge Base - Similar SQL summaries (top 5 most similar queries)
YAML Structure¶
The generated SQL summary follows this structure:
id: "abc123def456..." # Auto-generated MD5 hash
name: "Revenue by Region" # Descriptive name (max 20 chars)
sql: | # Complete SQL query
SELECT
region,
SUM(revenue) as total_revenue
FROM sales
GROUP BY region
comment: "Calculate total revenue grouped by region"
summary: "This query aggregates total revenue from the sales table, grouping results by geographic region. It uses SUM aggregation to calculate revenue totals for each region."
filepath: "/Users/you/.datus/data/reference_sql/revenue_by_region.yml"
domain: "Sales" # Business domain
layer1: "Reporting" # Primary category
layer2: "Revenue Analysis" # Secondary category
tags: "revenue, region, aggregation" # Comma-separated tags
Field Descriptions¶
| Field | Required | Description | Example |
|---|---|---|---|
id |
Yes | Unique hash (auto-generated) | abc123def456... |
name |
Yes | Short descriptive name (max 20 chars) | Revenue by Region |
sql |
Yes | Complete SQL query | SELECT ... |
comment |
Yes | Brief one-line description | User's message or generated summary |
summary |
Yes | Detailed explanation (for search) | Comprehensive query description |
filepath |
Yes | Actual file path | /path/to/file.yml |
domain |
Yes | Business domain | Sales, Marketing, Finance |
layer1 |
Yes | Primary category | Reporting, Analytics, ETL |
layer2 |
Yes | Secondary category | Revenue Analysis, Customer Insights |
tags |
Optional | Comma-separated keywords | revenue, region, aggregation |
Summary¶
The SQL Summary feature provides:
✅ Automated Analysis: AI understands query structure and purpose
✅ Smart Classification: Taxonomy-based categorization with consistency using existing patterns
✅ Automatic Context Retrieval: Queries existing subject trees and similar SQLs from Knowledge Base
✅ Deduplication: Automatic hash-based duplicate detection using generate_sql_summary_id tool
✅ Semantic Search: Vector embeddings enable intelligent query discovery
✅ Interactive Workflow: Review and approve before syncing (in interactive mode)
✅ Subject Tree Support: Organize by domain/layer1/layer2 with predefined or learned categories
✅ Knowledge Reuse: Build searchable SQL query library