Schema Metadata Intelligence¶
Introduction¶
The metadata module lets LLMs quickly retrieve the table definitions and sample data most likely to be relevant to a user's question.
When you run the `bootstrap-kb` command, the SQL statements that create the tables, views, and materialized views in the data source you specify are loaded into a vector database, together with sample data from each of them.
This module contains two types of information: table definition and sample data.
Data Structure of Table Definition¶
| Field Name | Explanation | Supported Database Types |
|---|---|---|
| `catalog_name` | The top-level container in a database system. It typically represents a collection of databases and provides metadata about them, such as available schemas, tables, and security settings. | StarRocks/Snowflake |
| `database_name` | A logical container that stores related data. It usually groups together multiple schemas and provides boundaries for data organization, security, and management. | DuckDB/MySQL/StarRocks/Snowflake |
| `schema_name` | A namespace inside a database. It organizes objects such as tables, views, functions, and procedures into logical groups. Schemas help avoid name conflicts and support role-based access. | DuckDB/Snowflake |
| `table_type` | The type of the object: `table`, `view`, or `mv` (short for materialized view). Every supported database has `table` and `view`; DuckDB and Snowflake also support materialized views. | All supported databases |
| `table_name` | Name of the table/view/materialized view. | All supported databases |
| `definition` | SQL statement that creates the table/view/materialized view. | All supported databases |
| `identifier` | The unique identifier of the current table, composed of `catalog_name`, `database_name`, `schema_name`, and `table_name`. Most scenarios do not require you to use it directly. | All supported databases |
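To illustrate how `identifier` relates to the other fields, here is a minimal sketch. The function name `build_identifier`, the `.` separator, and the example names (`prod`, `sales`, `public`, `orders`) are assumptions made for illustration; the table above states only which fields the identifier is composed of, not its exact format. Empty parts are skipped, reflecting that not every database type has a catalog or schema level.

```bash
# Hypothetical helper: join the non-empty name parts with "." (assumed separator).
build_identifier() {
  local catalog_name="$1" database_name="$2" schema_name="$3" table_name="$4"
  local identifier="" part
  for part in "$catalog_name" "$database_name" "$schema_name" "$table_name"; do
    # Skip levels the database type does not have (passed as empty strings).
    [ -n "$part" ] || continue
    identifier="${identifier:+$identifier.}$part"
  done
  printf '%s\n' "$identifier"
}

# MySQL is listed only under database_name, so catalog and schema are empty.
build_identifier "" "sales" "" "orders"
# Snowflake is listed under all three container levels.
build_identifier "prod" "sales" "public" "orders"
```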
Data Structure of Sample Data¶
| Field Name | Explanation |
|---|---|
| `catalog_name` | Same as above |
| `database_name` | Same as above |
| `schema_name` | Same as above |
| `table_type` | Same as above |
| `table_name` | Same as above |
| `sample_rows` | Sample data for the current table/view/mv, usually the first 5 rows |
| `identifier` | Same as above |
How to Build¶
You can build it using the datus-agent bootstrap-kb command:
```bash
datus-agent bootstrap-kb --namespace <your_namespace> --kb_update_strategy [check/overwrite/incremental]
```
Command Line Parameter Description¶
- `--namespace`: The key corresponding to your database configuration
- `--kb_update_strategy`: Execution strategy. There are three options:
  - `check`: Check the number of data entries currently constructed
  - `overwrite`: Fully overwrite existing data
  - `incremental`: Update existing data that has changed and append data that does not exist yet
Usage Examples¶
Check Current Status¶
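Following the command syntax under How to Build, a status check can be sketched as below. `my_namespace` is a placeholder for a namespace key in your `agent.yml`, and the `NAMESPACE` and `CMD` variables are illustrative only. The script prints the command and invokes it only when `datus-agent` is found on the `PATH`.

```bash
# Placeholder: replace with a namespace key from your agent.yml.
NAMESPACE="my_namespace"

# check reports the number of entries currently built, without making changes.
CMD=(datus-agent bootstrap-kb --namespace "$NAMESPACE" --kb_update_strategy check)
echo "Running: ${CMD[*]}"

# Only invoke the CLI if it is installed.
if command -v datus-agent >/dev/null 2>&1; then
  "${CMD[@]}"
fi
```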
Full Rebuild¶
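Using the same command syntax, a full rebuild would presumably pass `overwrite` as the strategy. As before, `my_namespace` is a placeholder and the variable names are illustrative only. Because `overwrite` replaces existing data, the sketch prints the command before running it.

```bash
# Placeholder: replace with a namespace key from your agent.yml.
NAMESPACE="my_namespace"

# overwrite fully replaces the existing data for this namespace.
CMD=(datus-agent bootstrap-kb --namespace "$NAMESPACE" --kb_update_strategy overwrite)
echo "Running: ${CMD[*]}"

# Only invoke the CLI if it is installed.
if command -v datus-agent >/dev/null 2>&1; then
  "${CMD[@]}"
fi
```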
Incremental Update¶
```bash
datus-agent bootstrap-kb --namespace <your_namespace> --kb_update_strategy incremental
```
Best Practices¶
Database Configuration¶
- Ensure your database namespace is properly configured in `agent.yml`
- Verify database connectivity before running bootstrap commands
- Use appropriate credentials with read access to system tables
Update Strategy Selection¶
- Use `check` to verify current state without making changes
- Use `overwrite` for initial setup or when schema has changed significantly
- Use `incremental` for regular updates to capture new tables and changes
Performance Considerations¶
- Large databases may take time to process during initial bootstrap
- Consider running during off-peak hours for production databases
- Monitor disk space as metadata is stored locally in LanceDB
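One simple way to monitor disk usage is to measure the storage directory before and after a bootstrap run. The sketch below assumes you know where your LanceDB data is stored; the function name `report_storage_size` is hypothetical, and the path you pass in depends on your own configuration.

```bash
# Print the total on-disk size of a storage directory in human-readable units.
report_storage_size() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Directory not found: $dir" >&2
    return 1
  fi
  du -sh "$dir"
}
```

Call it as `report_storage_size /path/to/your/lancedb/storage`, substituting the actual directory used by your setup.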
Troubleshooting¶
Common Issues¶
- Permission errors: Ensure database user has access to system/information schema tables
- Connection timeouts: Check network connectivity and database availability
- Large result sets: Consider filtering to specific schemas if database is very large
Verification¶
After bootstrap completion, verify the metadata was captured correctly:
- Check LanceDB storage directory for populated files
- Test search functionality through the CLI
- Verify sample data represents actual table contents
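The first check can be scripted as a quick file count. This is a minimal sketch: `count_storage_files` is a hypothetical helper, and the storage directory is whatever location your configuration uses. A count of zero after a bootstrap run suggests nothing was written.

```bash
# Count regular files under the storage directory.
count_storage_files() {
  local dir="$1"
  [ -d "$dir" ] || { echo "Directory not found: $dir" >&2; return 1; }
  # tr strips the padding some wc implementations add around the number.
  find "$dir" -type f | wc -l | tr -d ' '
}
```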