Workflow Orchestration¶
Workflow orchestration in Datus Agent is the process of defining, managing, and executing sequences of nodes to accomplish data analysis tasks. This guide explains how workflows are structured, configured, and executed to transform natural language requests into SQL queries and results.
Core Concepts¶
1. Workflow Definition¶
A workflow is a sequence of nodes that:
- Has a clear purpose: Each workflow solves specific types of problems
- Follows a logical order: Nodes execute in a predefined sequence
- Shares data: Information flows between nodes through a shared context
- Can be adaptive: Some workflows can modify themselves during execution
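The properties above can be illustrated with a minimal Python sketch of a sequential runner. This is not Datus Agent's implementation: `run_workflow`, `NODES`, and the stub node functions are hypothetical names chosen for illustration, and only the node names come from this guide.

```python
from typing import Any, Callable, Dict, List

# Hypothetical shared context: a plain dict that every node reads and updates.
Context = Dict[str, Any]


def schema_linking(ctx: Context) -> None:
    # Stub: pretend the relevant table was found for the task.
    ctx["tables"] = ["customers"]


def generate_sql(ctx: Context) -> None:
    # Stub: build a query from what schema_linking stored in the context.
    ctx["sql"] = f"SELECT * FROM {ctx['tables'][0]} WHERE state = 'CA'"


def execute_sql(ctx: Context) -> None:
    # Stub: a real node would run ctx["sql"] against the database.
    ctx["rows"] = [("Alice", "CA"), ("Bob", "CA")]


def output(ctx: Context) -> None:
    ctx["result"] = f"{len(ctx['rows'])} rows returned"


NODES: Dict[str, Callable[[Context], None]] = {
    "schema_linking": schema_linking,
    "generate_sql": generate_sql,
    "execute_sql": execute_sql,
    "output": output,
}


def run_workflow(plan: List[str], task: str) -> Context:
    """Run nodes in their predefined order, passing one shared context along."""
    ctx: Context = {"task": task, "trace": []}
    for name in plan:
        NODES[name](ctx)
        ctx["trace"].append(name)
    return ctx


ctx = run_workflow(
    ["schema_linking", "generate_sql", "execute_sql", "output"],
    "List all customers from California",
)
print(ctx["sql"])
print(ctx["result"])
```

Each node only sees the context, so nodes can be reordered or swapped without changing the runner.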
2. Workflow Configuration¶
Datus provides several built-in workflow templates optimized for different use cases:
workflow:
  reflection:
    - schema_linking
    - generate_sql
    - execute_sql
    - reflect
    - output
  fixed:
    - schema_linking
    - generate_sql
    - execute_sql
    - output
  metric_to_sql:
    - schema_linking
    - search_metrics
    - date_parser
    - generate_sql
    - execute_sql
    - output
Note: These are built-in workflow templates. To customize workflows or create your own, configure them in agent.yml (see the Customizing Workflows section below).
Built-in Workflow Types¶
1. Reflection Workflow¶
Purpose: Intelligent, self-improving SQL generation with adaptive behavior
Node Sequence: schema_linking → generate_sql → execute_sql → reflect → output
Key Features:
- Self-assessment: Reflect node evaluates results and decides next steps
- Adaptive: Can add new nodes dynamically based on execution results
- Robust: Handles complex queries that may require multiple attempts
Best For:
- Complex business queries
- Situations where correct SQL may not be generated on the first try
- Queries requiring domain knowledge
Real-world Example:
User: "Show me quarterly revenue trends by product category,
excluding returns and considering seasonal adjustments"
Process:
1. Schema Linking: Finds orders, products, categories tables
2. Generate SQL: Creates initial quarterly revenue query
3. Execute SQL: Runs the query
4. Reflect: Notices missing seasonal adjustment logic
5. Add Fix Node: Corrects the query with seasonal calculations
6. Output: Final results with proper seasonal adjustments
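The reflect-and-fix cycle in the process above can be sketched as a loop that inserts extra nodes into the remaining plan. This is an illustrative sketch only: the `fix` node name, the string-matching check inside `reflect`, and the function names are assumptions, not Datus Agent internals. The cap of 3 mirrors the documented `--max_iterations` default.

```python
from typing import Any, Dict, List

MAX_ITERATIONS = 3  # mirrors the documented --max_iterations default


def reflect(ctx: Dict[str, Any]) -> List[str]:
    """Return extra node names to run, or an empty list when satisfied.

    Hypothetical check: the task mentions seasonality but the SQL does not.
    """
    if "seasonal" in ctx["task"] and "seasonal" not in ctx["sql"]:
        return ["fix", "execute_sql"]
    return []


def run_node(name: str, ctx: Dict[str, Any]) -> None:
    # Stub node behaviour; only generate_sql and fix touch the SQL here.
    if name == "generate_sql":
        ctx["sql"] = "SELECT quarter, SUM(revenue) FROM orders GROUP BY quarter"
    elif name == "fix":
        ctx["sql"] += " /* + seasonal adjustment */"
    ctx["trace"].append(name)


def run_reflection(task: str) -> Dict[str, Any]:
    ctx: Dict[str, Any] = {"task": task, "sql": "", "trace": []}
    queue = ["schema_linking", "generate_sql", "execute_sql", "reflect", "output"]
    rounds = 0
    i = 0
    while i < len(queue):
        name = queue[i]
        run_node(name, ctx)
        if name == "reflect" and rounds < MAX_ITERATIONS:
            extra = reflect(ctx)
            if extra:
                rounds += 1
                # Adaptive step: splice the fix nodes and another reflect
                # into the plan, right after the current position.
                queue[i + 1:i + 1] = extra + ["reflect"]
        i += 1
    return ctx


ctx = run_reflection("quarterly revenue trends with seasonal adjustments")
print(ctx["trace"])
```

Bounding the number of rounds keeps a query that never satisfies the reflect check from looping forever.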
2. Fixed Workflow¶
Purpose: Deterministic SQL generation with predictable execution path
Node Sequence: schema_linking → generate_sql → execute_sql → output
Key Features:
- Predictable: Always follows the same execution path
- Fast: No reflection overhead
- Simple: Easy to understand and debug
- Reliable: Consistent behavior for well-understood problems
Best For:
- Simple, straightforward queries
- Well-defined data requirements
- Situations where you know exactly what you need
- Performance-critical applications
Real-world Example:
User: "List all customers from California"
Process:
1. Schema Linking: Finds customers table with state column
2. Generate SQL: Creates "SELECT * FROM customers WHERE state = 'CA'"
3. Execute SQL: Returns California customers
4. Output: Displays results
3. Metric-to-SQL Workflow¶
Purpose: Generate SQL from predefined business metrics
Node Sequence: schema_linking → search_metrics → date_parser → generate_sql → execute_sql → output
Key Features:
- Metric-driven: Starts with business metrics rather than raw SQL
- Time-aware: Includes date parsing for temporal queries
- Reusable: Leverages existing metric definitions
- Standardized: Ensures consistent business calculations
Best For:
- Business intelligence and reporting
- Standardized KPI calculations
- Time-series analysis
- Dashboards and regular reports
Real-world Example:
User: "Show monthly active users for the last quarter"
Process:
1. Schema Linking: Finds user_activity table
2. Search Metrics: Finds "monthly_active_users" metric definition
3. Date Parser: Determines "last quarter" date range
4. Generate SQL: Creates query using the metric definition
5. Execute SQL: Runs the metric calculation
6. Output: Displays monthly active users by month
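Step 3 of the process above resolves a relative phrase into concrete dates. A minimal sketch of that arithmetic follows, assuming "last quarter" means the previous full calendar quarter; the actual date_parser node may interpret the phrase differently, and `last_quarter_range` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Tuple


def last_quarter_range(today: date) -> Tuple[date, date]:
    """Return the first and last day of the calendar quarter before `today`."""
    # First month of the current quarter: 1, 4, 7, or 10.
    current_start_month = 3 * ((today.month - 1) // 3) + 1
    current_start = date(today.year, current_start_month, 1)
    # The previous quarter ends the day before the current one starts.
    end = current_start - timedelta(days=1)
    # end.month is 3, 6, 9, or 12, so subtracting 2 stays within the year.
    start = date(end.year, end.month - 2, 1)
    return start, end


start, end = last_quarter_range(date(2024, 5, 15))
print(start.isoformat(), end.isoformat())
```

The resulting pair can then be substituted into the date filter of the generated SQL.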
Workflow Configuration¶
Customizing Workflows¶
You can create custom workflow templates by adding them to your agent.yml configuration:
agent:
  workflow:
    plan: custom_analytics  # Set your custom plan as default
    custom_analytics:
      - schema_linking
      - search_metrics
      - generate_sql
      - execute_sql
      - compare
      - output
    data_exploration:
      - schema_linking
      - doc_search
      - generate_sql
      - execute_sql
      - reflect
      - output
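A typo in a node name is easy to make in a custom plan. The sketch below checks a workflow mapping before use, assuming the YAML has already been loaded into a Python dict. `KNOWN_NODES` contains only the node names that appear in this guide and is not an authoritative list of Datus node types; `validate_workflow` is a hypothetical helper and handles flat step lists only.

```python
from typing import Any, Dict, List

# Node names seen in this guide; NOT an authoritative list of Datus node types.
KNOWN_NODES = {
    "schema_linking", "search_metrics", "date_parser", "doc_search",
    "generate_sql", "execute_sql", "reasoning", "selection",
    "reflect", "compare", "output",
}
BUILTIN_PLANS = {"reflection", "fixed", "metric_to_sql"}


def validate_workflow(workflow: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    errors: List[str] = []
    definitions = {k: v for k, v in workflow.items() if k != "plan"}
    plan = workflow.get("plan")
    if plan is not None and plan not in definitions and plan not in BUILTIN_PLANS:
        errors.append(f"plan '{plan}' is not defined")
    for name, steps in definitions.items():
        for step in steps:
            if step not in KNOWN_NODES:
                errors.append(f"{name}: unknown node '{step}'")
    return errors


workflow = {
    "plan": "custom_analytics",
    "custom_analytics": [
        "schema_linking", "search_metrics", "generate_sql",
        "execute_sql", "compare", "output",
    ],
    "data_exploration": [
        "schema_linking", "doc_search", "generate_sql",
        "execute_sql", "reflect", "output",
    ],
}
print(validate_workflow(workflow))
```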
Advanced Workflow Features¶
Parallel Execution¶
Workflows support parallel node execution for improved performance:
agent:
  workflow:
    plan: bird_para
    bird_para:
      - schema_linking
      - parallel:
          - generate_sql
          - reasoning
      - selection
      - execute_sql
      - output
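The fan-out and selection pattern in this plan can be sketched with Python's standard thread pool. The branch functions, the candidate dictionaries, and the `score` field are hypothetical; how Datus Agent actually ranks candidates in the selection node is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Candidate = Dict[str, Any]


def generate_sql(task: str) -> Candidate:
    # Stub branch: returns a candidate query with a made-up quality score.
    return {"node": "generate_sql", "sql": "SELECT COUNT(*) FROM orders", "score": 0.7}


def reasoning(task: str) -> Candidate:
    # Stub branch: a second strategy producing its own candidate.
    return {"node": "reasoning", "sql": "SELECT COUNT(order_id) FROM orders", "score": 0.9}


def run_parallel(branches: List[Callable[[str], Candidate]], task: str) -> List[Candidate]:
    """Run every branch concurrently and return results in branch order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, task) for branch in branches]
        return [future.result() for future in futures]


def selection(candidates: List[Candidate]) -> Candidate:
    """Pick the candidate with the highest (hypothetical) score."""
    return max(candidates, key=lambda c: c["score"])


candidates = run_parallel([generate_sql, reasoning], "How many orders were placed?")
best = selection(candidates)
print(best["node"], best["sql"])
```

Threads suit this pattern when branches spend most of their time waiting on model or database calls.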
Sub-workflows¶
You can define reusable sub-workflows:
agent:
  workflow:
    plan: main_workflow
    main_workflow:
      - schema_linking
      - parallel:
          - subworkflow1
          - subworkflow2
      - selection
      - execute_sql
      - output
    subworkflow1:
      - search_metrics
      - generate_sql
    subworkflow2:
      - search_metrics
      - reasoning
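One way to picture how sub-workflow names resolve is to expand them recursively into their node lists. The sketch below does this for the configuration above, represented as a Python dict without the `plan` key. `expand` is a hypothetical helper; Datus Agent may resolve sub-workflows differently.

```python
from typing import Any, Dict, List, Tuple

DEFINITIONS: Dict[str, List[Any]] = {
    "main_workflow": [
        "schema_linking",
        {"parallel": ["subworkflow1", "subworkflow2"]},
        "selection",
        "execute_sql",
        "output",
    ],
    "subworkflow1": ["search_metrics", "generate_sql"],
    "subworkflow2": ["search_metrics", "reasoning"],
}


def expand(name: str, definitions: Dict[str, List[Any]], seen: Tuple[str, ...] = ()) -> List[Any]:
    """Replace sub-workflow names with their steps; keep parallel groups nested."""
    if name in seen:
        raise ValueError(f"cyclic sub-workflow reference: {name}")
    path = seen + (name,)
    expanded: List[Any] = []
    for step in definitions[name]:
        if isinstance(step, dict) and "parallel" in step:
            # Each parallel branch becomes its own list of nodes.
            branches = [
                expand(b, definitions, path) if b in definitions else [b]
                for b in step["parallel"]
            ]
            expanded.append({"parallel": branches})
        elif isinstance(step, str) and step in definitions:
            expanded.extend(expand(step, definitions, path))
        else:
            expanded.append(step)
    return expanded


print(expand("main_workflow", DEFINITIONS))
```

The `seen` tuple guards against a sub-workflow that refers back to itself, directly or through another sub-workflow.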
Sub-workflows with Custom Configuration¶
Sub-workflows can reference separate configuration files:
agent:
  workflow:
    plan: multi_agent
    multi_agent:
      - schema_linking
      - parallel:
          - agent1_workflow
          - agent2_workflow
      - selection
      - output
    agent1_workflow:
      steps:
        - search_metrics
        - generate_sql
      config: multi/agent1.yaml
    agent2_workflow:
      steps:
        - reasoning
        - reflect
      config: multi/agent2.yaml
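A sub-workflow can therefore appear in two shapes: a plain list of steps, or a mapping with `steps` and `config`. Code that consumes the configuration may find it convenient to normalize both into one shape. The helper below is a hypothetical sketch of that idea and is not part of Datus Agent.

```python
from typing import Any, Dict, List, Optional, Union

Entry = Union[List[str], Dict[str, Any]]


def normalize(entry: Entry) -> Dict[str, Any]:
    """Return {'steps': [...], 'config': path or None} for either entry shape."""
    if isinstance(entry, list):
        return {"steps": list(entry), "config": None}
    if isinstance(entry, dict) and "steps" in entry:
        config: Optional[str] = entry.get("config")
        return {"steps": list(entry["steps"]), "config": config}
    raise TypeError("sub-workflow must be a list of steps or a mapping with 'steps'")


plain = normalize(["search_metrics", "generate_sql"])
with_config = normalize(
    {"steps": ["reasoning", "reflect"], "config": "multi/agent2.yaml"}
)
print(plain)
print(with_config)
```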
Workflow Parameters¶
Select a workflow and set its options with command-line parameters:
# Use specific workflow
datus run --namespace <your_namespace> --task "your query" --plan reflection
# Use custom workflow
datus run --namespace <your_namespace> --task "your query" --plan custom_analytics
Available Parameters¶
| Parameter | Description | Default | Options |
|---|---|---|---|
| --plan | Workflow type to execute | reflection | reflection, fixed, metric_to_sql, custom |
| --namespace | Database namespace | Required | Any configured namespace |
| --task | Natural language query | Required | Any string |
| --max_iterations | Maximum reflection rounds | 3 | Integer |
| --save_dir | Directory to save workflow state | ./save | Any valid path |
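When invoking the CLI from a script, it can help to assemble the argument list in one place. The sketch below builds a `datus run` command using only the flags listed in the table above. `build_run_command` is a hypothetical helper, and the `sales` namespace is a placeholder; the command is printed, not executed.

```python
import shlex
from typing import List, Optional


def build_run_command(
    namespace: str,
    task: str,
    plan: str = "reflection",
    max_iterations: Optional[int] = None,
    save_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `datus run` from the documented flags."""
    cmd = ["datus", "run", "--namespace", namespace, "--task", task, "--plan", plan]
    if max_iterations is not None:
        cmd += ["--max_iterations", str(max_iterations)]
    if save_dir is not None:
        cmd += ["--save_dir", save_dir]
    return cmd


cmd = build_run_command(
    "sales",  # placeholder namespace
    "List all customers from California",
    plan="fixed",
    save_dir="./debug_session",
)
# shlex.join quotes the task so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids shell quoting problems with tasks that contain spaces or quotes.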
Best Practices¶
Workflow Selection¶
Use Fixed for Simple Queries
- Direct data retrieval
- Well-understood requirements
- Performance-critical scenarios
Use Reflection for Complex Analysis
- Multi-table joins
- Business logic implementation
- Uncertain or exploratory queries
Use Metric-to-SQL for Standardized Reports
- KPI calculations
- Regular business reports
- Time-series analysis
Debugging and Monitoring¶
# Enable debug mode for detailed logging
datus run --namespace <your_namespace> --task "your query" --debug
# Save workflow state for inspection
datus run --namespace <your_namespace> --task "your query" --save_dir ./debug_session
# Resume from saved state
datus resume --save_dir ./debug_session
Conclusion¶
Workflow orchestration determines how Datus Agent turns a natural language request into SQL and results. Use the fixed workflow for simple, predictable queries, the reflection workflow for complex or exploratory ones, and the metric_to_sql workflow for standardized reports. When the built-in templates do not fit, define a custom plan in agent.yml.