Skip to content

Workflow Orchestration

Workflow orchestration in Datus Agent is the process of defining, managing, and executing sequences of nodes to accomplish data analysis tasks. This guide explains how workflows are structured, configured, and executed to transform natural language requests into SQL queries and results.

Core Concepts

1. Workflow Definition

A workflow is a sequence of nodes that:

  • Has a clear purpose: Each workflow solves specific types of problems
  • Follows a logical order: Nodes execute in a predefined sequence
  • Shares data: Information flows between nodes through a shared context
  • Can be adaptive: Some workflows can modify themselves during execution

2. Workflow Configuration

Datus provides several built-in workflow templates optimized for different use cases:

workflow:
  reflection:
    - schema_linking
    - generate_sql
    - execute_sql
    - reflect
    - output

  fixed:
    - schema_linking
    - generate_sql
    - execute_sql
    - output

  metric_to_sql:
    - schema_linking
    - search_metrics
    - date_parser
    - generate_sql
    - execute_sql
    - output

Note: These are built-in workflow templates. To customize workflows or create your own, you need to configure them in agent.yml (see Customizing Workflows section below).

Built-in Workflow Types

1. Reflection Workflow

Purpose: Intelligent, self-improving SQL generation with adaptive behavior

Node Sequence:

Schema Linking → Generate SQL → Execute SQL → Reflect → Output

Key Features:

  • Self-assessment: Reflect node evaluates results and decides next steps
  • Adaptive: Can add new nodes dynamically based on execution results
  • Robust: Handles complex queries that may require multiple attempts

Best For:

  • Complex business queries
  • Situations where perfect SQL isn't generated on first try
  • Queries requiring domain knowledge

Real-world Example:

User: "Show me quarterly revenue trends by product category,
       excluding returns and considering seasonal adjustments"

Process:
1. Schema Linking: Finds orders, products, categories tables
2. Generate SQL: Creates initial quarterly revenue query
3. Execute SQL: Runs the query
4. Reflect: Notices missing seasonal adjustment logic
5. Add Fix Node: Corrects the query with seasonal calculations
6. Output: Final results with proper seasonal adjustments

2. Fixed Workflow

Purpose: Deterministic SQL generation with predictable execution path

Node Sequence:

Schema Linking → Generate SQL → Execute SQL → Output

Key Features:

  • Predictable: Always follows the same execution path
  • Fast: No reflection overhead
  • Simple: Easy to understand and debug
  • Reliable: Consistent behavior for well-understood problems

Best For:

  • Simple, straightforward queries
  • Well-defined data requirements
  • Situations where you know exactly what you need
  • Performance-critical applications

Real-world Example:

User: "List all customers from California"

Process:
1. Schema Linking: Finds customers table with state column
2. Generate SQL: Creates "SELECT * FROM customers WHERE state = 'CA'"
3. Execute SQL: Returns California customers
4. Output: Displays results

3. Metric-to-SQL Workflow

Purpose: Generate SQL from predefined business metrics

Node Sequence:

Schema Linking → Search Metrics → Date Parser → Generate SQL → Execute SQL → Output

Key Features:

  • Metric-driven: Starts with business metrics rather than raw SQL
  • Time-aware: Includes date parsing for temporal queries
  • Reusable: Leverages existing metric definitions
  • Standardized: Ensures consistent business calculations

Best For:

  • Business intelligence and reporting
  • Standardized KPI calculations
  • Time-series analysis
  • Dashboards and regular reports

Real-world Example:

User: "Show monthly active users for the last quarter"

Process:
1. Schema Linking: Finds user_activity table
2. Search Metrics: Finds "monthly_active_users" metric definition
3. Date Parser: Determines "last quarter" date range
4. Generate SQL: Creates query using the metric definition
5. Execute SQL: Runs the metric calculation
6. Output: Displays monthly active users by month

Workflow Configuration

Customizing Workflows

You can create custom workflow templates by adding them to your agent.yml configuration:

agent:
  workflow:
    plan: custom_analytics  # Set your custom plan as default

    custom_analytics:
      - schema_linking
      - search_metrics
      - generate_sql
      - execute_sql
      - compare
      - output

    data_exploration:
      - schema_linking
      - doc_search
      - generate_sql
      - execute_sql
      - reflect
      - output

Advanced Workflow Features

Parallel Execution

Workflows support parallel node execution for improved performance:

agent:
  workflow:
    plan: bird_para

    bird_para:
      - schema_linking
      - parallel:
        - generate_sql
        - reasoning
      - selection
      - execute_sql
      - output

Sub-workflows

You can define reusable sub-workflows:

agent:
  workflow:
    plan: main_workflow

    main_workflow:
      - schema_linking
      - parallel:
        - subworkflow1
        - subworkflow2
      - selection
      - execute_sql
      - output

    subworkflow1:
      - search_metrics
      - generate_sql

    subworkflow2:
      - search_metrics
      - reasoning

Sub-workflows with Custom Configuration

Sub-workflows can reference separate configuration files:

agent:
  workflow:
    plan: multi_agent

    multi_agent:
      - schema_linking
      - parallel:
        - agent1_workflow
        - agent2_workflow
      - selection
      - output

    agent1_workflow:
      steps:
        - search_metrics
        - generate_sql
      config: multi/agent1.yaml

    agent2_workflow:
      steps:
        - reasoning
        - reflect
      config: multi/agent2.yaml

Workflow Parameters

Workflows can be configured with parameters:

# Use specific workflow
datus run --namespace <your_namespace> --task "your query" --plan reflection

# Use custom workflow
datus run --namespace <your_namespace> --task "your query" --plan custom_analytics

Available Parameters

Parameter Description Default Options
--plan Workflow type to execute reflection reflection, fixed, metric_to_sql, custom
--namespace Database namespace Required Any configured namespace
--task Natural language query Required Any string
--max_iterations Maximum reflection rounds 3 Integer
--save_dir Directory to save workflow state ./save Any valid path

Best Practices

Workflow Selection

Use Fixed for Simple Queries

  • Direct data retrieval
  • Well-understood requirements
  • Performance-critical scenarios

Use Reflection for Complex Analysis

  • Multi-table joins
  • Business logic implementation
  • Uncertain or exploratory queries

Use Metric-to-SQL for Standardized Reports

  • KPI calculations
  • Regular business reports
  • Time-series analysis

Debugging and Monitoring

# Enable debug mode for detailed logging
datus run --namespace <your_namespace> --task "your query" --debug

# Save workflow state for inspection
datus run --namespace <your_namespace> --task "your query" --save_dir ./debug_session

# Resume from saved state
datus resume --save_dir ./debug_session

Conclusion

Workflow orchestration is the backbone of Datus Agent's intelligent SQL generation capabilities. By understanding the different workflow types and their appropriate use cases, you can leverage the full power of the system to solve complex data analysis problems efficiently and reliably.