Database Adapters¶

Datus Agent supports connecting to various databases through a plugin-based adapter system. This document explains the available adapters, how to install them, and how to configure your database connections.

Overview¶

Datus uses a modular adapter architecture that allows you to connect to different databases:

Built-in Adapters: SQLite and DuckDB are included with the core package
Plugin Adapters: Additional databases (MySQL, Snowflake, StarRocks, etc.) can be installed as separate packages

This design keeps the core package lightweight while allowing you to add support for specific databases as needed.

Supported Databases¶

Database	Package	Installation	Status
SQLite	Built-in	Included	Ready
DuckDB	Built-in	Included	Ready
MySQL	datus-mysql	`pip install datus-mysql`	Ready
PostgreSQL	datus-postgresql	`pip install datus-postgresql`	Ready
StarRocks	datus-starrocks	`pip install datus-starrocks`	Ready
Snowflake	datus-snowflake	`pip install datus-snowflake`	Ready
ClickZetta	datus-clickzetta	`pip install datus-clickzetta`	Ready
Hive	datus-hive	`pip install datus-hive`	Ready
Spark	datus-spark	`pip install datus-spark`	Ready
ClickHouse	datus-clickhouse	`pip install datus-clickhouse`	Ready
Trino	datus-trino	`pip install datus-trino`	Ready

Installation¶

Built-in Databases¶

SQLite and DuckDB are included with Datus Agent and require no additional installation.

Plugin Adapters¶

Install the adapter package for your database:

# MySQL
pip install datus-mysql

# PostgreSQL
pip install datus-postgresql

# Snowflake
pip install datus-snowflake

# StarRocks
pip install datus-starrocks

# ClickZetta
pip install datus-clickzetta

# Hive
pip install datus-hive

# Spark
pip install datus-spark

# ClickHouse
pip install datus-clickhouse

# Trino
pip install datus-trino

Once installed, Datus Agent will automatically detect and load the adapter.

Configuration¶

Configure database connections under agent.services.datasources in agent.yml:

agent:
  services:
    datasources:
      mydata:
        type: sqlite
        uri: sqlite:///path/to/database.db

Each entry under services.datasources is one logical database connection.

SQLite¶

mydata:
  type: sqlite
  uri: sqlite:///path/to/database.db

DuckDB¶

analytics:
  type: duckdb
  uri: duckdb:///path/to/database.duckdb

MySQL¶

production:
  type: mysql
  host: localhost
  port: 3306
  username: your_username
  password: your_password
  database: your_database

PostgreSQL¶

production_pg:
  type: postgresql
  host: localhost
  port: 5432
  username: your_username
  password: your_password
  database: your_database
  schema: public  # optional, default is public
  sslmode: prefer  # optional, default is prefer

Snowflake¶

warehouse:
  type: snowflake
  account: your_account
  username: your_username
  password: your_password  # Use either password or private_key_file
  # private_key_file: /path/to/rsa_key.p8
  # private_key_file_pwd: optional_key_passphrase
  warehouse: your_warehouse
  database: your_database
  schema: your_schema
  role: your_role  # optional

StarRocks¶

analytics:
  type: starrocks
  host: localhost
  port: 9030
  username: root
  password: your_password
  database: your_database

ClickZetta¶

lakehouse:
  type: clickzetta
  service: CLICKZETTA_SERVICE
  username: CLICKZETTA_USERNAME
  password: CLICKZETTA_PASSWORD
  instance: CLICKZETTA_INSTANCE
  workspace: CLICKZETTA_WORKSPACE
  schema: CLICKZETTA_SCHEMA
  vcluster: CLICKZETTA_VCLUSTER

Hive¶

hive_data:
  type: hive
  host: 127.0.0.1
  port: 10000
  username: hive
  database: default
  auth: NONE  # optional: NONE, LDAP, CUSTOM, KERBEROS
  configuration:  # optional Hive session config
    hive.execution.engine: spark

Spark¶

spark_data:
  type: spark
  host: localhost
  port: 10000
  username: spark
  database: default
  auth_mechanism: NONE  # optional: NONE, PLAIN, KERBEROS

ClickHouse¶

analytics:
  type: clickhouse
  host: localhost
  port: 8123
  username: default
  password: your_password
  database: your_database

Trino¶

trino_data:
  type: trino
  host: localhost
  port: 8080
  username: trino
  catalog: hive
  schema: default
  http_scheme: http  # optional: http or https

Multiple Database Entries¶

agent:
  services:
    datasources:
      source_db:
        type: mysql
        host: source-server
        username: reader
        password: password
        database: source

      target_db:
        type: snowflake
        account: your_account
        username: writer
        password: password
        warehouse: compute_wh
        database: target

Features by Adapter¶

Common Features¶

All adapters support:

SQL query execution (SELECT, INSERT, UPDATE, DELETE)
DDL operations (CREATE, ALTER, DROP)
Metadata retrieval (tables, views, schemas)
Sample data retrieval
Connection pooling and timeout management

Adapter-Specific Features¶

MySQL¶

INFORMATION_SCHEMA queries
SHOW CREATE TABLE/VIEW support
Full CRUD operations

PostgreSQL¶

INFORMATION_SCHEMA queries
Tables, views, and materialized views support
Multi-schema datasource support
SSL connection modes (disable, allow, prefer, require, verify-ca, verify-full)
SQLAlchemy-based (psycopg2 driver)

Snowflake¶

Multi-database and schema support
Tables, views, and materialized views
Arrow format for efficient data transfer
Native SDK integration

StarRocks¶

Multi-Catalog support
Materialized view support
MySQL protocol compatibility

ClickZetta¶

Workspace and schema management
Volume/Stage file operations
Native SDK integration

Hive¶

HiveServer2/Thrift protocol connection
Hive session configuration support
Multiple auth mechanisms (NONE, LDAP, CUSTOM, KERBEROS)
Database context switching (USE statement)

Spark¶

Spark Thrift Server connection via HiveServer2 protocol
Multiple auth mechanisms (NONE, PLAIN, KERBEROS)
Spark SQL dialect support

ClickHouse¶

HTTP protocol connection
No schema layer (databases serve as schemas)
ClickHouse-specific DML syntax (ALTER TABLE UPDATE)
Lightweight deletes support

Trino¶

Three-level hierarchy: catalog → schema → table
Cross-catalog query support
Built-in TPC-H connector for benchmarking
HTTP/HTTPS connection with SSL support

Troubleshooting¶

Adapter Not Found¶

If you see an error like Connector 'mysql' not found, make sure you have installed the corresponding adapter package:

pip install datus-mysql

Connection Issues¶

Check the following:

Network connectivity: Ensure you can reach the database server
Credentials: Verify username and password are correct
Port: Confirm the correct port is specified
Database name: Ensure the database exists

Driver Dependencies¶

Some adapters require additional system dependencies:

MySQL: Requires pymysql (installed automatically)
PostgreSQL: Requires psycopg2-binary (installed automatically)
Snowflake: Requires snowflake-connector-python (installed automatically)
Hive: Requires pyhive, thrift, thrift-sasl, pure-sasl (installed automatically)
Spark: Requires pyhive, thrift, thrift-sasl, pure-sasl (installed automatically)
ClickHouse: Requires clickhouse-sqlalchemy (installed automatically)
Trino: Requires trino (installed automatically)

Architecture¶

datus-agent (Core)
├── Built-in Adapters
│   ├── SQLite Connector
│   └── DuckDB Connector
│
└── Plugin System (Entry Points)
    ├── datus-sqlalchemy (Base layer)
    │   ├── datus-mysql
    │   ├── datus-postgresql
    │   ├── datus-starrocks
    │   ├── datus-hive
    │   ├── datus-spark
    │   ├── datus-clickhouse
    │   └── datus-trino
    │
    └── Native SDK Adapters
        ├── datus-snowflake
        └── datus-clickzetta

The adapter system uses Python's entry points mechanism for automatic discovery. When you install an adapter package, it registers itself with Datus Agent and becomes available for use.

Next Steps¶

Quick Start Guide - Get started with Datus Agent
Configuration Reference - Detailed configuration options