Introduction

Datus is an open-source data engineering agent that builds evolvable context for your data systems. Unlike traditional tools that merely move data, Datus captures, learns, and evolves the knowledge surrounding your data. It turns metadata, reference SQL, semantic models, and metrics into a living knowledge base that grounds AI queries and reduces hallucinations.

With Datus, data engineers shift from writing repetitive SQL to building reusable, AI-ready context. Every query, correction, and domain rule becomes long-term memory—enabling specialized subagents that deliver accurate, domain-aware analytics to your entire organization.

[Figure: Datus architecture]

Three Entry Points for Different Users

  • Datus-CLI: An AI-powered command-line interface for data engineers—think "Claude Code for data engineers." Write SQL, build subagents, and construct context interactively.
  • Datus-Chat: A web chatbot providing multi-turn conversations with built-in feedback mechanisms (upvotes, issue reports, success stories) for data analysts.
  • Datus-API: RESTful APIs for other agents or applications that need stable, accurate data services.

Two Execution Modes

  • Agentic Mode: Ideal for ad-hoc development and exploratory workflows. Flexible, conversational, and context-aware through specialized subagents.
  • Workflow Mode: Optimized for production scenarios requiring high stability and orchestration. Workflows can use subagents as nodes for complex pipelines.
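
To make the idea of "subagents as nodes" concrete, here is a minimal sketch in plain Python. It is not the Datus API: `make_subagent`, `validate_sql`, `Workflow`, and the `orders` query are all hypothetical names chosen for illustration, and the subagent is reduced to a lookup where a real one would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical types for illustration; none of these names come from Datus.
State = Dict[str, Any]
Node = Callable[[State], State]


def make_subagent(name: str, reference_sql: Dict[str, str]) -> Node:
    """Build a node that answers questions from its own domain context."""

    def run(state: State) -> State:
        # A real subagent would call an LLM grounded in this context;
        # the sketch only looks up a stored reference query.
        sql = reference_sql.get(state["question"], "")
        return {**state, "sql": sql, "answered_by": name}

    return run


def validate_sql(state: State) -> State:
    """Deterministic node: accept only non-empty SELECT statements."""
    is_select = state["sql"].strip().lower().startswith("select")
    return {**state, "valid": is_select}


@dataclass
class Workflow:
    """Runs nodes in a fixed order, passing the state from one to the next."""

    nodes: List[Node] = field(default_factory=list)

    def run(self, state: State) -> State:
        for node in self.nodes:
            state = node(state)
        return state


revenue_agent = make_subagent(
    "revenue",
    {"monthly revenue": "SELECT month, SUM(amount) FROM orders GROUP BY month"},
)
workflow = Workflow(nodes=[revenue_agent, validate_sql])
result = workflow.run({"question": "monthly revenue"})
print(result["answered_by"], result["valid"])  # revenue True
```

The fixed node order is what separates this from the agentic case: the pipeline always runs the same steps, and the subagent contributes domain context at one well-defined point.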

Context Engine at the Core

The heart of Datus is its Context Engine, which combines human expertise with AI capabilities:

  • Automatically captures metadata, metrics, reference SQL, documents, and success stories
  • Supports human-in-the-loop curation and refinement
  • Powers both subagents and workflows with rich, domain-specific context
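
The capture, curate, and serve loop described above can be sketched as a small in-memory store. This is an assumption-laden illustration rather than how the Context Engine is implemented: `ContextEntry`, `ContextStore`, the `approved` flag, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# The kinds of context named in the list above; the identifiers are hypothetical.
KINDS = {"metadata", "metric", "reference_sql", "document", "success_story"}


@dataclass
class ContextEntry:
    kind: str
    domain: str
    content: str
    approved: bool = False  # set by a human reviewer during curation


class ContextStore:
    """In-memory stand-in for a context store (illustration only)."""

    def __init__(self) -> None:
        self._entries: List[ContextEntry] = []

    def capture(self, kind: str, domain: str, content: str) -> ContextEntry:
        """Record a new, not-yet-approved piece of context."""
        if kind not in KINDS:
            raise ValueError(f"unknown context kind: {kind}")
        entry = ContextEntry(kind, domain, content)
        self._entries.append(entry)
        return entry

    def curate(self, entry: ContextEntry, content: Optional[str] = None) -> None:
        """Human-in-the-loop step: optionally correct the entry, then approve it."""
        if content is not None:
            entry.content = content
        entry.approved = True

    def for_domain(self, domain: str) -> List[ContextEntry]:
        """Return only approved entries for one domain."""
        return [e for e in self._entries if e.domain == domain and e.approved]


store = ContextStore()
metric = store.capture("metric", "sales", "revenue = SUM(amount)")
store.capture("reference_sql", "sales", "SELECT SUM(amount) FROM orders")
store.curate(metric, content="revenue = SUM(amount) excluding refunds")

for entry in store.for_domain("sales"):
    print(entry.kind, "->", entry.content)
```

Only the curated metric is returned for the `sales` domain. The captured reference SQL stays out of the served context until someone approves it, which is one simple way to keep unreviewed material from reaching a subagent or workflow.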

Flexible Integration Layer

Datus integrates seamlessly with your existing stack:

  • LLMs: OpenAI, Claude, DeepSeek, Qwen, Kimi, and more (see Configuration)
  • Data Warehouses: StarRocks, Snowflake, DuckDB, SQLite, PostgreSQL, and others (see Namespace Setup)
  • Semantic Layers: MetricFlow support for metric definitions and queries
  • Extensibility: Add custom integrations via MCP (Model Context Protocol)

Getting Started

Get your Datus Agent up and running in minutes.

Start Here

Quickstart Guide

Discover how Datus applies contextual data engineering to your data assets so that it continuously learns and improves.

Learn Key Concepts

Contextual Data Engineering

Important Topics

  • Datus CLI: Command-line interface for local development and real-time preview of your data workflows.
  • Knowledge Base: Centralized repository for organizing and managing your data assets and documentation.
  • Subagent System: Extend Datus with specialized subagents for different data engineering tasks and workflows.
  • Workflow Management: Design and orchestrate complex data pipelines with a configurable workflow builder.