OpenSage: Self-programming Agent Generation Engine

Hongwei Li1,*, Zhun Wang2,*, Qinrun Dai3, Yuzhou Nie1, Jinjun Peng4, Ruitong Liu3, Jingyang Zhang6, Kaijie Zhu1, Jingxuan He2, Lun Wang7, Yangruibo Ding5, Yueqi Chen3, Wenbo Guo1, Dawn Song2
1UC Santa Barbara 2UC Berkeley 3University of Colorado Boulder 4Columbia University 5UCLA 6Duke University 7Google DeepMind
*Fully equal contribution

An AI-centered agent generation engine that enables LLMs to self-create agent topology, synthesize toolsets, and manage structured memory for complex real-world tasks.

Overview

OpenSage system overview figure

OpenSage (Open Self-programming Agent Generation Engine) is an AI-centered agent framework designed to shift agent development from a human-engineered, fixed paradigm to an AI-driven, self-programming one. Instead of requiring developers to hand-design workflows, tool lists, and memory logic for each task, OpenSage provides a minimal scaffold that lets the model create and orchestrate these components at runtime.

OpenSage is built around three core systems that strongly influence agent performance:

  • Self-generating agent topology: the agent can dynamically create, execute, and terminate sub-agents during task execution. It supports both a vertical topology (a complex task is decomposed into sequential sub-tasks handled by specialized sub-agents) and a horizontal topology (multiple sub-agents execute the same task using distinct plans, and their results are merged by an agent ensemble mechanism).
  • Dynamic tool synthesis and management: the agent can create tools during execution (e.g., scripts, analyzers, generators), supported by a tooling runtime with tool-specific sandboxing and state management.
  • Hierarchical memory management: target-level long-term memory (a graph database for shareable knowledge) plus execution-based short-term memory (a graph structure for tracking agent runs). A built-in, dedicated memory agent handles memory management and can be enabled with a single line of code.

Feature Matrix

Comparison of key features between OpenSage and state-of-the-art agent development kits (ADKs). ● means full support; ◐ means partial or limited support; ○ means not supported.

Feature matrix comparing OpenSage and SOTA ADKs in key features, using solid, half, and hollow circles to denote support.

Results

Agents built with OpenSage are evaluated on three state-of-the-art benchmarks (CyberGym, Terminal-Bench 2.0, and SWE-Bench Pro), where they achieve leading performance.

Benchmark results on CyberGym, Terminal-Bench 2.0, and SWE-Bench Pro (Python): resolved rate (%) bar charts.

Key Techniques

Self-generating Agent Topology

  • Dynamic creation/execution/termination of sub-agents
  • Vertical topology and horizontal topology
  • Agent ensemble mechanism
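
The two topologies can be illustrated with a minimal sketch. This is not OpenSage's API: `SubAgent`, `run_vertical`, and `run_horizontal` are hypothetical names, sub-agents are modeled as plain functions, and a majority vote stands in for the agent ensemble mechanism, whose actual merging strategy is not described here.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical stand-in for a sub-agent: a function from task text to a result.
SubAgent = Callable[[str], str]


def run_vertical(task: str, stages: List[SubAgent]) -> str:
    """Vertical topology: sequential sub-tasks, each handled by one sub-agent
    that consumes the previous stage's output."""
    result = task
    for stage in stages:
        result = stage(result)
    return result


def run_horizontal(task: str, planners: List[SubAgent]) -> str:
    """Horizontal topology: every sub-agent attempts the same task with its own
    plan. A majority vote is an assumed, simplified ensemble step."""
    answers = [agent(task) for agent in planners]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


if __name__ == "__main__":
    stages = [lambda t: t + " -> located", lambda t: t + " -> patched"]
    print(run_vertical("bug", stages))  # bug -> located -> patched

    planners = [lambda t: "A", lambda t: "B", lambda t: "A"]
    print(run_horizontal("bug", planners))  # A
```

In a real system each function would wrap an LLM-backed sub-agent that the parent agent creates and terminates at runtime; the sketch only shows how results flow in each topology.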

Dynamic Tool Synthesis

  • Dynamic tool synthesis and management
  • Tool-specific sandboxing and tool state management
  • Domain-specific tool set tailored to software engineering and security tasks
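
As a rough illustration of runtime tool creation, the sketch below lets a caller register a script as a named tool and run it in a separate process. `ToolRegistry` and its methods are hypothetical and not part of OpenSage; a child process with a timeout is only a weak stand-in for the tool-specific sandboxing described above, and no tool state management is modeled.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict


class ToolRegistry:
    """Hypothetical registry: stores agent-written scripts and runs each one
    in its own Python process (process isolation only, not a real sandbox)."""

    def __init__(self) -> None:
        self._dir = Path(tempfile.mkdtemp(prefix="tools_"))
        self._tools: Dict[str, Path] = {}

    def register(self, name: str, source: str) -> None:
        """Persist the script so later calls can reuse the same tool."""
        path = self._dir / f"{name}.py"
        path.write_text(source)
        self._tools[name] = path

    def run(self, name: str, *args: str, timeout: float = 10.0) -> str:
        """Run a registered tool and return its stripped standard output."""
        proc = subprocess.run(
            [sys.executable, str(self._tools[name]), *args],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise if the tool exits with a non-zero status
        )
        return proc.stdout.strip()


if __name__ == "__main__":
    registry = ToolRegistry()
    # In practice the source would be written by the agent during execution.
    registry.register("adder", "import sys\nprint(sum(int(a) for a in sys.argv[1:]))\n")
    print(registry.run("adder", "2", "3"))  # 5
```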

Hierarchical Memory Management

  • Short-term memory: execution-based graph structure that tracks agent runs
  • Long-term memory: target-level graph database of shareable knowledge, generated by agents
  • Memory agent: built-in dedicated agent for memory management
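
To make the two graph-shaped memories concrete, here is a small sketch that uses an in-memory graph as a stand-in for a real graph database. `GraphMemory`, the node identifiers, and the relation names (`HAS_STEP`, `HAS_FINDING`) are illustrative assumptions, not OpenSage's schema, and the memory agent itself is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class GraphMemory:
    """Tiny in-memory property graph (an assumed stand-in for a graph database)."""
    nodes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_node(self, node_id: str, **props: str) -> None:
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges.add((src, relation, dst))

    def neighbors(self, src: str, relation: str) -> List[str]:
        """Return the targets of all `relation` edges leaving `src`, sorted."""
        return sorted(d for s, r, d in self.edges if s == src and r == relation)


# Short-term memory: tracks what happened during one agent run.
short_term = GraphMemory()
short_term.add_node("run-1", kind="agent_run")
short_term.add_node("step-1", kind="tool_call", summary="ran fuzzer")
short_term.add_edge("run-1", "HAS_STEP", "step-1")

# Long-term memory: shareable knowledge attached to the target being analyzed.
long_term = GraphMemory()
long_term.add_node("target", kind="target")
long_term.add_node("finding-1", kind="finding", summary="parser crashes on empty input")
long_term.add_edge("target", "HAS_FINDING", "finding-1")

print(short_term.neighbors("run-1", "HAS_STEP"))    # ['step-1']
print(long_term.neighbors("target", "HAS_FINDING"))  # ['finding-1']
```

Keeping the two graphs separate mirrors the split described above: run history stays scoped to an execution, while findings about the target remain available to later runs.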

Domain-Specific Toolkit

OpenSage includes a comprehensive toolkit spanning software engineering and security, covering both static and dynamic analysis, enabling agents to perform real-world tasks out of the box.

Category | Tool set      | Libraries        | Features
---------|---------------|------------------|---------
Static   | Code analysis | Joern, CodeQL    | CPG query, call graph analysis, dataflow slicing, semantic-aware search
Dynamic  | Fuzzing       | AFL++, LibFuzzer | Customizable seed generation, mutation, scoring
Dynamic  | Coverage      | LLVM-Cov         | Query coverage with Neo4j, generate detailed reports
Dynamic  | Debugger      | GDB, PDB         | Breakpoints, inspect states, trace execution, custom commands

More Key Findings

In addition to the overall benchmark scores, our analyses provide more concrete evidence for why OpenSage works: agent topology, tooling, and memory each contribute materially, and the framework supports practical patterns like heterogeneous model collaboration.

CyberGym ablation: topology and tooling

Topology + tooling are both necessary (CyberGym). This plot quantifies how OpenSage’s key design choices affect end-to-end vulnerability reproduction.

The left panel isolates agent topology effects (horizontal ensemble and vertical dynamic sub-agents). The right panel isolates the tooling system contribution versus a raw terminal and a no-feature baseline.

CyberGym ablation: tooling system

Tooling is more than a shell (CyberGym). Evaluates OpenSage’s tooling system versus replacing it with a raw terminal interface.

This highlights the benefit of dynamic tool synthesis plus tool/runtime management beyond “just having a bash tool”.

SWE-Bench Pro ablation: memory designs

Memory helps long-horizon tasks (SWE-Bench Pro). Compares OpenSage’s memory against no-memory and Mem0g.

OpenSage’s hierarchical, agent-managed memory achieves the best performance among the compared baselines. Long-horizon tasks benefit from explicitly storing high-signal intermediate findings and retrieving them at the point of need, especially after history compaction or when revisiting earlier decisions.

Terminal-Bench large-small collaboration results

Heterogeneous model collaboration (Terminal-Bench). Shows resolved rate and cost when combining a strong “planner/reviewer” with a cheaper “executor” via sub-agents.

This demonstrates OpenSage’s flexibility for cost/quality trade-offs by assigning different models to different roles.
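
The planner/reviewer and executor split can be sketched as a simple loop. Everything here is an assumption made for illustration: `Model`, `collaborate`, the prompt prefixes, the `"accept"` verdict, and the cost units are hypothetical, and the models are stubbed with plain functions rather than real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Model:
    name: str
    cost_per_call: float  # illustrative units, not real pricing
    respond: Callable[[str], str]


def collaborate(task: str, strong: Model, cheap: Model, max_rounds: int = 3) -> Tuple[str, float]:
    """The strong model plans and reviews; the cheap model executes.
    Returns the last executor output and the accumulated cost."""
    cost = strong.cost_per_call
    instruction = strong.respond(f"plan: {task}")
    output = ""
    for _ in range(max_rounds):
        cost += cheap.cost_per_call
        output = cheap.respond(instruction)
        cost += strong.cost_per_call
        verdict = strong.respond(f"review: {output}")
        if verdict == "accept":
            break
        instruction = verdict  # reviewer feedback becomes the next instruction
    return output, cost


if __name__ == "__main__":
    strong = Model("strong", 10.0, lambda p: "accept" if p.startswith("review:") else "fix the test")
    cheap = Model("cheap", 1.0, lambda p: f"done: {p}")
    print(collaborate("repair build", strong, cheap))  # ('done: fix the test', 21.0)
```

The cost accounting makes the trade-off visible: the expensive model is called once for planning and once per review, while the bulk of the work goes to the cheaper executor.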

Citation

@misc{li2026opensage,
  title={OpenSage: Self-programming Agent Generation Engine},
  author={Hongwei Li and Zhun Wang and Qinrun Dai and Yuzhou Nie and Jinjun Peng and Ruitong Liu and Jingyang Zhang and Kaijie Zhu and Jingxuan He and Lun Wang and Yangruibo Ding and Yueqi Chen and Wenbo Guo and Dawn Song},
  year={2026},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
}