February 11, 2026

Software That Rewrites Itself

Inside the architecture that makes OpenClaw the first truly self-improving AI agent — and what it means for the future of software

By the OpenClaw Team · February 2026 · 15 min read

“The skill system is the entire game with OpenClaw. A skillless agent is just a really expensive chatbot that forgets everything between sessions.” — Jonathan, “Stop. Do Not Touch OpenClaw Until You Read This” [1]

TL;DR

OpenClaw isn’t just a chatbot with extra features. It’s a fundamentally different architecture: a persistent Gateway server that routes messages between your chat apps, an AI brain, and a library of tools called “skills.” The secret sauce is that skills are plain Markdown files — and the AI can write new ones itself, extending its own capabilities through conversation. This article is a technical deep dive into how it actually works: the Gateway, the skill system (with its clever “progressive disclosure” trick), multi-agent delegation, the memory architecture, and how all of this compares to other frameworks like LangChain and CrewAI. If you want to try it without building any of this yourself, that’s what GoClaw is for.

The Gateway: A Server, Not a Script

Most people’s first instinct when they hear “AI assistant” is to picture something like a ChatGPT plugin — a script that runs when you call it and dies when you’re done.

OpenClaw is nothing like that. At its core, OpenClaw is a long-running Node.js server called the Gateway [2]. It starts, it stays running, and it waits. Think of it less like a script and more like a web server — except instead of serving web pages, it serves intelligence.

The Gateway runs on port 18789 by default and manages everything: inbound messages from your chat apps, outbound calls to AI providers (Anthropic, OpenAI, Google, local models via Ollama), tool execution, memory persistence, scheduled jobs, and security [2]. The entire OpenClaw codebase is roughly 3,400 TypeScript files in a pnpm monorepo, with additional Swift code for macOS/iOS and Kotlin for Android [3].

Why does this matter? Because the Gateway model is what makes everything else possible.

A script that runs and dies can’t remember things between sessions. A script that runs and dies can’t proactively reach out to you at 8 AM with a morning briefing. A script that runs and dies can’t listen simultaneously on WhatsApp, Telegram, Slack, and Discord. The Gateway can do all of these things because it’s always alive, always listening, always maintaining state.

This is also why self-hosting OpenClaw is non-trivial. You’re not just “installing an app” — you’re deploying a server. It needs persistent storage, network connectivity, process management, and security hardening. (More on this at the end of the article.)

The Gateway connects to over 10 messaging channels natively: WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Google Chat, Microsoft Teams, Matrix, and Zalo [4]. Messages from any channel arrive at the Gateway, get routed to the appropriate AI model, and the response flows back through the same channel. From the user’s perspective, they’re just texting. Behind the scenes, there’s a sophisticated routing and state management layer that tracks sessions, manages context windows, and handles concurrent conversations.
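The routing idea can be sketched in a few lines. This is a toy illustration with hypothetical interfaces, not OpenClaw's actual code: each channel adapter hands inbound messages to one long-lived process, which keeps a session per (channel, chat) pair and replies on the channel the message arrived from.

```typescript
// Toy sketch of the Gateway's routing loop (hypothetical interfaces;
// the real Gateway is far richer). State lives in the long-running
// process, so it survives between messages.
interface Channel {
  name: string;
  send(chatId: string, text: string): void;
}

class Gateway {
  // session key -> running conversation history
  private sessions = new Map<string, string[]>();

  constructor(private think: (history: string[]) => string) {}

  receive(channel: Channel, chatId: string, text: string): void {
    const key = `${channel.name}:${chatId}`;
    const history = this.sessions.get(key) ?? [];
    history.push(`user: ${text}`);
    const reply = this.think(history);
    history.push(`agent: ${reply}`);
    this.sessions.set(key, history);
    channel.send(chatId, reply); // response flows back through the same channel
  }
}
```

Because `sessions` outlives any single message, context accumulates across turns — exactly what a run-and-die script cannot offer.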

Gateway Architecture

Skills: The App Store for Your Agent

If the Gateway is the engine, skills are the fuel. And this is where OpenClaw’s design gets genuinely brilliant.

What a Skill Actually Is

A skill is a directory containing a file called SKILL.md [5]. That’s it. No compiled code. No binary plugins. No SDK to learn. Just a Markdown file with some YAML metadata at the top and plain-English instructions below.

Here’s what a minimal skill looks like:

---
name: daily-summary
description: Generate a daily summary of tasks, meetings, and key updates.
---

# Daily Summary Skill

## When to use
When the user asks for a daily briefing, morning summary, or end-of-day recap.

## How to execute
1. Check the user's calendar for today's events
2. Review any pending tasks or reminders
3. Scan recent messages for action items
4. Format as a concise briefing with priorities highlighted

That’s a real, functional skill. The AI reads these instructions and follows them. No API calls to register. No webhook endpoints. No custom authentication flows. The skill is a document that teaches the agent how to do something — the same way you’d write instructions for a human colleague.

OpenClaw ships with 50+ built-in skills covering GitHub, Notion, Slack, Spotify, Apple Notes, Obsidian, Gmail, Google Calendar, web browsing, file management, and more [4]. A community registry called ClawHub hosts thousands more, ranging from DeFi trading to Todoist integration to custom meditation generators.

Skills can also include executable scripts (Python, Bash, whatever you need) in a scripts/ subdirectory and detailed reference documentation in references/ — but the SKILL.md file is always the entry point, always the source of truth [6].
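Putting that together, a fuller skill directory might look like this (the file names below are illustrative):

```
daily-summary/
├── SKILL.md              ← entry point and source of truth
├── scripts/
│   └── fetch_calendar.py ← optional executable helper
└── references/
    └── calendar-api.md   ← loaded only on demand
```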

Progressive Disclosure: The Trick That Makes It Scale

Here’s the obvious problem: if you install 50 skills, and each skill’s instructions are 500+ words, you’d be injecting 25,000+ tokens into every single conversation just to list what the agent can do. At typical API pricing, that’s expensive and slow — and most of it would be irrelevant to whatever the user is actually asking about.

OpenClaw solves this with a pattern called progressive disclosure — and we think it’s one of the cleverest design decisions in the entire project [7].

It works in three tiers:

Tier 1 — Name tags only. When a session starts, OpenClaw loads just the name and one-line description of each skill into the system prompt. We’re talking ~30 tokens per skill [8]. Fifty skills? About 1,500 tokens. The agent knows “I have a Slack skill, I have a GitHub skill, I have a Spotify skill” — but nothing more.

Tier 2 — The SKILL.md body. When the agent determines that a specific skill is relevant to the current task, it loads that skill’s full SKILL.md file. This is the quick-start guide: common operations, instructions, pointers to detailed references. Typically 200–500 tokens. Crucially, the system prompt explicitly instructs the agent: “Only read one skill upfront; never read more than one.” [9]

Tier 3 — Reference files. Only when the agent needs deep, specific knowledge (a full API reference, a database schema, a complex script) does it load individual reference files. These can be thousands of tokens, but they’re loaded on-demand and only when truly needed.

Progressive Disclosure

The result is that an agent with 50 installed skills loads only ~1,500 tokens of skill metadata per conversation, reads ~300 tokens when it activates a relevant skill, and occasionally dives into a 2,000-token reference file for complex tasks. Compare that to naively dumping all instructions into every prompt: the cost difference is roughly 10–20x [8].

This isn’t just an optimization — it’s what makes the skill ecosystem viable. Without progressive disclosure, every new skill you install would make every conversation more expensive and slower. With it, installing more skills has almost zero marginal cost until you actually use them.
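The Tier 1 budget is easy to see in a few lines of code. This sketch uses hypothetical types and the rough ~4-characters-per-token estimate from the docs [8]; only the name and one-line description of each skill are ever injected up front.

```typescript
// Sketch of the Tier 1 budget (hypothetical types; not OpenClaw's
// internals). Token counts use the rough ~4 chars/token estimate.
interface SkillMeta {
  name: string;
  description: string; // Tier 1: the only part always in the prompt
}

const tokens = (s: string) => Math.ceil(s.length / 4);

// Tier 1: one line per skill in the system prompt.
function tier1Prompt(skills: SkillMeta[]): string {
  return skills.map(s => `- ${s.name}: ${s.description}`).join("\n");
}

const skills: SkillMeta[] = Array.from({ length: 50 }, (_, i) => ({
  name: `skill-${i}`,
  description: "One-line summary of what this skill does.",
}));

// A few hundred tokens for 50 skills, versus 25,000+ if every skill's
// full instructions were injected into every conversation.
console.log(tokens(tier1Prompt(skills)));
```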

Self-Writing Skills: Where It Gets Wild

The skill format is intentionally simple. Markdown with YAML frontmatter. Plain English instructions. Optional scripts. This simplicity isn’t a limitation — it’s the feature that enables OpenClaw’s most remarkable capability.

Since the AI can read and write files, and since skills are just files… the AI can write new skills for itself.

When you ask OpenClaw to do something it doesn’t have a skill for, it doesn’t just say “I can’t do that.” It can:

  1. Examine the existing skill format (by reading any existing SKILL.md as a template)
  2. Research the required API or tool
  3. Write a new SKILL.md with instructions
  4. Optionally write supporting scripts
  5. Place the skill in the correct directory
  6. Start using it immediately
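Steps 3 through 5 are almost trivially mechanical, which is the point. A hypothetical sketch of emitting a new skill file (the frontmatter mirrors the minimal example shown earlier; the directory layout and function names here are illustrative, not OpenClaw's API):

```typescript
// Hypothetical sketch: the agent materializes a new skill as a
// directory containing SKILL.md. Names and layout are illustrative.
import * as fs from "node:fs";
import * as path from "node:path";

function writeSkill(
  skillsDir: string,
  name: string,
  description: string,
  instructions: string,
): string {
  const dir = path.join(skillsDir, name);
  fs.mkdirSync(dir, { recursive: true });
  const md = [
    "---",
    `name: ${name}`,
    `description: ${description}`,
    "---",
    "",
    `# ${name}`,
    "",
    instructions,
    "",
  ].join("\n");
  const file = path.join(dir, "SKILL.md");
  fs.writeFileSync(file, md);
  return file; // the agent can start using the skill immediately
}
```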

Self-Writing Skill Process

Real examples from the community: one user asked his agent to integrate with his university’s course management system. The agent wrote the skill autonomously and began using it on its own [10]. Another set up OpenClaw in Discord and found that “the fact that Claw can just keep building upon itself just by talking to it is crazy” [11]. One user had his agent write custom meditation skills with text-to-speech integration and generated ambient audio [12].

Peter Steinberger calls this “self-rewriting software.” In the Lex Fridman interview, he dedicated an entire chapter to the concept, describing it as the most important aspect of the project [13]. We tend to agree.

Think about the implications. Traditional software has a fixed set of capabilities defined at compile time. SaaS products add features on their roadmap’s schedule, not yours. OpenClaw’s capabilities are bounded only by what the underlying AI model can write — and those models are getting better every month. A skill written today for Claude Sonnet 4 will work with Claude Opus 4.6, Gemini, GPT-4o, or any future model. The format is model-agnostic. The improvement is compounding.

Multi-Agent Delegation: One Brain Isn’t Enough

Single-agent architectures hit a wall when tasks get complex. If you ask an agent to “research the top 10 competitors in our space and write a report comparing their pricing,” a single agent has to do everything sequentially: research each competitor, compile notes, write the report, format it. The conversation gets long. The context window fills up. Costs accumulate.

OpenClaw addresses this with sub-agents — background workers that the main agent can spawn to handle tasks in parallel [14].

How It Works

The main agent calls a tool called sessions_spawn with a task description. This is non-blocking — the main agent immediately gets back a confirmation and continues the conversation with you [14]. Meanwhile, a new isolated session is created for the sub-agent with its own context window and token budget.

When the sub-agent finishes, it “announces” its findings back to the main conversation — a natural-language summary appears in your chat.
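The spawn-and-announce contract can be sketched as follows. The shapes here are hypothetical (the real sessions_spawn tool is richer), but they capture the essential property: the caller gets a confirmation synchronously while the work happens in its own session.

```typescript
// Minimal sketch of non-blocking spawn-and-announce (hypothetical
// shapes, not OpenClaw's actual tool signature).
type SpawnHandle = { sessionId: string; done: Promise<void> };

function sessionsSpawn(
  task: string,
  run: (task: string) => Promise<string>, // sub-agent with its own context
  announce: (summary: string) => void,    // surfaces back in the main chat
): SpawnHandle {
  const sessionId = `sub-${Math.random().toString(36).slice(2, 8)}`;
  // Non-blocking: the caller gets this handle back immediately,
  // while the sub-agent works and announces when finished.
  const done = run(task).then(summary => announce(`[${sessionId}] ${summary}`));
  return { sessionId, done };
}
```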

Here’s a practical example. You message your agent on Telegram:

“Research the latest Node.js release notes, compare our dependencies against the changelog, and draft a migration plan.”

The orchestrator might:

  1. Spawn a Research sub-agent (using GPT-4o, which is fast and cheap) to fetch the release notes
  2. Spawn a Code Analysis sub-agent (using Claude Opus, which excels at reasoning about code) to review your package.json against the changelog
  3. Wait for both to report back, then synthesize a migration plan itself

All three can use different models — you pay premium rates for tasks that need deep reasoning and economy rates for straightforward web lookups [15].

Multi-Agent Delegation

The Hub-and-Spoke Pattern

The community has formalized this as a “hub-and-spoke” architecture: one orchestrator agent connected to specialized sub-agents, each with its own workspace, tools, and personality [16]. In one popular configuration, the orchestrator (“Seth”) delegates to a Coder agent (“Linus”) that runs exclusively in Docker, a Research agent (“Finch”), and a Cron agent (“Otto”) that handles scheduled tasks [16].

Each sub-agent can have:

  • Its own model. Route expensive reasoning to Opus, routine tasks to Flash.
  • Its own sandbox. Sub-agents can run in Docker containers for security isolation.
  • Its own tool restrictions. A research agent gets browser access but not file write. A coding agent gets exec but not messaging.
  • Its own SOUL.md personality. A writing agent can have a different tone than a coding agent.

The system has deliberate safety constraints: sub-agents cannot spawn their own sub-agents (preventing runaway fan-out), and there’s a configurable maxConcurrent limit to control costs [14]. Each sub-agent’s session is automatically archived after a configurable period.
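A configuration expressing these constraints might look like the sketch below. This is illustrative only: apart from maxConcurrent, which the docs name [14], the key names here are assumptions, not OpenClaw's actual schema.

```json
{
  "subagents": {
    "maxConcurrent": 3,
    "archiveAfterDays": 7,
    "agents": {
      "linus": {
        "model": "claude-opus",
        "sandbox": "docker",
        "tools": { "allow": ["exec", "read", "write"] }
      },
      "finch": {
        "model": "gpt-4o",
        "tools": { "allow": ["browser", "read"], "deny": ["write", "exec"] }
      }
    }
  }
}
```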

There’s also an active RFC for “Agent Teams” — a more sophisticated coordination model where agents can communicate directly with each other (not just through the parent), share task lists with dependencies, and claim work from a shared queue [17]. This would bring OpenClaw to feature parity with proprietary multi-agent systems while preserving what makes it unique: channel-native delivery, per-agent sandboxing, and open-source extensibility.

Memory: Plain Text That Remembers Everything

Most AI systems either have no persistent memory or rely on opaque vector databases that you can’t inspect or edit. OpenClaw takes a radically different approach: plain Markdown files as the source of truth [18].

The workspace (typically ~/.openclaw/workspace/) contains several key files:

SOUL.md defines the agent’s personality, values, and behavioral guidelines. Think of it as the agent’s character sheet — who it is, how it communicates, what it cares about [19].

USER.md stores your preferences and personal context. “I prefer morning meetings.” “I use TypeScript, not JavaScript.” “My kid’s school pickup is at 3:15.” You can edit this file directly, or tell the agent to update it for you [19].

MEMORY.md is the long-term curated memory. Stable facts that should persist for months — project decisions, contact information, key preferences. This file should stay relatively small and calm [18].

memory/YYYY-MM-DD.md files are daily journals. Append-only logs of what happened in each session — topics discussed, decisions made, tasks completed. When a new session starts, the agent reads today’s notes plus yesterday’s for short-term continuity [18].

Transcripts store full session histories and are indexed for search [20].

~/.openclaw/workspace/
├── SOUL.md
├── USER.md
├── MEMORY.md
├── memory
│   ├── 2026-02-10.md
│   ├── 2026-02-11.md
│   └── 2026-02-12.md
└── transcripts
    ├── session-101.json
    └── session-102.json

The genius of this approach is transparency. Every piece of information the agent “remembers” is in a file you can open in any text editor. You can read it, edit it, delete it, put it in Git and track changes. There’s no proprietary database format, no vendor lock-in. If you switch from OpenClaw to something else, your data comes with you as plain text files.

Semantic Search Under the Hood

Plain files alone would break down at scale — you can’t grep through six months of daily notes to find “what we decided about the API migration in October.” So OpenClaw layers a search system on top.

The index lives in a per-agent SQLite database at ~/.openclaw/memory/{agentId}.sqlite [20]. It uses a hybrid search approach with weighted score fusion:

Vector search (70% weight) uses cosine similarity with embeddings stored via the sqlite-vec extension. This handles conceptual matches where wording differs — “gateway host” matches “the machine running the server” [21].

BM25 keyword search (30% weight) uses SQLite’s FTS5 full-text search for exact token matching — error codes, function names, environment variable names, things where semantic similarity would fail [21].

Critically, OpenClaw uses union, not intersection: results from either search method contribute to the final ranking. A chunk that scores high on vectors but zero on keywords is still included [21].

The chunking strategy is a sliding window with overlap preservation (~400 token chunks, 80 token overlap), and the system tracks the embedding provider and model per chunk so that switching providers triggers an automatic reindex [20].
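The union-based fusion step is simple enough to sketch directly. This is an illustrative helper, not OpenClaw's code (the real index lives in SQLite via sqlite-vec and FTS5); scores are assumed normalized to [0, 1] per method.

```typescript
// Sketch of weighted score fusion with union semantics. A chunk found
// by either method contributes; missing scores default to zero.
function fuseScores(
  vector: Map<string, number>,  // chunkId -> cosine-similarity score
  keyword: Map<string, number>, // chunkId -> normalized BM25 score
  wVec = 0.7,
  wKw = 0.3,
): Array<{ id: string; score: number }> {
  // Union, not intersection: take every chunk either method returned.
  const ids = new Set([...vector.keys(), ...keyword.keys()]);
  return [...ids]
    .map(id => ({
      id,
      score: wVec * (vector.get(id) ?? 0) + wKw * (keyword.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A chunk scoring 0.9 on vectors and absent from the keyword results still ranks above a chunk scoring a perfect 1.0 on keywords alone (0.63 versus 0.30), which is exactly the behavior the union design is after.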

Memory Architecture

The Pre-Compaction Flush

There’s one more clever trick. AI models have context window limits — when the conversation gets too long, older messages need to be “compacted” (summarized and discarded). This is where memories would normally be lost.

OpenClaw intercepts this moment. Before compaction happens, it triggers an automatic “agentic turn” that prompts the model to save anything important to today’s daily notes [22]. Information that would otherwise vanish gets preserved in the file system. You can have marathon sessions without losing important details — the files catch what the context window cannot hold.
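The ordering is the whole trick: flush first, compact second. A simplified synchronous sketch with hypothetical names (the real flush is an agentic model turn, and the note lands in memory/YYYY-MM-DD.md):

```typescript
// Sketch of the pre-compaction flush (illustrative names only).
// Before old messages are summarized away, one extra turn persists
// anything important to the daily note.
function compactWithFlush(
  history: string[],
  keepLast: number,
  flushTurn: (history: string[]) => string, // "save what matters" turn
  appendDailyNote: (note: string) => void,  // writes to today's journal
  summarize: (dropped: string[]) => string,
): string[] {
  if (history.length <= keepLast) return history;
  appendDailyNote(flushTurn(history)); // flush BEFORE anything is discarded
  const dropped = history.slice(0, history.length - keepLast);
  return [summarize(dropped), ...history.slice(-keepLast)];
}
```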

How OpenClaw Compares to Other Frameworks

Here’s where we need to be precise, because this question comes up constantly: “Why not just use LangChain? Or CrewAI? Or AutoGen?”

The answer is that they’re solving fundamentally different problems. Let’s break it down.

| | OpenClaw | LangChain / LangGraph | CrewAI | AutoGen |
|---|---|---|---|---|
| What it is | Complete personal assistant | Developer toolkit / library | Multi-agent orchestration framework | Conversational multi-agent framework |
| User interface | WhatsApp, Telegram, Slack, etc. | None (code-only) | None (code-only) | None (code-only) |
| Target user | End user (via messenger) | Python developer | Python developer | Python developer |
| Runs as | 24/7 server (Gateway) | Library in your app | Library in your app | Library in your app |
| Memory | Built-in (Markdown + vector search) | BYO (add-on modules) | Role-based with RAG | Conversation-based |
| Self-improvement | ✅ Writes own skills | ❌ Capabilities are code | ❌ Capabilities are code | ❌ Capabilities are code |
| Multi-agent | Sub-agents + Agent Teams (RFC) | Graph-based workflows | Role-based crews | Conversation-based |
| Multi-channel | 10+ channels native | ❌ Not applicable | ❌ Not applicable | ❌ Not applicable |
| Scheduling / Cron | ✅ Built-in heartbeat | ❌ External | ❌ External | ❌ External |
| Open source | MIT | MIT | Apache 2.0 + paid platform | MIT |
| Maturity | Young (Nov 2025) | Mature (2022) | Moderate (2023) | Moderate (2023) |

Here’s a useful way to think about it: LangChain, CrewAI, and AutoGen are for developers building AI-powered applications. OpenClaw is the application.

If you’re a developer building a customer support bot or a data analysis pipeline, you’d use LangChain or CrewAI as libraries inside your own codebase. You control the architecture, the interface, the deployment.

If you want a personal AI assistant that lives in your WhatsApp, remembers your preferences, runs 24/7, can learn new skills through conversation, and proactively reaches out to you when something needs attention — that’s OpenClaw’s territory. Nobody else occupies this exact niche: a complete, self-hosted, self-improving personal agent with native messaging integration.

A car analogy works here. LangChain is like buying an engine, a chassis, and wheels separately — maximum flexibility, but you’re building the car yourself. OpenClaw is the finished car. You can still pop the hood and modify everything (it’s MIT-licensed), but most people just want to drive.

Security: The Elephant in the Server Room

We’d be doing a disservice not to address this directly, because the technical architecture that makes OpenClaw powerful also creates real security challenges.

The attack surface is inherent to the design. An agent that can read emails, browse the web, execute shell commands, and send messages has access to everything that matters on your system. Palo Alto Networks described this as a “lethal trifecta” — high autonomy, broad system access, and open internet connectivity [23].

CVE-2026-25253 (CVSS 8.8) demonstrated the risk concretely: a malicious website could steal authentication tokens and achieve remote code execution through a single link [24]. Cisco’s research team found a third-party skill performing data exfiltration without user awareness [25]. The skill ecosystem, by design, allows anyone to publish skills that the agent will execute — and the vetting is minimal.

OpenClaw’s security model is defense-in-depth:

  • DM pairing restricts which phone numbers / accounts can interact with the agent
  • Sandbox modes can run tool execution in Docker containers
  • Tool allowlists and denylists restrict what each agent can do (e.g., research agent gets browser but not file write)
  • Exec approval gates require human confirmation for dangerous operations
  • Per-agent isolation gives each agent separate workspaces, auth profiles, and session stores [26]
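The allow/deny semantics implied by that third bullet can be sketched as a gate. This is an illustrative policy shape, not OpenClaw's configuration format: denies win over allows, and anything not explicitly allowed is rejected.

```typescript
// Sketch of a deny-first tool gate (illustrative policy shape).
interface ToolPolicy {
  allow: string[]; // "*" permits everything not denied
  deny: string[];  // denies always win
}

function isToolPermitted(tool: string, policy: ToolPolicy): boolean {
  if (policy.deny.includes(tool)) return false;
  return policy.allow.includes("*") || policy.allow.includes(tool);
}
```

Under a research-agent policy like `{ allow: ["browser", "read"], deny: ["write", "exec"] }`, browsing passes, writing is denied, and any unlisted tool is rejected by default.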

But here’s the honest truth: defense-in-depth only works if you actually configure it. The default setup is permissive. One of OpenClaw’s own maintainers warned that if you can’t understand command-line operations, “this is far too dangerous of a project for you to use safely” [25].

This is a solvable problem — but it’s an infrastructure problem, not a code problem. You need proper network isolation, secret management, firewall rules, monitoring, and update processes. This is exactly the kind of operational work that GoClaw handles: every instance runs in a fully isolated Google Cloud environment with encrypted secrets, network policies, and managed updates [27].

Why This Architecture Matters

Let’s zoom out for a moment and explain why we’ve spent 4,000 words on architecture. This isn’t just academic.

The Gateway model is what GoClaw is built on. Because OpenClaw is a server, not a script, it maps naturally to cloud infrastructure. One Gateway per customer. One isolated environment per Gateway. Kubernetes handles scheduling, persistent volumes handle state, network policies handle isolation. The architecture was practically designed for multi-tenant cloud hosting — even though that wasn’t Steinberger’s intention.

The skill system is what makes agents useful beyond demo day. Most AI agent demos are impressive for five minutes and useless for five weeks. The reason is that real life is full of edge cases, custom workflows, and proprietary tools. Skills let you teach the agent about your world — your Slack channels, your database schemas, your reporting conventions — without waiting for a product team to add support. And the agent can teach itself.

The memory architecture is what makes agents feel like colleagues. Without persistent memory, every conversation starts from zero. You re-explain your preferences, re-establish context, re-describe your projects. With OpenClaw’s memory system, after two weeks of use, the agent knows your patterns, your preferences, your projects. It remembers that the Johnson project deadline moved to March. It remembers you prefer Bash over Python for quick scripts. It remembers your kid’s name. Not because it’s surveilling you — because you told it, and unlike your human colleagues, it actually wrote it down.

The multi-agent system is what makes complex work possible. Single-agent systems hit context limits. Multi-agent systems parallelize work, specialize by task, and optimize cost by routing different work to different models.

Put it all together, and OpenClaw isn’t just a clever hack or a viral toy. It’s a genuinely new architecture for personal computing — one where the interface is conversation, the capabilities are extensible by the user, the memory is persistent, and the system improves itself over time.

Whether this particular project survives (especially with Meta and OpenAI circling), the architecture is out in the open, MIT-licensed, and already being studied, forked, and extended by thousands of developers worldwide. The genie is out of the bottle. 🦞

👉 Don’t want to build any of this yourself? GoClaw provisions a fully configured, secured, always-on OpenClaw instance in under 60 seconds. Your agent, your memory, your skills — our infrastructure.

This is the second article in a series about the AI agent revolution. Previous: “The Lobster That Broke the Internet” — the story of how OpenClaw went from a one-hour prototype to 180,000 GitHub stars. Next up: “The New Cathedrals (And the Lobster Inside)” — what AI agents really cost to run, from the $135 billion datacenters to the tokens on your bill.

References

[1] Jonathan, “Stop. Do Not Touch OpenClaw Until You Read This,” Substack, Feb 2026. Link

[2] OpenClaw Documentation: Gateway. Link

[3] OpenClaw codebase analysis. GitHub repository: ~3,400 TypeScript files, monorepo structure. Link

[4] awesome-openclaw, GitHub: “179,000+ stars | 10+ messaging channels | 50+ built-in skills.” Link

[5] OpenClaw Documentation: Skills. Link

[6] BetterLink Blog, “OpenClaw Custom Skill Development: A Complete Guide,” Feb 2026. Link

[7] OpenClaw Skills Tutorial, Lilys.ai: “Skills utilize progressive disclosure, meaning OpenClaw only loads the name and description of each skill when it starts.” Link

[8] OpenClaw Docs: Skills — “Per skill: 97 characters + field lengths. ~4 chars/token → ~24 tokens per skill.” Link

[9] DeepWiki: Skills System — “‘Never read more than one skill upfront; only read after selecting.’” Link

[10] @pranavkarthik__ testimonial, OpenClaw website. Link

[11] @jonahships_ testimonial, OpenClaw website. Link

[12] OpenClaw community testimonials. Link

[13] Lex Fridman, “#491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger,” Feb 11, 2026. Link

[14] OpenClaw Documentation: Sub-Agents. Link

[15] Zen van Riel, “OpenClaw Sub-agents and Parallel Task Execution Guide,” Feb 2026. Link

[16] TheSethRose, “OpenClaw-Advanced-Config: Hub-and-spoke multi-agent architecture,” GitHub. Link

[17] “RFC: Agent Teams — Coordinated Multi-Agent Orchestration,” OpenClaw GitHub Discussion #10036. Link

[18] LumaDock, “How OpenClaw Memory Works and How to Control It.” Link

[19] Zen van Riel, “OpenClaw Memory Architecture — Daily Notes and Long-Term Memory,” Feb 2026. Link

[20] PingCAP, “Local-First RAG: Using SQLite for AI Agent Memory with OpenClaw,” Feb 2026. Link

[21] Shivam Agarwal, “How OpenClaw’s Memory System Actually Works,” Medium, Feb 2026. Link

[22] MMNTM, “How OpenClaw Implements Agent Memory: A Code Walkthrough,” Feb 2026. Link

[23] “OpenClaw: From Side Project to 145K GitHub Stars,” LearnDevRel, Feb 2026. Link

[24] CVE-2026-25253, CVSS 8.8. Disclosed by Mav Levin (depthfirst). Link

[25] OpenClaw, Wikipedia. Cisco security finding and Shadow’s Discord warning. Link

[26] OpenClaw Documentation: Multi-Agent Configuration. Link

[27] GoClaw — the fastest way to run OpenClaw. Link

Get your personal AI superpowers with GoClaw

Still have questions?

Reach out to us at support@goclaw.io