Anthropic: Stop Building Agents, Build Skills Instead
A great AIECS session from Barry Zhang & Mahesh Murag, the creators of Anthropic Agent Skills.

Last week, the news cycle1 collided with my post on Claude Agent Skills. This week, perhaps I’m having better luck: the lead session of the CODE track at last week’s AI Engineer Code Summit was Anthropic’s Barry Zhang & Mahesh Murag speaking on that exact topic.
15 Minutes of Solid Gold: The amount of information that Zhang and Murag fit into their 15-minute slot was incredible. As someone deeply engaged in the Anthropic / Claude / Claude Code / Skills ecosystem, I wanted to wring every drop out of this session. It took several viewings to get it all, but it was well worth the effort. Enjoy!
A Year of Agents, and a Crucial Hint
To kick off the talk, Zhang looks back at key events in Anthropic’s agent world from February through today … this timeline blows my mind because it feels, for example, like Skills were announced at least six months ago. Nope: five weeks.
A lot of things have changed since our last talk. MCP became the standard for agent connectivity. Claude Code, our first coding agent, launched to the world, and Claude Agent SDK, now provides a production-ready agent out of the box.
I have been deep enough in this agent world to know the pain points, and this next statement (along with yesterday’s release of Opus 4.5 and much more2) gives me hope that Anthropic is continuing to march in the right direction:
We have a more mature ecosystem, and we’re moving towards a new paradigm for agents. That paradigm is a tighter coupling between the model and the runtime environment.
When the model is tightly coupled with a well-equipped runtime—something we’re already beginning to see—magical things become possible.
Old and New Ways of Thinking About Agents
Zhang explains how Anthropic’s view of agents was transformed based on what they learned with Claude Code:
We used to think agents in different domains will look very different. Each one will need its own tools and scaffolding, and that means we’ll have a separate agent for each use case for each domain.
But then they had this insight:
Well, customization is still important for each domain. The agent underneath is actually more universal than we thought. What we realized is that Code is not just a use case, but a universal interface to the digital world. After we built Code, we realized that Code is actually a general-purpose agent.
Think about generating a financial report. The model can call the API to pull in data and do research. It can organize that data in the file system. It can analyze it with Python and synthesize the insights in old file format all through code. The core scaffolding can suddenly become as thin as just bash and file system.
But:
[this] is great and really scalable—but we very quickly run into a different problem.
The Missing Link: Domain Expertise
Given a powerful, general purpose agent, the world’s the limit. Except for the small issue of domain expertise:
That problem is domain expertise. Who do you want to do your taxes? Is it going to be Mahesh, the 300 IQ mathematical genius, or is it Barry, an experienced tax professional? I would pick Barry every time. I don’t want Mahesh to figure out the 2025 tax code from First Principles, I need consistent execution from a domain expert.
Agents today are a lot like Mahesh. They’re brilliant, but they lack expertise … They can do amazing things when you really put in effort and give proper guidance, but they’re often missing the important context up front. They can’t really absorb your expertise super well, and they don’t learn over time.
That’s why we created Agent Skills.
Skills are all about encapsulating domain expertise.
Defining Skills
I love how Anthropic defines Skills:
They’re as simple as a single markdown file with a tiny bit of front matter. The “complex” example above has a main SKILL.md, a couple of reference files, and a Python script. Zhang goes on to say:
This simplicity is deliberate. We want something that anyone, human or agent, can create and use as long as they have a computer. This also works with what you already have. You can version them in Git, you can throw them in Google Drive, and you can zip them up and share it with your team. We have used files as a primitive for decades, and we like them, so why change now?
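To make "a single markdown file with a tiny bit of front matter" concrete, here is a minimal sketch in Python that writes such a file to disk. The talk does not show the file contents, so treat the details as assumptions: the front matter fields `name` and `description`, the helper `write_skill`, and the example skill are all illustrative.

```python
from pathlib import Path
import tempfile


def write_skill(root: Path, name: str, description: str, instructions: str) -> Path:
    """Create <root>/<name>/SKILL.md: front matter first, instructions after.

    The two front matter fields used here are an assumption for illustration.
    """
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    front_matter = f"---\nname: {name}\ndescription: {description}\n---\n"
    skill_file.write_text(
        front_matter + "\n" + instructions.strip() + "\n", encoding="utf-8"
    )
    return skill_file


if __name__ == "__main__":
    # Hypothetical skill, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_skill(
            Path(tmp),
            "quarterly-report",
            "Assemble a quarterly financial report from exported CSV data.",
            "# Quarterly report\n\n1. Load the CSV exports.\n2. Summarize by region.",
        )
        print(path.read_text(encoding="utf-8"))
```

Because the result is just a folder containing a text file, it can be committed to Git or zipped and shared exactly as Zhang describes.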
Code Can Be Better Than Tools
Agents have traditionally been tool-users. But tools have significant drawbacks:
Traditional tools have a lot of problems: poorly written instructions, are pretty ambiguous. When model is struggling, it can’t really make a change to the tool. Always live in the context window.
Code, on the other hand, is easily understood, eminently malleable, and context-friendly:
Code solves some of these issues: It’s self-documenting, it is modifiable and can live in the file system until they are really needed and used.
The “solves some of these issues” here is a bit humorous: this talk happened just days before Anthropic announced Advanced Tool Use, which solves several drawbacks of tools, while at the same time defining the code-tool connection.
From Context-starved to Context-friendly
My biggest pain point across the Claude family has been context exhaustion: running out of context too quickly, and failure to handle that scenario gracefully. Skills are a significant part of the solution, because they’re progressively disclosed:
Skills can contain a lot of information, and we want to protect the context window so that we can fit in hundreds of skills and make them truly composable. That’s why skills are progressively disclosed. At runtime, only [the brief front matter] metadata is shown to the model, just to indicate that it has this skill. When an agent needs to use a skill, it can read in the rest of the SKILL.md, which contains the core instructions and directory for the rest of the folder. Everything else is just organized for ease of access.
So that’s all skills are. They’re organized folders with scripts as tools.
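Progressive disclosure as described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Anthropic's implementation: the `SkillLibrary` class, the `read_front_matter` helper, the `name` and `description` fields, and the convention that a skill's folder name matches its name are all hypothetical.

```python
from pathlib import Path
import tempfile


def read_front_matter(skill_file: Path) -> dict:
    """Read only the front matter, stopping at the closing '---' line."""
    meta = {}
    with skill_file.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta  # no front matter present
        for line in fh:
            if line.strip() == "---":
                break  # stop here so the body is never read
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


class SkillLibrary:
    """A folder of skills, each in its own subfolder with a SKILL.md."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def index(self) -> list:
        """One short line per skill: all that is shown to the model up front."""
        lines = []
        for skill_file in sorted(self.root.glob("*/SKILL.md")):
            meta = read_front_matter(skill_file)
            name = meta.get("name", skill_file.parent.name)
            lines.append(f"{name}: {meta.get('description', '')}")
        return lines

    def load(self, name: str) -> str:
        """Full SKILL.md text, read only once the agent picks this skill."""
        return (self.root / name / "SKILL.md").read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pdf-forms").mkdir()
        (root / "pdf-forms" / "SKILL.md").write_text(
            "---\nname: pdf-forms\ndescription: Fill in PDF forms.\n---\n\n"
            "Detailed instructions that stay out of context until needed.\n",
            encoding="utf-8",
        )
        library = SkillLibrary(root)
        print(library.index())            # cheap: metadata only
        print(library.load("pdf-forms"))  # expensive: full instructions
```

The design point is the split between `index`, whose cost grows by one short line per skill, and `load`, which pulls in the full instructions for just the skill being used.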
Anthropic’s latest flurry of announcements—Opus 4.5, Advanced Tool Use, and new context management features—join with Skills as part of the solution to context starvation. I cut over to Opus 4.5 in Claude and Claude Code within minutes of its release, and so far context pain has been completely absent.
The Skills Ecosystem
When I was researching for the post Claude Skills in Claude Code: A Compleat Guide, my most exciting discoveries related to the existence of a well-structured and thought-out Skills ecosystem.
Since our launch five weeks ago, this very simple design has translated into a very quickly growing ecosystem of thousands of skills. And we’ve seen this be split across a couple of different types of skills. There are foundational skills, third-party skills created by partners in the ecosystem, and skills built within an enterprise and within teams.
To start, foundational skills are those that give agents new general capabilities or domain-specific capabilities that it didn’t have before. We ourselves, with our launch, built document skills that give Claude the ability to create and edit professional-quality office documents. We’re also really excited to see people like Cadence built scientific research skills that give Claude new capabilities like EHR data analysis and using common Python bioinformatics libraries better than it could before.
In the partner category, the new skill from Browserbase gets me very excited, because, “look mom, no MCP!”:
We’ve also seen partners in the ecosystem build skills that help Claude better with their own software and their own products. Browserbase is a pretty good example of this. They built a skill for their open-source browser automation tooling, Stagehand. And now Claude [is] equipped [with] this skill and with Stagehand can now go navigate the web and use a browser more effectively to get work done.
Murag goes on to describe how enterprises are using skills, both outside and inside engineering. He emphasizes that:
Finally, and I think most excitingly for me personally, is we’re seeing skills that are being built by people that aren’t technical. These are people in functions like finance, recruiting, accounting, legal, and a lot more. And I think this is pretty early validation of our initial idea that skills help people that aren’t doing coding work extend these general agents and they make these agents more accessible for the day-to-day of what these people are working on.
Emerging Architecture of General Agents
When I first glanced at this slide, I thought, yeah, our old friend, the agentic loop. But wait a minute, what’s that thing with code brackets?
Yeah, upon closer inspection, this is definitely not our old OODA agentic loop. Murag says:
So tying this all together, let’s talk about how these all fit into this emerging architecture of general agents. First, we think this architecture is converging on a couple of things. The first is this agent loop that helps manage the model’s internal context and manages what tokens are going in and out. And this is coupled with a runtime environment that provides the agent with a file system and the ability to read and write code.
That code bracket thingy is a runtime environment, with a file system and the ability to read and write code. That’s new.
Murag then further expands the architecture, with MCPs and Skills:
This agent, as many of us have done throughout this year, can be connected to MCP servers. And these are tools and data from the outside world that make the agent more relevant and more effective.
And now we can give the same agent a library of hundreds or thousands of skills that it can decide to pull into context only at runtime when it’s deciding to work on a particular task.
The implication:
Today, giving an agent a new capability in a new domain might just involve equipping it with the right set of MCP servers and the right library of skills.
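One way to picture this architecture is as plain data: the loop and runtime stay fixed, and a domain is just a different set of attachments. The Python sketch below is purely illustrative. The classes `Runtime` and `GeneralAgent`, the method `equipped_for`, and every server, skill, and model name are hypothetical and do not correspond to any real SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Runtime:
    """Execution environment: a file system location plus the ability to run code."""
    workdir: str = "/workspace"
    can_run_code: bool = True


@dataclass
class GeneralAgent:
    """One general agent; domains differ only in what is attached to it."""
    model: str
    runtime: Runtime = field(default_factory=Runtime)
    mcp_servers: list = field(default_factory=list)
    skills: list = field(default_factory=list)

    def equipped_for(self, mcp_servers, skills) -> "GeneralAgent":
        """Return a copy with extra servers and skills; model and runtime are shared."""
        return GeneralAgent(
            model=self.model,
            runtime=self.runtime,
            mcp_servers=self.mcp_servers + list(mcp_servers),
            skills=self.skills + list(skills),
        )


if __name__ == "__main__":
    base = GeneralAgent(model="example-model")
    # Hypothetical finance vertical: same agent, different attachments.
    finance = base.equipped_for(
        mcp_servers=["market-data"],
        skills=["dcf-modeling", "earnings-summary"],
    )
    print(finance)
```

Nothing about the agent itself changes between `base` and `finance`, which is the point Murag is making: specialization becomes a matter of configuration rather than new scaffolding.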
Anthropic is eating their own dogfood here:
This emerging pattern of an agent with an MCP server and a set of skills is something that’s already helping us at Anthropic deploy Claude to new verticals. Just after we launched skills five weeks ago, we immediately launched new offerings in financial services and life sciences. And each of these came with a set of MCP servers and a set of skills that immediately make Claude more effective for professionals in each of these domains.
Skills Evolution
Skills are simple, but they represent immense value and leverage. Murag points out that Skills deserve the same attention and care we apply to software to evaluate and optimize, to make sure we get the most out of them.
He explains:
As [skills] start to become more complex, we really want to support developers, enterprises, and other skill builders by starting to treat skills like we treat software. This means exploring, testing, and evaluation. Better tooling to make sure that these agents are loading and triggering skills at the right time and for the right task. And tooling to help measure the output quality of an agent equipped with the skill to make sure that’s on par with what the agent is supposed to be doing.
We’d also like to focus on versioning. As a skill evolves and the resulting agent behavior evolves, we want this to be clearly tracked and to have a clear lineage over time.
And finally, we’d also like to explore skills that can explicitly depend on and refer to either other skills, MCP servers, and dependencies and packages within the agent’s environment … The composability of multiple skills together will help agents like Claude elicit even more complex and relevant behavior from these agents.
To me, these comments (reinforced by Anthropic’s series of announcements since) increase my confidence that this architecture is worth following.
Compounding Value
When people invest time to leverage themselves and others in the organization through Skills-driven agents, the result is compounding growth of that leverage.
As Murag puts it:
The vision that excites us most is one of a collective and evolving knowledge base of capabilities that’s curated by people and agents inside of an organization. We think skills are a big step towards this vision. They provide the procedural knowledge for your agents to do useful things, and as you interact with an agent and give it feedback and more institutional knowledge, it starts to get better and all of the agents inside your team and your org get better as well.
And when someone joins your team and starts using Claude for the first time, it already knows what your team cares about. It knows about your day to day, and it knows about how to be most effective for the work that you’re doing. And as this grows and this ecosystem starts to develop even more, this compound value is going to extend outside of just your org and into the broader community.
Toward Continuous Learning
Zhang returns to speak about the exponential effect that results from the agent starting to write skills too.
He explains:
This vision of an evolving knowledge base gets even more powerful when Claude starts to create these skills. We designed skills specifically as a concrete step towards continuous learning. When you first start using Claude, this standardized format gives a very important guarantee: anything that Claude writes down can be used efficiently by a future version of itself. This makes the learning actually transferable.
Claude can acquire new capabilities instantly, evolve them as needed, and then drop the ones that become obsolete. This is what we have always known. The power of in-context learning makes this a lot more cost effective for information that changes on a daily basis. Our goal is that Claude on day 30 of working with you is going to be a lot better than Claude on day one. Claude can already create skills for you today using our skill-creator skill, and we’re going to continue pushing in that direction.
Skills As Application Layer
The final thought that Zhang and Murag leave us with: comparing Models, Agents, and Skills with the Processor - Operating System - Application stack:
In a rough analogy, models are like processors. Both require massive investment and contain immense potential, but are only so useful by themselves.
The operating system made processors far more valuable by orchestrating the processes, resources, and data around the processor. In AI, we believe that agent runtime is starting to play this role. We’re all trying to build the cleanest, most efficient, and most scalable abstractions to get the right tokens in and out of the model.
But once we have a platform, the real value comes from applications. Millions of developers like us have built software that encode domain expertise at our unique points of view. We hope that skills can help us open up this layer for everyone.




















