Skills vs Agents: What the Industry Got Right

Most AI agent projects fail. Not because the models aren't smart enough, but because developers keep solving the wrong problem.
I've watched this pattern repeat for months now. Someone builds a custom agent for their specific workflow. It works beautifully in demos. Then they try to extend it, share it with their team, or adapt it to a slightly different use case—and the whole thing falls apart. Giant system prompts. Context window explosions. Bespoke code or subagents that don't compose well with anything else.
Then Anthropic released their "Skills" feature for Claude Code, and something clicked.
The insight isn't technical. It's architectural. And it's one of those ideas that feels obvious in retrospect: stop building agents, start building skills.
The Problem With Agents
The AI community has been obsessed with agents for the past year. 2025 was "the year of the agent". And for good reason—the idea of autonomous systems that can reason, plan, and execute is compelling. But there's a gap between the vision and what actually ships.
Most agent implementations I've seen (including some of my own experiments) suffer from the same structural issues:
Monolithic system prompts. Every capability gets crammed into one giant context. The agent "knows" everything from the start, which sounds powerful until you realize you're burning 30% of your context window before the user types anything.
Bespoke everything. Each agent is a snowflake. Want to add a new capability? Rewrite the core loop. Want to share it with your team? Good luck explaining how it works.
Poor composition. Agents don't combine well. You end up with meta-agents orchestrating agents orchestrating agents, and nobody can debug why the output is wrong.
The fundamental issue is that most agent architectures conflate two different concerns: the capability (what the agent can do) and the runtime (how the agent executes). When you bundle them together, you get systems that are powerful but brittle.
I wrote about a specific manifestation of this in The Hidden Cost of MCP Servers—how even well-intentioned tooling creates context overhead that compounds with every capability you add. But that was about symptoms. The skills architecture addresses the root cause.
Anthropic's Insight: Separate Expertise From Execution
At a recent AI Engineer conference, Barry Zhang and Mahesh Murag from Anthropic laid out a different mental model. They described the emerging enterprise AI stack with an analogy that stuck with me:
- Model = CPU (the raw compute)
- Agent runtime = OS (code execution + filesystem)
- MCP servers = Device drivers (connectivity to external systems)
- Skills = Applications (packaged expertise)
The key insight is that skills are just folders. They're not special infrastructure. They're markdown files, scripts, and assets organized in a directory structure that agents can read and execute.
This sounds almost too simple. But that simplicity is the point.
A skill contains:
- A description (when to use it, what it does)
- Detailed instructions (the actual workflow)
- Optional scripts and tools (deterministic operations)
- Example inputs and outputs (for calibration)
When an agent encounters a problem, it reads the skill's description. If it decides the skill is relevant, it loads the full instructions. If those instructions reference scripts, it runs them.
This is progressive disclosure applied to AI context. Only load what you need, when you need it. The token efficiency gains are real—but more importantly, the architecture suddenly makes sense.
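To make that concrete, here's a minimal sketch of the loading pattern in Python. It assumes a .claude/skills/ layout where each skill folder holds a SKILL.md whose opening line acts as its description; the paths and helper functions are my own illustration, not Anthropic's exact spec.

```python
# A minimal sketch of progressive disclosure over a skills directory.
# Folder layout and the "first line = description" convention are assumptions
# for illustration, not Anthropic's published format.
from pathlib import Path

SKILLS_DIR = Path(".claude/skills")


def skill_descriptions() -> dict[str, str]:
    """Cheap pass: read only a short description for every skill."""
    descriptions = {}
    for skill_md in SKILLS_DIR.glob("*/SKILL.md"):
        lines = skill_md.read_text(encoding="utf-8").splitlines()
        # Treat the first non-empty line as the description.
        first = next((line.strip() for line in lines if line.strip()), "")
        descriptions[skill_md.parent.name] = first
    return descriptions


def load_skill(name: str) -> str:
    """Expensive pass: load full instructions only for the chosen skill."""
    return (SKILLS_DIR / name / "SKILL.md").read_text(encoding="utf-8")


if __name__ == "__main__":
    # The agent sees every description up front (a handful of tokens each);
    # full instructions enter the context only once a skill is selected.
    for name, description in skill_descriptions().items():
        print(f"{name}: {description}")
```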
Progressive Disclosure: The Pattern That Keeps Appearing
Every serious AI practitioner I've talked to has independently discovered some version of this pattern.
Kenny Liao calls it "context engineering"—the idea that most agent failures aren't model failures, they're context failures. You're feeding the model too much, too soon, with too little structure.
IndyDevDan structures his skills like mini-products: a pivot file (the index), a tools directory, a cookbook of conditional instructions, and a clear outcome. "Plan on paper before touching an agent," he says. The planning forces you to decide what's core vs. what's conditional.
Daniel Miessler built an entire personal AI infrastructure around this principle. His system (which he's open-sourced as "Pi") treats skills as modular, composable units with their own workflows and deterministic tools. His mantra: "Clear thinking becomes clear writing, and clear writing is essentially what prompting is."
What they're all describing is the same underlying architecture, discovered through trial and error.
Deterministic vs. Opportunistic: Know the Trade-off
There's a critical distinction that's easy to miss when first working with skills.
Skills are opportunistic. The agent decides when to use them based on the description and the current context. You can't force a skill to run—you can only make it available and hope the agent picks it up.
This is fundamentally different from:
- Commands (explicitly invoked, guaranteed execution)
- Memories/Standards (always present in context)
- Subagents (isolated execution with their own context)
Brian Casel makes this point clearly: "Skills trade predictability for context efficiency. They're there if Claude decides it needs them."
This means skills are poorly suited for things that must happen every time—coding standards, architecture patterns, safety checks. Those belong in deterministic channels. Skills shine for well-defined but occasionally-needed capabilities: generating commit messages, formatting reports, analyzing data files.
The mistake is treating skills as a replacement for everything. They're one tool in a toolbox that now includes multiple levels of determinism and context management.
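One way to see the trade-off is to sketch the two invocation paths side by side. The helper below is hypothetical (the real runtime handles this internally), but it shows where the decision gets made: a command runs because the user asked for it, while a skill is only advertised to the model.

```python
# Hypothetical sketch contrasting deterministic commands with opportunistic
# skills. Real runtimes differ; the point is who decides what gets loaded.

def handle_turn(user_input: str, commands: dict[str, str], skills: dict[str, str]) -> str:
    """Return the text that actually reaches the model for this turn."""
    if user_input.startswith("/"):
        # Deterministic path: "/daily-brief ..." always injects that command's
        # full instructions, because the human made the call.
        name = user_input.removeprefix("/").split()[0]
        return commands[name] + "\n\nUser request: " + user_input

    # Opportunistic path: only short descriptions are injected. Whether any
    # skill's full instructions get loaded is the model's decision, not ours.
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in skills.items())
    return (
        "Available skills (load one only if it clearly helps):\n"
        f"{catalog}\n\nUser request: {user_input}"
    )
```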
My Implementation: A Case Study
I've been building what I call my "digital self"—a system of commands, skills, and memory files that help me manage work across multiple roles. Without consciously following Anthropic's architecture, I ended up with something similar:
```
.claude/
├── commands/              # Deterministic (explicitly invoked)
│   ├── daily-brief.md
│   ├── shutdown.md
│   └── content/
│       └── express-mode.md
│
├── skills/                # Opportunistic (agent decides)
│   └── voice-formatter/
│       ├── SKILL.md
│       ├── core.md
│       └── website.md
│
└── settings.json          # Permissions and configuration

CLAUDE.md                  # Standards (always in context)
```
Commands are for workflows I want to invoke explicitly: my morning standup, end-of-day commit, content generation. They're reliable because I control when they run.
Skills are for capabilities I want available but not always active: applying my voice to content, formatting for specific platforms. The voice-formatter skill only loads its detailed guidelines when I'm actually editing an article.
The CLAUDE.md file holds standards that should always be present: project structure, frontmatter requirements, commit message format.
This layered approach—deterministic commands, opportunistic skills, persistent standards—has made the system both more powerful and easier to maintain than any single-agent approach I tried before.
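Here's a rough sketch of how those three layers might assemble into a single request, using the layout from the tree above. The build_context function is my illustration of the idea, not how Claude Code literally constructs its context.

```python
# Illustrative only: assemble one request context from the three layers above.
from pathlib import Path


def build_context(command: str | None = None, skills: list[str] | None = None) -> str:
    parts = [Path("CLAUDE.md").read_text(encoding="utf-8")]   # standards: always present

    if command:  # commands: included only when explicitly invoked
        parts.append(Path(f".claude/commands/{command}.md").read_text(encoding="utf-8"))

    for skill in skills or []:  # skills: included only if the agent decided to load them
        parts.append(Path(f".claude/skills/{skill}/SKILL.md").read_text(encoding="utf-8"))

    return "\n\n---\n\n".join(parts)


# Example: the end-of-day commit command, with voice-formatter loaded on demand.
# context = build_context(command="shutdown", skills=["voice-formatter"])
```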
The Real Lesson: Scaffolding Over Models
Here's the uncomfortable truth I keep coming back to: the scaffolding matters more than the model.
Daniel Miessler put it bluntly: "If I had to choose between the latest model with not very good scaffolding, or excellent scaffolding with a model from 18 months ago, I would definitely pick the latter."
This feels counterintuitive when new models drop every few weeks with benchmark improvements. But the benchmarks measure the model's raw capability, not what you can reliably get out of it in practice.
Well-designed context architecture—skills, commands, progressive disclosure, file-based organization—creates compounding value. Every skill you build works with the next model. Every workflow you encode survives API changes.
The model is becoming the commodity layer. The scaffolding is where your actual leverage lives.
A Word of Caution: We're Still Figuring This Out
I want to be honest about something: these patterns are emerging in real time. We're all figuring out best practices as the ground shifts beneath us.
Six months ago, nobody was talking about skills vs. agents. Six months from now, there might be a new abstraction that makes this distinction feel quaint. The pace of change in AI tooling is genuinely unprecedented.
This is why I'm skeptical of building heavy abstractions around current models and tools. Every layer of indirection you add is a layer that might need to be rewritten when the platform evolves. The developers who are thriving aren't the ones with the most sophisticated agent frameworks—they're the ones who can adapt quickly because they kept things simple.
Skills as folders. Commands as markdown files. Context as a filesystem. These primitives are boring on purpose. They'll survive whatever comes next.
Practical Takeaways
These are the principles I wish I'd known before building three different agent implementations that all ended up deprecated. If you're building AI-assisted workflows, here's where I'd start:
1. Identify your most repeated workflow. What do you do manually every week that follows a predictable pattern? That's your first skill candidate.
2. Start with the structure. Before writing any prompts, decide: is this deterministic (should always run when invoked) or opportunistic (agent chooses when it's relevant)?
3. Plan on paper first. What are the inputs? What are the outputs? What scripts or tools support the workflow? Write this down before touching any AI tooling.
4. Keep skills small and focused. One skill, one job. If you're tempted to add "and also it should...", that's probably a second skill.
5. Test reliability before depending on it. Run the skill in realistic conditions. How often does the agent pick it up unprompted? If reliability matters, consider making it a command instead.
6. Embrace progressive disclosure. Don't load everything upfront. Write clear descriptions that help the agent decide when to load the full context.
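On point 6, the difference between a description the agent acts on and one it ignores is usually specificity. Two hypothetical examples: the first is too vague to trigger reliably, the second is scoped enough for the agent to know when it applies.

```python
# Hypothetical skill descriptions. The specific one tells the agent both
# when to load the skill and when to leave it alone.
VAGUE = "Helps with writing."

SPECIFIC = (
    "Rewrites draft blog posts into my published voice: shortens sentences, "
    "cuts hedging, and applies the platform rules in website.md. "
    "Use when editing articles; not for commit messages or code comments."
)
```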
The Future Isn't Better Agents
I think the agent hype cycle is starting to correct itself. Not because agents aren't valuable—they are—but because the framing was wrong.
The future isn't about building smarter, more autonomous agents. It's about building better primitives: skills that encode your expertise, tools that execute reliably, context systems that manage token budgets intelligently.
The agent is just the loop that composes these primitives. The real work is in what you feed it.
Anthropic seems to understand this. Their skills architecture isn't a feature—it's a philosophy. And it's one I'm betting on.
Next: Understanding the philosophy is step one. In Part 2, I'll show how to actually implement this—the composition patterns that emerged from building my digital-self system, and the over-engineering traps I fell into along the way.
I'm building these systems in public as CTO of Pitchello and author of Self Engineer. If you're experimenting with skills-based architectures, I'd love to hear what's working for you.

