Everyone’s Talking About AI Agents. But What Are They, Really?
AI agents have been everywhere in the news recently. We hear about groundbreaking new “software engineers” like Devin, “general-purpose agents” like Manus, the rise of “digital employees,” and the ever-present “conversational agents” (something I happen to be working on). They are all called “agents,” but what’s the real difference between them?
For now, I find it helpful to think about them the way we think about human occupations. Just as people specialize in different fields, different types of AI agents are designed to excel at different tasks. Each “job” requires a unique blend of skills:
- Salesperson: Thrives on interpersonal relationships and stellar communication.
- Software Engineer: Needs strong, logical reasoning and problem-solving skills.
- Product Manager: Balances communication with deep informational synthesis.
If you think about it this way, the priorities become clear. It’s often okay for a software engineer to miss a Slack message, but it’s critical for a salesperson to follow up and communicate effectively. The same principle applies to AI agents; their “occupation” dictates what they need to be good at.
The Anatomy of an Agent
At its core, an agent is a system that perceives its environment and acts upon it to achieve a goal. This creates a fundamental loop (sketched in code after the list):
- Observation (or Perception)
- Reasoning (or Cognition, Planning, Decision-Making)
- Action (or Tool Use)
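To make the loop concrete, here is a minimal sketch in Python. The toy environment and the stub reasoner standing in for the LLM are my own illustrative assumptions, not a reference implementation:

```python
# A minimal sketch of the observe -> reason -> act loop.
# CounterEnv and reason() are toy stand-ins; a real agent would
# observe a richer environment and reason with an LLM.
from dataclasses import dataclass

@dataclass
class Decision:
    done: bool
    action: str = ""
    answer: str = ""

class CounterEnv:
    """Toy environment: the goal is to count up to 3."""
    def __init__(self):
        self.count = 0

    def observe(self) -> int:
        return self.count

    def act(self, action: str) -> None:
        if action == "increment":
            self.count += 1

def reason(observation: int) -> Decision:
    # An LLM call would go here; this stub plans trivially.
    if observation >= 3:
        return Decision(done=True, answer=f"reached {observation}")
    return Decision(done=False, action="increment")

def run_agent(env: CounterEnv) -> str:
    while True:
        obs = env.observe()       # Observation (perception)
        decision = reason(obs)    # Reasoning (cognition, planning)
        if decision.done:
            return decision.answer
        env.act(decision.action)  # Action (tool use)

print(run_agent(CounterEnv()))    # -> "reached 3"
```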

This framework is remarkably similar to the “perception-planning-control” decomposition used in self-driving cars. An autonomous vehicle first perceives the road and obstacles, then plans a path, and finally controls the steering and acceleration to execute that path. Similarly, an AI agent observes user requests and data, reasons about the best course of action, and then acts by using tools or generating a response.
A Deep Dive Into Cognition
For me, the Large Language Model (LLM) is the heart of the reasoning engine. My mental model is that, in a single pass, an LLM reasons from the context it’s given. The LLM’s internal knowledge (its pre-trained “common sense”) is both a powerful resource and a potential distraction. For the reasoning to be correct, we make a big assumption: that the necessary knowledge is available, either from the model’s internal state or from the provided context.
Let’s take a simple example: asking an LLM to calculate 12 + 23.
- Semantic knowledge is the understanding of what addition is.
- Procedural knowledge is the “how-to” of performing addition, the step-by-step process.
- Episodic knowledge might be recalling a specific, previous example of addition it has seen.
To solve 12 + 23, the LLM could reason using any of these. Since addition is a well-defined procedure, the knowledge could come from the LLM’s training data or be explicitly provided in the context (the prompt).
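As an illustration, here is what explicitly providing procedural knowledge in the context might look like. This sketch assumes the OpenAI Python SDK; the model name and prompt wording are placeholders:

```python
# A sketch of supplying procedural knowledge in the context (the prompt),
# rather than relying on the model's pre-trained knowledge.
from openai import OpenAI

client = OpenAI()

# The step-by-step procedure for addition, made explicit in the context:
prompt = (
    "To add two numbers, work digit by digit from right to left, "
    "carrying 1 whenever a column sums past 9. Show each step.\n\n"
    "What is 12 + 23?"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: any chat-capable model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Without the first paragraph of the prompt, the model would have to fall back on whatever procedural or episodic knowledge it picked up during training.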
Case Studies: A Tour of Different Agent Occupations
General-Purpose Agent
When we say “general-purpose,” we mean an agent that can tackle tasks from different “occupations,” much like a versatile human freelancer. Take Manus, for example, which was developed to handle complex real-world tasks autonomously, from writing and deploying code to analyzing data and generating reports.
The main challenge for these agents isn’t just reasoning; it’s the breadth of actions they can perform. To be truly general-purpose, an agent needs a vast toolkit—browsers, code editors, API connectors, and more—and the intelligence to know which tool to use when. The push for more robust reasoning and perception in these agents is driven by the need to handle the unpredictable nature of these diverse tasks.
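One common pattern for managing that breadth is a tool registry the reasoning step can dispatch into. A minimal sketch, where the tool names and their stub bodies are illustrative assumptions:

```python
# A minimal sketch of a tool registry for a general-purpose agent.
# The tools here are stubs; real ones would wrap a browser, editor, etc.
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function as a tool the agent can invoke by name."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return decorator

@tool("browser")
def browse(url: str) -> str:
    return f"<contents of {url}>"  # stub: a real tool would fetch the page

@tool("code_editor")
def edit_file(path: str, patch: str) -> str:
    return f"applied patch to {path}"  # stub: a real tool would write to disk

def act(tool_name: str, **kwargs) -> str:
    """The action step: dispatch whatever tool the reasoning step chose."""
    return TOOLS[tool_name](**kwargs)

print(act("browser", url="https://example.com"))
```

The reasoning step (an LLM call, elided here) chooses the tool name and arguments; the hard part in practice is describing each tool well enough that the model picks the right one.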
Deep Research Agent
A deep research agent’s “job” is to dive into a topic and deliver a comprehensive report. This isn’t just a simple web search: as seen in the research features from OpenAI, Google, and others, it is typically a multi-agent workflow.
The process typically looks like this (a minimal code sketch follows the list):
- Planner Agent: First, a planner agent analyzes the research query and breaks it down into a series of smaller, specialized research tasks.
- Sub-Agents for Parallel Research: Multiple “sub-agents” then work in parallel, each tackling one of the sub-tasks by searching the web, databases, and other sources.
- Writer Agent: Once the research is complete, a writer agent synthesizes all the findings into a single, coherent report.
- Verification and Iteration: Some advanced systems even have a loop where the agent analyzes its own report for gaps and can trigger another round of research to improve the quality.
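A minimal sketch of that pipeline, with plan/research/write stubs standing in for LLM calls (the function names are mine, not any vendor’s API):

```python
# A minimal sketch of the planner -> parallel sub-agents -> writer pipeline.
# Each function is a stub for an LLM-backed step.
import asyncio

async def plan(query: str) -> list[str]:
    # Planner: break the query into smaller research tasks.
    return [f"{query}: background", f"{query}: recent work", f"{query}: open questions"]

async def research(subtask: str) -> str:
    # Sub-agent: would search the web, databases, and other sources.
    await asyncio.sleep(0)
    return f"findings for '{subtask}'"

async def write_report(findings: list[str]) -> str:
    # Writer: a real agent would synthesize, not just concatenate.
    return "\n".join(findings)

async def deep_research(query: str) -> str:
    subtasks = await plan(query)
    findings = await asyncio.gather(*(research(t) for t in subtasks))  # in parallel
    report = await write_report(list(findings))
    # A verification loop would inspect the report for gaps here and,
    # if needed, trigger another round of research.
    return report

print(asyncio.run(deep_research("AI agents")))
```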
The key here is not just finding information but structuring the inquiry, synthesizing disparate sources, and presenting the result in a useful format.
Customer Support Agent
For a customer support agent, the most critical skill is following specific instructions to resolve customer issues effectively and empathetically. These agents are becoming increasingly sophisticated, moving beyond simple, scripted chatbots.
Modern customer support agents, like those offered by platforms such as ElevenLabs, Bland AI, or Decagon, can:
- Understand Natural Language: They use Natural Language Processing (NLP) to grasp the customer’s intent and sentiment, even if it’s phrased poorly.
- Automate Repetitive Tasks: They handle the bulk of routine inquiries like order status checks or password resets, which can make up 60-80% of support tickets.
- Integrate with Tools: They connect to back-end systems like CRMs to process refunds, update customer records, or schedule appointments.
- Escalate Intelligently: When an issue is too complex, the agent can seamlessly escalate it to a human agent, providing the full context of the conversation so the customer doesn’t have to repeat themselves.
The goal is to provide fast, accurate, and 24/7 support, freeing up human agents to focus on the most complex and sensitive customer problems.
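To make the escalation behavior concrete, here is a minimal sketch. The intent labels, the confidence threshold, and the back-end stubs are illustrative assumptions, not any platform’s actual API:

```python
# A minimal sketch of intent routing with intelligent escalation.
# Intent labels, the 0.6 threshold, and the back-end stubs are assumptions.
from dataclasses import dataclass

@dataclass
class Turn:
    intent: str        # e.g. produced by an NLP intent classifier
    confidence: float  # classifier confidence in [0, 1]
    transcript: str    # full conversation so far

# Placeholder back-end integrations (stand-ins for CRM/API calls):
def check_order() -> str:
    return "Your order ships tomorrow."

def reset_password() -> str:
    return "A reset link has been sent to your email."

def escalate_to_human(transcript: str) -> str:
    # Hand off the full transcript so the customer doesn't repeat themselves.
    return f"Connecting you to a specialist (context attached: {len(transcript)} chars)."

ROUTINE = {"order_status": check_order, "password_reset": reset_password}

def handle(turn: Turn) -> str:
    if turn.intent not in ROUTINE or turn.confidence < 0.6:
        return escalate_to_human(turn.transcript)
    return ROUTINE[turn.intent]()

print(handle(Turn("order_status", 0.92, "Where is my order?")))
print(handle(Turn("billing_dispute", 0.41, "I was double charged last month.")))
```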
Conclusion
So while all of these agents share the same observe-reason-act loop, each “occupation” emphasizes a different part of it: general-purpose agents need breadth of action, research agents need structured planning and synthesis, and support agents need instruction-following and communication. The label “agent” is the same; the job requirements are not.