When ChatGPT first launched, it was mostly a text generator. You asked a question, and it predicted the next word. It was a brain in a jar.
But today, if you use ChatGPT or Gemini, you'll notice they can browse the internet, generate images, or run Python code to analyze data. They have evolved from simple models into AI Agents.
But how does that actually work? How does a text model suddenly learn to "click" a button or "edit" a file?
The first component is the brain: the LLM itself. This is where reasoning and planning happen. The model decides what needs to be done.
The second component is the tools. This is how the agent acts. A raw LLM cannot browse the web, read files, or run code on its own. Tools, callable functions such as web search, file access, or code execution, give the model a way to interact with the outside world.
An agent’s capabilities are defined entirely by its tools. An AI Agent is essentially a program where LLM outputs orchestrate tool execution.
Putting it together, every agent has three parts.
The brain (LLM): the core intelligence that analyzes problems and decides what to do next.
The tools: these allow the agent to interact with files, the web, databases, or code execution environments.
The instructions: these guidelines keep the agent focused, safe, and aligned with its intended purpose.
That’s the entire system: a model that thinks, tools that act, and instructions that keep it on track.
We classify agents based on how much autonomy they have:
Workflows are systems where the path is pre-defined. It is like a train on a track. The code says: "First do A, then do B, then do C." The LLM just helps along the way.
Agents are dynamic, like driving a car off-road. The LLM is in the driver's seat. It decides: "The road is blocked here, so I will try this other route." The LLM maintains control over how it achieves the given task.
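The difference can be sketched in a few lines of Python. Everything here is a scripted stand-in for illustration: `fake_llm` is a canned model that "decides" from what it has seen, and `ROADS` fakes the environment.

```python
def fake_llm(prompt: str) -> str:
    """Scripted stand-in for a real model call, not an actual API."""
    if "arrived" in prompt:
        return "DONE: reached town via route B"
    if "route A is blocked" in prompt:
        return "take_route:B"          # the agent adapts: try another route
    return "take_route:A"              # first attempt

ROADS = {"A": "route A is blocked", "B": "arrived via route B"}

# Workflow: the code hard-wires the path. The LLM never changes it.
def workflow() -> str:
    return ROADS["A"]                  # always step A, even when A is blocked

# Agent: the LLM reads the history and chooses the next step itself.
def agent(goal: str) -> str:
    history = [goal]
    while True:
        decision = fake_llm("\n".join(history))
        if decision.startswith("DONE"):
            return decision
        _, route = decision.split(":")
        history.append(ROADS[route])   # the result feeds the next turn
```

The workflow always drives into the blocked road; the agent notices the blockage in its history and reroutes, which is exactly the autonomy distinction above.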
How does an AI Code Agent go from a request like "Edit the is_prime() function in test.py" to actually finishing the task?
Here is the step-by-step journey of a task in an AI Code Editor:
The agent starts with the user's goal. But it’s blind. It needs to see where it is.
It selects a tool to list the files in the directory and discovers test.py.
The agent thinks: "Okay, I see the file. Now I need to read it to understand the code before I edit it."
When the LLM decides to use a tool, it doesn't execute anything directly. Instead, it outputs structured JSON like:
{ "tool": "read_file", "arguments": { "path": "test.py" } }
The orchestration layer parses this output, executes the actual function, and feeds the result back to the model. The LLM only speaks — the environment acts.
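A minimal sketch of that orchestration layer might look like this. The in-memory `FILES` dict and the `read_file` implementation are hypothetical stand-ins for the real environment.

```python
import json

# Hypothetical environment: a fake filesystem the orchestrator can touch.
FILES = {"test.py": "def is_prime(n): ..."}

def read_file(path: str) -> str:
    return FILES[path]

TOOLS = {"read_file": read_file}

def dispatch(llm_output: str) -> str:
    """Parse the model's structured output and run the matching tool."""
    call = json.loads(llm_output)        # the model only emits JSON
    tool = TOOLS[call["tool"]]           # look up the real function
    return tool(**call["arguments"])     # the environment does the work

result = dispatch('{ "tool": "read_file", "arguments": { "path": "test.py" } }')
# `result` is then appended to the conversation and sent back to the model.
```

The key design point: the model never holds a file handle or a network socket. It proposes; the dispatcher disposes.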
Now the agent has the file content in its memory. It looks at the code, finds the function, and plans the specific edit. It then calls the edit_file tool.
This loop continues until the task is complete. The agent uses its memory to store what it has done in previous steps to make better decisions next time.
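The whole observe-think-act loop, with the growing history acting as memory, can be sketched as follows. `fake_llm` is again a scripted stand-in that picks the next tool call from what the history already contains.

```python
import json

def fake_llm(history: list[str]) -> str:
    """Scripted stand-in: decide the next tool call from the history."""
    if not any("test.py" in h for h in history):
        return '{"tool": "list_files", "arguments": {}}'
    if not any("def is_prime" in h for h in history):
        return '{"tool": "read_file", "arguments": {"path": "test.py"}}'
    return "DONE"

# Hypothetical environment and tools.
FILES = {"test.py": "def is_prime(n):\n    return n > 1"}
TOOLS = {
    "list_files": lambda: " ".join(FILES),
    "read_file": lambda path: FILES[path],
}

def run_agent(goal: str) -> list[str]:
    history = [goal]                     # memory: everything seen so far
    while True:
        decision = fake_llm(history)
        if decision == "DONE":
            return history
        call = json.loads(decision)
        result = TOOLS[call["tool"]](**call["arguments"])
        history.append(result)           # feed the observation back in
```

Starting from a bare goal, the loop first lists files, then reads `test.py`, and only stops once its memory contains what it needs: the same blind-then-sighted progression described above.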
Why do complex agents consume so much computational power and tokens? For a human, editing a line is quick. For an agent, it’s a multi-step process:
At every step, the agent must process the entire history, because LLMs are stateless: each step includes all previous context, tool calls, and results.
A simple four-step task can consume thousands of tokens, even if the final edit is only a single line of code.
This is why even simple agentic tasks are computationally heavy. The agent continuously maintains world state, re-reads context, and ensures nothing breaks.
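Because the full history is resent on every step, total tokens processed grow much faster than the history itself. The per-step numbers below are purely illustrative, not measurements:

```python
# Hypothetical tokens added at each step (goal, tool calls, tool results).
step_additions = [400, 50, 900, 120]   # e.g. goal, file listing, file body, edit

total = 0
history = 0
for added in step_additions:
    history += added    # each step appends to the context...
    total += history    # ...and the whole context is processed again

print(total)            # prints 3670: thousands of tokens for a one-line edit
```

With these made-up numbers, 1,470 tokens of history cost 3,670 tokens of processing across four steps, and the gap widens as the loop gets longer.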
For a code-editing agent, the Body (the toolset, in contrast to the LLM brain) usually includes a small, well-defined set of tools: listing files, reading file contents, editing files, and running code.
The LLM does not manipulate files directly. It acts as the decision-maker, selecting tools and supplying arguments. The environment executes the actions and returns results to the model.
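In practice, the toolset is declared as schemas that are shown to the model, in the style of function-calling APIs. The names and parameter fields below are illustrative, not any specific vendor's format:

```python
# Hypothetical tool schemas a code-editing agent might expose to the model.
TOOL_SCHEMAS = [
    {
        "name": "list_files",
        "description": "List files in the working directory.",
        "parameters": {},
    },
    {
        "name": "read_file",
        "description": "Return the contents of a file.",
        "parameters": {"path": "string"},
    },
    {
        "name": "edit_file",
        "description": "Replace a span of a file with new text.",
        "parameters": {"path": "string", "old": "string", "new": "string"},
    },
]

# The environment keeps the real implementations; the model only ever
# sees these descriptions and emits calls against them.
```

Keeping the set small and the descriptions precise matters: the model can only act through what these schemas advertise.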
We are moving toward a world where we don’t just chat with AI — we collaborate with it. Whether it’s ChatGPT searching the web or a code editor fixing a bug, the underlying architecture is the same.
It is a Brain (LLM) using Tools (Functions) to interact with an Environment, constantly looping and iterating until the job is done.
Once you understand this structure, AI agents become clear: they are systems of logic, memory, and action working together to extend human capability.