Software architecture has long been rooted in object-oriented and, later, service-oriented paradigms. These models have helped teams build modular systems, isolating behavior into manageable services that communicate over well-defined APIs. As systems grew, microservices brought benefits like scalability and decoupling, but also introduced significant complexity in orchestration.
Today, we’re witnessing a fundamental shift. The growing influence of foundation models, particularly large language models (LLMs), is changing how we approach software design. These models aren’t just code libraries; they can understand context, reason about goals, and generate human-like responses. This has led to the rise of agent-oriented programming, where autonomous agents, not statically programmed services, drive system behavior. In this new paradigm, agents are constructed from language models, structured prompts, memory layers, and external tools.
What drives them is the cognitive loop: a cycle in which an agent processes input, reasons over its state, takes actions using tools, and updates its memory. As small language models (SLMs) become more capable, this loop-centric design is evolving to balance performance with flexibility and cost-efficiency.
## The cognitive architecture
At the core of a cognitive architecture is a language model, effectively the brain of the system. This model is responsible for interpreting input, reasoning about goals, and planning actions. But reasoning alone is not enough. Just like the human brain depends on sensory organs and muscles to perceive and act on the world, an intelligent agent must be able to access and manipulate external systems in a structured way. This is the essence of agentic AI: giving models the ability to act, not just think.
One emerging approach to enable this interaction is the Model Context Protocol (MCP), an open standard developed by Anthropic. MCP aims to provide a standardized interface through which models can retrieve contextual information and invoke tools in their environment. However, it’s important to note that MCP is still an early attempt, promising but not yet an established standard. It represents a broader effort across the AI community to define patterns and protocols that allow agents to interface safely and reliably with external components such as APIs, databases, and services.
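To make the shape of this interface concrete, the sketch below shows roughly what an MCP tool invocation looks like on the wire. MCP messages follow JSON-RPC 2.0; the tool name and arguments here are hypothetical, not part of any particular server.

```python
# Illustrative MCP "tools/call" request, shown as the Python dict that would
# be serialized to JSON. The tool name and its arguments are hypothetical.
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_orders",  # a tool exposed by some MCP server
        "arguments": {"status": "pending", "limit": 10},
    },
}
```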
In systems that use MCP or similar abstractions, the architecture separates reasoning from execution: the model focuses on understanding, planning, and decision-making, while dedicated tooling (like an MCP server) handles the actual execution of external operations. This creates a cognitive loop: the model observes inputs (from the user, sensors, or past interactions), interprets them using memory and reasoning, then takes action through tools that generate new inputs and continue the cycle.
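Stripped to its skeleton, that loop is easy to express in code. The sketch below is a minimal illustration, assuming two hypothetical helpers: `reason()` stands in for a language-model call and `execute_tool()` for an MCP-style tool executor.

```python
def reason(observation: str, memory: list[str]) -> dict:
    # Stand-in: a real implementation would call a language model with the
    # observation, the memory, and the available tool descriptions.
    if observation.startswith("result"):
        return {"type": "final_answer", "content": observation}
    return {"type": "tool_call", "tool": "search", "arguments": {"query": observation}}


def execute_tool(tool: str, arguments: dict) -> str:
    # Stand-in: a real implementation would dispatch to an external system
    # (an API, a database, or an MCP server).
    return f"result of {tool}({arguments})"


def cognitive_loop(observation: str) -> str:
    memory: list[str] = []
    while True:
        decision = reason(observation, memory)   # interpret input, plan next step
        if decision["type"] == "final_answer":
            return decision["content"]           # goal reached: exit the loop
        # Act through a tool; the result becomes the next observation.
        observation = execute_tool(decision["tool"], decision["arguments"])
        memory.append(f"{decision['tool']} -> {observation}")  # update memory


print(cognitive_loop("find red dresses"))
```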
The choice of model driving this architecture is essential. Large language models (LLMs) and small language models (SLMs) offer distinct trade-offs depending on the complexity of the task, the resource constraints, and the required level of reasoning. LLMs such as GPT-4, Claude, and Gemini are trained on massive corpora and exhibit broad generalization, abstraction, and conversational capabilities. They can manage multi-turn dialogues, resolve ambiguity, and reason across diverse domains.
However, they come at a high computational cost and typically require substantial infrastructure to operate efficiently. On the other hand, SLMs like DistilBERT, TinyLLaMA, and Phi-2 are optimized for speed and efficiency. They are lightweight, often open-source, and can be deployed on edge devices or in environments with limited resources. While their reasoning capabilities are narrower and their context windows smaller, they are highly effective for specialized, domain-specific tasks where determinism and performance are prioritized over generalization. This naturally leads to hybrid system designs, where LLMs are responsible for global coordination and strategy, while SLMs handle routine or narrowly scoped operations.
Below is a comparison highlighting the core differences between the two:
| Feature | Large Language Models (LLMs) | Small Language Models (SLMs) |
| --- | --- | --- |
| Examples | GPT-4, Claude, Gemini | DistilBERT, TinyLLaMA, Phi-2 |
| Model size | Billions to trillions of parameters | Tens to hundreds of millions of parameters |
| Reasoning ability | High; can handle abstract, multi-step tasks | Limited to focused, well-defined tasks |
| Context window | Large (32k–128k tokens) | Small to medium (512–8k tokens) |
| Inference cost | High | Low |
| Deployment | Cloud, high-performance infrastructure | Edge, browser, lightweight servers |
| Use cases | Complex workflows, multi-agent coordination | Classification, log parsing, quick lookups |
Most cognitive systems benefit from hybrid designs, where an LLM oversees high-level reasoning and coordination while SLMs handle specialized, well-scoped operations, combining performance, adaptability, and cost-efficiency.
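A minimal sketch of such a routing layer might look like the following. The task names, model names, and `call_model` helper are illustrative assumptions, not a specific vendor API.

```python
# Route narrow, well-defined requests to a cheap SLM; everything else to an LLM.
ROUTABLE_TASKS = {"classify_ticket", "parse_log", "lookup_sku"}


def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real inference call (local SLM runtime or hosted LLM API).
    return f"[{model}] response to: {prompt}"


def route(task: str, prompt: str) -> str:
    if task in ROUTABLE_TASKS:
        # Narrow, deterministic work goes to a small local model.
        return call_model("phi-2-local", prompt)
    # Open-ended reasoning and coordination go to a large model.
    return call_model("large-hosted-llm", prompt)


print(route("parse_log", "ERROR 500 at /checkout"))
print(route("plan_campaign", "Draft a product-launch plan"))
```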
## From multi-service to multi-agent architectures: patterns for making agents work together
As cognitive architectures mature, they evolve from handling isolated use cases to coordinating distributed tasks across multiple agents. This mirrors the shift from monolithic applications to microservice-based designs — only here, the components are intelligent agents that understand goals, reason about actions, and collaborate toward shared outcomes.
In multi-agent architectures, each agent can be powered by the same or different language models, and they may have overlapping or distinct toolsets. Often, agents are also assigned specific personas or domains of expertise, allowing them to handle different parts of a broader workflow. The structure of multi-agent systems generally falls along a spectrum between two extremes: vertical and horizontal coordination.
In vertical architectures, one agent plays the role of leader, orchestrating others and delegating responsibilities in a top-down manner. Communication typically flows through this central agent, though in some cases, all agents may share a joint conversational thread overseen by the leader. These systems work well for hierarchical workflows that benefit from clear task separation and control. In contrast, horizontal architectures treat all agents as peers. Each agent can see the shared context and respond accordingly, contributing ideas, solving tasks, or calling tools independently. These systems are better suited for collaborative environments where feedback, shared reasoning, and open discussion improve task outcomes.
Whether organized hierarchically or as peers, these agents can exchange information through memory structures, direct messaging, or via orchestration protocols such as A2A (Agent-to-Agent). As a result, systems gain modularity and resilience: agents can be updated or swapped without affecting the overall design, and capabilities can grow organically by expanding the agent set.
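As a minimal illustration of the vertical pattern, the sketch below shows a leader delegating to specialist agents. The classes, names, and hard-coded plan are assumptions for illustration; a real leader would use a language model to decompose the goal and route messages (for example, over A2A).

```python
class SpecialistAgent:
    def __init__(self, name: str, skill: str):
        self.name, self.skill = name, skill

    def handle(self, subtask: str) -> str:
        # Stand-in for the specialist's own reasoning loop.
        return f"{self.name} completed '{subtask}' using {self.skill}"


class LeaderAgent:
    def __init__(self, team: dict[str, SpecialistAgent]):
        self.team = team

    def run(self, goal: str) -> list[str]:
        # Stand-in planning step: a real leader would use an LLM to decompose
        # the goal and pick agents; here the plan is hard-coded.
        plan = [("research", "gather requirements"), ("coding", "implement feature")]
        return [self.team[skill].handle(task) for skill, task in plan]


team = {
    "research": SpecialistAgent("Researcher", "web search"),
    "coding": SpecialistAgent("Coder", "code generation"),
}
print(LeaderAgent(team).run("ship the new checkout flow"))
```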
## Agent-oriented systems: an interactive shopping assistant example
To illustrate the capabilities of cognitive architectures, let’s consider an interactive shopping assistant for an e-commerce platform. Unlike a traditional product recommendation system focused on speed and structured queries, this assistant prioritizes a flexible, conversational user experience, allowing users to describe their preferences naturally and refine their choices through dialogue.
Imagine a user looking for a new outfit who might start with a free-form request like: “I’m looking for a red floral summer dress in medium size.” Instead of routing this through predefined APIs and checkboxes, we create an agent powered by an LLM that can understand such nuanced requests, extract product attributes, query the catalog, filter results, and engage in a dynamic conversation to refine the search.
Here’s how this could work with an agent-oriented approach, leveraging frameworks like Google’s Agent Development Kit (ADK), LangChain, or AutoGen. Let’s assume we use ADK and have several tools available to the agent:
- `product_search(query_parameters: dict)`: This tool interacts with the product catalog. Instead of fetching the entire catalog, it takes structured parameters (e.g., `{'color': 'red', 'pattern': 'floral', 'category': 'dress', 'size': 'medium'}`) and returns a filtered list of products. This addresses the context window limitation by allowing the agent to perform targeted searches.
- `image_recognition(image_url: str)`: This tool processes an uploaded image to identify attributes like color, style, and patterns.
- `refine_search(product_id: str, new_parameters: dict)`: Allows the agent to modify an existing search or product selection based on user feedback.
- `user_profile_update(preferences: dict)`: Stores user preferences in a vector memory or database for personalized recommendations in future interactions.
The agent’s workflow would incorporate a planning phase and the ability to handle multi-turn interactions:
- Initial request and intent understanding (LLM): The user says, “I’m looking for a red floral summer dress in medium size.” The LLM agent, acting as the brain, processes this free-form text. It identifies the user’s intent (find a dress) and extracts key attributes: `color: red, pattern: floral, category: dress, size: medium`.
- Planning: The agent determines the best course of action. It decides to use the `product_search` tool first.
- Tool invocation and execution: The agent constructs a structured query based on the extracted attributes and calls the `product_search` tool: `product_search({'color': 'red', 'pattern': 'floral', 'category': 'dress', 'size': 'medium'})`.
- Tool output and response generation (LLM): The `product_search` tool returns a list of matching dresses. The LLM then synthesizes these results into a human-readable response, perhaps showing a few top recommendations with product names and prices. For example: “I found a few red floral summer dresses for you! How about the ‘Crimson Bloom Maxi Dress’ or the ‘Garden Party Midi Dress’?”
- Refinement and dialogue (LLM and tools): The user responds, “I like the Garden Party Midi Dress, but do you have it in blue instead of red?”
  - Reasoning and planning: The LLM understands this is a refinement request. It recognizes the `product_id` (Garden Party Midi Dress) and the new `color: blue`. It plans to use the `refine_search` tool.
  - Tool invocation: The agent calls `refine_search({'product_id': 'Garden Party Midi Dress', 'color': 'blue'})`.
  - Guardrails/validation: If the `refine_search` tool returns no results, the agent is programmed with a fallback: “Unfortunately, the ‘Garden Party Midi Dress’ isn’t available in blue. Would you like to see other blue floral dresses?” This demonstrates a guardrail to ensure a graceful fallback rather than a generic error.
- Image-based search (optional): If the user uploads a picture and says, “Find me something like this,” the agent could leverage the `image_recognition` tool to extract visual attributes, then use `product_search` with those attributes.
Here’s a simplified Python example demonstrating the ADK agent with multiple tools:
```python
import os

import requests
from google.adk.agents import Agent


def product_search(query_parameters: dict) -> dict:
    """Searches the product catalog based on structured query parameters.

    Args:
        query_parameters (dict): A dictionary of parameters like
            {'color': 'red', 'category': 'dress'}.

    Returns:
        dict: The search response or an error message.
    """
    try:
        products_api_url = os.getenv("PRODUCTS_SEARCH_API_PATH")
        if not products_api_url:
            raise ValueError("PRODUCTS_SEARCH_API_PATH not defined.")

        response = requests.get(products_api_url, params=query_parameters)
        response.raise_for_status()
        return {"status": "success", "report": {"data": response.json()}}
    except Exception as e:
        return {"status": "error", "error_message": f"Error searching products: {str(e)}"}


def refine_search(product_id: str, new_parameters: dict) -> dict:
    """Refines an existing product search or modifies parameters for a specific product.

    Args:
        product_id (str): The ID of the product to refine.
        new_parameters (dict): New parameters to apply (e.g., {'color': 'blue'}).

    Returns:
        dict: The updated product information or an error.
    """
    # This would typically interact with a product details API or an update mechanism.
    print(f"Refining product {product_id} with parameters: {new_parameters}")
    # Simulate a successful refinement for demonstration.
    return {"status": "success", "report": {"message": f"Refined search for {product_id} with new parameters."}}


root_agent = Agent(
    name="interactive_shopping_assistant",
    model="gemini-2.0-flash",  # Or a more capable LLM like Gemini 1.5 Pro for complex reasoning
    description=(
        "An agent that provides a conversational interface for product "
        "discovery and recommendations."
    ),
    instruction=(
        "You are a helpful interactive shopping assistant. Understand user "
        "preferences from freeform text or images, use available tools to find "
        "products, and engage in multi-turn dialogues to refine results. "
        "If a search yields no results, suggest alternative options gracefully."
    ),
    tools=[product_search, refine_search],  # Add other tools like image_recognition as needed
)

# Example of agent processing a request (conceptual)
# user_input_1 = "I'm looking for a red floral summer dress in medium size."
# agent_response_1 = root_agent.process_input(user_input_1)
# print(agent_response_1)
#
# user_input_2 = "I like the Garden Party Midi Dress, but do you have it in blue instead of red?"
# agent_response_2 = root_agent.process_input(user_input_2)
# print(agent_response_2)
```
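For completeness, the other two tools from the list above could be sketched in the same style. The bodies below are stubs assumed for illustration: a real `image_recognition` tool would call a vision model to extract attributes, and `user_profile_update` would persist preferences to a vector store or database.

```python
def image_recognition(image_url: str) -> dict:
    """Extracts visual attributes (color, style, pattern) from a product image.

    Args:
        image_url (str): URL of the user-uploaded image.

    Returns:
        dict: Extracted attributes or an error message.
    """
    # Stub: a real implementation would send the image to a vision model
    # and parse its output into structured attributes.
    print(f"Analyzing image at {image_url}")
    return {"status": "success", "report": {"color": "blue", "pattern": "floral", "category": "dress"}}


def user_profile_update(preferences: dict) -> dict:
    """Stores user preferences for personalized future recommendations.

    Args:
        preferences (dict): Preferences to persist (e.g., {'favorite_color': 'blue'}).

    Returns:
        dict: Confirmation of what was stored.
    """
    # Stub: a real implementation would write to a vector memory or database.
    print(f"Persisting preferences: {preferences}")
    return {"status": "success", "report": {"stored": preferences}}
```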
The agentic approach simplifies orchestration logic by allowing the LLM to interpret intent, sequence tool usage, and manage context within a single reasoning loop. This makes the architecture highly adaptive to shifting user expectations and business needs, especially when new product attributes or complex search patterns emerge.
The ability to integrate user feedback in real-time, refine searches conversationally, and handle diverse input modalities (like text or images) demonstrates why cognitive architectures excel in scenarios requiring flexibility and natural interaction.
## Benefits and challenges of cognitive architectures
One of the most significant advantages of cognitive architectures is the natural interface they provide. Users and developers can interact with systems through plain language rather than structured APIs or formal input schemas. This allows faster iteration and reduces the complexity typically associated with tightly coupled service orchestration. But the real shift comes from the agent’s ability to reason.
Reasoning is a core part of human intelligence: it allows us to make informed decisions, adapt to unexpected situations, and learn from new information. The same capabilities are essential for agents. Without reasoning, an agent might take user input too literally, fail to account for multi-step implications, or ignore relevant context. With reasoning, agents can plan, reflect, revise, and make decisions autonomously.
In practice, most agent architectures include a dedicated planning phase, where the model chooses how to act before executing any specific steps. This planning can follow various strategies, such as task decomposition, multi-option evaluation, retrieval-augmented guidance, or plan refinement. More advanced techniques, like representing plans as graphs (e.g., in Plan Like a Graph or PLaG), allow agents to execute steps in parallel, improving performance for workflows with many independent subtasks. The ability to adapt is another key benefit. Agents don’t require redeployment to change behavior: often, changing a prompt or swapping a tool is enough. They can integrate feedback, adjust strategies in real time, and operate in environments where the full task definition is not known upfront.
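To make the plan-as-graph idea concrete, here is a minimal sketch under stated assumptions: the step names and `run_step` stub are hypothetical, the plan is a dependency graph, and every step whose dependencies are satisfied runs in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Each step maps to the steps it depends on.
plan = {
    "fetch_catalog": [],
    "fetch_user_profile": [],
    "rank_products": ["fetch_catalog", "fetch_user_profile"],
    "compose_reply": ["rank_products"],
}


def run_step(step: str) -> str:
    return f"done:{step}"  # stand-in for a tool call or sub-agent invocation


def execute_plan(plan: dict[str, list[str]]) -> dict[str, str]:
    results: dict[str, str] = {}
    remaining = dict(plan)
    with ThreadPoolExecutor() as pool:
        while remaining:
            # All steps whose dependencies are complete can run concurrently.
            ready = [s for s, deps in remaining.items() if all(d in results for d in deps)]
            if not ready:
                raise ValueError("Plan contains a cycle or unsatisfiable dependency.")
            for step, outcome in zip(ready, pool.map(run_step, ready)):
                results[step] = outcome
                del remaining[step]
    return results


print(execute_plan(plan))
```

Here the two fetch steps run in parallel, and ranking waits for both before the reply is composed, mirroring how a graph-shaped plan exposes independent subtasks.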
Of course, there are tradeoffs. Agents must manage limited context windows, which can impact long-running or multi-step tasks. Reasoning itself requires larger models, which increases the cost. Using SLMs can reduce this overhead, but it comes with limited planning and abstraction capabilities. There’s also the challenge of unpredictability. Traditional systems are deterministic and easy to debug. In contrast, agents reason probabilistically, and tracing their decisions isn’t straightforward. Ensuring consistent outputs often means combining language model-based reasoning with guardrails, fallback logic, or rules-based validators, as seen in our shopping assistant example, where the agent gracefully handles unavailable product variations.
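As a small illustration of the rules-based validator idea, the sketch below (with an assumed search schema) checks a model-proposed tool call before executing it and falls back to a clarifying question instead of a generic error.

```python
ALLOWED_SIZES = {"small", "medium", "large"}


def validate_search_call(args: dict) -> tuple[bool, str]:
    # Deterministic checks applied before any tool execution.
    if "category" not in args:
        return False, "Missing product category."
    if "size" in args and args["size"] not in ALLOWED_SIZES:
        return False, f"Unknown size '{args['size']}'."
    return True, ""


proposed_call = {"category": "dress", "size": "xx-petite"}  # proposed by the model
ok, reason = validate_search_call(proposed_call)
if not ok:
    # Graceful fallback instead of executing a malformed tool call.
    print(f"Could you clarify your request? ({reason})")
```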
Finally, observability remains a critical frontier. As reasoning becomes a central part of system behavior, we need better tools to trace decisions, evaluate alternatives, and debug unexpected outputs. This will be key to deploying robust, production-grade agentic systems at scale.
## Conclusion
The movement toward agentic architectures signals a deeper change in how we think about software. Instead of writing detailed instructions and managing services manually, we are increasingly enabling intelligent agents to reason, act, and learn on our behalf. By carefully combining LLMs and SLMs, developers can design systems that are not only more powerful but also more adaptable. The cognitive capabilities of modern models allow us to abstract complexity and work closer to natural human thinking.
Yet this power comes with new responsibilities. As we step into a world of cognitive software, we must rethink reliability, cost management, and transparency. The future of software may not be written in code alone: it may be prompted, reasoned, and evolved through agents that think alongside us.