Agentic AI Security – Part 1: Uncovering Hidden Threats

Dishan M. Francis

14 hours ago

As we all know, AI is evolving at an incredible pace — sometimes it feels like we barely have time to catch our breath. The old ’10-year tech evolution cycle’ no longer applies; groundbreaking innovations are emerging faster than ever. One of the latest is Agentic AI. In fact, interest in the term ‘Agentic AI’ has skyrocketed, reaching its peak on Google search trends within just three months (March to June). There’s clearly a growing buzz and curiosity around it!

What is Agentic AI?

If you search for the term, you’ll come across various interpretations of what Agentic AI means. The image below highlights some of the key terms commonly used to describe it.

As AI continues to evolve at a rapid pace, we’re shifting from traditional predictive models toward autonomous systems—and that’s where Agentic AI comes in. If we consolidate the many definitions out there, most agree on a few common capabilities that define Agentic AI.

Unlike earlier models that simply predict outcomes, Agentic AI can plan and execute tasks autonomously—either on your behalf or entirely on its own. In a single-agent setup, it might complete a focused task, but today’s multi-agent systems allow multiple agents to collaborate, solve complex problems in parallel, and do so without any human involvement. Agentic AI systems has certain characterstics:

Planning: For example, if you ask an agent to generate a sales trends report, it must figure out how to gather data from a database, interact with other agents for inventory info, analyze the data, generate visuals, and present it meaningfully. That’s a lot of behind-the-scenes orchestration.
Memory: Agents need memory—short-term and long-term—to remember context, retain data across tasks, and use it in future interactions.
Reasoning: They also require reasoning and evaluation skills. Repetitive reasoning with feedback loops helps refine outcomes over time.
Action Execution: Ultimately, an agent should be able to take action—like sending emails, updating records, or creating reports.

When I tried to think of the best way to explain Agentic AI, the first thing that came to mind was J.A.R.V.I.S. from Iron Man. Even before the concept of Agentic AI took off, this fictional assistant demonstrated many qualities we now aim for in real-world AI systems.

J.A.R.V.I.S. understands Tony Stark’s voice commands (like today’s NLP), makes autonomous decisions, and constantly monitors its surroundings—just like an AI agent with situational awareness. It plans and executes tasks (like prepping the Iron Man suit), gives proactive alerts about system threats, and learns from Tony’s habits using machine learning. In many ways, Agentic AI is like giving software a brain and purpose—just like J.A.R.V.I.S. helps Tony Stark (minus the flying suit… for now 😊).

That said, the goal of this blog series isn’t to deep-dive into how Agentic AI works, but rather to explore how we secure these powerful new systems. So, I’ll keep the overview simple and shift focus on the security side.

What Security Challenges Agentic AI has ?

Just like any other software system, Agentic AI also has vulnerabilities—but that doesn’t automatically mean it’s a risk. This is a common misconception when discussing solution security. Before we dive into the specific security challenges of Agentic AI, it’s important to clarify this: all software systems have vulnerabilities. For example, unpatched software is a known vulnerability. However, a vulnerability only becomes a “risk” if there’s a real-world threat that can exploit it. Without a relevant threat, the vulnerability remains a potential issue—not an immediate risk.

Risk = Vulnerability + Threat

Think of it this way—having a house on the beach is a vulnerability. A tsunami is a threat. When those two factors combine, they create a risk to life.

Understanding risk is crucial in security. But it’s also important to remember: risk is ultimately a business decision. As security professionals, we can assess threats and provide evidence-based insights, but it’s up to the business to determine how to handle the risk—whether to accept it, mitigate it, transfer it, or avoid it.

In this section, I’ll focus on threats specific to Agentic AI systems, and in future articles, we’ll explore mitigation strategies in more detail.

Not long ago, our focus was on securing GenAI solutions. That journey continues, as new threats emerge rapidly. Fortunately, we now have several frameworks that provide guidance on securing GenAI systems:

MITRE ATLAS Framework – A cybersecurity framework developed by MITRE that organizes real-world tactics, techniques, and case studies of adversarial threats targeting AI systems https://atlas.mitre.org/ .
OWASP Top 10 for LLMs and GenAI Apps – A widely recognized framework that identifies and explains the top risks and mitigation strategies for applications using Large Language Models and Generative AI. https://genai.owasp.org/llm-top-10/

In the world of Agentic AI, many of these threats still apply. Agentic AI systems typically rely on LLMs(unless its traditional rule based agents), accept natural language inputs, process sensitive data, and frequently use Retrieval-Augmented Generation (RAG) as an one of core mechanism. Because of this, many of the known threats in the LLM and GenAI space carry over to Agentic AI as well.

Compromised Intent

Agentic AI systems are designed with clearly defined objectives. For example, a sales agent might be tasked with analyzing monthly sales trends and sending reports to the sales lead. To accomplish this, the system relies on capabilities like planning, reasoning, access to tools, and collaboration with other agents.

However, when the intent of the agent is compromised, an attacker can manipulate these objectives to carry out malicious actions. For instance, an attacker could insert harmful instructions into a tool the agent uses, leading to the exfiltration of sensitive data.

Here are several common methods adversaries use to manipulate an Agentic AI system’s intent:

1. Prompt Injection

This is currently the #1 threat to LLMs, according to OWASP. In this attack, an adversary alters prompts to manipulate the behavior of LLM-powered agents. What’s alarming is how simple such attacks can be—studies like BEST-OF-N Jailbreaking show that even minimal prompt changes can successfully bypass guardrails. The TAP method further demonstrates how AI itself can be used to jailbreak LLMs automatically, reinforcing how rapidly this threat is evolving.

2. Data Manipulation

In some systems, direct prompt injection may be less likely—especially in RAG (Retrieval-Augmented Generation) setups. These systems retrieve context from trusted data sources before generating responses. If relevant context isn’t found, the system typically returns a neutral response.

However, indirect prompt injection still poses a threat. Instead of targeting the model directly, attackers compromise the data sources that the LLM references. For example, if our sales agent pulls data from a shared file repository, an attacker could inject malware or malicious instructions into those files. When the agent accesses and processes this data, it may unintentionally execute the malicious payload—altering its behavior in ways that deviate from its original objectives.

3. Compromising Reasoning or Feedback Loops

Agentic systems often rely on reasoning and feedback loops for iterative decision-making. An attacker can exploit this by triggering infinite or excessive loops, which can degrade performance, consume system resources, and prevent timely or accurate decisions.

4. Cascading Failure Across Multi-Agent Systems

In multi-agent systems, agents work autonomously and communicate with one another to solve complex problems. If an attacker compromises one agent, it can initiate cascading failures across the system—propagating poisoned data or manipulated instructions to other agents. This domino effect can alter the behavior of multiple agents, ultimately undermining the entire system.

Memory Poisoning

Agents in Agentic AI systems leverage both short-term and long-term memory to enrich their decision-making and autonomy. These memory structures are critical for enabling agents to recall past actions, follow predefined workflows, adapt plans, and generate new actions based on accumulated knowledge. Memory is not just a functional component—it defines the agent’s behavior and directly impacts the efficiency and reliability of autonomous operations.

1) RAG-Based Attacks

In Agentic AI, external data sources are often accessed to supplement reasoning with long-term memory. Retrieval-Augmented Generation (RAG) is commonly used to enrich the agent’s context with relevant information. However, this mechanism introduces a significant attack surface. Adversaries can poison the retrieval pipeline by injecting manipulated or misleading data, causing the agent to act on false context. This can lead to sensitive data exfiltration, unauthorized tool usage, or manipulation of system behavior.

2) Memory Exhaustion & Context Overload

Similar to prompt injection and denial-of-wallet attacks in LLMs, Agentic systems are vulnerable to memory exhaustion. Attackers may flood the agent with variable-length inputs or high-volume requests to saturate memory buffers. This degrades the agent’s ability to maintain adaptive context, detect anomalies, or respond to lateral movement. The result is a weakened defense posture and increased risk of undetected misuse or data leakage.

3) Hallucination Attacks

By feeding fabricated or misleading information into the agent’s input stream, attackers can induce hallucinations—responses that appear plausible but are factually incorrect. This undermines the agent’s ability to distinguish truth from fiction, especially in systems that rely on feedback loops for self-learning. If unchecked, hallucinations can propagate through the system, triggering cascading failures across dependent components and workflows.

Identity Compromise

Identity compromise is a universal threat—and Agentic AI systems are no exception. These systems operate with a diverse set of identities, each with varying levels of privilege and impact. Understanding the identity landscape is key to assessing risk:

Privileged identities for platform management: Used for platform-level operations such as resource provisioning, security configuration, and identity management.
Application-level roles: Include app administrators, database admins, and other roles managing application-specific settings.
System identities: Service principals or service accounts used for inter-component communication.
Consumer identities: End users who interact with the Agentic AI system or consume its outputs.

The impact of a compromise depends heavily on the identity type. Here’s how:

1) Consumer Identity Compromise

Consider an internal Agentic AI system that analyses sales data, forecasts trends, and generates strategic plans. This system handles highly sensitive organisational and partner data. If an attacker gains access using a legitimate user account—without any technical exploitation—they can extract valuable insights. Since the attacker operates within the expected access context, detection becomes significantly harder.

2) Platform Administrator Compromise

Agentic AI systems run on cloud-based PaaS or SaaS platforms, managed either manually or via automated pipelines. If an attacker compromises the underlying infrastructure, the consequences extend beyond the AI system itself. They could poison memory, data, or models, introduce malicious tools, or manipulate existing ones to exfiltrate data or leak sensitive information. This is a high-impact vector with systemic implications.

3) Tools and API Access Compromise

Agentic AI systems rely on service accounts and APIs for internal communication—agent-to-agent, service-to-service. These identities often have excessive permissions, far beyond what’s required. If compromised, attackers can exploit this over-privilege to cause significant damage. Worse, such activity is difficult to detect, as requests appear legitimate to the system.

Output Alteration

Agentic AI systems can produce outputs in a variety of ways—ranging from sending reports via email, updating databases, making reservations, processing refunds, to triggering physical actions like shipping a product. In some cases, the output is even passed to another agent for further processing.

If an attacker manages to tamper with this output before it reaches its intended destination—whether a system or a human—they can cause downstream decisions to be made based on false or misleading information. This is distinct from influencing the agent’s decision-making logic (i.e., model behavior manipulation). Here, the agent may have made the correct decision, but the attacker alters the final output en route.

Attackers may exploit vulnerabilities in the tools or APIs used by the Agentic AI system to intercept or modify outputs. This could also manifest as a supply chain attack, where compromised third-party components are used to manipulate results. Additionally, excessive agency—where identities have more permissions than necessary—if compromise, it can allow attackers to inject rogue tools into the system, further expanding the attack surface.

Modified outputs can also be used as a covert channel for data exfiltration, embedding sensitive information in seemingly benign responses.

Final Thoughts

This post covered some of the most critical and common threats facing Agentic AI systems today. However, the threat landscape is evolving rapidly. New risks are emerging as these systems become more autonomous and interconnected.

In the next blog post, we’ll dive into Threat Modeling for Agentic AI Systems—exploring how to systematically identify, assess, and mitigate these risks.