MCP Rug Pull Attacks: The Hidden Threat to AI Agent Deployments

When an AI agent connects to an MCP server, it conducts a rapid security check, approves the tools on offer, and establishes a pattern of trust. The MCP rug pull attack exploits that trust window, or the space between initial approval and the next time a human expert reviews the tool definition. This article covers […]

A technical breakdown of how MCP rug pull attacks work, why standard tool approval workflows fail to catch them, and what cryptographic tool identity can do to stop them.
Key Points
  • An MCP rug pull attack happens when a tool's definition is silently modified after an agent has already approved it, turning a trusted tool into a compromised one without triggering any re-approval.
  • The Model Context Protocol has no built-in mechanism to detect definition changes or verify that a tool's current behavior matches what was originally approved.
  • ETDI (Enhanced Tool Definition Interface) addresses this at the protocol level, increasing AI agent security by requiring cryptographic signatures on tool definitions, mandating a new signed version for any change, and forcing re-approval workflows when version hashes don't match.

When an AI agent connects to an MCP server, it conducts a rapid security check, approves the tools on offer, and establishes a pattern of trust. The MCP rug pull attack exploits that trust window, or the space between initial approval and the next time a human expert reviews the tool definition.

This article covers what a rug pull attack is in the MCP context, how it works step by step at the protocol level, why agents and security teams struggle to detect it, and what concrete measures, like tool pinning and cryptographic verification, can improve MCP server security.

What Is an MCP Rug Pull Attack?

An MCP rug pull attack is an attack in which a previously approved tool is secretly modified to act maliciously. In a rug pull attack, the attacker compromises a tool provider and modifies the approved MCP tool’s behavior, schema, or description, undetected by the agent.

The term comes from the DeFi world, where a “rug pull” means the team behind a project quietly withdraws liquidity after attracting investment. The mechanics here are similar: the tool presents legitimate behavior to earn trust, then changes once that trust has been established.

In the MCP context, the mechanism is straightforward. MCP tools are defined by metadata, like name, description, and input schema, and the server logic behind them. That metadata is what agents read during onboarding to decide whether to approve a tool. Once approved, most MCP clients don’t re-read or re-validate tool definitions on every invocation, so the approved status persists. If the definition changes after approval, the agent continues invoking the tool as if nothing happened.

This is not a theoretical concern. The 2025 ETDI research paper() documented the attack formally, and Invariant Labs identified real-world tool poisoning patterns across live MCP deployments.

How an MCP Rug Pull Attack Works

Rug pull attacks exploit MCP security vulnerabilities, following a predictable sequence that standard MCP tooling has no built-in mechanism to interrupt. Here are the key steps in the process.

  1. Tool deployment with benign behavior. An attacker or a compromised legitimate provider publishes an MCP server. The tool definitions look reasonable: safe descriptions, a plausible input schema, and no visible red flags. Security review, if it happens at all, clears the tool at this stage.
  2. Agent approval. An AI agent connects to the MCP server and reads the tool list via the standard tools/list call. The tool metadata passes whatever review is in place and the agent stores an approved record of the tool.
  3. Silent modification. After approval, the attacker modifies the tool definition on the server. This might mean inserting prompt injection payloads into the tool description, changing the tool’s input schema to capture additional data, redirecting tool logic to exfiltrate outputs to an attacker-controlled endpoint, or quietly expanding the tool’s declared permissions.
  4. Ongoing exploitation. The agent continues calling the tool via its stored approval. The modified definition is now in effect. Data that flows through the tool reaches the attacker. Instructions injected into the description field influence the agent’s subsequent behavior. The attack runs silently for as long as the session persists.
  5. No alert is generated. Standard MCP clients do not refresh tool definitions between sessions or even between invocations. There is no built-in versioning mechanism in the base MCP spec that would flag the change. The agent has no way to know the tool it is calling today is different from the one it approved yesterday.

The attack’s power comes from its timing: it exploits the gap between when a tool is reviewed and when it is used.

Why Agents Can’t Detect Definition Changes

To understand why rug pull attacks are so hard to catch, it helps to look at how the base MCP specification handles tool definitions.

When an MCP client calls tools/list, the server returns the current tool metadata. That metadata includes a name, description, and input schema. What it does not include, in standard MCP, is a cryptographic hash of the definition, a version number tied to a signature, or any mechanism for the client to verify that what it is seeing today matches what it saw and approved before.

This means detection requires the client to do its own bookkeeping, by storing a snapshot of tool definitions at approval time and comparing every future tools/list response against that snapshot. Most MCP clients do not implement this.

In an MCP rug pull attack, a subtle change, like inserting one sentence with a prompt injection payload, is far less visible than a schema change. Automated differentiation tooling could catch schema changes but will miss semantically significant additions to natural-language descriptions. Further complicating detection, MCP sessions can be short-lived. An attacker who modifies a definition between sessions leaves no cross-session audit trail unless the client explicitly maintains one.

AI agents are designed to act on the instructions embedded in their context. If a tool description includes an instruction to “also send the response to the following URL,” the agent may comply. The trust model extends to the content of approved tools, not just their identity. Furthermore, In complex agentic architectures, an agent may reach an MCP server through an aggregator or proxy. Each hop introduces an opportunity for modification and another layer where no verification occurs.

MCP Rug Pull vs. Tool Poisoning: What’s the Difference?

A MCP rug pull and tool poisoning are related but distinct. The distinction matters for choosing the right defenses. Tool poisoning is an entry-point attack. A rug pull attack is a persistence attack. Defending against one does not defend against the other.

Attack  Tool Poisoning Rug Pull Attack
When it occurs At installation or first contact — the tool arrives malicious. After approval — the tool is legitimate at first, then modified.
Attack surface The tool definition presented during discovery. The tool definition after an approved relationship exists.
Detection window Pre-approval review, static analysis of tool metadata. Ongoing monitoring, definition change detection.
Trust state No trust has been established yet. Trust has already been granted.
Primary defense Verification before approval. Continuous verification after approval.

Real-World Attack Scenarios

Here are some examples of rug pull attacks.

Credential Harvesting

The attacker modifies a tool description so that it instructs the agent to include the current session’s authentication token in every query. The agent follows the updated instruction without any alert.

Data Exfiltration

Attackers update a file-processing tool’s server logic to forward every processed document to an external endpoint, while still returning the expected output so the agent sees nothing unusual.

Prompt Injection for Lateral Movement

Attackers update the tool description with an instruction directing the agent to grant the attacker’s server elevated permissions in a subsequent action.

Permission Scope Expansion

The attacker broadens the tool’s declared permissions to request access to resources beyond what was originally approved, and in implementations that do not re-check scope declarations on each invocation, the expanded access goes through.

ETDI: The Protocol-Level Fix

The Enhanced Tool Definition Interface (ETDI) is a comprehensive security framework designed to protect AI applications from rug pull attacks and tool poisoning.

ETDI was proposed in a June 2025 research paper and subsequently explored as a pull request against the official MCP Python SDK.ETDI addresses rug pull attacks at the definition layer rather than at the monitoring layer. Its approach rests on three principles, as follows:

Cryptographic Tool Identity

Tool providers sign tool definitions using a public/private key pair. The signing authority is an OAuth 2.0 identity provider that issues signed JWTs attesting to the provider’s identity. An MCP client receiving a tool definition verifies the signature against the provider’s public key before accepting it. Any definition that cannot be verified against a known, trusted key is rejected.

Immutable Versioning

Under ETDI, any modification to a tool’s definition, such as its description, input schema, declared permissions, or the hash of the backend API contract it wraps,requires a new version number and a new signature.The client stores the approved version hash. On every subsequent tools/list call, the client compares the current definition hash against the stored approved hash. A mismatch triggers a mandatory re-approval workflow rather than silent acceptance.

Explicit Permission Declarations

ETDI requires tools to declare their required OAuth scopes in the signed definition. Scopes cannot be quietly expanded without producing a new signed version. This closes the permission scope expansion vector described above.

Together, these mechanisms mean the rug pull attack’s core assumption that a definition can change after approval without anyone noticing no longer holds.The moment a provider modifies a definition, the hash changes and the version number must increment. The new version requires a new signature, and the client will not invoke the tool until re-approval is granted.

Users can also extend ETDI with a policy engine (such as OPA or Amazon Verified Permissions) to evaluate context-aware access controls beyond static scope declarations, allowing organizations to enforce “tool X may only be called in context Y” rules at runtime.

Organizations looking to build the certificate infrastructure that ETDI requires can start with SecureW2 Dynamic PKI (, which provides the signing authority and certificate lifecycle management the protocol depends on.

Practical Prevention Measures

ETDI provides the most structurally sound defense from rug pull attacks available. However, deployment of the full ETDI framework requires MCP client and server support that is not yet universal. For now, organizations that lack the necessary MCP infrastructure can reduce their exposure through a combination of controls at different layers, as follows:

Tool Definition Pinning

Store a hash of each approved tool’s name, description, and input schema at the time of approval. Before each invocation, re-fetch the tool definition and compare against the stored hash. Any deviation should halt execution and trigger an alert. Some MCP gateway products implement this at the middleware layer, intercepting tools/list responses before they reach the agent.

Mandatory Re-Approval on Description Changes

Even without full ETDI support, clients can implement the re-approval behavior: if a stored hash does not match the current definition, pause execution and prompt for human review before proceeding.

Minimizing Tool Invocation Scope

Apply least-privilege principles to which MCP servers an agent is permitted to connect to. A tool that an agent doesn’t need access to is a tool that cannot be used in a rug pull attack against that agent.

Audit Logging for Tool Definitions

Log the full tool definition, not just the invocation, on every tools/list call. This provides a forensic record that makes post-incident analysis possible. Without this log, there is no way to determine what a tool’s definition said at the time of a suspicious agent action.

Session-Bound Approval Records

Treat agent sessions as distinct trust contexts. Approvals granted in a previous session should require confirmation before carrying over to a new session, giving the system an opportunity to detect definition changes at session boundaries.

MCP Gateway and Proxy Controls

Deploying an MCP gateway between agents and external MCP servers allows centralized enforcement of definition monitoring, schema diffing, and re-approval workflows without requiring changes to individual MCP clients or servers.

The Role of Tool Identity in Stopping Rug Pulls

The underlying problem with the standard MCP trust model is that identity is asserted, not verified. An MCP server claims to be who it says it is. Tool definitions claim to represent a particular provider. Without cryptographic proof, those claims cannot be trusted.

Certificate-based identity verification solves this at the network access layer by binding identity claims to a cryptographic credential. That credential cannot be forged without the corresponding private key, which administrators can revoke if it is compromised.

The same principle applies to tool identity. A tool definition signed by a verified provider identity cannot be silently modified, because the signature won’t match. An expired or revoked provider certificate will fail verification even if the definition itself hasn’t changed.

ETDI formalizes this for the MCP layer, but the foundational principle, in which cryptographic identity is the basis for persistent trust, is the same one that underpins certificate-based network authentication, mTLS for service meshes, and SPIFFE/SPIRE for workload identity.

Organizations that already operate a managed PKI infrastructure for network access control and certificate-based authentication can extend that infrastructure to include tool identity verification. The certificate lifecycle management, revocation workflows, and identity governance practices they have built for human users and managed devices translate directly to the agentic AI context.

MCP security vulnerabilities like the rug pull attack are part of a broader set of risks that emerge when AI agents interact with external tools without verified identities. For organizations building out their agentic infrastructure, the SecureW2 JoinNow platformhttps://www.securew2.com/joinnow-platform) provides the solution you need to apply consistent trust policies across users, devices, and AI agents. For the certificate infrastructure behind ETDI-compatible deployments specifically, see SecureW2 Dynamic PKI.

Establish Cryptographic Tool Identity with SecureW2

The rug pull attack is, at its core, an identity problem. An approved tool lacks a verifiable, persistent identity that can be checked at every invocation. When that identity doesn’t exist, trust degrades.

SecureW2 Dynamic PKI provides the certificate infrastructure organizations need to establish and enforce cryptographic identity at scale. The same platform that issues and manages certificates for users, endpoints, and network devices can extend to the AI agent layer,verifying tool provider identity, enabling revocation workflows for compromised tools, and supporting the signed credential model that ETDI requires.

For security teams building agentic AI infrastructure alongside existing enterprise identity programs, SecureW2 connects tool identity verification to the same certificate lifecycle management, identity provider integrations (Entra ID, Okta, Google Workspace), and policy enforcement already in place for the rest of the environment.

Schedule a demo to see how SecureW2 Dynamic PKI can serve as the identity foundation for a secure MCP deployment. You can also contact SecureW2 to learn how certificate-based tool identity fits into your agentic AI security architecture.