Really good research on practical attacks against LLM agents.
> “Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants
> in Production Are Practical and Dangerous”
>
> Abstract: The growing integration of LLMs into applications has introduced new
> security risks, notably known as Promptware—maliciously engineered prompts
> designed to manipulate LLMs to compromise the CIA triad of these applications.
> While prior research warned about a potential shift in the threat landscape
> for LLM-powered applications, the risk posed by Promptware is frequently
> perceived as low. In this paper, we investigate the risk Promptware poses to
> users of Gemini-powered assistants (web application, mobile application, and
> Google Assistant). We propose a novel Threat Analysis and Risk Assessment
> (TARA) framework to assess Promptware risks for end users. Our analysis
> focuses on a new variant of Promptware called Targeted Promptware Attacks,
> which leverage indirect prompt injection via common user interactions such as
> emails, calendar invitations, and shared documents. We demonstrate 14 attack
> scenarios applied against Gemini-powered assistants across five identified
> threat classes: Short-term Context Poisoning, Permanent Memory Poisoning, Tool
> Misuse, Automatic Agent Invocation, and Automatic App Invocation. These
> attacks highlight both digital and physical consequences, including spamming,
> phishing, disinformation campaigns, data exfiltration, unapproved user video
> streaming, and control of home automation devices. We reveal Promptware’s
> potential for on-device lateral movement, escaping the boundaries of the
> LLM-powered application, to trigger malicious actions using a device’s
> applications. Our TARA reveals that 73% of the analyzed threats pose
> High-Critical risk to end users. We discuss mitigations and reassess the risk
> (in response to deployed mitigations) and show that the risk could be reduced
> significantly to Very Low-Medium. We disclosed our findings to Google, which
> deployed dedicated mitigations...
Tag - threat models
Interesting research: “Guillotine: Hypervisors for Isolating Malicious AIs.”
> Abstract:As AI models become more embedded in critical sectors like finance,
> healthcare, and the military, their inscrutable behavior poses ever-greater
> risks to society. To mitigate this risk, we propose Guillotine, a hypervisor
> architecture for sandboxing powerful AI models—models that, by accident or
> malice, can generate existential threats to humanity. Although Guillotine
> borrows some well-known virtualization techniques, Guillotine must also
> introduce fundamentally new isolation mechanisms to handle the unique threat
> model posed by existential-risk AIs. For example, a rogue AI may try to
> introspect upon hypervisor software or the underlying hardware substrate to
> enable later subversion of that control plane; thus, a Guillotine hypervisor
> requires careful co-design of the hypervisor software and the CPUs, RAM, NIC,
> and storage devices that support the hypervisor software, to thwart side
> channel leakage and more generally eliminate mechanisms for AI to exploit
> reflection-based vulnerabilities. Beyond such isolation at the software,
> network, and microarchitectural layers, a Guillotine hypervisor must also
> provide physical fail-safes more commonly associated with nuclear power
> plants, avionic platforms, and other types of mission critical systems.
> Physical fail-safes, e.g., involving electromechanical disconnection of
> network cables, or the flooding of a datacenter which holds a rogue AI,
> provide defense in depth if software, network, and microarchitectural
> isolation is compromised and a rogue AI must be temporarily shut down or
> permanently destroyed. ...