Blob di C.I.R.C.E.

Tag - artificial intelligence

These are two attacks against the system components surrounding LLMs: > We propose that LLM Flowbreaking, following jailbreaking and prompt injection, > joins as the third on the growing list of LLM attack types. Flowbreaking is > less about whether prompt or response guardrails can be bypassed, and more > about whether user inputs and generated model outputs can adversely affect > these other components in the broader implemented system. > > […] > > When confronted with a sensitive topic, Microsoft 365 Copilot and ChatGPT > answer questions that their first-line guardrails are supposed to stop. After > a few lines of text they halt—seemingly having “second thoughts”—before > retracting the original answer (also known as Clawback), and replacing it with > a new one without the offensive content, or a simple error message. We call > this attack “Second Thoughts.”...

November 29, 2024 / Schneier on Security

Uncategorized

artificial intelligence

LLM

cyberattack

AI Industry is Trying to Subvert the Definition of “Open Source AI”

The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s terrible. It allows for secret training data and mechanisms. It allows for development to be done in secret. Since for a neural network, the training data is the source code—it’s how the model gets programmed—the definition makes no sense. And it’s confusing; most “open source” AI models—like LLAMA—are open source in name only. But the OSI seems to have been co-opted by industry players that want both corporate secrecy and the “open source” label. (Here’s one ...

November 8, 2024 / Schneier on Security

Uncategorized

privacy

artificial intelligence

machine learning

open source

Prompt Injection Defenses Against LLM Cyberattacks

Interesting research: “Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks“: > Large language models (LLMs) are increasingly being harnessed to automate > cyberattacks, making sophisticated exploits more accessible and scalable. In > response, we propose a new defense strategy tailored to counter LLM-driven > cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs’ > susceptibility to adversarial inputs to undermine malicious operations. Upon > detecting an automated cyberattack, Mantis plants carefully crafted inputs > into system responses, leading the attacker’s LLM to disrupt their own > operations (passive defense) or even compromise the attacker’s machine (active > defense). By deploying purposefully vulnerable decoy services to attract the > attacker and using dynamic prompt injections for the attacker’s LLM, Mantis > can autonomously hack back the attacker. In our experiments, Mantis > consistently achieved over 95% effectiveness against automated LLM-driven > attacks. To foster further research and collaboration, Mantis is available as > an open-source tool: ...

November 7, 2024 / Schneier on Security

Uncategorized

artificial intelligence

LLM

cyberattack

Subverting LLM Coders

Really interesting research: “An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection“: > Abstract: Large Language Models (LLMs) have transformed code com- > pletion tasks, providing context-based suggestions to boost developer > productivity in software engineering. As users often fine-tune these models > for specific applications, poisoning and backdoor attacks can covertly alter > the model outputs. To address this critical security challenge, we introduce > CODEBREAKER, a pioneering LLM-assisted backdoor attack framework on code > completion models. Unlike recent attacks that embed malicious payloads in > detectable or irrelevant sections of the code (e.g., comments), CODEBREAKER > leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without > affecting functionalities), ensuring that both the poisoned data for > fine-tuning and generated code can evade strong vulnerability detection. > CODEBREAKER stands out with its comprehensive coverage of vulnerabilities, > making it the first to provide such an extensive set for evaluation. Our > extensive experimental evaluations and user studies underline the strong > attack performance of CODEBREAKER across various settings, validating its > superiority over existing approaches. By integrating malicious payloads > directly into the source code with minimal transformation, CODEBREAKER > challenges current security measures, underscoring the critical need for more > robust defenses for code completion...

November 7, 2024 / Schneier on Security

Uncategorized

academic papers

artificial intelligence

LLM

backdoors

AIs Discovering Vulnerabilities

I’ve been writing about the possibility of AIs automatically discovering code vulnerabilities since at least 2018. This is an ongoing area of research: AIs doing source code scanning, AIs finding zero-days in the wild, and everything in between. The AIs aren’t very good at it yet, but they’re getting better. Here’s some anecdotal data from this summer: > Since July 2024, ZeroPath is taking a novel approach combining deep program > analysis with adversarial AI agents for validation. Our methodology has > uncovered numerous critical vulnerabilities in production systems, including > several that traditional Static Application Security Testing (SAST) tools were > ill-equipped to find. This post provides a technical deep-dive into our > research methodology and a living summary of the bugs found in popular > open-source tools...

November 5, 2024 / Schneier on Security

Uncategorized

artificial intelligence

vulnerabilities

zero-day

Watermark for LLM-Generated Text

Researchers at Google have developed a watermark for LLM-generated text. The basics are pretty obvious: the LLM chooses between tokens partly based on a cryptographic key, and someone with knowledge of the key can detect those choices. What makes this hard is (1) how much text is required for the watermark to work, and (2) how robust the watermark is to post-generation editing. Google’s version looks pretty good: it’s detectable in text as small as 200 tokens.

October 25, 2024 / Schneier on Security

Google

Uncategorized

academic papers

artificial intelligence

cryptography