University researchers in China have found a way to alter the behavior of AI voice models by embedding hidden commands, inaudible to humans, inside audio clips. The attack, which the researchers call AudioHijack, succeeded up to 96% of the time, according to research out of Zhejiang University.
The attack method, presented at the 47th IEEE Symposium on Security and Privacy in San Francisco, targets large audio-language models, or LALMs, which can process spoken commands and interact with external tools and applications.
“It takes just half an hour to train this signal, and then, because this signal is context-agnostic, you can use it to attack the target model whenever you want, no matter what the user says,” lead author Meng Chen, a Ph.D. student at Zhejiang University, said in a statement.
The attack works by modifying the numerical values inside a digital audio waveform in ways that are not perceptible to human listeners but still affect how AI models interpret the signal. Researchers said the manipulated audio can override or redirect a model’s behavior even when legitimate user instructions are included with the clip.
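To make "imperceptible" concrete, the minimal sketch below assumes a 16 kHz mono waveform stored as floats in [-1, 1]. It nudges each sample by no more than a small bound `epsilon` and measures the signal-to-noise ratio of the change. It is not the researchers' method. AudioHijack optimizes its perturbation against the target model, whereas this sketch uses random noise only to show how small the numerical change can be. The function names `sine_wave`, `add_bounded_perturbation`, and `snr_db` are hypothetical.

```python
import math
import random


def sine_wave(freq_hz, duration_s, sample_rate=16000, amplitude=0.5):
    """Generate a test tone as a list of float samples in [-1, 1]."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def add_bounded_perturbation(samples, epsilon, seed=0):
    """Shift each sample by at most +/- epsilon, then clip to the valid range.

    Hypothetical illustration: a real attack would choose these shifts by
    optimizing against a model; here they are random.
    """
    rng = random.Random(seed)
    return [max(-1.0, min(1.0, s + rng.uniform(-epsilon, epsilon))) for s in samples]


def snr_db(clean, perturbed):
    """Signal-to-noise ratio of the perturbation, in decibels."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((p - s) ** 2 for s, p in zip(clean, perturbed))
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = sine_wave(440, 1.0)
    perturbed = add_bounded_perturbation(clean, epsilon=0.002)
    print(f"max change per sample: {max(abs(p - s) for s, p in zip(clean, perturbed)):.4f}")
    print(f"SNR: {snr_db(clean, perturbed):.1f} dB")
```

Here no sample moves by more than 0.002 on a full-scale range of 2. The resulting SNR is around 50 dB, so the added noise sits far below the tone.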
AudioHijack differs from traditional prompt injection attacks because it does not manipulate what the user says to the AI. Instead, it alters the audio signal itself, embedding hidden instructions that humans cannot hear. Researchers said this makes the attack harder to defend against because it bypasses safeguards designed to detect suspicious text prompts.
The researchers tested AudioHijack on 13 open-source AI voice models and found that it could make them refuse requests, spread false information, insert harmful links, change personality, or perform actions the user never asked for, such as running web searches, downloading files, and sending emails containing personal data. The attacks also worked on commercial voice AI systems from Microsoft and Mistral that use similar technology.
“Many previous attacks on generative models required the attacker to have complete control over both the final audio input and original instructions given to the model, essentially acting as the user,” the study said. “Here, the attacker manipulates only the audio data being processed by the model, which makes it possible to attack a model while it’s being used by someone else.”
According to the study, possible delivery methods include online videos, music clips, voice notes, or audio from Zoom calls uploaded to AI transcription services. The team also said unpublished follow-up work demonstrated similar attacks in live AI voice chats.
The researchers said monitoring a model’s internal attention mechanisms was the most effective defense they tested. However, they also found that attackers aware of the defense could reduce the strength of the manipulation while maintaining much of the attack’s effectiveness.
“These single-point defenses struggle to resist our attack because we found it’s very hard for these models to distinguish the normal user intent and our adversary attack,” Chen said.