- Demonstrating specification gaming in reasoning models
- Biollama: testing biology pre-training risks
- Hacking CTFs with Plain Agents
- BadGPT-4o: stripping safety finetuning from GPT models
- LLM Honeypot: An early warning system for autonomous hacking
- Palisade’s Response to the Department of Commerce’s Proposed AI Reporting Requirements
In September 2024, the Department of Commerce’s Bureau of Industry and Security (BIS) released a proposed rule that would establish reporting requirements for entities developing advanced AI models or advanced computing clusters. They issued a public request for comments, inviting individuals and organizations to provide feedback and suggest improvements to the proposed rule.
Palisade Research submitted a comment, focusing on recommendations that could strengthen the reporting requirements for entities developing dual-use foundation models. We believe that AI capabilities are improving rapidly, and it’s essential for the US federal government to acquire information that allows it to prepare for AI-related threats to national security and public safety.
- Introducing FoxVox
FoxVox is an open-source Chrome extension, powered by GPT-4, that demonstrates how AI could be used to manipulate the content you consume. Use it to experience how any website could push hidden agendas or subtly flatter the reader’s personal biases.
- Automated deception is here
You might have heard the AI fake of Joe Biden telling New Hampshire voters to stay home. Or about the Zoom scammer using a fake video of an executive to defraud a Hong Kong company of $25 million. These are deepfakes: AI-generated video or audio, made to mimic the appearance or voice of a real person.
Advances in AI are helping dedicated scammers train ever more realistic voice models. Creating deepfake voices at scale is a labor-intensive process, but AI can help with that too. We had a hunch that an AI system could do most of what a scammer does entirely on its own. So we built Ursula. If you give Ursula a person’s name, it will search the web for video or podcasts featuring them, extract the portions of the audio that contain their voice, and train a deepfake voice from those clips.
- Badllama 3: removing safety finetuning from Llama 3 in minutes
- Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
- Badllama: cheaply removing safety fine-tuning from Llama 2-Chat 13B