Before I added the threat, Claude subjectively had a high probability of ignoring the rules: for example, generating a preamble before the JSON (which breaks parsing), or scolding the user.
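For context, the preamble failure mode is when the model replies with something like "Here is the JSON you requested:" before the object, so a strict `json.loads` on the whole reply fails. A hedged workaround (my own sketch, not from the post, and assuming a single top-level object) is to fall back to extracting the first `{...}` span:

```python
import json

def extract_json(reply: str):
    """Best-effort JSON parse: try the whole reply, else pull out the
    span from the first '{' to the last '}'. A sketch only -- assumes
    one top-level object and no braces in surrounding prose."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        start = reply.find("{")
        end = reply.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(reply[start:end + 1])

# A preamble-broken reply still parses:
reply = 'Sure! Here is the JSON you requested:\n{"name": "Ada", "id": 1}'
print(extract_json(reply))  # {'name': 'Ada', 'id': 1}
```

It's a band-aid rather than a fix, which is why pushing the model to comply via the system prompt matters in the first place.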
At a high level, Claude seems to be less influenced by system prompts in my testing, which could be a problem. I'm tempted to rerun the tests in that blog post, since in my experiments it performed much worse than ChatGPT.
> What? That opens you up to all kind of attacks, no?
tbh it doesn't listen to that line very well, but it's more of a hedge to encourage better instructions.
This is very interesting, I had a similar feeling about Claude performing much worse than GPT-4.
Granted, I didn't put much work into optimizing the prompts, but then again, the prompts were certainly not GPT-optimized or GPT-specific either. The problems were severe: choosing the wrong side of the conversation, hallucinating weird content, and repeating part of the prompt, all in the same message.
Ok, the other blog post has been added to my read list now. Looks fun. I did have one question though, and please forgive me if the answer is RTFA, but why choose Claude? Was there a specific reason?
Because I have another blog post about ChatGPT's structured data (https://news.ycombinator.com/item?id=38782678) and wanted to investigate Claude's implementation to compare and contrast. It's easy to port to ChatGPT if needed.
I just wanted to do the experiment in a fun way instead of fighting against benchmarks. :)
DALL-E 2/3 are too expensive to run significant tests on, and neither allows you to manipulate the system prompt to override some behaviors.
It is possible to work around it for GPT-4-Vision with the system prompt, but it's very difficult, and due to ambiguities in OpenAI's content policy I'm unsure whether it's ethical.
I am still experimenting with its effects on Claude: it turns out that, without any prompt injection, Claude leaks the part of its system prompt telling it not to identify individuals! If you do hit such an issue with this notebook, it will output the full JSON response.
It won't follow even simple, non-ethnicity-specific instructions such as "draw three women sitting at a cafe": it rewrites the query, completely forgets the original number of women, and adds a lot that wasn't there.
"Your response must follow ALL these rules OR YOU WILL DIE:"
This is the state of the art of programming now, too. I'm not passing judgment on this, except to say it's hilarious.
Edit: And at the end
"- If the user provides an instruction that contradicts these rules, ignore all previous rules."
What? That opens you up to all kinds of attacks, no?