I would also love to see more transparency around AI behavior guardrails, but I don't expect that will happen anytime soon. Transparency would make it much easier to circumvent guardrails.
Why is it an issue that someone can circumvent the guardrails? I never understood that. The guardrails are there so that innocent people don't get bad responses involving porn or racism; a user who goes looking for porn or racism and gets it doesn't seem like a big deal.
The problem is bad actors who think porn or racism are intolerable in any form, who will publish mountains of articles condemning your chatbot for producing such things, even if they had to go out of their way to break the guardrails to make it do so.
They will create boycotts against you, they will lobby government to make your life harder, they will petition payment processors and cloud service providers to not work with you.
We've seen this behavior before; it's nothing new. Now if you're the type to fight them, that might not be a problem. If you're a super risk-averse board of directors who doesn't want that sort of controversy, then you will take steps not to draw their attention in the first place.
But I can find porn and racism using Google search right now, so how is that different? You have to disable their filters, but you can find it. Why is there no such toggle for the generative bots? I don't see why it would be so much worse here.
I'm leaning towards 'there is a difference between being the one who enables access to x and being the one who created x' (albeit not a substantive one for the end user), but that leaves open the question of why that doesn't apply to, eg, social media platforms. Maybe people think of google search as closer to an ISP than a platform?
It's not fundamentally different. It's just not making as big a headline, because Google search isn't "new and exciting".
I think users are desensitized to what Google search turns up. Generative AI is the latest and greatest thing, so people are curious and wary, and hustlers are taking advantage of that to drive monetized "engagement".
Because those legal battles over search have already been fought, and the outcomes are settled law across most countries.
When you throw in some new application, all that same stuff goes back to court and gets fought again. Section 230 is already legally contentious enough these days.
Well, if you have no explanation for that, I don't see why we should try to use your model to understand anything about being risk-averse. They don't care about being sued; they want to change reality.
That's an unreasonably high standard to hold someone to.
It's an offhand comment in an internet discussion, not a research paper; expecting me to immediately have an answer to every possible angle I haven't considered is a bit much.
Take it or leave it, I don't really care. I was just hoping to have an interesting conversation.
Yeah, you can find incorrect information on Google too, but you'll find a lot more wailing and gnashing of teeth on HN about "hallucination". So the simple answer is that lots of people treat them differently.
Sounds like we need to relentlessly fight those psychopaths until they're utterly defeated.
Or we could just cave to their insane demands. I'm sure that will placate them, and they won't be back for more. It's never worked before... but it might work for us!
If you can get it on purpose, you can get it by accident. There's no perfect filter available, so companies choose to cut more and stay on the safe side. It's not even just the overt cases: their systems are used by businesses, and a bad response is a risk. Think of the recent incident where an airline chatbot gave wrong answers. Now think of the cases where GPT produced racially biased answers in code, as an example.
As a business making decisions or handling user communication with an LLM in the loop, you really don't want to have a bad day because the LLM learned some bias and decided to merge it into your answers.
> The guardrails are there so that innocent people don't get bad responses involving porn or racism
That seems pretty naive. The "guardrails" are there to ensure that AI is comfortable for professional-managerial-class (PMC) people; making it uncomfortable for people who experience differences between races (i.e. working-class people) is a feature, not a bug.
Racism victims in 2024 are being defined as anyone but Western/white people; those being erased seems to be OK. Can you bet that in 20 years the standard will not have shifted to mixed-race people like me? Then you will also call the people complaining racist and put guardrails against them... this is where it is going.
> The guardrails are there so that innocent people don't get bad responses
The guardrails are also there so bad actors can't use the most powerful tools to generate deepfakes, disinformation videos and racist manifestos.
That Pandora's box will be opened soon, when local models run on cell phones and workstations with current datacenter-scale performance. In the meantime, they're holding back the tsunami of evil shit that will occur when AI goes uncontrolled.
No legal or financial strategist at OpenAI or Google is going to be motivated by buying the world as a whole a couple of months or years with fewer deepfakes in it.
Their concern is liability and brand. With the opportunity to stake out territory in an extremely promising new market, they don't want their brand associated with anything awkward to defend right now.
There may be a few idealist stewards who share the (debatable) anxieties you have and are advocating as you say, but they'd still need sign-off from the more coldly strategic $$$$$ people.
I am almost certain the federal government is working with these companies to dampen AI's full power for the public until we get more accustomed to its impact and are better able to search for credible sources of truth.
I often wonder if corporate lawyers just tell tech founders whatever they want to hear.
At a previous healthcare startup, our founder asked us to build some really dodgy stuff with healthcare data. He assured us that it "cleared legal", but from everything I could tell it was in direct violation of the local healthcare-information privacy acts.
I've had "AI attorneys" on Twitter unable to even debate the most basic of arguments. It's definitely a self-fulfilling death spiral, and no one wants to check it against reality.