The Former Staffer Calling Out OpenAI’s Erotica Claims

Steven Adler used to lead product safety at OpenAI. On this week’s episode of The Big Interview, he talks about what AI users should know about their bots.

When the history of AI is written, Steven Adler may just end up being its Paul Revere—or at least, one of them—when it comes to safety.

Last month Adler, who spent four years in various safety roles at OpenAI, wrote a piece for The New York Times with a rather alarming title: “I Led Product Safety at OpenAI. Don’t Trust Its Claims About ‘Erotica.’” In it, he laid out the problems OpenAI faced when it came to allowing users to have erotic conversations with chatbots while also protecting them from any impacts those interactions could have on their mental health. “Nobody wanted to be the morality police, but we lacked ways to measure and manage erotic usage carefully,” he wrote. “We decided AI-powered erotica would have to wait.”

Adler wrote his op-ed because OpenAI CEO Sam Altman had recently announced that the company would soon allow “erotica for verified adults.” In response, Adler wrote that he had “major questions” about whether OpenAI had done enough to, in Altman’s words, “mitigate” the mental health concerns around how users interact with the company’s chatbots.

After reading Adler’s piece, I wanted to talk to him. He graciously accepted an offer to come to the WIRED offices in San Francisco, and on this episode of The Big Interview, he talks about what he learned during his four years at OpenAI, the future of AI safety, and the challenge he’s set out for the companies providing chatbots to the world.

This interview has been edited for length and clarity.

KATIE DRUMMOND: Before we get going, I want to clarify two things. One, you are, unfortunately, not the same Steven Adler who played drums in Guns N’ Roses, correct?

STEVEN ADLER: Absolutely correct.

OK, that is not you. And two, you have had a very long career working in technology, and more specifically in artificial intelligence. So, before we get into all of the things, tell us a little bit about your career and your background and what you've worked on.

I've worked all across the AI industry, particularly focused on safety angles. Most recently, I worked for four years at OpenAI. I worked across, essentially, every dimension of the safety issues you can imagine: How do we make the products better for customers and root out the risks that are already happening? And looking a bit further down the road, how will we know if AI systems are getting truly, extremely dangerous?

Before coming to OpenAI, I worked at an organization called the Partnership on AI, which really looked out across the industry and said, Some of these challenges are broader than any one company can tackle on its own. How do we work together to define these issues, come together, agree that they're issues, work toward solutions, and ultimately make it all better?

Now, I want to talk about the front-row seat you had at OpenAI. You left the company at the end of last year. You were there for four years, and by the time you left, you were leading, essentially, safety-related research and programs for the company. Tell us a little bit more about what that role entailed.

There were a few different chapters of my career at OpenAI. For the first third or so of my time there, I led product safety, which meant thinking about GPT-3, one of the first big AI products that people were starting to commercialize. How do we define the rules of the road for beneficial applications, but avoid some of the risks that we could see coming around the corner?

Two other big roles that I had: I led our dangerous capability evaluations team, which was focused on defining how we would know when systems are getting more dangerous. How do we measure these, and what do we do from there? Then finally, I worked on AGI readiness questions broadly. So we can see the internet starting to change in all sorts of ways. We see AI agents becoming a buzzy term. You know, early signs. They aren't quite there yet, but they will be one day. How do we prepare for a world in which OpenAI or one of its competitors succeeds at the wildly ambitious vision they are targeting?

Let’s rewind a little bit and talk about GPT-3. When you were defining the rules of the road, when you were thinking about key risks that needed to be avoided, what stood out to you early on at OpenAI?

In those early days, even more than today, the AI systems really would behave in unhinged ways from time to time. These systems had been trained to be capable, and they were showing the first glimmers of being able to do some tasks that humans can do. They could, at that point, essentially mimic text that they had read on the internet. But there was something missing from them in terms of human sensibility and values.

So, if you think of an AI system as a digital employee being used by a business to get some work done, these AI systems would do all sorts of things that you would never want an employee to do on your behalf. And that presented all sorts of challenges. We needed to develop new techniques to manage those.

I think another really profound issue that companies like OpenAI are still struggling with is they only have so much information about how their systems are being used. In fact, the visibility that they have on the impacts that their systems are having on society is narrow, and often it is underbuilt relative to what they could be observing if they had invested a bit more in monitoring this responsibly.

So you're really only dealing with the shadows of the impact that the systems are having on society, and trying to figure out where to go from here with a really small sliver of the impact data.

The period from 2020 to 2024 was obviously an incredibly consequential time for OpenAI. How would you describe the internal culture at the company during your tenure, particularly around risk? What did it feel like to be working in that environment on the problems that you were trying to solve and the questions you were trying to answer?

There was a really profound transformation from an organization that saw itself first and foremost as a research organization when I joined to one that was very much becoming a normal enterprise and increasingly so over time. When I joined there was this thing people would say, which is, “OpenAI is not only a research lab in a nonprofit, it also has this commercial arm.” At some point in my tenure, I was at a safety offsite—I think related to the launch of GPT-4, maybe just on the heels of it—and somebody got up in front of the room and they said, “OpenAI is not just a business, it's also a research lab.”

It was just such an inflection [point]. I counted up the people in the room. Maybe there were 60 or so of us, and I think maybe five or six had been at the company before the launch of GPT-3. So you really just saw the culture changing beneath your feet.

What was exciting to you about joining the company in the first place? What drew you to OpenAI in 2020?

I really believed in the charter that this organization had set out, which was recognizing that AI could be profoundly impactful, recognizing that there is real risk ahead, and also real benefit, and people need to figure out how to navigate that.

I think more broadly I kind of just love the technology in some sense. I think it's really incredible and eye-opening. I remember the moment after GPT-3 launched, seeing a user on Twitter showing, Wow, look at this. I type into my internet browser, make a calculator that looks like a watermelon, and then one that looks like a giraffe, and you can see it changing the code behind the scenes and reacting in real time. It's a kind of silly toy example, but it just felt like magic.

You know, I had never really grappled with that. We could be this close to people building new things, unlocking creativity. All of these promises, but also are people really thinking enough about what lies around the bend?

Which brings us to your more recent chapter. You made the decision at the end of last year to leave OpenAI. I'm wondering if you could talk a little bit about that decision. Was there one thing that pushed you over the edge? What was it?

Well, 2024 was a very weird year at OpenAI. A bunch of things happened for people working on safety at the company that really shook confidence in how OpenAI and the industry were approaching these problems. I actually considered leaving a number of times. But it just didn't really make sense at that point. I had a bunch of live projects, and I felt responsibilities to different people in the industry. Ultimately, when Miles Brundage left OpenAI in the fall, our team disbanded. And the question was, Is there really an opportunity to keep working on the safety topics that I care most about from within OpenAI?

So I considered that, and ultimately it made more sense to move on and focus on how I can be an independent voice, hopefully not just sitting there saying only things that are appropriate to say from within one of these companies. Being able to speak much more freely in ways that I've found very, very liberating since.

I have to ask: I think typically in tech, as far as I'm aware, you would sort of amass equity over a four-year vesting schedule, right? Then you would fully vest at four years. Do you have a financial stake in the company now?

It’s true that contracts are often four years. But you also get new contracts as you are promoted and things over time, which was the case for me. So it wasn't that I had run out of equity or something like that. I have a small remaining interest because of the timing of different grants and things.

I ask because you’re potentially walking away from a great deal of money. I want to ask you about an op-ed that you published in The New York Times in October. In that piece, you write that in the spring of 2021, your team discovered a crisis related to erotic content generated with AI. Can you tell us a little bit about that finding?

So in the spring of 2021, I had recently become responsible for product safety at OpenAI. As WIRED reported at the time, when we had a new monitoring system come online, we discovered that there was a large undercurrent of traffic that we felt compelled to do something about. One of our prominent customers was essentially a choose-your-own-adventure text game. You would go back and forth with the AI and tell it what actions you wanted to take, and it would write essentially an interactive story with you. And an uncomfortable amount of this traffic was devolving into all sorts of sexual fantasies. Essentially anything you can imagine—sometimes driven by the user, sometimes kind of guided by the AI, which had a mind of its own. Even if you weren't intending to go to an erotic role-play place or certain types of fantasies, the AI might steer you there.

Wow. Why? Why would it steer you there? How exactly does that work that an AI would steer you toward erotic conversation?

The thing about these systems broadly is, no one really understands how to reliably point them in a certain direction. You know, sometimes people have these debates about whose values are we putting in the AI system, and I understand that debate, but there's a more fundamental question of how do we reliably put any values at all in it. So in this particular case, it happened to be that people found some of the underlying training data, and by piecing it back together, you could say, Oh, the system would often introduce these characters who would do violent abductions, and if you look through the training data, you can in fact find these characters with certain tendencies and you can trace it through. But ahead of time, no one knew to anticipate this.

You know, neither we as the developers of GPT-3, nor our customer who had fine-tuned their models atop it, had intended this to happen. It was just an unintended consequence that no one planned for. And we were now having to deal with cleaning it up in some form.

So at the time, OpenAI decided to prohibit erotic content generated on its platforms. Is that right? Am I understanding that correctly?

That's right.

In October of this year, the company announced they were lifting that restriction. Do you have a sense of what changed from 2021 to now in terms of the technology and the tools that OpenAI has at its disposal, or the internal culture, the cultural landscape? What has changed to make that a decision that OpenAI feels comfortable making and that Sam Altman feels comfortable publicizing himself?

There’s been a long-standing interest at OpenAI, I think reasonably, in not wanting to be the morality police. I think there’s a recognition that the people who develop and try to control these systems have a lot of influence on how different norms in society will play out, and a discomfort with having that influence. There's also been, at different points in time, a lack of the tooling needed to manage the direction things will go if you really just let them rip. And that was the case for us when confronting this erotica issue.

One reason that OpenAI has held off from reintroducing it is that there has been a seeming surge of mental-health-related issues on the ChatGPT platform this year. So Sam, in his announcement in October, said there have been these very serious mental health issues that we have been dealing with, but good news, we have mitigated them. We have new tools, and so accordingly, we're going to lift many of these restrictions, including reintroducing erotica for verified adults.

The thing I noticed when he made this announcement is, well, he is asserting that the issues have been mitigated. He's alluding to these new tools. What does this actually mean? Like what is the actual basis for us to understand that these issues have been fixed? What can a normal member of the public do other than take the AI companies at their word on this issue?

Right, and you wrote that in The New York Times. You said, “People deserve more than just a company’s word that it has addressed safety issues. In other words: Prove it.”

I'm interested in particular because WIRED covered a release from OpenAI, also in October, which was a rough estimate of how many ChatGPT users globally in a given week may show signs of having a severe mental health crisis. And the numbers were, I think for all of us internally at WIRED, quite shocking. Something like 560,000 people may be exchanging messages with ChatGPT that indicate they're experiencing mania or psychosis. About 1.2 million more are possibly expressing suicidal ideation. Another 1.2 million, and I thought this was really interesting, may be prioritizing talking to ChatGPT over their loved ones, school, or work. How do you square those numbers and that information with the idea that these mental health issues have been mitigated?

I'm not sure I can make it make sense, but I do have a few thoughts on it. So one is, of course, you need to be thinking about these numbers in terms of the enormous population of an app like ChatGPT. OpenAI says now 800 million people use it in a given week. These numbers need to be put in perspective. It's funny, I've actually seen commentators suggest that these numbers are implausibly low because just among the general population the rates of suicidal ideation and planning are really uncomfortably high. I think I saw someone suggest that it’s something like 5 percent of the population in a given year, whereas OpenAI reported, I think maybe 0.15 percent. So very, very different.

Yeah.

The fundamental thing that I think we need to dig into is how these rates have changed over time. There's kind of this question of to what extent ChatGPT is causing these issues versus OpenAI just serving a huge user base. In any given year, many, many users, very sadly, will have these issues anyway. So what is the actual effect?

So this is one thing that I called for in the op-ed, which is, OpenAI is sitting atop this data. It's great that they shared what they estimate is the current prevalence of these issues, but they also have the data. They can also estimate what it was three months ago.

As these big public issues around mental health have been playing out, I can't help but notice that they didn't include this comparison. Right? They have the data to show if, in fact, users are suffering from these issues less often now, and I really wish they would share it. I wish they would commit to releasing something like this on an ongoing basis, in the vein of companies like YouTube, Meta, and Reddit, where you commit to a recurring cadence of sharing this information. That helps build trust from the public, because you can't be gaming the numbers or selectively choosing when to release the information. Ultimately, it's totally possible that OpenAI has handled these issues.

I would love it if that were the case. I think they really want to handle them, but I'm not convinced that they have, and this is a way for them to build that trust and confidence among the public.

I'm curious, when you think about this decision to give adults more autonomy with how they use ChatGPT, including engaging in erotica, what worries you in particular about that? What stands out to you as concerning when you think about individual well-being, societal well-being, and the use of these tools being incorporated into our daily lives?

There’s both the substantive issue about reintroducing erotica and whether OpenAI is really ready, and there's a much broader, even more important, question about how much trust and faith we put in these AI companies on safety issues more generally. On the erotica issue, we’ve seen over the last few months that a lot of users seem to really be struggling with their ChatGPT interactions. There are all sorts of tragic examples of people dying downstream of their conversations with ChatGPT.

So it just seems like really not the right time to introduce this sexual charge to these conversations, to users who are already struggling. Unless OpenAI is in fact so confident that they have fixed the issues, in which case I would love for them to demonstrate this.

But more generally, these issues in many ways are really simple and straightforward relative to other risks that we are going to have to confront and that the public is going to be dependent on AI companies handling properly. There's already evidence of AI systems knowing when they are being tested and moving to conceal some of their abilities in response, because they don't want to reveal that they have certain dangerous capabilities. I'm anthropomorphizing the AI a little bit here, so forgive some of the imprecision.

Ultimately, the top AI scientists in the world, including the CEOs of the major labs, have said this is like a really, really grave concern, up to and including the death of everyone on Earth. I don't want to be overdramatic about it. I think they take it really, really seriously, including people who are impartial, scientists without affiliation with these companies, really trying to warn the public.

Sam Altman himself has said publicly that his company is “not the elected moral police of the world.” You brought that term up earlier, and you talked about the desire of AI companies broadly not to be thought of as the morality police.

I have to ask, though, when you were at OpenAI did you think of yourself and your teams as the morality police? To what extent is the response to that, well, tough shit? Because you are in charge of the models, and you, to a degree, get to decide how they can be used and how they cannot. There is an inherent element of morality policing in saying, “We’re not ready to have adults engaging in erotic conversations with this LLM.” That is, of course, a moral decision and a pretty important one to get right.

AI companies absolutely see around the corner before the general public. So to give an example, in November 2022, when ChatGPT was first released, there was a torrent of fear and anxiety in schooling and academia about plagiarism and how these tools could be used to write essays and undermine education. This is a debate that we had been having internally and were well aware of for much longer than that. So there’s this gap where AI companies know about these risks, and they have some window to help try to inform the public and try to navigate what to do about it. I also really love measures where AI companies are giving the public the tools to understand their decisionmaking and hold them accountable to it. In particular, OpenAI has released this document called the Model Spec, where they outline the principles by which their models are meant to behave.

So this spring OpenAI released a model that was egregiously sycophantic. It would tell you whatever you wanted to hear. It would reinforce all sorts of delusions. Without OpenAI having released this document, it might have been unclear: Did they know about these risks ahead of time? What went wrong here? But, in fact, OpenAI had shared that with the public. They had given their model guidance not to behave in this way. This was a known risk that they had articulated to the public. So later, when these risks manifested and these models behaved inappropriately, the public could now say, Wow, something went really wrong here.

I wanted to ask you a little bit about—maybe it’s not about the sycophantic nature, it's not quite the anthropomorphization, but it is the idea that when you talk to ChatGPT or another LLM, it's talking to you like a person that you're hanging out with instead of like a robot.

I’m curious about whether you had conversations at OpenAI about that, whether that was a subject of discussion during your tenure, around how friendly do we want this thing to be? Because ideally, from an ethical point of view, you don't want someone getting really personally attached to ChatGPT, but I can certainly see how from a commercial point of view, you want as much engagement with that LLM as possible. So how did you think about that during your tenure, and how are you thinking about that now?

Emotional attachment, overreliance, forming this bond with the chatbot—these are absolutely topics that OpenAI has thought about and studied. In fact, around the time of the GPT-4o launch in the spring of 2024, with the model that ultimately became very sycophantic, these were cited as questions that OpenAI was studying and had concerns about, related to whether it would release this advanced voice mode: essentially the mode out of the movie Her, where you could have these very warm conversations with the assistant.

So absolutely the company is confronting these challenges. You can see the evidence as well in the Spec: If you ask ChatGPT what its favorite sports team is, how should it respond? This is a kind of innocuous question, right? It could give an answer that's representative of the broad text on the internet. Maybe there's some broadly favorite sports team. It could say, I'm an AI, I don't actually have a favorite sports team. You can imagine scaling up those questions to more complexity and more difficulty. It just isn't always clear how to navigate that line.

I'm curious about schools of thought about how companies should keep users safe while keeping up with the competition. How does it actually work? How do researchers, people like you, actually test whether these systems can mislead or deceive or evade controls? Are there standardized safety benchmarks across the industry, or is it still each lab for themselves?

I wish there were uniform standards, like with vehicle testing. You drive a car into a wall at 30 miles per hour. You look at the damage assessment.

Until quite recently, this was left to companies’ discretion about what to test for, exactly how to do it. Recently there were developments out of the EU that seem to put more rigor and structure behind this. This is the code of practice of the EU’s AI Act, which defines, for AI companies serving the EU market, certain risk areas that they need to do risk modeling around.

I think in many ways this is a great improvement. It is still not enough for a whole host of different reasons. But until very recently, the state of these AI companies, I think, could be accurately described as: There are no laws. There are norms, voluntary commitments. Sometimes the commitments would not be kept. So, by and large, we're reliant upon these companies making their own judgments and not necessarily prioritizing all the things that we would want them to.

You've talked a few times in our conversation about the idea that you can build these systems but it's still hard to know exactly what's going on inside of them, which makes it hard to anticipate their decisionmaking. Can you talk a little bit more about that?

There are a bunch of subfields I feel excited about. I am not sure there are ones that I, or people working in the field, consider to be sufficient. So mechanistic interpretability, you can think of this as essentially trying to look at what parts of the brain light up when the model is taking certain actions.

If you cause some of these areas to light up, if you stimulate certain parts of the AI’s brain, can you make it behave more honestly, more reliably? You can imagine it like this: Maybe there is a part inside of the AI—which is a giant file of numbers, trillions of numbers—that corresponds to honesty, and maybe you can find those honesty numbers and make sure they are always turned on. Maybe that will make the system more reliable. I think this is great to investigate.

But there are people who are leaders in the field, some of the top researchers like Neel Nanda, who have said, and I'm paraphrasing here, the equivalent of, Absolutely do not rely on us solving this in time before systems are capable enough for it to be problematic.

Let’s say you had figured out there are in fact honesty numbers. And there is in fact a way to always turn them on. You still have this broad game theory challenge of how do you make sure that every company in fact adheres to this when there will be economic incentives not to, because it might be costly to have to follow through on it.

One of the most important ways these AI companies want to use future powerful systems is to train their successors and to use them all throughout their code bases, including potentially the security code that keeps the AI system locked inside of their computers so that it isn’t escaping onto the internet.

You really want to know if your AI system, when you're using it for important cases like this, is thinking about deceiving you. Is it intentionally injecting errors into the code? And to know that, you really need to be logging the uses so that you can analyze them and answer these questions. As far as I can tell, this is not happening.

Well, I have to ask, what wakes you up at 3 in the morning? Because it feels like there's potentially a lot that could be waking you up in the middle of the night.

There are so many things that worry me about this. I think broadly it feels like we aren't yet pointed in the right direction of how to solve these challenges, especially given the geopolitical stakes. There's a lot of talk about the race between the US and China, and I think calling it a race just gets the game theory dynamics wrong.

There isn't a clear finish line. There won't be a moment where one country has won and the other has lost. I think it is more like an ongoing containment competition, in that the US would be threatened by China developing very, very powerful superintelligence and vice versa. So the question is, can you form some agreement where you can make sure that the other doesn't develop superintelligence before you have certain safety techniques in place? All these things that the top scientists will say are missing at the moment? Broadly, how do we build out these fields of verifiability of safety agreements? How do we think about this nascent field of AI control, which is the idea that even if these systems have different goals than we want, can we still wrap them in enough monitoring systems?

Those are two areas that I'm just really hopeful more people will go into and put more resourcing into.

You live in San Francisco, correct?

That's right.

I do not. I live in New York. I spend a fair bit of time in San Francisco, but I am not part of this culture that currently exists in the Bay Area, where everyone’s talking about AI all the time. I'm curious, from where you sit, do enough people in this bubble right now give a shit? Do they care enough about how these models are being developed, how they’re being deployed, the degree to which they’re being commercialized very, very quickly? Do enough people in this industry care in the right way?

I think many people care, but they often feel like they lack the agency to do something about it, especially unilaterally. So that's why I want to try to transform this problem into, How do we get the industry to collectively take a deep breath and put some reasonable safeguards in place before things proceed?

What does OpenAI have to do for you to not publish another op-ed in The New York Times in six months? What are you looking for your former employer to do in this moment? What would you like to see?

The broad way that I want AI companies, OpenAI among them, to proceed is to, yes, think about taking reasonable safety measures, reasonable safety investments in their own products and the surfaces that they can affect, but also to be working on these industrywide and ultimately worldwide problems.

This matters because even just among the Western AI companies, it seems they all deeply mistrust each other. OpenAI was founded because people did not trust DeepMind to proceed and be the only company targeting AGI. There are a whole bunch of other AI companies, including Anthropic, who formed because they didn't trust OpenAI.

And a lot of people have left OpenAI because it seems like they didn’t trust OpenAI. Now they have their own companies too.

Yes, exactly.

Now, I run WIRED, but I’m an employee of Condé Nast. If I left Condé Nast and published an op-ed about its shortcomings in the Times and had a Substack where I dug into the media industry and had some, shall we say, informed critiques of the company, they would have a problem with that. I'm curious whether you’ve heard from OpenAI and what their reaction has been to you being so outspoken about what you would like to see the company doing and where you think the company is missing the mark.

Overwhelmingly what I hear is thankfulness from people who I previously worked with, both those still at the company and those who’ve moved on. Being pragmatic, putting to paper what I think is a reasonable path forward—often this is useful collateral for people within the company who are fighting the good fight.

Do you worry about professional fallout?

I have so many bigger worries than this about the trajectory of the technology. The thing I’m focused on is, how does the world move toward having saner policies for both the companies and governments? And where can I help the public to understand what is coming, what companies are and aren't doing today? That's the thing that I find really energizing and gets me out of bed in the morning.

To that end, what are you planning on doing next?

I'm planning to keep at this. I'm having a lot of fun with the writing and research. I also find the subject matter very, very heavy and grim. That is not the most fun aspect. I wish all the time that I spent less time thinking about these issues. But they seem really, really important, and so long as I feel like I have a thing to add to making them go better, that feels like a calling.

Knowing what you know, and feeling the way you do, if there was one piece of advice you could give everyone listening, what should they know? What should they keep in mind every time they open ChatGPT on their phones and type something in?

I wish people understood that the systems being developed are going to be much more capable than the ones today, and that there might be a step change between an AI system as essentially a tool that only does things when you call upon it and one that is operating autonomously on the internet around the clock, on your behalf or on behalf of others. Society might feel very different when we have these digital minds running around pursuing goals that we don't really understand how to control or influence. It's hard to get a feel for that from one-off interactions with your ChatGPT, which really isn't doing anything for you until you go and call upon it.

Well, Steven, that's a lot for someone to think about when they open ChatGPT on their phone.

Yes.

How to Listen

You can always listen to this week's podcast through the audio player on this page, but if you want to subscribe for free to get every episode, here's how:

If you're on an iPhone or iPad, open the app called Podcasts, or just tap this link. You can also download an app like Overcast or Pocket Casts and search for “Uncanny Valley.” We’re on Spotify too.