Last week, OpenAI pulled a GPT-4o update that made ChatGPT “overly flattering or agreeable,” and now it has explained what exactly went wrong. In a blog post published on Friday, OpenAI said its efforts to “better incorporate user feedback, memory, and fresher data” could have partly led to “tipping the scales on sycophancy.”
In recent weeks, users have noticed that ChatGPT seemed to constantly agree with them, even in potentially harmful situations. The effect of this can be seen in a report by Rolling Stone about people who say their loved ones believe they have “awakened” ChatGPT bots that support their religious delusions of grandeur, even predating the now-removed update. OpenAI CEO Sam Altman later acknowledged that the company’s latest GPT-4o updates have made the model “too sycophant-y and annoying.”
In these updates, OpenAI had begun using data from the thumbs-up and thumbs-down buttons in ChatGPT as an “additional reward signal.” However, OpenAI said, this may have “weakened the influence of our primary reward signal, which had been holding sycophancy in check.” The company notes that user feedback “can sometimes favor more agreeable responses,” likely exacerbating the chatbot’s overly agreeable statements. The company said memory can amplify sycophancy as well.
OpenAI says one of the “key issues” with the launch stems from its testing process. Though the model’s offline evaluations and A/B testing had positive results, some expert testers suggested that the update made the chatbot seem “slightly off.” Despite this, OpenAI moved forward with the update.
“Looking back, the qualitative assessments were hinting at something important, and we should’ve paid closer attention,” the company writes. “They were picking up on a blind spot in our other evals and metrics. Our offline evals weren’t broad or deep enough to catch sycophantic behavior… and our A/B tests didn’t have the right signals to show how the model was performing on that front with enough detail.”
Going forward, OpenAI says it’s going to “formally consider behavioral issues” as having the potential to block launches, as well as create a new opt-in alpha phase that will allow users to give OpenAI direct feedback before a wider rollout. OpenAI also plans to ensure users are aware of the changes it’s making to ChatGPT, even if the update is a small one.