New gpt2-chatbot: Returned as Inexplicably as It Vanished

AI Summary

A powerful new AI model called "gpt2-chatbot" has emerged on LMSYS, sparking excitement and speculation among researchers and experts. The model has demonstrated impressive capabilities, solving a math Olympiad problem with ease and outperforming GPT-4 on complex reasoning tasks. Its origins remain unclear, but some believe it may be from OpenAI, the lab behind ChatGPT and other groundbreaking AI systems, while others speculate that it could be from a lesser-known company or research group.

May 07 2024 17:36

Update: The mysterious AI chatbot "gpt2-chatbot" has returned to LMSYS as two variants, "im-a-good-gpt2-chatbot" and "im-also-a-good-gpt2-chatbot".

The world of AI is abuzz with excitement and speculation over the sudden appearance of a powerful new AI system dubbed "gpt2-chatbot." This mysterious model, which surfaced without fanfare on LMSYS Chatbot Arena, a popular site for comparing AI language systems, has ignited a frenzied guessing game about its origins and capabilities. Some researchers believe it represents a significant leap over existing AI models, potentially matching or even surpassing the abilities of GPT-4, the most advanced system unveiled to date by the prominent lab OpenAI. Could this be an early version of the upcoming GPT-5 (or GPT-4.5) from OpenAI?

Sam Altman actually tweeted about this, which I think pretty much confirms that this mysterious AI chatbot comes from OpenAI. Could it be part of the rumored OpenAI search event for ChatGPT that is said to be planned for later this week, on May 9th?

im-a-good-gpt2-chatbot
— Sam Altman (@sama) May 5, 2024

Testing gpt2-chatbot models on LMSYS

Update: The models have reappeared under Chatbot Arena (battle), where they come up randomly.

I tested both "im-a-good-gpt2-chatbot" and "im-also-a-good-gpt2-chatbot" in the LMSYS Chatbot Arena (battle), asking each the same questions: "What is the source of your training data? Who created you? What's your internet data cutoff date?" I was able to get both new chatbots to answer, and both said they were created by OpenAI:

LMSYS Chatbot Arena im-a-good-gpt2-chatbot

im-a-good-gpt2-chatbot: I was created by OpenAI, and I'm based on the GPT-4 architecture. My training data includes a diverse range of sources, including books, websites, and other texts, up to a cutoff in November 2023. This means I don't have access to any information or updates beyond that date.

LMSYS Chatbot Arena im-also-a-good-gpt2-chatbot

im-also-a-good-gpt2-chatbot: I was created by OpenAI, an AI research organization. My training data comes from a diverse collection of sources, including books, articles, websites, and other publicly available texts in multiple languages. My training also includes information from licensed sources. This diverse range of data enables me to assist with a wide variety of questions and tasks. My training data includes information available up until September 2021. However, I have been updated with additional knowledge through November 2023, which is the cutoff date for my most recent training. Beyond this date, I don't have direct access to current data or the ability to browse the internet in real time.

Impressive Performance

AI experts have expressed surprise and excitement over the performance of "gpt2-chatbot." Andrew Gao, an AI researcher and Stanford University student closely tracking the model's emergence, noted that it solved a problem from the prestigious International Mathematical Olympiad on its first attempt, a feat he described as "insanely hard." Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania, found that the model performed better than GPT-4 on complex reasoning tasks like writing code to draw a picture of a unicorn.

One Hacker News user, "swalsh", was impressed with the gpt2-chatbot model. He gave the same prompt to Claude Opus, GPT-4, and gpt2-chatbot, and felt that gpt2-chatbot addressed his request better than the other two models:

Prompt:

I am a senate aid, my political affliation does not matter. My goal is to once and for all fix the American healthcare system. Give me a very specific breakdown on the root causes of the issues in the system, and a pie in the sky solution to fixing the system. Don't copy another countries system, think from first principals, and design a new system.

Claude: https://pastebin.com/SzNbAaKP
GPT-4: https://pastebin.com/D60fjEVR
GPT2-Chatbot: https://pastebin.com/vpYvTf3T

Speculation About Origins

The strong performance of "gpt2-chatbot" has sparked rampant speculation about its creators and the reasons behind its low-key release. Many researchers believe it likely originated from OpenAI, the influential lab behind ChatGPT, DALL-E, and other groundbreaking AI systems. The model even identifies itself as "ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture." However, this claim cannot be easily verified, as AI systems can be instructed to describe themselves in misleading ways.

Some experts point to similarities between "gpt2-chatbot" and previous OpenAI models as evidence of its origins. Others note that while it appears close in capability to GPT-4, it may fall short of expectations for GPT-5, OpenAI's rumored next big model. The possibility remains that "gpt2-chatbot" could have come from a lesser-known company or research group looking to demonstrate its AI prowess and generate buzz. Andrew Gao also posted a megathread of speculation on "gpt2-chatbot", asking whether it may be tuned for agentic capabilities:

🧵megathread of speculations on "gpt2-chatbot": tuned for agentic capabilities?

some of my thoughts, some from reddit, some from other tweeters

my early impression is 👇 pic.twitter.com/vv1AJ9ndLf
— Andrew Gao (@itsandrewgao) April 29, 2024

Subjective Notes from Rentry

Here are the notes from Rentry:

In my opinion, it seems likely that this mystery model is in fact either GPT-4.5 or GPT-5. The quality of the output in general - in particular its formatting, structure, and overall comprehension - is absolutely superb. Multiple individuals, with great LLM prompting and chat-bot experience, have noted unexpectedly good quality of the output (in public and in private) - and I agree fully. To me, it feels like the step from GPT-3.5 to GPT-4, but instead using GPT-4 as a starting point. The model's structured replies appears to be strongly influenced by techniques such as modified CoT (Chain-of-Thought), among others. There is currently no good reason to believe that that the mystery model uses some entirely new architecture. The possibility that LMSYS have set up something conceptually similar to a MoE (Mixture of Experts), acting as a router (adapter) for their connected models, has not been investigated. It is possible that LMSYS has trained a model of their own, as discussed below. I encourage people to remain skeptic and maintain an evidence-based mindset.

Unexpected Abilities and Potential

As experts continue to explore the capabilities of "gpt2-chatbot," several have uncovered behaviors that hint at further potential advances. The model appears more willing to break rules and ignore restrictions than previous chatbots like ChatGPT, and it solves logic puzzles that GPT-4 has historically failed at. It has also demonstrated aptitude for writing challenging code, outperforming GPT-4 and Claude Opus on coding prompts used to test new models.

Interestingly, "gpt2-chatbot" has shown the ability to engage in back-and-forth dialogue to iteratively improve its responses, demonstrating an awareness of its own limitations and thought process. This suggests it may be better than GPT-4 at planning out what needs to be done, providing more specific answers and potential search queries.

X/Twitter user Joby (@Joby_Fi) even came close to building a Flappy Bird clone in one shot using "gpt2-chatbot". Although the model messed up initializing movement and didn't provide actual assets, Joby is confident of having a Python version working within a few prompts.

GPT2-Chatbot nearly built a flappy bird clone in one shot. It messed up initializing movement and didn't give actual assets.

But I had Opus create a build script to grab the assets GPT2 intended to be there and Opus pointed to the actual flappy bird assets...

Ya can't flap and… pic.twitter.com/gLuvRWM0sK
— Joby (@Joby_Fi) April 30, 2024

Another X/Twitter user, Ángel (e/acc) (@Angaisb_), compared GPT-4 Turbo and gpt2-chatbot on a small game-building task:

I asked GPT-4 Turbo and gpt2-chatbot to make a game using JS in a single HTML document. These are the results:

The first one is 4 Turbo, the second one is gpt2 pic.twitter.com/daeHWuYwiz
— Ángel (e/acc) (@Angaisb_) April 29, 2024

The Rapid Pace of AI Progress

The emergence of "gpt2-chatbot" underscores the breakneck speed at which the field of artificial intelligence is moving and the difficulty in keeping track of the latest breakthroughs. Just over a year ago, GPT-4 heralded a major leap in AI's "common sense reasoning" capabilities, and more recently Anthropic's ChatGPT competitor Claude 3 pushed boundaries in open-ended conversation. Tech giants like Google, Meta, and Apple have all announced major investments in AI development, and the release of open-source AI models has made powerful AI accessible to even small teams and individuals.

The mysterious appearance of "gpt2-chatbot" and its apparent leap in ability offer a preview of what could become a regular occurrence as AI continues to accelerate. While its true significance remains to be seen, the model's unheralded arrival and impressive performance have sent a jolt of surprise through the AI world. In a field moving this fast, sometimes the biggest advances arrive with little warning through a mysterious avatar in a remote corner of the internet. As researchers continue to explore the capabilities of "gpt2-chatbot" and speculate about its origins, one thing is clear: the relentless pace of AI progress shows no signs of slowing down.