OpenAI Media Manager: Empowering Creators and Respecting Choices

Updated: May 07 2024 21:04


OpenAI has been at the forefront of innovation with groundbreaking tools like ChatGPT, DALL·E, and Sora. As AI becomes more powerful and pervasive, transforming the way we live, work, and learn, important questions are being raised about data usage, intellectual property, and the role of content creators. Let's look at OpenAI's approach to data and AI, their commitment to respecting creator choices, and their vision for a future where AI benefits everyone.

Empowering Creators and Solving Problems

A core principle in OpenAI's approach is respecting the choices of creators and content owners regarding how their works are used to train AI models. Last summer, OpenAI pioneered the use of web crawler permissions for AI, allowing web publishers to specify what portions of their sites could be accessed for machine learning. These signals are taken into account each time a new model is trained.
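These permissions work through the familiar robots.txt mechanism: a publisher can allow or disallow OpenAI's GPTBot crawler for an entire site or for specific paths. As a minimal sketch (the domain and paths below are placeholders, not real sites), Python's built-in robotparser shows how such a directive would be interpreted:

```python
from urllib.robotparser import RobotFileParser

# A publisher opting out entirely would publish rules like:
#   User-agent: GPTBot
#   Disallow: /
# in https://example.com/robots.txt (example.com is a placeholder).

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

# Ask whether OpenAI's crawler may fetch a given page.
print(robots.can_fetch("GPTBot", "https://example.com/articles/my-essay"))
```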

However, OpenAI recognizes that this is an incomplete solution. Many creators don't control the websites where their content appears, and works are often quoted, remixed, reposted and used as inspiration across multiple domains. An efficient, scalable system is needed for content owners to express their AI usage preferences.

Introducing Media Manager

To address this, OpenAI is developing a new tool called Media Manager. Media Manager will enable creators and content owners to identify what works they own and specify exactly how they want those works included or excluded from machine learning research and training. The tool will use cutting-edge machine learning to identify copyrighted text, images, audio and video across multiple sources. Additional choices and features will be added over time.
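OpenAI has not yet published a specification or API for Media Manager, so any concrete format is speculation. Purely as a hypothetical sketch of the kind of preference a creator might register, imagine a record that pairs a fingerprint of the work with the uses its owner permits:

```python
from dataclasses import dataclass

# Hypothetical illustration only: Media Manager's real data model is unpublished.
@dataclass
class UsagePreference:
    owner: str                    # creator or rights holder
    content_hash: str             # fingerprint used to match the work across sources
    media_type: str               # "text", "image", "audio", or "video"
    allow_training: bool = False  # excluded from model training unless opted in
    allow_research: bool = False  # excluded from ML research unless opted in

pref = UsagePreference(
    owner="Example Studio",               # placeholder name
    content_hash="sha256:<placeholder>",  # placeholder fingerprint
    media_type="image",
)
print(pref)
```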

OpenAI is collaborating closely with creators, content owners and regulators in developing Media Manager, with the goal of having it in place by 2025. They hope this first-of-its-kind tool will set a new standard for creator control across the AI industry.

Understanding OpenAI's Foundation Models

To build the best, most broadly beneficial AI systems, OpenAI trains its models on large, diverse datasets spanning many languages, cultures, subjects and industries. The more comprehensive the training data, the more knowledgeable and capable the resulting models.

Importantly, OpenAI's models are designed as learning machines, not databases. They learn patterns and relationships from training data to generate novel outputs, but do not directly store or reproduce that data. On the rare occasions when a model does repeat expressive content, it is an unintended failure of the machine learning process that OpenAI works to prevent.

OpenAI takes care to minimize personal and sensitive information in training data, and its models are trained not to reveal such details. Techniques like data cleaning and synthetic data generation are increasingly used to maximize data utility and safety.
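As a minimal illustration of the data-cleaning idea (this is not OpenAI's actual pipeline, which has not been published), a preprocessing step might redact obvious personal identifiers such as email addresses and phone numbers before text is used for training:

```python
import re

# Illustrative patterns only; a production pipeline would be far more thorough.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    """Replace obvious personal identifiers with neutral placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(scrub("Contact Jane at jane.doe@example.com or +1 (555) 123-4567."))
# Prints: Contact Jane at [EMAIL] or [PHONE].
```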

Notably, OpenAI does not train on customers' private data from products like ChatGPT Team, ChatGPT Enterprise, or the API platform. Users of ChatGPT Free and Plus can opt out of having their conversations used to improve models.

Building an AI-Powered Creative Ecosystem

Beyond enabling more creator control, OpenAI wants to use AI to move beyond the current attention economy, which is built for advertisers rather than users. Their ambition is to empower creators and publishers while enhancing the user experience through more useful AI-powered discovery engines.

They are working to display partner content directly in products like ChatGPT to enrich the user experience and help publishers reach new audiences. Global news publishers from the Financial Times to Le Monde are already on board. This content may also be used to train ChatGPT to better surface relevant publisher material.

OpenAI is crafting partnerships to benefit both users and content owners. For example, they are working with Khan Academy and ExamSolutions to improve ChatGPT's math capabilities in order to expand access to personalized AI tutoring on those platforms. The goal is vibrant ecosystems where all stakeholders gain value.

In the past year, OpenAI has struck content licensing deals with major publishers like the Associated Press, Axel Springer, Le Monde, Prisa Media, and most recently, the Financial Times. These agreements allow OpenAI to train on high-quality content while providing publishers compensation, attribution, and traffic from ChatGPT links. The FT deal also involves collaborating on new AI products and features for readers. Here is a quote from the FT Group CEO John Ridding:

This is an important agreement in a number of respects. It recognises the value of our award-winning journalism and will give us early insights into how content is surfaced through AI. We have long been a leader in news media innovation, pioneering the subscription model and engagement technologies, and this partnership will help to keep us at the forefront of developments in how people access and use information.


Publisher Dotdash Meredith, home to iconic brands like PEOPLE and Better Homes & Gardens, has gone a step further, partnering with OpenAI both to provide training data and to leverage OpenAI's technology in its own AI-powered ad targeting platform. CEO Neil Vogel said the deal shows OpenAI is "doing the right things" by paying publishers and providing attribution.

We have not been shy about the fact that AI platforms should pay publishers for their content and that content must be appropriately attributed. This deal is a testament to the great work OpenAI is doing on both fronts to partner with creators and publishers and ensure a healthy Internet for the future.


However, not all publishers are satisfied. At the end of 2023, the New York Times sued OpenAI and Microsoft for unauthorized use of its content to train ChatGPT. News Corp, Reuters and others say they are still in licensing talks with OpenAI and its peers. Clearly, more bridge-building work remains.

Uncharted Technological, Legal and Ethical Territory

OpenAI's approach to data and AI prioritizes empowering creators, respecting content owners, and collaborating with partners to build AI systems that benefit humanity as a whole. With tools like Media Manager, new products and features, and innovative publisher partnerships, they are working to establish higher standards and more sustainable economic models for the AI ecosystem.

The road ahead is complex, as OpenAI and the broader AI community navigate uncharted technological, legal and ethical territory. But OpenAI's commitment to its principles and collaborative spirit offer a promising path forward. As CEO Sam Altman has said, "We will do our best to do the right thing, and we will work with others to figure out what that is." Through collaboration with creators, content owners, and regulators, OpenAI is working to build a future where AI expands opportunities for everyone while fostering healthy ecosystems and exploring new economic models.

