OpenAI this week granted users of its image-generating AI system, DALL-E 2, the right to use their generations for commercial projects, such as illustrations for children’s books and art for newsletters. The move makes sense given OpenAI’s own business goals: the policy change coincided with the launch of the company’s paid plans for DALL-E 2. But it raises questions about the legal implications of systems like DALL-E 2, which was trained on public images from around the web, and their potential to infringe existing copyrights.
DALL-E 2 was “trained” on approximately 650 million image-text pairs scraped from the internet, learning from this dataset the relationships between images and the words used to describe them. And while OpenAI filtered the images for specific content (pornography and duplicates, for example) and implemented additional filters at the API level (for instance, for prominent public figures), the company admits that the system can sometimes create works that include trademarked logos or characters:
“OpenAI will evaluate different approaches to handle potential copyright and trademark issues, which may include allowing such generations as part of ‘fair use’ or similar concepts, filtering specific types of content, and working directly with copyright [and] trademark owners on these issues,” the company wrote in an analysis published ahead of DALL-E 2’s beta launch on Wednesday.
It’s not just a DALL-E 2 problem. As the AI community creates open source implementations of DALL-E 2 and its predecessor, DALL-E, both free and paid services are launching on top of models trained on less carefully filtered datasets. One of them, Pixelz.ai, which this week launched an image-generating app powered by a custom DALL-E model, makes it trivially easy to create photos featuring various Pokémon and Disney characters from movies like Guardians of the Galaxy and Frozen.
When contacted for comment, the Pixelz.ai team told TechCrunch that they filtered the model’s training data for profanity, hate speech and “illegal activities,” and that they block users from requesting those types of images at generation time. The company also said it plans to add a reporting feature that will let people flag images that violate the terms of service to a team of human moderators. But when it comes to intellectual property (IP), Pixelz.ai leaves it to users to exercise “responsibility” in using or distributing the images they generate (gray area or not).
“We discourage copyright infringement in both the dataset and our platform’s terms of service,” the team told TechCrunch. “That said, we provide an open text input and people will always find creative ways to abuse a platform.”
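Neither company has detailed its moderation stack, but the generation-time blocking both describe amounts to screening a prompt before it ever reaches the model. As a rough illustration only (the terms, function names and structure below are hypothetical, not Pixelz.ai’s or OpenAI’s actual implementation), a minimal blocklist check in Python might look like this:

    import re

    # Illustrative blocklist only; a production service would maintain far larger,
    # regularly updated term lists and likely pair them with an ML-based classifier.
    BLOCKED_PATTERNS = [
        re.compile(r"\bexplicit content\b", re.IGNORECASE),
        re.compile(r"\bname of a public figure\b", re.IGNORECASE),
    ]

    def prompt_is_allowed(prompt: str) -> bool:
        """Return False if the prompt matches any blocked pattern."""
        return not any(p.search(prompt) for p in BLOCKED_PATTERNS)

    def generate_image(prompt: str):
        if not prompt_is_allowed(prompt):
            raise ValueError("Prompt rejected by content filter")
        # Hand the vetted prompt to the image model here (omitted).
        ...

The harder problem, as the rest of this piece suggests, is that trademarked characters and logos are far too numerous and too variably described for a simple keyword list to catch, which is part of why OpenAI points to filtering, fair use arguments and direct work with rights holders rather than a purely technical fix.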
An image of Rocket Raccoon, from Disney/Marvel’s Guardians of the Galaxy, generated by Pixelz.ai’s system.
Bradley J. Hulbert, a founding partner at the law firm MBHB and an expert in intellectual property law, believes that image-generating systems are problematic from a copyright perspective in several ways. He noted that works of art demonstrably derived from a “protected work” (i.e., a copyrighted character) have generally been held by courts to be infringing, even when additional elements are added. (Think of a picture of a Disney princess walking through a New York neighborhood.) To guard against copyright claims, a work must be “transformative,” meaning changed to the point where the IP is unrecognizable.
“If a Disney princess is recognizable in an image generated by DALL-E 2, we can safely assume that The Walt Disney Co. will likely assert that the DALL-E 2 image is a derivative work and an infringement of its copyright in the Disney princess likeness,” Hulbert told TechCrunch via email. “Substantial transformation is also a factor in determining whether a copy constitutes ‘fair use.’ But again, to the extent a Disney princess is recognizable in a later work, we can assume Disney will assert that the later work is a copyright infringement.”
Of course, the battle between IP owners and alleged infringers is hardly new, and the internet has only acted as an accelerator. In 2020, Warner Bros. Entertainment, which owns the rights to film depictions of the Harry Potter universe, had certain fan art taken down from platforms such as Instagram and Etsy. A year earlier, Disney and Lucasfilm asked Giphy to remove “Baby Yoda” GIFs.
But image-generating AI threatens to greatly escalate the problem by lowering the barrier to entry. Large corporations’ plight isn’t likely to garner much sympathy (nor should it), and their efforts to enforce intellectual property often backfire in the court of public opinion. But AI-generated artwork that infringes on, say, an independent artist’s characters could threaten that artist’s livelihood.
The other thorny legal issue surrounding systems like DALL-E 2 concerns the content of their training datasets. Did companies like OpenAI violate intellectual property law by using copyrighted images and artwork to develop their systems? It’s a question that has already been raised in the context of Copilot, the commercial code-generation tool developed jointly by OpenAI and GitHub. But unlike Copilot, which was trained on code that GitHub may have the right to use for this purpose under its terms of service (according to one legal analysis), systems like DALL-E 2 source images from countless public websites.
Ladies and gentlemen, I have my invitation for DALL-E 2! Here are some pictures of Homer Simpson in Stranger Things before I start tweeting the amazing stuff #dalle2 pic.twitter.com/PHPI6n9yJk
- limb0wl (@limb0wl) July 5, 2022
As Dave Gershgorn points out in a recent feature for The Verge, there is no direct legal precedent in the U.S. that upholds the use of publicly available data for training as fair use.
A potentially relevant case involves the Lithuanian company Planner 5D. In 2020, the company sued Meta (then Facebook), alleging it had taken thousands of Planner 5D’s software files, which were made available through a partnership with Princeton to contestants in Meta’s 2019 Scene Understanding and Modeling challenge for computer vision researchers. Planner 5D claimed that Princeton, Meta and Oculus, Meta’s VR-focused hardware and software division, could have profited commercially from the training data taken from its files.
The case isn’t scheduled to go to trial until March 2023. But last April, the U.S. district judge overseeing the case denied Facebook’s and Princeton’s motions to dismiss Planner 5D’s allegations.
Not surprisingly, rights holders aren’t swayed by the fair use argument. A spokesperson for Getty Images, quoted in an IEEE Spectrum article, said there are “big questions” to be answered about “the rights to the images and the people, places and objects within the images that [models like DALL-E 2] were trained on.” Rachel Hill, CEO of the Association of Illustrators, who was also quoted in the piece, raised the issue of compensation for the images in the training data.
Hulbert believes it is unlikely that a judge will view the copies of copyrighted works in training datasets as fair use, at least in the case of commercial systems such as DALL-E 2. Nor does he rule out the possibility that IP holders will come after companies like OpenAI at some point and demand that they license the images used to train their systems.
“Copying … constitutes an infringement of the original authors’ copyrights, and infringers are liable to the copyright owners for damages,” he added. “[If] DALL-E (or DALL-E 2) and its partners made a copy of a protected work, and the copy was neither authorized by the copyright owner nor fair use, the copy constitutes a copyright infringement.”
Interestingly, the UK is exploring legislation that would remove the current requirement that systems trained using text and data mining, such as DALL-E 2, be used strictly for non-commercial purposes. While copyright holders could still demand payment under the proposed regime by putting their works behind a paywall, it would make the UK’s policy one of the most liberal in the world.
It seems unlikely that the United States will follow suit, given the lobbying power of intellectual property holders in the US. The issue seems likely to develop into a future lawsuit. But time will tell.