Tech giants Google and OpenAI are set to confront a torrent of legal challenges over copyright infringement claims. The Daily Mail alleges that Google’s AI chatbot, Bard, has been trained on its articles without permission. Separately, authors Mona Awad and Paul Tremblay are suing OpenAI, alleging that their books were unlawfully used to train the AI tool ChatGPT.
Proposed EU legislation may require AI companies to disclose copyrighted training data used in AI systems, potentially triggering further lawsuits.
- The Daily Mail is preparing to sue Google over its use of Daily Mail content as AI training data.
- OpenAI faces a similar lawsuit from authors Mona Awad and Paul Tremblay.
- The use of copyrighted material for AI training is legally uncharted territory.
The allegations
Search engine behemoth Google faces pending legal action from the Daily Mail and its owner, Lord Rothermere, who are preparing to file a large-scale lawsuit against the tech titan. The case revolves around Google’s AI chatbot Bard, with allegations that it was trained on Daily Mail articles without credit, compensation, or permission.
These accusations don’t come as a shock, given the growing concern over tech giants exploiting readily available online data for training AI models without consent. This controversial practice has drawn the ire of publishers and news outlets, who are worried about potential data theft and the negative ripple effects it may unleash.
In parallel, OpenAI, the creator of ChatGPT, is facing a similar predicament. Authors Mona Awad and Paul Tremblay have filed a lawsuit claiming that their copyrighted books were unlawfully used to train ChatGPT, an AI tool that can generate accurate summaries of their novels. Their lawyers argue that OpenAI is profiting from “stolen writing and ideas,” creating an unprecedented legal challenge for generative AI.
The legal implications
Assessing the financial losses caused specifically by the use of copyrighted material in AI training could prove challenging, given that these models are also trained on publicly accessible online data. In the case of ChatGPT, it is believed that one of its training datasets, “Books2”, may have been sourced from shadow libraries, where books can be obtained in bulk via torrent systems.
If the courts deem this use of copyrighted material to be “fair use” rather than unauthorized copying, the lawsuits could fail. Yet this “fair use” defence does not exist in the UK, and previous efforts by the UK government to establish copyright exceptions for text and data mining have met with resistance from authors, publishers, and the music industry.
The potential impact of EU legislation
Adding to the complexity of the situation is the proposed EU legislation that could require AI companies to disclose copyrighted training data used in their AI systems. This could open the door to a wave of copyright lawsuits, potentially impacting major industry players.
While the specifics of this requirement remain unknown and could well change during negotiations, the broader AI Act is expected to significantly reshape the AI landscape within the EU. The Act is set to classify AI systems based on perceived risk and will require companies responsible for building the most impactful tools to disclose crucial data related to safety, interpretability, performance, and more.
The consequences for AI deployment
Legal and regulatory concerns of this kind are already influencing the rollout of AI systems. Google’s AI assistant, Bard, has expanded to many countries, but only recently became available in the EU, a delay likely due to the General Data Protection Regulation (GDPR). Similarly, OpenAI’s ChatGPT faced a temporary ban in Italy over alleged GDPR violations.
As tech giants face mounting legal challenges, the question of copyright in the age of AI has become more pertinent than ever. The repercussions of these lawsuits will undoubtedly have a significant impact on the future of AI development and the broader tech industry.