In the evolving landscape of digital news dissemination, news publishers and technology companies share a symbiotic yet contentious relationship. Publishers depend on technology platforms, such as Meta, for distribution and traffic referrals, which they monetise through advertising; the platforms, acting as intermediaries that drive traffic to news websites, take a share of that advertising revenue.

The latest frontier in this struggle over the division of revenue between platforms and publishers is Generative AI (GenAI). GenAI platforms train their models on vast datasets drawn from the open Web. Some major news publishers, such as The Atlantic, are entering into contractual agreements to license their content to AI firms. Others, such as The New York Times, have sued AI firms for using copyrighted material as training data and are seeking compensation. Recently, Asian News International (ANI) sued OpenAI for the unauthorised use and storage of its copyrighted work to train the company’s Large Language Model (LLM).

Claims and defence

In the lawsuit, ANI first claimed that OpenAI used its copyrighted content for LLM training without authorisation. ANI formally notified OpenAI of the copyright infringement, and in October, OpenAI applied its opt-out policy to ANI, which allows websites to bar automated use of their text by AI crawlers. The policy rests on the principle of fair use and on exceptions for text and data mining (TDM) for scientific research. Fair use is a legal principle that permits limited use of copyrighted material without the owner’s permission, depending on the purpose and character of the use, the type of copyrighted work, the portion used, and its effect on the market for the copyrighted work.

However, ANI argued that opting out is ineffective: other websites and news organisations republish its content widely, allowing OpenAI’s crawlers to reach that content through these third-party sites. ANI therefore went to court.
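In practice, opt-outs of this kind are implemented through the web-standard robots.txt protocol: OpenAI’s documented crawler identifies itself with the user agent GPTBot, and a site that disallows GPTBot in its robots.txt is excluded from collection. The sketch below, in Python, uses the standard library’s robots.txt parser to illustrate both the mechanism and the loophole ANI describes; the hostnames and paths are hypothetical placeholders, not real endpoints.

```python
# A sketch of the robots.txt opt-out respected by OpenAI's crawler
# (user agent "GPTBot"), and of the third-party loophole ANI alleges.
# The hostnames below are hypothetical placeholders, not real sites.
from urllib.robotparser import RobotFileParser

def may_crawl(host: str, path: str, agent: str = "GPTBot") -> bool:
    """Return True if `agent` may fetch `path` under `host`'s robots.txt."""
    rp = RobotFileParser()
    rp.set_url(f"https://{host}/robots.txt")
    rp.read()  # fetch and parse the site's live robots.txt
    return rp.can_fetch(agent, f"https://{host}{path}")

# The originating agency opts out; its robots.txt would contain:
#   User-agent: GPTBot
#   Disallow: /
print(may_crawl("news-agency.example", "/story/123"))      # False

# The same story republished by a syndication partner is governed by
# that partner's robots.txt; if GPTBot is not disallowed there, the
# text remains reachable despite the original publisher's opt-out.
print(may_crawl("syndicator.example", "/wire/story-123"))  # True
```

The asymmetry is structural: each host’s robots.txt governs only that host, so an opt-out at the source does not travel with syndicated copies of the same text.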

Second, ANI accused OpenAI of generating responses that were either verbatim or substantially similar to ANI’s copyrighted content. OpenAI countered that copyright does not protect ideas or facts, only their expression. It contended that its models never deliver information to users in the same expression as their sources and that the language is modified sufficiently to fall within copyright exceptions.

Third, ANI highlighted the issue of fabricated responses, where ChatGPT misleadingly attributed fabricated interviews or news stories to the agency. OpenAI stated that it had resolved every instance of false attribution flagged by ANI and pledged to rectify similar issues in the future. On the basis of these claims, ANI is seeking an interim injunction to restrain OpenAI from storing, publishing, or reproducing its work. ANI is also requesting an order prohibiting OpenAI from accessing its content anywhere, including through its subscribers.

Implications

A lawsuit of this nature is a first in India. However, OpenAI asserts that there is no basis for legal action within the country, as no reproduction of content took place in India. The AI platform stated that it has no offices or servers in India, so the model’s training and data processing occur outside the country.

ANI’s claims and OpenAI’s defence highlight two significant issues facing AI: the balance between copyright protection and fair use, and territoriality in data storage. The first issue has persisted since the rise of the Internet; the dispute between AI platforms and content owners pours this old wine into new bottles. Fair use, TDM, and the ex-post opt-out option are rooted in two principles: permissionless innovation and free inquiry. The first principle holds that experimentation with new technologies and business models should be allowed by default: unless a compelling case shows that a new invention will cause serious harm to society, innovation should proceed unimpeded, with any issues that arise addressed later. The second principle treats facts and data as a commons and advances public knowledge by allowing anyone to share data for scientific purposes.

In India, the law of ‘fair use’ sets out an exhaustive list of exceptions to copyright protection, none of which directly or indirectly covers the training of AI models. Consequently, whether AI model training qualifies as fair use in India remains a grey area. Moreover, the absence of TDM provisions in Indian law raises questions about the country’s approach to fostering innovation in AI while balancing it against copyright protection. Given this gap, policymakers should adopt a permissionless innovation approach that stimulates AI development while protecting the private rights of content creators.

The issue of territoriality in data storage poses major challenges to data sovereignty, which requires that data be regulated by the laws of its country of origin. OpenAI’s defence illustrates the complexity of applying territorial laws to cloud-based services and distributed AI models. While these services and models use data generated by Indian users and organisations, the data is dispersed across multiple servers and cloud environments, making it difficult to extend traditional concepts of territoriality to data.

This lawsuit against OpenAI in India will set a precedent for determining AI developers’ legal accountability for content generated by their platforms.

This article originally appeared in The Hindu.
