AI is a versatile technology that is reshaping industries across sectors. Generative artificial intelligence has transformed the creation of writing, images, music, and code by enabling machines to produce original content. As AI systems continue to evolve, founders, developers, and users encounter complex legal challenges. Jurisdictions worldwide have adopted diverse approaches to address these challenges, ranging from fair use and statutory exemptions to licensing-based frameworks. Against this backdrop, India faces growing legal uncertainty because its copyright law offers no explicit legislative guidance on AI training.
The Department for Promotion of Industry and Internal Trade (DPIIT) has released a Working Paper on Generative AI and Copyright, which examines the copyright issues that generative AI raises in the Indian context.
Generative AI
Generative Artificial Intelligence (AI) offers a wide range of advanced applications. This branch of AI can create new content across various domains, including writing, music, images, code, and even three-dimensional models. It is powered by deep learning models trained on large datasets to identify patterns and produce original output. By learning from vast amounts of data, it can mimic human creativity and generate results that appear to be human-created. A Large Language Model (LLM) is a type of generative AI that, like a chatbot, can communicate in natural language. A language model is trained on extensive text corpora to understand language patterns, much as a law student learns legal writing by reading numerous decisions and papers. The learner does not replicate any specific text but instead applies previously learned frameworks to new facts. Similarly, generative AI creates new content by recombining patterns from its training data, with close resemblance to any given source occurring only in rare cases.
How Generative AI Affects Copyrighted Works
AI models are trained on massive datasets, often obtained from licensed sources or through web scraping. These datasets are stored on servers and processed by copying, filtering, modifying, and organizing the content. When assembled through online scraping, they frequently include copyrighted works, such as novels, artwork, and images, taken without the permission of the original creators. AI companies argue that this process is simply learning, akin to how students learn art. Rights holders, however, contend that it amounts to unlawful reproduction. This situation raises three major issues:
- Whether copyrighted works can be used as input for training AI models.
- Whether AI-generated content infringes copyright.
- Whether AI-created works can themselves be protected by copyright.
Global Approaches to AI Training and Copyright
Jurisdictions worldwide have adopted various approaches to copyright issues related to AI training.
United States: Under United States copyright law, the lawfulness of using copyrighted works without permission turns largely on the doctrine of fair use. Courts weigh four main factors:
- Whether the use is transformative, adding new expression or meaning.
- The nature of the copyrighted work, i.e., whether it is factual or creative.
- The amount and substantiality of the portion of the work used.
- The effect of the use on the potential market for or value of the original work.
Human creativity is the touchstone of protection under U.S. copyright law. The U.S. Copyright Office has consistently refused to register works created solely by AI and will register a work only if it reflects human creative input; only “the fruits of intellectual labor” derived from the creative faculties of the mind are protected. In recent cases such as Kadrey v. Meta Platforms and Bartz v. Anthropic, U.S. courts recognized that large-scale automated AI training on protected works can be permissible because the material is used for pattern recognition: such use is considered transformative and qualifies as fair use, since its purpose is analysis rather than expression. Even where training is allowed, however, AI-generated outputs can still infringe copyright, so the outcome depends on the specifics of each case.
European Union: The EU’s Copyright in the Digital Single Market (CDSM) Directive contains text and data mining (TDM) exceptions under which AI models may be trained on copyrighted material, while otherwise maintaining strict copyright protection with very limited exceptions. Article 3 permits research organisations and cultural heritage institutions with lawful access to make reproductions and extractions for scientific research, while Article 4 provides a broader exception for all users with lawful access, including commercial enterprises. Under Article 4, however, copyright holders may expressly reserve their rights and thereby prohibit the use of their works. The EU Artificial Intelligence Act also acknowledges the extensive use of TDM techniques in generative AI and other AI systems. Yet a 2025 study commissioned by the European Parliament argues that generative AI uses data to create new content, whereas TDM employs data analysis merely to extract information.
Furthermore, the EU AI Act requires AI developers to be transparent by publishing an adequate summary of the data used to train their models. The EU takes a more conservative approach than the US: it does not grant copyright to AI-generated output, recognizing only human creativity. Even if this approach slows AI development, the EU aims to balance robust protection for authors with innovation.
United Kingdom: According to the UK Copyright, Designs and Patents Act (CDPA), text and data mining (TDM) is permitted only for non-commercial research purposes. Without a commercial use license, AI training on copyrighted content is generally prohibited. The UK government previously proposed expanding this exemption to include commercial AI training; however, the proposal was abandoned due to strong opposition from authors and the creative industries. In the UK, copyright protects original human-created works. Although UK law recognizes computer-generated works, the protection afforded to them remains unclear and insufficient. If an AI-generated product is substantially similar to a copyrighted work and that similarity is not coincidental, copyright infringement may occur under UK law.
India’s Legal Position
The central problem for generative AI in India is the absence of clear legislation governing the use of copyrighted works for AI training, which leaves rights holders and AI developers alike in a state of uncertainty. Functionally, the reproduction right under Section 14 may not apply to the storage of works solely for the limited purpose of non-expressive analysis, where the use is confined to information extraction rather than the reproduction of protected expression.
Section 52 of the Copyright Act permits temporary storage as part of technical processes and the use of works for private or non-commercial study, which may cover certain AI training activities. Under the “idea–expression dichotomy,” Indian copyright law protects the expression of ideas rather than the ideas, facts, or patterns themselves, and AI training involves extracting data and identifying trends rather than exploiting the expressive content of a work. At the same time, India treats human creativity and originality as essential prerequisites for copyright protection, and Indian law currently does not grant copyright to content created solely by AI.
In the ongoing lawsuit Ani Media (P) Ltd. v. OpenAI Inc., ANI contends that OpenAI used its copyrighted news content to train its large language model without authorization. The Delhi High Court is examining whether using copyrighted works to train generative AI models violates Indian law. This is the first significant case in India to specifically invoke copyright law in challenging AI training practices.
DPIIT’s Policy Analysis and the Hybrid Model
The DPIIT has constituted a committee to examine the copyright questions raised by emerging generative AI technologies. The committee has divided its work into two parts:
Part 1: Whether copyrighted works can be used as input for training AI.
Part 2: How AI-generated works should be treated in terms of authorship, originality, moral rights, and responsibility.
So far, the DPIIT has released only the first part of the paper. The committee reviewed various regulatory models used worldwide.
- Blanket Text and Data Mining (TDM) Exception: Many IT companies favor this approach. It permits the use of lawfully accessible material for training purposes without requiring additional licensing.
- Text and Data Mining (TDM) Exception with an Opt-out: Some stakeholders prefer this model. Copyright holders may choose to prohibit the use of their works, while the TDM exception applies to everything else.
- Voluntary Direct Licensing: AI developers may lawfully use copyrighted works for training through agreements negotiated directly with copyright holders.
- Collective or Extended Collective Licensing: A collecting society negotiates licenses for a variety of works on behalf of its members.
- Statutory Licensing: The law explicitly permits the use of works for AI training in exchange for statutory compensation.
After examining all of these models, the committee concluded that a compromise was necessary because none of the conventional models was adequate on its own. It rejected the blanket and opt-out TDM exceptions because these approaches would disadvantage small creators and leave rights holders unpaid, and because attribution and traceability would be technically challenging; direct negotiation, meanwhile, would be costly and time-consuming. Consequently, the committee proposed a hybrid model.
Under the hybrid model, a mandatory blanket licence would allow AI developers to train models on legally accessible copyrighted works without negotiating with each creator individually, while ensuring that rights holders receive fees. To collect and distribute these fees, the paper proposes a Copyright Royalties Collective for AI Training (CRCAT), a centralized nonprofit body. Royalty rates would be set by a government-appointed committee, with the aim of establishing a transparent rate subject to judicial review and reassessed every three years; the same body would oversee the collection and payment of royalties. The concept seeks to preserve statutory compensation for creators while reducing administrative burdens for AI developers.
The working paper eliminates holdout risk for AI developers by providing legal assurance that they can use legally available works without negotiating with millions of individual creators. This lowers transaction costs and allows smaller companies to compete. At the same time, it safeguards the core principle of copyright for creators: that they should be compensated when their works are used for commercial AI training. Through the unified collecting system, artists can receive compensation without resorting to individual litigation or negotiations.
In a dissenting opinion, DPIIT committee member Nasscom objected to the hybrid licensing approach, arguing that mandatory royalties would hinder innovation. Nasscom contended that mining publicly available, non-paywalled content should be permitted and that providers of both open and access-restricted content should have the freedom to exclude their content from LLM training. The committee disagreed, noting that small content creators might not be able to implement such opt-outs effectively.
Conclusion
Generative AI exposes the limitations of copyright law, which was never designed to regulate machine-learning processes that extract patterns rather than replicate creative expression. The United States, the European Union, and the United Kingdom have each adopted different strategies, but none offers a definitive solution. In India, the lack of explicit legislative guidance has shifted the burden to the courts, producing uncertain legal outcomes, as the ongoing Ani Media (P) Ltd. v. OpenAI Inc. litigation demonstrates. The DPIIT’s proposed hybrid licensing framework reflects a policy-driven recalibration of copyright law rather than strict doctrinal consistency, aiming to bridge the gap by ensuring access for AI developers while preserving fair compensation for creators.
-Priyanka Dey, Ductus Legal