Our coalition is a united front of companies, associations, and individuals advocating for the responsible use of digital content in AI training.
Transparent data practices enhance trust in AI models by allowing stakeholders to understand how AI is trained, the quality of the data used, and any inherent biases that may arise. The AI Coalition for Data Integrity is dedicated to pursuing an enforceable policy framework that recognizes the immense value of creative content and protects it against uncompensated and unauthorized taking by generative AI companies to “train” their large-scale commercial AI models.
- Full and detailed disclosure of the content used for training, fine-tuning, and grounding, including the sources from which such data was collected.
- Attribution for generative AI outputs.
- Market-based licensing frameworks that fairly compensate creators and content owners.
The following proposals aim to address these issues, promoting transparency and ensuring that content creators are treated fairly in the evolving digital ecosystem. By establishing clear guidelines, these measures will protect creators' rights and foster a more equitable digital landscape.
The U.S. has a robust system of copyright laws designed to safeguard the rights of creators. These existing laws should be enforced as they apply to the development and deployment of AI systems. This would align with enforcement practices in other industries and help ensure fair treatment for content creators.
There should be no exceptions or loosening of established intellectual property laws for AI. The U.S. should not consider new text and data mining exceptions for AI, nor should it allow the improper exploitation of fair use law to undermine the economic future of creators.
Enforcing existing protections for creative content is just the first step to address emerging challenges and ensure that AI technologies operate within an ethical and legal framework.
Further legislative intervention may be appropriate to address potential gaps or provide clarity to improve incentives for responsible AI development in a rapidly developing environment.
Transparency is essential for accountability in AI, particularly regarding the datasets used to train, fine-tune, and ground models. A robust transparency scheme would include both documentation and disclosure requirements. Clear documentation of data practices, including how and where the data was collected, would help developers, researchers, and policymakers understand how AI systems are built, which is critical for addressing ethical concerns such as bias, privacy issues, and copyright infringement.
The development of AI systems relies on extensive training datasets, many of which were created from content scraped from the internet and contain vast amounts of pirated and unlicensed copyrighted material. Following training, data remains a crucial building block for fine-tuning and for enhancing the accuracy of AI models through grounding. At all of these stages, there is widespread unauthorized use of copyrighted content. To protect the rights of creators, without whom this ecosystem could not function, an efficient, market-based licensing framework is needed to ensure that copyrighted content used for AI training is both:
a) transparent and consensual, and
b) adequately disclosed and appropriately permissioned.
AI developers should be required to obtain licenses for the copyrighted content they use, and it is the position of many in the copyright industries that existing law requires such licenses. Supporting the enforcement of existing IP law, as well as encouraging and incentivizing free-market licensing practices, will ensure that creators are compensated for their contributions and that AI companies adhere to ethical business practices. Companies should maintain auditable and traceable records to verify the legality of their data licenses.
A market for such licenses already exists, and licensing deals have been made in the news and media industry. Companies such as Rightsify, Tollbit, and ProRata, along with groups such as the Dataset Providers Alliance, are focused on attribution and fair compensation for generative AI.
There are already several bills in Congress that represent positive steps toward achieving the transparency and accountability that the AI Coalition for Data Integrity supports. These legislative proposals lay important groundwork for addressing the challenges posed by generative AI and protecting the rights of content creators. While there is still more work to be done to get these efforts across the finish line, these bills are a promising start:
Introduced: April 9, 2024
Sponsor: Rep. Adam Schiff (D-CA)
Latest Action: 04/09/2024 Referred to the House Committee on the Judiciary.
Overview: The bill would require a notice to be submitted to the Register of Copyrights prior to the release of a new generative AI system with regard to all copyrighted works used in building or altering the training dataset for that system. The bill’s requirements would also apply retroactively to previously released generative AI systems.
Introduced: July 27, 2023
Sponsors: Sens. Brian Schatz (D-HI) and John Kennedy (R-LA)
Latest Action: 07/27/2023 Referred to the Committee on Commerce, Science, and Transportation.
Overview: The bill would require that every generative AI system that produces images, videos, audio, or multimedia content include a clear and conspicuous disclosure that the content was created with AI. The resulting output would then have to include information in its metadata identifying it as AI-generated content, the AI tool that was used, and the date and time the output was created.
Introduced: December 22, 2023
Sponsors: Reps. Don Beyer (D-VA) and Anna Eshoo (D-CA)
Latest Action: 12/22/2023 Referred to the House Committee on Energy and Commerce.
Overview: The AI Foundation Model Transparency Act would direct the Federal Trade Commission (FTC), in consultation with the National Institute of Standards and Technology (NIST) and the Office of Science and Technology Policy (OSTP), to set standards for what information high-impact foundation models must provide to the FTC and what information they must make available to the public. Information identified for increased transparency would include training data used, how the model is trained, and whether user data is collected in inference.
Introduced: March 7, 2024
Sponsors: Reps. James Comer (R-KY), Jamie Raskin (D-MD), Nancy Mace (R-SC), Alexandria Ocasio-Cortez (D-NY), Clay Higgins (R-LA), Gerald Connolly (D-VA), Nicholas Langworthy (R-NY), Ro Khanna (D-CA)
Latest Action: 03/07/2024 Passed House Committee on Oversight and Accountability by a vote of 36-2.
Overview: Would require that AI applications used by the Federal Government ensure: “Transparency in publicly disclosing relevant information regarding the use of artificial intelligence to appropriate stakeholders, to the extent practicable and in accordance with any applicable law and policy, including with respect to the protection of privacy, civil liberties, and of sensitive law enforcement, national security, trade secrets or proprietary information, and other protected information.”
Disclosure, Credit & Compensation
To build a digital future where AI development respects and compensates content creators, ensuring innovation and fairness coexist.
Explore membership opportunities with the AI Coalition for Data Integrity. Contact us to learn how you can contribute to our mission of protecting digital content and ensuring ethical AI practices.