Disclosure, Credit & Compensation

AI Coalition for Data Integrity

Our coalition is a united front of companies, associations, and individuals advocating for the responsible use of digital content in AI training.

Unauthorized Data Scraping Instances Monthly: +4.6B
Impacted Digital Content Creators: +2M
Uncompensated Digital Works: +2.7B
Annual Revenue Lost Due to Unlicensed Use: +$30B
Source: David W. Opderbeck, Seton Hall Law School
AI Transparency and Fair Use

Artificial Intelligence (AI) data training transparency is crucial to ensuring the ethical and responsible development and deployment of AI systems, and necessary to hold training dataset creators accountable.

Transparent data practices enhance trust in AI models by allowing stakeholders to understand how AI is trained, the quality of the data used, and any inherent biases that may arise. The AI Coalition for Data Integrity is dedicated to pursuing an enforceable policy framework that recognizes the immense value of creative content and protects it against uncompensated and unauthorized taking by generative AI companies to “train” their large-scale commercial AI models.

We share three fundamental goals:

Transparency

Full and detailed disclosure of the content used for training, fine-tuning, and grounding, including the sources from which such data was collected.

Attribution

Attribution for generative AI outputs.

Fair Compensation

Market-based licensing frameworks that fairly compensate creators and content owners.

Our Proposals for Fairness and Transparency in AI Development

The following proposals aim to address these issues, promoting transparency and ensuring that content creators are treated fairly in the evolving digital ecosystem. By establishing clear guidelines, these measures will protect creators' rights and foster a more equitable digital landscape.

I. Enforcement of Existing Laws

a. The U.S. has a robust system of copyright laws designed to safeguard the rights of creators. These existing laws should be enforced as they apply to the development and deployment of AI systems. This would align with enforcement practices in other industries and help ensure fair treatment for content creators.

b. There should be no exceptions to, or loosening of, established intellectual property laws for AI. The U.S. should not consider new text and data mining exceptions for AI, nor should it allow the improper exploitation of fair use law to undermine the economic future of creators.

c. Enforcing existing protections for creative content is just the first step toward addressing emerging challenges and ensuring that AI technologies operate within an ethical and legal framework.

d. Further legislative intervention may be appropriate to address potential gaps or provide clarity, improving incentives for responsible AI development in a rapidly evolving environment.

II. Establish Data Maintenance and Disclosure Requirements for AI Models

Transparency is essential for accountability in AI, particularly regarding the datasets used to train, fine-tune, and ground models. A robust transparency scheme would include both documentation and disclosure requirements. Clear documentation of data practices, including how and where the data was collected, would help developers, researchers, and policymakers understand how AI systems are built, which is critical for addressing ethical concerns such as bias, privacy issues, and copyright infringement.
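As an illustration of what such documentation could look like in practice, the following is a minimal, hypothetical Python sketch of a machine-readable dataset disclosure record. The field names and example values are assumptions made for illustration, not a proposed or existing standard.

```python
# Hypothetical sketch of a machine-readable training-data disclosure record.
# Field names (e.g., "collection_method", "license_status") are illustrative
# assumptions, not part of any statutory or regulatory standard.
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DatasetDisclosure:
    dataset_name: str
    source_urls: list[str]            # where the data was collected
    collection_method: str            # e.g., "licensed feed", "web crawl"
    collection_period: str            # e.g., "2023-01 to 2023-12"
    contains_copyrighted_works: bool
    license_status: str               # e.g., "licensed", "public domain", "unknown"
    known_bias_notes: list[str] = field(default_factory=list)
    used_for: list[str] = field(default_factory=list)  # "training", "fine-tuning", "grounding"

    def to_json(self) -> str:
        """Serialize the disclosure so it can be filed with a regulator or published."""
        return json.dumps(asdict(self), indent=2)


# Example usage with placeholder values
record = DatasetDisclosure(
    dataset_name="example-news-corpus",
    source_urls=["https://example.com/articles"],
    collection_method="licensed feed",
    collection_period="2023-01 to 2023-12",
    contains_copyrighted_works=True,
    license_status="licensed",
    known_bias_notes=["English-language sources only"],
    used_for=["training", "grounding"],
)
print(record.to_json())
```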

III. Encourage and Incentivize Training Data Licensing

The development of AI systems relies on extensive training datasets, many of which were created from content scraped from the internet and contain vast amounts of pirated and unlicensed copyrighted material. After training, data remains a crucial building block for fine-tuning and for enhancing the accuracy of AI models through grounding. At all of these stages there is widespread unauthorized use of copyrighted content. To protect the rights of creators, without whom this ecosystem could not function, an efficient, market-based licensing framework is needed, ensuring that copyrighted content used for AI training is both:
a) transparent and consensual, and
b) adequately disclosed and appropriately permissioned.

i. AI developers should be required to obtain licenses for the copyrighted content they use, and it is the position of many in the copyright industries that existing law already requires such licenses. Supporting the enforcement of existing IP law, as well as encouraging and incentivizing free-market licensing practices, will ensure that creators are compensated for their contributions and that AI companies adhere to ethical business practices. Companies should maintain auditable and traceable records to verify the legality of their data licenses.

It should be noted that an actual market already exists: licensing deals have been struck in the news/media industry, and companies such as Rightsify, Tollbit, and ProRata, as well as groups such as the Dataset Providers Alliance, are focused on attribution and fair compensation for generative AI.
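Purely as an illustration of the auditable, traceable license records mentioned above, here is a minimal, hypothetical Python sketch that ties a licensed work to the exact content ingested via a SHA-256 fingerprint. The field names and the hashing choice are assumptions, not a requirement of any existing licensing platform.

```python
# Hypothetical sketch of an auditable, traceable record linking a licensed work
# to the dataset that uses it. The fields and the SHA-256 fingerprint are
# illustrative assumptions about what "auditable and traceable" could mean in
# practice; no specific licensing platform or API is implied.
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LicenseRecord:
    work_id: str              # identifier for the licensed work
    rights_holder: str
    licensee: str
    license_terms_url: str    # pointer to the executed license agreement
    effective_date: date
    content_sha256: str       # fingerprint of the exact content ingested
    dataset_name: str         # dataset the work was added to


def fingerprint(content: bytes) -> str:
    """Hash the ingested content so auditors can verify what was actually used."""
    return hashlib.sha256(content).hexdigest()


# Example usage with placeholder values
work_bytes = b"full text of the licensed article..."
record = LicenseRecord(
    work_id="article-0001",
    rights_holder="Example News LLC",
    licensee="Example AI Co.",
    license_terms_url="https://example.com/licenses/article-0001",
    effective_date=date(2024, 1, 15),
    content_sha256=fingerprint(work_bytes),
    dataset_name="example-news-corpus",
)
print(record)
```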

IV. Proposed Federal Legislation

There are already several bills in Congress that represent positive steps toward achieving the transparency and accountability that the AI Coalition for Data Integrity supports. These legislative proposals lay important groundwork for addressing the challenges posed by generative AI and protecting the rights of content creators. While there is still more work to be done to get these efforts across the finish line, these bills are a promising start:

The Generative AI Copyright Disclosure Act of 2024

a. Introduced: April 9, 2024
b. Sponsor: Rep. Adam Schiff (D-CA)
c. Latest Action: 04/09/2024 Referred to the House Committee on the Judiciary.
d. Overview: The bill would require a notice to be submitted to the Register of Copyrights prior to the release of a new generative AI system with regard to all copyrighted works used in building or altering the training dataset for that system. The bill's requirements would also apply retroactively to previously released generative AI systems.

The AI Labeling Act of 2023

a. Introduced: July 27, 2023
b. Sponsors: Sens. Brian Schatz (D-HI) and John Kennedy (R-LA)
c. Latest Action: 07/27/2023 Referred to the Committee on Commerce, Science, and Transportation.
d. Overview: The bill would require every generative AI system that produces images, videos, audio, or multimedia content to include a clear and conspicuous disclosure that the content was created with AI. The resulting output would then have to include information in its metadata that identifies it as AI-generated content, what AI tool was used, and the date and time the output was created (a minimal sketch of such metadata follows this list of bills).

The AI Foundation Model Transparency Act

a. Introduced: December 22, 2023
b. Sponsors: Reps. Don Beyer (D-VA) and Anna Eshoo (D-CA)
c. Latest Action: 12/22/2023 Referred to the House Committee on Energy and Commerce.
d. Overview: The bill would direct the Federal Trade Commission (FTC), in consultation with the National Institute of Standards and Technology (NIST) and the Office of Science and Technology Policy (OSTP), to set standards for what information high-impact foundation models must provide to the FTC and what information they must make available to the public. Information identified for increased transparency would include the training data used, how the model is trained, and whether user data is collected during inference.

a. Introduced: March 7, 2024
b. Sponsors: Reps. James Comer (R-KY), Jamie Raskin (D-MD), Nancy Mace (R-SC), Alexandria Ocasio-Cortez (D-NY), Clay Higgins (R-LA), Gerald Connolly (D-VA), Nicholas Langworthy (R-NY), Ro Khanna (D-CA)
c. Latest Action: 03/07/24 Passed the House Committee on Oversight and Accountability by a vote of 36-2.
d. Overview: The bill would require that AI applications used by the Federal Government ensure: "Transparency in publicly disclosing relevant information regarding the use of artificial intelligence to appropriate stakeholders, to the extent practicable and in accordance with any applicable law and policy, including with respect to the protection of privacy, civil liberties, and of sensitive law enforcement, national security, trade secrets or proprietary information, and other protected information."
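As referenced in the labeling bill overview above, the following is a minimal, hypothetical Python sketch of the kind of metadata that could accompany generative AI output: a disclosure that the content is AI-generated, the tool used, and a creation timestamp. The keys are illustrative assumptions, not a defined standard such as C2PA or any language mandated by the bill.

```python
# Hypothetical sketch of provenance metadata attached to generative AI output.
# The dictionary keys are illustrative assumptions, not a defined standard.
import json
from datetime import datetime, timezone


def build_ai_disclosure(tool_name: str, tool_version: str) -> dict:
    """Return metadata identifying a piece of content as AI-generated."""
    return {
        "ai_generated": True,
        "generating_tool": tool_name,
        "tool_version": tool_version,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


# Example usage: embed the disclosure alongside the generated content.
output = {
    "content": "...generated image or text payload...",
    "metadata": build_ai_disclosure("example-image-model", "1.0"),
}
print(json.dumps(output, indent=2))
```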

What we do

Disclosure, Credit & Compensation

Our Vision

To build a digital future where AI development respects and compensates content creators, ensuring innovation and fairness coexist.

Advocacy
Federal Legislation: Supporting and shaping bills that enforce transparency and fair use in AI training data, such as initiatives by key legislators.
State Legislation: Collaborating with state governments to pass laws that protect digital content from unauthorized use, including groundbreaking state-specific acts.
Administration Activity: Working with governmental agencies to develop and implement regulations that ensure ethical AI practices, such as executive orders, OMB directives, and NIST guidance.
International Efforts: Partnering with global entities to influence AI policies worldwide, including the EU AI Act and other international frameworks.
Education and Awareness
Public Campaigns: Launching outreach campaigns to educate the public and stakeholders about data integrity issues and our initiatives.
Workshops and Seminars: Organizing educational events to discuss best practices and the latest developments in AI data usage.
Publications: Producing comprehensive reports, articles, and guides on ethical AI practices and data integrity.
Collaboration
Industry Partnerships: Collaborating with companies and industry groups to develop standards and best practices for ethical AI.
Research and Development: Partnering with academic institutions and research organizations to study the impact of AI on digital content and develop technological solutions to prevent unauthorized data scraping.
Legal Support: Providing resources and support for legal actions against unauthorized use of digital content.
Contact Us

Get In Touch About Membership

Explore membership opportunities with the AI Coalition for Data Integrity. Contact us to learn how you can contribute to our mission of protecting digital content and ensuring ethical AI practices.
