Google is making it clear that it intends to use content from web publishers to enhance its artificial intelligence systems. The company suggests that publishers who don’t want their content scraped should opt out, much as they currently can for search engine indexing.
Critics of this opt-out approach argue that it goes against copyright laws, which traditionally put the responsibility on those using copyrighted material to seek permission from the copyright holders.
Google’s plan was disclosed in its submission to the Australian government’s consultation on regulating high-risk AI applications. While Australia has been considering banning certain problematic uses of AI, such as disinformation and discrimination, Google is advocating for broad access to data for AI developers.
According to a report by The Guardian, Google told Australian policymakers that copyright law should allow for fair and appropriate use of copyrighted content in AI training. The company pointed to “robots.txt,” the long-standing standard that lets publishers specify which parts of their websites should not be accessed by web crawlers.
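For context, a robots.txt file sits at the root of a website and lists, per crawler, the paths that crawler is asked not to fetch. A minimal sketch, with an illustrative domain and path:

    # Served from https://example.com/robots.txt (domain and path are illustrative)
    # Ask all crawlers to stay out of one section of the site
    User-agent: *
    Disallow: /members-only/

Compliance is voluntary: the file is a request honored by well-behaved crawlers, not a technical barrier.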
However, Google did not provide specific details about how this opting-out process would function. In a blog post, the company vaguely mentioned the introduction of “standards and protocols” that would enable web creators to choose their level of involvement with AI.
Google has been lobbying Australia to ease copyright rules since May, after launching its Bard AI chatbot in the country. Google is not alone: OpenAI, the creator of ChatGPT, is pursuing similar data ambitions and aims to expand its training dataset with a new web crawler called GPTBot. Like Google, OpenAI follows an opt-out model in which publishers must add a “disallow” rule if they don’t want their content scraped.
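OpenAI’s documentation says GPTBot identifies itself with that user-agent token, so opting out follows the usual robots.txt pattern; a minimal sketch of a site-wide block:

    # Ask OpenAI’s GPTBot not to crawl any part of the site
    User-agent: GPTBot
    Disallow: /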
This approach is common among big tech companies, which use AI techniques such as deep learning and machine learning to analyze user preferences and serve tailored content and advertisements.
This push for more data comes at a time when AI’s popularity is soaring. Systems like ChatGPT and Google’s Bard rely on processing vast amounts of text, images, and videos. According to OpenAI, “GPT-4 has learned from a variety of licensed, created, and publicly available data sources, which may include publicly available personal information.”
However, some experts argue that scraping web content without permission raises both copyright and ethical concerns. Publishers such as News Corp. are in discussions with AI firms, seeking compensation for the use of their content.
This debate highlights the tension between advancing AI through unrestricted data access and respecting ownership rights. More content makes these systems more capable, but it also lets tech companies profit from others’ work without adequately sharing the gains.
Finding the right balance between these interests is a complex task. Google’s proposal essentially presents a choice to publishers: “share your work with our AI or take steps to opt out.” Smaller publishers with limited resources or knowledge might find opting out challenging.
Australia’s examination of AI ethics offers an opportunity to shape how these technologies evolve. But if public policy gives in to data-hungry tech giants pursuing their own interests, it could entrench a default in which creative work is consumed by AI systems unless creators go to great lengths to prevent it.