A new study from researchers at the University of California San Diego and the University of Chicago finds that visual artists face significant challenges in protecting their work from being used without consent by generative AI tools. The research will be presented at the 2025 Internet Measurement Conference in Madison, Wisconsin.
The study highlights that most artists lack both access to technical tools and the expertise required to prevent their artwork from being collected by AI crawlers—programs that gather data for training AI models. “At the core of the conflict in this paper is the notion that content creators now wish to control how their content is used, not simply if it is accessible. While such rights are typically explicit in copyright law, they are not readily expressible, let alone enforceable in today’s Internet. Instead, a series of ad hoc controls have emerged based on repurposing existing web norms and firewall capabilities, none of which match the specificity, usability, or level of enforcement that is, in fact, desired by content creators,” the researchers wrote.
The team surveyed over 200 visual artists about their use of tools designed to block AI crawlers and reviewed more than 1,100 professional artist websites for evidence of such controls. They also assessed which methods were most effective at preventing unauthorized scraping.
One tool available to artists is Glaze—a method developed by co-authors at the University of Chicago—which can disguise original artworks from AI crawlers. Despite this option, many artists prefer to stop crawlers from accessing their work altogether. The study found that artists need protection against different types of AI crawlers: those collecting data for large language models powering chatbots, others for knowledge assistants, and some for search engines.
Survey results show that nearly 80% of respondents had attempted measures to keep their art out of AI training datasets; two-thirds reported using Glaze. Additionally, 60% reduced how much art they share online and over half post only low-resolution images. Almost all surveyed—96%—said they would like access to a tool capable of deterring AI crawlers; however, more than 60% were unfamiliar with robots.txt files—a basic method for restricting crawler access.
Robots.txt files allow website owners to specify which pages or directories should not be accessed by web crawlers. However, these restrictions are voluntary and not always followed by all bots. Researchers analyzed popular websites and discovered that over 10% had disallowed AI crawlers via robots.txt at some point. Some major media outlets later removed these blocks after making licensing agreements with AI companies.
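As an illustration, a site owner who wants to opt out of AI training crawls while remaining open to other bots might publish a robots.txt like the one below. The crawler tokens shown (GPTBot for OpenAI, CCBot for Common Crawl) are real user-agent names, but the specific rules are illustrative, not drawn from the study:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```

Each `User-agent` group names a crawler and lists the paths it may or may not fetch; `Disallow: /` asks that crawler to stay off the entire site. As the study emphasizes, these directives are requests, not enforcement.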
Artists often cannot use robots.txt because most host their sites on third-party platforms that neither permit modifications to these files nor disclose which crawlers are blocked. Among the hosting services the researchers examined, only Squarespace offers a simple interface for blocking certain AI tools, and just 17% of its artist users had enabled this feature.

Compliance with robots.txt varies among crawler operators: “the majority of AI crawlers operated by big companies do respect robots.txt while the majority of AI assistant crawlers do not,” according to the study authors. Notably, TikTok owner ByteDance’s Bytespider was identified as a bot ignoring these rules.
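Compliance is voluntary because robots.txt is purely advisory: a well-behaved crawler has to fetch and honor the file on its own before requesting any page. A minimal sketch of that check using Python's standard `urllib.robotparser` (the domain, paths, and rules here are illustrative, not from the study):

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt that blocks one AI crawler site-wide
# while allowing all other bots.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler consults the parsed rules before every fetch.
print(parser.can_fetch("GPTBot", "https://example.com/gallery/"))        # False
print(parser.can_fetch("SomeSearchBot", "https://example.com/gallery/")) # True
```

Nothing in this mechanism stops a crawler such as Bytespider from simply skipping the check, which is why the study treats robots.txt as a norm rather than an access control.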
Recently introduced features like Cloudflare’s “block AI bots” give site owners additional options; so far only a small percentage have adopted them. “While it is an ‘encouraging new option’, we hope that providers become more transparent with the operation and coverage of their tools (for example by providing the list of AI bots that are blocked),” said Elisa Luo, one author and Ph.D. student at UC San Diego.
Legal approaches remain uncertain as global regulation evolves. In Europe’s recently enacted AI Act, providers must obtain permission from copyright holders before using their data for model training; meanwhile in the United States courts continue debating fair use issues related to scraped data used in building generative models.
“There is reason to believe that confusion around the availability of legal remedies will only further focus attention on technical access controls,” researchers stated in their report. “To the extent that any U.S. court finds an affirmative ‘fair use’ defense for AI model builders, this weakening of remedies on use will inevitably create an even stronger demand to enforce controls on access.”
This research received partial funding from NSF grant SaTC-2241303 and support from the Office of Naval Research project #N00014-24-1-2669.



