Drew DeVault on scrapers
"Scrapers have been a thorn in the side of sysadmins for a very long time, but it’s particularly important as LLM scrapers seize the entire Internet to feed into expensive, inefficient machine learning models —ignoring the copyright (or copyleft, as it were) of the data as they go.
"The serious costs and widespread performance issues and outages caused by reckless scrapers has been on everyone’s mind in the sysadmin community as of late, and has been the subject of much discussion online.
"Aside from the much-appreciated responses of incredulity towards LLM operators, and support and compassion for sysadmins from much of the community, a significant minority views this problem as less important than we believe it to be.
"Many of their arguments reduce to victim blaming—
- It’s not that hard to handle this volume of traffic;
- We should optimize our services to better deal with it;
- We need more caching (or to improve our performance);
- We should pay a racketeer like Cloudflare to make the problem go away.
"Some suggest that sysadmins should be reaching out to LLM companies to offer them more efficient ways to access our data to address the problem."