Many businesses invest in web scraping services to analyze data, generate leads, and automate marketing—but is it legal?
The good news is that the benefits of web scraping for business aren’t just compelling; they’re also perfectly legitimate, as long as your service provider stays on the right side of the law. Today we’ll explain what that means and why you should hire a reputable, ethical scraper.
What Are the Laws Related to Data Scraping?
It’s a common misconception that web scraping is illegal. It isn’t inherently illegal, and it isn’t hacking or data theft. No general law prohibits data scraping, although data protection, copyright, and contract rules can apply to what you collect and how you use it. Professional scrapers follow data protection rules and access only publicly available data.
Can Scrapers Extract Personal Data?
Scraping can become complicated when it involves personal information. In California, the European Union, and other jurisdictions, you need to be mindful of the data you collect about identifiable people. Regulations differ, but the GDPR (General Data Protection Regulation) and the CCPA (California Consumer Privacy Act) both protect some of the data that scrapers might find online.
A quick summary:
- The GDPR protects personal data wherever it is published. Data on a public, open server is still covered, so scraping it requires a lawful basis and compliance with the regulation’s other obligations.
- The CCPA excludes publicly available information, which it defines as information lawfully made available from government records, such as business registrations.
- The CPRA (California Privacy Rights Act), which amended the CCPA and took effect in 2023, broadens the definition of publicly available data. Information that a person has lawfully made available to the general public is not protected.
Depending on where your business is based and where the scraped data comes from, you need to comply with local laws and make sure any information you use falls within what those laws permit.
Scraping Copyrighted Content and Fair Use
Technically, you can’t replicate copyrighted content without permission or an appropriate license. Most content online, including website text, images, graphics, logos, and social media posts, has some copyright element.
However, the US fair use doctrine can permit some uses of copyrighted content without permission. Courts weigh several factors when deciding fair use, and two of them are especially relevant to scraped data:
- The use should meaningfully transform the original content. For example, you might take content from a brand’s website and turn it into a database of products and prices. Publishing an unaltered copy of the original HTML would not qualify.
- The use should serve a purpose such as research or analysis rather than compete with the original. Web scraping as an analysis tool is fine, but you can’t scrape data to republish it as your own content.
The best approach is to scrape only the data you need, without copying large sections of original content. Restricting your scraping to defined pieces of information will help you steer clear of potential violations.
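For teams that build their own scrapers, here is one way to put “defined pieces of information” into practice. The Python sketch below uses the standard library’s `html.parser` to pull only product names and prices out of a listing page, the kind of product-and-price database described above, and ignores everything else on the page. The CSS class names `product-name` and `price`, and the names `ProductFieldParser` and `extract_products`, are hypothetical; a real site’s markup will differ, and this is an illustration rather than a complete scraper.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class ProductFieldParser(HTMLParser):
    """Keep only product names and prices from a listing page.

    Assumption: names and prices sit in elements with the hypothetical CSS
    classes "product-name" and "price". Everything else (descriptions,
    reviews, images) is ignored rather than copied.
    """

    # Maps a CSS class on the page to the field name stored in each record.
    WANTED = {"product-name": "name", "price": "price"}

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for css_class, field in self.WANTED.items():
            if css_class in classes:
                self._field = field
                if field == "name":
                    # Each product name starts a new record.
                    self.records.append({})

    def handle_data(self, data):
        # Store text only while inside a wanted element.
        if self._field and self.records and data.strip():
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field, so this
        # sketch assumes the wanted elements contain plain text only.
        self._field = None


def extract_products(html: str) -> List[Dict[str, str]]:
    """Return a list of {"name": ..., "price": ...} records."""
    parser = ProductFieldParser()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    page = (
        '<li><span class="product-name">Desk lamp</span> '
        '<span class="price">$24.99</span>'
        '<p class="review">Great lamp!</p></li>'
    )
    print(extract_products(page))  # [{'name': 'Desk lamp', 'price': '$24.99'}]
```

The review text in the example never reaches the output: only the two named fields are kept, which is the point of scraping defined pieces of information instead of whole pages.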
Using Ethical Data Scraping Techniques
There are multiple scenarios where data scraping is used to benefit the data subject (the person the data refers to). Examples include cross-referencing data to find missing people or scraping pricing information for consumer price comparison websites.
The guidelines below set out a “code of conduct” to look for before hiring a web scraper you haven’t worked with before:
- only extracting relevant, useful data within the bounds of the project
- scraping only publicly available information that isn’t protected by a password or database protection rules
- copying only factual data, such as a name or phone number, without infringing on copyrights
- adapting scraped web data for use in analysis or comparison without gaining a competitive advantage or attempting to steal market share
- giving credit where you scrape live data (such as a weather forecast from a news publication) as a courtesy to the original publisher
- limiting the scraping activities on a particular website to avoid slowing down traffic and overloading the web servers
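For teams that run their own scrapers, the last guideline can be put into practice in code. The Python sketch below is one way to do it, not a standard: it assumes the target site publishes a robots.txt file (the common convention for telling crawlers which paths to avoid, optionally with a non-standard `Crawl-delay` line giving the seconds to wait between requests), and it takes a hypothetical `fetch` function that you would supply using your own HTTP client. The `PoliteScraper` class and its parameters are illustrative names; the robots.txt parsing uses Python’s standard `urllib.robotparser` module.

```python
import time
import urllib.robotparser
from typing import Callable, Iterable, Optional


class PoliteScraper:
    """Check robots.txt and space out requests to a single website.

    `fetch` is a hypothetical callable you supply (for example, a thin wrapper
    around your HTTP client). This class only decides whether and when to call it.
    """

    def __init__(
        self,
        robots_txt_lines: Iterable[str],
        user_agent: str,
        fetch: Callable[[str], str],
        default_delay: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.user_agent = user_agent
        self.fetch = fetch
        self.clock = clock
        self.sleep = sleep
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(list(robots_txt_lines))
        # Never go faster than the site's Crawl-delay (if it sets one)
        # or our own default, whichever is slower.
        site_delay = self.robots.crawl_delay(user_agent)
        self.delay = max(float(site_delay or 0), default_delay)
        self._last_request: Optional[float] = None

    def get(self, url: str) -> Optional[str]:
        """Return the page body, or None if robots.txt disallows the URL."""
        if not self.robots.can_fetch(self.user_agent, url):
            return None
        if self._last_request is not None:
            # Wait out whatever remains of the delay since the last request.
            remaining = self.delay - (self.clock() - self._last_request)
            if remaining > 0:
                self.sleep(remaining)
        self._last_request = self.clock()
        return self.fetch(url)
```

Passing the clock and sleep functions in as parameters keeps the timing logic easy to test; in normal use you would leave the defaults, `time.monotonic` and `time.sleep`.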
If you have a legitimate interest in the data and are aware of applicable local regulations, you can scrape data safely and responsibly. Activities such as mining data to sell to third parties or crawling sites whose Terms of Service prohibit scraping are off-limits.
An experienced web scraper who is well versed in the relevant laws will know which actions are permitted and which are not.