What is data scraping?
Data scraping (or web scraping) is a method by which software, often called a ‘bot’, extracts data or information from a website and converts it into a readable output format. The process is generally automated.
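As a rough illustration of the definition above, the following Python sketch extracts listing titles and prices from a page's HTML and writes them out as CSV. Everything in it is assumed for illustration: the sample HTML, the class names `listing`, `title` and `price`, and the helper names `ListingParser` and `scrape_to_csv` are hypothetical and not taken from any real website. A real bot would first download the page over HTTP; here the page content is inlined so the example runs without network access.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real bot would first download this over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="listing"><span class="title">Used bicycle</span><span class="price">2500</span></li>
  <li class="listing"><span class="title">Office chair</span><span class="price">1800</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects the text of <span class="title"> and <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per listing found so far
        self._field = None  # name of the span currently open, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "listing":
            self.rows.append({})  # start a new record
        elif tag == "span" and cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that appears inside a title or price span.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_to_csv(html):
    """Parse the HTML and return the extracted listings as CSV text."""
    parser = ListingParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


csv_text = scrape_to_csv(SAMPLE_HTML)
print(csv_text)
```

The sketch shows only the mechanics of extraction. Whether running such a tool against a particular website is lawful depends on the issues discussed in the rest of this note.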
What are the potential legal issues with data scraping?
1. Possibility of infringement of IP rights: In the course of data scraping, the automated tool may pick up information that is protected by copyright or trademark. In a case before the Delhi High Court, OLX obtained a permanent restraining order preventing a company from using automated or manual means to scrape any data, including commercial data, from OLX’s website. The company had lifted listings, photographs and other information from OLX’s website and posted them on its own website. OLX contended before the Court that all this information qualifies as a ‘proprietary database’ of OLX, built through considerable skill, labour and creativity. It argued that such a database qualifies as an ‘original literary work’ and is hence entitled to protection under copyright law. The Court ruled in favour of OLX. Similar reasoning can apply to trademark infringement.
However, it is important to note that the data scraping was illegal here because the company was posting the information collected from OLX’s website on its own website. There would have been no copyright infringement if the company had used the data for its own private use.
a. LinkedIn- “You agree that you will not…(D)evelop, support or use software, devices, scripts, robots or any other means or processes (including crawlers, browser plugins and add-ons or any other technology) to scrape the Services or otherwise copy profiles and other data from the Services.”
b. Facebook- “You will not engage in Automated Data Collection without Facebook’s express written permission.”
However, a US appeals court interpreted a similar provision in the US Computer Fraud and Abuse Act (“CFAA”) differently. The matter concerned a suit filed by a data analytics company called ‘hiQ’, which scraped data such as names, job titles, work histories and skills from public LinkedIn profiles. Among other contentions, LinkedIn argued that hiQ violated the CFAA because it continued to intentionally access LinkedIn’s servers ‘without authorization’ and obtain information from them. The Court disagreed with LinkedIn’s arguments and allowed hiQ to continue its data scraping activities. It held that the prohibition on unauthorized access applies only to private information, access to which is restricted by a password or other technical barriers. Since hiQ was only using publicly available information on LinkedIn, it did not violate the CFAA.
While Indian courts have never examined the relevant provisions of the IT Act in this context, it can be argued that the penalty under the IT Act does not apply to scraping of publicly available information. Under the rules governing sensitive personal data or information (“SPDI”) under the IT Act, information that is freely available or accessible in the public domain is excluded from the definition of SPDI. Even the Personal Data Protection Bill, 2019 allows processing of publicly available data without the consent of the data principal.
Authored by Arpit Gupta, Senior Associate, with inputs from Aman Taneja, Senior Associate and Nehaa Chaudhari, Partner.
For more on this topic, please reach out to us at firstname.lastname@example.org
OLX BV and Ors. v. Padawan Ltd., Delhi HC order dated 15 December 2016, http://delhihighcourt.nic.in/dhcqrydisp_o
OLX BV and Ors. v. Padawan Ltd., Delhi HC order dated 31 March 2016, http://delhihighcourt.nic.in/dhcqrydisp_o
 This is one among the many exceptions to copyright infringement given in Section 52 of the Copyright Act, 1957.
 Automated Data Collection Terms, Facebook, https://www.facebook.com/apps/site_scraping_tos_terms.php.
 Understand the limitations of
 Ryanair Ltd. v. P.R. Aviation BV, Court of Justice of the European Union, 15 January 2015, https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:62014CJ0030&from=EN.
 Sections 4 and 10A of the IT Act grant legal recognition to electronic contracts.
 Section 43 of the IT Act.
 hiQ Labs Inc. v. LinkedIn Corporation, US Court of Appeals for the Ninth Circuit, 09 September 2019, https://cases.justia.com/federal/appellate-courts/ca9/17-16783/17-16783-2019-09-09.pdf?ts=1568048483. Also see this EFF article- https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-linkedin-protects-scraping-public-data.
 Proviso to rule 3, The Information Technology (Reasonable security practices and procedures and sensitive personal data or information) Rules, 2011.
 Clause 14(2)(g), PDP Bill.