Detecting and Mitigating the New Generation of Scraping Bots

Abstract

Every day, an invisible war for data takes place between e-commerce websites and web scrapers. E-commerce websites own the data at the heart of the conflict and want to provide it only to genuine users. Web scrapers aim for unlimited, continuous access to this data in order to capitalize on it. To achieve this goal, scrapers send large numbers of requests to e-commerce websites, causing them financial harm. This has led the security industry to engage in an arms race against scrapers, building better systems to detect and mitigate their requests. At present, the battle continues, but scrapers appear to have the upper hand thanks to their use of Residential IP Proxies (RESIPs). In this thesis, we aim to shift the balance by introducing novel detection and mitigation techniques that overcome the limitations of current state-of-the-art methods. We propose a deceptive mitigation technique that lures scrapers into believing they have obtained their target data while they actually receive modified information. We present two new detection techniques based on network measurements that identify scraping requests proxied through RESIPs. Thanks to an ongoing collaboration with Amadeus IT Group, we validate our results on real-world operational data. Aware that scrapers will keep looking for new ways to avoid detection and mitigation, we provide additional contributions that can help in building the next defensive weapons against them. We propose a comprehensive characterization of RESIPs, the strongest weapon currently at scrapers' disposal. Moreover, we investigate the possibility of acquiring threat intelligence on scrapers by geolocating them when they send requests through a RESIP.

Type
Publication
In Ph.D. Dissertation, Sorbonne Université, Cryptography and Security
Elisa Chiapponi
Security Researcher in the Application Security Operation Center at Amadeus