I developed an automated script that scraped the location data and images available on TripAdvisor. The textual data was stored in PostgreSQL, with each category of location saved to its own database, and the images were saved to a file-sharing platform (Dropbox). The script runs on a regular schedule and adds new locations and images to the existing data.
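As a rough illustration of the storage step, here is a minimal sketch of routing each location to its category's database and skipping locations that were already saved on an earlier run. It is not the production code: in-memory SQLite databases stand in for the separate PostgreSQL databases, and the category names, table layout, and function names are assumptions made for the example.

```python
import sqlite3

# Hypothetical category names, for illustration only.
CATEGORIES = ("hotels", "restaurants", "attractions")


def open_stores(categories):
    """Open one in-memory database per category.

    SQLite is a stand-in here for separate PostgreSQL databases.
    """
    stores = {}
    for category in categories:
        conn = sqlite3.connect(":memory:")
        # Assumed table layout: a unique ID plus a name per location.
        conn.execute(
            "CREATE TABLE locations ("
            "location_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        stores[category] = conn
    return stores


def save_location(stores, category, location_id, name):
    """Insert a location into its category's database.

    Returns True if the row was new, False if it already existed.
    """
    conn = stores[category]
    cur = conn.execute(
        "INSERT INTO locations (location_id, name) VALUES (?, ?) "
        "ON CONFLICT (location_id) DO NOTHING",
        (location_id, name),
    )
    conn.commit()
    return cur.rowcount == 1
```

The `ON CONFLICT ... DO NOTHING` clause is also valid in PostgreSQL, so a scheduled run can re-submit every scraped location and only the new ones are added.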
Because the dataset was large, I used rotating proxies to fetch the data in parallel.
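The idea of rotating proxies across parallel requests can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's actual code: the proxy addresses are placeholders, and `assign_proxies`, `fetch_all`, and the `fetch` callback are hypothetical names introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# Placeholder proxy endpoints; real ones would come from a proxy provider.
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
]


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    pool = cycle(proxies)
    return [(url, next(pool)) for url in urls]


def fetch_all(urls, proxies, fetch, max_workers=4):
    """Fetch all URLs in parallel, each through its assigned proxy.

    `fetch` is any callable taking (url, proxy); results are returned
    in the same order as `urls`.
    """
    jobs = assign_proxies(urls, proxies)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda job: fetch(*job), jobs))
```

Passing `fetch` in as a parameter keeps the rotation logic independent of the HTTP client. In a Scrapy spider, the equivalent step would be setting `request.meta["proxy"]` on each request.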
Key Features:
✔ Automated script that runs at regular intervals
✔ Deployed on a remote desktop server running Ubuntu
✔ File sharing through Dropbox
✔ Rotating proxies to keep the bot from being detected by the website
Built with Python, Scrapy, and Selenium, this solution generated profits for the client.
Need Custom Scraping Solutions? Let's collaborate!