Posted 11 days ago · Job ID: 2085612 · 11 quotes received

Facebook Marketplace scraper

Fixed price: $250-$500

  Send before: May 22, 2024



Would you be able to build the scraping project described below?


Project Overview:

The goal is to build a scalable scraping and crawling system that uses Scrapoxy (scrapoxy.io) to extract data from Facebook, Craigslist, and OfferUp. The system will run on multiple AWS and GCP instances so that scraping stays efficient and reliable.

Key Components:

  1. Scrapoxy Integration: Use Scrapoxy (scrapoxy.io), a proxy manager that aggregates proxies and rotates IP addresses, to route requests. IP rotation improves anonymity and reduces the risk of blocks or bans while scraping.

  2. Scraping Script Development: Develop a dedicated scraping script for each platform (Facebook, Craigslist, OfferUp). Each script should handle authentication, pagination, HTML parsing, and extraction of the target data (e.g., posts, listings, user information).

  3. AWS and GCP Setup: Configure instances on AWS and GCP to run the scraping scripts, using services such as Amazon EC2 and GCP Compute Engine to deploy and manage them.

  4. Load Balancing and Scaling: Distribute scraping tasks evenly across multiple instances, and use auto-scaling (e.g., EC2 Auto Scaling groups, GCP managed instance groups) to adjust the number of instances to the current workload.

  5. Error Handling and Monitoring: Implement robust error handling (e.g., retries and timeouts) for interruptions such as network failures and site layout changes. Set up monitoring (e.g., Amazon CloudWatch, Google Cloud Monitoring, formerly Stackdriver) to track scraping performance, instance health, and potential issues.

  6. Data Storage and Management: Define a storage strategy for the scraped data, e.g., Amazon S3 or Google Cloud Storage for scalable, reliable object storage, and establish practices for organizing and processing the collected information.

  7. Security and Compliance: Ensure compliance with the terms of service of each platform to avoid legal issues. Implement security measures such as data encryption and access controls to protect sensitive information.
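Items 5 and 7 can be made concrete with a small, platform-agnostic fetch helper. The sketch below is one possible approach, not a prescribed design: it checks a site's robots.txt rules with Python's standard `urllib.robotparser` and retries transient failures (HTTP 429/5xx and network errors) using exponential backoff with full jitter. The user-agent string, the set of retried status codes, and the helper names (`robots_allows`, `backoff_delay`, `fetch_with_retries`) are assumptions for illustration.

```python
import random
import time
import urllib.error
import urllib.request
import urllib.robotparser
from typing import Callable, Iterable, Optional

# Hypothetical user-agent; a real crawler should identify its operator.
USER_AGENT = "example-listing-crawler/0.1"

# Status codes this sketch treats as transient and therefore retryable (assumption).
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


def robots_allows(robots_lines: Iterable[str], url: str, agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt lines permit `agent` to fetch `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter: rng() * min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * 2 ** attempt)


def fetch_with_retries(url: str, max_attempts: int = 5,
                       opener: Optional[urllib.request.OpenerDirector] = None,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Fetch `url`, retrying transient HTTP and network errors with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    opener = opener or urllib.request.build_opener()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            with opener.open(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # HTTPError subclasses URLError, so it must be caught first.
            # Permanent errors such as 403 or 404 are re-raised immediately.
            if err.code not in TRANSIENT_STATUSES or last_attempt:
                raise
        except (urllib.error.URLError, TimeoutError):
            # Network-level failure (DNS error, refused connection, timeout).
            if last_attempt:
                raise
        sleep(backoff_delay(attempt))
    raise AssertionError("not reached: the loop always returns or raises")
```

A worker would call `robots_allows` with the site's robots.txt before calling `fetch_with_retries`. The `opener` and `sleep` parameters are injectable so the retry logic can be tested without network access; failures that survive every retry would then be reported to CloudWatch or Cloud Monitoring as item 5 describes.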

Workflow:

  1. Input URLs or search queries for Facebook, Craigslist, or OfferUp into the scraping system.
  2. Scrapoxy.io manages proxy rotation and distributes requests to instances running on AWS and GCP.
  3. Scraping scripts navigate through the platforms, extract relevant data, and store it in the designated storage system.
  4. Load balancing ensures efficient resource utilization and scalability.
  5. Monitoring tools track performance and health, while error handling mechanisms manage interruptions.
...
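The visible workflow steps can be sketched as a small, cloud-agnostic pipeline. This is a minimal sketch under stated assumptions, not the poster's design: `ScrapeTask`, `assign_round_robin`, `storage_key`, `serialize_record`, and `run_worker` are hypothetical names, and `fetch` and `put` are placeholder callables standing in for a platform-specific scraper and an object-storage upload (for example boto3's `put_object` or google-cloud-storage's `Blob.upload_from_string`). The `raw/platform=<platform>/date=<YYYY-MM-DD>/` key layout is one common convention for date-partitioned object storage, chosen here as an assumption.

```python
import hashlib
import itertools
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ScrapeTask:
    """One unit of work from step 1: a URL or search query for a platform."""
    platform: str  # e.g. "facebook", "craigslist", "offerup"
    target: str    # URL or search query


def assign_round_robin(tasks: Iterable[ScrapeTask],
                       workers: List[str]) -> Dict[str, List[ScrapeTask]]:
    """Steps 2 and 4: spread tasks evenly across worker instances."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[ScrapeTask]] = {w: [] for w in workers}
    for task, worker in zip(tasks, itertools.cycle(workers)):
        assignment[worker].append(task)
    return assignment


def storage_key(task: ScrapeTask, scraped_at: datetime) -> str:
    """Step 3: date-partitioned object key, usable in S3 or Cloud Storage."""
    digest = hashlib.sha256(task.target.encode("utf-8")).hexdigest()[:16]
    return f"raw/platform={task.platform}/date={scraped_at:%Y-%m-%d}/{digest}.json"


def serialize_record(task: ScrapeTask, payload: Any, scraped_at: datetime) -> bytes:
    """Wrap the extracted payload with provenance metadata as JSON."""
    record = {
        "platform": task.platform,
        "target": task.target,
        "scraped_at": scraped_at.isoformat(),
        "data": payload,
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


def run_worker(tasks: Iterable[ScrapeTask],
               fetch: Callable[[ScrapeTask], Any],
               put: Callable[[str, bytes], None],
               now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
               ) -> List[Tuple[ScrapeTask, str]]:
    """Steps 3 and 5: fetch and store each task, collecting failures instead of crashing."""
    failures: List[Tuple[ScrapeTask, str]] = []
    for task in tasks:
        try:
            payload = fetch(task)
        except Exception as err:  # record the failure and move on to the next task
            failures.append((task, repr(err)))
            continue
        scraped_at = now()
        put(storage_key(task, scraped_at), serialize_record(task, payload, scraped_at))
    return failures
```

Static round-robin keeps the example simple; a shared queue (e.g. Amazon SQS or Google Cloud Pub/Sub) that idle workers pull from would balance uneven task durations better. Because the key is derived from a hash of the target, re-running the same target on the same day overwrites the earlier object instead of duplicating it.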
Sovannary T · United States