Posted 2 hours ago · Job ID: 2116732 · 19 quotes received

Python Web Scraper — Florida Court Data

Fixed Price: $250-$500 · W9 required for U.S.

  Send before: April 30, 2026


Florida Judiciary Web Scraper — Config-Driven, Resilient Architecture

I need a Python-based web scraping application to collect judge data from all 20 Florida judicial circuits and output it to a standardized CSV. The tool must be built for long-term maintainability — when a circuit website changes layout, only minimal configuration updates should be needed, not code rewrites.


Background: Florida has 20 circuits covering 67 counties. Each circuit publishes judge data differently: some offer Excel/CSV downloads, others publish HTML pages and subpages with varying structures. The master data source is: https://www.flcourts.gov/Florida-Courts/Trial-Courts-Circuit
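
For orientation, the master directory can be fetched and its circuit links enumerated with something like the sketch below. The link-filtering rule is an assumption about the page markup, not a verified selector, and would need to be confirmed against the live site.

    # Sketch: enumerate circuit links from the master directory page.
    # Assumption: circuit pages are linked with "circuit" in the href;
    # the real selector must be confirmed against the live page.
    import requests
    from bs4 import BeautifulSoup

    MASTER_URL = "https://www.flcourts.gov/Florida-Courts/Trial-Courts-Circuit"

    def list_circuit_links():
        html = requests.get(MASTER_URL, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        links = []
        for a in soup.find_all("a", href=True):
            if "circuit" in a["href"].lower():
                links.append((a.get_text(strip=True), a["href"]))
        return links

    if __name__ == "__main__":
        for text, href in list_circuit_links():
            print(text, href)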


Required Output Fields (CSV): ID, Type, Name, Lastname, Assistant, Phone, Location, Street, City, State, Zip, County, Circuit, District, Courtroom, Hearingroom, Subdivision. (A sample CSV will be provided; the output format must match it exactly.)
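
In practice the column order could simply be pinned in code, for example as sketched below; exact formatting rules will come from the sample CSV.

    # Sketch: write scraped rows to CSV in the required column order.
    import pandas as pd

    COLUMNS = [
        "ID", "Type", "Name", "Lastname", "Assistant", "Phone", "Location",
        "Street", "City", "State", "Zip", "County", "Circuit", "District",
        "Courtroom", "Hearingroom", "Subdivision",
    ]

    def write_output(rows, path="judges.csv"):
        # rows: list of dicts keyed by the column names above
        df = pd.DataFrame(rows, columns=COLUMNS)
        df.to_csv(path, index=False)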


Architecture Requirements:

  1. Config-driven circuit registry — All 20 circuits must be defined in an external config file (JSON or YAML), not hardcoded. Each entry should include: circuit number, base URL(s), scraping method (HTML/table/CSV download), and field mappings. Adding or updating a circuit should require only a config change.
  2. Per-circuit adapter pattern — Each circuit should have its own scraping strategy/adapter to handle unique layouts. This isolates changes: if Circuit 11 redesigns its site, only that adapter needs updating. (Rough sketches of this registry/adapter structure and the items below follow after this list.)
  3. Change detection — On each run, compare results to the previous run and produce a diff report (new judges, removed judges, changed fields). Full output CSV is always saved, but the diff highlights what changed.
  4. Flexible execution — Support both a full scrape of all 20 circuits and targeted single-circuit runs (e.g., --circuit 17). This allows quick re-runs when a specific circuit fails.
  5. Error handling and logging — If a circuit scrape fails or returns no results, log the error with timestamp and circuit ID. Do not silently skip circuits. Optionally support email or webhook notification on failure.
  6. Scheduling-ready — The tool should run headlessly from the command line and be schedulable via cron or Windows Task Scheduler without manual intervention.
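
To make items 1 and 2 concrete, here is a rough sketch of the registry-plus-adapter structure I have in mind. Everything in it is illustrative: the config keys, adapter names, and the example URL are placeholders, not the real endpoints.

    # Sketch: config-driven registry with per-circuit adapters.
    # All keys, class names, and URLs below are illustrative placeholders.
    import yaml

    EXAMPLE_CONFIG = """
    circuits:
      - number: 17
        method: html_table          # html_table | csv_download | playwright
        urls:
          - https://example-circuit17.example/judges   # placeholder URL
        field_map:
          Judge Name: Name
          Phone: Phone
    """

    class HtmlTableAdapter:
        # Adapter for circuits that publish a plain HTML table (sketch only).
        def __init__(self, entry):
            self.entry = entry

        def scrape(self):
            # A real implementation would fetch entry["urls"], parse tables,
            # and rename columns via entry["field_map"].
            raise NotImplementedError

    ADAPTERS = {
        "html_table": HtmlTableAdapter,
        # "csv_download": CsvDownloadAdapter,   # further adapters registered here
        # "playwright": PlaywrightAdapter,
    }

    def load_registry(text=EXAMPLE_CONFIG):
        config = yaml.safe_load(text)
        return {c["number"]: c for c in config["circuits"]}

    def run(circuit=None):
        registry = load_registry()          # in the real tool: read circuits.yaml
        targets = [circuit] if circuit else sorted(registry)
        for number in targets:
            entry = registry[number]
            adapter = ADAPTERS[entry["method"]](entry)
            rows = adapter.scrape()         # list of dicts in the output schema
            # ...write rows, record failures per item 5...

The point is that adding or repairing a circuit means editing one config entry and, at most, one adapter class.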

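Item 3 (change detection) could be approached roughly like this. It is only a sketch, and keying rows on Circuit + Name + Lastname is my assumption, not a requirement.

    # Sketch: diff the current run against the previous CSV.
    # Assumption: (Circuit, Name, Lastname) uniquely identifies a judge row.
    import pandas as pd

    KEY = ["Circuit", "Name", "Lastname"]

    def diff_runs(previous_csv, current_csv):
        prev = pd.read_csv(previous_csv).set_index(KEY)
        curr = pd.read_csv(current_csv).set_index(KEY)
        added = curr.index.difference(prev.index)
        removed = prev.index.difference(curr.index)
        common = curr.index.intersection(prev.index)
        changed = [key for key in common
                   if not curr.loc[key].equals(prev.loc[key])]
        return {"new": list(added), "removed": list(removed), "changed": changed}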

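For item 4, a standard command-line entry point is all that is expected, for example:

    # Sketch: CLI supporting full or single-circuit runs.
    import argparse

    def main():
        parser = argparse.ArgumentParser(description="Florida circuit judge scraper")
        parser.add_argument("--circuit", type=int, default=None,
                            help="Scrape a single circuit (1-20); omit for a full run")
        args = parser.parse_args()
        # run(args.circuit) would dispatch to the registry/adapters sketched above
        target = f"circuit {args.circuit}" if args.circuit else "all 20 circuits"
        print(f"Scraping {target}")

    if __name__ == "__main__":
        main()

Run headlessly, an entry point like this is straightforward to schedule via cron or Task Scheduler (item 6).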
Tech Stack Preferences: Python 3.x; BeautifulSoup for static pages or Playwright for JavaScript-rendered pages; pandas for CSV output. The deliverable should include a requirements.txt and brief setup documentation.


Deliverables:

  • Working Python application with all 20 circuits implemented
  • External config file for all circuit URLs and scraping strategies
  • Sample output CSV matching the provided format
  • Change-detection diff report on each run
  • README with setup, usage, and instructions for updating a circuit when its site changes


Additional Notes: Some circuits render content via JavaScript and may require a headless browser (Playwright). Please flag in your proposal which circuits you identify as JS-rendered. Prior experience scraping government/court websites is a strong plus.
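
For the JS-rendered circuits, a Playwright-based adapter along these lines would be acceptable; the URL and selector here are placeholders, and real values belong in the config file.

    # Sketch: fetch a JavaScript-rendered circuit page with Playwright.
    # URL and wait selector are placeholders; real values go in the config.
    from playwright.sync_api import sync_playwright

    def fetch_rendered_html(url, wait_selector="table"):
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto(url, timeout=60_000)
            page.wait_for_selector(wait_selector, timeout=30_000)
            html = page.content()
            browser.close()
        return html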

Michael K · United States