I build production-grade ETL pipelines and data engineering systems that reliably extract, transform, and deliver large-scale datasets, turning thousands of raw sources into clean, structured, queryable formats.
If your business needs to process massive datasets, automate data collection, or build scalable data infrastructure, this is exactly what I deliver.
What I build:
- Automated scraping pipelines across thousands of municipal, government, and property data sources
- Large-scale ETL workflows with scheduling, error handling, and monitoring (see the sketch after this list)
- PostgreSQL/PostGIS databases optimized for spatial and relational queries
- Elasticsearch migrations for high-performance search at massive scale
- GIS data pipelines processing parcel, zoning, and geospatial datasets
- Address standardization and data cleaning at scale
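To make the "error handling and monitoring" claim concrete, here is a minimal sketch of the retry-and-log wrapper pattern these pipelines use. It is illustrative only: `extract_source`, `transform`, and `load_records` are hypothetical placeholders, not code from a client project.

```python
import logging
import time

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("etl")

def run_with_retries(step, *args, attempts=3, backoff=30):
    """Run one pipeline step, retrying transient failures with growing backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return step(*args)
        except Exception as exc:  # production code catches narrower exception types
            log.warning("%s failed (attempt %d/%d): %s",
                        step.__name__, attempt, attempts, exc)
            if attempt == attempts:
                raise  # let the scheduler and alerting see the final failure
            time.sleep(backoff * attempt)

def extract_source(url):    # hypothetical: fetch raw records from one source
    ...

def transform(records):     # hypothetical: clean and standardize records
    ...

def load_records(records):  # hypothetical: upsert into PostgreSQL
    ...

def run_pipeline(url):
    raw = run_with_retries(extract_source, url)
    clean = run_with_retries(transform, raw)
    run_with_retries(load_records, clean)
    log.info("pipeline finished for %s", url)
```

Under a scheduler such as Jenkins or cron, a step that exhausts its retries fails loudly and triggers an alert instead of silently dropping data.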
Recent production work:
- Standardized 85M+ parcel and owner addresses, scraped from thousands of inconsistent county sources, into a single, consistent, queryable format
- Migrated a property search system holding 91,450,673 records from PostgreSQL to Elasticsearch, cutting response time from 30 seconds to 7 seconds (sketched after this list)
- Built an automated zoning change detection pipeline monitoring 8,000+ cities weekly with 99%+ uptime
- Built an intelligent link monitoring and self-healing system for 9,000+ municipal GIS sources, achieving 90% automated restoration and 70% less downtime
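To show the shape of a PostgreSQL-to-Elasticsearch migration like the one above, here is a hedged sketch that streams rows through a server-side cursor and indexes them with the official bulk helper. The table, columns, index name, and connection strings are hypothetical placeholders.

```python
import psycopg2
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

# Hypothetical connection details, for illustration only.
pg = psycopg2.connect("dbname=properties user=etl")
es = Elasticsearch("http://localhost:9200")

def generate_actions():
    # A named cursor is server-side: rows stream in batches instead of
    # loading tens of millions of records into memory at once.
    with pg.cursor(name="property_export") as cur:
        cur.itersize = 5000
        cur.execute("SELECT id, address, owner, parcel_id FROM properties")
        for row in cur:
            yield {
                "_index": "properties",
                "_id": row[0],
                "_source": {"address": row[1], "owner": row[2], "parcel_id": row[3]},
            }

# streaming_bulk yields an (ok, result) pair per document; count failures
# so monitoring can flag an incomplete migration.
results = streaming_bulk(es, generate_actions(), chunk_size=2000, raise_on_error=False)
failed = sum(1 for ok, _ in results if not ok)
print(f"migration complete, {failed} failed documents")
```

Streaming keeps memory flat regardless of table size, which is what makes a 91M-record migration practical in a single pass.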
What makes my work different:
Every pipeline I build includes proper error handling, automated restoration, logging, monitoring, and documentation, so it runs without constant maintenance.
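In practice, "automated restoration" means the system detects a dead source and tries known recovery strategies before escalating to a human. A toy sketch of that idea, assuming a hypothetical strategy list (a real system would register many strategies learned from patterns across municipal GIS hosts):

```python
import logging

import requests

log = logging.getLogger("linkcheck")

def is_healthy(url, timeout=10):
    """Treat a source as healthy if it answers a HEAD request with HTTP 200."""
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        return resp.status_code == 200
    except requests.RequestException:
        return False

def try_https_upgrade(url):
    # Hypothetical strategy: many broken links are plain-http endpoints
    # whose hosts now serve https only.
    return url.replace("http://", "https://", 1)

RECOVERY_STRATEGIES = [try_https_upgrade]

def heal(url):
    """Return a working replacement URL, or None to escalate to a human."""
    for strategy in RECOVERY_STRATEGIES:
        candidate = strategy(url)
        if candidate != url and is_healthy(candidate):
            log.info("restored %s -> %s via %s", url, candidate, strategy.__name__)
            return candidate
    return None
```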
Tools: Python, PostgreSQL, PostGIS, Elasticsearch, ArcGIS, QGIS, GeoPandas, Docker, Jenkins, AWS S3, Scrapy, Selenium, Playwright, BeautifulSoup