
All Services

Programming & Development

Data Scraping

$50/hr Starting at $100

Data is the currency of the Internet, and I specialize in keeping steady streams of that currency flowing to my clients. I have harvested data from myriad sources, for a wide variety of business applications. My data acquisition abilities have been forged in the heat of battle. And I do view data acquisition as a battle: when I set out to retrieve data for my clients, I go to war. My weapon of choice is the Scrapy framework. MongoDB is where I store the spoils, hard fought for and won, though MySQL works just as well. Time and waste are my mortal enemies, yet with efficient code and sufficient know-how I thwart them time and again.

The would-be data warrior is easily discouraged by anti-scraping measures. As a salty veteran, I scoff at these mechanisms: I've seen and circumvented enough of them to know there is always a chink in the armor. They may slow the march, but they can't withstand the siege. Inevitably, the data will be retrieved, cleaned, stored, and analysed.

Novice scrapers struggle with pages that require JavaScript to render. In desperation, they turn to browser emulation with tools like Selenium and PhantomJS, and are soon drowning in unmaintainable, memory- and time-inefficient spaghetti code. Necessity has led me to a seamless, concurrent integration of JS rendering into all my scrapers, via Scrapy and Splash.

Data greenhorns are confined to web pages. Old-timers recognize that the web is just one of many valuable data mines. I've mined obscure software that my competition said could not be scraped. I've used image recognition to retrieve data from photographed documents. I've even scraped the Internet of Things, via the Shodan API.

Data newbies deliver data in spreadsheet form. Aye, we can all appreciate a neatly formatted sheet. But those who have earned their stripes in the scraping trenches know that real-world data is complex, and spreadsheets are often inadequate.
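The Scrapy-and-Splash approach boils down to routing requests through Splash's HTTP render endpoint instead of fetching pages directly, so the spider receives HTML after the JavaScript has already run. A minimal stdlib-only sketch of that wiring (the localhost address is a placeholder assumption; in a real project the scrapy-splash plugin's SplashRequest handles this for you):

```python
from urllib.parse import urlencode

def splash_render_url(splash_base, target_url, wait=0.5):
    """Build a URL for Splash's /render.html endpoint.

    Splash fetches `target_url`, executes its JavaScript, waits
    `wait` seconds for rendering to settle, and returns the final
    HTML, which an ordinary Scrapy spider can then parse as usual.
    """
    params = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base.rstrip('/')}/render.html?{params}"

# The Splash address below is an illustrative assumption.
print(splash_render_url("http://localhost:8050", "https://example.com"))
```

Because Splash is just an HTTP service, many of these requests can be in flight concurrently, which is what keeps this approach far lighter than driving a full browser per page.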
Experienced scrapers have mastered the art of pairing their data with the power of databases and document stores such as MySQL, MongoDB, and Elasticsearch.

The student of scraping is aware of its applications for lead harvesting. Beyond that, there lies a wide world of content generation, brand awareness, market research, and data science. The novice is concerned only with the acquisition of data. The veteran considers first the intended application of the data to be acquired, and allows that to inform the acquisition process.

I've fought hard in many arenas of data acquisition, and delivered gigabytes of plunder to my clients. I invite you to test my mettle with your data acquisition project. I have yet to meet a scraping task that proved my equal; perhaps you are the client to provide the challenge I seek. I crawled. I scraped. I conquered.
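Retrieved, cleaned, stored: in Scrapy terms the cleaning step usually lives in an item pipeline's process_item hook, before anything reaches MongoDB or MySQL. A stdlib-only sketch of that normalization step (the field names and price format here are illustrative assumptions, not taken from any particular project):

```python
import re

def clean_item(raw):
    """Normalize one scraped record before it is stored.

    Mirrors the kind of work a Scrapy item pipeline does in
    process_item: trim stray whitespace, collapse internal runs of
    spaces, and turn a price string like ' $1,299.00 ' into a float
    so the database receives typed, queryable values.
    """
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[key] = value
    # 'price' is a hypothetical field used for illustration.
    if "price" in cleaned and isinstance(cleaned["price"], str):
        cleaned["price"] = float(re.sub(r"[^\d.]", "", cleaned["price"]))
    return cleaned

print(clean_item({"name": "  Widget\n Pro ", "price": " $1,299.00 "}))
# → {'name': 'Widget Pro', 'price': 1299.0}
```

Doing this work before storage, rather than at query time, is what makes a document store like MongoDB pleasant to analyse later: every record arrives in a consistent shape.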

About

$50/hr Ongoing



Skills & Expertise

Application Development, Data Management, Data Scraping, JavaScript, Lead Generation, MongoDB, MySQL, Scrapy Framework, Selenium, Software Design, Spreadsheets, Web Development

0 Reviews

This freelancer has not received any feedback.