Textual data wrangling

Starting at $45/hr

Working with precision on large volumes of tabular text is my specialty. Most of my experience is with scientific data such as the human genome, where mistakes are extremely unwelcome. I bring the same techniques to every other kind of data, and can offer you efficiency and accuracy developed over years of working with all kinds of textual and numeric content.

As a programmer, I move easily back and forth between Excel and the command line, combining the two to reduce the time and increase the accuracy of data merging and filtering tasks. In fact, I just finished teaching a mini-course for summer interns at my last workplace, where I taught advanced Excel concepts as a gateway to Unix programming. Applications of this approach include cleaning or filtering data, joining or merging multiple data sources, reclassifying or transforming items, converting among numeric or other formats, automating repetitive tasks, and other activities common to both business and science.

The common thread in all these tasks is that they are rule-based, which is the difference between programmatic data handling and simple data entry. If a rule that applies to all the data can be stated, then it can be programmed. If there are exceptions to the rule, those can be programmed too, as can exceptions to the exceptions, and so on. The time your text-wrangling job takes depends on how consistent your data is, and on how deep into the list of exceptions you want me to go before I hand the job back and let you handle the leftover items manually.

If you have an Excel file, a CSV, a TSV, some XML, some JSON, or something else textual that's giving you a headache, get in touch and let's see what I can do for you!
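To make the "rule, then exceptions, then leftovers" idea concrete, here is a minimal sketch in Python. The sample data, the column names (`id`, `amount`, `currency`), and the function name `clean_rows` are all hypothetical, made up purely for illustration; a real job would read from your own file and use rules agreed with you.

```python
import csv
import io

# Hypothetical sample data; a real job would read from a file instead.
SAMPLE = """id,amount,currency
1,10.50,USD
2,"1,200.00",usd
3,N/A,USD
4,7.25,
"""

def clean_rows(text):
    """Apply a general rule plus layered exceptions.

    Returns (cleaned, leftovers); leftovers are rows no rule could handle.
    """
    cleaned, leftovers = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Exception 1: amounts may contain thousands separators.
        amount = row["amount"].replace(",", "")
        # Exception 2: currency codes may differ in case or spacing.
        currency = row["currency"].strip().upper()
        try:
            # General rule: every amount is a number.
            value = float(amount)
        except ValueError:
            # No rule fits: set the row aside for manual review.
            leftovers.append(row)
            continue
        if not currency:
            # A missing currency cannot be guessed, so it is also set aside.
            leftovers.append(row)
            continue
        cleaned.append({"id": row["id"], "amount": value, "currency": currency})
    return cleaned, leftovers

cleaned, leftovers = clean_rows(SAMPLE)
print(len(cleaned), "rows cleaned;", len(leftovers), "left for manual review")
```

Each additional exception is one more branch in the loop, and whatever still falls through lands in `leftovers`, which is the part handed back for manual handling.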

About

$45/hr Ongoing



Skills & Expertise

Bash, Data Entry, Excel, Programming, Python, REST, Unix

0 Reviews

This Freelancer has not received any feedback.