PDF Document Data Extraction to Google Sheets
Summary
Hello! I have approximately 300 business contracts with accompanying invoices, and I need their data extracted into a predefined Google Sheet. Note: there are about 4 different versions of the contract among the 300, because additional data fields were added over time, and those fields will need to be extracted as well. I assume you will use software to ETL the data from the documents into Google Sheets and will correct the output by hand when errors are found.
This is a one-time job, since future data will be extracted automatically from the new CRM we are using.
Context:
We are in the process of switching to a new CRM where all business data will be stored in a database. However, past documents were created manually and their data is not in any database.
Overall Client Goal:
The goal of this project is to get past business data into a database that adheres to a specific naming schema, so that:
a) new data coming in from the CRM integrates into the same database you create
b) we can analyze trends and insights for business sales and operations from the raw data
c) we can test off-the-shelf data analysis tools on the database to keep extracting value, saving money and increasing revenue
Task:
- review the PDF documents and their versions
- review the Google Sheet and the naming schema definitions / organization sheet
- test on one contract and its invoices, and review the outputs with the client before moving on to the rest of the PDF documents
- ETL the data and quality-check the output for punctuation, letter case, number formatting, schema, etc. All extracted data must match the data schema requirements exactly
- review with client and update for adjustments
- take ownership of the project. You are the domain expert. Provide suggestions and insights to better achieve the client's overall goals
- data security is a must. You will be dealing with sensitive data from past and current customers
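To make the quality check on letter case, spacing, and number formatting concrete, here is a minimal Python sketch of what a per-row check could look like. The column names and rules in SCHEMA are hypothetical placeholders for illustration only; the real ones will come from the client's data schema sheet, and the extraction tool you choose may handle this differently.

```python
import re
from decimal import Decimal, InvalidOperation

# Hypothetical schema for illustration: column name -> formatting rule.
# The real column names and rules come from the client's schema sheet.
SCHEMA = {
    "customer_name": "title",
    "invoice_number": "upper",
    "invoice_total": "amount",
}


def normalize_text(value):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def normalize_amount(value):
    """Strip currency symbols and thousands separators, keep two decimals."""
    cleaned = re.sub(r"[^\d.\-]", "", value)
    return f"{Decimal(cleaned):.2f}"


def clean_row(row):
    """Return (cleaned_row, errors) for one extracted row of strings."""
    cleaned, errors = {}, []
    for column, rule in SCHEMA.items():
        raw = row.get(column)
        if raw is None or not raw.strip():
            errors.append(f"{column}: missing")
            continue
        text = normalize_text(raw)
        if rule == "title":
            cleaned[column] = text.title()
        elif rule == "upper":
            cleaned[column] = text.upper()
        elif rule == "amount":
            try:
                cleaned[column] = normalize_amount(text)
            except InvalidOperation:
                errors.append(f"{column}: not a number ({text!r})")
    return cleaned, errors


row = {
    "customer_name": "  acme   corp ",
    "invoice_number": "inv-001",
    "invoice_total": "$1,234.5",
}
cleaned, errors = clean_row(row)
print(cleaned)
print(errors)
```

Rows that come back with a non-empty error list would be set aside for manual review rather than written to the sheet, which keeps the hand-editing step limited to the documents that actually need it.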
Deliverables:
- Google Sheet with data from all PDF documents (contracts and invoices)
- quality assurance confirming that all data is accurate and adheres to the naming schema conventions
- data security maintained during the project and after it is complete
Client will share:
- Google Sheet with headers that follow the naming schema
- data schema sheet
- Google folder with the contracts and invoices
Please ask questions and share any insights that will help the client reach these goals with minimal back and forth, so you can work effectively. We will probably need to message and/or chat before the project starts to make sure we are aligned.
Thank you!