Job ID: 2109457

AI Agent for Newspaper Processing

Fixed Price: Under $250

Send before: September 13, 2025

Skills: Python Programming, Artificial Intelligence, OpenAI

Build an AI Agent for Historical Newspaper Page Processing

I’m looking for a talented AI developer to bring 19th-century newspapers into the digital age, without losing a single detail. This project blends cutting-edge OCR, computer vision, and natural language processing with the meticulous standards of historical archiving. If you love combining technology with cultural preservation, this is your chance to literally make history.

Description:

I am seeking an experienced AI developer to build a custom AI agent that follows a detailed editorial workflow to process high-resolution facsimiles of 19th-century newspaper pages.

The goal is to automate the extraction, transcription, and description of images from historical newspaper pages, following a strict set of archival and editorial rules. The AI agent must produce publication-ready structured output for use in a nonfiction history book.

Required Workflow (Agent Must Implement):

OCR & Bounding Box Extraction

Detect all images/illustrations and their bounding box coordinates.
Extract captions using OCR with high accuracy.
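One way to tie each caption to the illustration it describes is to compare bounding boxes from both steps: most OCR engines can report a box for every recognized text line, and captions usually sit just below the image. The following is a minimal, stdlib-only sketch of that heuristic. The `Box` and `TextLine` types, the (left, top, right, bottom) pixel convention, and the 60-pixel gap are assumptions for illustration, not requirements from the brief.

```python
from dataclasses import dataclass
from typing import Optional


# Assumed convention: pixel coordinates (left, top, right, bottom) on the page scan.
@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int

    def horizontal_overlap(self, other: "Box") -> int:
        """Width in pixels shared by the two boxes' horizontal extents (0 if none)."""
        return max(0, min(self.right, other.right) - max(self.left, other.left))


@dataclass
class TextLine:
    text: str  # OCR output for one printed line, untouched
    box: Box


def caption_for(image: Box, lines: list[TextLine], max_gap: int = 60) -> Optional[list[str]]:
    """Return the OCR lines directly below `image` that look like its caption.

    Heuristic (an assumption): caption lines overlap the image horizontally and
    each starts within `max_gap` pixels of the image, or of the previous caption line.
    Returns None when nothing qualifies, i.e. the image has no caption.
    """
    below = sorted(
        (ln for ln in lines if ln.box.top >= image.bottom and ln.box.horizontal_overlap(image) > 0),
        key=lambda ln: ln.box.top,
    )
    caption: list[str] = []
    edge = image.bottom
    for ln in below:
        if ln.box.top - edge > max_gap:
            break  # too far down: probably article text, not caption
        caption.append(ln.text)
        edge = ln.box.bottom
    return caption or None
```

Captions printed above an image, or set beside it in a narrow column, would need additional rules.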

Newspaper Metadata Extraction

Identify and transcribe the full newspaper title exactly as spelled, rendered in italics.
Identify exact date of publication (full date for internal use, year only for captions).
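Keeping the full date for internal use while exposing only the year for captions can be modeled as a single record with a derived field. This sketch assumes masthead dates appear as `Month D, YYYY` (for example, `JULY 4, 1863`); `PageMetadata` and `parse_masthead_date` are hypothetical names. It relies on `datetime.strptime` with `%B`, which matches English month names under an English or C locale; abbreviated months and OCR errors would need extra handling.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Assumed masthead style "Month D, YYYY"; real papers vary, so treat this as a starting point.
DATE_PATTERN = re.compile(r"\b([A-Za-z]+)\.? (\d{1,2}), (\d{4})\b")


@dataclass
class PageMetadata:
    title: str       # full newspaper title, exact spelling as printed
    published: date  # full publication date, kept for internal use

    @property
    def caption_year(self) -> int:
        """Captions show the year of publication only."""
        return self.published.year


def parse_masthead_date(text: str) -> Optional[date]:
    """Return the first valid 'Month D, YYYY' date found in OCR'd masthead text."""
    for match in DATE_PATTERN.finditer(text):
        month, day, year = match.groups()
        try:
            # %B matching is case-insensitive, so "JULY" and "July" both parse.
            return datetime.strptime(f"{month} {day} {year}", "%B %d %Y").date()
        except ValueError:
            continue  # not a month name, or an impossible day; keep scanning
    return None
```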

Image Description Writing

Generate factual, neutral descriptions that follow the Chicago Manual of Style and contain no speculation.
Include relevant context from nearby articles when applicable.
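If descriptions are generated by a language model (OpenAI is among the listed skills), a cheap automated check can catch obvious speculation before editorial review. The sketch below is a crude keyword filter, not a measure of factual accuracy; the term list and the `flag_speculation` name are illustrative assumptions that would need tuning against the project's rules.

```python
import re

# Illustrative list (an assumption); extend it to match the project's editorial rules.
SPECULATIVE_TERMS = ("perhaps", "probably", "possibly", "likely", "seems", "appears to", "might", "may have")


def flag_speculation(description: str) -> list[str]:
    """Return the speculative terms found in a description (whole words, case-insensitive)."""
    found = []
    for term in SPECULATIVE_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", description, flags=re.IGNORECASE):
            found.append(term)
    return found
```

A flagged description can be regenerated or routed to a human editor; an empty result does not prove the text is free of speculation.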

Caption Processing

Transcribe captions exactly as printed, preserving punctuation and line breaks.
Append “Illustration of XXXX” (XXXX = year of publication).
Omit the caption section if the image has no caption.
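The caption rules reduce to a small, deterministic function. In this sketch the OCR lines are passed through untouched, the year line is appended last, and `None` signals that the caption section should be omitted. `format_caption` is a hypothetical name, and putting `Illustration of XXXX` on its own final line is an assumption; the brief only says to append it.

```python
from typing import Optional


def format_caption(ocr_lines: list[str], year: int) -> Optional[str]:
    """Build the caption block for one illustration.

    - Printed lines are joined exactly as given: spelling, punctuation,
      and original line breaks are left as they are.
    - "Illustration of <year>" is appended as the final line (layout assumed).
    - Returns None when there is no caption text, so the caller omits the section.
    """
    if not any(line.strip() for line in ocr_lines):
        return None
    return "\n".join(ocr_lines + [f"Illustration of {year}"])
```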

Final Formatting & Style Validation

Output must match a predefined structured schema (JSON or plain text).
Ensure compliance with the Chicago Manual of Style (17th edition).
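The actual schema is to be supplied, so the field names below (`newspaper_title`, `publication_date`, `images`, `bbox`, `description`, `caption`) are placeholders. The sketch shows a stdlib-only structural check run before each file is written; it cannot verify Chicago style compliance, which still needs editorial review. If the real schema is published as JSON Schema, a dedicated validator library would be a better fit.

```python
import json
from typing import Any

# Placeholder schema (assumption): the real field names come from the client's schema.
PAGE_FIELDS: dict[str, type] = {
    "newspaper_title": str,   # exact spelling as printed
    "publication_date": str,  # full date, e.g. ISO 8601, for internal use
    "images": list,
}
IMAGE_FIELDS: dict[str, type] = {
    "bbox": list,             # [left, top, right, bottom] in pixels
    "description": str,
}


def validate_page(record: dict[str, Any]) -> list[str]:
    """Return a list of structural problems; an empty list means the record passes."""
    errors = []
    for name, kind in PAGE_FIELDS.items():
        if not isinstance(record.get(name), kind):
            errors.append(f"{name}: expected {kind.__name__}")
    images = record.get("images")
    if isinstance(images, list):
        for i, image in enumerate(images):
            if not isinstance(image, dict):
                errors.append(f"images[{i}]: expected object")
                continue
            for name, kind in IMAGE_FIELDS.items():
                if not isinstance(image.get(name), kind):
                    errors.append(f"images[{i}].{name}: expected {kind.__name__}")
            # "caption" is optional: an illustration may have no caption.
            if "caption" in image and not isinstance(image["caption"], str):
                errors.append(f"images[{i}].caption: expected str")
    return errors


def to_json(record: dict[str, Any]) -> str:
    """Validate, then serialize with a stable key order and non-ASCII characters kept literal."""
    problems = validate_page(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)
```

`ensure_ascii=False` writes characters such as curly quotes literally instead of as `\u` escapes, and `sort_keys=True` keeps key order identical from file to file.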

Deliverables:

Fully functional AI agent that processes a folder of page images and outputs a structured file for each (JSON, CSV, or TXT).
Ability to run locally or in the cloud.
Modular architecture so OCR, bounding box detection, and text description can be improved independently.
Documentation on how to run, retrain, or adjust rules.
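The modularity requirement maps naturally onto small interfaces, one per stage, so an OCR engine or detector can be replaced without touching the rest. Below is a minimal sketch using `typing.Protocol`; the stage names, method signatures, and output fields are hypothetical, a metadata stage is left out to keep it short, and real implementations would wrap whichever OCR and vision libraries are chosen.

```python
import json
from pathlib import Path
from typing import Protocol


# Hypothetical stage interfaces: any object with a matching method can be plugged in.
class Detector(Protocol):
    def detect(self, image_path: Path) -> list[list[int]]: ...  # bounding boxes


class CaptionReader(Protocol):
    def read(self, image_path: Path, bbox: list[int]) -> list[str]: ...  # caption lines


class Describer(Protocol):
    def describe(self, image_path: Path, bbox: list[int]) -> str: ...


class Pipeline:
    def __init__(self, detector: Detector, reader: CaptionReader, describer: Describer):
        self.detector = detector
        self.reader = reader
        self.describer = describer

    def process_page(self, image_path: Path) -> dict:
        images = []
        for bbox in self.detector.detect(image_path):
            entry = {"bbox": bbox, "description": self.describer.describe(image_path, bbox)}
            lines = self.reader.read(image_path, bbox)
            if lines:  # no printed caption: omit the field entirely
                entry["caption"] = "\n".join(lines)
            images.append(entry)
        return {"source": image_path.name, "images": images}

    def process_folder(self, in_dir: Path, out_dir: Path, pattern: str = "*.png") -> list[Path]:
        """Write one JSON file per page image and return the paths written."""
        out_dir.mkdir(parents=True, exist_ok=True)
        written = []
        for image_path in sorted(in_dir.glob(pattern)):  # sorted: reproducible batch order
            out_path = out_dir / f"{image_path.stem}.json"
            page = self.process_page(image_path)
            out_path.write_text(json.dumps(page, ensure_ascii=False, indent=2), encoding="utf-8")
            written.append(out_path)
        return written
```

Processing files in sorted order and writing one JSON file per page keeps repeated batch runs comparable, which supports the consistency requirement.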

Preferred Skills:

OCR experience (Tesseract, Google Vision, AWS Textract, or equivalent)
Computer vision for object detection (bounding boxes)
NLP for structured text generation with style enforcement
Python, LangChain, LlamaIndex, or OpenAI function-calling
Familiarity with historical/archival text processing is a plus

Additional Notes:

Accuracy is critical; the AI must preserve original spelling and punctuation in captions.
No generative “creative writing” — only factual descriptions.
Output must be consistent across large batches of pages.
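To enforce the rule that captions keep their original spelling and punctuation, one option is to diff the OCR transcription against the caption text that will be published (before “Illustration of XXXX” is appended) and treat any difference as an error. This sketch uses the standard library's `difflib.SequenceMatcher`; `caption_changes` is a hypothetical name. It can only catch changes introduced after OCR, not OCR misreads themselves.

```python
import difflib


def caption_changes(ocr_text: str, final_text: str) -> list[str]:
    """List differences between the OCR transcription and the caption to be published.

    Any entry means the text was altered downstream (for example, quotes
    straightened or a spelling modernized); an empty list means they match exactly.
    """
    matcher = difflib.SequenceMatcher(None, ocr_text, final_text, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append(f"{op}: {ocr_text[i1:i2]!r} -> {final_text[j1:j2]!r}")
    return changes
```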

Deadline: one week

If you have experience building AI tools for document analysis, historical archives, or structured data extraction, please include:

Relevant past work or portfolio samples
Technical approach you would take
OCR and vision libraries you recommend

Posted By

Michael v

Canada