Posted 15 hours ago  ·  Job ID: 2109457  ·  39 quotes received

AI Agent for Newspaper Processing

Fixed Price: Under $250

  Send before: September 13, 2025

Build an AI Agent for Historical Newspaper Page Processing



I’m looking for a talented AI developer to bring 19th-century newspapers into the digital age without losing a single detail. This project combines OCR, computer vision, and natural language processing with the meticulous standards of historical archiving. If you enjoy pairing technology with cultural preservation, this is your chance to make history, literally.


Description:


 I am seeking an experienced AI developer to build a custom AI agent that follows a detailed editorial workflow to process high-resolution facsimiles of 19th-century newspaper pages.

The goal is to automate the extraction, transcription, and description of images from historical newspaper pages, following a strict set of archival and editorial rules. The AI agent must produce publication-ready structured output for use in a nonfiction history book.

Required Workflow (Agent Must Implement):


  1. OCR & Bounding Box Extraction
  • Detect all images/illustrations and their bounding box coordinates.
  • Extract captions using OCR with high accuracy.
  2. Newspaper Metadata Extraction
  • Identify and transcribe the full newspaper title (italics, exact spelling).
  • Identify exact date of publication (full date for internal use, year only for captions).
  3. Image Description Writing
  • Generate factual, neutral descriptions (no speculation, Chicago Manual of Style).
  • Include relevant context from nearby articles when applicable.
  4. Caption Processing
  • Transcribe captions exactly as printed, preserving punctuation and line breaks.
  • Append “Illustration of XXXX” (XXXX = year of publication).
  • Skip caption section if no caption exists.
  5. Final Formatting & Style Validation
  • Output must match a given structured schema (JSON or text).
  • Ensure compliance with Chicago Manual of Style (17th edition).
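
To make the caption and output rules above concrete, here is a minimal Python sketch of how one illustration record might be assembled. It is only an illustration under stated assumptions: the field names, the `build_record` and `format_caption` helpers, and the newline separator before the year suffix are all hypothetical, since the actual schema will be supplied separately.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import List, Optional, Sequence


def format_caption(raw_caption: Optional[str], published: date,
                   separator: str = "\n") -> Optional[str]:
    """Return the caption as printed, followed by the year suffix.

    The separator is an assumption; the posting only says to append
    the suffix. Returns None when the page prints no caption.
    """
    if raw_caption is None or not raw_caption.strip():
        return None
    # The transcription is passed through untouched: no spelling,
    # punctuation, or line-break normalisation.
    return f"{raw_caption}{separator}Illustration of {published.year}"


@dataclass
class IllustrationRecord:
    # Hypothetical field names, not the client's schema.
    newspaper_title: str
    publication_date: str      # full ISO date, for internal use
    bounding_box: List[int]    # assumed order: x, y, width, height
    description: str
    caption: Optional[str]


def build_record(title: str, published: date, bbox: Sequence[int],
                 description: str, raw_caption: Optional[str]) -> dict:
    record = IllustrationRecord(
        newspaper_title=title,
        publication_date=published.isoformat(),
        bounding_box=list(bbox),
        description=description,
        caption=format_caption(raw_caption, published),
    )
    out = asdict(record)
    if out["caption"] is None:
        del out["caption"]  # skip the caption section entirely
    return out
```

Calling `json.dumps(build_record(...), ensure_ascii=False, indent=2)` would then serialise one record per illustration; keeping `ensure_ascii=False` avoids escaping any non-ASCII characters found in the original print.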


Deliverables:


  • Fully functional AI agent that processes a folder of page images and outputs a structured file for each (JSON, CSV, or TXT).
  • Ability to run locally or in the cloud.
  • Modular architecture so OCR, bounding box detection, and text description can be improved independently.
  • Documentation on how to run, retrain, or adjust rules.
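
As a rough illustration of the modular architecture requested above, the sketch below chains independent stages so that any one of them can be replaced without touching the others. Everything here is an assumption for discussion: the `Pipeline` class, the stage signature, the stub stages, and the `*.tif` file pattern are hypothetical, and the stubs stand in for whichever OCR and detection libraries are eventually chosen.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

PageData = Dict[str, object]
# A stage receives the image path plus the data gathered so far
# and returns an updated copy of that data.
Stage = Callable[[Path, PageData], PageData]


@dataclass
class Pipeline:
    stages: List[Stage] = field(default_factory=list)

    def run(self, image_path: Path) -> PageData:
        data: PageData = {"source": image_path.name}
        for stage in self.stages:
            data = stage(image_path, data)
        return data

    def run_folder(self, folder: Path, pattern: str = "*.tif") -> List[PageData]:
        # Sorted so that batch output order is stable between runs.
        return [self.run(p) for p in sorted(folder.glob(pattern))]


def detect_illustrations(image_path: Path, data: PageData) -> PageData:
    # Stub: a real stage would call an object-detection model here.
    return {**data, "illustrations": []}


def ocr_captions(image_path: Path, data: PageData) -> PageData:
    # Stub: a real stage would call an OCR engine here.
    return {**data, "captions": []}
```

A run over a folder would look like `Pipeline([detect_illustrations, ocr_captions]).run_folder(Path("pages"))`. Swapping the OCR engine then means replacing only `ocr_captions`.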


Preferred Skills:


  • OCR experience (Tesseract, Google Vision, AWS Textract, or equivalent)
  • Computer vision for object detection (bounding boxes)
  • NLP for structured text generation with style enforcement
  • Python, LangChain, LlamaIndex, or OpenAI function-calling
  • Familiarity with historical/archival text processing is a plus


Additional Notes:


  • Accuracy is critical; the AI must preserve original spelling and punctuation in captions.
  • No generative “creative writing” — only factual descriptions.
  • Output must be consistent across large batches of pages.


 Deadline: one week


If you have experience building AI tools for document analysis, historical archives, or structured data extraction, please include:


  • Relevant past work or portfolio samples
  • Technical approach you would take
  • OCR and vision libraries you recommend



Michael V Canada