SENIOR UNREAL ENGINE DIGITAL HUMAN DEVELOPER - PHOTOREALISTIC CONVERSATIONAL AVATAR PLATFORM
We are a startup with an experienced team building a photorealistic conversational avatar for a real-time AI interaction system, the first in a planned series of projects, and we are looking for the right developer to build it and grow with us. We need competitive fixed-price bids and someone who values long-term collaboration over one-off rates.
WHAT WE ARE BUILDING
This is not a character-art project and not a one-off avatar delivery. We are building an avatar platform - a system that allows us to rapidly create and deploy dozens of distinct avatar personalities without rebuilding from scratch each time. The developer we hire is building that foundation, not just a single character.
The work is structured in three phases, each independently priced and paid on demonstration and acceptance. All three phases are expected from the developer we hire; this is not a Phase 1-only engagement.
PHASE 1 - COMPLETE AVATAR AND PIPELINE FOUNDATION
The first avatar is a test avatar only - it exists to prove the pipeline works end to end and is not a final production character. Appearance: a professional American woman in her early-to-mid thirties, composed, credible, and naturally expressive.
Deliver a fully working photorealistic digital human in Unreal Engine 5+ using MetaHuman or an equivalent pipeline, proven end to end in a live browser environment. Phase 1 must include all of the following:
Full-body rigged photorealistic avatar with face, neck, shoulders, arms, and hands. Face and upper body are the animation priority. Full-body rig is required for future scalability.
High-fidelity facial system capable of micro-expressions, subtle emotional transitions, and natural conversational behavior. Must be blendshape-driven or equivalent - not dependent solely on bones. ARKit 52 blendshape topology alignment is a hard requirement (see technical requirements below).
Natural upper-body animation including conversational hand gestures, neck rotation, forward and back lean, shoulder movement, and the subtle postural shifts of real seated conversation. The avatar must appear alive and present during both speaking and listening states. A static or frozen body is not acceptable.
A conversational motion library - named, triggerable motion clips invoked by conversational context. Examples: nodding while listening, leaning forward when engaged, hand gestures while making a point, head tilt when processing. Without this, the avatar reads as artificial regardless of facial quality.
Production-quality materials for skin (subsurface scattering), eyes, teeth, and hair under varied lighting. Demonstration animations for seated idle, conversational speaking with full upper body movement, emotional response states, and smooth transitions between all states.
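For illustration only (this is an assumption about one possible shape, not a requirement of the posting): the conversational motion library described above could be as simple as named clips registered against conversational states, with a selector the animation layer uses to pick what to crossfade to. All state and clip names here are hypothetical.

```typescript
// Hypothetical sketch of a named, triggerable conversational motion library.
// Clip names and states are illustrative, not a spec.

type ConversationState = "listening" | "speaking" | "processing" | "engaged";

const motionLibrary: Record<ConversationState, string[]> = {
  listening: ["nod_soft", "idle_seated_breathe"],
  speaking: ["gesture_open_palm", "lean_forward_slight"],
  processing: ["head_tilt_left", "gaze_aside_brief"],
  engaged: ["lean_forward_slight", "nod_emphatic"],
};

// Deterministic pick (seeded) so behavior is testable; a production system
// would weight, randomize, and respect cooldowns to avoid visible repetition.
function selectClip(state: ConversationState, seed: number): string {
  const clips = motionLibrary[state];
  return clips[seed % clips.length];
}
```

The point of the sketch is the contract, not the clips: conversational context in, a named clip out, so the AI layer never needs to know anything about the rig.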
Payment on successful demonstration of a working avatar in browser with lip sync confirmed and latency verified.
PHASE 2 - AVATAR CREATION STUDIO
Deliver an internal studio tool that allows our team to create new avatar personalities without developer involvement. The primary workflow is photo-based: a team member takes or uploads a photo, the studio analyzes the facial structure and maps it onto a new avatar face, and the result is a production-ready character shaped around that reference. This must be the fast path - not a manual rebuild from scratch.
The studio must also support manual sculpting controls for detailed feature refinement including eyelashes, nose shape, jaw, brow, skin tone, and age characteristics. Hair, clothing, accessories, and background must be swappable as layered elements independent of the face and rig, allowing full appearance control within a single session. Real-time preview is required before export.
A complete new avatar personality must be achievable within hours, not days, by a non-developer team member working independently.
Payment on demonstrated creation of a new avatar by our team using the studio, without developer assistance.
PHASE 3 - PERSONALITY AND EXPRESSION SYSTEM
Deliver an emotional overlay control system allowing our AI to drive avatar demeanor in real time based on conversational context. The system must support distinct states including empathy, warmth, authority, concern, urgency, and seriousness, with smooth natural transitions between them.
The emotional overlay must operate as a separate driver from the ARKit 52 lip sync stream. Both streams - viseme drive and expression drive - must run the rig simultaneously without conflict. The system must apply to any avatar created in Phase 2 without rebuilding expression logic per character.
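One way to picture the "two streams, one rig" requirement (a sketch under assumptions, not the required design): give the viseme stream exclusive ownership of speech-critical shapes and let the emotional overlay drive everything else, with contested non-speech shapes resolving to the stronger driver. The ownership split and shape lists below are illustrative choices.

```typescript
// Hypothetical conflict-resolution policy for running viseme and expression
// streams on the same ARKit 52 rig simultaneously.

type Weights = Record<string, number>; // ARKit shape name -> 0..1 weight

// Speech-critical shapes the viseme stream owns outright (partial list,
// chosen for illustration).
const VISEME_OWNED = new Set([
  "jawOpen", "mouthClose", "mouthFunnel", "mouthPucker",
  "mouthLowerDownLeft", "mouthLowerDownRight",
]);

function combineStreams(viseme: Weights, expression: Weights): Weights {
  const out: Weights = { ...expression };
  for (const [shape, weight] of Object.entries(viseme)) {
    // Viseme stream wins on speech-critical shapes so lip sync is never
    // corrupted; elsewhere the stronger driver wins, so a smile from the
    // emotional overlay can persist through speech.
    out[shape] = VISEME_OWNED.has(shape)
      ? weight
      : Math.max(out[shape] ?? 0, weight);
  }
  return out;
}
```

Whatever policy a candidate proposes, the acceptance test is the same: both streams active, no fighting on the mouth, no dead face above it.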
Payment on demonstrated real-time emotional state control across at least three distinct avatars built in the Phase 2 studio.
TECHNICAL REQUIREMENTS
Our audio-driven lip sync pipeline is already built and operational; the developer is not building it. The existing chain runs: the user speaks, STT transcribes, the LLM responds, TTS converts the response to audio, a managed cloud service performs audio-to-blendshape inference and outputs ARKit 52 frames at 30 Hz, the frames are transmitted via WebRTC, decoded in the browser via WASM, and drive the avatar rig frame by frame.
The developer must deliver a rig whose blendshape topology matches ARKit 52 exactly. When our pipeline writes jawOpen = 0.63, the rig must respond correctly, and likewise for all 52 shapes. A mismatch forces a mapping layer, which adds latency and costs realism. ARKit 52 alignment is a hard requirement. The developer is not responsible for TTS, audio-to-blendshape inference, real-time transport, or the browser-side decoder - all are already in place.
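To make the "no mapping layer" requirement concrete, here is a minimal sketch (an assumption about the rig contract, not part of the posting) of what the browser-side frame application looks like when topology matches: each incoming ARKit shape name resolves directly to a morph-target index, in the style of the Three.js name-to-index dictionary convention, and an unknown name is a hard failure rather than something to remap around.

```typescript
// Hypothetical direct-drive application of one ARKit 52 frame to a rig whose
// morph targets are named exactly after the ARKit blendshape keys.

type ArkitFrame = Record<string, number>; // e.g. { jawOpen: 0.63, ... }

function applyFrame(
  frame: ArkitFrame,
  morphTargetDictionary: Record<string, number>, // shape name -> influence index
  influences: number[],
): number[] {
  for (const [shape, weight] of Object.entries(frame)) {
    const index = morphTargetDictionary[shape];
    if (index === undefined) {
      // A missing shape means the rig topology does not match ARKit 52 -
      // exactly the mismatch the requirement above forbids.
      throw new Error(`Rig has no morph target for ARKit shape "${shape}"`);
    }
    influences[index] = weight; // drive the rig directly, no remapping
  }
  return influences;
}
```

A topology-aligned rig keeps this hot path to a dictionary lookup and a write per shape, 52 times per frame at 30 Hz.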
The latency target is under 300 ms end to end, from the moment the user stops speaking to the moment the avatar's mouth begins moving. The hard ceiling is 500 ms.
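As a back-of-envelope illustration (every number below is an assumed placeholder, not a measured value from our pipeline), the 300 ms target leaves very little slack once STT finalization and LLM first-token time are counted, which is why streaming at every stage matters:

```typescript
// Illustrative end-to-end latency budget; all figures are assumptions.
const budgetMs: Record<string, number> = {
  sttFinalize: 80,         // end-of-speech detection + final transcript
  llmFirstToken: 90,       // time to first streamed token
  ttsFirstAudio: 60,       // time to first audio chunk
  blendshapeInference: 30, // first ARKit 52 frame out of the cloud service
  transportAndDecode: 30,  // WebRTC transit + WASM decode + first rig update
};

const total = Object.values(budgetMs).reduce((a, b) => a + b, 0);
console.log(`total: ${total} ms (target 300 ms, hard ceiling 500 ms)`);
```

Proposals should replace these placeholders with the numbers their pipeline actually achieves and say where the dominant cost sits.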
Browser delivery preference is client-side rendering (WebGL, WebGPU, Three.js, Babylon.js, or equivalent). Per-user GPU cost rules out Pixel Streaming as a production target. A hybrid approach is acceptable during development - Pixel Streaming for internal previews, client-side as the production target.
WHAT WE NEED IN YOUR PROPOSAL
Incomplete proposals will not be considered.
Portfolio - links to 2-3 real-time digital humans you personally built. Video links required. Images and screenshots will not be considered. State your specific role on each project.
Pricing - fixed price for each of the three phases with a delivery timeline in days per phase. No ranges. One number per phase.
Pipeline and tools - your recommended approach (MetaHuman, custom, or hybrid), the specific tools you will use, and why this approach fits this project.
ARKit 52 alignment - how your rig maps to all 52 shapes, any shapes you consider inadequate for conversational speech and what you would add, and how you wire the dual-stream drive so viseme and expression streams run simultaneously without conflict.
Facial realism - your specific approach to gaze, blink, micro-expressions, and avoiding uncanny valley. Be specific.
Upper body and motion - your approach to natural upper body animation and how you structure and expose the conversational motion library.
Avatar studio - how you would architect the Phase 2 studio, how photo-to-face mapping works in your approach, and how a non-developer team member creates and exports a new avatar independently.
Browser delivery - what MetaHuman features survive export to WebGL/WebGPU, what is lost, and how you approach the realism gap between Unreal native and browser runtime.
Performance - your approach to GPU/CPU load, mesh complexity, texture sizes, and bandwidth for browser delivery at scale.
Latency - realistic mouth-sync latency your pipeline achieves, where the dominant cost sits, and what levers exist to reduce it.
Technical risks - address each of the following and identify which is hardest for your approach: material fidelity loss on export, blendshape count budget in browser runtimes, listening-state believability, gaze and blink naturalism, upper body cohesion with speech and facial state. Candidates who identify additional risks will be evaluated favorably.
Scalability benchmark - time and cost to build each of the following after Phase 1 is accepted: a male professional around age 35, and a female around age 60 with warm, expressive characteristics. This is a reusability benchmark, not a purchase commitment.
Additional features - list any capabilities, tools, or system features not described above that you believe should be part of this platform and why.
ENGAGEMENT DETAILS
Type: Fixed price per phase
Payment: On demonstration and acceptance of each phase
Location: Remote
Start date: Immediate
Timeline: Proposed by candidate per phase
We want someone who understands the difference between a technically impressive character and a truly believable conversational digital human, and who thinks in systems and pipelines rather than one-off deliveries. This is the first of several projects; the right developer will find consistent, ongoing work here, as we reward reliability and competitive pricing with continued collaboration.