How to Create a Digital Human

Create talking avatar videos by combining a portrait image with audio.

Overview

Digital human models generate realistic talking-head videos from a still image and an audio file. The avatar's lips, expressions, and head movements are synchronized with the audio.

Quick Start

bash
curl -X POST "https://api.get3w.com/api/v3/get3w/infinitetalk" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://cdn.get3w.com/uploads/portrait.png",
    "audio_url": "https://cdn.get3w.com/uploads/speech.mp3"
  }'

python
import get3w

output = get3w.run(
    "get3w/infinitetalk",
    {
        "image_url": "https://cdn.get3w.com/uploads/portrait.png",
        "audio_url": "https://cdn.get3w.com/uploads/speech.mp3"
    }
)

print(output["outputs"][0])
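
For environments where the `get3w` package is not installed, the curl call above can be mirrored with Python's standard library. This is a minimal sketch: the endpoint, headers, and payload fields are copied from the curl example, while the helper name `build_request` is hypothetical and the response schema is not described here, so the sketch only builds the request and leaves sending and parsing to the caller.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this section.
API_URL = "https://api.get3w.com/api/v3/get3w/infinitetalk"


def build_request(api_key: str, image_url: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    `build_request` is a hypothetical helper, not part of any get3w SDK.
    """
    payload = json.dumps({"image_url": image_url, "audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request (requires a valid key and network access):
# with urllib.request.urlopen(build_request(key, image, audio), timeout=60) as resp:
#     body = resp.read().decode("utf-8")
```

Keeping construction separate from sending makes it easy to inspect the headers and body before any network call is made.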

Available Models

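Model             | Quality | Speed  | Best For
------------------|---------|--------|----------------------
InfiniteTalk      | High    | Medium | General talking heads
InfiniteTalk Fast | Good    | Fast   | Quick previews
MultiTalk         | High    | Medium | Multi-person scenes
LatentSync        | Good    | Fast   | Lip sync only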
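
The table above can be encoded as a small lookup so that model choice becomes an explicit decision in code. This is only a sketch: `ModelInfo`, `MODELS`, and `pick_model` are hypothetical names, the entries are the display names from the table rather than API identifiers (only `get3w/infinitetalk` appears in this guide), and the priority order among the flags is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str      # display name from the table, not an API identifier
    quality: str
    speed: str
    best_for: str


# Rows copied from the "Available Models" table.
MODELS = [
    ModelInfo("InfiniteTalk", "High", "Medium", "General talking heads"),
    ModelInfo("InfiniteTalk Fast", "Good", "Fast", "Quick previews"),
    ModelInfo("MultiTalk", "High", "Medium", "Multi-person scenes"),
    ModelInfo("LatentSync", "Good", "Fast", "Lip sync only"),
]


def pick_model(multi_person: bool = False,
               lip_sync_only: bool = False,
               preview: bool = False) -> ModelInfo:
    """Return the table row whose "Best For" column matches the request.

    Assumed priority: multi-person, then lip-sync-only, then preview.
    """
    if multi_person:
        wanted = "MultiTalk"
    elif lip_sync_only:
        wanted = "LatentSync"
    elif preview:
        wanted = "InfiniteTalk Fast"
    else:
        wanted = "InfiniteTalk"
    return next(m for m in MODELS if m.name == wanted)
```

For example, `pick_model(preview=True).name` selects the row recommended for quick previews.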

Complete Workflow

  1. Prepare a portrait: upload a clear, front-facing portrait image.
  2. Generate speech: use a TTS model to create the audio.
  3. Create the video: combine the portrait and audio with a digital human model.
  4. Download the result: save the generated video.

python
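import get3w

# 1. Upload the local portrait so the model can reach it by URL.
portrait_url = get3w.upload("/path/to/portrait.png")

# 2. Generate speech with a TTS model; the first output is the audio URL.
speech = get3w.run("minimax/speech-02-hd", {
    "text": "Hello, welcome to our product demo.",
    "voice_id": "default"
})
audio_url = speech["outputs"][0]

# 3. Combine the portrait and the audio into a talking-head video.
video = get3w.run("get3w/infinitetalk", {
    "image_url": portrait_url,
    "audio_url": audio_url
})

# 4. The first output is the video URL; download it to save the result.
print("Video URL:", video["outputs"][0])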
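
The workflow code stops at printing the video URL, so the final "download the result" step still has to be done by hand. A minimal sketch of that step using only the standard library follows. It assumes the output URL can be fetched directly without extra authentication; `download_file` and the `.mp4` file name are hypothetical choices, not part of the get3w package.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str, timeout: float = 60.0) -> Path:
    """Stream the resource at `url` to `dest` and return the local path.

    Hypothetical helper; assumes the URL is directly downloadable.
    """
    dest_path = Path(dest)
    # Create the target directory if it does not exist yet.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large videos are not held in memory all at once.
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Usage with the workflow above (file name is an assumption):
# download_file(video["outputs"][0], "digital_human.mp4")
```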

Tips

  • Use a front-facing portrait: models work best with clear, well-lit face images.
  • Use high-quality audio: clean audio produces better lip sync.
  • Keep it short: start with short clips (10-30 seconds) for best quality.
  • Start from a neutral expression: a neutral starting expression gives the model more room to animate.
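
The "keep it short" tip can be checked before a job is submitted. The sketch below measures a local audio file with Python's `wave` module, so it only applies to uncompressed WAV input (an MP3 like the one in the examples would need a different decoder). The function names are hypothetical, and treating 30 seconds as an upper bound is an interpretation of the 10-30 second suggestion rather than a documented limit.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_clip(seconds: float, max_seconds: float = 30.0) -> bool:
    """True when the clip is no longer than the suggested upper bound.

    The 30-second default mirrors the tip above; it is a guideline,
    not a limit enforced by the API as far as this guide states.
    """
    return seconds <= max_seconds


# Usage (path is a placeholder):
# if not is_short_clip(wav_duration_seconds("speech.wav")):
#     print("Consider trimming the audio before generating the video.")
```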

Released under the MIT License.