From Image Model to Finished Clip
Seedance 2.0 Real Human Video API: Access, Setup, and Prompting


Author: Andrew Zheng
Real people have always been the hard case for AI video. Synthetic characters are forgiving: a slightly different face between frames reads as style. A real person is not. The moment the identity drifts, the eye catches it, and the whole clip falls apart. That single problem is why most real-person AI video still looks off.
Seedance 2.0 closes that gap. Hand it one real portrait, and it locks that identity, holding the face, expression, and lip sync stable from the first frame to the last, with audio generated alongside the video. The remaining question is how you reach the Seedance 2.0 real human capability without a pile of setup in the way. This guide walks through generating real human video on Infron from your own code, start to finish.
What "real human" means in Seedance 2.0
| Capability | What it means for real human video |
|---|---|
| Identity consistency | The same real face, hair, and outfit from the first frame to the last, with no drift |
| Native audio-video sync | Dialogue, lip sync, and sound generated in one pass, not added in a separate dubbing step |
| Multi-shot storytelling | Up to 9 images guide a cut sequence while the person stays consistent across every shot |
| Continuous camera moves | One unbroken shot gliding through a scene, controlled entirely through the prompt |
Seedance 2.0 handles video of people in two modes, and the distinction decides which model string you call.
| Need | What it does | Typical input | Infron model page |
|---|---|---|---|
| Real-person video | Drives a real human from a portrait you supply, locking that identity across the clip | One reference portrait you have the rights to use | https://infron.ai/models/bytedance/seedance-2.0/virtual-portrait-reference-to-video/api-reference |
| Multi-reference video | Builds a multi-shot clip from several references, used for animated or character-driven work | Up to 9 images, plus video and audio references | https://infron.ai/models/bytedance/seedance-2.0/reference-to-video |
The Seedance 2.0 real human capability runs through the virtual portrait line. You hand it a reference image of an actual person, the model locks that identity, then it holds the face, expressions, and lip sync stable across the entire clip. A real person stays the same person from the first frame to the last, with native audio generated alongside the video rather than added in a separate dubbing pass.
Because the input is a real identity, every serious provider gates it. That gate is the whole reason the next section exists.
Why the usual real human routes are slow
Reaching Seedance 2.0 real human output is rarely the problem. The friction sits in everything you do before the first generation.
The ComfyUI partner-node workflow is excellent if you live inside ComfyUI and generate one clip at a time. It does not fit a team that wants real human video as a programmatic step inside a larger pipeline, where every clip should come from an API call rather than a manual node run. That is the gap Infron fills.
| | ComfyUI nodes | Reseller gateway | Infron |
|---|---|---|---|
| Setup before first clip | Install nodes, log into a permitted network, clear a one-time liveness check | New account plus a separate identity verification step | Request enterprise access once, then call one endpoint |
| Ongoing friction | Local graph upkeep and asset or group IDs to manage by hand | Extra account and portal to manage | No node graph, no second verification portal to babysit |
| Best fit | One clip at a time, inside ComfyUI | Casual self-serve access | Real human as a programmatic pipeline step |
How to generate Seedance 2.0 real human video on Infron
Three steps: get access, upload your reference, then generate. The code below calls the live endpoint and follows the request shape the API expects.
Step 1: Get enterprise access
Seedance 2.0 real human is not open self-serve. Infron opens the virtual portrait line to enterprise users through an allowlist, so that real-person generation stays tied to an accountable, authorized organization rather than anonymous traffic. This is deliberate. Driving a real identity is exactly the capability that should sit behind consent and authorization, and the allowlist is how Infron keeps it there.
For your team this means one short onboarding rather than a recurring verification portal. Contact Infron to have your organization added, and the same API key that unlocks real human also carries the rest of the catalog.
Step 2: Upload your reference asset
Real-person reference images cannot be passed as a raw URL. If you try, the request is rejected by privacy detection with an InputImageSensitiveContentDetected.PrivacyInformation error. The portrait has to be uploaded as an asset first, then referenced by the asset URI the upload returns.
    # Upload the reference portrait as an asset (multipart file stream)
    curl -X POST https://media.onerouter.pro/v1/upload/resources \
      -H "Authorization: Bearer $INFRON_API_KEY" \
      -F "model=bytedance/seedance-2.0/virtual-portrait-reference-to-video" \
      -F "file=@/path/to/reference_portrait.png"
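Since a raw URL is only rejected once the request reaches privacy detection, it can be worth catching the mistake on the client first. The following is a minimal sketch, assuming every valid reference uses the asset://asset-xxxx form that the upload returns; `require_asset_uris` is a hypothetical helper name, not part of the Infron API.

```python
# Assumption: asset URIs always start with this scheme, as in asset://asset-xxxx.
ASSET_SCHEME = "asset://"


def require_asset_uris(image_refs):
    """Return the references unchanged, or raise if any is not an asset URI."""
    bad = [ref for ref in image_refs if not ref.startswith(ASSET_SCHEME)]
    if bad:
        # Fail locally instead of waiting for the server-side privacy rejection.
        raise ValueError(f"Upload these as assets first: {bad}")
    return list(image_refs)
```

Calling `require_asset_uris([asset_uri])` just before building a generation payload would turn a remote rejection into an immediate local error.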
The upload returns an id and an upstream_asset_uri (in the form asset://asset-xxxx). Poll the status endpoint until upstream_status reads Active before you generate. A real portrait runs a consistency check first, so it usually sits in Processing for a moment. Uploaded assets are temporary: they expire after 7 days, and an account holds up to 1000 at a time.
    import os
    import time

    import requests

    BASE = "https://media.onerouter.pro/v1"
    # Read the key from the environment, matching the variable the curl example uses.
    INFRON_API_KEY = os.environ["INFRON_API_KEY"]
    HEADERS = {"Authorization": f"Bearer {INFRON_API_KEY}"}
    MODEL = "bytedance/seedance-2.0/virtual-portrait-reference-to-video"


    def upload_reference(path):
        """Upload the portrait as a multipart file; return (resource id, asset URI)."""
        with open(path, "rb") as f:
            r = requests.post(
                f"{BASE}/upload/resources",
                headers=HEADERS,
                data={"model": MODEL},
                files={"file": f},
            )
        r.raise_for_status()
        body = r.json()["data"]
        return body["id"], body["upstream_asset_uri"]


    def wait_until_active(resource_id, timeout=180):
        """Poll the status endpoint every 2 seconds until the asset is Active."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            r = requests.get(f"{BASE}/status/resources/{resource_id}", headers=HEADERS)
            r.raise_for_status()
            data = r.json()["data"]
            if data.get("upstream_status") == "Active":
                return True
            if data.get("upstream_status") == "Failed":
                raise RuntimeError(data.get("failure_code", "upload failed"))
            time.sleep(2)
        raise TimeoutError("Reference asset did not reach Active state in time")


    resource_id, asset_uri = upload_reference("reference_portrait.png")
    wait_until_active(resource_id)
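Because assets expire after 7 days and an account holds at most 1000, a long-running pipeline may want to track its own uploads instead of discovering a stale URI at generation time. This is a rough sketch under two assumptions: the 7-day window is counted from the moment you record locally at upload, and a plain dict is enough bookkeeping. The names `is_expired`, `prune_expired`, and `has_room` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

ASSET_TTL = timedelta(days=7)  # expiry window described above
MAX_ASSETS = 1000              # per-account cap described above


def is_expired(uploaded_at, now=None):
    """True once a locally recorded upload time is at least 7 days old."""
    now = now or datetime.now(timezone.utc)
    return now - uploaded_at >= ASSET_TTL


def prune_expired(registry, now=None):
    """registry maps asset URI -> timezone-aware upload time; drop stale entries."""
    return {uri: ts for uri, ts in registry.items() if not is_expired(ts, now)}


def has_room(registry, now=None):
    """True if another upload would stay under the account cap."""
    return len(prune_expired(registry, now)) < MAX_ASSETS
```

Recording `registry[asset_uri] = datetime.now(timezone.utc)` right after each successful upload keeps the registry current. Treat the result as a hint, since the service remains the source of truth on what has actually expired.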
Step 3: Generate with virtual-portrait-reference-to-video
The full parameter schema is on the model's API reference page: https://infron.ai/models/bytedance/seedance-2.0/virtual-portrait-reference-to-video/api-reference
Pass the asset URI from Step 2 in image_urls. Camera movement, pacing, and expression are all controlled through the prompt text, so a single portrait asset is the only image reference the call needs. Generation is asynchronous: you submit the job, receive a task_id, then poll the task until it completes.
    # Reuses time, requests, BASE, HEADERS, and MODEL from Step 2.

    def submit_generation(asset_uri, prompt):
        """Submit an asynchronous generation job and return its task_id."""
        payload = {
            "model": MODEL,
            "image_urls": [asset_uri],  # asset URI from Step 2, not a raw URL
            "video_urls": [],
            "audio_urls": [],
            "prompt": prompt,
            "aspect_ratio": "16:9",
            "duration": "10",
            "resolution": "720p",
            "generate_audio": True,
            "n": 1,
        }
        r = requests.post(f"{BASE}/videos/generations", headers=HEADERS, json=payload)
        r.raise_for_status()
        return r.json()["task_id"]


    def wait_for_video(task_id, timeout=600):
        """Poll the task every 5 seconds until it reports completed."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            r = requests.get(f"{BASE}/videos/tasks/{task_id}", headers=HEADERS)
            r.raise_for_status()
            body = r.json()
            if body.get("status") == "completed":
                return body
            time.sleep(5)
        raise TimeoutError("Video task did not complete in time")


    task_id = submit_generation(
        asset_uri,
        "One single continuous unbroken camera shot gliding through a sunlit cafe, "
        "the woman from the reference seated by the window, looking up and smiling as "
        "she speaks, never cutting, warm natural light, gentle handheld motion."
    )
    result = wait_for_video(task_id)
    print(result)
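The loop above polls at a fixed interval. When many tasks run at once, a growing delay between polls can cut request volume. The sketch below is one possible generic poller, not something the API requires: `poll_until` and its parameters are hypothetical, and the `sleep` and `clock` arguments exist only so the function can be exercised without real waiting.

```python
import time


def poll_until(fetch, is_done, timeout=600, first_delay=2.0, max_delay=20.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until is_done(result) is true, doubling the wait each round."""
    deadline = clock() + timeout
    delay = first_delay
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Stop if the next wait would run past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError("Polling did not finish in time")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

With the names from the code above, a call could look like `poll_until(lambda: requests.get(f"{BASE}/videos/tasks/{task_id}", headers=HEADERS).json(), lambda body: body.get("status") == "completed")`.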
For the fast tier, swap the model string to bytedance/seedance-2.0/fast/virtual-portrait-reference-to-video. Everything else stays the same.
Prompting real human video that actually holds identity
The model locks the face. You control everything else through language, and a few habits consistently produce cleaner results.
Write the camera as one move, not a sequence of cuts. Phrases like "one single continuous unbroken camera shot gliding through" keep Seedance 2.0 from inserting hard transitions that break the sense of a real person filmed live. When a scene has more than one subject, name each person in spatial order and add "never cutting, never skipping anyone" so the model visits them in a readable path. For energy, reach for fast, punchy, handheld, motion-blur language rather than listing technical camera specs. And because seeds vary, run several and select the best take rather than betting a deadline on a single generation.
Keep the prompt describing performance and motion. The identity is already handled by the reference asset, so spending prompt tokens re-describing the face tends to fight the lock rather than reinforce it.
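These habits can be captured in a small prompt builder so that every call follows the same pattern. This is only a sketch: `build_prompt` is a hypothetical helper, and the phrasing it emits is taken from the examples in this guide, not from any official prompt grammar.

```python
def build_prompt(scene, subjects, style=""):
    """Compose a one-take prompt: continuous camera, subjects in spatial order."""
    parts = [f"One single continuous unbroken camera shot gliding through {scene}"]
    # Each subject line should describe performance and motion, not the face.
    parts.extend(subjects)
    if len(subjects) > 1:
        parts.append("never cutting, never skipping anyone")
    else:
        parts.append("never cutting")
    if style:
        parts.append(style)
    return ", ".join(parts) + "."
```

For example, `build_prompt("a sunlit cafe", ["the woman from the reference seated by the window, looking up and smiling as she speaks"], "warm natural light, gentle handheld motion")` reproduces the prompt used in Step 3.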
Pricing and access
Seedance 2.0 real human runs inside the enterprise allowlist. Once your organization is approved, generation is billed by usage and rolls up into a single unified bill across everything you call, so real human video sits on the same invoice and the same key as the rest of your stack. There is no separate plan to negotiate per model. Reach out to Infron Sales to scope access for your team.
One API, 400+ models
Real human video is rarely the whole job. A typical pipeline generates a reference portrait first, locks identity, then renders motion, and Seedance 2.0 real human is the final link in that chain rather than the entire chain.
That is the practical reason to run it on Infron. The same OpenAI-compatible key that produces real human video also reaches 400+ AI models, including the image models you would use to create or clean up a reference portrait before it ever touches the virtual portrait line. You build the upstream image, the identity lock, and the final render against one endpoint, one auth header, and one bill, instead of stitching three vendors together. The real human capability is the headline. The single integration across 400+ AI models is what keeps it in production.
FAQ
Is Seedance 2.0 real human the same as the virtual portrait model?
Yes. Real human generation runs through bytedance/seedance-2.0/virtual-portrait-reference-to-video. You supply a reference image of a real person and the model holds that identity across the clip.
Do I need ComfyUI to generate real human video?
No. The ComfyUI partner nodes are one way to reach the capability. On Infron you call the API directly, with no node graph and no local liveness flow to maintain.
Can I upload a real person's photo?
Yes, for authorized use. The portrait must be uploaded as an asset first, which is what passes Seedance 2.0's privacy detection, and real human access is opened to enterprise users on the allowlist so that use stays consented and accountable.
Who can access real human generation?
Enterprise users on the Infron allowlist. Contact Infron to add your organization.
Does it generate audio?
Yes. Set generate_audio to true and Seedance 2.0 produces synchronized audio with the video in a single pass, including lip sync.
Can it do multiple shots and continuous camera moves?
Yes. Multi-shot sequences and single continuous camera moves are both controlled through the prompt. Describe the camera as one unbroken move and name subjects in order for the cleanest result.