From Image Model to Finished Clip

Seedance 2.0 Real Human Video API: Access, Setup, and Prompting

Seedance 2.0 is live on Infron
Author: Andrew Zheng

Real people have always been the hard case for AI video. Synthetic characters are forgiving: a slightly different face between frames reads as style. A real person is not forgiving. The moment the identity drifts, the eye catches it, and the whole clip falls apart. That single problem is why most real-person AI video still looks off.

Seedance 2.0 closes that gap. Hand it one real portrait, and it locks that identity, holding the face, expression, and lip sync stable from the first frame to the last, with audio generated alongside the video. The remaining question is how you reach the Seedance 2.0 real human capability without a pile of setup in the way. This guide walks through generating real human video on Infron from your own code, start to finish.


What "real human" means in Seedance 2.0


| Capability | What it means for real human video |
| --- | --- |
| Identity consistency | The same real face, hair, and outfit from the first frame to the last, with no drift |
| Native audio-video sync | Dialogue, lip sync, and sound generated in one pass, not added in a separate dubbing step |
| Multi-shot storytelling | Up to 9 images guide a cut sequence while the person stays consistent across every shot |
| Continuous camera moves | One unbroken shot gliding through a scene, controlled entirely through the prompt |


Seedance 2.0 splits people-driven video into two modes, and the distinction decides which model string you call.

| Need | What it does | Typical input | Infron model page |
| --- | --- | --- | --- |
| Real-person video | Drives a real human from a portrait you supply, locking that identity across the clip | One reference portrait you have the rights to use | https://infron.ai/models/bytedance/seedance-2.0/virtual-portrait-reference-to-video/api-reference |
| Multi-reference video | Builds a multi-shot clip from several references, used for animated or character-driven work | Up to 9 images, plus video and audio references | https://infron.ai/models/bytedance/seedance-2.0/reference-to-video |

The Seedance 2.0 real human capability runs through the virtual portrait line. You hand it a reference image of an actual person, the model locks that identity, then it holds the face, expressions, and lip sync stable across the entire clip. A real person stays the same person from the first frame to the last, with native audio generated alongside the video rather than added in a separate dubbing pass.

Because the input is a real identity, every serious provider gates it. That gate is the whole reason the next section exists.

Why the usual real human routes are slow

Reaching Seedance 2.0 real human output is rarely the problem. The friction sits in everything you do before the first generation.

The ComfyUI tutorial is excellent if you live inside ComfyUI and generate one clip at a time. It does not fit a team that wants real human video as a programmatic step inside a larger pipeline, where every clip should come from an API call rather than a manual node run. That is the gap Infron fills.


| | ComfyUI nodes | Reseller gateway | Infron |
| --- | --- | --- | --- |
| Setup before first clip | Install nodes, log into a permitted network, clear a one-time liveness check | New account plus a separate identity verification step | Request enterprise access once, then call one endpoint |
| Ongoing friction | Local graph upkeep and asset or group IDs to manage by hand | Extra account and portal to manage | No node graph, no second verification portal to babysit |
| Best fit | One clip at a time, inside ComfyUI | Casual self-serve access | Real human as a programmatic pipeline step |

How to generate Seedance 2.0 real human video on Infron

Three steps: get access, upload your reference, then generate. The code below calls the live endpoint and follows the request shape the API expects.

Step 1: Get enterprise access

Seedance 2.0 real human is not open self-serve. Infron opens the virtual portrait line to enterprise users through an allowlist, so that real-person generation stays tied to an accountable, authorized organization rather than anonymous traffic. This is deliberate. Driving a real identity is exactly the capability that should sit behind consent and authorization, and the allowlist is how Infron keeps it there.

For your team this means one short onboarding rather than a recurring verification portal. Contact Infron to have your organization added, and the same API key that unlocks real human also carries the rest of the catalog.

Step 2: Upload your reference asset

Real-person reference images cannot be passed as a raw URL. If you try, the request is rejected by privacy detection with an InputImageSensitiveContentDetected.PrivacyInformation error. The portrait has to be uploaded as an asset first, then referenced by the asset URI the upload returns.

# Upload the reference portrait as an asset (multipart file stream)
curl -X POST https://media.onerouter.pro/v1/upload/resources \
  -H "Authorization: Bearer $INFRON_API_KEY" \
  -F "model=bytedance/seedance-2.0/virtual-portrait-reference-to-video" \
  -F "file=@/path/to/reference_portrait.png"

The upload returns an id and an upstream_asset_uri (in the form asset://asset-xxxx). Poll the status endpoint until upstream_status reads Active before you generate. A real portrait runs a consistency check first, so it usually sits in Processing for a moment. Uploaded assets are temporary: they expire after 7 days, and an account holds up to 1000 at a time.

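import os
import time
import requests

BASE = "https://media.onerouter.pro/v1"
# Read the key from the same environment variable the curl example uses.
INFRON_API_KEY = os.environ["INFRON_API_KEY"]
HEADERS = {"Authorization": f"Bearer {INFRON_API_KEY}"}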
MODEL = "bytedance/seedance-2.0/virtual-portrait-reference-to-video"

def upload_reference(path):
    with open(path, "rb") as f:
        r = requests.post(
            f"{BASE}/upload/resources",
            headers=HEADERS,
            data={"model": MODEL},
            files={"file": f},
        )
    r.raise_for_status()
    body = r.json()["data"]
    return body["id"], body["upstream_asset_uri"]

def wait_until_active(resource_id, timeout=180):
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(f"{BASE}/status/resources/{resource_id}", headers=HEADERS)
        r.raise_for_status()
        data = r.json()["data"]
        if data.get("upstream_status") == "Active":
            return True
        if data.get("upstream_status") == "Failed":
            raise RuntimeError(data.get("failure_code", "upload failed"))
        time.sleep(2)
    raise TimeoutError("Reference asset did not reach Active state in time")

resource_id, asset_uri = upload_reference("reference_portrait.png")
wait_until_active(resource_id)
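
Because uploaded assets expire after 7 days and must be referenced in the asset:// form, a pipeline that reuses a portrait across runs may want a small local check before each generation. The following is a minimal sketch, not part of the Infron API: the UploadedAsset record, the is_usable helper, and the idea of starting the expiry clock at the moment your own code finished the upload are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

ASSET_TTL = timedelta(days=7)  # expiry window stated in this guide


@dataclass(frozen=True)
class UploadedAsset:
    """Hypothetical local record of one upload; not an API object."""
    resource_id: str
    asset_uri: str
    uploaded_at: datetime  # recorded by the caller at upload time (assumption)


def is_asset_uri(value: str) -> bool:
    """True when the value looks like an uploaded asset reference, not a raw URL."""
    return value.startswith("asset://")


def is_usable(asset: UploadedAsset, now: datetime,
              margin: timedelta = timedelta(hours=1)) -> bool:
    """True when the URI has the asset:// form and expiry is more than `margin` away."""
    if not is_asset_uri(asset.asset_uri):
        return False
    return now < asset.uploaded_at + ASSET_TTL - margin


# Example: an asset uploaded six days ago is still usable.
uploaded = datetime(2025, 1, 1, tzinfo=timezone.utc)
asset = UploadedAsset("res-1", "asset://asset-xxxx", uploaded)
print(is_usable(asset, uploaded + timedelta(days=6)))
```

When is_usable returns False, the simplest recovery is to upload the portrait again and wait for the new asset to reach Active. The safety margin is there so that a job submitted just before expiry does not lose its reference mid-run.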

Step 3: Generate with virtual-portrait-reference-to-video

The full parameter schema is on the model page: https://infron.ai/models/bytedance/seedance-2.0/virtual-portrait-reference-to-video/api-reference

Pass the asset URI from Step 2 in image_urls. Camera movement, pacing, and expression are all controlled through the prompt text, so a single portrait asset is the only image reference the call needs. Generation is asynchronous: you submit the job, receive a task_id, then poll the task until it completes.

def submit_generation(asset_uri, prompt):
    payload = {
        "model": MODEL,
        "image_urls": [asset_uri],
        "video_urls": [],
        "audio_urls": [],
        "prompt": prompt,
        "aspect_ratio": "16:9",
        "duration": "10",
        "resolution": "720p",
        "generate_audio": True,
        "n": 1,
    }
    r = requests.post(f"{BASE}/videos/generations", headers=HEADERS, json=payload)
    r.raise_for_status()
    return r.json()["task_id"]

def wait_for_video(task_id, timeout=600):
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(f"{BASE}/videos/tasks/{task_id}", headers=HEADERS)
        r.raise_for_status()
        body = r.json()
        if body.get("status") == "completed":
            return body
        time.sleep(5)
    raise TimeoutError("Video task did not complete in time")

task_id = submit_generation(
    asset_uri,
    "One single continuous unbroken camera shot gliding through a sunlit cafe, "
    "the woman from the reference seated by the window, looking up and smiling as "
    "she speaks, never cutting, warm natural light, gentle handheld motion."
)
result = wait_for_video(task_id)
print(result)
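
The polling loop above returns only on "completed", so a job that fails keeps polling until the timeout. A more defensive loop would also stop on a terminal failure status. The sketch below shows one way to structure that. The failure status names ("failed", "cancelled") are assumptions, not values confirmed by this guide, so check them against the API reference before relying on them. The fetch, sleep, and clock parameters are hypothetical hooks that keep the loop testable without network access.

```python
import time
from typing import Callable, Iterable


def poll_task(
    fetch: Callable[[], dict],
    success: str = "completed",
    failures: Iterable[str] = ("failed", "cancelled"),  # assumed names
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until the task succeeds, fails, or the timeout passes."""
    failure_set = set(failures)
    deadline = clock() + timeout
    while clock() < deadline:
        body = fetch()
        status = body.get("status")
        if status == success:
            return body
        if status in failure_set:
            # Stop early instead of waiting out the full timeout.
            raise RuntimeError(f"Video task ended with status {status!r}")
        sleep(interval)
    raise TimeoutError("Video task did not complete in time")
```

To use it with the code above, pass a fetch callable that performs the same GET on the task URL as wait_for_video and returns the parsed JSON body.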

For the fast tier, swap the model string to bytedance/seedance-2.0/fast/virtual-portrait-reference-to-video. Everything else stays the same.
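
Since only the model string changes between tiers, the switch can be kept in one place. This is a minimal sketch: the two model strings come from this guide, while the with_tier helper is a hypothetical name introduced for illustration.

```python
STANDARD_MODEL = "bytedance/seedance-2.0/virtual-portrait-reference-to-video"
FAST_MODEL = "bytedance/seedance-2.0/fast/virtual-portrait-reference-to-video"


def with_tier(payload: dict, fast: bool) -> dict:
    """Return a copy of the payload with only the model string changed."""
    return {**payload, "model": FAST_MODEL if fast else STANDARD_MODEL}


# Example: every other field is carried over untouched.
base = {"model": STANDARD_MODEL, "duration": "10", "resolution": "720p"}
print(with_tier(base, fast=True)["model"])
```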

Prompting real human video that actually holds identity

The model locks the face. You control everything else through language, and a few habits consistently produce cleaner results.

Write the camera as one move, not a sequence of cuts. Phrases like "one single continuous unbroken camera shot gliding through" keep Seedance 2.0 from inserting hard transitions that break the sense of a real person filmed live. When a scene has more than one subject, name each person in spatial order and add "never cutting, never skipping anyone" so the model visits them in a readable path. For energy, reach for fast, punchy, handheld, motion-blur language rather than listing technical camera specs. And because seeds vary, run several and select the best take rather than betting a deadline on a single generation.

Keep the prompt describing performance and motion. The identity is already handled by the reference asset, so spending prompt tokens re-describing the face tends to fight the lock rather than reinforce it.
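
The habits above can be folded into a small prompt builder so that every job starts from the same structure. The sketch below is one possible arrangement, not an official template: build_prompt and its parameters are hypothetical, and the phrases it emits are the ones quoted in this section.

```python
from typing import Sequence


def build_prompt(scene: str, subjects: Sequence[str], performance: str,
                 energetic: bool = False) -> str:
    """Compose a single-shot prompt; subjects should be listed in spatial order."""
    parts = [f"One single continuous unbroken camera shot gliding through {scene}"]
    parts.extend(subjects)      # who appears, in the order the camera meets them
    parts.append(performance)   # performance and motion, not facial description
    if len(subjects) > 1:
        parts.append("never cutting, never skipping anyone")
    else:
        parts.append("never cutting")
    if energetic:
        parts.append("fast, punchy, handheld, motion blur")
    return ", ".join(parts) + "."


print(build_prompt(
    "a sunlit cafe",
    ["the woman from the reference seated by the window"],
    "looking up and smiling as she speaks",
))
```

The builder deliberately has no parameter for describing the face, which keeps the prompt on performance and motion and leaves identity to the reference asset.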

Pricing and access

Seedance 2.0 real human runs inside the enterprise allowlist. Once your organization is approved, generation is billed by usage and rolls up into a single unified bill across everything you call, so real human video sits on the same invoice and the same key as the rest of your stack. There is no separate plan to negotiate per model. Reach out to Infron Sales to scope access for your team.


One API, 400+ models

Real human video is rarely the whole job. A typical pipeline generates a reference portrait first, locks identity, then renders motion, and Seedance 2.0 real human is the final link in that chain rather than the entire chain.

That is the practical reason to run it on Infron. The same OpenAI-compatible key that produces real human video also reaches 400+ AI models, including the image models you would use to create or clean up a reference portrait before it ever touches the virtual portrait line. You build the upstream image, the identity lock, and the final render against one endpoint, one auth header, and one bill, instead of stitching three vendors together. The real human capability is the headline. The single integration across 400+ AI models is what keeps it in production.


FAQ

Is Seedance 2.0 real human the same as the virtual portrait model?

Yes. Real human generation runs through bytedance/seedance-2.0/virtual-portrait-reference-to-video. You supply a reference image of a real person and the model holds that identity across the clip.

Do I need ComfyUI to generate real human video?

No. The ComfyUI partner nodes are one way to reach the capability. On Infron you call the API directly, with no node graph and no local liveness flow to maintain.

Can I upload a real person's photo?

Yes, for authorized use. The portrait must be uploaded as an asset first, which is what passes Seedance 2.0's privacy detection, and real human access is opened to enterprise users on the allowlist so that use stays consented and accountable.

Who can access real human generation?

Enterprise users on the Infron allowlist. Contact Infron to add your organization.

Does it generate audio?

Yes. Set generate_audio to true and Seedance 2.0 produces synchronized audio with the video in a single pass, including lip sync.

Can it do multiple shots and continuous camera moves?

Yes. Multi-shot sequences and single continuous camera moves are both controlled through the prompt. Describe the camera as one unbroken move and name subjects in order for the cleanest result.

Less orchestration.
More innovation.

Seamlessly integrate Infron with just a few lines of code and unlock unlimited AI power.
