Customer Case Study

Why ISEKAI ZERO is choosing Infron for its inference layer

Author: Andrew Zheng

ISEKAI ZERO sits at the high end of AI roleplay. Every player's message has to come back fast, in character, and at a cost the unit economics can absorb. On standard LLM gateways, none of the three held.

  • Customer: ISEKAI ZERO (isekai.world)

  • Parent company: ARX MEDIA SDN BHD

  • Industry: Interactive entertainment / AI roleplay

  • With Infron: Batch API (in production), Smart Router with Auto + Provider Lock (under evaluation)

  • Default model: DeepSeek v3.2

  • Reach: 10-language auto-translation, 4.76★ on Google Play

What ISEKAI ZERO is building

ISEKAI ZERO came out of ARX MEDIA in late 2025. The team had spent years in AR and VR before turning to AI entertainment. The product they wanted to build wasn't a chatbot. It was a live world: players design their own characters, build the scene, write the storyline, and watch it unfold in one of three modes (narration, light novel, or manga).

The product targets the high end of the AI roleplay market. Pricing is based on microtransactions (USD 7.99 = 500 Arcane, the in-game currency), with a moderated marketplace of user-made characters and auto-translation into 10 languages.

ISEKAI ZERO's stickiness comes from creation, not consumption. Players design their own characters, build their own worlds, and watch them come to life. That depth of UGC engagement turns the account into a player's body of work rather than a subscription, and it shows in monetization: per-user spend runs 3–6x what comparable roleplay products see. A 4,000+ active Discord community of player-creators feeds back into new player acquisition.

How the unit economics work. ISEKAI ZERO charges users in Arcane currency, and every player message is a model call. The Arcane spent on that message has to cover the real cost of the call, with what's left funding the platform and creator payouts. Inference isn't a line item for ISEKAI ZERO. It is the COGS.
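To make the margin mechanics concrete, here is a minimal sketch with entirely hypothetical numbers (token prices, message sizes, and Arcane charges are illustrative, not ISEKAI ZERO's actual figures):

```python
# Hypothetical per-message unit economics. All numbers are illustrative.
ARCANE_PER_USD = 500 / 7.99  # from the USD 7.99 = 500 Arcane price point

def message_margin_usd(arcane_charged: float,
                       prompt_tokens: int,
                       completion_tokens: int,
                       usd_per_m_in: float,
                       usd_per_m_out: float) -> float:
    """Revenue minus inference cost for a single player message."""
    revenue = arcane_charged / ARCANE_PER_USD
    cost = (prompt_tokens * usd_per_m_in
            + completion_tokens * usd_per_m_out) / 1_000_000
    return revenue - cost

# A 2,000-token prompt with a 400-token reply, priced at a hypothetical
# $0.28 / $0.42 per million tokens, charged at 1 Arcane:
print(f"{message_margin_usd(1, 2_000, 400, 0.28, 0.42):.4f}")  # ≈ 0.0153
```

Whatever is left after the model call has to fund the platform and creator payouts, which is why a few cents of quality drift or overflow premium per call moves the whole P&L.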

The three problems on standard gateways

ISEKAI ZERO had picked DeepSeek v3.2 as the default after testing the field. Cheaper models lost users on quality. v3.2 was the only model in its class that met the bar.

That made cost optimization a routing problem, not a model problem. And on standard LLM gateways, routing was breaking in three ways.

1. Content filters didn't understand roleplay. Most cloud-hosted endpoints ship with default safety filters. A character cursing at a villain, describing fictional combat, or staying in tone during a tense scene gets flagged the same as a real policy violation. The API call fails on exactly the dialogue the product is built to deliver.

2. Discounted endpoints quietly returned worse output. A regional, low-priced endpoint of the same model would return lower-quality answers than the reference deployment. The list price said one thing. The user experience said another.

3. Traffic spikes caused silent failover to expensive providers. Under load, gateways switched to backup providers without notice. The cheapest provider on paper became more expensive in practice. And it happened right when ISEKAI ZERO needed cheap inference most.

What's different about Infron

ISEKAI ZERO has begun routing inference traffic through Infron. Three things about how Infron works shaped the decision.

1. The customer chooses how routing works. Infron's router has two modes. Auto lets Infron route across providers and handle failovers. Provider Lock lets the customer pin a single model to a single provider. Either way, the customer keeps Infron's uptime SLA, retry logic, and observability.

For ISEKAI ZERO specifically: DeepSeek v3.2 on Alibaba Cloud's Beijing endpoint is the lowest-priced deployment of the model in the market. A direct connection to it, however, doesn't hold up under live player traffic (rate limits, quality drift, regional node hiccups all show up). Provider Lock combines both sides: pin the cheapest endpoint, but route through Infron's reliability layer. The list price stays. The effective availability doesn't drop.

[Figure: Infron Smart Router with Auto and Provider Lock modes.] For ISEKAI ZERO's path forward, Provider Lock is the relevant mode: pin the regional DeepSeek v3.2 endpoint with the lowest landed cost without losing the reliability layer.
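A rough sketch of what the two modes could look like through Infron's OpenAI-compatible API. The base URL and the `extra_body` routing fields are illustrative placeholders, not Infron's documented schema:

```python
# Illustrative only: URLs and routing fields are placeholders, not
# Infron's documented schema. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.infron.example/v1",  # placeholder endpoint
    api_key="INFRON_API_KEY",
)

messages = [{"role": "user", "content": "Stay in character and respond."}]

# Auto mode: the router picks the provider and handles failover itself.
auto_reply = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=messages,
)

# Provider Lock: pin the model to one provider but keep the reliability
# layer (retries, observability, SLA) in front of it.
locked_reply = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=messages,
    extra_body={"provider": {"only": ["alibaba-cloud-beijing"]}},  # hypothetical
)
```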

2. Infron works directly with the providers. A feature request filed by a single application team usually gets absorbed by the cloud platform's sales or support layer. It rarely reaches the product or model team. Infron's position is different: aggregated traffic across many customers, and direct relationships with the provider's product owners and model teams. Same request, different escalation path. From a single customer it's a ticket. From Infron it's a platform-level signal.

For ISEKAI ZERO specifically, that has meant joint engineering with Alibaba Cloud across three areas: turning off the default content filter that was blocking ordinary roleplay dialogue, closing the quality gap between the regional endpoint and the reference DeepSeek v3.2 deployment, and improving throughput on burst traffic. The throughput work continues as the regional site absorbs heavier industry load.

And once these fixes ship, they apply to every Infron customer routing through that provider. An optimization a single customer negotiates serves only that customer. On Infron, it becomes a default capability the next similar customer inherits at onboarding.

3. Elastic capacity for high-concurrency traffic. ISEKAI ZERO is a live player conversation product. Traffic moves with DAU. Evening peaks and popular character launches produce visible bursts. A single direct account hits provider rate limits at peak (429s, or silent failover to a more expensive backup). Infron distributes the load across a multi-provider capacity pool, absorbing peaks that would throttle a single account, with throughput of up to 100M TPM. The customer doesn't pre-provision for worst-case capacity, and doesn't pay the silent-overflow premium when peaks hit.
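For contrast, here is roughly the retry-and-failover loop a single direct account has to carry itself. Endpoint names and keys are hypothetical; this whole loop is what the multi-provider pool replaces:

```python
# Hypothetical client-side failover for a single direct account.
# Endpoints and keys are placeholders; Infron's pool replaces this loop.
import time
from openai import OpenAI, RateLimitError

ENDPOINTS = [
    ("https://cheap-regional.example/v1", "KEY_A"),   # lowest list price
    ("https://pricier-backup.example/v1", "KEY_B"),   # the overflow premium
]

def complete_with_failover(messages, retries_per_endpoint=3):
    for base_url, api_key in ENDPOINTS:
        client = OpenAI(base_url=base_url, api_key=api_key)
        for attempt in range(retries_per_endpoint):
            try:
                return client.chat.completions.create(
                    model="deepseek-v3.2", messages=messages)
            except RateLimitError:        # the 429s that arrive at peak
                time.sleep(2 ** attempt)  # backoff, then fall through
    raise RuntimeError("all endpoints throttled at peak")
```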

Frequently asked questions

Q: We're already on a different LLM gateway. How disruptive is migrating to Infron?

Infron exposes an OpenAI-compatible API, so for most teams migration is changing the base URL and API key. Application code stays the same. If you currently rely on a specific provider's quirks, our docs cover provider-level routing controls [link] including fallback ordering and endpoint pinning.
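In practice the swap usually looks like this (the Infron base URL below is a placeholder; use the one from your dashboard):

```python
from openai import OpenAI

# Before: previous gateway or direct provider
# client = OpenAI(base_url="https://old-gateway.example/v1", api_key="OLD_KEY")

# After: Infron (placeholder URL; application code is unchanged)
client = OpenAI(base_url="https://api.infron.example/v1",
                api_key="INFRON_API_KEY")
```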

Q: Does using Infron mean I pay more than going direct to a provider?

No. Infron passes through the underlying provider's per-token pricing and charges a small service fee (currently 3%) on top. You get the same base per-token rate as you would direct, plus the failover, observability, and multi-provider capacity you don't get from a single provider account. We wrote about why this multi-provider model matters in Top LLM Gateways in 2026 [link].
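The arithmetic is simple enough to sanity-check yourself (the list price here is hypothetical; the 3% fee is the current figure from above):

```python
list_price_per_m = 0.28           # hypothetical provider rate, USD per 1M tokens
service_fee = 0.03                # Infron's current service fee
effective = list_price_per_m * (1 + service_fee)
print(f"${effective:.4f} per 1M tokens")  # $0.2884
```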

Q: How does Provider Lock differ from just calling the provider directly?

Provider Lock pins your traffic to a specific provider/endpoint of your choice (the "cheapest one I trust" pattern), but keeps Infron's reliability layer in front: retries, observability, and failover to a backup if the locked provider is unreachable. Going direct gives you the same pricing without any of that. Implementation details are in our inference provider routing docs [link].

Q: Where does the bandwidth for 100M TPM actually come from?

Infron aggregates capacity across 100+ providers and 400+ models, which means peak load gets distributed across a much larger pool than any single provider account can offer. We've written more about how this routing logic works in OneRouter: The World's First Agentic LLM Router [link].

Q: What's the difference between an LLM gateway and an LLM router?

A gateway gives you a single API in front of many providers. A router decides which provider handles each request. Infron does both, plus the provider co-engineering work that's harder to see from a feature list. See What is an AI Gateway? [link] for the architectural background.

Why this is showing up across AI applications

Most AI products at scale have already picked their model. Margin pressure is no longer about which model. It's about how the model gets served.

The real cost of inference hides in moderation behavior, regional quality drift, and overflow events that turn list prices into fiction. The teams that win are the ones whose infrastructure partner fixes those issues at the source, instead of routing around them.

If your inference costs feel less predictable than they should, we'd like to talk [link].


Less orchestration.
More innovation.

Seamlessly integrate Infron with just a few lines of code and unlock unlimited AI power.
