Universal Audio API

Unified API for speech-to-text-transcription models, speech-to-text-translation models, and text-to-speech-creation models.

API Overview

To simplify integration with different speech-to-text-transcription models (stt), speech-to-text-translation models (stt), and text-to-speech-creation models (tts), Infron AI provides a unified audio API.

API Specification

text-to-speech-creation models (tts)

Generates audio from the input text.

curl https://audio.onerouter.pro/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: <API_KEY>" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "A cute baby sea otter",
    "voice": "alloy"
  }' \
  --output speech.mp3

  • <API_KEY> is your API key, generated on the API page.

  • model is the model name, such as gpt-4o-mini-tts. The list of available models is on the Model page.

  • input is the text to generate audio from.

  • voice is the voice to use when generating the audio. Supported voices are alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, and verse.
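The request body is JSON, so any double quotes, backslashes, or line breaks in input must be escaped before the text is embedded in the -d string. A minimal bash sketch, assuming the voice list above is exhaustive; the helper name `tts_body` is hypothetical, and it escapes only backslashes, double quotes, newlines, and tabs (other control characters are not handled):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON body for the text-to-speech request.
# Usage: tts_body <model> <input text> <voice>
tts_body() {
  local model=$1 text=$2 voice=$3

  # Reject voices outside the supported list documented above.
  case "$voice" in
    alloy|ash|ballad|coral|echo|fable|onyx|nova|sage|shimmer|verse) ;;
    *) echo "unsupported voice: $voice" >&2; return 1 ;;
  esac

  # Minimal JSON string escaping: backslashes first, then quotes, newlines, tabs.
  text=${text//\\/\\\\}
  text=${text//\"/\\\"}
  text=${text//$'\n'/\\n}
  text=${text//$'\t'/\\t}

  printf '{"model":"%s","input":"%s","voice":"%s"}\n' "$model" "$text" "$voice"
}
```

For example, `tts_body gpt-4o-mini-tts 'A cute baby sea otter' alloy` produces a body equivalent to the one in the curl example above, and it can be passed to curl as `-d "$(tts_body gpt-4o-mini-tts "$text" alloy)"`.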

Example response

The response body is the audio file content itself (saved to speech.mp3 by --output in the request above).
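Because a successful response is raw audio, an error response saved with a plain --output would land in speech.mp3 as if it were audio. A minimal bash sketch that relies on curl's --fail option (non-zero exit on HTTP status 400 and above) and removes any partial or empty output file; the helper name `tts_save` and the `API_KEY` environment variable are hypothetical, and the Authorization header mirrors the form used in the curl example above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: POST a JSON body to the speech endpoint and save the audio.
# Usage: tts_save '<json body>' <output file>   (reads the key from $API_KEY)
tts_save() {
  local body=$1 out=$2

  [ -n "${API_KEY:-}" ] || { echo "API_KEY is not set" >&2; return 1; }

  # --fail makes curl exit non-zero on HTTP errors instead of
  # writing the error body to the output file as if it were audio.
  if ! curl --fail --silent --show-error \
      -H "Content-Type: application/json" \
      -H "Authorization: ${API_KEY}" \
      -d "$body" \
      --output "$out" \
      https://audio.onerouter.pro/v1/audio/speech; then
    rm -f -- "$out"
    return 1
  fi

  # An empty file means no audio came back.
  if [ ! -s "$out" ]; then
    echo "empty response written to $out" >&2
    rm -f -- "$out"
    return 1
  fi
}
```

For example, after `export API_KEY='<API_KEY>'`, run `tts_save '{"model":"gpt-4o-mini-tts","input":"A cute baby sea otter","voice":"alloy"}' speech.mp3`.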

speech-to-text-translation models (stt)

Translates audio into English.

  • <API_KEY> is your API key, generated on the API page.

  • model is the model name, such as whisper-1. The list of available models is on the Model page.

  • file is the audio file object (not the file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
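Since file is an audio file object rather than a name, the request is presumably sent as multipart form data. A minimal bash sketch of a translation call, assuming an OpenAI-compatible endpoint: the path /audio/translations under https://audio.onerouter.pro/v1, the form field names file and model, and reuse of the speech example's Authorization header form are all assumptions, as are the helper name `stt_translate` and the `API_KEY` and `AUDIO_API_BASE` variables:

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate an audio file into English.
# ASSUMPTION: OpenAI-style multipart endpoint at $AUDIO_API_BASE/audio/translations.
# Usage: stt_translate <audio file> [model]   (reads the key from $API_KEY)
AUDIO_API_BASE=${AUDIO_API_BASE:-https://audio.onerouter.pro/v1}

stt_translate() {
  local file=$1 model=${2:-whisper-1} ext

  [ -n "${API_KEY:-}" ] || { echo "API_KEY is not set" >&2; return 1; }

  # Check the extension against the supported formats (case-insensitive).
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    flac|mp3|mp4|mpeg|mpga|m4a|ogg|wav|webm) ;;
    *) echo "unsupported audio format: $file" >&2; return 1 ;;
  esac

  # -F sends multipart/form-data; the leading @ uploads the file's
  # contents rather than its name.
  curl --fail --silent --show-error \
    -H "Authorization: ${API_KEY}" \
    -F "file=@${file}" \
    -F "model=${model}" \
    "${AUDIO_API_BASE}/audio/translations"
}
```

For example, after `export API_KEY='<API_KEY>'`, `stt_translate interview.m4a` prints the response body to standard output.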

Example response

speech-to-text-transcription models (stt)

Transcribes audio into text in the language spoken in the audio.

  • <API_KEY> is your API key, generated on the API page.

  • model is the model name, such as whisper-1. The list of available models is on the Model page.

  • file is the audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
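A minimal bash sketch of transcription: one helper uploads a single file, and a second loops over a directory, skipping files whose extensions are not among the formats listed above. The endpoint path /audio/transcriptions, the form field names file and model, the Authorization header form (copied from the speech example), the helper names, the `API_KEY` and `AUDIO_API_BASE` variables, and the .transcript output suffix are all assumptions; each response body is saved as-is because its format is not specified here:

```shell
#!/usr/bin/env bash
# ASSUMPTION: OpenAI-style multipart endpoint at $AUDIO_API_BASE/audio/transcriptions.
AUDIO_API_BASE=${AUDIO_API_BASE:-https://audio.onerouter.pro/v1}

# Hypothetical helper: transcribe one file and print the raw response body.
stt_transcribe() {
  local file=$1 model=${2:-whisper-1}
  [ -n "${API_KEY:-}" ] || { echo "API_KEY is not set" >&2; return 1; }
  curl --fail --silent --show-error \
    -H "Authorization: ${API_KEY}" \
    -F "file=@${file}" \
    -F "model=${model}" \
    "${AUDIO_API_BASE}/audio/transcriptions"
}

# Hypothetical helper: transcribe every supported file in a directory,
# saving each response to <file>.transcript. Returns 1 if any upload failed.
stt_transcribe_dir() {
  local dir=$1 model=${2:-whisper-1} f ext failed=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
      flac|mp3|mp4|mpeg|mpga|m4a|ogg|wav|webm) ;;
      *) continue ;;  # not a supported audio format
    esac
    if stt_transcribe "$f" "$model" > "$f.transcript"; then
      echo "ok: $f"
    else
      echo "failed: $f" >&2
      rm -f -- "$f.transcript"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, after `export API_KEY='<API_KEY>'`, `stt_transcribe_dir ./recordings` writes ./recordings/meeting.wav.transcript for a file meeting.wav. The glob is expanded before the loop starts, so newly written .transcript files are never picked up as input.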

Example response
