Embeddings API

Generate vector embeddings from text

Embeddings are numerical representations of text that capture semantic meaning. They convert text into vectors (arrays of numbers) that can be used for various machine learning tasks.

Infron AI provides a unified API to access embedding models from multiple providers.

What are Embeddings?

Embeddings transform text into high-dimensional vectors where semantically similar texts are positioned closer together in vector space. For example, "cat" and "kitten" would have similar embeddings, while "cat" and "airplane" would be far apart.

These vector representations enable machines to understand relationships between pieces of text, making them essential for many AI applications.
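
Closeness in vector space is usually measured with cosine similarity. Here is a minimal sketch of that measure; the three-dimensional vectors are invented purely for illustration, since real embedding vectors have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: close to 1.0 for similar meaning, lower for unrelated text."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration only.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
airplane = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, kitten))    # high score: similar meaning
print(cosine_similarity(cat, airplane))  # much lower score: unrelated meaning
```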

Common Use Cases

Embeddings are used in a wide variety of applications:

  • RAG (Retrieval-Augmented Generation): Build RAG systems that retrieve relevant context from a knowledge base before generating answers. Embeddings help find the most relevant documents to include in the LLM's context.

  • Semantic Search: Convert documents and queries into embeddings, then find the most relevant documents by comparing vector similarity. This provides more accurate results than traditional keyword matching because it understands meaning rather than just matching words.

  • Recommendation Systems: Generate embeddings for items (products, articles, movies) and user preferences to recommend similar items. By comparing embedding vectors, you can find items that are semantically related even if they don't share obvious keywords.

  • Clustering and Classification: Group similar documents together or classify text into categories by analyzing embedding patterns. Documents with similar embeddings likely belong to the same topic or category.

  • Duplicate Detection: Identify duplicate or near-duplicate content by comparing embedding similarity. This works even when text is paraphrased or reworded (see the sketch after this list).

  • Anomaly Detection: Detect unusual or outlier content by identifying embeddings that are far from typical patterns in your dataset.
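
To make the duplicate-detection idea concrete, here is a minimal sketch that flags pairs of texts whose embedding similarity meets a threshold. It assumes the embeddings have already been generated (for example with the batch request shown later), and the 0.9 threshold is illustrative; tune it for your model and data.

```python
import numpy as np

def near_duplicates(embeddings: np.ndarray, texts: list[str], threshold: float = 0.9):
    """Return pairs of texts whose cosine similarity meets or exceeds the threshold.

    `embeddings` is an (n, d) array with one already-generated vector per text.
    The default threshold of 0.9 is illustrative, not a recommended value.
    """
    # Normalize each row so a plain dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    pairs = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if sims[i, j] >= threshold:
                pairs.append((texts[i], texts[j], float(sims[i, j])))
    return pairs
```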

How to Use Embeddings

Basic Request

To generate embeddings, send a POST request to /embeddings with your text input and chosen model:
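
Here is a minimal sketch in Python using the requests library. The base URL, the API-key environment variable, and the OpenAI-style response shape are assumptions for illustration; substitute the values from your own account:

```python
import os
import requests

BASE_URL = "https://api.example.com/v1"      # placeholder; use your actual base URL
API_KEY = os.environ["EMBEDDINGS_API_KEY"]   # assumed environment variable name

response = requests.post(
    f"{BASE_URL}/embeddings",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "openai-text-embedding-3-small",
        "input": "The quick brown fox jumps over the lazy dog",
    },
    timeout=30,
)
response.raise_for_status()

# Assuming an OpenAI-style response: {"data": [{"index": 0, "embedding": [...]}], ...}
embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # dimensionality of the returned vector
```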

Batch Processing

You can generate embeddings for multiple texts in a single request by passing an array of strings:
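
Under the same assumptions as the previous sketch (placeholder base URL, API-key variable, and OpenAI-style response shape), a batch request passes a list as the input and returns one embedding per string:

```python
import os
import requests

BASE_URL = "https://api.example.com/v1"      # placeholder; use your actual base URL
API_KEY = os.environ["EMBEDDINGS_API_KEY"]   # assumed environment variable name

texts = [
    "How do I reset my password?",
    "What payment methods do you accept?",
    "Where can I download my invoices?",
]

response = requests.post(
    f"{BASE_URL}/embeddings",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "openai-text-embedding-3-small", "input": texts},
    timeout=30,
)
response.raise_for_status()

# Sort by index so the embeddings line up with the input order.
data = sorted(response.json()["data"], key=lambda item: item["index"])
embeddings = [item["embedding"] for item in data]
print(len(embeddings))  # 3, one vector per input string
```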

Practical Example

Here's a complete example of building a semantic search system using embeddings:
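
The sketch below is one way such a system could look under the same assumptions as the earlier snippets: embed a small document collection, embed the query, then rank the documents by cosine similarity. The documents, query, and helper names are all illustrative.

```python
import os

import numpy as np
import requests

BASE_URL = "https://api.example.com/v1"      # placeholder; use your actual base URL
API_KEY = os.environ["EMBEDDINGS_API_KEY"]   # assumed environment variable name
MODEL = "openai-text-embedding-3-small"

def embed(texts: list[str]) -> np.ndarray:
    """Return an (n, d) array with one embedding per input text."""
    response = requests.post(
        f"{BASE_URL}/embeddings",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": MODEL, "input": texts},
        timeout=30,
    )
    response.raise_for_status()
    data = sorted(response.json()["data"], key=lambda item: item["index"])
    return np.array([item["embedding"] for item in data])

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 60 requests per minute per key.",
    "We ship to most countries; delivery takes 3-7 business days.",
]

doc_vectors = embed(documents)
query_vector = embed(["How long do I have to return an item?"])[0]

# Rank documents by cosine similarity to the query.
doc_norms = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
query_norm = query_vector / np.linalg.norm(query_vector)
scores = doc_norms @ query_norm

for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.2f}  {documents[idx]}")
```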

Expected output:
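
The exact scores depend on the model, but the refund-policy document should rank first for this query. The numbers below are illustrative, not real model output:

```
0.61  Our refund policy allows returns within 30 days of purchase.
0.18  We ship to most countries; delivery takes 3-7 business days.
0.07  The API rate limit is 60 requests per minute per key.
```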

Best Practices

  • Choose the Right Model: Different embedding models have different strengths. Smaller models (like qwen-qwen3-embedding-0.6b or openai-text-embedding-3-small) are faster and cheaper, while larger models (like openai-text-embedding-3-large) provide better quality. Test multiple models to find the best fit for your use case.

  • Batch Your Requests: When processing multiple texts, send them in a single request rather than making individual API calls. This reduces latency and costs.

  • Cache Embeddings: Embeddings for the same text are deterministic (they don't change). Store embeddings in a database or vector store to avoid regenerating them repeatedly.

  • Normalize for Comparison: When comparing embeddings, use cosine similarity rather than Euclidean distance. Cosine similarity is scale-invariant and works better for high-dimensional vectors.

  • Consider Context Length: Each model has a maximum input length (context window). Longer texts may need to be chunked or truncated. Check the model's specifications before processing long documents.

  • Use Appropriate Chunking: For long documents, split them into meaningful chunks (paragraphs, sections) rather than arbitrary character limits. This preserves semantic coherence (see the chunking and caching sketch after this list).
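
Here is a minimal sketch combining the chunking and caching advice above: split a document on blank lines into paragraph chunks, and keep a cache keyed by a hash of each chunk so unchanged text is never re-embedded. The `embed` helper is the hypothetical batch function from the practical example, and the in-memory dict stands in for the database or vector store you would use in practice.

```python
import hashlib

def chunk_paragraphs(document: str, max_chars: int = 2000) -> list[str]:
    """Split on blank lines so chunks follow the document's own structure.

    max_chars is a rough, illustrative guard against exceeding the model's
    context window; check your model's actual token limit.
    """
    chunks = []
    for paragraph in document.split("\n\n"):
        paragraph = paragraph.strip()
        if paragraph:
            chunks.append(paragraph[:max_chars])
    return chunks

# Cache: hash of the text -> embedding vector. Embeddings are deterministic,
# so a cached vector never needs to be regenerated.
cache: dict[str, list[float]] = {}

def embed_with_cache(texts: list[str]) -> list[list[float]]:
    """Embed only the texts that are not already in the cache."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    missing = [t for t, k in zip(texts, keys) if k not in cache]
    if missing:
        for text, vector in zip(missing, embed(missing)):  # `embed` from the example above
            cache[hashlib.sha256(text.encode("utf-8")).hexdigest()] = list(vector)
    return [cache[k] for k in keys]
```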

Limitations

  • No Streaming: Unlike chat completions, embeddings are returned as complete responses. Streaming is not supported.

  • Token Limits: Each model has a maximum input length. Texts exceeding this limit will be truncated or rejected.

  • Deterministic Output: Embeddings for the same input text will always be identical (no temperature or randomness).

  • Language Support: Some models are optimized for specific languages. Check model documentation for language capabilities.

Advanced Features

Infron AI standardizes and aggregates API formats and parameters across multiple providers to give you a consistent, unified developer experience. This lets you interact with diverse model endpoints through a common interface without adjusting your code for each provider.

However, some providers' embedding APIs include advanced features or custom parameters that go beyond the unified specification, and these capabilities are not always fully represented in the unified API definition. For details on such parameters, behaviors, and usage examples, refer to the provider's official documentation.
