# Overview

Embeddings are numerical representations of text that capture its semantic meaning. They transform text into vectors (arrays of numbers) that can be used in a wide range of machine learning tasks. Infron AI offers a unified API that allows you to access embedding models from multiple providers through a single interface.

### Common Use Cases <a href="#common-use-cases" id="common-use-cases"></a>

Embeddings are used in a wide variety of applications:

* **RAG (Retrieval-Augmented Generation)**: Build RAG systems that retrieve relevant context from a knowledge base before generating answers. Embeddings help find the most relevant documents to include in the LLM's context.
* **Semantic Search**: Convert documents and queries into embeddings, then find the most relevant documents by comparing vector similarity. Because this compares meaning rather than exact words, it can surface relevant documents that keyword matching misses.
* **Recommendation Systems**: Generate embeddings for items (products, articles, movies) and user preferences to recommend similar items. By comparing embedding vectors, you can find items that are semantically related even if they don't share obvious keywords.
* **Clustering and Classification**: Group similar documents together or classify text into categories by analyzing embedding patterns. Documents with similar embeddings likely belong to the same topic or category.
* **Duplicate Detection**: Identify duplicate or near-duplicate content by comparing embedding similarity. This works even when text is paraphrased or reworded.
* **Anomaly Detection**: Detect unusual or outlier content by identifying embeddings that are far from typical patterns in your dataset.

### How to Use Embeddings <a href="#how-to-use-embeddings" id="how-to-use-embeddings"></a>

#### Basic Request <a href="#basic-request" id="basic-request"></a>

To generate embeddings, send a POST request to `/embeddings` with your text input and chosen model:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.post(
  "https://llm.onerouter.pro/v1/embeddings",
  headers={
    "Authorization": "Bearer {{API_KEY_REF}}",
    "Content-Type": "application/json",
  },
  json={
    "model": "{{MODEL}}",
    "input": "The quick brown fox jumps over the lazy dog"
  }
)

data = response.json()
embedding = data["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")
```

{% endtab %}

{% tab title="TypeScript (fetch)" %}

```typescript
const response = await fetch('https://llm.onerouter.pro/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer {{API_KEY_REF}}',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    input: 'The quick brown fox jumps over the lazy dog'
  }),
});

const data = await response.json();
const embedding = data.data[0].embedding;
console.log(`Embedding dimension: ${embedding.length}`);
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY_REF" \
  -d '{
    "model": "{{MODEL}}",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

{% endtab %}
{% endtabs %}
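
Indexing straight into `data["data"][0]["embedding"]` raises a bare `KeyError` when the body does not have the expected layout, for example after a failed request. The sketch below shows one way to guard that step. `extract_embeddings` is a hypothetical helper name, not part of the API, and it assumes only the `data[i]["embedding"]` layout used in the examples above.

```python
from typing import Any, Dict, List


def extract_embeddings(payload: Dict[str, Any]) -> List[List[float]]:
    """Return every embedding vector from a parsed /embeddings response.

    Hypothetical helper: it relies only on the data[i]["embedding"] layout
    used in the examples above and raises a readable error otherwise.
    """
    items = payload.get("data")
    if not isinstance(items, list) or not items:
        raise ValueError(f"Unexpected response shape: {payload!r}")
    return [item["embedding"] for item in items]


# Hand-written payload standing in for response.json(); no network call is made.
sample = {"data": [{"embedding": [0.1, 0.2, 0.3]}]}
vectors = extract_embeddings(sample)
print(f"Embedding dimension: {len(vectors[0])}")  # prints "Embedding dimension: 3"
```

In a real script you would pass `response.json()` to the helper in place of `sample`.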

#### Batch Processing <a href="#batch-processing" id="batch-processing"></a>

You can generate embeddings for multiple texts in a single request by passing an array of strings:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.post(
  "https://llm.onerouter.pro/v1/embeddings",
  headers={
    "Authorization": "Bearer {{API_KEY_REF}}",
    "Content-Type": "application/json",
  },
  json={
    "model": "{{MODEL}}",
    "input": [
      "Machine learning is a subset of artificial intelligence",
      "Deep learning uses neural networks with multiple layers",
      "Natural language processing enables computers to understand text"
    ]
  }
)

data = response.json()
for i, item in enumerate(data["data"]):
  print(f"Embedding {i}: {len(item['embedding'])} dimensions")
```

{% endtab %}

{% tab title="TypeScript (fetch)" %}

```typescript
const response = await fetch('https://llm.onerouter.pro/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer {{API_KEY_REF}}',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    input: [
      'Machine learning is a subset of artificial intelligence',
      'Deep learning uses neural networks with multiple layers',
      'Natural language processing enables computers to understand text'
    ]
  }),
});

const data = await response.json();
data.data.forEach((item, index) => {
  console.log(`Embedding ${index}: ${item.embedding.length} dimensions`);
});
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY_REF" \
  -d '{
    "model": "{{MODEL}}",
    "input": [
      "Machine learning is a subset of artificial intelligence",
      "Deep learning uses neural networks with multiple layers",
      "Natural language processing enables computers to understand text"
    ]
  }'
```

{% endtab %}
{% endtabs %}
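
For a large corpus, a single request may not be practical, so the inputs are usually split into several array-sized requests. The sketch below shows only the splitting step. `batched` is a hypothetical helper, and the batch size is an assumption, since this page does not state a maximum number of inputs per request.

```python
from typing import Iterator, List


def batched(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts` with at most `batch_size` items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


texts = [f"document {i}" for i in range(7)]
batches = list(batched(texts, batch_size=3))  # batch_size=3 is illustrative only
print([len(batch) for batch in batches])  # prints [3, 3, 1]
```

Each yielded list can be sent as the `input` array of one request, and the returned embeddings concatenated in the same order.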

#### Semantic Search <a href="#semantic-search" id="semantic-search"></a>

The example below embeds a query and a small set of documents in a single request, then ranks the documents by cosine similarity to the query.

{% tabs %}
{% tab title="Python" %}

```python
import requests
import numpy as np

API_KEY = "{{API_KEY_REF}}"

# Sample documents
documents = [
  "The cat sat on the mat",
  "Dogs are loyal companions",
  "Python is a programming language",
  "Machine learning models require training data",
  "The weather is sunny today"
]

def cosine_similarity(a, b):
  """Calculate cosine similarity between two vectors"""
  dot_product = np.dot(a, b)
  magnitude_a = np.linalg.norm(a)
  magnitude_b = np.linalg.norm(b)
  return dot_product / (magnitude_a * magnitude_b)

def semantic_search(query, documents):
  """Perform semantic search using embeddings"""
  # Generate embeddings for query and all documents
  response = requests.post(
    "https://llm.onerouter.pro/v1/embeddings",
    headers={
      "Authorization": f"Bearer {API_KEY}",
      "Content-Type": "application/json",
    },
    json={
      "model": "{{MODEL}}",
      "input": [query] + documents
    }
  )
  
  response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error body
  data = response.json()
  query_embedding = np.array(data["data"][0]["embedding"])
  doc_embeddings = [np.array(item["embedding"]) for item in data["data"][1:]]
  
  # Calculate similarity scores
  results = []
  for i, doc in enumerate(documents):
    similarity = cosine_similarity(query_embedding, doc_embeddings[i])
    results.append({"document": doc, "similarity": similarity})
  
  # Sort by similarity (highest first)
  results.sort(key=lambda x: x["similarity"], reverse=True)
  
  return results

# Search for documents related to pets
results = semantic_search("pets and animals", documents)
print("Search results:")
for i, result in enumerate(results):
  print(f"{i + 1}. {result['document']} (similarity: {result['similarity']:.4f})")
```

{% endtab %}
{% endtabs %}

Example output (the scores shown are illustrative; actual values depend on the model):

```
Search results:
1. Dogs are loyal companions (similarity: 0.8234)
2. The cat sat on the mat (similarity: 0.7891)
3. The weather is sunny today (similarity: 0.3456)
4. Machine learning models require training data (similarity: 0.2987)
5. Python is a programming language (similarity: 0.2654)
```
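
The loop in the example above computes one similarity at a time, which is fine for a handful of documents. For larger sets, the same ranking can be computed with a single matrix product. This is a sketch under the assumption that the document embeddings are already available as rows of an array; `rank_by_cosine` and the toy two-dimensional vectors are hypothetical and stand in for real embeddings.

```python
import numpy as np


def rank_by_cosine(query_vec, doc_matrix, top_k=3):
    """Return (row_index, cosine_similarity) pairs for the top_k closest rows."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_matrix, dtype=float)
    # Scale everything to unit length so a dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors, not real embeddings: row 0 points the same way as the query.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top = rank_by_cosine([1.0, 0.0], toy_docs, top_k=2)
print(top)
```

Here row 0 ranks first with a similarity of 1.0, followed by row 2 at roughly 0.707.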

### Best Practices

* Choose the Right Model: Different embedding models have different strengths. Smaller models (such as qwen-qwen3-embedding-0.6b or openai-text-embedding-3-small) are faster and more cost‑efficient, while larger models (such as openai-text-embedding-3-large) generally produce higher‑quality embeddings. Test multiple models to determine which one best fits your use case.
* Batch Your Requests: When processing multiple text inputs, send them in a single request instead of making separate API calls. This helps reduce latency and overall cost.
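* Cache Embeddings: For a given model, the same input text produces the same embedding. Store embeddings in a database or vector store, keyed by model and input text, so you don’t need to regenerate them repeatedly.
* Normalize for Comparison: When comparing embeddings, prefer cosine similarity to raw Euclidean distance. Cosine similarity is scale‑invariant: it compares the direction of two vectors and ignores their magnitude. If you normalize every vector to unit length first, cosine similarity reduces to a dot product and ranks results in the same order as Euclidean distance.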
* Consider Context Length: Each model has a maximum input size. Longer texts may need to be chunked or truncated. Review the model’s specifications before processing large documents.
* Use Meaningful Chunking: For long documents, split them into semantically meaningful units (such as paragraphs or sections) instead of relying on fixed character counts. This helps preserve context and coherence.
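
The caching advice above can be sketched with a small in-memory store. This is a minimal illustration, not a production design: `EmbeddingCache` is a hypothetical class, the `embed_fn` callback stands in for whatever function calls the `/embeddings` endpoint, and a real deployment would more likely persist entries in a database or vector store.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache keyed by model name plus a hash of the input text."""

    def __init__(self, embed_fn: Callable[[str, str], List[float]]):
        self._embed_fn = embed_fn  # called as embed_fn(model, text) on a miss
        self._store: Dict[str, List[float]] = {}

    @staticmethod
    def _key(model: str, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{model}:{digest}"

    def get(self, model: str, text: str) -> List[float]:
        key = self._key(model, text)
        if key not in self._store:
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]


calls = []

def fake_embed(model: str, text: str) -> List[float]:
    """Stand-in for a real API call; records each invocation."""
    calls.append(text)
    return [float(len(text)), 1.0]


cache = EmbeddingCache(fake_embed)
first = cache.get("example-model", "hello")   # miss: calls fake_embed
second = cache.get("example-model", "hello")  # hit: served from the cache
print(len(calls))  # prints 1
```

Including the model name in the key matters because different models produce different vectors for the same text.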

### Limitations

* No Streaming: Unlike chat completions, embeddings are returned as complete responses. Streaming is not supported.
* Token Limits: Each model has a maximum input length. Texts that exceed this limit will be truncated or rejected.
* Deterministic Output: For a given model, the same input text produces the same embedding; there is no temperature or other sampling parameter to adjust.
* Language Support: Some models are optimized for specific languages. Refer to the model’s documentation for details on language coverage.


