Embedding


Also known as: Vector Embedding, Text Embedding, Dense Vector

Definition

A dense numerical vector representation of content (text, images, audio) in which similar items have similar vectors. Embeddings compress semantic meaning into a fixed-size array of floats (typically 384-3072 dimensions) that can be compared mathematically. They are the foundation of vector search: without embeddings, there is nothing to search over.
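The mathematical comparison is usually cosine similarity: the dot product of two vectors divided by the product of their magnitudes. A minimal sketch, using hand-made toy 4-dimensional vectors in place of real model output:

```python
import math

def cosine_similarity(a, b):
    # Dot product over the product of magnitudes:
    # 1.0 = same direction (similar meaning), near 0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional embeddings; real models emit 384-3072 dimensions.
cat = [0.9, 0.8, 0.1, 0.0]
kitten = [0.85, 0.75, 0.15, 0.05]
invoice = [0.0, 0.1, 0.9, 0.8]

print(cosine_similarity(cat, kitten))   # high: related meanings
print(cosine_similarity(cat, invoice))  # low: unrelated meanings
```

The specific values here are illustrative; the point is that semantically related items end up with nearly parallel vectors, while unrelated items do not.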

What this is NOT

  • Not the embedding model (the model produces embeddings; an embedding is a single vector)
  • Not a sparse vector (embeddings are dense; TF-IDF vectors are sparse)
  • Not the text itself (embeddings are numerical representations of text)

Alternative Interpretations

Different communities use this term differently:

llm-practitioners

The output of an embedding model (OpenAI text-embedding-3, Cohere embed, sentence-transformers) that converts text into a vector for storage in a vector database and similarity comparison.
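The store-then-compare workflow can be sketched with an in-memory stand-in for a vector database. The vectors below are hand-made placeholders for what an embedding model's API would return; a real system would call a model and a database instead:

```python
import math

# Minimal in-memory "vector store" for illustration only.
store = []  # list of (text, vector) pairs

def add(text, vector):
    store.append((text, vector))

def search(query_vector, top_k=2):
    # Rank stored items by cosine similarity to the query vector.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    ranked = sorted(store, key=lambda item: cos(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Pretend these vectors came from an embedding model.
add("refund policy", [0.9, 0.1, 0.0])
add("shipping times", [0.1, 0.9, 0.1])
add("return an item", [0.8, 0.2, 0.1])

# A query vector close to the "refunds/returns" region of the space
# retrieves the two related documents, not the shipping one.
print(search([0.85, 0.15, 0.05], top_k=2))
```

Production vector databases add indexing (e.g. approximate nearest neighbor search) so this ranking does not require scanning every stored vector, but the comparison itself is the same.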

Sources: OpenAI Embeddings API documentation, Sentence-Transformers documentation, MTEB benchmark

deep-learning

A learned representation where items are mapped to points in a continuous vector space such that geometric relationships reflect semantic relationships. Word2Vec and GloVe pioneered this for words; modern models extend to sentences and documents.
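The classic illustration of geometry reflecting semantics is word-vector arithmetic (e.g. king - man + woman landing near queen, as popularized by Word2Vec). A sketch with hand-crafted 2-dimensional vectors, where axis 0 encodes gender and axis 1 encodes royalty; real models learn hundreds of dimensions from data rather than having them assigned:

```python
import math

# Hand-crafted toy space: axis 0 = gender, axis 1 = royalty.
vocab = {
    "man":   [1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king":  [1.0, 1.0],
    "queen": [-1.0, 1.0],
}

def nearest(query, vocab, exclude=()):
    # Return the vocabulary word whose vector is most similar to the query.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    return max((w for w in vocab if w not in exclude),
               key=lambda w: cos(query, vocab[w]))

# king - man + woman lands at queen's position.
analogy = [k - m + w for k, m, w in zip(vocab["king"],
                                        vocab["man"],
                                        vocab["woman"])]
print(nearest(analogy, vocab, exclude=("king", "man", "woman")))  # queen
```

In learned spaces the relationship is approximate rather than exact, which is why analogy queries conventionally exclude the input words and take the nearest remaining neighbor.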

Sources: Word2Vec paper (Mikolov et al., 2013), BERT and transformer-based embedding models

Examples

  • OpenAI's text-embedding-3-small producing a 1536-dimensional vector
  • Embedding a product description for a recommendation system
  • Converting code snippets to vectors for code search
  • Multimodal embeddings that represent both text and images

Counterexamples

Things that might seem like embeddings but are not:

  • TF-IDF vectors (sparse, not learned)
  • One-hot encodings (not semantic)
  • Raw text strings
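The one-hot counterexample is easy to demonstrate: every distinct one-hot vector is orthogonal to every other, so cosine similarity is 0 for all pairs, no matter how related the words are. A short check with an illustrative 4-word vocabulary:

```python
import math
from itertools import combinations

def one_hot(index, size):
    # A vector that is all zeros except for a single 1.
    v = [0.0] * size
    v[index] = 1.0
    return v

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

words = ["cat", "kitten", "dog", "invoice"]
vectors = {w: one_hot(i, len(words)) for i, w in enumerate(words)}

# "cat" and "kitten" are no more similar than "cat" and "invoice":
# every distinct pair has similarity 0.0, so nothing semantic is encoded.
for w1, w2 in combinations(words, 2):
    print(w1, w2, cos(vectors[w1], vectors[w2]))
```

This is exactly what a learned dense embedding fixes: related words get vectors that point in similar directions instead of being mutually orthogonal.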

Relations

Implementations

Tools and frameworks that implement this concept: