Embedding


Also known as: Vector Embedding, Text Embedding, Dense Vector

Definition

A dense numerical vector representation of content (text, images, audio) in which similar items have similar vectors. Embeddings compress semantic meaning into a fixed-size array of floats (typically 384-3072 dimensions) that can be compared mathematically. They are the foundation of vector search: without embeddings, there is nothing to search over.
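The mathematical comparison is usually cosine similarity: the dot product of two vectors divided by the product of their magnitudes. A minimal sketch, using hand-made toy 4-dimensional vectors in place of real model output:

```python
import math

def cosine_similarity(a, b):
    # Dot product over the product of magnitudes:
    # 1.0 = same direction (similar meaning), near 0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional embeddings; real models emit 384-3072 dimensions.
cat = [0.9, 0.8, 0.1, 0.0]
kitten = [0.85, 0.75, 0.15, 0.05]
invoice = [0.0, 0.1, 0.9, 0.8]

print(cosine_similarity(cat, kitten))   # high: related meanings
print(cosine_similarity(cat, invoice))  # low: unrelated meanings
```

The specific values here are illustrative; the point is that semantically related items end up with nearly parallel vectors, while unrelated items do not.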

What this is NOT

  • Not the embedding model (the model produces embeddings; an embedding is a single vector)
  • Not a sparse vector (embeddings are dense; TF-IDF vectors are sparse)
  • Not the text itself (embeddings are numerical representations of text)

Alternative Interpretations

Different communities use this term differently:

llm-practitioners

The output of an embedding model (OpenAI text-embedding-3, Cohere embed, sentence-transformers) that converts text into a vector for storage in a vector database and similarity comparison.
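The store-then-compare workflow can be sketched with an in-memory stand-in for a vector database. The vectors below are hand-made placeholders for what an embedding model's API would return; a real system would call a model and a database instead:

```python
import math

# Minimal in-memory "vector store" for illustration only.
store = []  # list of (text, vector) pairs

def add(text, vector):
    store.append((text, vector))

def search(query_vector, top_k=2):
    # Rank stored items by cosine similarity to the query vector.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    ranked = sorted(store, key=lambda item: cos(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Pretend these vectors came from an embedding model.
add("refund policy", [0.9, 0.1, 0.0])
add("shipping times", [0.1, 0.9, 0.1])
add("return an item", [0.8, 0.2, 0.1])

# A query vector close to the "refunds/returns" region of the space
# retrieves the two related documents, not the shipping one.
print(search([0.85, 0.15, 0.05], top_k=2))
```

Production vector databases add indexing (e.g. approximate nearest neighbor search) so this ranking does not require scanning every stored vector, but the comparison itself is the same.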

Sources: OpenAI Embeddings API documentation, Sentence-Transformers documentation, MTEB benchmark

deep-learning

A learned representation where items are mapped to points in a continuous vector space such that geometric relationships reflect semantic relationships. Word2Vec and GloVe pioneered this for words; modern models extend to sentences and documents.
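The classic illustration of geometry reflecting semantics is word-vector arithmetic (e.g. king - man + woman landing near queen, as popularized by Word2Vec). A sketch with hand-crafted 2-dimensional vectors, where axis 0 encodes gender and axis 1 encodes royalty; real models learn hundreds of dimensions from data rather than having them assigned:

```python
import math

# Hand-crafted toy space: axis 0 = gender, axis 1 = royalty.
vocab = {
    "man":   [1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king":  [1.0, 1.0],
    "queen": [-1.0, 1.0],
}

def nearest(query, vocab, exclude=()):
    # Return the vocabulary word whose vector is most similar to the query.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    return max((w for w in vocab if w not in exclude),
               key=lambda w: cos(query, vocab[w]))

# king - man + woman lands at queen's position.
analogy = [k - m + w for k, m, w in zip(vocab["king"],
                                        vocab["man"],
                                        vocab["woman"])]
print(nearest(analogy, vocab, exclude=("king", "man", "woman")))  # queen
```

In learned spaces the relationship is approximate rather than exact, which is why analogy queries conventionally exclude the input words and take the nearest remaining neighbor.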

Sources: Word2Vec paper (Mikolov et al., 2013), BERT and transformer-based embedding models

Examples

  • OpenAI's text-embedding-3-small producing a 1536-dimensional vector
  • Embedding a product description for a recommendation system
  • Converting code snippets to vectors for code search
  • Multimodal embeddings that represent both text and images

Counterexamples

Things that might seem like embeddings but are not:

  • TF-IDF vectors (sparse, not learned)
  • One-hot encodings (not semantic)
  • Raw text strings
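The one-hot counterexample is easy to demonstrate: every distinct one-hot vector is orthogonal to every other, so cosine similarity is 0 for all pairs, no matter how related the words are. A short check with an illustrative 4-word vocabulary:

```python
import math
from itertools import combinations

def one_hot(index, size):
    # A vector that is all zeros except for a single 1.
    v = [0.0] * size
    v[index] = 1.0
    return v

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

words = ["cat", "kitten", "dog", "invoice"]
vectors = {w: one_hot(i, len(words)) for i, w in enumerate(words)}

# "cat" and "kitten" are no more similar than "cat" and "invoice":
# every distinct pair has similarity 0.0, so nothing semantic is encoded.
for w1, w2 in combinations(words, 2):
    print(w1, w2, cos(vectors[w1], vectors[w2]))
```

This is exactly what a learned dense embedding fixes: related words get vectors that point in similar directions instead of being mutually orthogonal.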

Relations

Implementations

Tools and frameworks that implement this concept: