Weights & Biases

platform active freemium

An ML experiment tracking and model management platform that has expanded to support LLM development. Weights & Biases (W&B) provides experiment logging, hyperparameter tracking, artifact versioning, and collaborative dashboards. Its Prompts and Weave products add LLM-specific tracing and evaluation.

Implements

Concepts this tool claims to implement:

  • Benchmark primary

    W&B Weave for LLM application tracing and evaluation. Track prompts, completions, and evaluation metrics. Compare runs across experiments.

  • Fine-Tuning secondary

    Experiment tracking for fine-tuning runs. Log training metrics, hyperparameters, and model checkpoints. Compare fine-tuning experiments.

  • Training Data secondary

    Dataset versioning and lineage tracking with W&B Artifacts. Track which data was used for which training runs.
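
The fine-tuning tracking described above can be sketched roughly as follows. This is a minimal illustration, not a reference integration: the project name "finetune-demo", the helper names, and the simulated loss curve are hypothetical, and the run uses the SDK's "disabled" mode so nothing is sent to a server. The sketch falls back to collecting metrics locally if the wandb package is not installed.

```python
import math

try:
    import wandb  # pip install wandb
except ImportError:  # keep the sketch runnable without the package
    wandb = None


def simulated_loss(step: int, learning_rate: float) -> float:
    """Stand-in for a real training step (hypothetical): loss decays with step."""
    return math.exp(-learning_rate * step)


def track_finetuning_run(config: dict, steps: int = 5, mode: str = "disabled") -> list:
    """Record hyperparameters and per-step loss; return the rows that were logged."""
    # config holds the hyperparameters; mode="disabled" makes logging a no-op.
    # Switch to "online" or "offline" to actually record the run.
    run = None
    if wandb is not None:
        run = wandb.init(project="finetune-demo", config=config, mode=mode)

    history = []
    for step in range(steps):
        row = {"step": step, "train/loss": simulated_loss(step, config["learning_rate"])}
        history.append(row)
        if run is not None:
            run.log(row)  # one call per step, metrics passed as a dict

    if run is not None:
        run.finish()  # close the run so it is marked complete
    return history


history = track_finetuning_run({"learning_rate": 0.5, "epochs": 1})
for row in history:
    print(row)
```

Running the same function with different config values would produce separate runs, which is what the dashboard's run comparison operates on.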

Integration Surfaces

  • Python SDK (wandb)
  • JavaScript SDK
  • REST API
  • LangChain integration
  • Hugging Face integration
  • Web dashboard

Details

Vendor
Weights & Biases Inc.
License
MIT (client) / Proprietary (cloud)
Runs On
cloud, local
Used By
human, system

Notes

W&B is among the most widely used experiment tracking platforms in ML. Its LLM tools (Weave, Prompts) are newer but build on the existing tracking infrastructure. It has a strong community and broad integrations, and a self-hosted option is available for enterprise customers. A good choice for teams already using W&B for traditional ML.