What it's used for

Weights & Biases (W&B) is a widely used platform for ML experiment tracking, with tools to log, visualize, and compare model training runs across your team. Each run records its configuration, logged metrics, and system metrics, and W&B can also save the git commit and environment details, which helps make experiments reproducible.

Key use cases include:

Experiment tracking: log hyperparameters, metrics, and loss curves from your training code; system stats (GPU, CPU, memory) are captured automatically
Run comparison: visually compare hundreds of experiments side by side
Model registry: version and stage models through development to production
Artifact management: track datasets, model checkpoints, and evaluation results
Sweeps: automated hyperparameter optimization with grid, random, or Bayesian search
LLM fine-tuning tracking: log prompts, completions, and eval metrics for LLM workflows

W&B is used by ML teams of all sizes (from solo researchers to OpenAI, NVIDIA, and Microsoft) that need to move beyond spreadsheets and ad-hoc experiment management. It has integrations for PyTorch, Hugging Face Transformers, Keras, and most other major training frameworks.

Getting started

Create a free account at wandb.ai.

Install and authenticate:

pip install wandb
wandb login
# Paste your API key from wandb.ai/authorize

Add tracking to your training script:

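import wandb

# Start a run; everything logged below is grouped under this project.
wandb.init(project='my-project')
for epoch in range(10):
    loss = train_one_epoch()  # placeholder: your own training step, returning the loss
    wandb.log({'loss': loss, 'epoch': epoch})
# Mark the run as finished and flush any pending data.
wandb.finish()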

View results in the dashboard at wandb.ai.
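
The tracking script above logs metrics only. As a minimal sketch of how hyperparameters could be recorded alongside them, the example below passes a config dictionary to wandb.init and uses the run as a context manager. The train_one_epoch function, the config keys, and the project name are hypothetical stand-ins, not part of W&B.

```python
from typing import Any, Dict


def train_one_epoch(epoch: int, lr: float) -> float:
    """Hypothetical stand-in for a real training step; returns a fake loss."""
    return 1.0 / (1.0 + epoch * lr * 100)


def track_run(config: Dict[str, Any], mode: str = "offline") -> None:
    """Log hyperparameters and per-epoch loss to W&B.

    mode="offline" writes the run to a local ./wandb directory without
    contacting the server; pass mode="online" to stream to wandb.ai.
    """
    import wandb  # imported lazily so train_one_epoch works without wandb

    # Passing config= stores the hyperparameters alongside the run.
    with wandb.init(project="my-project", config=config, mode=mode) as run:
        for epoch in range(config["epochs"]):
            loss = train_one_epoch(epoch, config["lr"])
            run.log({"loss": loss, "epoch": epoch})
    # Leaving the with-block finishes the run, so wandb.finish() is not needed.
```

Calling track_run({"lr": 0.01, "epochs": 10}) would create one offline run. Offline runs can be uploaded later with the wandb sync command.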

Pricing: Free for personal use (unlimited experiments). Teams plan starts at $50/user/month with collaboration features. Enterprise pricing is custom. See wandb.ai/pricing.

Case studies

Real Weights & Biases projects

Submitted by verified specialists

Fintech: 60% fewer duplicate experiments

12-Person ML Team — 60% Fewer Duplicate Experiments

Series B fintech, ML platform team

Challenge

A 12-person ML team had no centralized experiment tracking. Engineers were unknowingly duplicating experiments, spending 30% of the compute budget re-running work that had already been done.

Solution

Set up W&B with automatic experiment logging, artifact versioning, and sweep configurations. Built a shared model registry with approval workflows and automated comparison dashboards for weekly model reviews.

Results

Duplicate experiments fell by 60% and compute costs by 28%. Model deployment frequency rose from twice per quarter to once every two weeks. New engineers now onboard and contribute meaningful experiments within their first week.

Tools

Weights & Biases, AWS SageMaker, MLflow
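
The artifact versioning step in this case study could look roughly like the sketch below. It is not the team's actual setup: the artifact name, alias, metadata fields, and project name are assumptions chosen for illustration, and the registry approval workflow is not shown.

```python
from typing import Any, Dict


def checkpoint_metadata(epoch: int, val_loss: float) -> Dict[str, Any]:
    """Build the metadata stored with a checkpoint (hypothetical fields)."""
    return {"epoch": epoch, "val_loss": round(val_loss, 4)}


def log_checkpoint(path: str, epoch: int, val_loss: float) -> None:
    """Version a local checkpoint file as a W&B artifact."""
    import wandb  # lazy import: only needed when actually logging

    with wandb.init(project="my-project", job_type="train") as run:
        artifact = wandb.Artifact(
            name="model-checkpoint",  # hypothetical artifact name
            type="model",
            metadata=checkpoint_metadata(epoch, val_loss),
        )
        artifact.add_file(path)  # path must point to an existing file
        # Logging under the same name again creates a new version
        # when the file contents have changed.
        run.log_artifact(artifact, aliases=["candidate"])
```

Keeping the metadata builder separate from the logging call makes it easy to check what will be stored without starting a run.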

AI / LLM: 18% lower perplexity

LoRA Config Sweeps to Find Optimal Hyperparameters

AI startup preparing Series A

Challenge

A startup fine-tuning open-source LLMs was spending weeks manually tuning LoRA hyperparameters (rank, alpha, dropout, learning rate) with no systematic approach, missing optimal configurations.

Solution

Ran W&B Sweeps across 400+ fine-tuning experiments with Bayesian optimization. Each run logged training loss, eval perplexity, GPU utilization, and domain benchmark scores automatically for comparison.

Results

The best LoRA configuration was found 8x faster than with manual search. Final perplexity was 18% lower than that of any manually tuned config. The sweep methodology became the startup's standard fine-tuning workflow.

Tools

Weights & Biases, Hugging Face, Modal
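
A Bayesian sweep over the four LoRA hyperparameters named in this case study might be configured along the lines of the sketch below. The value ranges, metric name, and project name are illustrative assumptions, not the startup's actual settings.

```python
from typing import Any, Callable, Dict


def build_sweep_config() -> Dict[str, Any]:
    """Bayesian sweep over LoRA hyperparameters (ranges are illustrative)."""
    return {
        "method": "bayes",
        # The training function must log a metric with this exact name.
        "metric": {"name": "eval_perplexity", "goal": "minimize"},
        "parameters": {
            "lora_rank": {"values": [4, 8, 16, 32, 64]},
            "lora_alpha": {"values": [8, 16, 32, 64]},
            "lora_dropout": {"distribution": "uniform", "min": 0.0, "max": 0.2},
            "learning_rate": {
                "distribution": "log_uniform_values",
                "min": 1e-5,
                "max": 1e-3,
            },
        },
    }


def launch_sweep(train_fn: Callable[[], None], count: int = 20) -> str:
    """Register the sweep and run `count` trials in this process."""
    import wandb  # lazy import: only needed when actually launching

    sweep_id = wandb.sweep(sweep=build_sweep_config(), project="lora-sweeps")
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Here train_fn is expected to call wandb.init(), read the chosen values from wandb.config, and log eval_perplexity so the optimizer has a signal to minimize.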
