Weights & Biases

ML experiment tracking

What it's used for

Weights & Biases (W&B) is a widely used platform for ML experiment tracking, providing tools to log, visualize, and compare model training runs across your team. Each run records its configuration, metrics, and system stats, which makes experiments easier to reproduce.

Key use cases include:

  • Experiment tracking — log hyperparameters, metrics, and loss curves from your training code; system stats are captured automatically
  • Run comparison — visually compare hundreds of experiments side by side
  • Model registry — version and stage models through development to production
  • Artifact management — track datasets, model checkpoints, and evaluation results
  • Sweeps — automated hyperparameter optimization with grid, random, or Bayesian search
  • LLM fine-tuning tracking — log prompts, completions, and eval metrics for LLM workflows

W&B is used by ML teams of all sizes (from solo researchers to OpenAI, NVIDIA, and Microsoft) who need to move beyond spreadsheets and ad-hoc experiment management. It integrates with PyTorch, Hugging Face Transformers, Keras, and other major training frameworks.

Getting started

  1. Create a free account at wandb.ai.
  2. Install and authenticate:
    pip install wandb
    wandb login
    # Paste your API key from wandb.ai/authorize
  3. Add tracking to your training script:
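    import wandb

    # Start a run; it appears under 'my-project' in the dashboard
    wandb.init(project='my-project')
    for epoch in range(10):
        # train_one_epoch() is a placeholder for your own training step
        loss = train_one_epoch()
        wandb.log({'loss': loss, 'epoch': epoch})
    # Flush pending data and mark the run as finished
    wandb.finish()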
  4. View results in the dashboard at wandb.ai.
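The steps above log metrics only. A slightly fuller sketch, shown here as one possible layout rather than an official template, also passes hyperparameters through the `config` argument of `wandb.init` so they are stored with the run. The `train_one_epoch` function and its synthetic loss are hypothetical stand-ins, and the `config` keys are illustrative.

```python
def train_one_epoch(epoch: int, learning_rate: float) -> float:
    """Hypothetical stand-in for a real training step; returns a synthetic, decreasing loss."""
    return 1.0 / (1.0 + 100.0 * learning_rate * epoch)


def train(config: dict, log) -> list:
    """Run the training loop and hand each epoch's metrics to `log` (for example, run.log)."""
    history = []
    for epoch in range(config["epochs"]):
        loss = train_one_epoch(epoch, config["learning_rate"])
        metrics = {"loss": loss, "epoch": epoch}
        log(metrics)
        history.append(metrics)
    return history


def main() -> None:
    # Imported lazily so `train` can be exercised without wandb installed.
    import wandb

    config = {"epochs": 10, "learning_rate": 0.01}  # illustrative values
    # Passing `config` stores the hyperparameters alongside the run.
    run = wandb.init(project="my-project", config=config)
    train(config, log=run.log)
    run.finish()
```

Calling `main()` after `wandb login` starts a run and streams the metrics. Keeping the loop separate from the logger means it can also be run with any other callable, such as `print`, when no account is available.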

Pricing: Free for personal use (unlimited experiments). Teams plan starts at $50/user/month with collaboration features. Enterprise pricing is custom. See wandb.ai/pricing.

Case studies

Real Weights & Biases projects

12-Person ML Team — 60% Fewer Duplicate Experiments

Series B fintech, ML platform team

Challenge

A 12-person ML team had no centralized experiment tracking. Engineers were duplicating experiments unknowingly, spending 30% of compute budget re-running work that had already been done.

Solution

Set up W&B with automatic experiment logging, artifact versioning, and sweep configurations. Built a shared model registry with approval workflows and automated comparison dashboards for weekly model reviews.

Results

Duplicate experiments fell by 60%. Compute costs fell 28%. Model deployment frequency increased from twice per quarter to every two weeks. New engineers now onboard and contribute meaningful experiments within their first week.
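The case study does not say how duplicate experiments were detected. One plausible approach, sketched here purely as an assumption, is to store a stable fingerprint of each run's hyperparameters in its config and query the project for that fingerprint before launching new work. The helper names, the `fingerprint` config key, and the project path format are hypothetical. The lookup assumes the W&B public API (`wandb.Api().runs` with a `filters` argument) accepts a filter on a config key as written.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a short hash of a hyperparameter dict that does not depend on key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def find_duplicate_runs(project_path: str, fingerprint: str) -> list:
    """Return names of existing runs whose config carries the same fingerprint.

    Assumes each run was started with the fingerprint stored under the
    hypothetical config key "fingerprint", and that `project_path` has the
    form "entity/project".
    """
    # Imported lazily so the hashing helper works without wandb installed.
    import wandb

    api = wandb.Api()
    runs = api.runs(project_path, filters={"config.fingerprint": fingerprint})
    return [run.name for run in runs]
```

A launcher script could compute the fingerprint, call `find_duplicate_runs`, and ask for confirmation if the list is non-empty. Hashing the sorted JSON form means two engineers who write the same settings in a different order still get the same fingerprint.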

LoRA Config Sweeps Finding Optimal Hyperparams

AI startup preparing Series A

Challenge

A startup fine-tuning open-source LLMs was spending weeks manually tuning LoRA hyperparameters (rank, alpha, dropout, learning rate) with no systematic approach, missing optimal configurations.

Solution

Ran W&B Sweeps across 400+ fine-tuning experiments with Bayesian optimization. Each run logged training loss, eval perplexity, GPU utilization, and domain benchmark scores automatically for comparison.
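A sweep like the one described can be sketched as a configuration dictionary plus two calls, `wandb.sweep` and `wandb.agent`. The parameter names, value ranges, and the metric name `eval/perplexity` below are illustrative assumptions, not the startup's actual settings.

```python
def build_lora_sweep_config() -> dict:
    """Build a Bayesian sweep over LoRA hyperparameters (all ranges are illustrative)."""
    return {
        "method": "bayes",
        # The training function must log a metric with exactly this name.
        "metric": {"name": "eval/perplexity", "goal": "minimize"},
        "parameters": {
            "lora_rank": {"values": [4, 8, 16, 32, 64]},
            "lora_alpha": {"values": [8, 16, 32, 64]},
            "lora_dropout": {"distribution": "uniform", "min": 0.0, "max": 0.2},
            "learning_rate": {
                "distribution": "log_uniform_values",
                "min": 1e-5,
                "max": 1e-3,
            },
        },
    }


def launch_sweep(project: str, train_fn, count: int) -> str:
    """Register the sweep and run `count` trials of `train_fn` in this process."""
    # Imported lazily so the config can be built and inspected without wandb installed.
    import wandb

    sweep_id = wandb.sweep(sweep=build_lora_sweep_config(), project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Here `train_fn` is a hypothetical user-supplied function that calls `wandb.init()`, reads the chosen values from `wandb.config`, fine-tunes the model, and logs `eval/perplexity`. Sampling the learning rate on a log scale is a common choice because useful values tend to span several orders of magnitude.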

Results

Optimal LoRA configuration found 8x faster than manual search. Final perplexity 18% lower than any manually tuned config. The sweep methodology became the startup's standard fine-tuning workflow.
