Weights & Biases

ML experiment tracking


What it's used for

Weights & Biases is used to track ML experiments, log metrics, compare model runs, and visualize training performance across your team. It automatically captures hyperparameters, loss curves, GPU utilization, and model artifacts so you can reproduce any experiment and understand what actually improved results.

Getting started

Install with `pip install wandb` and run `wandb login` to authenticate with your API key from wandb.ai/authorize. Add `wandb.init()` at the start of your training script and `wandb.log()` to record metrics. Free tier covers personal projects; team features require a paid plan.

$ pip install wandb
$ wandb login
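As a concrete illustration of the quickstart above, here is a toy training script. The `demo` project name, config values, and simulated loss are placeholders, and the `wandb` calls are skipped gracefully if the package isn't installed:

```python
import math

try:
    import wandb  # falls back to local-only metrics below if not installed
except ImportError:
    wandb = None

def train(epochs=5, lr=0.01):
    """Toy training loop that logs one metric per epoch."""
    run = None
    if wandb is not None:
        # mode="offline" writes logs locally; run `wandb sync` later to upload
        run = wandb.init(project="demo",
                         config={"lr": lr, "epochs": epochs},
                         mode="offline")
    losses = []
    for epoch in range(epochs):
        loss = math.exp(-lr * epoch)  # stand-in for a real training loss
        losses.append(loss)
        if run is not None:
            wandb.log({"epoch": epoch, "loss": loss})
    if run is not None:
        run.finish()
    return losses

losses = train()
```

With a real model, you would replace the simulated loss with your optimizer step and log whatever metrics matter (accuracy, learning rate, GPU memory) in the same `wandb.log()` call.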

Case studies

Real Weights & Biases projects


12-Person ML Team — 60% Fewer Duplicate Experiments

Series B fintech, ML platform team

Challenge

A 12-person ML team had no centralized experiment tracking. Engineers were unknowingly duplicating experiments, spending 30% of their compute budget re-running work that had already been done.

Solution

Set up W&B with automatic experiment logging, artifact versioning, and sweep configurations. Built a shared model registry with approval workflows and automated comparison dashboards for weekly model reviews.
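Artifact versioning of the kind described above might look like the following minimal sketch. The project and artifact names are hypothetical, and the `wandb` calls are skipped gracefully if the package is unavailable:

```python
try:
    import wandb  # skipped gracefully below if not installed
except ImportError:
    wandb = None

def log_model_checkpoint(path):
    """Version a model checkpoint file as a W&B artifact."""
    if wandb is None:
        return None
    # mode="offline" logs locally; run `wandb sync` later to upload
    run = wandb.init(project="demo", mode="offline")
    artifact = wandb.Artifact("model-checkpoint", type="model")
    artifact.add_file(path)     # attach the checkpoint file
    run.log_artifact(artifact)  # each log creates a new version (v0, v1, ...)
    run.finish()
    return artifact
```

Logged artifacts can then be pulled by name in downstream runs, which is what makes a shared model registry with approval workflows possible.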

Results

Duplicate experiments reduced 60%. Compute costs fell 28%. Model deployment frequency increased from twice per quarter to every two weeks. New engineers onboard and contribute meaningful experiments within their first week.


LoRA Config Sweeps: Finding Optimal Hyperparameters

AI startup preparing Series A

Challenge

A startup fine-tuning open-source LLMs was spending weeks manually tuning LoRA hyperparameters (rank, alpha, dropout, learning rate) with no systematic approach, missing optimal configurations.

Solution

Ran W&B Sweeps across 400+ fine-tuning experiments with Bayesian optimization. Each run logged training loss, eval perplexity, GPU utilization, and domain benchmark scores automatically for comparison.
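A sweep like the one described above is typically defined in a YAML file and launched with `wandb sweep sweep.yaml`, then executed by one or more `wandb agent <sweep-id>` processes. This is a sketch, not the startup's actual config; the metric and parameter names are assumptions that would need to match your training script:

```yaml
# sweep.yaml — hypothetical LoRA sweep; names must match your train script
method: bayes            # Bayesian optimization over the search space
metric:
  name: eval_perplexity  # assumed metric logged via wandb.log()
  goal: minimize
parameters:
  lora_rank:
    values: [4, 8, 16, 32, 64]
  lora_alpha:
    values: [8, 16, 32]
  lora_dropout:
    distribution: uniform
    min: 0.0
    max: 0.2
  learning_rate:
    distribution: log_uniform_values
    min: 1.0e-5
    max: 1.0e-3
```

Bayesian search uses earlier runs to pick promising configurations, which is why it can find a good LoRA setup in far fewer runs than grid search.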

Results

Optimal LoRA configuration found 8x faster than manual search. Final perplexity 18% lower than any manually tuned config. The sweep methodology became the startup's standard fine-tuning workflow.

Used Weights & Biases professionally?

Add your case study and get discovered by clients.

Submit a case study


Need a Weights & Biases expert?

Submit a brief and we'll match you with vetted specialists who have proven Weights & Biases experience.

Submit a brief — it's free