Small but capable SLMs
Phi models are small but surprisingly capable language models built for edge devices, laptops, and cost-constrained environments where larger models are impractical. Ranging from 1.5B to 14B parameters, they are commonly used for on-device AI, mobile applications, and scenarios requiring low-latency local inference.
Download Phi models from Hugging Face (microsoft/phi-*) and run them locally with Ollama, llama.cpp, or the Hugging Face Transformers library. For cloud deployment, Phi models are available as serverless endpoints on Azure AI Studio. Local use requires no API key; the Azure endpoints require an Azure subscription.
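As an illustration, here is a minimal sketch of local inference with the Transformers library, assuming microsoft/phi-2 as a representative checkpoint (any other microsoft/phi-* model works the same way):

```python
# Minimal sketch: run a Phi model locally with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint; swap in any other microsoft/phi-* model ID.
model_id = "microsoft/phi-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short completion locally.
prompt = "Explain why small language models suit edge devices:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same model can also be pulled and run from the command line with Ollama or llama.cpp if you prefer a quantized, lower-memory setup.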