What it's used for

Modal is a serverless GPU cloud that lets you run Python functions on cloud GPUs without managing any infrastructure. You define your environment and dependencies entirely in code, and Modal handles container building, GPU scheduling, auto-scaling, and cold-start optimization behind the scenes.

- Serverless GPU functions: decorate any Python function with @app.function(gpu='A100') and it runs on a cloud GPU with no infrastructure to manage
- Fine-tuning jobs: run model fine-tuning with datasets and checkpoints stored on mounted volumes
- Batch inference: process thousands of inputs in parallel with automatic scaling, paying only for active compute time
- Model serving: deploy models as web endpoints with @app.cls; containers can stay warm and autoscale with traffic
- Scheduled jobs: run recurring ML pipelines (retraining, embedding updates) on a cron schedule
- Custom containers: define Docker-like environments in Python with modal.Image, with no Dockerfile needed
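
To make the batch and scheduled use cases concrete, here is a minimal sketch. The app name, function names, sample inputs, and cron string are all hypothetical, and word_count is only a stand-in for a real inference step.

```python
import modal

app = modal.App("usage-sketch")  # hypothetical app name


@app.function()
def word_count(text: str) -> int:
    # Stand-in for a real per-input inference step.
    return len(text.split())


# Example schedule only: 06:00 UTC every day.
@app.function(schedule=modal.Cron("0 6 * * *"))
def nightly_refresh():
    # Placeholder body for a recurring pipeline such as an embeddings update.
    print("refreshing embeddings")


@app.local_entrypoint()
def main():
    texts = ["first document", "a second, longer document"]
    # .map fans the inputs out across containers; results come back in input order.
    for text, count in zip(texts, word_count.map(texts)):
        print(count, text)
```

Running the file with modal run executes main locally and word_count remotely. The schedule only takes effect once the app is deployed with modal deploy.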

ML engineers, indie hackers, and AI startups love Modal for its developer experience — the feedback loop from local code to running on an H100 is seconds, not hours. It eliminates the DevOps burden of Kubernetes, Docker registries, and GPU orchestration.

Modal is particularly strong for bursty workloads that do not justify reserved GPU instances. You pay per second of compute with no idle costs, which can make it considerably cheaper than always-on instances for workloads that run only intermittently.

Getting started

Install the Modal package:
```
pip install modal
```
Authenticate — run the setup command which opens your browser for login:
```
modal setup
```

Write a GPU function — create a Python file (e.g., app.py):

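```
import modal

app = modal.App('my-ml-app')

# Container image defined in Python: Debian slim plus the ML dependencies.
image = modal.Image.debian_slim().pip_install('torch', 'transformers')

@app.function(gpu='A100', image=image)
def generate(prompt: str):
    # Imported here because transformers is installed only in the remote image.
    from transformers import pipeline

    # Note: meta-llama/Llama-2-7b-hf is a gated model, so the container needs a
    # Hugging Face access token (for example via a Modal Secret) to download it.
    pipe = pipeline('text-generation', model='meta-llama/Llama-2-7b-hf', device=0)  # device=0: first GPU
    return pipe(prompt, max_new_tokens=100)
```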

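Run it: invoke the function from your terminal and it executes on a cloud GPU. Because generate takes a prompt argument, pass it as a flag:
```
modal run app.py --prompt "Hello, world"
```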
Deploy as an endpoint — add a web endpoint decorator and deploy:
```
modal deploy app.py
```
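
The web endpoint decorator mentioned above is not shown in the example file, so here is a minimal sketch of what it might look like. It assumes a recent Modal release where the decorator is named modal.fastapi_endpoint (older releases call it modal.web_endpoint). The app name, function name, and echo response are hypothetical placeholders for a real model call.

```python
import modal

app = modal.App("my-ml-endpoint")  # hypothetical app name

# The endpoint is served through FastAPI, so the image needs it installed.
image = modal.Image.debian_slim().pip_install("fastapi[standard]")


@app.function(image=image)
@modal.fastapi_endpoint(method="POST")  # assumption: older releases use modal.web_endpoint
def generate_api(body: dict) -> dict:
    # The JSON request body arrives as a dict; a real handler would run the model here.
    prompt = body.get("prompt", "")
    return {"prompt": prompt, "completion": "(model output would go here)"}
```

After modal deploy, the CLI prints the public URL for the endpoint, which accepts a JSON POST body such as {"prompt": "Hello"}.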

Pricing: Pay per second of GPU time. An A100 40GB costs about $1.10/hr and an H100 about $3.95/hr. The free tier includes $30/month in compute credits, enough for significant experimentation. There is no charge for idle time. Rates change, so check Modal's pricing page for current figures.
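
To see what per-second billing means in practice, the following sketch works through the arithmetic using the approximate rates quoted above. The workload (200 jobs of 90 seconds each per month) is hypothetical, and the always-on comparison assumes the same hourly rate, which a reserved instance elsewhere may not match.

```python
SECONDS_PER_HOUR = 3600

# Approximate hourly rates quoted above, in USD; illustrative, not authoritative.
HOURLY_RATE_USD = {"A100-40GB": 1.10, "H100": 3.95}
FREE_CREDIT_USD = 30.0  # monthly free-tier credit quoted above


def job_cost(gpu: str, seconds: float) -> float:
    """Cost in USD of running one GPU for `seconds`, billed per second."""
    return HOURLY_RATE_USD[gpu] / SECONDS_PER_HOUR * seconds


def free_tier_hours(gpu: str) -> float:
    """GPU hours that the monthly free credit covers at the quoted rate."""
    return FREE_CREDIT_USD / HOURLY_RATE_USD[gpu]


if __name__ == "__main__":
    # Hypothetical bursty workload: 200 jobs of 90 seconds each per month.
    burst = job_cost("A100-40GB", 200 * 90)
    # Assumption: the same GPU left running for a 30-day month at the same rate.
    always_on = job_cost("A100-40GB", 30 * 24 * SECONDS_PER_HOUR)
    print(f"bursty: ${burst:.2f}  always-on: ${always_on:.2f}")
    for gpu in HOURLY_RATE_USD:
        print(f"{gpu}: free credit covers about {free_tier_hours(gpu):.1f} hours")
```

Under these assumptions the bursty workload uses 5 GPU hours, about $5.50, against roughly $792 for an A100 kept on all month.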

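Tip: Use modal.Volume to persist model weights and datasets across function calls; this avoids re-downloading large models every time a container cold-starts. For endpoints that need near-instant responses, keep at least one container warm with @app.cls(keep_warm=1) (recent Modal releases name this parameter min_containers). Note that a container kept warm is billed while it waits.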
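
A minimal sketch of the Volume tip follows. The app name, volume name, mount path, and class name are hypothetical, and gpt2 is used only as a small, ungated stand-in model. The keep-warm setting is left out because its parameter name differs between Modal versions.

```python
import modal

app = modal.App("cached-model-sketch")  # hypothetical app name

image = (
    modal.Image.debian_slim()
    .pip_install("torch", "transformers")
    .env({"HF_HOME": "/cache"})  # point the Hugging Face cache at the mounted volume
)

# Hypothetical volume name; created on first use if it does not exist yet.
weights = modal.Volume.from_name("model-cache", create_if_missing=True)


@app.cls(gpu="A100", image=image, volumes={"/cache": weights})
class Generator:
    @modal.enter()
    def load(self):
        # Runs once per container start; later starts find the weights already cached.
        from transformers import pipeline

        self.pipe = pipeline("text-generation", model="gpt2", device=0)
        weights.commit()  # persist newly downloaded files to the volume

    @modal.method()
    def generate(self, prompt: str):
        return self.pipe(prompt, max_new_tokens=50)
```

From a local entrypoint, the method would be called as Generator().generate.remote("Hello"). The first container downloads the weights into /cache, and subsequent cold starts read them from the volume instead.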
