What it's used for

Together AI provides fast, cost-effective inference for popular open-source models through an OpenAI-compatible API, making it a drop-in replacement in any codebase already using the OpenAI SDK. It is one of the leading platforms for teams that want open-model performance without managing GPU infrastructure.

OpenAI-compatible inference — swap your base URL and run Llama 3, Mixtral, DeepSeek, and 100+ other open models with zero code changes
Fine-tuning — fine-tune open models on your own data using LoRA or full-parameter tuning, with built-in evaluation and deployment
Function calling — structured output and tool use with open models, matching the reliability of proprietary API providers
Embeddings — generate embeddings using open models (BGE, E5) at significantly lower cost than proprietary embedding APIs
Custom model hosting — upload and serve your own fine-tuned or custom models on dedicated GPU instances

AI engineers, startups, and enterprises use Together AI when they want the flexibility and cost advantages of open-source models with the reliability and convenience of a managed API. Its OpenAI compatibility means existing LangChain, LlamaIndex, and Vercel AI SDK integrations work immediately.

Together AI is particularly strong for teams that need to evaluate multiple open models quickly — you can switch between Llama 3 70B, Mixtral 8x22B, and DBRX in a single line of code without provisioning different infrastructure for each.

Getting started

Create an account — sign up at together.ai and copy your API key from the dashboard. New accounts receive free credits.

Use with OpenAI SDK — point the standard OpenAI client at Together's API:

from openai import OpenAI

client = OpenAI(
    api_key='YOUR_TOGETHER_API_KEY',
    base_url='https://api.together.xyz/v1'
)

response = client.chat.completions.create(
    model='meta-llama/Llama-3-70b-chat-hf',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)

Or use the Together SDK:

pip install together

import together
client = together.Together(api_key='YOUR_KEY')
response = client.chat.completions.create(
    model='meta-llama/Llama-3-70b-chat-hf',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)

Fine-tune a model — upload a JSONL dataset and start a fine-tuning job from the dashboard or via the API. Together handles GPU allocation and hyperparameter selection.

Pricing: Pay per token. Llama 3 8B: ~$0.20/M tokens. Llama 3 70B: ~$0.90/M tokens. Mixtral 8x7B: ~$0.60/M tokens. Fine-tuning priced per training token. Full pricing details.

Tip: Together's Turbo models offer 2-3x faster inference at the same price. Look for model names ending in -turbo in the model list. For cost optimization, use JSON mode to get structured outputs without wasting tokens on formatting instructions.

Together AI

What it's used for

Getting started

Commonly paired with

No case studies yet

AI leaders using Together AI

Vipul Prakash

Percy Liang

Related tools in General

Need a Together AI expert?