Together AI

Together AI

Fast open-model inference API

General Infrastructure

What it's used for

Together AI provides fast, cost-effective inference for popular open-source models through an OpenAI-compatible API, making it a drop-in replacement in any codebase already using the OpenAI SDK. It is one of the leading platforms for teams that want open-model performance without managing GPU infrastructure.

  • OpenAI-compatible inference — swap your base URL and run Llama 3, Mixtral, DeepSeek, and 100+ other open models with zero code changes
  • Fine-tuning — fine-tune open models on your own data using LoRA or full-parameter tuning, with built-in evaluation and deployment
  • Function calling — structured output and tool use with open models, matching the reliability of proprietary API providers
  • Embeddings — generate embeddings using open models (BGE, E5) at significantly lower cost than proprietary embedding APIs
  • Custom model hosting — upload and serve your own fine-tuned or custom models on dedicated GPU instances

AI engineers, startups, and enterprises use Together AI when they want the flexibility and cost advantages of open-source models with the reliability and convenience of a managed API. Its OpenAI compatibility means existing LangChain, LlamaIndex, and Vercel AI SDK integrations work immediately.

Together AI is particularly strong for teams that need to evaluate multiple open models quickly — you can switch between Llama 3 70B, Mixtral 8x22B, and DBRX in a single line of code without provisioning different infrastructure for each.

Getting started

  1. Create an account — sign up at together.ai and copy your API key from the dashboard. New accounts receive free credits.
  2. Use with OpenAI SDK — point the standard OpenAI client at Together's API:
    from openai import OpenAI
    
    client = OpenAI(
        api_key='YOUR_TOGETHER_API_KEY',
        base_url='https://api.together.xyz/v1'
    )
    
    response = client.chat.completions.create(
        model='meta-llama/Llama-3-70b-chat-hf',
        messages=[{'role': 'user', 'content': 'Hello!'}]
    )
  3. Or use the Together SDK:
    pip install together
    import together
    client = together.Together(api_key='YOUR_KEY')
    response = client.chat.completions.create(
        model='meta-llama/Llama-3-70b-chat-hf',
        messages=[{'role': 'user', 'content': 'Hello!'}]
    )
  4. Fine-tune a model — upload a JSONL dataset and start a fine-tuning job from the dashboard or via the API. Together handles GPU allocation and hyperparameter selection.

Pricing: Pay per token. Llama 3 8B: ~$0.20/M tokens. Llama 3 70B: ~$0.90/M tokens. Mixtral 8x7B: ~$0.60/M tokens. Fine-tuning priced per training token. Full pricing details.

Tip: Together's Turbo models offer 2-3x faster inference at the same price. Look for model names ending in -turbo in the model list. For cost optimization, use JSON mode to get structured outputs without wasting tokens on formatting instructions.

No case studies yet

Be the first to share a Together AI case study and get discovered by clients.

Submit a case study

Thought leaders

AI leaders using Together AI

Follow for insights, tutorials, and thought leadership

Related tools in General

Need a Together AI expert?

Submit a brief and we'll match you with vetted specialists who have proven Together AI experience.

Submit a brief — it's free