What it's used for

Google Vertex AI is Google Cloud's unified MLOps platform for building, deploying, and managing ML models at scale. It serves a dual purpose: managing custom model lifecycles and providing direct API access to Google's Gemini foundation models for fine-tuning and serving.

Custom model training — run distributed training jobs on TPUs or GPUs with AutoML or custom containers
Model Garden — browse and deploy 150+ foundation and open-source models (Gemini, Llama, Mistral, Stable Diffusion) with one click
Vertex AI Pipelines — orchestrate ML workflows using Kubeflow Pipelines or TFX, with built-in metadata tracking
Online/batch prediction — deploy models as auto-scaling endpoints or run batch predictions against BigQuery and GCS data
Feature Store — manage and serve ML features with low-latency lookups for real-time inference
Gemini fine-tuning — customize Gemini models on your own data with supervised tuning or RLHF directly through the console or API

Data scientists and ML engineers on GCP use Vertex AI for its deep integration with BigQuery for data access, Cloud Storage for artifacts, and Dataflow for preprocessing. It is a particularly good fit for teams that want to run custom models alongside Google's foundation models on one platform.

Vertex AI also provides Model Monitoring for detecting data drift and skew, Explainable AI for feature attributions, and Vertex AI Search (formerly Enterprise Search) for building grounded, RAG-based applications.
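
As a rough illustration of the online prediction capability listed above, the sketch below sends instances to an endpoint that already has a model deployed. The helper name `predict_online`, the endpoint ID, and the feature names are placeholders rather than anything defined in this guide, and the instance schema depends entirely on the deployed model.

```python
from typing import Any, Sequence


def predict_online(endpoint: Any, instances: Sequence[dict]) -> list:
    """Send instances to a deployed endpoint and return its predictions.

    `endpoint` is expected to behave like `aiplatform.Endpoint`: calling
    `predict(instances=...)` returns an object with a `predictions` list.
    """
    if not instances:
        return []  # nothing to send, so skip the network round trip
    response = endpoint.predict(instances=list(instances))
    return list(response.predictions)


def connect_endpoint(project: str, location: str, endpoint_id: str) -> Any:
    """Look up an existing endpoint; needs the SDK and valid credentials."""
    # Imported lazily so predict_online can be exercised without the SDK.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    return aiplatform.Endpoint(endpoint_id)
```

With credentials in place, a call might look like `predict_online(connect_endpoint("my-project", "us-central1", "1234567890"), [{"feature_a": 1.0}])`, where the endpoint ID and feature name are hypothetical.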

Getting started

Enable the Vertex AI API — in the Google Cloud Console, navigate to Vertex AI and enable the API. You need a GCP project with billing enabled.
Install the SDK — install the Python client library:
```
pip install google-cloud-aiplatform
```

Authenticate — set up credentials with the gcloud CLI:

```
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
```
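
To confirm that the login step produced Application Default Credentials, a small check like the following can help. It is only a sketch: it looks at the `GOOGLE_APPLICATION_CREDENTIALS` variable and the default gcloud location on Linux and macOS, it ignores Windows paths and credentials supplied by a GCP runtime, and the function name `find_adc_file` is made up for this example.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the credentials file that ADC would likely pick up, if any."""
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # An explicit path takes priority over the gcloud default location.
        path = Path(explicit)
        return path if path.is_file() else None
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


found = find_adc_file(os.environ, Path.home())
if found:
    print(f"Credentials file: {found}")
else:
    print("No credentials file found; run the gcloud login step.")
```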

Use Gemini via Vertex — access Gemini models in Python:

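```
import vertexai
from vertexai.generative_models import GenerativeModel

# Point the SDK at your project and a region where Gemini is available.
vertexai.init(project='my-project', location='us-central1')
# Model ID as given in this guide; swap in whichever version your project can access.
model = GenerativeModel('gemini-1.5-pro')
response = model.generate_content('Explain quantum computing')
print(response.text)  # the generated answer as plain text
```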
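
For longer answers it can be useful to stream the response and to set generation parameters. The following is a minimal sketch under a few assumptions: the helper names `generate_text` and `build_model_and_config` are invented here, the temperature and token limit are illustrative values, and error handling is left out (reading `.text` on a blocked or empty chunk can raise).

```python
from typing import Any, Optional


def generate_text(model: Any, prompt: str, generation_config: Optional[Any] = None) -> str:
    """Stream a response from a GenerativeModel-like object and join the chunks."""
    chunks = model.generate_content(
        prompt,
        generation_config=generation_config,
        stream=True,  # yields partial responses instead of one final object
    )
    return "".join(chunk.text for chunk in chunks)


def build_model_and_config():
    """Create the SDK objects; needs the SDK installed and valid credentials."""
    # Imported lazily so generate_text can be exercised without the SDK.
    import vertexai
    from vertexai.generative_models import GenerationConfig, GenerativeModel

    vertexai.init(project="my-project", location="us-central1")
    model = GenerativeModel("gemini-1.5-pro")
    # Illustrative values, not recommendations from this guide.
    config = GenerationConfig(temperature=0.2, max_output_tokens=256)
    return model, config
```

A call would then look like `model, config = build_model_and_config()` followed by `print(generate_text(model, "Explain quantum computing", config))`.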

Train a custom model — submit a training job using the console or SDK with your custom container and GCS data paths. Vertex AI manages the infrastructure and stores the model artifact in the Model Registry.
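
A custom container job submitted through the SDK might look roughly like the sketch below. The display name, the `--data-path` flag, and both helper names are hypothetical; the machine and accelerator types mirror the n1-standard-8 plus T4 configuration mentioned in the pricing note rather than a recommendation.

```python
from typing import Any, Dict


def training_run_kwargs(data_uri: str, use_gpu: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for CustomContainerTrainingJob.run()."""
    kwargs: Dict[str, Any] = {
        # Passed to the container's entrypoint; the flag name is hypothetical.
        "args": [f"--data-path={data_uri}"],
        "replica_count": 1,
        "machine_type": "n1-standard-8",
    }
    if use_gpu:
        kwargs["accelerator_type"] = "NVIDIA_TESLA_T4"
        kwargs["accelerator_count"] = 1
    return kwargs


def submit_training_job(project: str, bucket: str, image_uri: str, data_uri: str):
    """Submit the job and block until it finishes."""
    # Imported lazily: requires the SDK and valid credentials.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location="us-central1", staging_bucket=bucket)
    job = aiplatform.CustomContainerTrainingJob(
        display_name="example-training-job",  # hypothetical name
        container_uri=image_uri,
    )
    return job.run(**training_run_kwargs(data_uri))
```

To have the output registered as a model, the job typically also needs a serving container image (the `model_serving_container_image_uri` argument), which this sketch omits for brevity.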

Pricing: Varies by service. Gemini API calls are priced per token. Custom training is billed per node-hour (e.g., n1-standard-8 + T4 GPU at roughly $1.40/hr). Prediction endpoints are billed per node-hour for as long as a model is deployed. See the Vertex AI pricing page for full details.
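
Because training and deployed endpoints are both billed per node-hour, a quick estimate is just rate times hours times nodes. The sketch below uses the approximate $1.40/hr figure quoted above purely as an example input; the function name `node_hour_cost` is an assumption, and real rates vary by region and change over time.

```python
def node_hour_cost(rate_per_hour: float, hours: float, nodes: int = 1) -> float:
    """Cost of running `nodes` machines for `hours` at a per-node hourly rate."""
    if rate_per_hour < 0 or hours < 0 or nodes < 0:
        raise ValueError("rate, hours, and nodes must be non-negative")
    return round(rate_per_hour * hours * nodes, 2)


RATE = 1.40  # approximate figure from the pricing note, not a current price

# A 6-hour training run on one node.
print(node_hour_cost(RATE, hours=6))        # 8.4
# One endpoint node left deployed for a 30-day month.
print(node_hour_cost(RATE, hours=24 * 30))  # 1008.0
```

The second figure shows why an always-on endpoint usually dominates the bill compared with an occasional training run.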

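Tip: Use Vertex AI Workbench managed notebooks for quick experimentation; they come with pre-installed ML frameworks and direct BigQuery integration. To save costs, keep the minimum replica count on prediction endpoints low and undeploy models during off-hours, because standard endpoints generally keep at least one billed node running while a model is deployed.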