GitHub Copilot CLI: BYOK and Local Model Support
GitHub Copilot CLI now supports BYOK and local model runtimes such as Ollama and vLLM, enabling offline, air-gapped workflows without GitHub authentication.
GitHub Copilot CLI Gains BYOK and Local Model Support
GitHub has significantly expanded the flexibility of the Copilot CLI by introducing support for Bring Your Own Key (BYOK) and fully local model deployments. Previously, the CLI required routing all requests through GitHub's hosted model infrastructure. With this update, developers can now plug in their own model provider or run models entirely on their own hardware.
Connecting Any Model Provider
The Copilot CLI can now be configured to connect to Azure OpenAI, Anthropic, or any endpoint that implements the OpenAI Chat Completions API. Configuration is done through environment variables set before launching the CLI, making it straightforward to integrate with existing provider accounts. This extends to locally running inference servers as well — Ollama, vLLM, and Microsoft Foundry Local are all supported out of the box.
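As a rough sketch of what such a setup looks like, the snippet below exports provider settings and then launches the CLI. The variable names here are hypothetical placeholders, not Copilot CLI's documented settings; the authoritative names and values come from running copilot help providers. The endpoint shown is Ollama's default OpenAI-compatible address.

```shell
# Hypothetical variable names for illustration only -- consult
# `copilot help providers` for the real configuration keys.
export MY_PROVIDER_BASE_URL="http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint
export MY_PROVIDER_API_KEY="ollama"                      # local servers typically accept any placeholder key
export MY_PROVIDER_MODEL="llama3.1"                      # whichever model the server is hosting

# Launch the CLI after the environment is prepared (commented out here):
# copilot
echo "configured endpoint: $MY_PROVIDER_BASE_URL (model: $MY_PROVIDER_MODEL)"
```

The same pattern applies to hosted providers such as Azure OpenAI or Anthropic: only the endpoint URL, key, and model name change.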
Fully Offline and Air-Gapped Workflows
A new COPILOT_OFFLINE=true environment variable instructs the CLI to avoid contacting GitHub's servers entirely. In this mode, all telemetry is disabled and the CLI communicates exclusively with the configured local or remote provider. Combined with a locally hosted model, this enables fully air-gapped development environments — a capability long requested by teams in security-sensitive or regulated industries.
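A minimal air-gapped session might look like the following. COPILOT_OFFLINE is the variable named in this update; the provider variable is a hypothetical placeholder standing in for whatever copilot help providers documents, and the endpoint assumes a locally running Ollama server.

```shell
# Offline mode: no telemetry, no calls to GitHub's servers.
export COPILOT_OFFLINE=true

# Hypothetical provider variable for illustration; point it at a
# locally hosted OpenAI-compatible server (here, Ollama's default).
export MY_PROVIDER_BASE_URL="http://localhost:11434/v1"

# With both set, the CLI talks exclusively to the local provider:
# copilot
echo "offline mode: $COPILOT_OFFLINE"
```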
Optional GitHub Authentication
When a custom model provider is configured, GitHub authentication is no longer required to use the CLI. Developers can start working immediately with just their provider credentials. Signing in to GitHub remains optional and unlocks additional capabilities such as the /delegate command, GitHub Code Search integration, and access to MCP servers.
Model Requirements
Not all models are compatible: the CLI requires models that support both tool calling (function calling) and streaming. GitHub recommends a context window of at least 128k tokens for the best experience. Built-in sub-agents automatically inherit the provider configuration, and invalid or unsupported provider settings produce actionable error messages rather than silent failures. Setup instructions are accessible directly from the terminal by running copilot help providers.
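One way to check those two requirements before pointing the CLI at a local server is to send the endpoint a standard Chat Completions request that asks for both streaming and a tool call. The sketch below builds such a payload; the endpoint URL, model name, and get_weather tool are assumptions for illustration. The actual request line is left commented out since it needs a running server.

```shell
# Assumed local endpoint (Ollama's OpenAI-compatible default) and model.
ENDPOINT="http://localhost:11434/v1/chat/completions"

# A Chat Completions request exercising both required capabilities:
# "stream": true and a function-calling tool definition.
PAYLOAD='{
  "model": "llama3.1",
  "stream": true,
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'

# A compatible model should answer with a tool_calls entry; an
# incompatible one will error or reply in plain text. To try it:
# curl -sN "$ENDPOINT" -H "Content-Type: application/json" -d "$PAYLOAD"

# Sanity-check that the payload is well-formed JSON:
echo "$PAYLOAD" | python3 -c 'import json,sys; json.load(sys.stdin); print("payload OK")'
```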