
CodeLLM
Kimi Linear 48B
A3B Instruct API
All You Need To Know About Kimi Linear 48B A3B Instruct API
Overview
Model Provider:Moonshot AI
Model Type:CODE/LLM
State:Ready
Key Specs
Quantization:BF16
Parameters:48B
Context:128k
Pricing:$0.10 input / $0.40 output
Try Model
Quick Start
Reserve Dedicated Endpoint
Introduction
Kimi Linear is a hybrid linear attention architecture that outperforms traditional attention methods. It is powered by Kimi Delta Attention (KDA), a novel mechanism that optimizes RNN memory for superior hardware efficiency and performance.
For long-context tasks up to 1 million tokens, Kimi Linear reduces KV cache requirements by 75% and boosts decoding throughput by 6x. The core KDA kernel and model checkpoints, trained on 5.7T tokens, are open-sourced.
Kimi Linear 48B A3B Instruct API Usage

Endpoint
moonshot-ai/Kimi-Linear-48B-A3B-Instruct