MoonshotAI: Kimi Linear 48B A3B Instruct
Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention across short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA), a refined version of Gated DeltaNet with a more efficient gating mechanism that makes better use of the limited, fixed-size memory of a finite-state RNN. Kimi Linear combines strong performance with high hardware efficiency, especially on long-context tasks: it reduces KV cache usage by up to 75% and delivers up to 6x higher decoding throughput at context lengths of up to 1M tokens.
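To make the "finite-state RNN memory" idea concrete, here is a minimal sketch of a single gated delta-rule step with a per-channel (diagonal) decay gate, which is one way to read the finer gating KDA adds on top of Gated DeltaNet. It assumes the update S_t = (I - beta_t k_t k_t^T) Diag(alpha_t) S_{t-1} + beta_t k_t v_t^T with readout o_t = S_t^T q_t. The function name `kda_step`, the per-channel gate vector `alpha`, and the scalar write strength `beta` are illustrative assumptions, not Moonshot AI's kernel or API. It uses plain Python lists so it runs without dependencies.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # state S, shape d_k x d_v
Vector = List[float]


def kda_step(S: Matrix, q: Vector, k: Vector, v: Vector,
             alpha: Vector, beta: float) -> Tuple[Matrix, Vector]:
    """One recurrent step of a gated delta rule with a per-channel decay gate.

    Hypothetical sketch: alpha holds one decay factor per key channel
    (Diag(alpha)), beta is a scalar write strength. Not an official kernel.
    """
    d_k, d_v = len(k), len(v)

    # 1. Per-channel forgetting: Diag(alpha) @ S scales row i by alpha[i].
    decayed = [[alpha[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]

    # 2. Delta rule: (I - beta k k^T) D + beta k v^T == D + beta k (v - D^T k)^T.
    #    D^T k is what the memory currently predicts for key k.
    pred = [sum(k[i] * decayed[i][j] for i in range(d_k)) for j in range(d_v)]
    err = [v[j] - pred[j] for j in range(d_v)]
    new_S = [[decayed[i][j] + beta * k[i] * err[j] for j in range(d_v)]
             for i in range(d_k)]

    # 3. Readout o = S^T q.
    out = [sum(q[i] * new_S[i][j] for i in range(d_k)) for j in range(d_v)]
    return new_S, out


if __name__ == "__main__":
    S = [[0.0, 0.0], [0.0, 0.0]]
    e0 = [1.0, 0.0]
    S, o = kda_step(S, q=e0, k=e0, v=[1.0, 2.0], alpha=[0.5, 0.5], beta=1.0)
    print(o)
```

However many tokens are processed, the state stays a d_k x d_v matrix, whereas a full-attention KV cache grows with every token; the gate decides what this fixed-size memory forgets. With `beta = 1.0` and a unit-norm key, writing the same (k, v) pair twice still reads back v, which is the delta rule's overwrite behavior rather than additive accumulation.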
Parameters: 48B
Context Window: 1,048,576 tokens
Input Price: $0.30 per 1M tokens
Output Price: $0.60 per 1M tokens
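Cost follows directly from the listed per-token prices. The sketch below estimates the cost of one request and of a month of traffic; the helper names and the example workload (1,000 requests per day, 2,000 input and 500 output tokens each, 30-day month) are hypothetical, and it ignores any provider-specific discounts, caching, or minimum charges.

```python
# Listed prices for this model, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 0.60


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request (hypothetical helper)."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)


def monthly_cost(requests_per_day: int, avg_input: int, avg_output: int,
                 days: int = 30) -> float:
    """Estimated USD cost for a month at a steady request rate."""
    return requests_per_day * days * request_cost(avg_input, avg_output)


if __name__ == "__main__":
    # 2,000 in + 500 out = $0.0006 + $0.0003 = $0.0009 per request.
    print(f"{monthly_cost(1_000, 2_000, 500):.2f}")  # prints 27.00
```

Because output tokens cost twice as much as input tokens, long-prompt, short-answer workloads (such as retrieval over a large context) are comparatively cheap per request.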
Capabilities
Modalities: text input, text output
