
API Rate Limiting and Service Level Agreement

Design Principles

To ensure all users enjoy stable and reliable inference services, we adopt a tiered resource isolation and dynamic rate management mechanism:

  • Fair Use: Monthly quotas are set for "Unlimited Token Plans." After exceeding the quota, accounts enter Basic Assurance Mode to prevent abnormal loads from individual accounts from affecting overall platform stability.
  • Predictable Experience: Requests that exceed the threshold will immediately receive a standard HTTP 429 response, enabling clients to perform graceful degradation.

Rate Limits Overview

1. Standard Plans and Pay-as-you-go

The following plans apply to regular development, production, and commercial calling scenarios. If any single dimension (RPS / RPM / RPH / RPD) reaches its upper limit, the API returns an HTTP 429 (Too Many Requests) response.

Plan Type                 RPS (Requests/Second)  RPM (Requests/Minute)  RPH (Requests/Hour)  RPD (Requests/Day)
Free Trial                1                      3
Paid / Recharged Users    1                      6
Enterprise / API Gateway  As per contract        As per contract        As per contract      As per contract

Note:
Rate limits, concurrency capabilities, and timeout policies for dedicated Enterprise plans can be customized per account. Please contact your account manager for an exclusive SLA.

2. Unlimited Plans Fair Use Policy

"Unlimited Plans" adopt a "Monthly Token Quota + Excess Basic Assurance" model. Within the monthly Token quota, you can enjoy the high-speed inference service corresponding to that plan; once Token consumption reaches the monthly cap, the account will automatically enter Basic Assurance Mode, with rate limits adjusted as follows:

Plan Tier       Monthly Token Quota  Excess RPS  Excess RPM  Excess RPH  Excess RPD
Unlimited 50M   50,000,000 Tokens    1           2           10          50
Unlimited 200M  200,000,000 Tokens   1           8           40          200
Unlimited 500M  500,000,000 Tokens   1           20          100         500

Important Notice:
Basic Assurance Mode only ensures basic API availability and is not suitable for production-grade high-concurrency loads. If you need to restore standard or higher performance, you can upgrade your plan at any time or contact your account manager to customize an Enterprise plan.
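
The fair use model above can be written as a small lookup. This is only an illustrative sketch: the dictionary layout and the function name `effective_limits` are hypothetical, and the function returns `None` while the account is within quota because the in-quota limits are not listed in this section.

```python
from typing import Dict, Optional

# Monthly quota and Basic Assurance Mode limits, copied from the
# Unlimited Plans table in this section.
UNLIMITED_TIERS: Dict[str, Dict[str, int]] = {
    "Unlimited 50M": {"quota": 50_000_000, "rps": 1, "rpm": 2, "rph": 10, "rpd": 50},
    "Unlimited 200M": {"quota": 200_000_000, "rps": 1, "rpm": 8, "rph": 40, "rpd": 200},
    "Unlimited 500M": {"quota": 500_000_000, "rps": 1, "rpm": 20, "rph": 100, "rpd": 500},
}


def effective_limits(tier: str, tokens_used: int) -> Optional[Dict[str, int]]:
    """Return the Basic Assurance Mode limits once the quota is reached.

    Returns None while the account is still within its monthly quota
    (hypothetical convention; in-quota limits are not documented here).
    """
    plan = UNLIMITED_TIERS[tier]
    if tokens_used < plan["quota"]:
        return None
    return {dim: plan[dim] for dim in ("rps", "rpm", "rph", "rpd")}
```

For example, `effective_limits("Unlimited 50M", 50_000_000)` yields the excess limits of 1 RPS, 2 RPM, 10 RPH, and 50 RPD.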

Rate Limit Dimensions Explained

Dimension  Full Name            Meaning                        Behavior Upon Trigger
RPS        Requests Per Second  Number of requests per second  Instant rejection upon exceeding
RPM        Requests Per Minute  Number of requests per minute  Remaining requests in the current minute window are rejected
RPH        Requests Per Hour    Number of requests per hour    Limited to the corresponding tier within the current hour window
RPD        Requests Per Day     Number of requests per day     Limited to the corresponding tier within the current day window

Calculation Logic:
The system counts each limit independently, and a request is rejected as soon as any one of them is exhausted. For example, for an Unlimited 50M account in Basic Assurance Mode, once 2 requests have been made within a minute (the RPM limit), further requests in that minute are rejected even if the daily (RPD) allowance has not been used up.
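
To mirror this logic on the client side, the following sketch keeps one counter per dimension and rejects a request as soon as any counter is full. It assumes fixed, clock-aligned windows and that rejected requests do not consume quota; the platform's actual window boundaries and accounting are not specified here, and the class name `MultiWindowLimiter` is hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

WINDOW_SECONDS = {"rps": 1, "rpm": 60, "rph": 3600, "rpd": 86400}


class MultiWindowLimiter:
    """Client-side estimate of independent per-dimension limits."""

    def __init__(self, limits: Dict[str, int],
                 clock: Callable[[], float] = time.time) -> None:
        self.limits = limits
        self.clock = clock
        # dimension -> (window index, requests accepted in that window)
        self.counts: Dict[str, Tuple[int, int]] = {}

    def try_acquire(self) -> Optional[str]:
        """Return None if the request may be sent, otherwise the name
        of the first dimension whose limit is already reached."""
        now = self.clock()
        pending: Dict[str, Tuple[int, int]] = {}
        for dim, limit in self.limits.items():
            window = int(now // WINDOW_SECONDS[dim])
            seen_window, count = self.counts.get(dim, (window, 0))
            if seen_window != window:
                count = 0  # a new window has started for this dimension
            if count >= limit:
                return dim  # rejected: nothing is recorded
            pending[dim] = (window, count)
        # Accepted: the request counts against every dimension.
        for dim, (window, count) in pending.items():
            self.counts[dim] = (window, count + 1)
        return None
```

With the Unlimited 50M excess limits (`{"rps": 1, "rpm": 2, "rph": 10, "rpd": 50}`), a third request inside the same minute is blocked by `"rpm"` even though the hourly and daily counters still have room.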

Over-limit Response and Best Practices

1. HTTP Response Example

When a request triggers the rate limit, the API will return the following response:

HTTP/2 429 Too Many Requests
Content-Type: application/json

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry later or upgrade your plan for higher throughput.",
    "type": "request_limit_exceeded"
  }
}
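
A client can recognize this response from the status code and the JSON body. The helper below is a minimal sketch that relies only on the fields shown in the example above; the function name `rate_limit_message` and the fallback text are hypothetical.

```python
import json
from typing import Optional


def rate_limit_message(status_code: int, body: str) -> Optional[str]:
    """Return the server's message for a 429 response, else None."""
    if status_code != 429:
        return None
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        # Body is not JSON, or not a JSON object: fall back below.
        error = {}
    if not isinstance(error, dict):
        error = {}
    # Fallback text is an assumption for bodies without a message field.
    return error.get("message") or "Rate limit exceeded."
```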

2. Client Best Practices

• Exponential Backoff

After receiving a 429, it is recommended to retry with intervals of 1s → 2s → 4s → 8s → ... Avoid high-frequency hard retries, as they may trigger longer throttling.
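
The retry schedule above could be implemented roughly as follows. This is a sketch under stated assumptions: `send` is a hypothetical callable that performs one request and returns `(status_code, body)`, and the 60-second cap and the limit of 5 retries are illustrative values, not documented platform settings.

```python
import time
from typing import Callable, Iterator, Tuple


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, max_retries: int = 5) -> Iterator[float]:
    """Yield 1s, 2s, 4s, 8s, ... up to `cap`, at most `max_retries` times."""
    delay = base
    for _ in range(max_retries):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(send: Callable[[], Tuple[int, str]],
                      sleep: Callable[[float], None] = time.sleep,
                      max_retries: int = 5) -> Tuple[int, str]:
    """Call `send`, retrying with exponential backoff while it returns 429."""
    status, body = send()
    for delay in backoff_delays(max_retries=max_retries):
        if status != 429:
            break
        sleep(delay)  # wait before the next attempt instead of hard-retrying
        status, body = send()
    return status, body
```

If every attempt is rejected, the last 429 response is returned to the caller so it can degrade gracefully.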

• Rate Limit Warm-up

For burst traffic (e.g., batch tasks), avoid instantly maxing out concurrency. Adopt gradual speed increases to allow requests to enter the system smoothly.
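
One possible way to ramp up gradually is to raise the per-minute request budget in steps until it reaches the target. The linear ramp, the default of four steps, and the function names below are assumptions made for illustration; this document does not prescribe a particular ramp shape.

```python
from typing import List


def warmup_budgets(target_per_minute: int, steps: int = 4) -> List[int]:
    """Per-minute request budgets that rise linearly to the target."""
    if target_per_minute < 1 or steps < 1:
        raise ValueError("target_per_minute and steps must be positive")
    return [max(1, (target_per_minute * i) // steps) for i in range(1, steps + 1)]


def pacing_interval(budget_per_minute: int) -> float:
    """Seconds to wait between requests to spread a budget over a minute."""
    return 60.0 / budget_per_minute
```

A batch job would send at each budget for one minute, sleeping `pacing_interval(budget)` seconds between requests, before moving on to the next budget.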

• Monitor Usage

You can view Token consumption and current plan usage at Model API KEY → Monthly subscription, so that you can plan upgrades in advance.

If you need any help or have questions, feel free to contact us via Discord or Online Customer Support. Our support team is always here for you.