Serverless Inference

AI Inference Service
where AI meets reality

Our Inference-as-a-Service platform, InfaaS, delivers AI inference through the Canopy Wave API.

Model Library

We have built an open-source model library spanning chat, code, vision, video, and image models. Users can call any of them directly via API, with no additional development or adaptation.

CODE · DEEPSEEK V3.1 · 671B · 128K context
CHAT · KIMI-K2-THINKING · 1T · 256K context
CODE · MINIMAX-M2 · 230B · 128K context
CODE · GLM 4.6 · 355B · 128K context
CHAT · QWEN3 CODER 480B A35B INSTRUCT · 480B · 256K context
CHAT · DEEPSEEK V3.2 EXP · 685B · 128K context
CHAT · LLAMA 3.3 8B INSTRUCT · 8B · 128K context
CHAT · LLAMA 3.3 70B INSTRUCT · 70B · 128K context
CHAT · GEMMA 3 27B · 27B · 32K context
CHAT · GPT-OSS 120B · 120B · 128K context
CHAT · QWEN 2.5 7B INSTRUCT · 7B · 128K context
CHAT · MIXTRAL 8X22B INSTRUCT · 141B · 64K context
CHAT · GPT-OSS 20B · 20B · 128K context
CHAT · PHI-3 MEDIUM INSTRUCT · 14B · 128K context
CHAT · QWEN3-235B-A22B-INSTRUCT · 235B · 256K context
CHAT · DEEPSEEK V3 0324 · 671B · 128K context
CHAT · DEEPSEEK R1 0528 · 685B · 128K context
CHAT · GLM 4.5 · 355B · 128K context
CHAT · QWEN3 14B INSTRUCT · 14B · 128K context
CHAT · PIXTRAL 12B INSTRUCT · 12B · 128K context
CHAT · MISTRAL NEMO 12B INSTRUCT · 12B · 128K context
CHAT · LLAMA 4 SCOUT INSTRUCT · 109B · 128K context
CHAT · KIMI K2 INSTRUCT-0905 · 1T · 256K context
CODE · QWEN2.5-32B-CODER · 32B · 128K context
CODE · STARCODER2 15B · 15B · 16K context
CODE · CODEGEMMA 7B · 7B · 8K context
CODE · PHIND-CODELLAMA 34B · 34B · 4K context
CODE · DEEPSEEK-CODER V2 16B · 16B · 128K context
CODE · KIMI-LINEAR-48B-A3B-INSTRUCT · 48B · 1M context
VISION · QWEN2.5-VL-72B · 72B · 128K context
VISION · GLM4.5V · 106B · 128K context
VISION · INTERN VL 2.0 · 26B · 4K context
VIDEO · WAN 2.2 T2V · 27B
VIDEO · MOCHI 1 · 10B
VIDEO · HUNYUANVIDEO-I2V · 13B
IMAGE · STABLE DIFFUSION 3 MEDIUM · 2B
IMAGE · FLUX.1 DEV · 12B
IMAGE · FLUX.1 KONTEXT MAX · 12B

Inference that is
fast, cost-effective, and secure

Users can run pre-trained models through simple API calls without managing infrastructure, achieving efficient "pay-as-you-go" inference.
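As a sketch, an API call could look like the following Python. This assumes an OpenAI-compatible chat completions endpoint; the base URL, API key placeholder, model id, and response shape are illustrative assumptions, not the documented Canopy Wave API — consult the official API reference for the real values.

```python
import json
import urllib.request

# Hypothetical values -- replace with the endpoint, key, and model ids
# from the Canopy Wave documentation.
API_URL = "https://api.canopywave.example/v1/chat/completions"
API_KEY = "YOUR_API_KEY"


def build_chat_request(model, messages, stream=False):
    """Build the JSON payload for an OpenAI-style chat completion call."""
    return {"model": model, "messages": messages, "stream": stream}


def chat(model, messages):
    """Send one chat completion request and return the assistant's reply."""
    payload = json.dumps(build_chat_request(model, messages)).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]


# Usage (requires a valid endpoint and key):
# reply = chat("deepseek-v3.1", [{"role": "user", "content": "Hello!"}])
```

Because billing is per call, there is no cluster to provision or tear down around this snippet; the same request shape works against any model in the library.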

FAST RESPONSE

With API calls, time to first token is under 100 ms, and output speed reaches up to 400 tokens/s (DeepSeek-V3.1 671B at full precision). NVIDIA's latest-generation GPUs plus edge caching keep latency low globally, with no cold-start issues.

COST REDUCTION

Charges are based on the number of model calls, the compute consumed (e.g., token count or image resolution), and the required compute specifications. You pay only for what you use; idle GPU resources cost you nothing.
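Under per-token billing, estimating the cost of a call is simple arithmetic: tokens in each direction times the per-million-token rate. The rates below are made-up placeholders for illustration, not actual Canopy Wave pricing.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate the cost of one call under per-million-token pricing."""
    return (prompt_tokens / 1_000_000 * usd_per_m_input
            + completion_tokens / 1_000_000 * usd_per_m_output)


# Example: 12,000 prompt tokens and 3,000 completion tokens at
# hypothetical rates of $0.50 (input) / $1.50 (output) per million tokens.
cost = estimate_cost(12_000, 3_000, 0.50, 1.50)
print(f"${cost:.4f}")  # prints "$0.0105"
```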

DATA PRIVACY

Our models are deployed within a private cloud environment in our internal data center, ensuring complete data isolation, significantly enhanced control, and enterprise-grade security.

Zero retention policy
No training on your data

Which deployment fits your needs?

Serverless Endpoints

Canopy Wave gives you instant access to the most popular OSS models — optimized for cost, speed, and quality on the fastest AI cloud.

Dedicated Endpoints

Canopy Wave allows you to create on-demand deployments of GPU clusters reserved for your exclusive use.

Serverless Endpoints: simplest setup · highest flexibility · popular models available out of the box · pay per token

Dedicated Endpoints: no hard rate limits · predictable performance · custom large models can be deployed · pay for GPU runtime

Get started today

Experience AI inference that just works — no setup, no waiting.

Try InfaaS and see how inference becomes the simplest, most powerful part of your AI workflow.

Contact us