Sign up now! New users get $20 in free credits

Serverless Inference

An AI inference service where AI meets reality

Our Inference as a Service (InfaaS) delivers fast, production-ready AI inference through the Canopy Wave API

Model Library

We have built an open-source model library spanning a wide range of model types and domains. Users can call any model directly via API, with no additional development or adaptation.

Type | Model            | Parameters | Context
CHAT | MiMo-V2-Flash    | 310B       | 256K
CODE | MINIMAX-M2.1     | 229B       | 192K
CODE | GLM 4.7          | 358B       | 198K
CHAT | DEEPSEEK V3.2    | 685B       | 163.8K
CHAT | KIMI-K2-THINKING | 1T         | 256K
CHAT | DeepSeek-Math-V2 | 685B       | 128K

Inference that is
fast, cost-effective, and secure

Users can run pre-trained models through simple API calls without managing infrastructure, achieving efficient "pay-as-you-go" inference.
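A call looks like a single HTTP request. The sketch below builds such a request in Python; the endpoint URL and model name are illustrative assumptions (an OpenAI-compatible chat API is assumed), so check the Canopy Wave documentation for the real values.

```python
import json

# Hypothetical endpoint; the real URL may differ.
API_URL = "https://api.canopywave.com/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str):
    """Build the headers and JSON payload for a single inference call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return headers, payload

headers, payload = build_chat_request("YOUR_API_KEY", "deepseek-v3.1", "Hello!")
body = json.dumps(payload).encode("utf-8")

# To actually send the request (needs a valid API key):
#   import urllib.request
#   req = urllib.request.Request(API_URL, data=body, headers=headers)
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

No SDK is required: any HTTP client that can POST JSON with a bearer token will work.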

FAST RESPONSE

With API calls, time to first response is under 100 ms, and output speed reaches up to 400 tokens/s (DeepSeek-V3.1 671B, full precision). NVIDIA's latest-generation GPUs plus edge caching ensure low latency globally, with no cold-start issues.
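These figures are easy to verify yourself from a timed streaming call. A minimal sketch of the arithmetic, with illustrative timings (not measured data):

```python
def throughput(tokens_out: int, t_first: float, t_done: float) -> float:
    """Tokens generated per second after the first token arrives.

    t_first: seconds from request to first token (time to first token).
    t_done:  seconds from request to last token.
    """
    return tokens_out / (t_done - t_first)

# Illustrative numbers: first token at 0.1 s, 1000 tokens finished at 2.6 s
# gives 1000 / 2.5 = 400 tokens/s, matching the advertised peak rate.
rate = throughput(1000, 0.1, 2.6)
```

Record the two timestamps while consuming a streamed response and the same formula applies to your own workload.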

COST REDUCTION

No need to pay for idle GPUs. Charges are based on the number of model calls, compute duration (e.g., token count or image resolution), and the compute specifications required, so you truly pay only for what you use.
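Under this model, cost scales linearly with the tokens you process. A minimal sketch, assuming per-million-token pricing; the prices below are placeholders, not Canopy Wave's actual rates:

```python
def token_cost(input_tokens: int, output_tokens: int,
               price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in dollars for one call, priced per million tokens."""
    return (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m

# Placeholder prices: $0.50 per M input tokens, $1.50 per M output tokens.
cost = token_cost(500_000, 250_000, 0.50, 1.50)  # 0.25 + 0.375 dollars
```

Because there is no per-instance charge, a month with zero calls costs zero.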

DATA PRIVACY

Our models are deployed within a private cloud environment in our internal data center, ensuring complete data isolation, significantly enhanced control, and enterprise-grade security.

Zero-retention policy
No training on your data

Testimonial

nahcrof

"CanopyWave has transformed our AI performance. Since adopting their models, traffic is up nearly 40% and daily active users have surged 100-300%. Their GLM-4-6-turbo and DeepSeek-V3.1 lead CrofAI with unmatched speed and accuracy. Our team and users consistently rave about the experience — CanopyWave is our clear choice for powerful, high-accuracy AI."

——Founder of nahcrof

Which deployment fits your needs?

Serverless Endpoints

Canopy Wave gives you instant access to the most popular OSS models — optimized for cost, speed, and quality on the fastest AI cloud.

Simplest setup
Highest flexibility
Access to the most popular models on the market
Pay per token

Dedicated Endpoints

Canopy Wave lets you create on-demand GPU cluster deployments reserved for your own use.

No hard rate limits
Predictable performance
Deploy custom large models
Pay for GPU runtime

Questions and Answers

What is serverless inference?
Why choose serverless inference?
How does billing work?
Which models can I use?
Is data secure?
Can I upgrade to a dedicated deployment?


Get started today

Experience AI inference that just works — no setup, no waiting.

Try InfaaS and see how inference becomes the simplest, most powerful part of your AI workflow.

Contact us