Compute Services
Canopy Wave uses virtualization technology to deliver world-leading GPU performance for AI training and inference
Massively Parallel Processing Power
AI workloads require performing millions (or billions) of mathematical operations simultaneously
GPUs have thousands of cores designed for parallel computation, ideal for training and running neural networks efficiently
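As a minimal illustration (assuming PyTorch with CUDA support is installed; the matrix sizes are arbitrary), a single matrix multiplication fans out across thousands of GPU cores at once:

```python
import torch

# A minimal sketch, assuming PyTorch with CUDA support is installed.
device = "cuda" if torch.cuda.is_available() else "cpu"

# One large matrix multiply is executed across thousands of GPU cores.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

c = a @ b  # launched as a parallel GPU kernel
if device == "cuda":
    torch.cuda.synchronize()  # wait for the asynchronous kernel to finish
print(c.shape, c.device)
```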
Canopy Wave on-demand GPU Cluster
NVIDIA GPUs
Featuring access to NVIDIA HGX H100 and HGX H200 platforms, connected with NVLink and 400G RoCEv2 or InfiniBand networking
Multi-GPU instances
Train and fine-tune AI models on the instance type that best suits your needs: 1x, 2x, 4x, 8x, and up to 64 NVIDIA GPU instances, truly on-demand and billed by the minute
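For the multi-GPU case, the standard pattern is one process per GPU with synchronized gradients; a minimal sketch using PyTorch DistributedDataParallel (the model and sizes are placeholders):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal multi-GPU training skeleton; launch with one process per GPU, e.g.
#   torchrun --nproc_per_node=8 train.py
def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # syncs gradients across GPUs
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(32, 1024, device=local_rank)
    loss = model(x).pow(2).mean()
    loss.backward()  # gradient all-reduce happens here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```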
Canopy Wave private cloud
The best GPU cluster performance in the industry, with 99.99% uptime. With all your GPUs in the same datacenter, your workloads and privacy are protected
Leadership in AI-Optimized H100 and H200
- ▶
High-end GPU platforms custom-built for AI, featuring large numbers of Tensor Cores, NVLink, and the Transformer Engine
- ▶
Tailored for modern AI workloads and benchmark leaders in training and inference performance
NVIDIA HGX H200
Unmatched Memory Bandwidth & Capacity for Large AI Models
141 GB of HBM3e memory
Large language models (LLMs) and generative AI systems must process huge datasets and massive parameter matrices. Speed and scale depend heavily on how much memory the GPU can access and how fast it can access it
4.8 TB/s memory bandwidth
The fastest memory bandwidth of any NVIDIA GPU to date
Optimized for memory-bound workloads (see the sizing sketch after this list), including:
- • Large transformer models
- • Retrieval-augmented generation (RAG)
- • Generative vision-language models
- • Inference on massive context windows (e.g. greater than 100k tokens)
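A rough sizing exercise shows why these workloads are memory-bound (the model size, layer count, and context length below are illustrative assumptions, not a specific benchmark):

```python
# Back-of-the-envelope memory sizing for long-context LLM inference.
params_b    = 70       # assumed model size: 70B parameters
bytes_per_p = 2        # FP16/BF16 weights
layers      = 80       # assumed transformer layer count
kv_heads    = 8        # assumed KV heads (grouped-query attention)
head_dim    = 128
ctx_tokens  = 100_000  # long context window

weights_gb = params_b * 1e9 * bytes_per_p / 1e9
# KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes
kv_gb = 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_p / 1e9

print(f"weights ~{weights_gb:.0f} GB, KV cache ~{kv_gb:.0f} GB")
# ~140 GB of weights plus ~33 GB of KV cache is already on the order of
# a 141 GB H200, and every generated token must stream a large fraction
# of it, which is why bandwidth dominates inference throughput.
```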
NVIDIA HGX H100
Transformer Engine: Purpose-Built for Training and Running Large AI Models
Higher Accuracy
Transformer Engine uses FP8 precision (8-bit floating point) with dynamic range scaling
Better Performance
Delivers up to 9x faster training and 30x faster inference versus prior-generation GPUs such as the A100
Flexible Configurations
Dynamically switches between FP8 and FP16/FP32 for an optimal balance of accuracy and speed (see the FP8 sketch below)
Better Access to Compute
Includes 80 billion transistors and 80 GB of HBM3 memory, and supports NVLink and PCIe 5.0 for fast interconnects
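A minimal sketch of FP8 execution using NVIDIA's Transformer Engine Python API (transformer_engine.pytorch; requires the library and an H100/H200-class GPU, and the layer sizes here are arbitrary):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 with delayed scaling: E4M3 forward, E5M2 backward (HYBRID format).
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

model = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(32, 1024, device="cuda")

# Inside this context, supported ops run in FP8 with dynamic range
# scaling; unsupported ops fall back to higher precision automatically.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = model(x)

y.sum().backward()  # backward also follows the FP8 recipe
```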
Why NVIDIA
CUDA is the de facto standard for AI/ML workloads, with deep integration into frameworks like TensorFlow and PyTorch. It's not just the hardware but the surrounding ecosystem that delivers broad compatibility
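That integration is visible in how little code it takes to target the GPU from a framework; a trivial PyTorch sketch:

```python
import torch

# Frameworks reach CUDA directly; no kernel code is required.
print(torch.cuda.is_available())      # True on a CUDA-capable machine
print(torch.cuda.get_device_name(0))  # e.g. an H100 or H200

model = torch.nn.Linear(10, 10).to("cuda")  # cuBLAS/cuDNN used underneath
x = torch.randn(4, 10, device="cuda")
print(model(x).device)  # cuda:0
```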
CPU Nodes
Our CPU instances are optimized for general-purpose, compute-heavy, and memory-bound applications, providing flexibility and performance at scale
Processor
Instances run on the latest 6th-Gen Intel Xeon Scalable processors, offering up to 64 vCPUs per instance
Memory
Each instance supports up to 256 GB of DIMM memory, delivering high throughput for compute-intensive workloads
Intel Xeon Scalable Processors (6th Gen)
The latest generation utilizes a disaggregated design with multiple compute and I/O chiplets interconnected via EMIB (Embedded Multi-Die Interconnect Bridge)
Core count & frequency
Engineering samples (ES1) of Granite Rapids feature up to 56 cores (1.1-2.7 GHz base/turbo), with production models expected to reach 84-90 cores
Memory support
12-channel DDR5-6400 with MCR DIMMs, delivering up to 1.6x higher bandwidth than previous generations (12 channels × 6400 MT/s × 8 bytes ≈ 614 GB/s per socket)
Cache & interconnect
Each compute tile includes 2MB L2 cache and 4MB L3 cache, while the platform supports PCIe Gen5 (136 lanes) and CXL 2.0 for GPU/FPGA acceleration
Enhanced GPU cluster performance
Canopy Wave pairs its GPU clusters with powerful, efficient CPUs to raise cluster utilization and performance: CPUs handle general-purpose computing, freeing GPUs to focus on high-intensity tasks, as sketched below
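This division of labor shows up in an ordinary training loop, where CPU worker processes decode and batch data while the GPU computes (a sketch with a toy in-memory dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset; in practice the workers would decode/augment real data.
ds = TensorDataset(torch.randn(10_000, 512), torch.randint(0, 10, (10_000,)))

# CPU worker processes prepare batches in parallel and pin host memory,
# keeping the GPU fed with compute instead of waiting on I/O.
loader = DataLoader(ds, batch_size=256, num_workers=8, pin_memory=True)

model = torch.nn.Linear(512, 10).cuda()
for x, y in loader:
    x = x.cuda(non_blocking=True)  # overlap host-to-device copies
    y = y.cuda(non_blocking=True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
```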
Parallel processing & AI acceleration
Modern CPU servers leverage AVX-512 and VNNI (Vector Neural Network Instructions) to boost AI inference throughput by 2-4x compared to older architectures
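Frameworks reach those instructions through INT8 kernels; a hedged sketch using PyTorch dynamic quantization, whose fbgemm CPU backend dispatches to AVX-512/VNNI where the hardware supports it:

```python
import torch
import torch.ao.quantization as quant

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
)

# Dynamic INT8 quantization: weights stored as int8, matmuls dispatched
# to fbgemm kernels that use AVX-512/VNNI on supporting CPUs.
qmodel = quant.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(32, 1024)
print(qmodel(x).shape)  # same interface, lower-precision compute
```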
Multi-threading
Hyper-Threading enables 112 threads on a 56-core CPU, optimizing multi-tasking efficiency for virtualization and HPC workloads
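The logical-core count is what the OS reports; a small sketch that sizes a worker pool to it (the counts depend on the instance type):

```python
import os
from concurrent.futures import ProcessPoolExecutor

# os.cpu_count() reports logical threads: 112 on a 56-core CPU
# with Hyper-Threading enabled.
n = os.cpu_count()

def work(i: int) -> int:
    return sum(j * j for j in range(100_000))  # CPU-bound placeholder task

# One worker per logical CPU for throughput-oriented workloads.
if __name__ == "__main__":
    print(f"logical CPUs: {n}")
    with ProcessPoolExecutor(max_workers=n) as ex:
        results = list(ex.map(work, range(n)))
```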
Energy efficiency
Intel's Dynamic Voltage and Frequency Scaling (DVFS) and RAPL (Running Average Power Limit) reduce idle power consumption by 30%, while TCO improvements reach 68% through server consolidation (5-10:1 replacement ratio)
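RAPL counters are exposed on Linux through the powercap interface; a hedged sketch that samples package energy to estimate power draw (the sysfs path and required permissions vary by system):

```python
import time

# RAPL exposes a cumulative package-energy counter in microjoules;
# reading it usually requires root, and the counter wraps periodically.
RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_uj() -> int:
    with open(RAPL) as f:
        return int(f.read())

e0, t0 = read_uj(), time.time()
time.sleep(1.0)
e1, t1 = read_uj(), time.time()

watts = (e1 - e0) / 1e6 / (t1 - t0)  # microjoules -> joules -> watts
print(f"package power: {watts:.1f} W")
```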
Bare metal GPU cluster in private cloud
A private, secure GPU cluster for large AI deployments. Short- or long-term contracts for 256 to 2,000 GPUs with InfiniBand or RoCEv2 networking
Get the latest and greatest NVIDIA GPUs
Canopy Wave provides the best-performing GPU clusters, with 99.99% uptime and 24/7 support to maximize reliability. We apply the highest security standards to ensure data security
NVIDIA HGX B200
The NVIDIA HGX B200, powered by eight NVIDIA Blackwell GPUs and fifth-generation NVLink™, delivers up to 3× faster training and 15× faster inference compared to previous generations, making it the ideal unified AI platform for businesses at any stage
NVIDIA HGX H200
The first GPU featuring HBM3e memory, the H200 sets new standards for generative AI and HPC workloads with unprecedented memory capacity and bandwidth, significantly accelerating LLM training and inference performance
NVIDIA HGX H100
Built on the NVIDIA Hopper™ architecture with dedicated Transformer Engine, the H100 accelerates LLMs by up to 30×, setting new benchmarks for conversational AI and efficiently powering trillion-parameter language models
Get full visibility of your Cluster
The Canopy Wave DCIM platform provides full visibility into your cluster: see utilization, health, and uptime in a single dashboard and keep the cluster fully under control
Our DCIM platform detects possible failures early and issues corresponding work orders, minimizing interruption and maintaining industry-leading performance and uptime
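As an illustration of the raw telemetry such a dashboard aggregates (a sketch using NVIDIA's NVML bindings via the pynvml package, not Canopy Wave's actual DCIM API):

```python
import pynvml

# Sample per-GPU utilization and temperature, the kind of signals a
# DCIM-style dashboard rolls up. Requires the NVIDIA driver.
pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(h)
        temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
        name = pynvml.nvmlDeviceGetName(h)
        print(f"GPU {i} {name}: {util.gpu}% util, {temp} C")
finally:
    pynvml.nvmlShutdown()
```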
Ready to get started?
Create your Canopy Wave cloud account to launch GPU clusters immediately, or contact us to reserve a long-term contract