
DigitalOcean and AMD Supercharge Character.ai’s Billion-Query AI Model

A billion daily queries, halved token costs, and faster responses: see how AMD's Instinct MI300X and MI325X GPUs, paired with DigitalOcean's platform-level optimizations, redefined AI inference at scale. The secret wasn't just hardware.


DigitalOcean has collaborated with Character.ai and AMD to enhance AI inference performance on its cloud platform. The project focused on deploying a high-capacity model to handle over a billion daily user queries. Early results show doubled throughput, halved costs, and improved response times compared to traditional GPU setups.

The project centered on running a Qwen 235-billion-parameter mixture-of-experts model for Character.ai on AMD Instinct MI300X and MI325X GPUs. Engineers from all three companies worked together to fine-tune the system, including adjustments to ROCm, vLLM, and AMD's AITER framework. Their goal was to manage latency-sensitive conversational workloads while maintaining strict response-time targets.
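For context, serving a mixture-of-experts model of this size with vLLM generally means sharding it across several accelerators. The sketch below shows what such a deployment can look like in vLLM's Python API; the checkpoint name, parallelism degree, context length, and sampling settings are illustrative assumptions, not the production configuration described in the article.

```python
from vllm import LLM, SamplingParams

# Hypothetical serving setup: the exact checkpoint, GPU count, and
# context budget used in production are not public.
llm = LLM(
    model="Qwen/Qwen3-235B-A22B",  # assumed 235B-parameter MoE checkpoint
    tensor_parallel_size=8,         # shard weights across 8 MI300X-class GPUs
    max_model_len=8192,             # assumed per-request context window
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

# Run one conversational prompt through the sharded model.
outputs = llm.generate(["Hi there! What can you do?"], sampling)
print(outputs[0].outputs[0].text)
```

On ROCm builds of vLLM, this same API targets AMD GPUs; kernel-level tuning of the kind described above happens beneath this interface.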

DigitalOcean's approach went beyond hardware selection. The team implemented hardware-aware scheduling and inference runtime optimizations, uncovering performance gains that generic configurations often miss. By reducing communication overhead and staying within latency budgets, they directly improved cost efficiency.
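As a rough illustration of latency-budget-aware routing (not DigitalOcean's actual scheduler, whose internals are not public), the sketch below sends each request to the least-loaded replica and sheds load when even the best option would exceed the response-time budget:

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical sketch: replica names and wait-time estimates are
# invented for illustration only.

@dataclass(order=True)
class Replica:
    est_wait_ms: float              # estimated queueing delay on this GPU
    name: str = field(compare=False)

def dispatch(replicas: list[Replica], budget_ms: float) -> Replica | None:
    """Route to the least-loaded replica, or shed load if even the
    best replica would blow the per-request latency budget."""
    heapq.heapify(replicas)         # min-heap ordered by est_wait_ms
    best = replicas[0]
    if best.est_wait_ms > budget_ms:
        return None                 # shed rather than miss the SLO
    return best

pool = [Replica(35.0, "mi300x-0"), Replica(12.0, "mi300x-1")]
target = dispatch(pool, budget_ms=50.0)
print(target.name if target else "shed")  # -> mi300x-1
```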

A key outcome was the ability to process twice the workload at half the token cost, without sacrificing speed. Character.ai confirmed that p90 time-to-first-token and time-per-output-token remained within required thresholds, even under heavy demand. The success highlights how platform-level co-optimization can outperform standard GPU deployments.
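For reference, the two metrics Character.ai cites are typically computed from per-request timestamps, as in the sketch below; the record layout here is a hypothetical example, not their telemetry schema.

```python
import statistics

def p90(values: list[float]) -> float:
    """90th percentile via linear interpolation."""
    return statistics.quantiles(values, n=10)[-1]

requests = [
    # (t_submit, t_first_token, t_last_token, n_output_tokens) in seconds
    (0.00, 0.21, 1.90, 64),
    (0.05, 0.18, 2.40, 96),
    (0.10, 0.35, 3.10, 120),
    (0.12, 0.25, 2.05, 80),
]

# Time-to-first-token: delay from submission to the first streamed token.
ttft = [first - sub for sub, first, _, _ in requests]
# Time-per-output-token: average gap between tokens after the first.
tpot = [(last - first) / (tokens - 1)
        for _, first, last, tokens in requests]

print(f"p90 TTFT: {p90(ttft) * 1000:.0f} ms")
print(f"p90 TPOT: {p90(tpot) * 1000:.1f} ms/token")
```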

The deployment demonstrates that cloud providers can now compete on inference performance rather than just hardware specs. By combining AMD's accelerators with tailored software optimizations, DigitalOcean delivered measurable gains in throughput, cost, and latency. The results suggest that diversified GPU strategies, paired with deep platform tuning, can meet production-grade AI workload demands.
