Release Notes

AI 2.5.0

New and Optimized Features

Alauda AI Platform Control Plane

Alauda AI Platform Control Plane adds unified component management for Alauda AI. Administrators can use the Alauda AI Operator to manage deployment and upgrade workflows for supported components, which reduces component maintenance effort. This capability is currently an alpha feature and supports only operator-based components; it does not manage cluster plugin or Helm chart components.

JobSet Operator

JobSet Operator enables users to run coordinated distributed workloads for AI, machine learning, and HPC scenarios. Users can define a group of related Kubernetes Jobs as one JobSet, so that distributed training and batch workloads can handle multiple workers, stable networking, and failure recovery as a single workload.
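
As an illustration of the grouping described above, the following minimal sketch builds a JobSet manifest in Python. The `jobset.x-k8s.io/v1alpha2` API version and field layout follow the upstream JobSet project and should be verified against the version served in your cluster. The names `demo-training`, `workers`, and `trainer`, the image, and the worker count are hypothetical placeholders.

```python
import json


def build_jobset(name: str, workers: int, image: str) -> dict:
    """Return a JobSet manifest with one replicated job group (illustrative only)."""
    return {
        "apiVersion": "jobset.x-k8s.io/v1alpha2",  # assumed upstream API version
        "kind": "JobSet",
        "metadata": {"name": name},
        "spec": {
            "replicatedJobs": [
                {
                    "name": "workers",  # hypothetical group name
                    "replicas": 1,
                    "template": {  # a standard Kubernetes Job template
                        "spec": {
                            "parallelism": workers,
                            "completions": workers,
                            "completionMode": "Indexed",
                            "backoffLimit": 0,
                            "template": {
                                "spec": {
                                    "restartPolicy": "Never",
                                    "containers": [
                                        {"name": "trainer", "image": image}
                                    ],
                                }
                            },
                        }
                    },
                }
            ]
        },
    }


# Hypothetical values, used only to show the resulting structure.
manifest = build_jobset("demo-training", workers=4, image="example.com/trainer:latest")
print(json.dumps(manifest, indent=2))
```

Because JSON is valid YAML, the printed output can be saved to a file and applied with `kubectl apply -f`.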

Llama Stack Milvus Vector Store Integration

Llama Stack supports Milvus-backed vector stores for agent and retrieval workflows. Administrators can configure a reachable Milvus endpoint and optional authentication for the Llama Stack server, and users can create vector stores with provider_id="milvus-remote" from the client API.
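
The client-side call might look like the sketch below. It assumes the client object exposes an OpenAI-compatible `vector_stores.create` method that accepts an `extra_body` dictionary; that shape is an assumption, not something this release note confirms, so check it against your client version. The helper name `create_milvus_vector_store` is hypothetical.

```python
from typing import Any


def create_milvus_vector_store(client: Any, name: str) -> Any:
    """Create a vector store routed to the Milvus provider.

    Assumes `client.vector_stores.create` exists and accepts `extra_body`
    (an assumption about the client API; verify before use).
    """
    return client.vector_stores.create(
        name=name,
        # Provider ID as given in the release note.
        extra_body={"provider_id": "milvus-remote"},
    )
```

Depending on the server configuration, additional fields such as an embedding model may also be required in the request.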

vLLM Expert Parallel Inference

vLLM Expert Parallel support provides a configuration path for serving Mixture-of-Experts models with expert parallelism. Users can enable expert parallel settings in an inference service YAML when using a compatible vLLM runtime and a model that supports this serving pattern.
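
As a rough sketch of the settings involved, the helper below assembles the vLLM startup arguments that would go into the `args` list of the runtime container. The flags `--tensor-parallel-size` and `--enable-expert-parallel` come from upstream vLLM; the helper name and the parallel size are hypothetical, and the exact place these arguments belong in the inference service YAML depends on the runtime definition.

```python
def vllm_expert_parallel_args(tensor_parallel_size: int) -> list[str]:
    """Return vLLM startup arguments that turn on expert parallelism."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    return [
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--enable-expert-parallel",  # distribute MoE experts across devices
    ]


# Hypothetical value: four accelerators on one node.
ep_args = vllm_expert_parallel_args(4)
print(" ".join(ep_args))
```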

vLLM Speculative Decoding

vLLM Speculative Decoding provides guidance for enabling speculative decoding on vLLM inference services. Users can configure supported methods such as N-gram or EAGLE-3 in the vLLM startup arguments and validate the effect with representative workloads before using the configuration in production.
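
The startup arguments can be sketched as follows. Upstream vLLM accepts a JSON string through `--speculative-config`; the method names `ngram` and `eagle3` and the keys shown are taken from upstream vLLM and should be checked against the runtime version in use. The helper name, token counts, and draft model path are hypothetical.

```python
import json
from typing import Optional


def speculative_config_args(
    method: str,
    num_speculative_tokens: int,
    draft_model: Optional[str] = None,
    prompt_lookup_max: Optional[int] = None,
) -> list[str]:
    """Return the vLLM startup arguments for one speculative decoding setup."""
    config: dict = {
        "method": method,
        "num_speculative_tokens": num_speculative_tokens,
    }
    if draft_model is not None:
        config["model"] = draft_model  # draft weights, needed for EAGLE-3
    if prompt_lookup_max is not None:
        config["prompt_lookup_max"] = prompt_lookup_max  # N-gram window size
    return ["--speculative-config", json.dumps(config)]


# Hypothetical settings; tune them with representative workloads.
ngram_args = speculative_config_args("ngram", 5, prompt_lookup_max=4)
eagle3_args = speculative_config_args("eagle3", 3, draft_model="/models/eagle3-draft")
print(ngram_args)
print(eagle3_args)
```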

Kubeflow and MLflow Storage Configuration

Kubeflow Pipelines and MLflow support external storage configuration. Administrators can configure external object storage for pipeline artifacts and external PostgreSQL storage for MLflow metadata.
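
For the MLflow side, a minimal sketch of the connection string is shown below. Upstream MLflow takes a SQLAlchemy-style URI for its backend store; how Alauda AI exposes this setting is not described here, so the helper name, host, database, and credentials are hypothetical placeholders.

```python
from urllib.parse import quote


def postgres_backend_uri(
    user: str, password: str, host: str, port: int, database: str
) -> str:
    """Build a PostgreSQL URI, percent-encoding the user name and password."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials; special characters in the password are escaped.
uri = postgres_backend_uri("mlflow", "p@ss/word", "postgres.example.internal", 5432, "mlflow")
print(uri)  # postgresql://mlflow:p%40ss%2Fword@postgres.example.internal:5432/mlflow
```

Percent-encoding matters because characters such as `@` or `/` in a password would otherwise be parsed as part of the host or path.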

Alauda Build of Envoy AI Gateway

Alauda Build of Envoy AI Gateway has been refactored from a cluster plugin into an operator-managed component so that it can be managed by the Alauda AI Operator. The delivered component is updated to v0.4.2, which aligns with the community release and includes component bug fixes.

Deprecated Features

None.

Fixed Issues

Fixed an issue where an inference service deployed across multiple cards in an Ascend NPU environment failed to become Ready because HCCL initialization failed during startup. The vLLM Ascend multi-card scenario did not adapt to both root and non-root operation modes, which caused NPU multi-card communication to initialize abnormally. Support for both modes has been added.
Fixed an issue where, when the Workbench component was not installed, opening the Cluster Storage page returned a 404 error, the PVC list failed to load, and entry points such as Create and Edit were blocked.

Known Issues

No issues in this release.
