Release Notes
AI 2.5.0
New and Optimized Features
Alauda AI Platform Control Plane
Alauda AI Platform Control Plane adds unified component management for Alauda AI. Administrators can use the Alauda AI Operator to manage deployment and upgrade workflows for supported components, reducing component maintenance effort. This capability is currently an alpha feature and supports only operator-based components; cluster plugin and Helm chart components are not managed by this capability.
JobSet Operator
JobSet Operator enables users to run coordinated distributed workloads for AI, machine learning, and HPC scenarios. Users can define a group of related Kubernetes Jobs as one JobSet, so that a distributed training or batch workload can handle its multiple workers, stable networking, and failure recovery as a single unit.
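To illustrate grouping workers into one JobSet, the following minimal sketch builds a manifest as a Python dictionary and prints it as JSON. It assumes the upstream `jobset.x-k8s.io/v1alpha2` API; the helper `build_jobset`, the names `demo-training`, `workers`, and `trainer`, and the image reference are hypothetical placeholders, and the API version installed by the operator may differ.

```python
import json


def build_jobset(name: str, workers: int, image: str) -> dict:
    """Return a JobSet manifest with one replicated job running `workers` pods.

    Hypothetical helper for illustration only; not part of any product API.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")

    # Pod template shared by every worker (container name and image are placeholders).
    pod_template = {
        "spec": {
            "restartPolicy": "Never",
            "containers": [{"name": "trainer", "image": image}],
        }
    }
    # A standard Kubernetes Job spec; the JobSet controller creates Jobs from it.
    job_template = {
        "spec": {
            "parallelism": workers,
            "completions": workers,
            "backoffLimit": 0,
            "template": pod_template,
        }
    }
    return {
        "apiVersion": "jobset.x-k8s.io/v1alpha2",  # assumed upstream API version
        "kind": "JobSet",
        "metadata": {"name": name},
        "spec": {
            # Allow the whole JobSet to be recreated up to 3 times on failure.
            "failurePolicy": {"maxRestarts": 3},
            "replicatedJobs": [
                {"name": "workers", "replicas": 1, "template": job_template}
            ],
        },
    }


manifest = build_jobset("demo-training", workers=4, image="example.com/trainer:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.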
Llama Stack Milvus Vector Store Integration
Llama Stack supports Milvus-backed vector stores for agent and retrieval workflows. Administrators can configure a reachable Milvus endpoint and optional authentication for the Llama Stack server, and users can create vector stores with provider_id="milvus-remote" from the client API.
vLLM Expert Parallel Inference
vLLM Expert Parallel support provides a configuration path for serving Mixture-of-Experts models with expert parallelism. Users can enable expert parallel settings in an inference service YAML when using a compatible vLLM runtime and a model that supports this serving pattern.
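As a rough illustration of the settings involved, the sketch below assembles vLLM startup arguments that turn on expert parallelism. It assumes the upstream vLLM flags `--enable-expert-parallel` and `--tensor-parallel-size`; the helper `expert_parallel_args` and the model name are hypothetical, and the exact fields used to pass these arguments in an inference service YAML depend on the runtime in use.

```python
import shlex
from typing import List


def expert_parallel_args(model: str, tensor_parallel_size: int) -> List[str]:
    """Build a vLLM command line with expert parallelism enabled.

    Hypothetical helper; the model must be a Mixture-of-Experts model
    that the chosen vLLM runtime can serve with expert parallelism.
    """
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    return [
        "vllm",
        "serve",
        model,
        "--tensor-parallel-size",
        str(tensor_parallel_size),
        # Shard MoE expert layers across devices instead of splitting each expert.
        "--enable-expert-parallel",
    ]


ep_args = expert_parallel_args("example-org/example-moe-model", 4)
print(shlex.join(ep_args))
```

Each list element would typically become one entry in the container's argument list.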
vLLM Speculative Decoding
vLLM Speculative Decoding provides guidance for enabling speculative decoding on vLLM inference services. Users can configure supported methods such as N-gram or EAGLE-3 in the vLLM startup arguments and validate the effect with representative workloads before using the configuration in production.
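The startup arguments can be sketched as follows, assuming the upstream vLLM `--speculative-config` flag, which takes a JSON string. The helper `speculative_args` and the draft model name are hypothetical, and the accepted keys (`method`, `num_speculative_tokens`, `prompt_lookup_max`, `model`) may vary between vLLM versions, so check them against the runtime actually deployed.

```python
import json
from typing import List, Optional


def speculative_args(
    method: str,
    num_speculative_tokens: int,
    draft_model: Optional[str] = None,
    prompt_lookup_max: Optional[int] = None,
) -> List[str]:
    """Build the vLLM arguments for N-gram or EAGLE-3 speculative decoding.

    Hypothetical helper for illustration only.
    """
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be at least 1")

    config = {"method": method, "num_speculative_tokens": num_speculative_tokens}
    if method == "ngram":
        # N-gram lookup proposes tokens from the prompt; no draft model is needed.
        if prompt_lookup_max is not None:
            config["prompt_lookup_max"] = prompt_lookup_max
    elif method == "eagle3":
        # EAGLE-3 needs a separate draft head trained for the target model.
        if draft_model is None:
            raise ValueError("eagle3 requires a draft model")
        config["model"] = draft_model
    else:
        raise ValueError(f"unsupported method in this sketch: {method}")

    return ["--speculative-config", json.dumps(config)]


ngram_args = speculative_args("ngram", 5, prompt_lookup_max=4)
eagle_args = speculative_args("eagle3", 3, draft_model="example-org/example-eagle3-head")
print(ngram_args)
print(eagle_args)
```

Acceptance rate and latency depend heavily on the workload, which is why the configuration should be measured on representative requests before it is used in production.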
Kubeflow and MLflow Storage Configuration
Kubeflow Pipelines and MLflow support external storage configuration. Administrators can configure external object storage for pipeline artifacts and external PostgreSQL storage for MLflow metadata.
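For the MLflow side, the sketch below builds a PostgreSQL connection URI in the SQLAlchemy style that MLflow accepts as a backend store URI. The helper `postgres_backend_uri`, the host, the database name, and the credentials are hypothetical placeholders; the platform fields in which this value is actually entered are not described here.

```python
from urllib.parse import quote


def postgres_backend_uri(
    user: str, password: str, host: str, port: int, database: str
) -> str:
    """Return a postgresql:// URI with the user and password percent-encoded.

    Hypothetical helper; encoding matters because characters such as
    '@' or '/' in a password would otherwise break URI parsing.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


uri = postgres_backend_uri(
    "mlflow", "p@ss/word", "postgres.example.internal", 5432, "mlflow"
)
print(uri)
```

In practice the password would come from a Kubernetes Secret rather than being written inline.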
Alauda Build of Envoy AI Gateway
Alauda Build of Envoy AI Gateway has been refactored from a cluster plugin into an operator-managed component so that the Alauda AI Operator can manage it. The delivered component is updated to v0.4.2, in line with the community release, and includes component bug fixes.
Deprecated Features
None.
Fixed Issues
- Fixed an issue where an inference service deployed across multiple cards in an Ascend NPU environment failed to become Ready because HCCL initialization failed during startup. The vLLM Ascend multi-card path did not handle both root and non-root operation modes, which caused NPU multi-card communication to initialize abnormally. Support for both modes has been added.
- Fixed an issue where, when the Workbench component was not installed, opening the Cluster Storage page returned a 404 error, the PVC list failed to load, and actions such as Create and Edit were unavailable.
Known Issues
No issues in this release.