Release Notes

AI 2.1.0

New and Enhanced Features

Image-Based Model Support

The platform now supports deploying models using container images. By leveraging the ModelCar capability in KServe, users can package models as OCI container images and create model inference services directly from these images without downloading model artifacts at runtime.

Using OCI containers for model storage and distribution provides several benefits:

  • Reduced startup time – Model artifacts are packaged within the container image, avoiding repeated downloads when deploying or scaling inference services.
  • Lower disk space usage – Container image layer reuse reduces redundant storage of identical model files across nodes.
  • Improved inference performance stability – Images can be pre-fetched and cached on nodes, enabling faster and more predictable service startup.

This capability standardizes the model deployment workflow and leverages the container image ecosystem for efficient model versioning, distribution, and lifecycle management.
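
As a sketch, an inference service that pulls its model from an OCI image might look like the following (the service name, model format, and registry path are placeholders, not values from this release):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn        # placeholder; use the runtime format your model needs
      # oci:// tells KServe to mount the model from a container image
      # (ModelCar) instead of downloading artifacts at startup
      storageUri: oci://registry.example.com/models/example-model:1.0
```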

Model Compression Toolkit

A Model Compression Toolkit, built on the llm-compressor library, has been introduced to provide compression capabilities for large language models.

The toolkit supports advanced optimization techniques such as weight quantization, activation quantization, and model sparsification. These techniques enable users to reduce the computational and memory requirements of large models while maintaining model quality. Compression jobs can be executed within Notebook environments or automated pipelines, helping organizations reduce hardware costs and improve inference performance.
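
To illustrate the core idea behind weight quantization (this is a minimal pure-Python sketch of the technique, not llm-compressor's API), each float weight is mapped to a small integer plus a shared scale factor:

```python
def quantize_int8(weights):
    """Map float weights to int8 range [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the quantized form."""
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.08, 0.9991, -0.31]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# int8 storage needs 1 byte per weight vs 4 for float32 (~4x smaller),
# at the cost of a rounding error of at most scale/2 per weight
```

In practice the toolkit applies this per layer (and adds activation quantization and sparsification), but the memory/accuracy trade-off is the same as in this sketch.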

Event-Driven Autoscaling

Event-driven autoscaling capabilities have been introduced through integration with KEDA, enabling model inference services to automatically scale based on real-time workload signals.

Unlike traditional autoscaling strategies that rely solely on CPU or GPU utilization, event-driven autoscaling can react to metrics such as request rate, queue length, or message events. This enables more responsive scaling of inference services and improves overall resource efficiency and system stability.
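
For illustration, a KEDA ScaledObject that scales an inference deployment on request rate from Prometheus might look like this (the target name, server address, query, and threshold are placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-inference-scaler
spec:
  scaleTargetRef:
    name: example-inference-deployment   # the workload to scale
  minReplicaCount: 0                     # allow scale-to-zero when idle
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.example.svc:9090
        query: sum(rate(http_requests_total{service="example-inference"}[1m]))
        threshold: "20"                  # add a replica per 20 req/s
```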

Notebook Base Image Library

A new Notebook base image library has been added to provide prebuilt development environments for data science and AI workloads.

These images include commonly used machine learning frameworks, deep learning libraries, and data processing tools, allowing users to quickly start Notebook environments for experimentation and model development while reducing environment setup overhead.

TrustyAI Drift Detection

The platform introduces model drift detection capabilities powered by TrustyAI.

This feature continuously monitors inference data distributions and model behavior to detect potential data drift or prediction drift in production environments. It helps teams identify model performance degradation early and maintain the reliability of deployed AI systems.
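
To illustrate the idea behind data drift detection (this is a minimal pure-Python sketch, not TrustyAI's API), a common approach compares a baseline feature distribution against live data with a statistic such as the Population Stability Index (PSI):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0      # guard degenerate (constant) data
    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # floor at a small epsilon so empty buckets don't break log()
        return [max(c / len(xs), 1e-6) for c in counts]
    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [x / 100 for x in range(100)]          # training-time values
production = [0.5 + x / 200 for x in range(100)]  # drifted live values
# identical data yields PSI near 0; a shift like this yields a large PSI
# (a common alerting rule of thumb is PSI > 0.2)
```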

Safety Guardrails

Safety guardrails for generative AI applications have been introduced through TrustyAI.

This feature enables policy-based monitoring and filtering of model outputs, allowing organizations to detect and restrict unsafe or non-compliant content generated by AI models. It helps improve the safety, governance, and compliance of generative AI services.

Language Model Evaluation Harness

A language model evaluation harness has been introduced to support standardized evaluation of large language models.

The evaluation framework supports multiple benchmark tasks and datasets, enabling users to systematically measure model performance and make data-driven decisions when selecting or optimizing models.

Deprecated Features

None.

Fixed Issues

  • After a model is deleted, the list page does not reflect the deletion immediately; the deleted model briefly remains in the list.
  • When viewing an AI page in a namespace that is not under platform management, you cannot switch to a page in a namespace that is under management.

Known Issues

  • When the platform access address uses a self-signed certificate, updating any other access address of the platform triggers re-issuance of that certificate. Until the new certificate is synchronized to the inference service's model download program, model downloads will fail.
    Temporary Solution: The certificate for the platform access address is synchronized automatically in the background. If model downloads fail with certificate verification errors, wait a few minutes and then restart the inference service.
  • When VictoriaMetrics is used to collect monitoring data for inference services running in Serverless mode, the services cannot scale down to zero.
  • When deploying an inference service, if the OCI image source is configured by editing the YAML directly, any subsequent update submitted through the UI form invalidates the model's storageUri field, and the model fails to start.
    Temporary Solution: For inference services configured to pull from OCI via YAML, make any updates through the page's YAML editor. Alternatively, after updating through the form, verify the storageUri field in the YAML editor, correct it if necessary, and then submit the changes.