etcd Backup and Restore
The etcd service is a distributed key-value store responsible for storing cluster configuration information. It is deployed on all control plane nodes of the cluster.
After installing the Alauda Container Platform Cluster Enhancer plugin, an EtcdBackupConfiguration resource is automatically created for the cluster. It describes the backup data sources (control plane nodes, backup paths), the backup storage locations, the backup methods, and more. Each execution of the backup policy generates a new backup record, so you can back up the cluster configuration on demand or automatically on a periodic schedule.
To enable etcd backup:
- Download Alauda Container Platform Cluster Enhancer from the Customer Portal.
- Upload the package to the platform.
- Install the plugin on your cluster.
After installation, an EtcdBackupConfiguration resource is automatically created.
How it works
- etcd backup is provided by Alauda Container Platform Cluster Enhancer.
- Both local storage and S3-compatible object storage are supported. By default, backups are stored locally in /cpaas. Configuring S3 storage creates an additional copy in the S3 bucket; local backups continue to be generated.
- For clusters running on Immutable OS, S3 storage is required (local storage is not supported).
Configuration Reference
You can configure the EtcdBackupConfiguration resource to customize backup schedules, retention policies, and storage options.
Schedule and Retention
- schedule: Defines the backup frequency using standard cron syntax. Example: 0 0 * * * (run the backup daily at midnight).
- localStorage: Configures local backup storage.
  - path: The directory on the host where backups are stored. Default is /cpaas.
  - ttl: The retention period for backup files, in seconds. Backups older than this duration are automatically deleted. Example: 7776000 (approximately 90 days).
- paused: Set to true to temporarily suspend automatic backups without deleting the configuration.
Example configuration:
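A minimal sketch, assuming the field layout described above; the apiVersion shown is hypothetical and may differ in your platform version:

```yaml
apiVersion: etcd.alauda.io/v1  # hypothetical group/version; check your installed CRD
kind: EtcdBackupConfiguration
metadata:
  name: etcd-backup
spec:
  schedule: "0 0 * * *"   # run daily at midnight
  paused: false
  localStorage:
    path: /cpaas          # default local backup directory
    ttl: 7776000          # retention in seconds (~90 days)
```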
Viewing Backup Records
To view etcd backup records, you can use the platform UI or the command line.
Using the Platform UI
- In the left navigation bar, click Operation Center > Monitor > Dashboards.
- Click Switch in the upper right corner of the page.
- Click Cluster > etcd backup to view the etcd backup records.
Using the CLI
You can verify backup status and history by checking the status field of the EtcdBackupConfiguration resource:
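For example (the resource may be cluster-scoped or namespaced depending on your platform version):

```bash
kubectl get etcdbackupconfiguration -o yaml
```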
The output contains a status.records list with details for each backup, including:
- backupTimestamp: Time the backup was created.
- fileName: Name of the backup file (e.g., snapshot-etcd-<date>-<time>-<ip>.tar).
- result: Outcome of the backup operation (e.g., Success).
S3 Backup Configuration
To enable S3 storage for etcd backups, follow these steps:
Prerequisites
- Alauda Container Platform Cluster Enhancer is installed on the cluster.
Step 1: Create S3 Secret
Prepare your S3 access credentials and create a Kubernetes secret in the cpaas-system namespace:
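For example (the secret name and key names below are illustrative; use the names your platform version expects):

```bash
kubectl create secret generic etcd-backup-s3 \
  --namespace cpaas-system \
  --from-literal=access-key-id=<ACCESS_KEY_ID> \
  --from-literal=secret-access-key=<SECRET_ACCESS_KEY>
```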
Step 2: Configure EtcdBackupConfiguration
Modify the EtcdBackupConfiguration resource to add the remoteStorage field with S3 configuration:
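A sketch of what this might look like; the field names under remoteStorage are assumptions based on common S3 client configuration, not a confirmed schema:

```yaml
spec:
  remoteStorage:
    s3:
      endpoint: https://s3.example.com   # your S3-compatible endpoint
      bucket: etcd-backups               # target bucket
      secretName: etcd-backup-s3         # secret created in Step 1
```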
Step 3: Verify Backup
Trigger a manual etcd backup to verify the configuration:
After the backup completes, verify that backup files exist in your S3 bucket.
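For example, with the AWS CLI (any S3-compatible client works; the bucket and endpoint are placeholders):

```bash
aws s3 ls s3://etcd-backups/ --endpoint-url https://s3.example.com
```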
etcd Restore
Warning:
- This operation performs a disaster recovery of the etcd cluster. It will overwrite the existing data. Ensure you have a valid backup snapshot before proceeding.
- This procedure entails significant risks. If you are unsure about the operation, please contact technical support.
- During the recovery process, the Kubernetes API Server will be unavailable.
Prerequisites
- The Kubernetes cluster is deployed using hostnames (kubectl get node shows hostnames as node names).
- An etcd backup snapshot is available.
- The cluster is malfunctioning due to the failure of etcd nodes (e.g., more than half of the control plane nodes are down).
- This recovery procedure is specifically designed for a 3-node control plane cluster. If your cluster has 5 or more control plane nodes, please contact technical support for assistance.
Step 1: Backup Original Data and Modify etcd Configuration
Execute the following commands on all control plane nodes:
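A hedged sketch, assuming a kubeadm-style static pod layout (the manifests-stopped directory name is an arbitrary choice):

```bash
# 1. Add --initial-cluster-state=existing to the etcd args in
#    /etc/kubernetes/manifests/etcd.yaml, matching the indentation
#    of the neighboring "- --" flags:
#      - --initial-cluster-state=existing
# 2. Stop the etcd static pod by moving its manifest out of the manifests directory.
mkdir -p /etc/kubernetes/manifests-stopped
mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/manifests-stopped/
# 3. Preserve the original data directory.
mv /var/lib/etcd /var/lib/etcd.bak
```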
Note: Verify the indentation of --initial-cluster-state=existing in /etc/kubernetes/manifests/etcd.yaml.
Step 2: Copy Backup Snapshot
Copy the latest etcd backup snapshot to the /tmp directory on the first control plane node and name it snapshot.db.
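Illustrative only; the archive name, location, and layout depend on your backup configuration:

```bash
tar -xf /cpaas/snapshot-etcd-<date>-<time>-<ip>.tar -C /tmp/
mv /tmp/<extracted-snapshot-file> /tmp/snapshot.db   # placeholder file name
```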
Step 3: Restore etcd
Execute the following script on the first control plane node to restore the snapshot.
Note: The following script assumes a 3-node control plane cluster. If your cluster has 5 or more nodes, please contact technical support.
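A sketch of such a script, assuming etcdctl v3 is available on the host; replace the hostnames and IPs with your control plane nodes:

```bash
#!/bin/bash
HOSTS=(etcd-1 etcd-2 etcd-3)        # hypothetical control plane hostnames
IPS=(10.0.0.1 10.0.0.2 10.0.0.3)    # hypothetical control plane IPs

CLUSTER="${HOSTS[0]}=https://${IPS[0]}:2380,${HOSTS[1]}=https://${IPS[1]}:2380,${HOSTS[2]}=https://${IPS[2]}:2380"

for i in 0 1 2; do
  host=${HOSTS[$i]}
  ip=${IPS[$i]}
  # Each restore rewrites the snapshot into a fresh data directory for one member.
  ETCDCTL_API=3 etcdctl snapshot restore /tmp/snapshot.db \
    --name "$host" \
    --initial-cluster "$CLUSTER" \
    --initial-cluster-token etcd-cluster-restore \
    --initial-advertise-peer-urls "https://$ip:2380" \
    --data-dir "/root/etcd_$host"
done
```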
After the script completes, three directories (etcd_$host) are generated in the /root directory.
Step 4: Distribute Restored Data
- Transfer the restored data directories to the corresponding control plane nodes. Use scp or a similar tool to copy the directories generated in Step 3 (/root/etcd_<hostname>) from the first node to the others. For example, to transfer to nodes named etcd-2 and etcd-3 (illustrative hostnames):
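```bash
# Illustrative; replace etcd-2/etcd-3 with your actual node hostnames.
scp -r /root/etcd_etcd-2 root@etcd-2:/root/
scp -r /root/etcd_etcd-3 root@etcd-3:/root/
```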
- Restore the data to the etcd data directory (/var/lib/etcd) on each control plane node.
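For example (a sketch; run on each control plane node, substituting that node's own hostname):

```bash
# Step 1 moved the old data directory aside, so /var/lib/etcd should not exist here.
mv /root/etcd_<hostname> /var/lib/etcd
```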
Step 5: Restart Cluster Components
Execute the following commands on all control plane nodes:
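A sketch consistent with the Step 1 layout above:

```bash
# Move the etcd manifest back so kubelet recreates the etcd static pod.
mv /etc/kubernetes/manifests-stopped/etcd.yaml /etc/kubernetes/manifests/
# Restart kubelet so the static pods are re-evaluated.
systemctl restart kubelet
```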
Step 6: Verify Recovery
- Check if the etcd cluster is healthy. You can execute this command inside any etcd pod or using the etcdctl binary on the host:
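For example, from a control plane host (certificate paths below are kubeadm defaults; adjust to your layout):

```bash
ETCDCTL_API=3 etcdctl endpoint health --cluster \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```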
- Check if the Kubernetes pods are running correctly:
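```bash
kubectl get pods -A -o wide
```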
- Restart kubelet on all nodes (both control plane and worker nodes) to ensure all components reconnect to the recovered etcd:
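```bash
systemctl restart kubelet
```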
Configuration Management
To modify the default etcd backup configuration, contact technical support for detailed configuration options and advanced settings.