Harbor Webhook Delivery Is Delayed or Not Sent

Problem Description

After an event occurs in a Harbor project (for example, pushing an image), the configured Webhook endpoint does not receive the notification in a timely manner. Symptoms include:

  • Webhook consumers (such as artifact triggers in a CI/CD pipeline) react with long delays (minutes or longer) or not at all.
  • Harbor UI → Job Service Dashboard shows a growing number of WEBHOOK jobs in the pending queue.

Root Cause

Harbor delivers Webhooks through the following path:

  1. harbor-core enqueues a Webhook job into Redis for each event.
  2. harbor-jobservice consumes the queue and calls the configured Webhook endpoints.
  3. If the HTTP call fails, harbor-jobservice retries up to 10 times before dropping the event.
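
If you have shell access to Harbor's Redis, you can watch the backlog directly. The commands below are a sketch: the Pod name, database index, and key layout assume a default Helm-style deployment (jobservice queues live under the {harbor_job_service_namespace} prefix), so scan for the real key names first.

# Discover the queue keys - names vary with the configured jobservice namespace.
kubectl -n <NAMESPACE> exec <RELEASE>-harbor-redis-0 -- redis-cli -n 1 --scan --pattern '*jobs*'

# Check the depth of the WEBHOOK queue (key name and DB index assumed from defaults).
kubectl -n <NAMESPACE> exec <RELEASE>-harbor-redis-0 -- redis-cli -n 1 LLEN '{harbor_job_service_namespace}:jobs:WEBHOOK'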

Two common conditions cause the pipeline to back up:

  • Unreachable Webhook endpoints piling up retries. When one or more endpoints configured on a project are no longer reachable (a decommissioned service, a wrong URL, a firewall change), every event targeting them burns through all 10 retries before being dropped. With enough invalid endpoints, jobservice spends most of its time retrying dead targets, and legitimate Webhooks queue up behind them.
  • harbor-jobservice being restarted (for example, OOM-killed). If the Pod repeatedly restarts under memory pressure, in-flight jobs are interrupted and the queue keeps growing.

Troubleshooting

Step 1 - Inspect the Job Service Dashboard

In the Harbor UI, open Job Service Dashboard and look at jobs with Type = WEBHOOK. Pay attention to:

  • Pending Count — the number of Webhook jobs waiting to be sent.
  • Latency — how long the oldest pending job has been waiting.

A steadily growing Pending Count is a strong signal that retries are piling up.
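
The same numbers are exposed by the jobservice dashboard API (available since Harbor 2.6), which is handier for scripting or alerting; <HARBOR_HOST> and the admin credentials are placeholders:

curl -s -u admin:<PASSWORD> "https://<HARBOR_HOST>/api/v2.0/jobservice/queues" | jq '.[] | select(.job_type == "WEBHOOK")'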

Step 2 - Check jobservice health

Confirm harbor-jobservice is not crash-looping:

kubectl -n <NAMESPACE> get pods -l component=jobservice
kubectl -n <NAMESPACE> describe pod <RELEASE>-harbor-jobservice-xxxxx

If you see frequent restarts or an OOMKilled reason in the Pod status, the jobservice is under-resourced. Increase its CPU/memory request and limit in the Harbor CR.
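
To see restart counts and the last termination reason for every jobservice Pod at a glance:

kubectl -n <NAMESPACE> get pods -l component=jobservice \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'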

Step 3 - Check jobservice logs for failing endpoints

kubectl -n <NAMESPACE> logs <RELEASE>-harbor-jobservice-xxxxx | grep -i webhook

Repeated failures against the same URL indicate an unreachable endpoint that should be cleaned up.
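
To rank which endpoints fail most often, a rough one-liner such as the following can help (a sketch only; adjust the grep patterns to the log format of your Harbor version):

kubectl -n <NAMESPACE> logs <RELEASE>-harbor-jobservice-xxxxx \
  | grep -i webhook | grep -iE 'error|fail' \
  | grep -oE 'https?://[^"[:space:]]+' | sort | uniq -c | sort -rn | head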

Step 4 - Validate endpoints manually

From a Pod whose network access resembles that of jobservice, test each Webhook URL configured on the affected project:

curl -v -X POST <WEBHOOK_URL>

Any URL that times out or refuses the connection from here is a candidate for removal. An HTTP error response (even a 4xx to this empty test POST) at least proves the endpoint is reachable.
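
If no such Pod is at hand, a throwaway curl Pod in the Harbor namespace usually approximates jobservice's network position well (the Pod name and image here are arbitrary choices, not Harbor components):

kubectl -n <NAMESPACE> run webhook-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -m 10 -v -X POST <WEBHOOK_URL>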

Solution

1. Remove invalid Webhook configurations

In Harbor UI, open the affected project → Webhooks and delete every endpoint identified as unreachable in Step 4. This stops new events from feeding the retry loop.
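
The same cleanup can be scripted against the Harbor API: list the project's Webhook policies, then delete the dead ones by ID. <PROJECT_NAME_OR_ID> and the credentials are placeholders; depending on your Harbor version, addressing the project by name may require the X-Is-Resource-Name: true header.

curl -s -u admin:<PASSWORD> "https://<HARBOR_HOST>/api/v2.0/projects/<PROJECT_NAME_OR_ID>/webhook/policies"
curl -s -u admin:<PASSWORD> -X DELETE \
  "https://<HARBOR_HOST>/api/v2.0/projects/<PROJECT_NAME_OR_ID>/webhook/policies/<POLICY_ID>"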

2. Drain the stuck queue

In Job Service Dashboard, select the pending WEBHOOK jobs and click STOP. Be aware that stopped jobs are discarded rather than re-delivered, so any legitimate notifications still in the queue are lost. Afterwards, the Pending Count for WEBHOOK should drop to 0 and new events will flow again.
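
On Harbor 2.6 and later, the same queue-level stop can be issued through the API, which is useful when the queue is too large to manage from the UI:

curl -s -u admin:<PASSWORD> -X PUT "https://<HARBOR_HOST>/api/v2.0/jobservice/queues/WEBHOOK" \
  -H "Content-Type: application/json" -d '{"action": "stop"}'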

3. Restart jobservice if the queue does not recover

If the queue is still stuck after stopping the jobs, restart harbor-jobservice:

kubectl -n <NAMESPACE> rollout restart deployment <RELEASE>-harbor-jobservice
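
Wait for the rollout to complete before re-checking the dashboard:

kubectl -n <NAMESPACE> rollout status deployment <RELEASE>-harbor-jobservice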

4. Right-size jobservice resources

If Step 2 showed the Pod being OOM-killed, raise the resource requests and limits for jobservice in the Harbor CR so it has enough memory for its working set. A small Harbor under active use typically needs at least 4 CPU / 8 Gi memory for jobservice to remain stable.
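
As a sketch, the change can be applied with kubectl patch. The field path below (spec.jobservice.resources) is an assumption that depends on your Harbor operator's CRD schema, so verify it against your CR before applying:

# Field path is operator-dependent - confirm with `kubectl explain harbor.spec` first.
kubectl -n <NAMESPACE> patch harbor <RELEASE> --type merge -p '
spec:
  jobservice:
    resources:
      requests: {cpu: "4", memory: 8Gi}
      limits: {cpu: "4", memory: 8Gi}'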

Notes

  • Even with these mitigations, Webhook delivery is best-effort. Consumers must not assume every event will be delivered: the retry budget is finite (10 attempts), and events are dropped after that.
  • When designing integrations that rely on Harbor Webhooks, prefer idempotent consumers and periodically reconcile against the Harbor API rather than trusting a single notification, as sketched below.
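
For example, a consumer can periodically list artifacts pushed since its last successful sync instead of trusting individual notifications. A sketch, where <LAST_SYNC> is a placeholder timestamp and the q range syntax follows Harbor's documented query parameter:

curl -s -u admin:<PASSWORD> -G "https://<HARBOR_HOST>/api/v2.0/projects/<PROJECT>/repositories/<REPO>/artifacts" \
  --data-urlencode 'q=push_time=[<LAST_SYNC>~]' --data-urlencode 'page_size=100'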