Harbor Webhook Delivery Is Delayed or Not Sent
TOC
- Problem Description
- Root Cause
- Troubleshooting
  - Step 1 - Inspect the Job Service Dashboard
  - Step 2 - Check jobservice health
  - Step 3 - Check jobservice logs for failing endpoints
  - Step 4 - Validate endpoints manually
- Solution
  1. Remove invalid Webhook configurations
  2. Drain the stuck queue
  3. Restart jobservice if the queue does not recover
  4. Right-size jobservice resources
- Notes

Problem Description
After an event occurs in a Harbor project (for example, pushing an image), the configured Webhook endpoint does not receive the notification in a timely manner. Symptoms include:
- Webhook consumers (such as artifact triggers in a CI/CD pipeline) react with long delays (minutes or longer) or not at all.
- Harbor UI → Job Service Dashboard shows a growing number of WEBHOOK jobs in the pending queue.
Root Cause
Harbor delivers Webhooks through the following path:
- harbor-core enqueues a key into Redis for each event.
- harbor-jobservice consumes the queue and calls the configured Webhook endpoints.
- If the HTTP call fails, harbor-jobservice retries up to 10 times before dropping the event.
Two common conditions cause the pipeline to back up:
- Unreachable Webhook endpoints piling up retries. When one or more endpoints configured on a project are no longer reachable (a decommissioned service, a wrong URL, a changed firewall rule), every event targeting them burns all 10 retries before being dropped. With enough invalid endpoints, jobservice spends most of its time retrying dead targets, and legitimate Webhooks queue behind them.
- harbor-jobservice being restarted (for example, OOM-killed). If the Pod repeatedly restarts under memory pressure, in-flight jobs are interrupted and the queue keeps growing.
Troubleshooting
Step 1 - Inspect the Job Service Dashboard
In the Harbor UI, open Job Service Dashboard and look at jobs with Type = WEBHOOK. Pay attention to:
- Pending Count — the number of Webhook jobs waiting to be sent.
- Latency — how long the oldest pending job has been waiting.
A steadily growing Pending Count is a strong signal that retries are piling up.
Step 2 - Check jobservice health
Confirm harbor-jobservice is not crash-looping:
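A quick check — a sketch that assumes Harbor is installed in the `harbor` namespace and the jobservice Pods carry the Helm chart's default `component=jobservice` label; adjust both for your installation:

```shell
# List jobservice Pods and watch the RESTARTS column
kubectl -n harbor get pods -l component=jobservice

# Inspect the last termination state of the container;
# "Reason: OOMKilled" indicates memory pressure
kubectl -n harbor describe pod -l component=jobservice | grep -A 5 "Last State"
```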
If you see frequent restarts or an OOMKilled reason in the Pod status, the jobservice is under-resourced. Increase its CPU/memory request and limit in the Harbor CR.
Step 3 - Check jobservice logs for failing endpoints
Repeated failures against the same URL indicate an unreachable endpoint that should be cleaned up.
Step 4 - Validate endpoints manually
From a Pod that has network access similar to jobservice, test each Webhook URL configured on the affected project:
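A minimal probe per endpoint — the URL below is a placeholder; substitute each URL configured on the project:

```shell
# POST an empty JSON body with a 10-second timeout, mimicking a delivery.
# https://hooks.example.internal/harbor is a placeholder endpoint.
curl -v -o /dev/null -m 10 \
  -X POST -H "Content-Type: application/json" -d '{}' \
  https://hooks.example.internal/harbor
echo "curl exit code: $?"
```

A non-zero exit code (for example 28, the curl timeout code) or a 4xx/5xx response marks the endpoint as unreachable for the purposes of Step 4.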
Any URL that times out or returns an error from here is a candidate for removal.
Solution
1. Remove invalid Webhook configurations
In Harbor UI, open the affected project → Webhooks and delete every endpoint identified as unreachable in Step 4. This stops new events from feeding the retry loop.
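The same cleanup can be done over the Harbor v2 REST API — a sketch; `harbor.example.com`, the credentials, the project name, and the policy id `42` are all placeholders:

```shell
# List webhook policies on a project; note the "id" of each dead endpoint
curl -s -u admin:password \
  "https://harbor.example.com/api/v2.0/projects/myproject/webhook/policies"

# Delete an unreachable policy by id (42 is a placeholder)
curl -s -u admin:password -X DELETE \
  "https://harbor.example.com/api/v2.0/projects/myproject/webhook/policies/42"
```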
2. Drain the stuck queue
In Job Service Dashboard, select the WEBHOOK jobs that are pending and click STOP. After this, Pending Count for WEBHOOK should drop to 0 and legitimate events will flow again.
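Recent Harbor releases (v2.6 and later, which introduced the Job Service Dashboard) also expose the job queues over the API; a hedged sketch of draining the WEBHOOK queue in bulk — verify the endpoints against your Harbor version's API reference first:

```shell
# Inspect queue sizes per job type
curl -s -u admin:password \
  "https://harbor.example.com/api/v2.0/jobservice/queues"

# Stop all pending jobs of type WEBHOOK
curl -s -u admin:password -X PUT \
  -H "Content-Type: application/json" \
  -d '{"action": "stop"}' \
  "https://harbor.example.com/api/v2.0/jobservice/queues/WEBHOOK"
```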
3. Restart jobservice if the queue does not recover
If the queue is still stuck after stopping the jobs, restart harbor-jobservice:
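A rolling restart — the Deployment name varies with your Helm release name or operator, so check `kubectl get deploy` first; `harbor-jobservice` below is an assumption:

```shell
# Restart jobservice and wait for the rollout to complete
kubectl -n harbor rollout restart deployment harbor-jobservice
kubectl -n harbor rollout status deployment harbor-jobservice
```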
4. Right-size jobservice resources
If Step 2 showed the Pod being OOM-killed, raise the resource limits on the Harbor CR so jobservice has enough memory for its working set. A small Harbor under active use typically needs at least 4 CPU / 8 Gi memory for jobservice to remain stable.
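A sketch of the corresponding Harbor CR fragment — the exact field path for jobservice resources depends on your operator and CRD version, so verify against your CRD before applying:

```yaml
# Hypothetical Harbor CR fragment; matches the 4 CPU / 8 Gi guidance above
spec:
  jobservice:
    resources:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
```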
Notes
- Even with all of these mitigations, Webhook delivery is best-effort. Consumers must not assume every event will be delivered — the retry budget is finite (10 attempts) and events are dropped after that.
- When designing integrations that rely on Harbor Webhooks, prefer idempotent consumers and periodically reconcile against the Harbor API rather than trusting a single notification.