Metrics Collection for Tekton Components

Overview

Tekton components expose Prometheus-compatible metrics via HTTP endpoints. By deploying ServiceMonitor resources, Prometheus (or VictoriaMetrics) can automatically discover and scrape these metrics.

Namespace Note: This document uses tekton-pipelines as the default namespace for control-plane components (Pipelines, Triggers, Results, Chains). The primary exception is EventListener Services, which run in application namespaces where EventListeners are created.

If your deployment uses different namespaces, update both the commands and the namespaceSelector fields in ServiceMonitor resources below.

This document covers metrics for the following Tekton components:

  • Tekton Pipelines - PipelineRun / TaskRun execution metrics
  • Tekton Triggers - EventListener, TriggerBinding, and related resource metrics
  • Tekton Results - Run deletion and storage metrics
  • Tekton Chains - Signing and provenance metrics
  • Controller Framework - Infrastructure metrics shared by all controllers

It also covers:

  • How to configure metrics behavior via config-observability
  • How to deploy ServiceMonitor resources for scraping
  • How to verify that metrics collection is working

Prerequisites

  • Tekton control-plane components are installed and running (at minimum, the components you plan to scrape: Pipelines, Triggers, Results, and/or Chains).
  • kubectl is configured against the target cluster, and your account can create ServiceMonitor resources in the monitoring namespace.
  • A monitoring stack is deployed (Prometheus, or a compatible system such as VictoriaMetrics) that can discover and scrape ServiceMonitor resources (or your platform's equivalent scrape-discovery objects).
  • Your Prometheus/VictoriaMetrics instance is configured to discover the ServiceMonitor objects you create (namespace and label selectors must match).
  • Network policies and firewalls allow scraper pods to reach Tekton metrics ports (9090 for most control-plane services, 9000 for Triggers controller and EventListener sink).
  • If you want EventListener sink metrics, EventListeners must exist in their target namespaces and expose the http-metrics port.

Tekton Pipelines

The Tekton Pipelines component includes multiple sub-services that expose metrics on port 9090:

| Service | Description | Metrics Port |
| --- | --- | --- |
| tekton-pipelines-controller | Main reconciler for PipelineRun / TaskRun | 9090 |
| tekton-pipelines-webhook | Admission webhook | 9090 |
| tekton-events-controller | CloudEvents controller | 9090 |
| tekton-pipelines-remote-resolvers | Remote resource resolution | 9090 |

The Pipeline controller metrics use the prefix tekton_pipelines_controller_.

PipelineRun Metrics

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| pipelinerun_duration_seconds | Histogram / LastValue | PipelineRun execution time in seconds | status, namespace, pipeline*, pipelinerun*, reason* |
| pipelinerun_total | Counter | Total number of completed PipelineRuns | status |
| running_pipelineruns | LastValue (Gauge) | Number of currently running PipelineRuns | Controlled by metrics.running-pipelinerun.level (see below) |
| running_pipelineruns_waiting_on_pipeline_resolution | LastValue (Gauge) | PipelineRuns waiting on Pipeline reference resolution | - |
| running_pipelineruns_waiting_on_task_resolution | LastValue (Gauge) | PipelineRuns waiting on Task reference resolution | - |

* Labels marked with * are optional and depend on the config-observability configuration.

running_pipelineruns Label Levels

The running_pipelineruns metric labels are controlled by metrics.running-pipelinerun.level:

| Level | Labels |
| --- | --- |
| "" (default, cluster) | No labels |
| "namespace" | namespace |
| "pipeline" | namespace, pipeline |
| "pipelinerun" | namespace, pipeline, pipelinerun |

Status Label Values

For PipelineRun metrics:

  • success - PipelineRun completed successfully
  • failed - PipelineRun failed
  • cancelled - PipelineRun was cancelled

For TaskRun metrics:

  • success - TaskRun completed successfully
  • failed - TaskRun failed

TaskRun Metrics

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| taskrun_duration_seconds | Histogram / LastValue | Standalone TaskRun execution time in seconds | status, namespace, task*, taskrun*, reason* |
| pipelinerun_taskrun_duration_seconds | Histogram / LastValue | TaskRun execution time when part of a PipelineRun | status, namespace, task*, taskrun*, pipeline*, pipelinerun*, reason* |
| taskrun_total | Counter | Total number of completed TaskRuns | status |
| running_taskruns | LastValue (Gauge) | Number of currently running TaskRuns | - |
| running_taskruns_waiting_on_task_resolution_count | LastValue (Gauge) | TaskRuns waiting on Task reference resolution | - |
| running_taskruns_throttled_by_quota | LastValue (Gauge) | TaskRuns throttled by ResourceQuota | namespace* |
| running_taskruns_throttled_by_node | LastValue (Gauge) | TaskRuns throttled by node-level resource constraints | namespace* |
| taskruns_pod_latency_milliseconds | LastValue | Pod scheduling latency for TaskRuns in milliseconds | namespace, pod, task*, taskrun* |

config-observability Configuration

The config-observability ConfigMap in the tekton-pipelines namespace controls metrics behavior for the Pipeline controller. This ConfigMap is managed by the Tekton Operator and should be configured via the TektonConfig resource's spec.pipeline.options.configMaps field. See Adjusting Optional Configuration Items for Subcomponents for details.

Hot reload behavior: config-observability is watched at runtime. Most key changes (for example metrics.*) take effect without restarting Pods. Allow one or two scrape intervals for dashboard/query changes to appear. A restart is only needed when Pod spec settings change (for example changing CONFIG_OBSERVABILITY_NAME in the Deployment).

Example configuration via TektonConfig:

apiVersion: operator.tekton.dev/v1alpha1
kind: TektonConfig
metadata:
  name: config
spec:
  pipeline:
    options:
      disabled: false
      configMaps:
        config-observability:
          data:
            metrics.backend-destination: prometheus

            # PipelineRun metrics aggregation level.
            # Values: "pipelinerun" | "pipeline" (default) | "namespace"
            #   - "pipelinerun": includes pipeline + pipelinerun labels; duration uses LastValue
            #   - "pipeline": includes pipeline label only
            #   - "namespace": no pipeline/pipelinerun labels
            metrics.pipelinerun.level: "pipeline"

            # TaskRun metrics aggregation level.
            # Values: "taskrun" | "task" (default) | "namespace"
            #   - "taskrun": includes task + taskrun labels; duration uses LastValue
            #   - "task": includes task label only
            #   - "namespace": no task/taskrun labels
            metrics.taskrun.level: "task"

            # Duration metric type for PipelineRun / TaskRun.
            # Values: "histogram" (default) | "lastvalue"
            # Note: When pipelinerun.level is "pipelinerun" or taskrun.level is "taskrun",
            #       duration type is forced to "lastvalue" regardless of this setting.
            metrics.pipelinerun.duration-type: "histogram"
            metrics.taskrun.duration-type: "histogram"

            # Running PipelineRun metrics aggregation level.
            # Values: "pipelinerun" | "pipeline" | "namespace" | "" (default, cluster-level)
            metrics.running-pipelinerun.level: ""

            # Include reason label on duration metrics (pipelinerun_duration_seconds,
            # taskrun_duration_seconds, pipelinerun_taskrun_duration_seconds).
            # Values: "true" | "false" (default)
            # Warning: Enabling this increases label cardinality.
            # Note: Despite the key name, this does NOT affect count metrics
            # (pipelinerun_total / taskrun_total), only duration metrics.
            metrics.count.enable-reason: "false"

            # Include namespace label on throttled TaskRun metrics.
            # Values: "true" | "false" (default)
            metrics.taskrun.throttle.enable-namespace: "false"

Histogram Buckets

When the duration type is histogram, the following bucket boundaries (in seconds) are used:

10, 30, 60, 300, 900, 1800, 3600, 5400, 10800, 21600, 43200, 86400

This corresponds to: 10s, 30s, 1m, 5m, 15m, 30m, 1h, 1.5h, 3h, 6h, 12h, 24h.
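As an illustration of how these cumulative buckets behave (a sketch of Prometheus histogram semantics, not Tekton source code), an observed duration increments every le bucket whose upper bound it does not exceed, plus the implicit +Inf bucket:

```python
# Sketch of Prometheus cumulative-histogram semantics (illustration only):
# a duration increments every "le" bucket with upper bound >= the value.
BUCKETS = [10, 30, 60, 300, 900, 1800, 3600, 5400, 10800, 21600, 43200, 86400]

def bucket_counts(durations_seconds):
    """Cumulative per-bucket counts, mirroring *_duration_seconds_bucket{le=...}."""
    counts = {le: 0 for le in BUCKETS}
    counts["+Inf"] = 0
    for d in durations_seconds:
        for le in BUCKETS:
            if d <= le:
                counts[le] += 1  # cumulative: all buckets >= d are incremented
        counts["+Inf"] += 1      # +Inf always counts every observation
    return counts

# A 4-minute (240s) PipelineRun falls into le="300" and all larger buckets:
counts = bucket_counts([240])
print(counts[60], counts[300], counts["+Inf"])  # -> 0 1 1
```

This is why histogram_quantile works on rate()-d bucket series: the buckets are cumulative, not disjoint.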

For production environments, use aggregated levels to control label cardinality:

metrics.pipelinerun.level: "pipeline"
metrics.taskrun.level: "task"
metrics.pipelinerun.duration-type: "histogram"
metrics.taskrun.duration-type: "histogram"
metrics.count.enable-reason: "false"

If you need per-run granularity for debugging, temporarily switch to:

metrics.pipelinerun.level: "pipelinerun"
metrics.taskrun.level: "taskrun"

Note that this will significantly increase the number of time series.
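A rough back-of-envelope sketch of why per-run levels are expensive (all workload numbers below are hypothetical assumptions, not measurements): at "pipelinerun"/"taskrun" level, duration metrics become LastValue and every completed run contributes its own label set, whereas aggregated levels scale only with the number of distinct Pipelines and Tasks:

```python
# Hypothetical workload numbers (assumptions for illustration only).
runs_per_day = 2_000       # completed PipelineRuns per day
taskruns_per_run = 5       # TaskRuns per PipelineRun
pipelines, tasks = 20, 60  # distinct Pipeline / Task names
statuses = 3               # success / failed / cancelled

# Aggregated levels ("pipeline"/"task"): series scale with distinct names.
aggregated_series = (pipelines + tasks) * statuses

# Per-run levels ("pipelinerun"/"taskrun"): each run adds a new label set
# for the duration metrics (retention and other labels ignored here).
per_run_series = runs_per_day * (1 + taskruns_per_run)

print(aggregated_series, per_run_series)  # -> 240 12000
```

Even with these modest assumptions, per-run granularity produces two orders of magnitude more series per day.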


Tekton Triggers

The Tekton Triggers component exposes two categories of metrics from different processes.

Controller Metrics (port 9000)

The Triggers controller reports resource count metrics every 60 seconds.

| Service | Metrics Port |
| --- | --- |
| tekton-triggers-controller | 9000 |

The Triggers controller metrics use the prefix controller_.

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| eventlistener_count | LastValue (Gauge) | Number of EventListener resources | - |
| triggerbinding_count | LastValue (Gauge) | Number of TriggerBinding resources | - |
| clustertriggerbinding_count | LastValue (Gauge) | Number of ClusterTriggerBinding resources | - |
| triggertemplate_count | LastValue (Gauge) | Number of TriggerTemplate resources | - |
| clusterinterceptor_count | LastValue (Gauge) | Number of ClusterInterceptor resources | - |

EventListener Sink Metrics

Each EventListener pod exposes additional HTTP and event processing metrics. These metrics come from the EventListener sink process (not the controller). The Prometheus metric prefix is eventlistener_.

| Metric Name (Prometheus) | Type | Description | Labels |
| --- | --- | --- | --- |
| eventlistener_http_duration_seconds | Histogram | EventListener HTTP request duration | - |
| eventlistener_event_received_count | Counter | Total events received by the sink | status |
| eventlistener_triggered_resources | Counter | Total resources created by triggers | kind |

  • eventlistener_http_duration_seconds histogram buckets: 0.001, 0.01, 0.1, 1, 10 (seconds)
  • eventlistener_event_received_count status values: succeeded, failed
  • eventlistener_triggered_resources kind values: the Kubernetes resource Kind of the created object (e.g., PipelineRun, TaskRun)

These sink metrics are exposed per EventListener pod, not by the central Triggers controller, so they require separate ServiceMonitor (or PodMonitor) resources that target the EventListener Services exposing a metrics port (see the EventListener Sink ServiceMonitor section).
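Assuming the sink metrics are scraped, illustrative queries over the names above can surface request latency and event failure rates:

```promql
# EventListener HTTP request duration P95
histogram_quantile(0.95,
  sum by (le) (rate(eventlistener_http_duration_seconds_bucket[5m])))

# Share of received events that failed processing
sum(rate(eventlistener_event_received_count{status="failed"}[5m]))
  / clamp_min(sum(rate(eventlistener_event_received_count[5m])), 1e-9)
```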


Tekton Results

Tekton Results has two sub-services that expose metrics.

| Service | Description | Metrics Port |
| --- | --- | --- |
| tekton-results-watcher | Watches and cleans up PipelineRun/TaskRun resources | 9090 |
| tekton-results-api | gRPC/REST API server | 9090 |

Watcher Metrics

The Watcher metrics use the prefix watcher_.

Deletion Metrics

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| pipelinerun_delete_count | Counter | Total number of deleted PipelineRuns | status, namespace |
| pipelinerun_delete_duration_seconds | Histogram / LastValue | Time from PipelineRun completion to deletion | status, namespace, pipeline* |
| taskrun_delete_count | Counter | Total number of deleted TaskRuns | status, namespace |
| taskrun_delete_duration_seconds | Histogram / LastValue | Time from TaskRun completion to deletion | status, namespace, pipeline*, task* |

* Optional labels depend on config-observability settings for the Results Watcher.

Note: pipelinerun_delete_count, pipelinerun_delete_duration_seconds, taskrun_delete_count, and taskrun_delete_duration_seconds are only recorded when the Watcher actually deletes runs. These metrics will remain empty (no data points) unless the --completed_run_grace_period flag is set to a non-zero value on the tekton-results-watcher Deployment. By default this flag is 0, which disables automatic deletion. Set it to a positive duration (e.g. 10m) to enable deletion after a grace period, or to a negative value to delete immediately after archiving.
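As a hedged illustration (verify the flag name against your deployed Results version; this is a partial container spec, not a complete manifest), the grace period is passed as an argument to the tekton-results-watcher container:

```yaml
# Partial container spec for tekton-results-watcher (illustrative fragment).
# "10m" deletes runs 10 minutes after completion; "0" (the default) disables
# deletion; a negative value deletes immediately after archiving.
args:
  - "--completed_run_grace_period=10m"
```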

Status label values for Results Watcher:

  • success - Run completed successfully
  • failed - Run failed
  • cancelled - Run was cancelled

Shared Metrics

These metrics are registered by both the PipelineRun and TaskRun reconcilers in the Watcher, tracking storage-related events.

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| runs_not_stored_count | Counter | Runs deleted without being stored to Results | kind, namespace |
| run_storage_latency_seconds | Histogram | Time from run completion to successful storage | kind, namespace |

The kind label identifies the run type; note that the value casing differs between metric series (PipelineRun / TaskRun in some, pipelinerun / taskrun in others).

Note: runs_not_stored_count is only recorded when a run is externally deleted (e.g. via kubectl delete) while the Watcher is holding a finalizer to coordinate archiving. It will remain empty unless all of the following conditions are met:

  1. The --logs_api flag is false (log storage disabled) — if logs are enabled, the Watcher skips finalizer-based coordination entirely.
  2. The --disable_crd_update flag is false (annotation updates enabled).
  3. The --store_deadline flag is set to a non-zero duration — this is the maximum time the Watcher waits for archiving to complete before giving up and allowing deletion.
  4. A run is externally deleted before it is successfully archived (no results.tekton.dev/stored=true annotation), and the store_deadline has elapsed.

In normal operation (runs archived before deletion, or deletion triggered by the Watcher itself via --completed_run_grace_period), this counter stays at zero. A non-zero value indicates potential data loss: runs were deleted before their state could be saved to the Results API.

Quick reproduction (test environment): If you do not see this metric, that usually means the trigger conditions were not met, not that the metric is missing.

  1. Configure Results Watcher via TektonConfig so that logs_api=false, disable_crd_update=false, and store_deadline is non-zero (for example 30s).
  2. Temporarily set Results API replicas to 0 via TektonConfig (spec.result.options.deployments.tekton-results-api.spec.replicas: 0) so runs cannot be archived.
  3. Create a TaskRun or PipelineRun and wait until it completes.
  4. Wait until store_deadline has elapsed, then externally delete the run (kubectl delete ...).
  5. Check Watcher /metrics or Prometheus for watcher_runs_not_stored_count (component-prefixed name in exposition format); it should increase.
  6. Restore the original TektonConfig (re-enable Results API replicas and normal logs_api settings).

The run_storage_latency_seconds histogram uses the following bucket boundaries (in seconds):

0.1, 0.5, 1, 2, 5, 10, 30, 60, 120, 300, 600, 1800
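Assuming the watcher_ prefix and a Prometheus scrape, an illustrative P95 query over this histogram looks like:

```promql
# P95 time from run completion to successful storage, per run kind
histogram_quantile(0.95,
  sum by (kind, le) (rate(watcher_run_storage_latency_seconds_bucket[5m])))
```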

Watcher config-observability

The Results Watcher has its own config-observability ConfigMap (named via the CONFIG_OBSERVABILITY_NAME environment variable, typically tekton-results-config-observability). This ConfigMap is managed by the Tekton Operator and should be configured via the TektonConfig resource's spec.results.options.configMaps field. See Adjusting Optional Configuration Items for Subcomponents for details.

Hot reload behavior: Results Watcher also watches this ConfigMap and applies most key changes without Pod restarts. A restart is only needed when Deployment-level settings (such as env vars/args) are changed.

It supports the following keys:

| Key | Default | Values | Description |
| --- | --- | --- | --- |
| metrics.pipelinerun.level | pipeline | pipeline, namespace | Controls pipeline label on delete duration metrics |
| metrics.taskrun.level | task | task, namespace | Controls task label on delete duration metrics |
| metrics.pipelinerun.duration-type | histogram | histogram, lastvalue | Duration metric aggregation type for both PipelineRun and TaskRun deletion |
| metrics.taskrun.duration-type | histogram | histogram, lastvalue | Parsed but currently not used; metrics.pipelinerun.duration-type controls both |

Note: Unlike Tekton Pipelines, the Results Watcher does not support pipelinerun / taskrun individual-run granularity levels. It also does not have the metrics.count.enable-reason, metrics.running-pipelinerun.level, or metrics.taskrun.throttle.enable-namespace keys.

Known upstream issue: taskrun_delete_duration_seconds uses metrics.pipelinerun.duration-type (not metrics.taskrun.duration-type) to determine its aggregation type. This appears to be a copy-paste bug in the Results source code.

API Server Metrics

The API server exposes standard gRPC Prometheus metrics via the go-grpc-prometheus library on port 9090. These include:

  • grpc_server_handled_total - Total RPCs completed on the server
  • grpc_server_started_total - Total RPCs started on the server
  • grpc_server_msg_received_total / grpc_server_msg_sent_total - Message counts
  • grpc_server_handling_seconds (if PROMETHEUS_HISTOGRAM is enabled) - RPC handling duration
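These standard counters make it straightforward to track API error rates; for example (illustrative query):

```promql
# Share of Results API RPCs finishing with a non-OK gRPC code
sum(rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
  / clamp_min(sum(rate(grpc_server_handled_total[5m])), 1e-9)
```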

Tekton Chains

Tekton Chains is a security component that generates, signs, and stores provenance for artifacts built with Tekton Pipelines. It observes completed TaskRuns and PipelineRuns, then creates attestations and signatures.

| Service | Description | Metrics Port |
| --- | --- | --- |
| tekton-chains-metrics | Chains watcher/controller | 9090 (http-metrics) |

The Chains controller metrics use the prefix watcher_ (same as Results Watcher, but the custom metric names are different, so there are no collisions).

Chains Metrics

All Chains metrics are Counters with no labels.

| Metric Name (Prometheus) | Type | Description |
| --- | --- | --- |
| watcher_taskrun_sign_created_total | Counter | Total signed messages for TaskRuns |
| watcher_taskrun_payload_stored_total | Counter | Total stored payloads for TaskRuns |
| watcher_taskrun_marked_signed_total | Counter | Total TaskRuns marked as signed |
| watcher_pipelinerun_sign_created_total | Counter | Total signed messages for PipelineRuns |
| watcher_pipelinerun_payload_stored_total | Counter | Total stored payloads for PipelineRuns |
| watcher_pipelinerun_marked_signed_total | Counter | Total PipelineRuns marked as signed |

Note: The official Tekton Chains documentation also mentions *_signing_failures_total counters for both TaskRun and PipelineRun, but these are not present in the current upstream source code. Verify against your deployed version.


Controller Framework Metrics

All Tekton controllers automatically expose the following infrastructure metrics. These metrics use the same prefix as the component's custom metrics (e.g., tekton_pipelines_controller_, controller_, watcher_).

| Metric Name (without prefix) | Type | Description |
| --- | --- | --- |
| client_latency | Histogram | Kubernetes API client request latency (seconds) |
| client_results | Counter | Kubernetes API request count (by status code) |
| workqueue_depth | Gauge | Current workqueue depth |
| workqueue_adds_total | Counter | Total workqueue additions |
| workqueue_queue_latency_seconds | Histogram | Time items spend waiting in the workqueue |
| workqueue_work_duration_seconds | Histogram | Time spent processing workqueue items |
| workqueue_retries_total | Counter | Total workqueue retries |
| workqueue_unfinished_work_seconds | Histogram | Duration of unfinished workqueue items |
| workqueue_longest_running_processor_seconds | Histogram | Duration of longest running workqueue processor |
| reconcile_count | Counter | Total reconciler invocations (labeled by reconciler, success, namespace_name) |
| reconcile_latency | Histogram | Reconciler invocation latency (labeled by reconciler, success, namespace_name) |
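These framework metrics support queries such as the following for the Pipelines controller (illustrative; check the /metrics HELP text for the units your version reports):

```promql
# Current workqueue depth
tekton_pipelines_controller_workqueue_depth

# Reconcile failure rate by reconciler
sum by (reconciler)
  (rate(tekton_pipelines_controller_reconcile_count{success="false"}[5m]))

# Reconcile latency P95
histogram_quantile(0.95,
  sum by (le) (rate(tekton_pipelines_controller_reconcile_latency_bucket[5m])))
```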

Setting Up ServiceMonitor

To enable Prometheus scraping for Tekton components, deploy ServiceMonitor resources.

See the Prerequisites section above for the required setup.

Use the following guidance based on your monitoring stack:

  • If you use Prometheus (Prometheus Operator), the ServiceMonitor labels (for example metadata.labels.prometheus: kube-prometheus) must match the Prometheus CR's spec.serviceMonitorSelector; otherwise the ServiceMonitor will not be scraped.
  • If you use VictoriaMetrics, labels like prometheus: kube-prometheus are typically not needed; create ServiceMonitor/VMServiceScrape resources according to your monitoring setup.

When using Prometheus, use the following commands to find and verify the selector:

# 1) Locate Prometheus CRs (resource type: monitoring.coreos.com/v1, Kind=Prometheus)
$ kubectl get prometheus -A

# 2) Check ServiceMonitor selector on the target Prometheus instance
$ kubectl get prometheus -n <prometheus-namespace> <prometheus-name> -o yaml | yq '.spec.serviceMonitorSelector'

If no Prometheus CR exists in your cluster, monitoring is likely platform-managed (for example, by VictoriaMetrics) or implemented differently. In that case, labels like prometheus: kube-prometheus are usually not required; follow your platform's scraping rules.

For more information, see Integrating External Metrics.

Pipeline ServiceMonitor

Pipeline ServiceMonitor YAML
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tekton-pipelines-metrics
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/name: tekton-pipelines
    # prometheus: kube-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/part-of: tekton-pipelines
  endpoints:
  - port: http-metrics
    path: /metrics
    interval: 30s
  namespaceSelector:
    matchNames:
    - tekton-pipelines

This ServiceMonitor matches Pipeline services with the label app.kubernetes.io/part-of: tekton-pipelines (including remote-resolvers) and scrapes them in the tekton-pipelines namespace.

Triggers ServiceMonitor

Triggers ServiceMonitor YAML
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tekton-triggers-metrics
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/name: tekton-triggers
    # prometheus: kube-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/part-of: tekton-triggers
      app.kubernetes.io/component: controller
  endpoints:
  - port: http-metrics
    path: /metrics
    interval: 30s
  namespaceSelector:
    matchNames:
    - tekton-pipelines

This ServiceMonitor collects Triggers controller metrics (controller_*) only. It does not include EventListener sink metrics.

EventListener Sink ServiceMonitor

EventListener Sink ServiceMonitor YAML
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tekton-eventlistener-sink-metrics
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/name: tekton-eventlistener-sink
    # prometheus: kube-prometheus
spec:
  selector:
    matchExpressions:
    - key: eventlistener
      operator: Exists
    - key: app.kubernetes.io/managed-by
      operator: In
      values:
      - EventListener
  endpoints:
  - port: http-metrics
    path: /metrics
    interval: 30s
  namespaceSelector:
    any: true

EventListener Services usually run in application namespaces, so this example uses namespaceSelector.any: true for cross-namespace scraping. If you need tighter scope, switch to matchNames and list allowed namespaces explicitly.

Results ServiceMonitor

The Results services have both app.kubernetes.io/part-of: tekton-results and app.kubernetes.io/name labels. To precisely target API + Watcher (and exclude Postgres), this example matches app.kubernetes.io/name:

Results ServiceMonitor YAML
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tekton-results-metrics
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/name: tekton-results
    # prometheus: kube-prometheus
spec:
  selector:
    matchExpressions:
    - key: app.kubernetes.io/name
      operator: In
      values:
      - tekton-results-api
      - tekton-results-watcher
  endpoints:
  - port: prometheus
    path: /metrics
    interval: 30s
  - port: metrics
    path: /metrics
    interval: 30s
  namespaceSelector:
    matchNames:
    - tekton-pipelines

The Results API server uses port name prometheus (9090) and the Watcher uses port name metrics (9090). Each service only exposes one of these port names, so only the matching endpoint will be scraped.

Chains ServiceMonitor

Chains ServiceMonitor YAML
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tekton-chains-metrics
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/name: tekton-chains
    # prometheus: kube-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/part-of: tekton-chains
  endpoints:
  - port: http-metrics
    path: /metrics
    interval: 30s
  namespaceSelector:
    matchNames:
    - tekton-pipelines

Verification

After deploying the ServiceMonitor resources, verify that Prometheus is scraping the targets.

Check Metrics Endpoints Directly

# Pipeline controller
$ kubectl port-forward -n tekton-pipelines svc/tekton-pipelines-controller 9090:9090
$ curl -s http://localhost:9090/metrics | grep tekton_pipelines_controller_

# HELP tekton_pipelines_controller_client_latency How long Kubernetes API requests take
# TYPE tekton_pipelines_controller_client_latency histogram
tekton_pipelines_controller_client_latency_bucket{name="",le="1e-05"} 0
tekton_pipelines_controller_client_latency_bucket{name="",le="0.0001"} 0
tekton_pipelines_controller_client_latency_bucket{name="",le="0.001"} 0

# Triggers controller
$ kubectl port-forward -n tekton-pipelines svc/tekton-triggers-controller 9000:9000
$ curl -s http://localhost:9000/metrics | grep controller_

# HELP controller_client_latency How long Kubernetes API requests take
# TYPE controller_client_latency histogram
controller_client_latency_bucket{name="",le="1e-05"} 0
controller_client_latency_bucket{name="",le="0.0001"} 1
controller_client_latency_bucket{name="",le="0.001"} 2

# EventListener sink metrics (replace namespace/service)
$ kubectl port-forward -n <eventlistener-namespace> svc/<eventlistener-service> 9000:9000
$ curl -s http://localhost:9000/metrics | grep eventlistener_

# HELP eventlistener_client_latency How long Kubernetes API requests take
# TYPE eventlistener_client_latency histogram
eventlistener_client_latency_bucket{name="",le="1e-05"} 0
eventlistener_client_latency_bucket{name="",le="0.0001"} 0
eventlistener_client_latency_bucket{name="",le="0.001"} 0

# HELP eventlistener_triggered_resources Count of the number of triggered eventlistener resources
# TYPE eventlistener_triggered_resources counter
eventlistener_triggered_resources{kind="PipelineRun"} 10

# Results watcher
$ kubectl port-forward -n tekton-pipelines svc/tekton-results-watcher 9091:9090
$ curl -s http://localhost:9091/metrics | grep watcher_

# HELP watcher_client_latency How long Kubernetes API requests take
# TYPE watcher_client_latency histogram
watcher_client_latency_bucket{name="",le="1e-05"} 0
watcher_client_latency_bucket{name="",le="0.0001"} 0
watcher_client_latency_bucket{name="",le="0.001"} 0

# Results API
$ kubectl port-forward -n tekton-pipelines svc/tekton-results-api-service 9092:9090
$ curl -s http://localhost:9092/metrics | grep grpc_server_

# HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="Aborted",grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="CreateRecord",grpc_service="tekton.results.v1alpha2.Results",grpc_type="unary"} 0
grpc_server_handled_total{grpc_code="Aborted",grpc_method="CreateResult",grpc_service="tekton.results.v1alpha2.Results",grpc_type="unary"} 0

# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="Check",grpc_service="grpc.health.v1.Health",grpc_type="unary"} 337606
grpc_server_started_total{grpc_method="CreateRecord",grpc_service="tekton.results.v1alpha2.Results",grpc_type="unary"} 10301
grpc_server_started_total{grpc_method="CreateResult",grpc_service="tekton.results.v1alpha2.Results",grpc_type="unary"} 832

# Chains controller
$ kubectl port-forward -n tekton-pipelines svc/tekton-chains-metrics 9093:9090
$ curl -s http://localhost:9093/metrics | grep watcher_

# HELP watcher_client_latency How long Kubernetes API requests take
# TYPE watcher_client_latency histogram
watcher_client_latency_bucket{name="",le="1e-05"} 0
watcher_client_latency_bucket{name="",le="0.0001"} 0
watcher_client_latency_bucket{name="",le="0.001"} 0

EventListener sink metrics such as eventlistener_event_received_count and eventlistener_http_duration_seconds are request-driven. Send at least one request to the EventListener before validating these metrics.

Check Prometheus Targets

# Verify ServiceMonitor resources exist
$ kubectl get servicemonitor -n tekton-pipelines

NAME                                AGE
tekton-chains-metrics               10m
tekton-eventlistener-sink-metrics   10m
tekton-pipelines-metrics            10m
tekton-results-metrics              10m
tekton-triggers-metrics             10m

# Check Prometheus targets (via Prometheus UI or API)
# Look for targets with job labels matching the ServiceMonitor names

Example PromQL Queries

# PipelineRun cumulative success rate (avoids misinterpretation in empty completion windows)
100 * sum(tekton_pipelines_controller_pipelinerun_total{status="success"}) / clamp_min(sum(tekton_pipelines_controller_pipelinerun_total), 1)

# Completed PipelineRuns in the last 5 minutes (throughput)
round(sum(increase(tekton_pipelines_controller_pipelinerun_total[5m])))

# PipelineRun duration P95 (histogram mode)
histogram_quantile(0.95,
  rate(tekton_pipelines_controller_pipelinerun_duration_seconds_bucket[5m])
)

# TaskRun duration P95 (histogram mode, includes standalone + in-pipeline TaskRuns)
histogram_quantile(0.95,
  (
    sum by (le) (rate(tekton_pipelines_controller_taskrun_duration_seconds_bucket[5m]))
    +
    sum by (le) (rate(tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_bucket[5m]))
  )
)

# PipelineRun duration (lastvalue mode)
avg_over_time(tekton_pipelines_controller_pipelinerun_duration_seconds[5m])

# Currently running PipelineRuns (single series to avoid duplicate legends)
max(tekton_pipelines_controller_running_pipelineruns)

# TaskRuns throttled by resource quota
max(tekton_pipelines_controller_running_taskruns_throttled_by_quota)

# Trigger resource counts
controller_eventlistener_count
controller_triggertemplate_count

# Chains signing activity
watcher_taskrun_sign_created_total
watcher_pipelinerun_sign_created_total

MonitorDashboard Examples

The following MonitorDashboard resources provide ready-to-use dashboards for monitoring Tekton components. Deploy them to the cpaas-system namespace under the tekton folder.

Important: Each panel must include id (unique integer), datasource: prometheus, and transformations: []. Each target must include datasource: prometheus and refId. Duration P50/P95 panels in this document use *_bucket queries and require metrics.*.duration-type=histogram; if you use lastvalue, replace those queries with LastValue-style expressions such as avg_over_time(...).

Tekton Pipeline Dashboard

Tekton Pipeline Dashboard YAML
kind: MonitorDashboard
apiVersion: ait.alauda.io/v1alpha2
metadata:
  labels:
    cpaas.io/dashboard.folder: tekton
    cpaas.io/dashboard.is.home.dashboard: "false"
    cpaas.io/dashboard.tag.tekton: "true"
  name: tekton-pipeline
  namespace: cpaas-system
spec:
  body:
    titleZh: Tekton Pipeline Overview
    tags:
      - tekton
    time:
      from: now-1h
      to: now
    templating:
      list: []
    panels:
      - id: 1
        title: PipelineRun Total (by status)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 0, y: 0 }
        targets:
          - datasource: prometheus
            expr: sum by (status) (tekton_pipelines_controller_pipelinerun_total)
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 2
        title: TaskRun Total (by status)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 8, y: 0 }
        targets:
          - datasource: prometheus
            expr: sum by (status) (tekton_pipelines_controller_taskrun_total)
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 3
        title: PipelineRun Success Rate (cumulative)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 4, x: 16, y: 0 }
        targets:
          - datasource: prometheus
            expr: "100 * sum(tekton_pipelines_controller_pipelinerun_total{status=\"success\"}) / clamp_min(sum(tekton_pipelines_controller_pipelinerun_total), 1)"
            refId: A
        fieldConfig:
          defaults:
            unit: percent
            color: { mode: thresholds }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds:
              mode: absolute
              steps:
                - { color: red, value: null }
                - { color: orange, value: 80 }
                - { color: green, value: 95 }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 12
        title: Completed PipelineRuns (last 5m)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 4, x: 20, y: 0 }
        targets:
          - datasource: prometheus
            expr: "round(sum(increase(tekton_pipelines_controller_pipelinerun_total[5m])))"
            legendFormat: completed
            refId: A
        fieldConfig:
          defaults:
            unit: short
            decimals: 0
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 4
        title: Running PipelineRuns
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 0, y: 8 }
        targets:
          - datasource: prometheus
            expr: max(tekton_pipelines_controller_running_pipelineruns)
            legendFormat: running
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 5
        title: Running TaskRuns
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 8, y: 8 }
        targets:
          - datasource: prometheus
            expr: max(tekton_pipelines_controller_running_taskruns)
            legendFormat: running
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 6
        title: TaskRuns Throttled
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 16, y: 8 }
        targets:
          - datasource: prometheus
            expr: max(tekton_pipelines_controller_running_taskruns_throttled_by_quota)
            legendFormat: by quota
            refId: A
          - datasource: prometheus
            expr: max(tekton_pipelines_controller_running_taskruns_throttled_by_node)
            legendFormat: by node
            refId: B
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: orange, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 7
        title: PipelineRun Duration P50 / P95
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 0, y: 16 }
        targets:
          - datasource: prometheus
            expr: (histogram_quantile(0.5, sum by (le) (rate(tekton_pipelines_controller_pipelinerun_duration_seconds_bucket[5m])))) and on() (sum(rate(tekton_pipelines_controller_pipelinerun_duration_seconds_bucket{le="+Inf"}[5m])) > 0)
            legendFormat: P50
            refId: A
          - datasource: prometheus
            expr: (histogram_quantile(0.95, sum by (le) (rate(tekton_pipelines_controller_pipelinerun_duration_seconds_bucket[5m])))) and on() (sum(rate(tekton_pipelines_controller_pipelinerun_duration_seconds_bucket{le="+Inf"}[5m])) > 0)
            legendFormat: P95
            refId: B
        fieldConfig:
          defaults:
            unit: s
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 8
        title: TaskRun Duration P50 / P95 (Standalone)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 8, y: 16 }
        targets:
          - datasource: prometheus
            expr: (histogram_quantile(0.5, sum by (le) (rate(tekton_pipelines_controller_taskrun_duration_seconds_bucket[5m])))) and on() (sum(rate(tekton_pipelines_controller_taskrun_duration_seconds_bucket{le="+Inf"}[5m])) > 0)
            legendFormat: P50
            refId: A
          - datasource: prometheus
            expr: (histogram_quantile(0.95, sum by (le) (rate(tekton_pipelines_controller_taskrun_duration_seconds_bucket[5m])))) and on() (sum(rate(tekton_pipelines_controller_taskrun_duration_seconds_bucket{le="+Inf"}[5m])) > 0)
            legendFormat: P95
            refId: B
        fieldConfig:
          defaults:
            unit: s
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 13
        title: TaskRun Duration P50 / P95 (In-Pipeline)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 16, y: 16 }
        targets:
          - datasource: prometheus
            expr: (histogram_quantile(0.5, sum by (le) (rate(tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_bucket[5m])))) and on() (sum(rate(tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_bucket{le="+Inf"}[5m])) > 0)
            legendFormat: P50
            refId: A
          - datasource: prometheus
            expr: (histogram_quantile(0.95, sum by (le) (rate(tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_bucket[5m])))) and on() (sum(rate(tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_bucket{le="+Inf"}[5m])) > 0)
            legendFormat: P95
            refId: B
        fieldConfig:
          defaults:
            unit: s
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 9
        title: Workqueue Depth
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 0, y: 24 }
        targets:
          - datasource: prometheus
            expr: max(tekton_pipelines_controller_workqueue_depth)
            legendFormat: depth
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 10
        title: Reconcile Count (by success)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 8, y: 24 }
        targets:
          - datasource: prometheus
            expr: sum(increase(tekton_pipelines_controller_reconcile_count{success="true"}[5m]))
            legendFormat: success=true
            refId: A
          - datasource: prometheus
            expr: sum(increase(tekton_pipelines_controller_reconcile_count{success="false"}[5m]))
            legendFormat: success=false
            refId: B
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 11
        title: Resolution Waiting
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 8, x: 16, y: 24 }
        targets:
          - datasource: prometheus
            expr: max(tekton_pipelines_controller_running_pipelineruns_waiting_on_pipeline_resolution)
            legendFormat: PR waiting pipeline
            refId: A
          - datasource: prometheus
            expr: max(tekton_pipelines_controller_running_pipelineruns_waiting_on_task_resolution)
            legendFormat: PR waiting task
            refId: B
          - datasource: prometheus
            expr: max(tekton_pipelines_controller_running_taskruns_waiting_on_task_resolution_count)
            legendFormat: TR waiting task
            refId: C
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: orange, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []

Tekton Pipeline Dashboard Interpretation (Common Questions)

  • PipelineRun Total (by status) is a completion-event counter recorded by the controller, not the total number of PipelineRun objects. In the current implementation, user-triggered cancellation (spec.status=Cancelled) may not enter this counting path, so the cancelled series may be absent. To validate cancellation volume, check PipelineRun objects and events.
  • Running PipelineRuns is a real-time snapshot (how many are running now). It can change independently from PipelineRun Total.
  • Completed PipelineRuns (last 5m) is throughput (newly completed runs in the last 5 minutes). Seeing 0 during low traffic or idle periods is expected.
  • PipelineRun Success Rate (cumulative) is cumulative since controller start, not a 5-minute window success rate. A short-term failure does not immediately cause a large shift.
  • Reconcile Count (by success) measures controller reconcile loops, not PipelineRun counts.
  • Status series are shown only for label values that actually have samples in the selected time range. If a status has no samples in the window, its curve/legend will not appear.
  • TaskRun Duration P50 / P95 (Standalone) and TaskRun Duration P50 / P95 (In-Pipeline) are split to avoid mixed-query instability. In environments that only expose one histogram family, the other panel may be empty, which is expected.
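
The cumulative-versus-windowed distinction above can be illustrated numerically. The counter values below are made-up samples, not real metrics; max(..., 1) plays the role of the clamp_min guard in the panel query.

```python
# Made-up counter samples showing why the cumulative success rate
# (panel id 3) moves slowly: it is computed over all completions since
# controller start, not over a 5-minute window.
success_total, failed_total = 950, 50          # since controller start
cumulative_rate = 100 * success_total / max(success_total + failed_total, 1)
print(f"cumulative success rate: {cumulative_rate:.1f}%")   # 95.0%

# Now 5 new runs complete in the last 5 minutes, all failing:
recent_success, recent_failed = 0, 5
windowed_rate = 100 * recent_success / max(recent_success + recent_failed, 1)
cumulative_after = 100 * success_total / max(success_total + failed_total + 5, 1)
print(f"windowed success rate:   {windowed_rate:.1f}%")     # 0.0%
print(f"cumulative after burst:  {cumulative_after:.1f}%")  # 94.5%
```

A short burst of failures drops a windowed rate to 0% while the cumulative rate barely moves, which is the behavior described in the bullet above.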

Tekton Triggers Dashboard

Tekton Triggers Dashboard YAML
kind: MonitorDashboard
apiVersion: ait.alauda.io/v1alpha2
metadata:
  labels:
    cpaas.io/dashboard.folder: tekton
    cpaas.io/dashboard.is.home.dashboard: "false"
    cpaas.io/dashboard.tag.tekton: "true"
  name: tekton-triggers
  namespace: cpaas-system
spec:
  body:
    titleZh: Tekton Triggers Overview
    tags:
      - tekton
    time:
      from: now-1h
      to: now
    templating:
      list: []
    panels:
      - id: 1
        title: EventListener Count
        type: timeseries
        datasource: prometheus
        gridPos: { h: 6, w: 5, x: 0, y: 0 }
        targets:
          - datasource: prometheus
            expr: controller_eventlistener_count
            legendFormat: EventListener
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 2
        title: TriggerTemplate Count
        type: timeseries
        datasource: prometheus
        gridPos: { h: 6, w: 5, x: 5, y: 0 }
        targets:
          - datasource: prometheus
            expr: controller_triggertemplate_count
            legendFormat: TriggerTemplate
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 3
        title: TriggerBinding Count
        type: timeseries
        datasource: prometheus
        gridPos: { h: 6, w: 5, x: 10, y: 0 }
        targets:
          - datasource: prometheus
            expr: controller_triggerbinding_count
            legendFormat: TriggerBinding
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 4
        title: ClusterTriggerBinding
        type: timeseries
        datasource: prometheus
        gridPos: { h: 6, w: 5, x: 15, y: 0 }
        targets:
          - datasource: prometheus
            expr: controller_clustertriggerbinding_count
            legendFormat: ClusterTriggerBinding
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 5
        title: ClusterInterceptor
        type: timeseries
        datasource: prometheus
        gridPos: { h: 6, w: 4, x: 20, y: 0 }
        targets:
          - datasource: prometheus
            expr: controller_clusterinterceptor_count
            legendFormat: ClusterInterceptor
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 6
        title: All Trigger Resource Counts (trend)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 24, x: 0, y: 6 }
        targets:
          - datasource: prometheus
            expr: controller_eventlistener_count
            legendFormat: EventListener
            refId: A
          - datasource: prometheus
            expr: controller_triggertemplate_count
            legendFormat: TriggerTemplate
            refId: B
          - datasource: prometheus
            expr: controller_triggerbinding_count
            legendFormat: TriggerBinding
            refId: C
          - datasource: prometheus
            expr: controller_clustertriggerbinding_count
            legendFormat: ClusterTriggerBinding
            refId: D
          - datasource: prometheus
            expr: controller_clusterinterceptor_count
            legendFormat: ClusterInterceptor
            refId: E
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []

Tekton Triggers Dashboard Interpretation (Common Questions)

  • EventListener Count, TriggerTemplate Count, TriggerBinding Count, ClusterTriggerBinding, and ClusterInterceptor are object-count snapshots, not request volume or event-processing throughput.
  • All Trigger Resource Counts (trend) overlays the same resource counts in a single panel. Brief deviations from the single-resource count panels within one scrape interval are expected.
  • Showing 0 when no Triggers resources exist is normal and does not indicate a scraping failure.

Tekton Results Dashboard

Tekton Results Dashboard YAML
kind: MonitorDashboard
apiVersion: ait.alauda.io/v1alpha2
metadata:
  labels:
    cpaas.io/dashboard.folder: tekton
    cpaas.io/dashboard.is.home.dashboard: "false"
    cpaas.io/dashboard.tag.tekton: "true"
  name: tekton-results
  namespace: cpaas-system
spec:
  body:
    titleZh: Tekton Results Overview
    tags:
      - tekton
    time:
      from: now-1h
      to: now
    templating:
      list: []
    panels:
      - id: 1
        title: PipelineRun Reconcile Count (last 5m)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 0, y: 0 }
        targets:
          - datasource: prometheus
            expr: round(sum(increase(watcher_reconcile_count{reconciler="github.com.tektoncd.results.pkg.watcher.reconciler.pipelinerun.Reconciler",success="true"}[5m])))
            legendFormat: success=true
            refId: A
          - datasource: prometheus
            expr: round(sum(increase(watcher_reconcile_count{reconciler="github.com.tektoncd.results.pkg.watcher.reconciler.pipelinerun.Reconciler",success="false"}[5m])))
            legendFormat: success=false
            refId: B
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 2
        title: TaskRun Reconcile Count (last 5m)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 12, y: 0 }
        targets:
          - datasource: prometheus
            expr: round(sum(increase(watcher_reconcile_count{reconciler="github.com.tektoncd.results.pkg.watcher.reconciler.taskrun.Reconciler",success="true"}[5m])))
            legendFormat: success=true
            refId: A
          - datasource: prometheus
            expr: round(sum(increase(watcher_reconcile_count{reconciler="github.com.tektoncd.results.pkg.watcher.reconciler.taskrun.Reconciler",success="false"}[5m])))
            legendFormat: success=false
            refId: B
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 3
        title: PipelineRun Reconcile Latency P95
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 0, y: 8 }
        targets:
          - datasource: prometheus
            expr: histogram_quantile(0.95, sum by (le) (rate(watcher_reconcile_latency_bucket{reconciler="github.com.tektoncd.results.pkg.watcher.reconciler.pipelinerun.Reconciler"}[5m])))
            legendFormat: P95
            refId: A
        fieldConfig:
          defaults:
            unit: ms
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 4
        title: TaskRun Reconcile Latency P95
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 12, y: 8 }
        targets:
          - datasource: prometheus
            expr: histogram_quantile(0.95, sum by (le) (rate(watcher_reconcile_latency_bucket{reconciler="github.com.tektoncd.results.pkg.watcher.reconciler.taskrun.Reconciler"}[5m])))
            legendFormat: P95
            refId: A
        fieldConfig:
          defaults:
            unit: ms
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 5
        title: Workqueue Depth (PipelineRun vs TaskRun)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 0, y: 16 }
        targets:
          - datasource: prometheus
            expr: sum(watcher_work_queue_depth{reconciler="github.com.tektoncd.results.pkg.watcher.reconciler.pipelinerun.Reconciler"})
            legendFormat: pipelinerun
            refId: A
          - datasource: prometheus
            expr: sum(watcher_work_queue_depth{reconciler="github.com.tektoncd.results.pkg.watcher.reconciler.taskrun.Reconciler"})
            legendFormat: taskrun
            refId: B
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 6
        title: Workqueue Adds (last 5m)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 12, y: 16 }
        targets:
          - datasource: prometheus
            expr: round(sum(increase(watcher_workqueue_adds_total{name=~"github.com.tektoncd.results.pkg.watcher.reconciler.pipelinerun.Reconciler-(consumer|fast|slow)"}[5m])))
            legendFormat: pipelinerun adds
            refId: A
          - datasource: prometheus
            expr: round(sum(increase(watcher_workqueue_adds_total{name=~"github.com.tektoncd.results.pkg.watcher.reconciler.taskrun.Reconciler-(consumer|fast|slow)"}[5m])))
            legendFormat: taskrun adds
            refId: B
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 7
        title: gRPC Request Rate (Results API)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 0, y: 24 }
        targets:
          - datasource: prometheus
            expr: "sum(rate(grpc_server_handled_total{grpc_service=~\"tekton.results.*\"}[5m]))"
            legendFormat: requests
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 8
        title: gRPC Error Percentage (Results API, excl. NotFound/Canceled)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 12, y: 24 }
        targets:
          - datasource: prometheus
            expr: "100 * ((sum(rate(grpc_server_handled_total{grpc_service=~\"tekton.results.*\",grpc_code!~\"OK|NotFound|Canceled\"}[5m])) or vector(0)) / clamp_min((sum(rate(grpc_server_handled_total{grpc_service=~\"tekton.results.*\"}[5m])) or vector(0)), 0.001))"
            legendFormat: error %
            refId: A
        fieldConfig:
          defaults:
            unit: percent
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: red, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []

Tekton Results Dashboard Interpretation (Common Questions)

  • This dashboard revision is based on Results Watcher reconcile/workqueue metrics plus Results API gRPC metrics, so it stays populated under common deployments (logs_api=true, automatic deletion disabled).
  • PipelineRun Reconcile Count (last 5m) and TaskRun Reconcile Count (last 5m) show separate 5-minute increments for success=true and success=false.
  • PipelineRun Reconcile Latency P95 and TaskRun Reconcile Latency P95 are calculated from watcher reconcile latency histograms. Under low traffic, the line can be sparse.
  • Workqueue Depth shows current queue depth, and Workqueue Adds (last 5m) shows enqueue volume over the last 5 minutes.
  • gRPC Error Percentage (Results API, excl. NotFound/Canceled) is the share of unexpected error responses among all requests; return codes that are common in normal operation (NotFound, Canceled) are excluded from the error count.
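
The error-percentage query in panel id 8 can be mirrored in plain arithmetic. The per-second request rates below are made-up samples; max(..., 0.001) plays the role of the clamp_min guard so an idle API yields 0 instead of a division by zero.

```python
# Made-up per-second gRPC rates by status code, mirroring panel id 8:
# error % = rate of codes outside {OK, NotFound, Canceled} / total rate.
rates = {"OK": 8.0, "NotFound": 1.5, "Canceled": 0.2, "Internal": 0.3}
excluded = {"OK", "NotFound", "Canceled"}
error_rate = sum(v for code, v in rates.items() if code not in excluded)
total_rate = sum(rates.values())
error_pct = 100 * error_rate / max(total_rate, 0.001)
print(f"error %: {error_pct:.1f}")  # 3.0
```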

Tekton Chains Dashboard

Tekton Chains Dashboard YAML
kind: MonitorDashboard
apiVersion: ait.alauda.io/v1alpha2
metadata:
  labels:
    cpaas.io/dashboard.folder: tekton
    cpaas.io/dashboard.is.home.dashboard: "false"
    cpaas.io/dashboard.tag.tekton: "true"
  name: tekton-chains
  namespace: cpaas-system
spec:
  body:
    titleZh: Tekton Chains Overview
    tags:
      - tekton
    time:
      from: now-1h
      to: now
    templating:
      list: []
    panels:
      - id: 1
        title: TaskRun Signatures Created (last 5m)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 0, y: 0 }
        targets:
          - datasource: prometheus
            expr: round(increase(watcher_taskrun_sign_created_total[5m]))
            legendFormat: sign created
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 2
        title: PipelineRun Signatures Created (last 5m)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 12, y: 0 }
        targets:
          - datasource: prometheus
            expr: round(increase(watcher_pipelinerun_sign_created_total[5m]))
            legendFormat: sign created
            refId: A
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 3
        title: Payloads Stored (last 5m, TaskRun vs PipelineRun)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 0, y: 8 }
        targets:
          - datasource: prometheus
            expr: round(increase(watcher_taskrun_payload_stored_total[5m]))
            legendFormat: TaskRun
            refId: A
          - datasource: prometheus
            expr: round(increase(watcher_pipelinerun_payload_stored_total[5m]))
            legendFormat: PipelineRun
            refId: B
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []
      - id: 4
        title: Marked Signed (last 5m, TaskRun vs PipelineRun)
        type: timeseries
        datasource: prometheus
        gridPos: { h: 8, w: 12, x: 12, y: 8 }
        targets:
          - datasource: prometheus
            expr: round(increase(watcher_taskrun_marked_signed_total[5m]))
            legendFormat: TaskRun
            refId: A
          - datasource: prometheus
            expr: round(increase(watcher_pipelinerun_marked_signed_total[5m]))
            legendFormat: PipelineRun
            refId: B
        fieldConfig:
          defaults:
            color: { mode: palette-classic }
            custom: { drawStyle: line, fillOpacity: 0, lineWidth: 1, spanNulls: false }
            thresholds: { mode: absolute, steps: [{ color: green, value: null }] }
          overrides: []
        options:
          legend: { calcs: [latest], displayMode: list, placement: bottom, showLegend: true }
          tooltip: { mode: multi, sort: desc }
        transformations: []

Tekton Chains Dashboard Interpretation (Common Questions)

  • TaskRun Signatures Created (last 5m), PipelineRun Signatures Created (last 5m), Payloads Stored (last 5m), and Marked Signed (last 5m) are all computed with increase(...[5m]), so each data point shows the counter increment over the trailing five-minute window.
  • When there is no new signing or storage activity, these lines drop to 0; a flat zero line does not by itself indicate a component fault.
  • Payloads Stored and Marked Signed measure different processing stages, so their values are not expected to match exactly at all times.
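To build intuition for how these panels behave, the counter semantics behind increase(...[5m]) can be sketched in a few lines. This is a simplified illustration only: real PromQL also extrapolates the result to the window boundaries, so actual panel values may be slightly larger than a raw last-minus-first difference. The function name and sample data below are hypothetical.

```python
def simple_increase(samples):
    """Approximate PromQL increase() over one window of counter samples.

    samples: list of (timestamp, value) pairs inside the window.
    Counter resets (the value dropping, e.g. after a controller restart)
    are handled by counting the post-reset value as new growth.
    Boundary extrapolation, which real PromQL applies, is omitted.
    """
    if len(samples) < 2:
        return 0.0
    total = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter reset: the counter restarted near zero, so the
            # observed value is entirely new increase.
            total += value
        else:
            total += value - prev
        prev = value
    return total

# Steady signing activity: the counter rises from 10 to 14 in the window.
print(simple_increase([(0, 10), (60, 11), (120, 13), (180, 14)]))  # 4.0

# Idle window: the counter is unchanged, so the panel shows 0 —
# this is expected, not a component fault.
print(simple_increase([(0, 14), (60, 14), (120, 14)]))  # 0.0
```

This is why the dashboard lines drop to zero between runs: the underlying counters (e.g. watcher_taskrun_sign_created_total) only ever move when Chains performs work inside the window.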