Amazon OpenSearch Service Metrics and CloudWatch Statistics

By Elysiate · Updated Jun 4, 2026
Tags: aws, opensearch, elasticsearch, cloudwatch, observability, search

Level: intermediate · ~15 min read · Intent: informational

Audience: AWS platform engineers, SRE teams, backend engineers, cloud architects, DevOps engineers

Prerequisites

  • basic familiarity with Amazon OpenSearch Service
  • basic CloudWatch metrics and alarm experience
  • some exposure to search, indexing, or log analytics workloads

Key takeaways

  • For provisioned Amazon OpenSearch Service domains, most operational metrics still live in the AWS/ES CloudWatch namespace, even though the service name is now OpenSearch.
  • The most useful production dashboard starts with cluster health, storage headroom, JVM pressure, CPU, search latency, indexing latency, request errors, node count, and thread pool rejection metrics.
  • Use Maximum for node-risk metrics such as CPUUtilization and JVMMemoryPressure, Minimum for low-headroom metrics such as FreeStorageSpace and Nodes, and Sum for request counts, errors, and rejection counters.
  • Serverless OpenSearch has a separate AWS/AOSS namespace and different metrics, so do not copy provisioned-domain alarms directly into serverless collections.

FAQ

Where are Amazon OpenSearch Service metrics in CloudWatch?
For provisioned OpenSearch Service domains, CloudWatch metrics are under the AWS/ES namespace. Serverless collections use AWS/AOSS instead.

Why do AWS Elasticsearch Service statistics searches still matter?
Amazon Elasticsearch Service was renamed to Amazon OpenSearch Service, but many teams still use the old name in dashboards, runbooks, billing reports, and search queries. The underlying operational intent is usually OpenSearch Service CloudWatch metrics.

Which OpenSearch CloudWatch alarms should I create first?
Start with ClusterStatus.red, ClusterStatus.yellow, FreeStorageSpace, ClusterIndexWritesBlocked, Nodes, AutomatedSnapshotFailure, CPUUtilization, JVMMemoryPressure, OldGenJVMMemoryPressure, MasterCPUUtilization, 5xx errors, and thread pool queues or rejections.

Should I use Average or Maximum for OpenSearch CPU and JVM alarms?
Use Maximum when a single hot node can hurt the cluster, especially for CPUUtilization and JVMMemoryPressure. Average is useful for trend dashboards but can hide one overloaded node.

Do OpenSearch Serverless metrics use the same alarms as provisioned domains?
No. OpenSearch Serverless uses the AWS/AOSS namespace and exposes collection-oriented metrics such as SearchRequestLatency, IngestionRequestLatency, SearchOCU, IndexingOCU, and ingestion or search errors.

Amazon OpenSearch Service metrics are easy to find and surprisingly easy to misunderstand.

Part of the confusion is historical.

Many teams still say "AWS Elasticsearch Service statistics" or "Amazon Elasticsearch Service stats" even though Amazon Elasticsearch Service was renamed to Amazon OpenSearch Service. AWS still carries some legacy naming in places that matter operationally. For provisioned domains, the CloudWatch namespace is still AWS/ES, and at least one request metric is explicitly documented as OpenSearchRequests, previously ElasticsearchRequests.

So if you are trying to debug an old dashboard, rebuild alarms after a domain upgrade, or understand why search latency spiked after an index rollout, do not start with a giant list of every metric.

Start with the operational questions:

  • Is the cluster healthy?
  • Are writes blocked?
  • Is one node out of disk?
  • Is JVM pressure close to failure?
  • Are searches slow because the workload is heavy, the shards are skewed, or the disks are throttled?
  • Are indexing requests backing up or being rejected?
  • Did a deployment, blue/green change, or node replacement reset cumulative counters?

This guide focuses on the OpenSearch Service metrics that answer those questions, how to choose CloudWatch statistics for each one, and how to build a dashboard that helps during incidents instead of burying you in charts.

For broader platform selection and architecture, read Amazon OpenSearch: A Practical Guide for Fast, Scalable Search. For the product-line comparison, read OpenSearch vs Elasticsearch: When to Choose Each.

Executive Summary

For provisioned Amazon OpenSearch Service domains:

| Area | Metrics to start with | Statistic to prefer | What it tells you |
| --- | --- | --- | --- |
| Cluster health | ClusterStatus.red, ClusterStatus.yellow, ClusterStatus.green | Maximum | Whether shard allocation is healthy |
| Storage | FreeStorageSpace, ClusterUsedSpace | Minimum for free space, Maximum for used space | Whether a single node is close to write-blocking |
| Writes | ClusterIndexWritesBlocked, IndexingRate, IndexingLatency, ThreadpoolWriteQueue, ThreadpoolWriteRejected | Maximum for blocks and latency, Sum for rejections | Whether ingestion is keeping up |
| Search | SearchRate, SearchLatency, ThreadpoolSearchQueue, ThreadpoolSearchRejected, 5xx | Maximum for latency and queues, Sum for errors | Whether query traffic is saturating the cluster |
| JVM and CPU | JVMMemoryPressure, OldGenJVMMemoryPressure, CPUUtilization | Maximum | Whether one node is near failure |
| Masters | MasterCPUUtilization, MasterJVMMemoryPressure, MasterReachableFromNode | Maximum, and Minimum for reachability | Whether cluster control plane stability is at risk |
| EBS | ReadLatency, WriteLatency, DiskQueueDepth, BurstBalance, IopsThrottle, ThroughputThrottle | Maximum for throttles and latency, Minimum for credits | Whether storage is the bottleneck |
| Requests | OpenSearchRequests, 2xx, 3xx, 4xx, 5xx, InvalidHostHeaderRequests, TLSNegotiationError | Sum | Whether client or endpoint behavior changed |

The most important habit is simple:

Use Maximum for "one bad node can hurt us" metrics, Minimum for low-headroom metrics, and Sum for counters.

That single rule prevents many quiet monitoring mistakes.
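As a rough illustration, the rule can be written down as a lookup so dashboards and alarms are generated from one place. This is a minimal sketch: the metric names come from the summary table, while the grouping sets and the function name `preferred_statistic` are hypothetical.

```python
# Hypothetical grouping of the metrics named in the summary table.
# "One bad node can hurt us" metrics.
MAXIMUM_METRICS = {
    "ClusterStatus.red", "ClusterStatus.yellow", "ClusterIndexWritesBlocked",
    "CPUUtilization", "JVMMemoryPressure", "OldGenJVMMemoryPressure",
    "MasterCPUUtilization", "MasterJVMMemoryPressure",
    "SearchLatency", "IndexingLatency", "ThreadpoolSearchQueue",
    "ReadLatency", "WriteLatency", "IopsThrottle", "ThroughputThrottle",
}
# Low-headroom metrics, where the weakest node matters.
MINIMUM_METRICS = {
    "FreeStorageSpace", "Nodes", "BurstBalance", "MasterReachableFromNode",
}
# Counters, where total volume matters.
SUM_METRICS = {
    "OpenSearchRequests", "2xx", "3xx", "4xx", "5xx",
    "ThreadpoolSearchRejected", "ThreadpoolWriteRejected",
    "InvalidHostHeaderRequests", "TLSNegotiationError",
}


def preferred_statistic(metric_name: str) -> str:
    """Return the default CloudWatch statistic for a provisioned-domain metric."""
    if metric_name in MAXIMUM_METRICS:
        return "Maximum"
    if metric_name in MINIMUM_METRICS:
        return "Minimum"
    if metric_name in SUM_METRICS:
        return "Sum"
    # Refuse to guess: an unknown metric should be classified deliberately.
    raise KeyError(f"no default statistic recorded for {metric_name!r}")
```

Raising on unknown metrics is a deliberate choice here, so that a new panel cannot silently fall back to Average.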

AWS Elasticsearch Service Statistics vs OpenSearch Metrics

If someone on your team asks for AWS Elasticsearch Service statistics, they probably mean one of three things:

  1. CloudWatch metrics for a managed OpenSearch Service domain.
  2. Old dashboards created before the service rename.
  3. Historical billing, cost, or alarm language that still uses Elasticsearch wording.

AWS renamed the service to Amazon OpenSearch Service on September 8, 2021. The rename changed service names, API names, instance type naming, Dashboards terminology, and some CloudWatch metric names.

But not every name changed.

The provisioned-domain CloudWatch namespace is still:

AWS/ES

That means this command is still a normal starting point for provisioned domains:

aws cloudwatch list-metrics --namespace "AWS/ES"

Serverless is different. Amazon OpenSearch Serverless reports to:

AWS/AOSS

Do not mix these two namespaces in one runbook without labeling them clearly. A provisioned cluster alarm and a serverless collection alarm may use similar words, but they are watching different operating models.

The Metrics That Should Be On Your First Dashboard

A useful OpenSearch dashboard is not a museum of every metric. It is a triage surface.

When the page is slow, ingestion is delayed, or users are seeing errors, the dashboard should quickly answer:

  • Is this a health issue?
  • Is this a capacity issue?
  • Is this a storage issue?
  • Is this a search workload issue?
  • Is this an indexing workload issue?
  • Is this a client/API issue?

1. Cluster health

Start with:

  • ClusterStatus.green
  • ClusterStatus.yellow
  • ClusterStatus.red
  • Shards.active
  • Shards.unassigned
  • Nodes

Use Maximum for cluster status values because these metrics are binary health signals. A value of 1 means the condition is true.

Red means at least one primary shard is not allocated. That is an incident.

Yellow means primary shards are allocated but at least one replica shard is not. That might be expected on a single-node test domain, but it is not something to ignore in production.

The Nodes metric deserves a separate alarm. If the minimum node count drops below the number you expect, at least one node was unreachable during the evaluation window.
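A missing-node alarm could be sketched as keyword arguments for the CloudWatch `put_metric_alarm` API. The five-minute period, the single evaluation period, and the alarm name are assumptions for illustration, and `node_count_alarm` is a hypothetical helper.

```python
def node_count_alarm(domain_name: str, account_id: str, expected_nodes: int,
                     period_seconds: int = 300) -> dict:
    """Build put_metric_alarm keyword arguments for a missing-node alarm."""
    return {
        "AlarmName": f"{domain_name}-nodes-below-{expected_nodes}",
        "Namespace": "AWS/ES",  # provisioned domains still report here
        "MetricName": "Nodes",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        # Minimum catches a node that was unreachable at any point in the period.
        "Statistic": "Minimum",
        "Period": period_seconds,   # assumed window; tune to your workload
        "EvaluationPeriods": 1,     # assumed; raise it to reduce noise
        "Threshold": float(expected_nodes),
        "ComparisonOperator": "LessThanThreshold",
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**node_count_alarm("my-domain", "123456789012", 6))`.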

2. Storage and write blocking

Watch:

  • FreeStorageSpace
  • ClusterUsedSpace
  • ClusterIndexWritesBlocked
  • IopsThrottle
  • ThroughputThrottle

FreeStorageSpace is one of the easiest metrics to configure badly.

Use Minimum, not Average alone.

Average free storage can look fine while one node is running out of disk. The node with the least free space is the one that can push the cluster toward write failures. AWS documents FreeStorageSpace in MiB in CloudWatch, while the console displays it in GiB, so avoid copying a GiB threshold into a MiB alarm by accident.
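To avoid the unit mix-up, the threshold can be derived from per-node storage in GiB rather than typed by hand. This is a small sketch; the 25% default mirrors the starting rule used later in this guide, and `free_storage_threshold_mib` is a hypothetical helper name.

```python
def free_storage_threshold_mib(node_storage_gib: float,
                               min_free_fraction: float = 0.25) -> int:
    """Convert per-node storage in GiB into a FreeStorageSpace alarm threshold in MiB."""
    if node_storage_gib <= 0:
        raise ValueError("node_storage_gib must be positive")
    if not 0 < min_free_fraction < 1:
        raise ValueError("min_free_fraction must be between 0 and 1")
    # 1 GiB = 1024 MiB; CloudWatch reports FreeStorageSpace in MiB.
    return int(node_storage_gib * 1024 * min_free_fraction)
```

For example, an 80 GiB node with a 25% floor gives a threshold of 20480 MiB, not 20.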

ClusterIndexWritesBlocked should be treated as urgent. It means the cluster is blocking incoming write requests. Low free space and high JVM pressure are common contributors, so the alarm should send engineers straight into storage, shard distribution, and heap pressure checks.

A simple rule:

| Signal | Interpretation |
| --- | --- |
| FreeStorageSpace falling | Capacity or skew problem |
| ClusterIndexWritesBlocked = 1 | Writes are already blocked |
| IopsThrottle = 1 | I/O limit is being hit |
| ThroughputThrottle = 1 | Disk throughput limit is being hit |
| DiskQueueDepth rising | Storage work is backing up |

If storage is consistently part of your incidents, the fix is rarely "add one alarm." Look at shard count, index lifecycle policy, rollover size, retention, hot/warm tiering, and whether the domain is sized around the real workload.

CPU, JVM, and the Single-Hot-Node Problem

For OpenSearch, the average can lie politely.

One hot data node can make the cluster feel unstable while average CPU looks acceptable. That is why Maximum is usually the safer statistic for:

  • CPUUtilization
  • JVMMemoryPressure
  • OldGenJVMMemoryPressure
  • JVMGCOldCollectionCount
  • JVMGCOldCollectionTime

AWS recommends alarming when CPUUtilization or WarmCPUUtilization stays at or above 80% for a sustained window. For JVM pressure, AWS recommended alarms include JVMMemoryPressure at 95% and OldGenJVMMemoryPressure at 80%.

Those are not magic numbers for every workload, but they are good default guardrails.

Use dashboard panels differently from alarms:

  • Dashboard: show Average and Maximum together so you can see cluster-wide load and hot-node risk.
  • Alarm: use Maximum when one node can create a user-visible failure.
  • Investigation: drill into node-level dimensions when the cluster-level maximum spikes.
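The dashboard case above might look like the following sketch, which builds a CloudWatch dashboard widget that plots Average and Maximum for the same metric. The widget size, the five-minute period, and the helper names `avg_and_max_widget` and `dashboard_body` are assumptions.

```python
import json


def avg_and_max_widget(metric_name: str, domain_name: str,
                       account_id: str, region: str) -> dict:
    """Build one dashboard widget plotting Average and Maximum together."""
    base = ["AWS/ES", metric_name,
            "DomainName", domain_name, "ClientId", account_id]
    return {
        "type": "metric",
        "width": 12,   # assumed layout values
        "height": 6,
        "properties": {
            "title": f"{metric_name}: Average vs Maximum",
            "view": "timeSeries",
            "region": region,
            "period": 300,  # assumed five-minute resolution
            "metrics": [
                base + [{"stat": "Average", "label": "Average"}],
                base + [{"stat": "Maximum", "label": "Maximum"}],
            ],
        },
    }


def dashboard_body(widgets: list) -> str:
    """Serialize widgets into the JSON string a dashboard body expects."""
    return json.dumps({"widgets": widgets})
```

The JSON string could then be supplied as `DashboardBody` to `boto3.client("cloudwatch").put_dashboard(DashboardName=..., DashboardBody=...)`. A wide gap between the two lines is the visual cue for hot-node risk.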

SysMemoryUtilization is worth displaying, but do not treat it as your main heap-risk indicator. In OpenSearch, high system memory usage can be normal. JVM memory pressure is usually a more relevant stability signal.

Search Metrics: Latency, Rate, Queues, and Errors

Search problems usually have four parts:

  1. The amount of search work arriving.
  2. The latency of that work.
  3. Whether queues are filling.
  4. Whether requests are failing or being rejected.

Start with:

  • SearchRate
  • SearchLatency
  • ThreadpoolSearchQueue
  • ThreadpoolSearchRejected
  • 5xx
  • OpenSearchRequests

SearchRate is not the same as user-facing request rate. AWS documents it as search requests per minute for all shards on a data node. One _search request can touch many shards, so shard layout affects the metric.

That is useful.

If application request volume is flat but SearchRate rises, the problem may be shard fan-out, broader queries, index expansion, or a change in query routing.

Use SearchLatency with Maximum when troubleshooting incidents. Average latency can be fine while one node or shard group is dragging the user experience down.

Use ThreadpoolSearchQueue and ThreadpoolSearchRejected to tell the difference between "queries are slower" and "queries are piling up or being dropped." Rejections deserve attention even when the total number is low, because they indicate the cluster has crossed from latency into failed work.
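One way to alarm on new rejections rather than the running total is a metric math expression using the CloudWatch `DIFF` function. The sketch below builds `put_metric_alarm` arguments under stated assumptions: a one-minute period, a threshold of one new rejection, the Sum statistic, and a hypothetical helper name `rejection_increase_alarm`.

```python
def rejection_increase_alarm(domain_name: str, account_id: str,
                             metric_name: str = "ThreadpoolSearchRejected") -> dict:
    """Build put_metric_alarm arguments that fire when rejections increase."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]
    return {
        "AlarmName": f"{domain_name}-{metric_name}-increase",
        "Metrics": [
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ES",
                        "MetricName": metric_name,
                        "Dimensions": dimensions,
                    },
                    "Period": 60,   # assumed one-minute resolution
                    "Stat": "Sum",
                },
                "ReturnData": False,  # raw counter is an input only
            },
            {
                "Id": "e1",
                # DIFF returns each value minus the previous one, so the alarm
                # watches new rejections instead of the cumulative counter.
                "Expression": "DIFF(m1)",
                "Label": f"{metric_name} increase",
                "ReturnData": True,
            },
        ],
        "EvaluationPeriods": 1,
        "Threshold": 1.0,  # assumed: any new rejection is worth a look
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }
```

Because the threshold is positive, a counter reset (which produces a negative difference) does not trigger the alarm.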

Indexing Metrics: When Writes Fall Behind

Indexing incidents usually show up as delayed logs, stale search results, bulk ingestion failures, or a sudden write block.

Start with:

  • IndexingRate
  • IndexingLatency
  • ThreadpoolWriteQueue
  • ThreadpoolWriteRejected
  • ClusterIndexWritesBlocked
  • FreeStorageSpace
  • JVMMemoryPressure

IndexingRate counts indexing operations, not just API calls. A single bulk request can represent many operations, and the work can be spread across nodes.

That means you should compare:

  • application bulk request volume,
  • OpenSearch indexing rate,
  • indexing latency,
  • write queue depth,
  • write rejections,
  • and storage or JVM pressure.

If indexing latency rises but write rejections stay low, the cluster may still be absorbing the workload. If queue depth and rejections rise together, ingestion concurrency or cluster capacity is likely beyond a healthy level.

If ClusterIndexWritesBlocked flips to 1, do not treat it as a normal performance event. It is a write-availability event.
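The reasoning above can be captured as a small decision helper for runbooks or chat-ops tooling. This is only a sketch: the boolean inputs are assumed to come from your own trend comparison, and the function name and return labels are hypothetical.

```python
def classify_indexing_pressure(latency_rising: bool, queue_rising: bool,
                               rejections_rising: bool,
                               writes_blocked: bool = False) -> str:
    """Map indexing trend signals to the interpretation described above."""
    if writes_blocked:
        # ClusterIndexWritesBlocked = 1 outranks every performance signal.
        return "write-availability event"
    if queue_rising and rejections_rising:
        return "beyond healthy capacity"
    if latency_rising and not rejections_rising:
        return "still absorbing the workload"
    return "no clear signal"
```

Checking the write block first keeps an availability event from being filed as a performance ticket.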

Master Node Metrics Are Stability Metrics

Dedicated master nodes are not where your application queries should land, but they are still part of the cluster's ability to stay coherent.

Watch:

  • MasterCPUUtilization
  • MasterJVMMemoryPressure
  • MasterOldGenJVMMemoryPressure
  • MasterReachableFromNode
  • Nodes

AWS recommended alarms keep master CPU thresholds lower than data node CPU thresholds because masters are responsible for cluster stability and configuration changes.

That is the right mental model.

Master node pressure can show up during:

  • index creation storms,
  • shard churn,
  • too many small indices,
  • blue/green deployments,
  • node replacement,
  • mapping explosions,
  • or cluster state growth.

If data-node metrics look fine but the cluster feels unstable during deployments or index lifecycle activity, check the master metrics and shard counts before chasing application code.

EBS Metrics: When Search Is Waiting On Storage

Provisioned domains with EBS storage need storage-level visibility.

Watch:

  • ReadLatency
  • WriteLatency
  • ReadIOPS
  • WriteIOPS
  • ReadThroughput
  • WriteThroughput
  • DiskQueueDepth
  • BurstBalance
  • IopsThrottle
  • ThroughputThrottle
  • VolumeStalledIOcheck

EBS problems often masquerade as search problems.

For example:

  • search latency rises,
  • CPU is not maxed,
  • JVM is not near the danger zone,
  • but disk queue depth rises and throttling appears.

That points away from query syntax and toward storage throughput, IOPS, shard placement, or index lifecycle design.
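That pattern can be expressed as a simple heuristic. The sketch below is an assumption-laden illustration: the 80% CPU and 95% JVM limits reuse the alarm defaults from this guide, the boolean trend inputs are expected to come from your own comparison against baseline, and `looks_storage_bound` is a hypothetical name.

```python
def looks_storage_bound(search_latency_rising: bool, cpu_max_pct: float,
                        jvm_max_pct: float, disk_queue_rising: bool,
                        throttled: bool, cpu_limit: float = 80.0,
                        jvm_limit: float = 95.0) -> bool:
    """Return True when slow search coincides with storage pressure only."""
    # CPU and heap are both under their limits, so compute is not the suspect.
    compute_has_headroom = cpu_max_pct < cpu_limit and jvm_max_pct < jvm_limit
    # Queue depth rising together with throttling points at the EBS path.
    storage_pressure = disk_queue_rising and throttled
    return search_latency_rising and compute_has_headroom and storage_pressure
```

A True result is a hint to check volume settings and shard placement first, not proof of the root cause.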

If your volumes rely on burst credits (gp2 rather than gp3), display BurstBalance with Minimum. If a node exhausts burst credits, the weakest node matters more than the cluster average.

Request Metrics: Separate User Errors From Service Pain

Track:

  • OpenSearchRequests
  • 2xx
  • 3xx
  • 4xx
  • 5xx
  • InvalidHostHeaderRequests
  • TLSNegotiationError

Use Sum for these.

The useful view is often a ratio:

5xx / OpenSearchRequests

AWS recommends considering alarms when 5xx responses reach a meaningful percentage of OpenSearch requests. The exact threshold depends on the workload, but a sustained rise in 5xx responses usually means users or ingestion clients are experiencing real failures.
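The ratio can be computed inside CloudWatch with a metric math expression instead of in application code. The following sketch builds `put_metric_alarm` arguments; the threshold is left as a required parameter because it depends on the workload, and the five-minute period and the helper name `five_xx_ratio_alarm` are assumptions.

```python
def five_xx_ratio_alarm(domain_name: str, account_id: str,
                        threshold_percent: float,
                        period_seconds: int = 300) -> dict:
    """Build put_metric_alarm arguments for 5xx as a percentage of requests."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]

    def summed(metric_id: str, metric_name: str) -> dict:
        # Both inputs are counters, so Sum is the statistic that makes sense.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ES",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{domain_name}-5xx-ratio",
        "Metrics": [
            summed("m1", "5xx"),
            summed("m2", "OpenSearchRequests"),
            {
                "Id": "e1",
                "Expression": "100 * m1 / m2",
                "Label": "5xx percent of requests",
                "ReturnData": True,
            },
        ],
        "EvaluationPeriods": 1,  # assumed; use more periods for "sustained"
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }
```

Alarming on the percentage rather than the raw 5xx count keeps the alarm meaningful as traffic grows or shrinks.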

4xx is different. A spike in 4xx responses often points to client behavior, permissions, request shape, auth, missing indexes, or rejected invalid requests. Still alert on it when it is abnormal for your application, but route it differently from a cluster-health alarm.

InvalidHostHeaderRequests and TLSNegotiationError are useful security and integration signals. They can catch clients using the wrong endpoint, broken TLS settings, scans against a public domain, or misconfigured proxies.

Which CloudWatch Statistic Should You Use?

CloudWatch lets you view statistics such as Average, Maximum, Minimum, and Sum.

For OpenSearch Service, the statistic is not a cosmetic choice.

Use this decision table:

| Metric type | Best statistic | Why |
| --- | --- | --- |
| Binary health flags | Maximum | You want to know if the bad state happened |
| Low free headroom | Minimum | The weakest node matters |
| CPU and JVM pressure | Maximum | One hot node can hurt the cluster |
| Latency | Maximum for alarms, Average plus Maximum for dashboards | Average hides tail pain |
| Request counts | Sum | Counts need total volume |
| Error counts | Sum | Errors need total volume |
| Rejection counters | Sum or metric math difference | Rejections are cumulative-style signals |
| Node count | Minimum | You want to detect missing nodes |
| Burst credits | Minimum | One depleted node can bottleneck |

Also remember that AWS notes cumulative metrics can reset during node drops, node bounces, node replacements, and blue/green deployments. That makes raw cumulative counters less useful than trends, differences, and "did this increase during the last period?" alarms.
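When you post-process exported datapoints yourself, a reset-tolerant difference is a small amount of code. This sketch assumes the samples are ordered cumulative counter values and that a drop means the counter restarted from zero; `counter_increases` is a hypothetical helper.

```python
def counter_increases(samples: list) -> list:
    """Turn ordered cumulative counter samples into per-interval increases."""
    increases = []
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increases.append(current - previous)
        else:
            # The counter dropped, so assume it reset to zero (node bounce,
            # replacement, or blue/green deployment) and count from there.
            increases.append(current)
    return increases
```

For example, the samples `[10, 12, 12, 3, 5]` become `[2, 0, 3, 2]`: the drop from 12 to 3 is treated as three new events after a reset rather than a negative change.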

Example: Pull One Metric With AWS CLI

For a provisioned domain, list metrics first:

aws cloudwatch list-metrics \
  --namespace "AWS/ES"

Then pull a specific metric. This example asks for maximum JVM pressure over five-minute periods.

aws cloudwatch get-metric-statistics \
  --namespace "AWS/ES" \
  --metric-name "JVMMemoryPressure" \
  --dimensions Name=DomainName,Value=my-domain Name=ClientId,Value=123456789012 \
  --start-time 2026-06-04T00:00:00Z \
  --end-time 2026-06-04T01:00:00Z \
  --period 300 \
  --statistics Maximum

For per-node investigation, use dimensions that include the node identifier where available. For domain-level dashboards, use the per-domain dimensions so the chart stays readable.

The exact dimensions available depend on the metric family. AWS documents domain, node, shard-role, and serverless collection dimensions separately, so avoid assuming every metric has the same dimension set.
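For scripted investigation, the same request can be assembled for the boto3 `get_metric_statistics` call. This is a sketch: the helper name `jvm_pressure_query` is hypothetical, and the optional `NodeId` dimension is an assumption about per-node metrics that you should confirm with `list-metrics` for your domain.

```python
from datetime import datetime, timedelta, timezone


def jvm_pressure_query(domain_name: str, account_id: str, end_time: datetime,
                       hours: int = 1, node_id: str = None) -> dict:
    """Build get_metric_statistics arguments for maximum JVM pressure."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]
    if node_id is not None:
        # Assumed dimension name for per-node drill-down.
        dimensions.append({"Name": "NodeId", "Value": node_id})
    return {
        "Namespace": "AWS/ES",
        "MetricName": "JVMMemoryPressure",
        "Dimensions": dimensions,
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": 300,              # five-minute periods, as in the CLI example
        "Statistics": ["Maximum"],
    }
```

With boto3 available, `boto3.client("cloudwatch").get_metric_statistics(**kwargs)["Datapoints"]` returns the datapoints, each carrying a `Maximum` value and a `Timestamp`.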

A Practical Alarm Set For Provisioned Domains

Use this as a starting point, then tune it to your workload.

| Alarm | Suggested starting rule | Route to |
| --- | --- | --- |
| Red cluster | ClusterStatus.red Maximum >= 1 | Immediate incident |
| Yellow cluster | ClusterStatus.yellow Maximum >= 1 for several periods | Platform review or incident depending on workload |
| Low free storage | FreeStorageSpace Minimum <= 25% of node storage | Capacity/on-call |
| Writes blocked | ClusterIndexWritesBlocked Maximum >= 1 | Immediate incident |
| Node missing | Nodes Minimum < expected node count | Immediate investigation |
| Snapshot failure | AutomatedSnapshotFailure Maximum >= 1 | Reliability review |
| High data CPU | CPUUtilization Maximum >= 80% sustained | Capacity/performance |
| High JVM | JVMMemoryPressure Maximum >= 95% | Immediate investigation |
| Old-gen pressure | OldGenJVMMemoryPressure Maximum >= 80% | Heap and shard review |
| Master CPU | MasterCPUUtilization Maximum >= 50% sustained | Cluster stability review |
| Search queue | ThreadpoolSearchQueue Maximum above baseline | Search workload review |
| Write queue | ThreadpoolWriteQueue Average or Maximum above baseline | Ingestion review |
| Search rejected | Increase in ThreadpoolSearchRejected | User-facing performance incident |
| Write rejected | Increase in ThreadpoolWriteRejected | Ingestion incident |
| 5xx ratio | 5xx / OpenSearchRequests above baseline | Application and cluster triage |
| TLS errors | TLSNegotiationError Sum above baseline | Client or security review |

Do not copy these into production blindly.

A logging cluster, a product-search cluster, and a RAG retrieval cluster have different traffic patterns. The right alarm window for one may be noisy or too slow for another.

Serverless OpenSearch Metrics Are A Different Model

Amazon OpenSearch Serverless has its own CloudWatch namespace:

AWS/AOSS

The metrics are collection-oriented rather than node-oriented. Start with:

  • ActiveCollection
  • SearchRequestLatency
  • SearchRequestErrors
  • SearchRequestRate
  • SearchOCU
  • IngestionRequestLatency
  • IngestionRequestErrors
  • IngestionDocumentErrors
  • IngestionDocumentRate
  • IndexingOCU
  • SearchableDocuments

The operational questions change:

  • Is the collection active?
  • Are search requests getting slower?
  • Are ingestion requests failing?
  • Are document-level ingestion errors rising?
  • Are OCUs scaling with workload as expected?
  • Is cost rising because search or indexing OCUs increased?

Do not try to port JVMMemoryPressure, Nodes, or FreeStorageSpace alarms from provisioned domains to serverless collections. They are not the same operating surface.
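A lightweight guard in alarm-generation tooling can catch this kind of copy-paste before it reaches CloudWatch. The sketch below uses only metric names mentioned in this guide; the sets are deliberately incomplete and `namespace_for` is a hypothetical helper.

```python
# Node-oriented metrics from provisioned domains (partial list from this guide).
PROVISIONED_METRICS = {
    "JVMMemoryPressure", "Nodes", "FreeStorageSpace", "CPUUtilization",
    "ClusterStatus.red", "ClusterStatus.yellow", "ClusterIndexWritesBlocked",
}
# Collection-oriented metrics from OpenSearch Serverless.
SERVERLESS_METRICS = {
    "ActiveCollection", "SearchRequestLatency", "SearchRequestErrors",
    "SearchRequestRate", "SearchOCU", "IngestionRequestLatency",
    "IngestionRequestErrors", "IngestionDocumentErrors",
    "IngestionDocumentRate", "IndexingOCU", "SearchableDocuments",
}


def namespace_for(metric_name: str) -> str:
    """Return the CloudWatch namespace a known metric belongs to."""
    if metric_name in PROVISIONED_METRICS:
        return "AWS/ES"
    if metric_name in SERVERLESS_METRICS:
        return "AWS/AOSS"
    raise ValueError(f"{metric_name!r} is not in either known metric list")
```

An alarm template that requests `JVMMemoryPressure` for a serverless collection would then fail fast at generation time instead of creating an alarm that never receives data.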

How Cluster Insights Fits In

CloudWatch is still the monitoring backbone, but Cluster Insights can make OpenSearch-specific diagnosis faster.

Cluster Insights surfaces cluster health, shard count, node count, index count, document statistics, indexing and search rates, latencies, JVM pressure, CPU utilization, and query-level information in OpenSearch UI.

Use it as an investigation layer:

  • CloudWatch alarm fires.
  • Dashboard tells you the affected metric family.
  • Cluster Insights helps identify the domain, node, index, shard, or query pattern involved.

That is especially useful for problems like:

  • large shards,
  • node or shard skew,
  • high-latency queries,
  • hot shards,
  • resource-intensive query shapes,
  • and best-practice drift.

CloudWatch should still own alerting. Cluster Insights is strongest when the human is already investigating and needs OpenSearch-native context.

A Simple Runbook For OpenSearch Metric Spikes

When an alarm fires, move in this order.

Step 1: Check cluster health

Look at:

  • ClusterStatus.red
  • ClusterStatus.yellow
  • Nodes
  • Shards.unassigned

If the cluster is red or missing nodes, handle that before tuning queries.

Step 2: Check write availability

Look at:

  • ClusterIndexWritesBlocked
  • FreeStorageSpace
  • JVMMemoryPressure
  • ThreadpoolWriteRejected

If writes are blocked, treat it as an availability incident. Free storage, shard distribution, and heap pressure are common first checks.

Step 3: Split search from indexing

If users report slow search, check:

  • SearchLatency
  • SearchRate
  • ThreadpoolSearchQueue
  • ThreadpoolSearchRejected
  • 5xx

If data is stale or ingestion is delayed, check:

  • IndexingLatency
  • IndexingRate
  • ThreadpoolWriteQueue
  • ThreadpoolWriteRejected

Do not assume search and indexing problems are independent. Heavy ingestion can slow search, and broad searches compete with indexing for the same shared resources.

Step 4: Look for hot-node behavior

Compare Average and Maximum for:

  • CPUUtilization
  • JVMMemoryPressure
  • SearchLatency
  • IndexingLatency
  • EBS latency metrics

If maximum is much worse than average, investigate node-level dimensions, shard allocation, and uneven index traffic.

Step 5: Check the storage path

Look at:

  • ReadLatency
  • WriteLatency
  • DiskQueueDepth
  • BurstBalance
  • IopsThrottle
  • ThroughputThrottle

If storage is saturated, scaling CPU may not help. You may need different volume settings, larger nodes, better shard sizing, less fan-out, or lifecycle changes.
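The ordering of the five steps can be sketched as a function that returns the first step whose signals are firing. Everything beyond the metric names is an assumption for illustration: the dictionary keys for derived values (`search_rejected_increase`, `write_rejected_increase`, `cpu_max`, `cpu_avg`), the 1.5x hot-node ratio, and the name `first_runbook_step`.

```python
def first_runbook_step(signals: dict, expected_nodes: int,
                       hot_node_ratio: float = 1.5) -> str:
    """Return the earliest runbook step that the current signals point to."""
    # Step 1: red status or a missing node comes before anything else.
    if (signals.get("ClusterStatus.red", 0) >= 1
            or signals.get("Nodes", expected_nodes) < expected_nodes):
        return "Step 1: cluster health"
    # Step 2: blocked writes are an availability incident.
    if signals.get("ClusterIndexWritesBlocked", 0) >= 1:
        return "Step 2: write availability"
    # Step 3: new rejections mean search or indexing work is failing.
    if (signals.get("search_rejected_increase", 0) > 0
            or signals.get("write_rejected_increase", 0) > 0):
        return "Step 3: split search from indexing"
    # Step 4: a maximum far above the average suggests one hot node.
    cpu_avg = signals.get("cpu_avg", 0)
    if cpu_avg > 0 and signals.get("cpu_max", 0) / cpu_avg >= hot_node_ratio:
        return "Step 4: hot-node behavior"
    # Step 5: throttling points at the storage path.
    if (signals.get("IopsThrottle", 0) >= 1
            or signals.get("ThroughputThrottle", 0) >= 1):
        return "Step 5: storage path"
    return "no step triggered"
```

The value of encoding the order is that a noisy storage signal cannot distract from a red cluster, which mirrors how the steps are meant to be read.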

Common Monitoring Mistakes

Mistake 1: Using Average everywhere

Average hides exactly the kind of single-node problems OpenSearch clusters often have.

Use Maximum for CPU, JVM, queue depth, and latency alarms.

Mistake 2: Alerting on every metric AWS exposes

More alarms do not create better operations.

Start with health, writes, storage, JVM, CPU, latency, queue, rejection, master, and 5xx signals. Add specialized metrics only when your workload needs them.

Mistake 3: Ignoring the rename boundary

Legacy Elasticsearch naming still appears in old dashboards, old runbooks, and old team vocabulary.

Document the mapping once:

  • Amazon Elasticsearch Service is now Amazon OpenSearch Service.
  • Provisioned-domain CloudWatch namespace is AWS/ES.
  • Serverless namespace is AWS/AOSS.
  • Some old metrics were renamed during OpenSearch upgrades.
  • Billing and historical reports may still need old and new service filters.

Mistake 4: Treating serverless like provisioned

Provisioned domains expose nodes, JVM, EBS, shard, and master metrics.

Serverless collections expose collection, ingestion, search, and OCU-oriented metrics.

The dashboards should look different.

Mistake 5: No dashboard for deploy windows

OpenSearch metrics can reset during node replacements and blue/green deployments. If your release process changes mappings, index settings, cluster configuration, or ingestion volume, keep a deploy-window dashboard that shows health, latency, JVM, storage, queues, requests, and errors at the same time.

Final Checklist

For a production provisioned OpenSearch Service domain, build this first:

  • Cluster health panel with green, yellow, red, nodes, and unassigned shards.
  • Storage panel with free space minimum, used space, write blocks, and storage throttles.
  • JVM and CPU panel with average and maximum values.
  • Search panel with rate, latency, queue, rejections, and request errors.
  • Indexing panel with rate, latency, queue, rejections, and write blocks.
  • Master node panel with CPU, JVM, and reachability.
  • EBS panel with latency, IOPS, throughput, queue depth, burst balance, and throttles.
  • Request panel with OpenSearch requests, 4xx, 5xx, invalid host headers, and TLS negotiation errors.
  • Alarm set for red/yellow health, low storage, writes blocked, node loss, snapshot failure, high CPU, high JVM, master pressure, thread pool rejections, and 5xx ratio.
  • Separate serverless dashboard and alarms if you run OpenSearch Serverless.

OpenSearch monitoring is not about memorizing every metric name.

It is about keeping the failure path visible:

health -> storage -> JVM/CPU -> search/indexing pressure -> queues/rejections -> client errors -> shard and node diagnosis.

If your dashboard follows that path, the next incident will be a lot less mysterious.

About the author

Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.
