It's 3am.
Your phone is buzzing.
Again.

Another alert. Another context switch. Another 30 minutes gathering data before you even know if it's real.

By the time you find the answer — if you find it — the incident is either resolved or escalated. You're exhausted. Your team is exhausted.

This isn't sustainable. But you've accepted it as normal.

It doesn't have to be.

You've been alerted 47 times this week. Only 4 were actionable.

Alert fatigue isn't a you problem.
It's a system problem.

You're not bad at your job. Your tools are.

Here's what's actually happening:

Metrics tell you something is wrong

CPU spike. Latency increase. Error rate up. But not why.

Traces tell you where it's slow

This service called that service called that database. But not why it's slow today when it wasn't yesterday.

Logs have the answer

But they're a haystack. Unstructured. Scattered across 15 services. A nightmare to search.

So you end up here:
0 min

Incident Alert: checkout-service

Latency spike detected. Time to investigate.

The answer is in your logs.
But you've been trained to ignore them.

Here's the uncomfortable truth: logs are the richest source of context you have.

Logs capture what actually happened. Not aggregated metrics. Not sampled traces. The actual events, in sequence, with context.

So why do you treat them as a last resort?

Because they're unstructured.

Most of your logs are just a message blob. No fields. No schema. Just text.

Because they're noisy.

Thousands of lines per second. DEBUG statements mixed with actual errors. Signal buried in noise.

Because they're expensive.

So you drop them, sample them, archive them to cold storage. The data you need most is the data you can't afford to keep.

Because searching them is misery.

grep. Regex. Kibana queries that time out. You've been burned enough times that you don't even try anymore.

The result? Logs become a last resort. You check metrics, check traces, check dashboards, check Slack — and only when you're truly desperate do you dive into the logs.

By then, you've already wasted 30 minutes.

logs — streaming (5,000+ lines/sec)
2024-12-20 03:00:00.171Z DEBUG [notification-service] Validating request parameters
2024-12-20 03:00:07.576Z INFO [user-service] Inventory updated product=PROD-7161 quantity=56
2024-12-20 03:00:14.814Z DEBUG [inventory-service] Rate limiter check: bucket=api remaining=71
2024-12-20 03:00:22.200Z e checkout-service Payment declined order=ORD-62706 reason="card declined"
{"@timestamp":"2024-12-20 03:00:29.044Z","level":"DEBUG","service":"auth-service","message":"Rate limiter check: bucket=checkout remaining=62"}
{"@timestamp":"2024-12-20 03:00:36.900Z","level":"INFO","service":"checkout-service","message":"Deployment v1.8.63 rolled out successfully"}
2024-12-20 03:00:44.132Z [INFO] user-service: Order ORD-37442 created total=429.89
2024-12-20 03:00:51.323Z INFO [user-service] Session created for user=user-314
2024-12-20 03:00:57.724Z INFO [kube-controller-manager] Pulling image gcr.io/project/shipping-service:latest
2024-12-20 03:01:04.893Z INFO [notification-service] Health check passed
ts=2024-12-20 03:01:12.444Z level=debug svc=inventory-service msg="Rate limiter check: bucket=checkout remaining=73"
{"@timestamp":"2024-12-20 03:01:19.377Z","level":"ERROR","service":"shipping-service","message":"Timeout waiting for response from auth-service"}
ts=2024-12-20 03:01:26.959Z level=info svc=notification-service msg="Request completed status=200 duration=2790ms"
2024-12-20 03:01:34.317Z [WARN] order-service: Queue depth exceeding threshold current=4993
{"@timestamp":"2024-12-20 03:01:41.136Z","level":"DEBUG","service":"api-gateway","message":"Cache lookup for key=cache:xo28nsbx"}
2024-12-20 03:01:48.956Z WARN [checkout-service] High memory usage: 82% of limit
2024-12-20 03:01:55.611Z w order-service Connection pool running low available=2
2024-12-20 03:02:03.166Z [DEBUG] shipping-service: Executing query: SELECT * FROM sessions WHERE id=8341
ts=2024-12-20 03:02:09.931Z level=warn svc=cart-service msg="Connection pool running low available=4"
2024-12-20 03:02:17.498Z [WARN] payment-gateway: High memory usage: 80% of limit
2024-12-20 03:02:24.323Z d api-gateway Validating request parameters
2024-12-20 03:02:31.764Z INFO [order-service] Scheduled job cleanup completed
2024-12-20 03:02:39.356Z DEBUG [payment-gateway] Loading configuration from environment
ts=2024-12-20 03:02:46.185Z level=warn svc=shipping-service msg="Slow query detected duration=2852ms query=SELECT * FROM orders WHERE..."
ts=2024-12-20 03:02:53.772Z level=info svc=order-service msg="Inventory updated product=PROD-5293 quantity=92"
{"@timestamp":"2024-12-20 03:03:00.557Z","level":"INFO","service":"cart-service","message":"Cache hit rate=91%"}
{"@timestamp":"2024-12-20 03:03:07.682Z","level":"INFO","service":"shipping-service","message":"Payment processed amount=489.67 currency=USD"}
2024-12-20 03:03:14.808Z [INFO] order-service: Payment processed amount=445.54 currency=USD
2024-12-20 03:03:22.103Z d checkout-service Validating request parameters
2024-12-20 03:03:29.467Z [INFO] etcd: OOMKilled: container init exceeded memory limit
{"@timestamp":"2024-12-20 03:03:36.375Z","level":"DEBUG","service":"shipping-service","message":"Validating request parameters"}
2024-12-20 03:03:44.029Z d shipping-service Serializing response payload size=34025
ts=2024-12-20 03:03:51.159Z level=info svc=inventory-service msg="Health check passed"
2024-12-20 03:03:58.451Z INFO [auth-service] Cache hit rate=76%
2024-12-20 03:04:05.173Z INFO [order-service] Cache hit rate=82%
2024-12-20 03:04:12.876Z DEBUG [checkout-service] Serializing response payload size=35718
{"@timestamp":"2024-12-20 03:04:19.488Z","level":"INFO","service":"cart-service","message":"Scheduled job report completed"}
2024-12-20 03:04:26.822Z [INFO] coredns: Volume config mounted to api-gateway-e13et
2024-12-20 03:04:34.119Z i user-service Session created for user=user-830
ts=2024-12-20 03:04:41.063Z level=debug svc=user-service msg="Validating request parameters"
ts=2024-12-20 03:04:48.154Z level=info svc=payment-gateway msg="Payment processed amount=299.39 currency=USD"
2024-12-20 03:04:55.483Z [INFO] api-gateway: Payment processed amount=413.85 currency=USD
{"@timestamp":"2024-12-20 03:05:02.844Z","level":"DEBUG","service":"shipping-service","message":"Rate limiter check: bucket=api remaining=50"}
{"@timestamp":"2024-12-20 03:05:10.114Z","level":"INFO","service":"payment-gateway","message":"Health check passed"}
ts=2024-12-20 03:05:17.250Z level=debug svc=user-service msg="Serializing response payload size=12715"
ts=2024-12-20 03:05:24.064Z level=debug svc=notification-service msg="Executing query: SELECT * FROM sessions WHERE id=1651"
2024-12-20 03:05:31.746Z [INFO] notification-service: Payment processed amount=440.35 currency=USD
2024-12-20 03:05:39.395Z DEBUG [order-service] Loading configuration from environment
ts=2024-12-20 03:05:46.106Z level=info svc=inventory-service msg="Inventory updated product=PROD-8766 quantity=3"
2024-12-20 03:05:52.970Z [INFO] cart-service: Scheduled job cleanup completed
{"@timestamp":"2024-12-20 03:06:00.091Z","level":"WARN","service":"inventory-service","message":"Retry attempt 3/3 for cache"}
2024-12-20 03:06:08.073Z DEBUG [cart-service] Processing request id=khr2zxwi
{"@timestamp":"2024-12-20 03:06:14.792Z","level":"WARN","service":"user-service","message":"Rate limit approaching threshold=89"}
2024-12-20 03:06:22.140Z DEBUG [notification-service] Loading configuration from environment
ts=2024-12-20 03:06:29.667Z level=info svc=kube-scheduler msg="Pod api-gateway-3fz9o scheduled on node node-2"
2024-12-20 03:06:36.167Z WARN [notification-service] High memory usage: 80% of limit
{"@timestamp":"2024-12-20 03:06:43.694Z","level":"DEBUG","service":"notification-service","message":"Serializing response payload size=6245"}
ts=2024-12-20 03:06:50.730Z level=info svc=notification-service msg="Order ORD-21951 created total=21.82"
ts=2024-12-20 03:06:58.147Z level=info svc=shipping-service msg="Payment processed amount=285.96 currency=USD"
ts=2024-12-20 03:07:05.239Z level=info svc=kube-controller-manager msg="Container sidecar started in pod checkout-service-r2ty5"
{"@timestamp":"2024-12-20 03:07:12.354Z","level":"INFO","service":"kube-controller-manager","message":"Container app started in pod api-gateway-37m7c"}
2024-12-20 03:07:19.362Z INFO [user-service] Cache hit rate=75%
2024-12-20 03:07:26.660Z w api-gateway Queue depth exceeding threshold current=4625
{"@timestamp":"2024-12-20 03:07:34.248Z","level":"DEBUG","service":"user-service","message":"Connection pool stats: active=16 idle=9"}
2024-12-20 03:07:40.946Z i auth-service Payment processed amount=438.08 currency=USD
2024-12-20 03:07:48.834Z d notification-service Rate limiter check: bucket=api remaining=100
2024-12-20 03:07:55.725Z DEBUG [checkout-service] Cache lookup for key=cache:33n4q629
{"@timestamp":"2024-12-20 03:08:02.785Z","level":"ERROR","service":"auth-service","message":"Invalid request payload: connection refused"}
{"@timestamp":"2024-12-20 03:08:09.691Z","level":"INFO","service":"auth-service","message":"Deployment v1.8.18 rolled out successfully"}
2024-12-20 03:08:17.460Z d payment-gateway Cache lookup for key=cache:8ge3m0se
{"@timestamp":"2024-12-20 03:08:24.452Z","level":"ERROR","service":"notification-service","message":"Failed to process request error=\"invalid response\""}
ts=2024-12-20 03:08:31.448Z level=debug svc=checkout-service msg="Validating request parameters"
{"@timestamp":"2024-12-20 03:08:38.434Z","level":"DEBUG","service":"payment-gateway","message":"Loading configuration from environment"}
2024-12-20 03:08:46.593Z i etcd Created container init
{"@timestamp":"2024-12-20 03:08:53.644Z","level":"INFO","service":"user-service","message":"Payment processed amount=161.97 currency=USD"}
2024-12-20 03:09:00.474Z i checkout-service Session created for user=user-993
2024-12-20 03:09:07.865Z [INFO] etcd: Pulling image gcr.io/project/notification-service:latest
2024-12-20 03:09:14.861Z WARN [cart-service] Deprecated API endpoint accessed path=/api/v1/orders
2024-12-20 03:09:22.186Z w inventory-service Queue depth exceeding threshold current=2516
{"@timestamp":"2024-12-20 03:09:29.106Z","level":"INFO","service":"auth-service","message":"Inventory updated product=PROD-7361 quantity=37"}
2024-12-20 03:09:36.064Z [DEBUG] cart-service: Processing request id=ohyclx4x
{"@timestamp":"2024-12-20 03:09:44.029Z","level":"DEBUG","service":"inventory-service","message":"Validating request parameters"}
2024-12-20 03:09:50.927Z d shipping-service Validating request parameters
2024-12-20 03:09:57.629Z i coredns OOMKilled: container app exceeded memory limit
{"@timestamp":"2024-12-20 03:10:04.954Z","level":"INFO","service":"cart-service","message":"Payment processed amount=220.74 currency=USD"}
{"@timestamp":"2024-12-20 03:10:12.106Z","level":"ERROR","service":"api-gateway","message":"Out of memory: killed process 2727"}
2024-12-20 03:10:19.213Z i inventory-service Request completed status=200 duration=2436ms
2024-12-20 03:10:26.778Z WARN [shipping-service] Retry attempt 2/3 for cache
2024-12-20 03:10:33.823Z [DEBUG] api-gateway: Rate limiter check: bucket=checkout remaining=35
2024-12-20 03:10:41.649Z INFO [order-service] Inventory updated product=PROD-7055 quantity=91
ts=2024-12-20 03:10:48.073Z level=info svc=cart-service msg="User user-319 logged in successfully"
2024-12-20 03:10:55.631Z INFO [notification-service] Request completed status=200 duration=584ms
2024-12-20 03:11:02.459Z [INFO] etcd: Pulling image gcr.io/project/api-gateway:latest
2024-12-20 03:11:10.570Z [WARN] payment-gateway: Rate limit approaching threshold=80
2024-12-20 03:11:17.114Z ERROR [notification-service] Circuit breaker opened for inventory-service
{"@timestamp":"2024-12-20 03:11:24.822Z","level":"DEBUG","service":"payment-gateway","message":"Connection pool stats: active=18 idle=8"}
{"@timestamp":"2024-12-20 03:11:31.648Z","level":"INFO","service":"coredns","message":"OOMKilled: container init exceeded memory limit"}
2024-12-20 03:11:38.930Z w shipping-service Connection pool running low available=3
{"@timestamp":"2024-12-20 03:11:45.667Z","level":"INFO","service":"cart-service","message":"Scheduled job report completed"}
2024-12-20 03:11:52.967Z [DEBUG] checkout-service: Executing query: SELECT * FROM users WHERE id=7716

This is what you see. Every. Single. Time.

There's a map buried in your logs. You just can't see it.

What if you didn't have to search? What if an AI could read your logs — all of them — and just... tell you what's happening?

Your logs aren't just noise. They're a record of every interaction, every call, every failure in your system. Hidden in that chaos is:

  • Which services talk to which
  • What the normal patterns look like
  • Where things are breaking down right now

You can't see it. It's buried in unstructured text across dozens of sources.
But an LLM can.
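To make "which services talk to which" concrete, here is a minimal sketch. It is not an LLM; it is a plain regex pass, and it assumes the logs have already been reduced to records with `service` and `message` fields. The two message shapes and the sample records are taken from the log streams on this page; the names `CALL_PATTERNS` and `dependency_edges` are made up for illustration.

```python
import re
from collections import Counter

# Message shapes copied from the sample streams on this page.
# Assumption: a real system would need its own, much longer list.
CALL_PATTERNS = [
    re.compile(r"Timeout waiting for response from (?P<target>[\w-]+)"),
    re.compile(r"Circuit breaker opened for (?P<target>[\w-]+)"),
]


def dependency_edges(records):
    """Count (caller, callee) pairs implied by each record's message."""
    edges = Counter()
    for record in records:
        for pattern in CALL_PATTERNS:
            match = pattern.search(record["message"])
            if match:
                edges[(record["service"], match.group("target"))] += 1
                break  # one edge per record is enough for this sketch
    return edges


# Hypothetical pre-parsed records; the messages appear in the streams on this page.
records = [
    {"service": "shipping-service", "message": "Timeout waiting for response from auth-service"},
    {"service": "notification-service", "message": "Circuit breaker opened for inventory-service"},
    {"service": "checkout-service", "message": "Timeout waiting for response from payment-gateway"},
    {"service": "inventory-service", "message": "Timeout waiting for response from payment-gateway"},
    {"service": "order-service", "message": "Health check passed"},
]

for (caller, callee), count in sorted(dependency_edges(records).items()):
    print(f"{caller} -> {callee} ({count})")
```

Even this crude pass turns four error lines into four edges of a service map. Hand-written patterns only catch the phrasings you already know about, which is the gap a model reading every line is meant to close.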

logs — chaos mode
ts=2024-12-20 03:00:00.061Z level=warn svc=notification-service msg="High memory usage: 79% of limit"
2024-12-20 03:00:07.688Z i inventory-service Payment processed amount=428.38 currency=USD
2024-12-20 03:00:14.457Z [ERROR] checkout-service: Timeout waiting for response from payment-gateway
2024-12-20 03:00:21.886Z i coredns Liveness probe failed for shipping-service-9vok3
2024-12-20 03:00:29.516Z INFO [order-service] Health check passed
2024-12-20 03:00:36.222Z DEBUG [checkout-service] Validating request parameters
2024-12-20 03:00:43.289Z DEBUG [shipping-service] Processing request id=zgtts64u
2024-12-20 03:00:50.959Z i etcd Volume data mounted to inventory-service-lfvae
ts=2024-12-20 03:00:57.783Z level=info svc=coredns msg="Pod inventory-service-hg2oj scheduled on node node-1"
2024-12-20 03:01:05.087Z i kube-scheduler Pulling image gcr.io/project/order-service:latest
2024-12-20 03:01:12.635Z WARN [payment-gateway] Slow query detected duration=656ms query=SELECT * FROM orders WHERE...
{"@timestamp":"2024-12-20 03:01:20.191Z","level":"INFO","service":"ingress-nginx","message":"OOMKilled: container init exceeded memory limit"}
2024-12-20 03:01:26.436Z d inventory-service Rate limiter check: bucket=checkout remaining=5
{"@timestamp":"2024-12-20 03:01:34.170Z","level":"DEBUG","service":"api-gateway","message":"Cache lookup for key=cache:1tgpahzo"}
2024-12-20 03:01:41.058Z INFO [user-service] Session created for user=user-401
2024-12-20 03:01:48.138Z WARN [inventory-service] Deprecated API endpoint accessed path=/api/v1/users
{"@timestamp":"2024-12-20 03:01:55.638Z","level":"ERROR","service":"checkout-service","message":"Authentication failed for user=user-144"}
{"@timestamp":"2024-12-20 03:02:02.608Z","level":"WARN","service":"api-gateway","message":"Retry attempt 3/3 for database"}
2024-12-20 03:02:10.546Z [INFO] inventory-service: Request completed status=200 duration=2986ms
ts=2024-12-20 03:02:17.728Z level=debug svc=cart-service msg="Processing request id=1ccgzmfj"
ts=2024-12-20 03:02:24.932Z level=debug svc=api-gateway msg="Validating request parameters"
ts=2024-12-20 03:02:31.617Z level=debug svc=shipping-service msg="Connection pool stats: active=10 idle=0"
2024-12-20 03:02:38.931Z WARN [inventory-service] Queue depth exceeding threshold current=1638
2024-12-20 03:02:46.489Z [INFO] payment-gateway: Payment processed amount=284.92 currency=USD
{"@timestamp":"2024-12-20 03:02:53.669Z","level":"INFO","service":"coredns","message":"OOMKilled: container init exceeded memory limit"}
2024-12-20 03:03:00.593Z WARN [order-service] Deprecated API endpoint accessed path=/api/v1/products
ts=2024-12-20 03:03:08.120Z level=error svc=cart-service msg="Out of memory: killed process 1424"
2024-12-20 03:03:14.971Z INFO [user-service] Cache hit rate=85%
{"@timestamp":"2024-12-20 03:03:21.813Z","level":"INFO","service":"cart-service","message":"User user-740 logged in successfully"}
{"@timestamp":"2024-12-20 03:03:29.216Z","level":"DEBUG","service":"auth-service","message":"Loading configuration from environment"}
2024-12-20 03:03:36.618Z [INFO] etcd: OOMKilled: container app exceeded memory limit
2024-12-20 03:03:43.828Z [DEBUG] payment-gateway: Serializing response payload size=1149
ts=2024-12-20 03:03:51.181Z level=error svc=payment-gateway msg="Invalid request payload: invalid response"
{"@timestamp":"2024-12-20 03:03:58.000Z","level":"DEBUG","service":"checkout-service","message":"Rate limiter check: bucket=checkout remaining=48"}
2024-12-20 03:04:05.390Z [ERROR] inventory-service: Timeout waiting for response from payment-gateway
{"@timestamp":"2024-12-20 03:04:12.657Z","level":"DEBUG","service":"shipping-service","message":"Cache lookup for key=cache:n99d1w6p"}
2024-12-20 03:04:19.500Z [INFO] checkout-service: Session created for user=user-934
2024-12-20 03:04:27.161Z d notification-service Processing request id=r4gpeqc6
{"@timestamp":"2024-12-20 03:04:34.588Z","level":"DEBUG","service":"auth-service","message":"Processing request id=71zfc8rl"}
2024-12-20 03:04:41.772Z i kube-scheduler Pulling image gcr.io/project/inventory-service:latest
2024-12-20 03:04:48.473Z [INFO] user-service: Cache hit rate=92%
ts=2024-12-20 03:04:55.953Z level=info svc=coredns msg="Readiness probe failed for payment-gateway-jvctk"
{"@timestamp":"2024-12-20 03:05:02.441Z","level":"DEBUG","service":"order-service","message":"Loading configuration from environment"}
ts=2024-12-20 03:05:10.481Z level=info svc=user-service msg="Scheduled job sync completed"
{"@timestamp":"2024-12-20 03:05:17.028Z","level":"WARN","service":"inventory-service","message":"Rate limit approaching threshold=91"}
{"@timestamp":"2024-12-20 03:05:24.165Z","level":"DEBUG","service":"shipping-service","message":"Loading configuration from environment"}
{"@timestamp":"2024-12-20 03:05:32.009Z","level":"ERROR","service":"payment-gateway","message":"Out of memory: killed process 9951"}
ts=2024-12-20 03:05:39.014Z level=info svc=shipping-service msg="User user-845 logged in successfully"
ts=2024-12-20 03:05:46.053Z level=error svc=inventory-service msg="Invalid request payload: connection refused"
{"@timestamp":"2024-12-20 03:05:53.376Z","level":"ERROR","service":"notification-service","message":"Failed to process request error=\"timeout\""}

Logs that sort themselves.

The first problem: your logs are chaos. Different formats, different services, different schemas, all dumped into the same bucket.

What if they organized themselves?

AI can identify patterns in your logs that you can't see. Using "log format fingerprinting," it recognizes the structure of each log line and groups similar logs together.

No manual tagging. No brittle pipelines. No regex.

94% accuracy. Automatic. Instant.
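
To make "log format fingerprinting" concrete, here is a minimal Python sketch of the general idea. It is a hand-rolled heuristic, not the AI approach described above and not how any product implements it. JSON lines are fingerprinted by their set of keys. Every other line is fingerprinted by the shape of its first four tokens, with words masked as w and numbers masked as 0. The names fingerprint and group_by_format, and the four-token cutoff, are assumptions made for this example.

```python
import json
import re
from collections import defaultdict

# A word (letters, digits, "_" or "-" after a leading letter) or a run of digits.
_TOKEN = re.compile(r"[A-Za-z][\w-]*|\d+")


def fingerprint(line: str, header_tokens: int = 4) -> str:
    """Return a coarse structural signature for one log line (heuristic only)."""
    line = line.strip()
    if line.startswith("{"):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None
        if isinstance(record, dict):
            # JSON logs: the sorted key set identifies the schema.
            return "json:" + ",".join(sorted(record))
    # Mask words as "w" and numbers as "0", keeping punctuation as-is.
    masked = _TOKEN.sub(lambda m: "0" if m.group()[0].isdigit() else "w", line)
    # Keep only the leading tokens, where timestamp, level and service usually sit.
    return " ".join(masked.split()[:header_tokens])


def group_by_format(lines):
    """Bucket log lines by their fingerprint."""
    groups = defaultdict(list)
    for line in lines:
        groups[fingerprint(line)].append(line)
    return dict(groups)
```

With this sketch, a line that starts "[DEBUG] api-gateway:" after its timestamp and one that starts "[INFO] notification-service:" share a fingerprint, while "INFO [auth-service]" lines, key=value lines, and JSON lines each get their own group.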

logs — chaos
ts=2024-12-20 03:00:00.453Z level=debug svc=cart-service msg="Connection pool stats: active=15 idle=0"
2024-12-20 03:00:07.963Z w notification-service Queue depth exceeding threshold current=2768
2024-12-20 03:00:14.512Z [DEBUG] api-gateway: Rate limiter check: bucket=auth remaining=74
{"@timestamp":"2024-12-20 03:00:21.915Z","level":"DEBUG","service":"auth-service","message":"Serializing response payload size=45868"}
2024-12-20 03:00:29.522Z i etcd Pod checkout-service-gqxhw scheduled on node node-5
2024-12-20 03:00:36.038Z INFO [auth-service] Session created for user=user-992
2024-12-20 03:00:43.356Z [INFO] notification-service: Order ORD-54418 created total=422.87
2024-12-20 03:00:50.787Z ERROR [user-service] Out of memory: killed process 6604
ts=2024-12-20 03:00:58.438Z level=debug svc=checkout-service msg="Connection pool stats: active=19 idle=6"
2024-12-20 03:01:05.450Z i user-service Inventory updated product=PROD-6004 quantity=28
2024-12-20 03:01:12.216Z i payment-gateway Session created for user=user-848
{"@timestamp":"2024-12-20 03:01:19.397Z","level":"DEBUG","service":"auth-service","message":"Validating request parameters"}
ts=2024-12-20 03:01:27.234Z level=info svc=payment-gateway msg="Request completed status=200 duration=1465ms"
{"@timestamp":"2024-12-20 03:01:34.115Z","level":"INFO","service":"etcd","message":"OOMKilled: container sidecar exceeded memory limit"}
ts=2024-12-20 03:01:41.415Z level=warn svc=cart-service msg="High memory usage: 87% of limit"
2024-12-20 03:01:48.902Z [WARN] cart-service: Certificate expiring in 28 days
2024-12-20 03:01:55.533Z e payment-gateway Service unavailable endpoint=auth-service
{"@timestamp":"2024-12-20 03:02:02.453Z","level":"DEBUG","service":"payment-gateway","message":"Processing request id=odlj4qde"}
ts=2024-12-20 03:02:09.667Z level=error svc=shipping-service msg="Invalid request payload: timeout"
2024-12-20 03:02:17.546Z i checkout-service Scheduled job backup completed
ts=2024-12-20 03:02:24.347Z level=debug svc=user-service msg="Executing query: SELECT * FROM orders WHERE id=7475"
2024-12-20 03:02:31.804Z DEBUG [inventory-service] Processing request id=yggcxjj6
{"@timestamp":"2024-12-20 03:02:38.897Z","level":"INFO","service":"etcd","message":"Liveness probe failed for checkout-service-3hguh"}
2024-12-20 03:02:45.788Z d auth-service Rate limiter check: bucket=auth remaining=82
ts=2024-12-20 03:02:53.157Z level=info svc=shipping-service msg="Cache hit rate=87%"
2024-12-20 03:03:00.616Z INFO [user-service] Scheduled job sync completed
2024-12-20 03:03:07.287Z d checkout-service Connection pool stats: active=18 idle=4
2024-12-20 03:03:15.311Z [DEBUG] auth-service: Serializing response payload size=40685
{"@timestamp":"2024-12-20 03:03:22.590Z","level":"ERROR","service":"cart-service","message":"Out of memory: killed process 1999"}
2024-12-20 03:03:29.667Z [WARN] auth-service: Slow query detected duration=2289ms query=SELECT * FROM orders WHERE...
{"@timestamp":"2024-12-20 03:03:36.149Z","level":"WARN","service":"inventory-service","message":"Certificate expiring in 12 days"}
2024-12-20 03:03:43.531Z d shipping-service Loading configuration from environment
2024-12-20 03:03:50.504Z i kube-scheduler OOMKilled: container sidecar exceeded memory limit
ts=2024-12-20 03:03:57.639Z level=info svc=user-service msg="Inventory updated product=PROD-8685 quantity=51"
ts=2024-12-20 03:04:05.742Z level=debug svc=inventory-service msg="Connection pool stats: active=20 idle=6"
{"@timestamp":"2024-12-20 03:04:12.837Z","level":"INFO","service":"notification-service","message":"Scheduled job report completed"}
2024-12-20 03:04:20.033Z [INFO] user-service: Inventory updated product=PROD-2737 quantity=22
{"@timestamp":"2024-12-20 03:04:27.033Z","level":"INFO","service":"payment-gateway","message":"Inventory updated product=PROD-9556 quantity=32"}
{"@timestamp":"2024-12-20 03:04:33.812Z","level":"INFO","service":"shipping-service","message":"Deployment v1.3.72 rolled out successfully"}
2024-12-20 03:04:40.809Z [ERROR] notification-service: Authentication failed for user=user-754
2024-12-20 03:04:48.347Z [ERROR] api-gateway: Authentication failed for user=user-841
2024-12-20 03:04:55.753Z DEBUG [order-service] Loading configuration from environment
{"@timestamp":"2024-12-20 03:05:02.851Z","level":"DEBUG","service":"user-service","message":"Validating request parameters"}
{"@timestamp":"2024-12-20 03:05:09.642Z","level":"DEBUG","service":"api-gateway","message":"Loading configuration from environment"}
{"@timestamp":"2024-12-20 03:05:17.398Z","level":"DEBUG","service":"shipping-service","message":"Serializing response payload size=18301"}
2024-12-20 03:05:24.473Z INFO [auth-service] Payment processed amount=457.21 currency=USD
2024-12-20 03:05:31.822Z [WARN] user-service: Connection pool running low available=5
{"@timestamp":"2024-12-20 03:05:38.808Z","level":"INFO","service":"user-service","message":"User user-191 logged in successfully"}
2024-12-20 03:05:46.306Z DEBUG [checkout-service] Loading configuration from environment
2024-12-20 03:05:53.634Z d api-gateway Rate limiter check: bucket=auth remaining=17
2024-12-20 03:06:00.318Z [INFO] kube-controller-manager: OOMKilled: container sidecar exceeded memory limit
2024-12-20 03:06:07.469Z [ERROR] api-gateway: Database connection failed host=db-replica error="invalid response"
ts=2024-12-20 03:06:14.936Z level=info svc=auth-service msg="Cache hit rate=79%"
ts=2024-12-20 03:06:21.639Z level=info svc=notification-service msg="Health check passed"
{"@timestamp":"2024-12-20 03:06:29.570Z","level":"ERROR","service":"auth-service","message":"Database connection failed host=db-replica error=\"connection refused\""}
2024-12-20 03:06:36.665Z [INFO] user-service: Deployment v3.4.24 rolled out successfully
2024-12-20 03:06:43.504Z d notification-service Loading configuration from environment
{"@timestamp":"2024-12-20 03:06:50.850Z","level":"INFO","service":"auth-service","message":"Scheduled job backup completed"}
2024-12-20 03:06:57.833Z i inventory-service User user-444 logged in successfully
2024-12-20 03:07:05.276Z i order-service Payment processed amount=324.33 currency=USD

Stop fighting with Grok.

The second problem: even when you find the right logs, the data you need is buried in a message blob.

timestamp? Buried. log.level? Buried. user.id? Buried.

Traditionally, you'd write Grok patterns. Regex. Ingest pipelines. Pray you don't break production.

What if AI wrote them for you?

Raw log — unparsed
2024-12-20T03:14:24.312Z ERROR checkout-service Database connection pool exhausted active_connections=20 waiting_requests=147
Current fields:
message: [entire string]
@timestamp: [ingest time]

No regex. No guessing. No 2-hour debugging sessions.
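
For a sense of the target, here is a small Python sketch of the structured fields you would want out of the raw line above. It is written by hand for that one layout (timestamp, level, service, then free text with key=value pairs) and is not the parsing that an AI would generate. The function name parse_line and the field name service.name are assumptions for this example. log.level is the field named earlier in this section.

```python
def parse_line(raw: str) -> dict:
    """Split one 'timestamp LEVEL service message key=value ...' line into fields."""
    # Assumes the first three space-separated tokens are timestamp, level, service.
    timestamp, level, service, rest = raw.split(" ", 3)
    fields = {
        "@timestamp": timestamp,
        "log.level": level,
        "service.name": service,
    }
    message_words = []
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep and key and value:
            # key=value pairs become their own fields; plain integers are converted.
            fields[key] = int(value) if value.isdigit() else value
        else:
            message_words.append(token)
    fields["message"] = " ".join(message_words)
    return fields
```

Applied to the checkout-service line, this yields a message of "Database connection pool exhausted" plus separate active_connections and waiting_requests fields, instead of one blob.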

Stop hunting. Start knowing.

The third problem: even organized, parsed logs are still a lot of logs.

You're not looking for a log. You're looking for the important log. The error. The anomaly. The thing that's actually broken.

What if your logs told you what matters?

Significant Events uses AI to automatically surface the signals you care about:

  • Errors and exceptions
  • Anomalous patterns
  • Critical warnings
  • Things that changed

You don't hunt for the needle. The needle finds you.

These were surfaced automatically. No query. No search. No hunting.

Before you're paged. Before the user complains. Before the 3am wake-up.
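
As a rough illustration of what "surfacing signals" can mean, the Python sketch below compares a current window of logs against a baseline window. It flags message patterns that are errors, that are new, or that have spiked. This is a simple counting heuristic under stated assumptions, not the Significant Events implementation. The names template and significant_events, the 3x spike factor, and the (level, message) input shape are all invented for this example.

```python
import re
from collections import Counter


def template(message: str) -> str:
    """Replace any token containing a digit so similar messages share one pattern."""
    return re.sub(r"\S*\d\S*", "<*>", message)


def significant_events(baseline, current, spike_factor=3.0):
    """Return (pattern, count, reasons) for patterns worth a look, busiest first.

    baseline and current are lists of (level, message) tuples.
    """
    base_counts = Counter(template(msg) for _, msg in baseline)
    cur_counts = Counter(template(msg) for _, msg in current)
    error_patterns = {
        template(msg) for level, msg in current if level.upper() in {"ERROR", "FATAL"}
    }
    events = []
    for pattern, count in cur_counts.items():
        before = base_counts.get(pattern, 0)
        reasons = []
        if pattern in error_patterns:
            reasons.append("error")
        if before == 0:
            reasons.append("new pattern")      # something changed
        elif count >= spike_factor * before:
            reasons.append("spike")            # anomalous volume
        if reasons:
            events.append((pattern, count, reasons))
    return sorted(events, key=lambda event: event[1], reverse=True)
```

Routine lines that look the same in both windows drop out. An out-of-memory error that never appeared in the baseline comes back tagged as both "error" and "new pattern".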

Same incident. Two realities.

Let's go back to that 3am alert. Same incident. Two ways to handle it.

Reality A: Today

0:00   Alert fires
0:30   Check PagerDuty
1:00   Open Grafana
3:00   Check metrics dashboard
5:00   Open Datadog
8:00   Search traces
12:00  Give up, open Splunk
15:00  Search logs: "error"
18:00  Search logs: "checkout"
22:00  Search logs: "payment"
28:00  Find something maybe relevant
34:00  Confirm root cause

34 minutes to root cause

Reality B: With AI-powered logs

0:00  Alert fires
0:15  Open logs
0:20  See Significant Event: "payment-gateway timeout"
0:45  Click → see context, related logs, timeline
1:30  Root cause confirmed

90 seconds to root cause

34:00 before. 1:30 after. Roughly 23x faster.
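
The 23x figure is just the ratio of the two timelines. A quick Python check, using only the 34:00 and 1:30 values from the scenario above (the helper name to_seconds is made up for this example):

```python
def to_seconds(clock: str) -> int:
    """Convert a 'minutes:seconds' string to total seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


before = to_seconds("34:00")  # Reality A: 2040 seconds
after = to_seconds("1:30")    # Reality B: 90 seconds
speedup = before / after

print(f"{speedup:.1f}x")  # 22.7x, which rounds to 23x
```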

How to escape alert fatigue.

Five principles to transform your incident response.

1. Make logs your first stop, not your last.

Logs have the richest context. Stop treating them as a last resort.

2. Let AI organize and parse.

You have better things to do than write Grok patterns.

3. Surface signals automatically.

Don't make humans hunt for needles. Let the needles find you.

4. Alert on causes, not just symptoms.

"CPU is high" is a symptom. "Connection pool exhausted" is a cause.

5. Measure time to why, not just time to acknowledge.

MTTA is vanity. Time to root cause is sanity.
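
If you want to track the fifth principle, one possible way to compute both numbers side by side is sketched below in Python. The Incident record and its three timestamp fields are hypothetical. Your incident tool will have its own names, and you have to decide for yourself what counts as "root cause confirmed".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    alerted_at: datetime       # when the alert fired
    acknowledged_at: datetime  # when a human acknowledged it
    root_cause_at: datetime    # when the root cause was confirmed


def mean_minutes(durations: list[timedelta]) -> float:
    """Average a list of durations and express the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def mtta_minutes(incidents: list[Incident]) -> float:
    """Mean time to acknowledge: alert fired -> acknowledged."""
    return mean_minutes([i.acknowledged_at - i.alerted_at for i in incidents])


def time_to_why_minutes(incidents: list[Incident]) -> float:
    """Mean time to root cause: alert fired -> root cause confirmed."""
    return mean_minutes([i.root_cause_at - i.alerted_at for i in incidents])
```

An incident acknowledged in 30 seconds but diagnosed in 34 minutes looks great on the first metric and poor on the second. That gap is the point of measuring both.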

This is what Elastic Streams does.

Everything you just saw — the auto-organization, the AI-generated parsing, the significant events — this is Elastic Streams.

It's available now. It works on your existing logs. It doesn't require you to change how you instrument.

If you're tired of alert fatigue, try it.

Built by David

Director of Product Marketing, Observability @ Elastic

Who thinks about this stuff way too much.

alertfatigue.fail — An interactive exploration of modern observability.

Built with Next.js, Tailwind CSS, and too much caffeine.