Where should I start when investigating incidents?

Make logs your first stop, not your last resort. They carry the richest context you have.

How can I avoid writing complex log parsing patterns?

Let AI organize and parse your logs. You have better things to do than write Grok patterns manually.

How do I find important signals in noisy logs?

Surface signals automatically. Don't make humans hunt for needles—let the needles find you.
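One simple way to let signals surface is to collapse lines that differ only in IDs or numbers into templates, then rank the templates by severity first and rarity second. The Python sketch below illustrates that heuristic; it is not a description of any particular product's method. The masking rules, the severity weights, and the `template`/`surface` helpers are assumptions made for this example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical masking rules: key=value values and bare numbers vary per line,
# so they are replaced to let similar lines collapse into one template.
_VALUE = re.compile(r'=("[^"]*"|\S+)')
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")

# Assumed severity weights; unknown levels sort after DEBUG.
SEVERITY = {"ERROR": 3, "WARN": 2, "INFO": 1, "DEBUG": 0}


def template(message: str) -> str:
    """Mask the variable parts of a log message."""
    return _NUMBER.sub("<N>", _VALUE.sub("=<*>", message))


def surface(events: Iterable[Tuple[str, str]], top: int = 5) -> List[Tuple[str, str, int]]:
    """Rank (level, template) groups: most severe first, then rarest first."""
    counts = Counter((level.upper(), template(msg)) for level, msg in events)
    ranked = sorted(
        counts.items(),
        key=lambda item: (-SEVERITY.get(item[0][0], -1), item[1], item[0][1]),
    )
    return [(level, tpl, n) for (level, tpl), n in ranked[:top]]


if __name__ == "__main__":
    # Messages modeled on the log stream on this page.
    sample = [
        ("INFO", "Health check passed"),
        ("INFO", "Health check passed"),
        ("WARN", "Connection pool running low available=3"),
        ("WARN", "Connection pool running low available=5"),
        ("ERROR", "Out of memory: killed process 4061"),
        ("ERROR", "Out of memory: killed process 5396"),
        ("ERROR", "Circuit breaker opened for auth-service"),
    ]
    for row in surface(sample):
        print(row)
```

Rarity matters as much as severity here: a one-off error template often says more than an error that fires every few seconds.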

What should I alert on to reduce alert fatigue?

Alert on causes, not just symptoms. "CPU is high" is a symptom. "Connection pool exhausted" is a cause.
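A cause-level rule keys on what the log says went wrong rather than on a resource gauge. The Python sketch below is illustrative only: the rule names, the regexes (modeled on messages in the log stream on this page), and the assumption that an exhausted pool reports `available=0` are all hypothetical.

```python
import re
from typing import Callable, NamedTuple, Optional


class CauseRule(NamedTuple):
    name: str                          # the failure mode the alert will name
    pattern: re.Pattern                # what to look for in the message
    fires: Callable[[re.Match], bool]  # extra condition on the match


# Hypothetical rules: each names a concrete failure mode,
# not a symptom such as "CPU is high".
CAUSE_RULES = [
    CauseRule(
        "connection pool exhausted",
        re.compile(r"Connection pool running low available=(\d+)"),
        lambda m: int(m.group(1)) == 0,  # assumption: 0 free connections means exhausted
    ),
    CauseRule("circuit breaker open", re.compile(r"Circuit breaker opened for ([\w-]+)"), lambda m: True),
    CauseRule("database unreachable", re.compile(r"Database connection failed host=([\w-]+)"), lambda m: True),
]


def classify_cause(message: str) -> Optional[str]:
    """Return the first matching cause name, or None if no rule fires."""
    for rule in CAUSE_RULES:
        match = rule.pattern.search(message)
        if match and rule.fires(match):
            return rule.name
    return None
```

An alert built this way tells the responder which failure mode to look at, instead of which gauge moved.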

What metrics should I track for incident response?

Measure time to why, not just time to acknowledge. MTTA (mean time to acknowledge) is vanity. Time to root cause is sanity.
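Both numbers come from the same incident records, provided you also record when the root cause was identified. A minimal Python sketch, assuming a hypothetical `Incident` record with a `root_cause_at` timestamp (not a standard field in any particular tool) and made-up example data:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    alerted_at: datetime
    acknowledged_at: datetime
    root_cause_at: datetime  # hypothetical field: when the "why" was confirmed


def _mean(deltas: List[timedelta]) -> timedelta:
    # sum() needs a timedelta start value; timedelta / int is supported.
    return sum(deltas, timedelta()) / len(deltas)


def mtta(incidents: List[Incident]) -> timedelta:
    """Mean time from alert to acknowledgement."""
    return _mean([i.acknowledged_at - i.alerted_at for i in incidents])


def time_to_root_cause(incidents: List[Incident]) -> timedelta:
    """Mean time from alert to identified root cause."""
    return _mean([i.root_cause_at - i.alerted_at for i in incidents])


# Made-up example data, not measurements.
incidents = [
    Incident(datetime(2024, 12, 20, 3, 0), datetime(2024, 12, 20, 3, 2), datetime(2024, 12, 20, 3, 32)),
    Incident(datetime(2024, 12, 20, 4, 0), datetime(2024, 12, 20, 4, 4), datetime(2024, 12, 20, 4, 40)),
]

if __name__ == "__main__":
    print(mtta(incidents))                # 0:03:00
    print(time_to_root_cause(incidents))  # 0:36:00
```

In this example acknowledgement looks fast, yet the team still spends over half an hour finding the cause. That gap is what MTTA alone hides.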

It's 3am.
Your phone is buzzing.
Again.

Another alert. Another context switch. Another 30 minutes gathering data before you even know if it's real.

By the time you find the answer, the incident is either resolved or escalated. You're exhausted. Your team is exhausted.

You've accepted this as normal.

There's a better way.

You've been alerted 47 times this week. Only 4 were actionable.

Alert fatigue is a system problem.

Incident Alert: checkout-service

Latency spike detected. Time to investigate.

The answer is in your logs. Somewhere.

logs — streaming (5,000+ lines/sec)

ts=2024-12-20 03:00:00.601Z level=warn svc=payment-gateway msg="Certificate expiring in 27 days"

{"@timestamp":"2024-12-20 03:00:08.135Z","level":"INFO","service":"cart-service","message":"User user-914 logged in successfully"}

2024-12-20 03:00:14.621Z INFO [payment-gateway] Inventory updated product=PROD-8305 quantity=38

2024-12-20 03:00:22.199Z INFO [notification-service] Session created for user=user-806

2024-12-20 03:00:29.327Z DEBUG [checkout-service] Processing request id=9ookz0ec

ts=2024-12-20 03:00:36.675Z level=info svc=api-gateway msg="Deployment v1.7.68 rolled out successfully"

2024-12-20 03:00:43.406Z d notification-service Cache lookup for key=cache:ka1bpr9v

{"@timestamp":"2024-12-20 03:00:50.416Z","level":"DEBUG","service":"payment-gateway","message":"Loading configuration from environment"}

2024-12-20 03:00:58.284Z d order-service Validating request parameters

2024-12-20 03:01:05.138Z [ERROR] notification-service: Out of memory: killed process 4061

2024-12-20 03:01:12.616Z w payment-gateway High memory usage: 78% of limit

2024-12-20 03:01:19.548Z d auth-service Validating request parameters

{"@timestamp":"2024-12-20 03:01:26.605Z","level":"INFO","service":"kube-scheduler","message":"Created container sidecar"}

ts=2024-12-20 03:01:34.549Z level=info svc=auth-service msg="Order ORD-36885 created total=211.40"

2024-12-20 03:01:41.553Z INFO [notification-service] Order ORD-35901 created total=483.54

2024-12-20 03:01:48.099Z [WARN] checkout-service: Rate limit approaching threshold=84

2024-12-20 03:01:55.877Z WARN [checkout-service] Rate limit approaching threshold=81

ts=2024-12-20 03:02:02.911Z level=debug svc=cart-service msg="Cache lookup for key=cache:g2hq40a8"

2024-12-20 03:02:09.989Z INFO [shipping-service] Inventory updated product=PROD-3984 quantity=43

ts=2024-12-20 03:02:17.280Z level=debug svc=shipping-service msg="Processing request id=k35216qp"

2024-12-20 03:02:24.881Z WARN [cart-service] Retry attempt 3/3 for cache

{"@timestamp":"2024-12-20 03:02:32.155Z","level":"DEBUG","service":"user-service","message":"Executing query: SELECT * FROM products WHERE id=2185"}

2024-12-20 03:02:39.297Z d shipping-service Processing request id=e5nec6xw

{"@timestamp":"2024-12-20 03:02:45.676Z","level":"ERROR","service":"checkout-service","message":"Database connection failed host=db-primary error=\"invalid response\""}

{"@timestamp":"2024-12-20 03:02:53.218Z","level":"INFO","service":"checkout-service","message":"Payment processed amount=101.97 currency=USD"}

2024-12-20 03:03:00.398Z [ERROR] auth-service: Out of memory: killed process 5396

{"@timestamp":"2024-12-20 03:03:07.691Z","level":"INFO","service":"notification-service","message":"Order ORD-19325 created total=72.33"}

2024-12-20 03:03:15.381Z d inventory-service Loading configuration from environment

ts=2024-12-20 03:03:22.590Z level=info svc=api-gateway msg="Health check passed"

2024-12-20 03:03:28.840Z i user-service Health check passed

2024-12-20 03:03:36.924Z WARN [auth-service] Queue depth exceeding threshold current=3332

ts=2024-12-20 03:03:44.091Z level=error svc=inventory-service msg="Failed to process request error=\"connection refused\""

{"@timestamp":"2024-12-20 03:03:50.720Z","level":"DEBUG","service":"order-service","message":"Cache lookup for key=cache:52fs3yvk"}

2024-12-20 03:03:57.744Z i inventory-service User user-266 logged in successfully

2024-12-20 03:04:05.754Z i cart-service Health check passed

2024-12-20 03:04:12.520Z d order-service Serializing response payload size=16540

ts=2024-12-20 03:04:19.459Z level=warn svc=checkout-service msg="Connection pool running low available=3"

2024-12-20 03:04:26.570Z [DEBUG] auth-service: Serializing response payload size=16494

ts=2024-12-20 03:04:33.609Z level=info svc=notification-service msg="Deployment v1.5.71 rolled out successfully"

2024-12-20 03:04:41.403Z DEBUG [auth-service] Executing query: SELECT * FROM sessions WHERE id=4975

2024-12-20 03:04:48.133Z [DEBUG] api-gateway: Validating request parameters

2024-12-20 03:04:55.490Z INFO [shipping-service] Session created for user=user-123

2024-12-20 03:05:02.887Z [INFO] kube-controller-manager: Successfully pulled image gcr.io/project/payment-gateway:latest

{"@timestamp":"2024-12-20 03:05:09.657Z","level":"DEBUG","service":"shipping-service","message":"Rate limiter check: bucket=auth remaining=5"}

2024-12-20 03:05:16.907Z w inventory-service Connection pool running low available=5

2024-12-20 03:05:24.871Z d shipping-service Processing request id=f0ya0oum

2024-12-20 03:05:31.291Z INFO [notification-service] Deployment v2.2.18 rolled out successfully

2024-12-20 03:05:38.999Z [WARN] api-gateway: Retry attempt 3/3 for cache

2024-12-20 03:05:46.136Z w checkout-service Slow query detected duration=2736ms query=SELECT * FROM orders WHERE...

ts=2024-12-20 03:05:53.333Z level=info svc=inventory-service msg="Inventory updated product=PROD-6078 quantity=86"

ts=2024-12-20 03:06:00.479Z level=error svc=inventory-service msg="Database connection failed host=redis-master error=\"connection refused\""

2024-12-20 03:06:08.064Z DEBUG [order-service] Validating request parameters

ts=2024-12-20 03:06:14.905Z level=info svc=ingress-nginx msg="Successfully pulled image gcr.io/project/checkout-service:latest"

2024-12-20 03:06:22.351Z [DEBUG] payment-gateway: Validating request parameters

2024-12-20 03:06:29.725Z [DEBUG] user-service: Processing request id=m8zwbzch

2024-12-20 03:06:36.636Z DEBUG [inventory-service] Connection pool stats: active=6 idle=3

2024-12-20 03:06:43.323Z INFO [auth-service] Deployment v1.0.74 rolled out successfully

2024-12-20 03:06:50.957Z INFO [cart-service] Inventory updated product=PROD-8152 quantity=46

{"@timestamp":"2024-12-20 03:06:57.978Z","level":"WARN","service":"inventory-service","message":"Certificate expiring in 21 days"}

ts=2024-12-20 03:07:05.242Z level=debug svc=shipping-service msg="Processing request id=zhx7w10h"

{"@timestamp":"2024-12-20 03:07:12.849Z","level":"DEBUG","service":"order-service","message":"Connection pool stats: active=10 idle=8"}

2024-12-20 03:07:19.317Z DEBUG [api-gateway] Serializing response payload size=48495

2024-12-20 03:07:27.210Z d inventory-service Processing request id=5faqz1zy

2024-12-20 03:07:33.721Z DEBUG [user-service] Validating request parameters

ts=2024-12-20 03:07:41.264Z level=warn svc=user-service msg="Rate limit approaching threshold=93"

2024-12-20 03:07:48.063Z e notification-service Circuit breaker opened for auth-service

2024-12-20 03:07:56.005Z [ERROR] cart-service: Circuit breaker opened for inventory-service

ts=2024-12-20 03:08:03.050Z level=debug svc=checkout-service msg="Serializing response payload size=14688"

2024-12-20 03:08:09.682Z [WARN] order-service: Rate limit approaching threshold=83

2024-12-20 03:08:17.180Z d user-service Loading configuration from environment

ts=2024-12-20 03:08:24.976Z level=info svc=cart-service msg="Inventory updated product=PROD-7759 quantity=88"

2024-12-20 03:08:31.823Z [ERROR] cart-service: Circuit breaker opened for auth-service

{"@timestamp":"2024-12-20 03:08:39.015Z","level":"ERROR","service":"inventory-service","message":"Invalid request payload: timeout"}

2024-12-20 03:08:46.260Z DEBUG [api-gateway] Rate limiter check: bucket=checkout remaining=46

2024-12-20 03:08:53.470Z WARN [payment-gateway] Rate limit approaching threshold=92

2024-12-20 03:09:00.322Z [DEBUG] user-service: Executing query: SELECT * FROM sessions WHERE id=8330

2024-12-20 03:09:07.969Z d checkout-service Loading configuration from environment

2024-12-20 03:09:15.226Z i inventory-service User user-857 logged in successfully

2024-12-20 03:09:22.081Z [INFO] checkout-service: Inventory updated product=PROD-7834 quantity=32

ts=2024-12-20 03:09:29.115Z level=debug svc=auth-service msg="Executing query: SELECT * FROM products WHERE id=8701"

{"@timestamp":"2024-12-20 03:09:36.237Z","level":"INFO","service":"payment-gateway","message":"Inventory updated product=PROD-3810 quantity=58"}

ts=2024-12-20 03:09:44.025Z level=info svc=auth-service msg="Scheduled job report completed"

2024-12-20 03:09:50.552Z d payment-gateway Cache lookup for key=cache:idobt1j2

ts=2024-12-20 03:09:58.430Z level=info svc=user-service msg="Scheduled job report completed"

2024-12-20 03:10:05.262Z [DEBUG] shipping-service: Cache lookup for key=cache:w09cxfh5

2024-12-20 03:10:12.413Z [DEBUG] notification-service: Executing query: SELECT * FROM users WHERE id=4124

2024-12-20 03:10:19.739Z INFO [cart-service] Request completed status=200 duration=928ms

ts=2024-12-20 03:10:26.756Z level=info svc=cart-service msg="Request completed status=200 duration=1996ms"

{"@timestamp":"2024-12-20 03:10:33.840Z","level":"INFO","service":"shipping-service","message":"User user-534 logged in successfully"}

{"@timestamp":"2024-12-20 03:10:41.081Z","level":"INFO","service":"api-gateway","message":"Deployment v2.4.7 rolled out successfully"}

2024-12-20 03:10:48.357Z DEBUG [checkout-service] Processing request id=0jk3dvvy

{"@timestamp":"2024-12-20 03:10:55.654Z","level":"DEBUG","service":"payment-gateway","message":"Executing query: SELECT * FROM users WHERE id=2789"}

ts=2024-12-20 03:11:03.133Z level=info svc=auth-service msg="Deployment v2.0.9 rolled out successfully"

2024-12-20 03:11:10.130Z INFO [order-service] Inventory updated product=PROD-2185 quantity=87

ts=2024-12-20 03:11:17.788Z level=info svc=api-gateway msg="Scheduled job backup completed"

2024-12-20 03:11:24.306Z [ERROR] notification-service: Circuit breaker opened for auth-service

{"@timestamp":"2024-12-20 03:11:31.294Z","level":"DEBUG","service":"api-gateway","message":"Executing query: SELECT * FROM sessions WHERE id=6259"}

ts=2024-12-20 03:11:38.699Z level=info svc=shipping-service msg="Request completed status=200 duration=1322ms"

2024-12-20 03:11:45.845Z [DEBUG] auth-service: Loading configuration from environment

2024-12-20 03:11:52.868Z DEBUG [shipping-service] Validating request parameters
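The stream above mixes five line shapes: logfmt-style `ts=... level=...`, JSON with `@timestamp`, `LEVEL [service]`, `[LEVEL] service:`, and a single-letter level (`d`, `i`, `w`, `e`). The Python sketch below normalizes them by hand, one regex per shape, to show the kind of pattern maintenance that automated parsing is meant to take off your plate. The `LogEvent` type, the pattern list, and the letter-to-level mapping are assumptions inferred from these sample lines, not a documented format.

```python
import json
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LogEvent:
    ts: datetime
    level: str  # normalized to DEBUG / INFO / WARN / ERROR
    service: str
    message: str


# One hand-written pattern per line shape seen in the sample stream.
_PATTERNS = [
    # ts=2024-... level=warn svc=payment-gateway msg="..."
    re.compile(r'ts=(?P<ts>\S+ \S+) level=(?P<level>\w+) svc=(?P<svc>[\w-]+) msg="(?P<msg>.*)"'),
    # 2024-... [ERROR] notification-service: ...
    re.compile(r"(?P<ts>\S+ \S+) \[(?P<level>[A-Z]+)\] (?P<svc>[\w-]+): (?P<msg>.*)"),
    # 2024-... INFO [payment-gateway] ...
    re.compile(r"(?P<ts>\S+ \S+) (?P<level>[A-Z]+) \[(?P<svc>[\w-]+)\] (?P<msg>.*)"),
    # 2024-... d notification-service ...
    re.compile(r"(?P<ts>\S+ \S+) (?P<level>[diwe]) (?P<svc>[\w-]+) (?P<msg>.*)"),
]

# Assumed meaning of the single-letter levels.
_SHORT_LEVELS = {"d": "DEBUG", "i": "INFO", "w": "WARN", "e": "ERROR"}


def _parse_ts(raw: str) -> datetime:
    # Timestamps look like "2024-12-20 03:00:00.601Z"; the trailing Z marks UTC.
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def _level(raw: str) -> str:
    return _SHORT_LEVELS.get(raw, raw.upper())


def parse_line(line: str) -> Optional[LogEvent]:
    """Normalize one raw line, or return None if no known shape matches."""
    line = line.strip()
    try:
        if line.startswith("{"):
            doc = json.loads(line)
            return LogEvent(_parse_ts(doc["@timestamp"]), _level(doc["level"]),
                            doc["service"], doc["message"])
        for pattern in _PATTERNS:
            m = pattern.fullmatch(line)
            if m:
                # Unescape \" so quoted values inside msg read naturally.
                msg = m.group("msg").replace('\\"', '"')
                return LogEvent(_parse_ts(m.group("ts")), _level(m.group("level")),
                                m.group("svc"), msg)
    except (ValueError, KeyError):
        pass  # malformed JSON or timestamp
    return None  # unknown shape: a real pipeline would need yet another pattern
```

Every new service or logging library that adds a sixth shape means another pattern here, which is the maintenance burden the Q&A above suggests handing off.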