Portfolio Careers

Discover regional opportunities across our network of transformational companies.
Datadog Tester / Observability QA Engineer

Sailes

Quality Assurance
United States
Posted on Mar 9, 2026
Job Title

Datadog Tester / Observability QA Engineer

Job Summary

We are looking for a Datadog Tester to validate monitoring, logging, tracing, and alerting implementations built on Datadog. The role ensures that observability data is accurate, alerts fire correctly, and dashboards are reliable across applications, APIs, and infrastructure.

Key Responsibilities

Monitoring & Metrics Testing

  • Validate application, API, and infrastructure metrics collected in Datadog
  • Ensure host, container, and service-level metrics are reported correctly
  • Verify metric thresholds, anomalies, and aggregation accuracy

Log Management Testing

  • Validate log ingestion from applications, APIs, servers, and cloud services
  • Ensure log parsing, indexing, tagging, and retention policies are configured correctly
  • Verify log search, filters, and correlation with metrics

APM & Tracing Validation

  • Test distributed tracing for APIs and microservices
  • Validate service maps, latency, error rates, and throughput
  • Ensure trace-to-log and trace-to-metric correlation works correctly

Alerting & Incident Validation

  • Test alerts, monitors, and notifications (Email, Slack, PagerDuty, etc.)
  • Validate alert conditions, severity levels, and escalation rules
  • Perform negative testing to catch false positives and false negatives

Dashboard Testing

  • Validate Datadog dashboards for accuracy and usability
  • Ensure widgets, queries, and time-series data are correct
  • Verify role-based access and dashboard sharing

Integration & Environment Testing

  • Test Datadog integrations with cloud providers (AWS/Azure/GCP), Kubernetes, Docker, databases, and CI/CD tools
  • Validate environment separation (Dev, QA, Stage, Prod)

Collaboration & Reporting

  • Work with Dev, DevOps, SRE, and Product teams
  • Report observability gaps and monitoring defects
  • Assist in root cause analysis (RCA) of production issues