I provision and configure production-grade Kubernetes clusters on EKS, GKE, or AKS with a full observability stack (Prometheus, Grafana, Loki, and Alertmanager), so you can deploy workloads with confidence from day one instead of after three weeks of setup.
Clusters are provisioned with Terraform. The configuration includes: managed node pools with instance types sized to your workloads; Cluster Autoscaler; kube-prometheus-stack (Prometheus, Grafana, and Alertmanager); Loki with Promtail for log aggregation; the NGINX Ingress Controller with cert-manager for automatic TLS; namespace-scoped RBAC per team; Velero for cluster backups; and network policies for workload isolation.
Alert routing is configured for your destinations (Slack, PagerDuty, etc.), with sensible default rules for node pressure, pod crash loops, and high error rates. Grafana dashboards are provisioned as code.
Multi-environment and GitOps setups (Argo CD or Flux) are also available. For advanced engagements, I add OPA Gatekeeper admission policies, External Secrets Operator, and Kubecost for cost monitoring.