• Incident Management:
Participate in on-call support and incident management, including investigation, diagnosis, and resolution, to minimize system downtime and ensure reliability.
• Monitoring and Logging:
Implement and maintain monitoring and logging solutions, such as Prometheus, Grafana, and Datadog, to ensure the health and performance of systems.
• Cloud Expertise:
Apply your experience building and supporting solutions on public cloud platforms such as AWS or Azure.
• Automation:
Drive the automation of infrastructure and application build, delivery, and management processes across non-production and production environments.
• Microservices and Containers:
Work with microservice architectures and container technologies such as Kubernetes and Docker to optimize application deployments.
• Operating Systems and Networking:
Apply knowledge of operating systems, specifically Linux, and cloud networking concepts to troubleshoot and optimize system performance.