We are looking for an enthusiastic and motivated Site Reliability Engineer (SRE) to join our growing team. In this role, you will have the opportunity to learn and contribute to the stability, performance, and scalability of our critical systems. We place a strong emphasis on security in all aspects of our operations. You will work closely with teams to maintain and improve our infrastructure, monitor services, and respond to incidents. This is an excellent opportunity to develop your skills in a dynamic and supportive environment.
<Responsibilities>
1. Assist in Maintaining and Optimizing Infrastructure: Support teams in the day-to-day maintenance and optimization of our infrastructure components.
2. Monitor Services and Address Issues: Monitor system health and service performance, and assist in troubleshooting and resolving issues in a timely manner.
3. Track Resource Usage and System Status: Help monitor various resource indicators and the overall status of the system, contributing to optimization efforts.
4. Support System Stability and Incident Response: Assist in maintaining system stability and participate in incident response procedures under guidance.
5. Contribute to Preventing System Failures: Work with the team to implement measures that help avoid system failures and service interruptions.
6. Collaborate with Other Teams: Work alongside other teams to continuously learn about and contribute to improving system architecture and service quality.
7. Support System Maintenance and Deployment Processes: Assist in the execution of established processes for system maintenance, deployment, and upgrades.
8. Learn and Apply SRE Best Practices: Actively learn and apply SRE principles and best practices in daily tasks.
【Job Description】
-Ability to use configuration management tools and revision control systems (e.g., Git)
-Experience with CI/CD and automation systems (e.g., Jenkins)
-Experience with AWS core services (EC2, ELB, S3, CloudFront, IAM, VPC) and the AWS SDK and CLI
-Experience building and operating container-based platforms with Nomad / Consul / Kubernetes
-Experience with monitoring, alerting, and log pipeline analysis tools (Graylog2, ELK, Prometheus, etc.)
【Job Requirements】
-The successful candidate will be a self-driven Senior DevOps Engineer with proven experience in large-scale microservice systems hosted on AWS
-The candidate will have a deep understanding of cloud architecture, AWS technologies, and cloud security best practices
-The candidate will follow the latest industry trends and be passionate about cloud computing for large-scale systems