Neihu District, Taipei City · 5+ years of experience · Bachelor's degree or higher · Salary negotiable
We are seeking an experienced and passionate Senior Site Reliability Engineer (SRE) to join our team. In this pivotal role, you will be responsible for designing, building, maintaining, and optimizing our growing, highly complex infrastructure, ensuring its stability, scalability, and performance. As a senior member of the team, you will set technical direction, mentor junior engineers, and collaborate closely with cross-functional teams to continuously improve service quality and system resilience.
1. Lead Infrastructure Design, Implementation, and Strategy: Architect and drive the evolution of scalable, highly available, and cost-effective infrastructure solutions.
2. Spearhead Monitoring and Alerting Systems: Design, implement, and continuously optimize comprehensive monitoring, logging, and alerting systems to ensure proactive issue detection and rapid root cause analysis.
3. Conduct In-depth System Performance Analysis and Optimization: Monitor and analyze resource metrics and overall system health, identify bottlenecks, and lead the implementation of advanced performance tuning solutions.
4. Enhance System Stability and Incident Response Capabilities: Develop and execute preventive maintenance strategies; lead troubleshooting and emergency response for major incidents to ensure service continuity; and design and conduct disaster recovery drills.
5. Drive Automation and Efficiency Improvements: Proactively identify and implement automation solutions for infrastructure operations, deployment, scaling, and other areas to enhance team efficiency.
6. Facilitate Cross-Team Collaboration and Architectural Improvements: Serve as an SRE technical expert, working closely with development, QA, and other teams to provide architectural recommendations from a reliability perspective and drive continuous improvements in service quality.
7. Establish Operational Standards and Processes: Design, establish, and promote standardized processes and best practices for system maintenance, deployment, upgrades, and change management.
8. Mentor and Train: Guide and mentor junior SREs on the team, sharing knowledge and experience to raise the team's overall technical proficiency.
9. Evaluate and Adopt New Technologies: Assess, test, and introduce new technologies, tools, and methodologies to enhance infrastructure efficiency, reliability, and security.
10. Oversee Capacity Planning and Cost Optimization: Perform capacity planning, forecast resource requirements, and identify opportunities for infrastructure cost optimization.