Driving with us to the Next!
"Integration of various energy sources, improvement in energy efficiency, and creation of a powerful platform that benefits everyone"
【Job Description】
We are looking for an SRE engineer who can seamlessly integrate development artifacts with cloud resources. The candidate needs hands-on experience with public cloud platforms and close familiarity with the container ecosystem. We want a highly self-motivated engineer to join us and build operational environments that support everything from customer service to development. Daily tasks may include exploring the latest technologies and adopting them to resolve business problems.
【Core Responsibilities】
• Work closely with engineering teams to identify and implement optimal cloud-based solutions for the company
• Build and maintain agile, responsive, container-native CI/CD pipelines (Jenkins / ArgoCD), and support multiple development teams in delivering high-quality builds with measurable performance
• Build, maintain, improve, scale and secure cloud infrastructure and resources using IaC tools (Terraform / Pulumi), with cost in mind
• Build automation tools that improve system observability, availability and reliability, using Python and serverless solutions (AWS Lambda, Kubernetes Jobs)
• Design, manage and monitor Kubernetes clusters for multiple production workloads
• Participate in an on-call rotation to mitigate disruption to production systems, and write root cause analysis reports
• Plan and test disaster recovery scenarios and business continuity plans for a highly available microservices architecture
• Develop and implement security policies in compliance with ISO 27001/27017 standards, including access control, encryption and logging
• Build a central dashboard and alerting mechanisms to identify potential resource problems
• Handle production issues with intelligent means
【Essential Qualifications】
• Bachelor's degree in a computer-related program
• 3 years of experience in AWS cloud management
• 3 years of experience in Kubernetes management
• 3 years of experience in CI/CD (Jenkins)
• 3 years of experience in networking or databases (PostgreSQL, Cassandra, Redis)
• 2 years of experience with observability tools (Prometheus, Grafana, InfluxDB, OpenSearch, ELK)
• 3 years of experience in Linux
• Performance tuning, error handling, and root cause analysis
• Availability for on-call duty
【Desirable Abilities】
• AWS-related certification
• CKA, CKAD, CKS
1. Conduct and observe automated modem and Android Telephony tests across diverse platforms, in both on-site and in-house settings.
2. Review the results of automated testing, including analyzing logs. Document and identify any resulting defects, and refine the scripts or configurations to address test failures.
3. Address test script defects and validate the resolutions, including APK files.
4. Oversee and handle the devices used for testing, along with the servers that run those tests.
5. Valid results and future development demand rigorous data analysis, error correction, and thorough documentation. Technical expertise, data skills, meticulousness, and strong communication are crucial for success.
6. Annual salary: 800K NTD and above
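7. On-site at the Google Xindian Office
Free shuttle bus direct to the Xindian office (during morning and evening commute hours):
A. Banqiao - Tpark
B. MRT Sta. - Dapinglin Station
C. Taoyuan - Taoyuan Railway Station
D. Hsinchu - HSR Hsinchu Station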
【Who We Are?】
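Hytech is a young, energetic team focused on driving enterprise technology transformation in the fintech industry, and a leading global management and technology consulting firm. Innovative thinking and a flat management structure let team members work comfortably in an open and transparent way, while delivering outstanding business value to clients worldwide.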
【Why Join The Team?】
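At Hytech, the team's core technologies keep pace with the times as we work together. We discuss things as they come up, maintain good communication channels, and have a flat management structure, so any question or opinion can be raised and resolved collaboratively. You will work closely with colleagues on international teams. Our engineers do not work shifts, and there is no toxic culture of chronic overtime.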
About the role:
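We are looking for a DevOps engineer with experience planning and operating monitoring platforms, who can help the team build highly available, observable, and automated cloud and application service environments. You will work closely with the development, security, and infrastructure teams to ensure that systems remain stable, resilient, and secure in a fast-growing environment. You will also help plan and operate the monitoring and alerting platform to improve system visibility and failure prevention. This is a key role that combines technical expertise with cross-team collaboration. If you are passionate about improving system reliability and building high-performance, scalable platform architecture, we sincerely invite you to join us!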
What this job involves:
1. Build and maintain CI/CD pipelines using tools such as Jenkins and Bitbucket.
2. Design, deploy, and optimize monitoring systems (e.g., Prometheus, Grafana, CloudWatch, ELK/Loki).
3. Implement infrastructure and application-level monitoring and alerting rules so that anomalies are detected and reported in real time.
4. Analyze monitoring data to identify process and system bottlenecks and propose optimizations.
5. Maintain centralized log collection systems to ensure traceability and meet audit requirements (e.g., PCI DSS).
6. Perform automated deployments using Infrastructure-as-Code (IaC) tools such as Terraform and Ansible.
7. Take part in cloud resource management and cost optimization across AWS and Alibaba Cloud.
8. Collaborate with the team on incident response and root cause analysis (RCA).