Cloud Infra Automation: Design and deploy infrastructure on bare metal or cloud using Terraform, Ansible, or Helm. Automate workflows with Python or Go.
Platform Reliability: Maintain and scale GPU clusters, Kubernetes, and AI-optimized storage (Ceph, BeeGFS, Weka) to ensure stability and performance.
Monitoring & Alerting: Use Prometheus, Grafana, ELK, etc., to monitor system health and trigger alerts on anomalies.
Capacity Planning: Analyze usage patterns and forecast infrastructure needs for AI workloads.
Incident Management: Lead root cause analysis and manage SLOs/SLIs/SLAs to maintain high availability.
CI/CD Integration: Work with DevOps/MLOps teams on CI/CD pipelines using GitLab, ArgoCD, or similar tools.
Security & Compliance: Secure Linux systems, manage certificates, and enforce access controls (RBAC, LDAP SSO, TLS, segmentation).
Documentation & Playbooks: Maintain architecture diagrams, runbooks, and incident playbooks to support knowledge sharing and onboarding.
Salary negotiable
(Regular monthly salary of NT$40,000 or above)
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience; 3-7 years of experience in the areas below is preferred.
Proficiency in Linux (Ubuntu, RHEL/CentOS), containers (Docker, Podman), and orchestration (Kubernetes).
Experience managing GPU compute clusters (NVIDIA / CUDA, AMD / ROCm).
Hands-on experience with observability tools (Prometheus, Grafana, Loki, ELK, etc.).
Strong scripting and coding skills (Bash, Python, or Go).
Exposure to secure multi-tenant environments and zero trust architectures.
Familiarity with network protocols, DNS, DHCP, BGP, RoCEv2, and InfiniBand or high-throughput Ethernet fabrics.
Excellent collaboration and communication skills for cross-team, partner, and customer initiatives.
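◆ Compensation 1. Performance bonuses paid twice a year 2. Gift vouchers for the Mid-Autumn Festival, the Dragon Boat Festival, and birthdays
◆ Insurance 1. Labor insurance 2. National health insurance 3. Free employee group insurance 4. Discounted self-paid insurance for dependents
◆ Leave 1. Two days off per week 2. Annual leave more generous than required by the Labor Standards Act
◆ Allowances 1. Wedding cash gift 2. Childbirth cash gift 3. Hospitalization and bereavement condolence payments 4. Scholarships for employees' children 5. Sports and fitness subsidy
◆ Other 1. Company anniversary celebration and year-end banquet 2. Regular movie screenings and movie ticket giveaways 3. Regular department meals 4. Club activity subsidies 5. Free lunch (Bade) 6. Free scooter parking (Bade)
Step onto the international stage and build an exceptional career: opportunities for business travel and training in Silicon Valley, California, and in the Netherlands, exchanging ideas with top R&D professionals from around the world.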