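The Rack Team at Supermicro is a distinctive team with plenty of opportunities to learn
● Flexibility and autonomy: We respect individual working rhythms. You can arrange your own work and collaborate closely with the team when needed.
● International perspective: Based in Taiwan, you will have the opportunity to work with leading global companies and to travel to major overseas vendors or the Silicon Valley headquarters for technical exchange.
● Broad experience: Whether you are new to the workforce or an experienced professional, you can grow in this flexible, high-potential environment.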
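Team Mission
The Rack Team's core work is designing, validating, and optimizing customer solutions. Our customers include semiconductor industry leaders, electric vehicle giants, and international telecom companies. We focus on:
● Helping large enterprises around the world build AI data centers
● Learning in depth and working hands-on with industry-leading technologies from Nvidia, AMD, and Intel
● Supporting partners in designing efficient solutions with Supermicro server products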
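Technical Scope
The Rack Software Team covers a wide range of areas, including:
● Systems engineering: testing and configuring servers, storage, and network switches
● Benchmarking: performance evaluation for AI, cloud computing, machine learning, big data, virtualization, and more
● Professional support: testing with industry-standard tools, analyzing results, and delivering professional reports
● Our colleagues not only bring deep expertise across technologies, they also learn together and drive innovation
We are currently hiring for core positions. The Supermicro Rack Team will give you room to put your talents to work.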
Job Summary:
As a member of the Rack Solution Team at Supermicro, the candidate will be responsible for testing and validating hardware, network, storage, and server components; creating solutions that meet or exceed market requirements for both OEM and end-user customers; and working collaboratively to discover and resolve hardware and HW/SW issues. We are looking for a fast learner and team player with excellent communication skills.
Essential Duties and Responsibilities:
● Conduct performance testing and benchmarking for servers, GPUs, and HPC environments.
● Analyze results to identify bottlenecks and optimize system performance for AI/ML workloads.
● Design and configure high-speed network topologies (InfiniBand, Ethernet) for AI clusters.
● Configure network components to ensure optimal performance.
● Write Python scripts to automate testing, monitoring, and system optimization.
● Maintain a working understanding of AI/ML frameworks (e.g., PyTorch, TensorFlow) and deployment requirements for LLMs.
● Monitor network health and server performance, proactively identifying and resolving issues.
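To give a sense of the scripting work described in the duties above, here is a minimal sketch of a benchmark summary helper using only Python's standard library. The function names, the 5% tolerance, and the sample throughput values are hypothetical and chosen purely for illustration; they do not describe the team's actual tooling.

```python
import statistics


def summarize(samples):
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


def regressed(baseline, current, tolerance=0.05):
    """True if the current mean is more than `tolerance` below the baseline mean."""
    return statistics.mean(current) < statistics.mean(baseline) * (1 - tolerance)


# Hypothetical throughput samples (for example, GB/s) from repeated runs.
baseline_runs = [100.0, 102.0, 98.0]
current_runs = [90.0, 92.0, 88.0]

summary = summarize(baseline_runs)
print(f"baseline mean={summary['mean']:.1f} stdev={summary['stdev']:.1f}")
print("regression detected:", regressed(baseline_runs, current_runs))
```

A helper like this could run after each benchmark pass so that a drop in throughput is flagged before results are written into a report.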
The IT Engineer will be part of the System Team in the Supermicro IT Department, supporting global IT infrastructure and performing the following duties:
1. Linux KVM Cluster Management
Implementation, configuration, and maintenance of Linux KVM clusters integrated with Ceph storage.
2. Monitoring & Alerting
a. Service and infrastructure monitoring using Zabbix and PRTG, including custom template, trigger, and discovery rule configuration.
b. Alert notification via instant messaging (Telegram/Teams).
3. On-Premises Server Operations
Deployment, configuration, and ongoing maintenance of on-premises servers and related infrastructure.
4. Service Ticket Handling
Assist in processing, tracking, and closing IT service requests and incidents.
5. Other Duties
Perform additional assignments as directed by the supervisor.
• Familiar with day-to-day operational support for Cluster, Storage, HPC, AI, Data Center and Cloud infrastructures.
• Build Cluster, Storage, HPC, AI, Data Center and Cloud infrastructures, and carry out in-house and onsite testing and deployment of these platforms to meet customer requirements.
• Troubleshoot hardware and software issues in rack cabinets and provide fixes in a timely manner.
• Document complex test and troubleshooting procedures for server, network, and cluster software and hardware.
• Familiar with Intel/AMD/NVIDIA development toolkits such as CUDA, oneAPI, and ROCm.
• Conduct tests and benchmarks against server hardware, storage, network, applications, HPC and AI/ML/DL workflows.
• Programming experience with web applications, including frontend or backend.
• Collect, visualize, and analyze test and benchmark results.
• Programming experience with Python, Ansible and Linux shell scripting.
• Write technical documentation, including test reports and standard operating procedures (SOPs).
Driving with us to the Next!
"Integration of various energy sources, improvement in energy efficiency, and creation of a powerful platform that benefits everyone"
【Job Description】
We are looking for an SRE engineer who can seamlessly integrate development artifacts with cloud resources. The candidate needs hands-on experience with public cloud platforms and container technologies. We want a highly self-motivated engineer to build operational environments that support everything from customer service to development. Daily tasks may include exploring the latest technologies and adopting them to solve business problems.
【Core Responsibilities】
• Work closely with engineering teams to identify and implement optimal cloud-based solutions for the company.
• Build and maintain agile, responsive, container-native CI/CD pipelines (Jenkins / ArgoCD), and support multiple development teams in delivering high-quality builds with measurable performance
• Build, maintain, improve, scale and secure cloud infrastructure and resources using IaC tools (Terraform / Pulumi) with cost in mind
• Build automation tools that improve system observability, availability and reliability using Python and serverless solutions (AWS Lambda, Kubernetes Jobs)
• Design, manage and monitor Kubernetes clusters for multiple production workloads
• Participate in an on-call rotation to mitigate disruptions to production systems and write root cause analysis reports
• Plan and test disaster recovery scenarios and business continuity plans for a highly available micro-services architecture
• Develop and implement security policies in compliance with ISO 27001/27017 standards, including access control, encryption and logging
• Build a central dashboard and alerting mechanisms to identify potential resource problems
• Handle production issues with intelligent means
【Essential Qualifications】
• Bachelor's degree in a computer-related program
• 3 years of experience in AWS cloud management
• 3 years of experience in Kubernetes management
• 3 years of experience in CI/CD (Jenkins)
• 3 years of experience in networking or databases (PostgreSQL, Cassandra, Redis)
• 2 years of experience with observability tools (Prometheus, Grafana, InfluxDB, OpenSearch, ELK)
• 3 years of experience in Linux
• Performance tuning, error handling, and root cause analysis
• Must be available for on-call duty
【Desirable Abilities】
• AWS-related certification
• CKA, CKAD, CKS
As manager of the Manufacturing Test Platform, you will lead a team that takes ownership of platform and infrastructure management as well as development in a global manufacturing test environment. This role oversees the end-to-end design and deployment of cluster platforms, software systems, and the manufacturing test infrastructure network.
1. Network Design & Planning
• Design LAN, WAN, cloud, and hybrid network architectures.
• Plan network topology (topology diagrams), subnetting, and routing protocols.
• Configure firewalls, VPNs, VLANs, and intrusion detection/prevention systems (IDS/IPS).
• Ensure network design complies with information security policies and regulatory requirements.
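As an illustration of the subnet planning mentioned above, the following minimal sketch uses Python's standard `ipaddress` module to split one address block into equal subnets. The 10.20.0.0/22 block and the zone names are hypothetical and stand in for whatever addressing plan a real site would use.

```python
import ipaddress

# Hypothetical address block and zone names, chosen only for illustration.
site_block = ipaddress.ip_network("10.20.0.0/22")
zones = ["test-floor", "cluster", "management", "dmz"]

# Split the /22 into four equal /24 subnets and assign one per zone.
subnets = list(site_block.subnets(new_prefix=24))
plan = dict(zip(zones, subnets))

for zone, net in plan.items():
    # num_addresses includes the network and broadcast addresses.
    usable = net.num_addresses - 2
    print(f"{zone:<12} {net}  usable hosts: {usable}")
```

Keeping a plan like this in code makes it easy to check for overlaps or regenerate it when the address block changes.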
2. Kubernetes (K8s) Cluster Platform Design & Development
• Design and implement high-availability (HA) architecture.
• Manage storage systems for the cluster environment.
• Develop automation systems for cluster management and maintenance.
3. Technology Selection & Deployment
• Evaluate and select equipment (e.g., routers, switches, fiber optics, servers).
• Support deployment and testing of new technologies.
4. Performance Monitoring & Troubleshooting
• Monitor host systems, network performance, and traffic flow.
• Analyze bottlenecks and root causes of failures, and propose optimization solutions.
5. Cross-Department Collaboration
• Coordinate network requirements with IT, information security, systems, and test development teams.
• Prepare technical documentation, proposals, and reports.
Experience:
• Over 10 years of experience in manufacturing test systems and network infrastructure, with demonstrated leadership in driving global test platform development and large-scale implementation projects.
• Proven leadership in cross-functional and cross-departmental team settings.
• Deep technical expertise in robotic automation and process optimization.
Essential Duties and Responsibilities:
• Work with cross-functional teams, including PCB Engineers, BIOS engineers, Validation Labs and Product Managers on assigned projects.
• Define, develop, and troubleshoot enterprise-level server products.
• Execute benchmark testing and build proofs of concept based on the latest data center technologies and Supermicro server products such as Supermicro SuperServers.
• Work with product managers to draft technical marketing presentations and white papers for online publications and industry event promotional activities.
• Proactively learn about and research new technologies and products in order to propose new proofs of concept.
Are you the IT superhero we are looking for?
We are looking for an IT specialist with a passion for information systems and information security. If you are good at solving users' software and hardware problems and maintaining network and information security, come join us!
【Responsibilities】
1. Support client PCs with software installation, hardware setup, troubleshooting, and day-to-day assistance.
2. Operate and maintain Windows servers and related equipment, including data backup and recovery, system monitoring, and fault handling.
3. Implement and enforce corporate information security policies and procedures.
4. Develop and maintain MS SQL stored procedures and system automation programs (relevant experience preferred).