1. Design, build, and maintain large-scale cloud systems on AWS
2. CI/CD automation
3. Continuously monitor and optimize system reliability
4. Improve the on-call experience
*Location: Hsinchu HQ (preferred) or Taipei Office (acceptable)
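Responsibilities:
We are looking for technical experts in the following areas:
1. Cloud service provisioning: operate and manage application systems on mainstream cloud providers such as GCP.
2. Release management: ensure compliance, assemble and deliver source code as container images, and deploy them to various types of infrastructure.
Projects you will be responsible for:
1. Deploy, manage, and operate scalable, highly available, and fault-tolerant systems.
2. Migrate existing on-premises applications to the cloud.
3. Select appropriate cloud services based on compute, data, or security requirements.
4. Estimate cloud usage costs and identify operational cost-control mechanisms.
5. Execute test scripts to build software packages; as a release engineer, ensure that new products are configured and coded correctly so that they integrate and run successfully.
6. Build test environments and troubleshoot any software performance issues. Work with the R&D team to resolve issues and document fixes for future reference.
7. Build tools to support the software engineering process, review engineering practices, assist in researching new technologies, and meet with development teams to discuss future needs. Provide ongoing support for completed products and maintain servers.
8. Handle escalated incidents and provide on-call support when needed.
Requirements:
1. 3+ years of experience provisioning, operating, and managing cloud environments.
2. Proficiency with CI/CD tools and methodologies.
3. Experience with Kubernetes, Docker Compose, and containerization.
4. Experience with configuration management and infrastructure as code.
5. Experience in Site Reliability Engineering or DevOps is a plus.
6. Cross-department communication skills.
7. Willingness to work in a high-growth, scaling technical environment.
8. Candidates with less experience or exposure will be considered for the SysOps Engineer role.
Tools: Terraform, ArgoCD, Jenkins, shell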
====
Responsibilities:
We are looking for technical experts in the following areas:
1. Cloud service provisioning: operate and manage application systems on mainstream cloud providers such as AWS, GCP, Azure, and Aliyun.
2. Release management: ensure compliance, assemble and deliver source code as container images, and deploy them to various types of infrastructure.
You will be responsible for:
1. Deploy, manage, and operate scalable, highly available, and fault-tolerant systems.
2. Migrate existing on-premises applications to the cloud.
3. Select appropriate cloud services based on compute, data, or security requirements.
4. Estimate cloud usage costs and identify operational cost-control mechanisms.
5. Execute test scripts to build software packages; as a release engineer, ensure that new products are configured and coded properly for successful integration and operation.
6. Build test environments and troubleshoot any issues with software performance. Work with software engineers to resolve issues and document fixes for future reference.
7. Build tools to support the software engineering process, review engineering practices, assist in researching new technologies, and meet with the development team to discuss future needs. Provide ongoing support for completed products and maintain servers.
8. Handle incident escalation and provide on-call support when necessary.
Requirements:
1. 3+ years of experience in provisioning, operating, and managing cloud environments.
2. Mastery of CI/CD tools and methodologies.
3. Experience with Kubernetes, Docker orchestration, and containerization.
4. Experience in configuration management and infrastructure as code.
5. Experience as a site reliability or DevOps engineer would be ideal.
6. Excellent communication skills.
7. Willingness to work in a high-growth, scaling technical environment.
8. Candidates with less experience or exposure will be considered for the SysOps Engineer role.
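1. Help the company plan, deploy, and manage containerization platforms (such as Red Hat OpenShift, Kubernetes, OKD, and Docker), and design the architecture best suited to each environment and its requirements.
2. Provide technical guidance on topics including container orchestration, container registries, container build strategies, and microservice architecture.
3. Help build automated, containerized cloud application platform solutions, and later assist with platform operations and troubleshooting to ensure service quality.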
1. RESPONSIBILITIES:
• Provide software support and assistance to customers, and evaluate customers' modification requests for existing machines to find solutions.
• Work in close collaboration with Product Marketing Management (PMM) team to ensure strategic business objectives are met.
• Develop the machine software application and ensure its stability and compliance with customers' specifications.
• Provide onsite customer support to solve machine software issues.
• Test and qualify software to ensure machine software quality, both internally and at customer sites.
• Work closely with other engineering groups to evaluate the interface between hardware and software and the operational and performance requirements of the overall system.
• Support and train customers and service engineers on machine software features.
• Any other ad-hoc assignments within the scope of main objectives.
2. ESSENTIAL QUALITY EXPECTATIONS:
• Proactively identify opportunities for improvement.
• Seek continuous improvement in your own job processes.
• Provide accurate reporting.
3. AUTHORITY:
• Authorized to access the Cohu MY Software Engineering source code and documents that are needed to fulfil the objectives and responsibilities.
• Authorized to discuss technical solutions with customers directly and provide feedback to the related departments.
## Job Description:
- Planning and establishing pass/fail criteria for LEO satellite product testing (e.g., OTA, Thermal Vacuum, Radiation, etc.).
- Execution and result analysis of LEO satellite product testing.
- Writing test reports and documenting anomalies.
- Developing and maintaining automated testing programs to improve testing efficiency.
## Skill:
- Familiarity with RF or phased array testing is preferred.
- Familiarity with Python and basic instrument control is preferred.
- Familiarity with military and space testing standards is preferred.
We are looking for a Site Reliability Engineer (SRE) to keep our cloud-based commerce platform up, running, and healthy.
As an SRE for iKala Commerce, you will be responsible for everything from our cloud infrastructure and operating systems to developing tools for code deployment and service monitoring. You will also review our code and system design and partner with developers to build our applications.
The SRE is an integral member of our product development team. You will be part of the team that makes crucial decisions about how to manage and scale complex, high-performance distributed systems. You will also bring your own perspective on our backend systems and continually develop innovative ways to improve how we manage the underlying infrastructure. Our ideal candidate is able to develop applications independently, but is even more eager to accelerate the whole team by building systems that improve performance and operational efficiency.
Ultimately, you should be involved in all stages of software development to define and improve our SLOs, SLAs & SLIs.
Our current tech stack includes:
GCP, Terraform, Kubernetes, Helm, ArgoCD, GitLab CI/CD, Grafana LGTM
【Key Responsibilities】
1. Designing & implementing infrastructure for collecting metrics, crunching data and improving service monitoring to detect problems before they're visible to our customers.
2. Building systems to automate our server lifecycle, from configuration management, CI/CD to server bootstrap and decommission.
3. Troubleshooting, performing root cause analysis, and resolving production issues from the application and network layers all the way down to the system level.
4. Participating in solution design and advising other developers when building new features so that they're scalable, maintainable, and performing well.
5. Improving the observability of our applications through monitoring, alerting, logging, tracing and profiling, and building such observability features into a common platform.
6. Practicing sustainable incident response and blameless postmortems.
7. Proactively identifying and reducing issues through design, testing, and implementation of software-based solutions.
More info: https://www.ikala.ai
Main JD
-Develop automation programs that carry out specific tasks effectively and efficiently.
-Deploy automation programs to a website so that users can access them easily.
-Present output results visually and quantify project data for real-time review.
-Build and manage servers to ensure normal daily operations.
-Continuously optimize the data analysis process (data collection, modeling, parsing, cleaning, analysis, and report output) and improve user utilization.
-Leverage AI/LLM models to facilitate engineering tasks.
Secondary JD
-Familiar with Python-based web development and Linux operation (Must).
-Understand DevOps concepts and have practical DevOps experience (Must).
-Able to quickly understand requirements from engineers and project managers and convert them into the flow charts needed for program development (Must).
-Able to read circuit schematics and block diagrams (e.g., OrCAD) and understand the meanings of net names (Nice to have).
-Able to build U-Boot and Linux to bring up a system, and familiar with the concept of firmware diagnostic programs (Nice to have).
-Able to deploy a large language model (LLM) on a Linux system and use it to facilitate engineering tasks (Nice to have).
As a software automation and development engineer, you will collaborate with hardware and diagnostics engineers. Your work has the potential to facilitate engineering tasks through self-developed Python applications.
Using your imagination to develop automation tools and AI/LLM applications that produce meaningful results for hardware and diagnostics engineers is also welcome. To achieve this, the main idea is to leverage the resources of the Python community to deliver transparent results with simplicity and regularity.
Depending on your experience and expertise, you may choose which deliverables to build with Python; ideally they are made readily available to engineers (e.g., spreadsheets, HTML pages, or a Linux-based web server).