Responsibilities:
We are looking for technical experts in two areas:
1. Cloud service provisioning: operate and manage application systems on mainstream cloud providers such as AWS, GCP, Azure, and Aliyun.
2. Release management: compile, assemble, and deliver source code into container images, then deploy them to infrastructure of various forms.
You will be responsible for:
1. Deploying, managing, and operating scalable, highly available, and fault-tolerant systems.
2. Migrating existing on-premises applications to the cloud.
3. Selecting the appropriate cloud service based on compute, data, or security requirements.
4. Estimating cloud usage costs and identifying operational cost control mechanisms.
5. Executing test scripts to build software packages, and ensuring that new products are configured and coded properly for successful integration and operation.
6. Building test environments and troubleshooting software performance issues, working with software engineers (R&D) to resolve them and documenting fixes for future reference.
7. Building tools to support the software engineering process, reviewing engineering practices, assisting in research on new technologies, and meeting with the development team to discuss future needs, as well as providing ongoing support for completed products and maintaining servers.
8. Handling escalated incidents and providing on-call support when needed.
Requirements:
1. 3+ years of experience in provisioning, operating, and managing cloud environments.
2. Mastery of CI/CD tools and methodologies.
3. Experience with Kubernetes, Docker orchestration (Docker Compose), and containerization.
4. Experience in configuration management and infrastructure as code.
5. Experience as a site reliability or DevOps engineer would be ideal.
6. Excellent communication skills, including across departments.
7. Willingness to work in a high-growth, scaling technical environment.
8. Candidates with less experience or exposure will be considered for a SysOps Engineer position.
Tools: Terraform, ArgoCD, Jenkins, shell
Driving with us to the Next!
"Integration of various energy sources, improvement in energy efficiency, and creation of a powerful platform that benefits everyone"
【Job Description】
We are looking for an SRE engineer who can seamlessly integrate development artifacts with cloud resources. The candidate needs hands-on experience with public clouds and will work closely with container technologies. We want a highly self-motivated engineer to join us and build operational environments that support everything from customer service to development. Daily tasks may include exploring the latest technologies and adopting them to solve business problems.
【Core Responsibilities】
• Work closely with engineering teams to identify and implement optimal cloud-based solutions for the company.
• Build and maintain agile, responsive, container-native CI/CD pipelines (Jenkins / ArgoCD), and support multiple development teams in delivering high-quality builds with measurable performance
• Build, maintain, improve, scale and secure cloud infrastructure and resources using IaC tools (Terraform / Pulumi), with cost in mind
• Build automation tools that improve system observability, availability and reliability, using Python and serverless solutions (AWS Lambda, Kubernetes Jobs)
• Design, manage and monitor Kubernetes clusters for multiple production workloads
• Participate in an on-call rotation to mitigate disruptions to production systems, and produce root cause analysis reports
• Plan and test disaster recovery scenarios and business continuity plans for a highly available microservices architecture
• Develop and implement security policies in compliance with ISO 27001/27017 standards, including access control, encryption and logging
• Build a central dashboard and alerting mechanisms to identify potential resource problems
• Handle production issues by intelligent means
【Essential Qualifications】
• Bachelor's degree in a computer-related program
• 3 years of experience in AWS cloud management
• 3 years of experience in Kubernetes management
• 3 years of experience in CI/CD (Jenkins)
• 3 years of experience in networking or databases (PostgreSQL, Cassandra, Redis)
• 2 years of experience with observability tooling (Prometheus, Grafana, InfluxDB, OpenSearch, ELK)
• 3 years of experience in Linux
• Performance tuning, error handling, and root cause analysis
• On-call duty is required
【Desirable Abilities】
• AWS-related certification
• CKA, CKAD, CKS
We are looking for a Site Reliability Engineer (SRE) to make sure our cloud-based commerce platform is up and running and healthy.
As an SRE for iKala Commerce, you will be responsible for everything from our cloud infrastructure and operating systems to developing tools for code deployment and service monitoring. You will also review our code and system design and partner with developers to build our applications.
The SRE is an integral member of our product development team. You will be part of the team that makes crucial decisions about how to manage and scale complex, high-performance distributed systems. You will also bring your own perspective on our backend systems and constantly develop innovative ways to improve how we manage the underlying infrastructure. Our ideal candidate is able to develop applications on their own, but is more eager to accelerate the whole team by building systems that improve performance and operational efficiency.
Ultimately, you should be involved in all stages of software development to define and improve our SLOs, SLAs & SLIs.
Our current tech stack includes:
GCP, Terraform, Kubernetes, Helm, ArgoCD, GitLab CI/CD, Grafana LGTM
【Key Responsibilities】
1. Designing & implementing infrastructure for collecting metrics, crunching data and improving service monitoring to detect problems before they're visible to our customers.
2. Building systems to automate our server lifecycle, from configuration management, CI/CD to server bootstrap and decommission.
3. Troubleshooting, performing root cause analysis, and resolving production issues from the application and network layers all the way down to the system level.
4. Participating in solution design and advising other developers when building new features so that they're scalable, maintainable, and performant.
5. Improving the observability of our applications through monitoring, alerting, logging, tracing and profiling, and building such observability features into a common platform.
6. Practicing sustainable incident response and blameless postmortems.
7. Proactively identifying and reducing issues through design, testing, and implementation of software-based solutions.
More info: https://www.ikala.ai