IAdea focuses on delivering total solutions built on the latest IoT devices for the world's top enterprise customers. To help us build functional systems that meet customers' needs and improve the user experience, we are looking for an experienced DevOps Engineer. You will be responsible for identifying production issues, implementing integrations with partners' services, and maintaining cloud infrastructure for daily operations.
If you have a solid background in the cloud and Kubernetes ecosystem and are also familiar with Node.js development, we'd love to speak with you.
## Job description
• Building and setting up new development tools and infrastructure.
• Working on ways to automate and improve development and release processes.
• Testing and examining code written by others and analyzing results.
• Ensuring that systems are safe and secure against cybersecurity threats.
• Identifying technical problems and developing software updates and fixes.
• Working with software developers and software engineers to ensure that development follows established processes and works as intended.
• Planning out projects and being involved in project management decisions.
## Responsibilities
• Deploy updates and fixes.
• Build tools to reduce occurrences of errors and improve customer experience.
• Perform root cause analysis for production errors.
• Investigate and resolve technical issues.
• Develop scripts to automate the deployment process.
• Design procedures for system troubleshooting and maintenance.
## Qualifications
Preferred:
• Experience with the Kubernetes ecosystem.
• Experience with AWS/GCP/Azure cloud platforms.
• Understanding of infrastructure as code (IaC) concepts and experience with tools such as Terraform/Pulumi and ArgoCD/Flux.
• Experience with monitoring tools such as Prometheus/Loki/Grafana.
• BSc in Computer Science, Engineering, or a relevant field.
Additional:
• Knowledge of the JavaScript/TypeScript ecosystem.
• Experience in test case automation.
• Experience with IoT infrastructure such as MQTT brokers and data pipeline tools.
We are looking for an enthusiastic and motivated Site Reliability Engineer (SRE) to join our growing team. In this role, you will have the opportunity to learn and contribute to the stability, performance, and scalability of our critical systems. We place a strong emphasis on security in all aspects of our operations. You will work closely with teams to maintain and improve our infrastructure, monitor services, and respond to incidents. This is an excellent opportunity to develop your skills in a dynamic and supportive environment.
## Responsibilities
1. Assist in Maintaining and Optimizing Infrastructure: Support teams in the day-to-day maintenance and optimization of our infrastructure components.
2. Monitor Services and Address Issues: Monitor system health and service performance, and assist in troubleshooting and resolving issues in a timely manner.
3. Track Resource Usage and System Status: Help monitor various resource indicators and the overall status of the system, contributing to optimization efforts.
4. Support System Stability and Incident Response: Assist in maintaining system stability and participate in incident response procedures under guidance.
5. Contribute to Preventing System Failures: Work with the team to implement measures that help avoid system failures and service interruptions.
6. Collaborate with Other Teams: Work alongside other teams to continuously learn about and contribute to improving system architecture and service quality.
7. Support System Maintenance and Deployment Processes: Assist in the execution of established processes for system maintenance, deployment, and upgrades.
8. Learn and Apply SRE Best Practices: Actively learn and apply SRE principles and best practices in daily tasks.