As a Cloud Operations Engineer, you will be responsible for maintaining the stability and efficiency of cloud-based applications, ensuring systems operate seamlessly under demanding service-level requirements. Your work will center on proactive monitoring, incident resolution, and continuous improvement of operational processes.
Key Responsibilities
- Oversee application server health, including service restarts and resource management across Unix, Linux, and Windows environments
- Diagnose and resolve technical disruptions such as batch processing failures, network anomalies, and data feed inconsistencies
- Monitor system performance metrics including CPU, memory, disk usage, and database activity
- Generate daily operational reports and maintain accurate records of incidents, resolutions, and run procedures
- Respond to client support requests within defined SLAs, addressing tier 2 and tier 3 escalations
- Document root causes and resolution steps in the customer support system for transparency and knowledge sharing
- Provide after-hours on-call support and carry out tasks outside standard business hours as needed
Qualifications
- 4–6 years of hands-on experience in production or application support within cloud-hosted, high-availability environments
- Proven expertise with Azure and AWS platforms, including infrastructure management and performance optimization
- Strong command of Unix, Linux, and Windows operating systems, and of application servers such as Tomcat
- Working knowledge of SQL Server, Oracle, and MySQL databases
- Proficiency in scripting for automation and familiarity with SSH for system access
- Understanding of complex data workflows and distributed application architectures
- Strong written and verbal communication skills for coordinating with technical teams and clients
- Experience debugging application issues, analyzing scalability challenges, and applying security best practices
- Ability to work independently and collaboratively in fast-moving settings
- Capacity to manage multiple priorities and deliver rapid responses during production incidents
Work Environment
This role follows a hybrid model with two scheduled office days per week, designed to foster collaboration, innovation, and team alignment. The remaining days are remote, supporting flexibility while maintaining strong team connectivity.
