
Senior Engineer - Site Reliability Engineering - Emirati Talent


Date: 17 Sept 2025

Location: Abu Dhabi, AE

Company: EDGE Group PJSC

About KATIM

KATIM is a leader in the development of innovative secure communication products and solutions for governments and businesses. It is part of the Electronic Warfare & Cyber Technologies cluster at EDGE, one of the world’s leading advanced technology groups. KATIM delivers trust in a world where cyber risks are a constant threat, and meets the increasing demand for advanced cyber capabilities with robust, secure, end-to-end solutions centered on four core business units: Networks, Ultra Secure Mobile Devices, Applications, and Satellite Communications.

The Senior SRE Engineer is responsible for ensuring the reliability, scalability, and performance of mission-critical systems and services. This role combines software engineering and operations expertise to automate processes, optimize infrastructure, and reduce toil. Acting as a bridge between development and operations, the Senior SRE Engineer drives continuous improvement in availability, observability, and incident response, while mentoring junior team members and promoting a culture of reliability across the organization.

Key Responsibilities:

Design, implement, and maintain highly available, scalable, and resilient infrastructure and services.
Develop automation frameworks and tools to improve deployment, monitoring, and operational processes.
Lead incident response and root cause analysis (RCA), and implement permanent fixes to improve system reliability.
Collaborate with development and infrastructure teams to embed reliability and performance best practices into the product lifecycle.
Define and monitor SLOs/SLAs to ensure service quality and client satisfaction.
Drive capacity planning, performance tuning, and cost optimization initiatives.
Mentor junior engineers and contribute to knowledge sharing, standards, and documentation.
Stay current with industry trends and emerging technologies to propose innovative solutions.

Experience and Education:

Bachelor's degree in Computer Science, Engineering, or a related field.

7–10 years of overall IT experience, with at least 4–5 years in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
3+ years of operations experience in a production, customer-facing environment.
Hands-on experience managing large-scale distributed systems and production environments.
Proven experience in incident management, performance tuning, and capacity planning.
Strong expertise in Linux/Unix administration and scripting (Python, Bash, Go preferred).
Proficiency with containerization and orchestration technologies (Docker, Kubernetes, Helm).
Experience with cloud platforms (AWS, Azure, GCP) and with on-premises and hybrid environments.
Knowledge of CI/CD pipelines and automation frameworks (Jenkins, GitLab CI, ArgoCD, Terraform, Ansible).
Solid understanding of networking, security, and load balancing.
Experience with observability stacks (Prometheus, Grafana, ELK/EFK, OpenTelemetry).
Database operations knowledge (PostgreSQL, MySQL, NoSQL).

Key Skills:

Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification.
Cloud certifications (AWS Solutions Architect, Azure Administrator, or GCP Professional Cloud Engineer).
Proven track record of working in Agile/Scrum environments and using tools like Jira and Confluence.
Exceptional communication and collaboration skills, with the ability to work effectively in cross-functional teams.

#KATIM

Job Segment: Application Developer, Cloud, Computer Science, Developer, Solution Architect, Technology
