Senior Engineer - Site Reliability Engineering - Emirati Talent

Date: 17 Sept 2025

Location: Abu Dhabi, AE

Company: EDGE Group PJSC

About KATIM

KATIM is a leader in the development of innovative secure communication products and solutions for governments and businesses. As part of the Electronic Warfare & Cyber Technologies cluster at EDGE, one of the world’s leading advanced technology groups, KATIM delivers trust in a world where cyber risks are a constant threat. It meets the increasing demand for advanced cyber capabilities with robust, secure, end-to-end solutions delivered through four core business units: Networks, Ultra Secure Mobile Devices, Applications, and Satellite Communications.

The Senior Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of mission-critical systems and services. The role combines software engineering and operations expertise to automate processes, optimize infrastructure, and reduce toil. Acting as a bridge between development and operations, the Senior SRE drives continuous improvement in availability, observability, and incident response, while mentoring junior team members and promoting a culture of reliability across the organization.

Key Responsibilities:  

  • Design, implement, and maintain highly available, scalable, and resilient infrastructure and services.
  • Develop automation frameworks and tools to improve deployment, monitoring, and operational processes.
  • Lead incident response and root cause analysis (RCA), and implement permanent fixes to improve system reliability.
  • Collaborate with development and infrastructure teams to embed reliability and performance best practices into the product lifecycle.
  • Define and monitor SLOs/SLAs to ensure service quality and client satisfaction.
  • Drive capacity planning, performance tuning, and cost optimization initiatives.
  • Mentor junior engineers and contribute to knowledge sharing, standards, and documentation.
  • Stay current with industry trends and emerging technologies to propose innovative solutions. 

Experience and Education:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 7–10 years of overall IT experience, with at least 4–5 years in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
  • 3+ years of operations experience in a production, customer-facing environment.
  • Hands-on experience managing large-scale distributed systems and production environments.
  • Proven experience in incident management, performance tuning, and capacity planning.
  • Strong expertise in Linux/Unix administration and scripting (Python, Bash, Go preferred).
  • Proficiency with containerization and orchestration technologies (Docker, Kubernetes, Helm).
  • Experience with cloud platforms (AWS, Azure, GCP) and with on-premises and hybrid environments.
  • Knowledge of CI/CD pipelines and automation frameworks (Jenkins, GitLab CI, ArgoCD, Terraform, Ansible).
  • Solid understanding of networking, security, and load balancing.
  • Experience with observability stacks (Prometheus, Grafana, ELK/EFK, OpenTelemetry).
  • Knowledge of database operations (PostgreSQL, MySQL, NoSQL).

Key Skills:

  • Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
  • Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification.
  • Cloud certifications (AWS Solutions Architect, Azure Administrator, or GCP Professional Cloud Engineer).
  • Proven track record of working in Agile/Scrum environments and using tools like Jira and Confluence. 
  • Exceptional communication and collaboration skills, with the ability to work effectively in cross-functional teams.

#KATIM
