Technology Support Engineer - FULLY ONSITE IN GLASGOW - INSIDE IR35
(Information Technology Operations)
Role Overview
We are seeking an experienced Technology Support Engineer to support, operate, and scale a cloud-native data platform running in AWS environments.
The role focuses on driving platform reliability, resilience, and operational excellence through automation, disaster recovery testing, observability, and proactive SLO/SLI/SLA management, while providing day-to-day operational support to internal platform users.
You will work closely with platform and engineering teams to embed reliability-by-design, reduce operational toil, and continuously improve availability and performance.
Key Responsibilities
- Design, build, and maintain automation for infrastructure provisioning, platform operations, and incident response using Infrastructure as Code (IaC) and CI/CD pipelines
- Lead resiliency and disaster recovery (DR) planning, including DR drills, failure testing, and recovery validation across cloud and data platform components
- Define, implement, and monitor SLIs, SLOs, and SLAs for critical platform services and data pipelines, using error budgets to guide reliability improvements
- Build and operate observability solutions including metrics, logs, traces, dashboards, and alerting to support proactive issue detection
- Collaborate with platform and engineering teams to integrate SRE and operational best practices into platform design and delivery
- Perform root cause analysis (RCA) for incidents and drive continuous improvement to enhance platform stability, scalability, and performance
- Own and resolve incidents and service requests raised by internal consumer teams, providing hands-on operational support while identifying recurring issues and automating long-term fixes
Required Skills & Experience
- Practical experience applying Site Reliability Engineering principles, including SLI/SLO/SLA design and operational reliability management
- Strong hands-on experience with AWS in production environments (e.g. EC2, S3, IAM, VPC, monitoring services)
- Experience implementing and operating monitoring, logging, and alerting solutions
- Hands-on experience with automation and Infrastructure as Code (Terraform, CloudFormation, or similar tools)
- Scripting experience using Python and/or Bash
- Exposure to cloud-based data platforms and analytics workloads
Nice to Have
- Experience conducting disaster recovery testing, resiliency validation, or chaos engineering in cloud environments
- Familiarity with CI/CD pipelines and GitOps practices
- Background supporting large-scale data or analytics platforms