Senior Site Reliability Engineer (SRE)

CloudDevs, San Francisco

CloudDevs partners with fast-growing, venture-backed startups to help them hire exceptional designers and developers. When you join us, you’ll work directly with one of these startups and contribute to their early-stage growth.

If you specialize in a different tech stack, no problem. We support multiple projects and are always looking for experienced senior developers. If you have 4+ years of professional software development experience, we’d love to connect and match you with the right opportunity. Simply apply, and we’ll share all the relevant details.

If your profile fits, our hiring process includes:

Submitting your CV on our website
A 30-minute screening call or technical interview
Completing a coding challenge
Getting matched to the most suitable roles on our platform

Location: LATAM, Europe

CloudDevs also works with high-growth startups across the US and is building a strong pipeline of world-class Site Reliability Engineers (SREs) for current and future positions. You may be placed directly with a partner company or added to our vetted SRE network for upcoming projects.

This role is ideal for engineers who prioritize reliability, metrics, performance, and scalable system design. If you enjoy improving deployment processes, ensuring stability, and helping teams ship high-quality software, this is a great fit.

Key Responsibilities

Act as a hands-on engineer focused on reliability, performance, and system observability
Define and maintain SLIs, SLOs, and error budgets
Optimize monitoring costs and improve metrics, logging, and tracing quality
Enhance deployment safety, canary rollouts, and UAT pipelines
Build tools for automated and local performance testing, including benchmarks
Lead resilience initiatives such as failover drills, chaos testing, and redundancy checks
Work with engineering teams to refine scaling patterns and architecture
Support incident response and reduce operational noise
Write clean, production-ready code in Go, Python, or Node.js
Contribute to CI/CD automation and improvements
Collaborate across teams to elevate reliability standards

Requirements

5+ years of experience in SRE, DevOps, or Platform Engineering
Strong background with cloud platforms (AWS preferred), Kubernetes, and Terraform
In-depth understanding of observability tools such as Datadog, Prometheus, or OpenTelemetry
Strong debugging abilities across services, networking, and data layers
Proven experience building and monitoring SLIs/SLOs
Familiarity with CI/CD tools (GitHub Actions, Jenkins, Argo CD, etc.)
Ability to write production-quality code in Go, Python, or Node.js
Comfortable operating independently in fast-paced environments

Nice to Have

Experience optimizing observability costs and data ingestion
Familiarity with chaos engineering and progressive deployment strategies
Experience working with high-throughput or low-latency systems
Hands-on experience with AWS at scale (EKS, Lambda, DynamoDB, S3)
Exposure to regulated environments (fintech, payments, SOC 2 compliance)
Background in performance or load-testing automation
Experience with systems processing tens of millions of API calls

Join Our SRE Talent Pool

If you don’t meet every requirement or don’t match the current opening, skilled SREs with real production experience are still encouraged to apply. We regularly place engineers across reliability, DevOps, platform, observability, backend, and infrastructure roles.

CloudDevs, San Francisco: https://clouddevs.com

About Company

Assemble your dream tech team with CloudDevs’ pre-vetted LatAm talent. Seamlessly integrate our elite talents into your existing team through our Team Augmentation services, and leave it to us to handle all legal, compliance, and administrative complexities.

Job Information

Employee Type: Full-time
Location: Anywhere in the World
Job Type: DevOps and Sysadmin
Applicants: 0
Skills: Node.js, AWS, Python, Back-End Dev, Full Stack Dev, Go
Salary: not specified
Date posted: Nov 22, 2025