Senior Site Reliability Engineer (SRE)
CloudDevs San Francisco

CloudDevs partners with fast-growing, venture-backed startups to help them hire exceptional designers and developers. When you join us, you’ll work directly with one of these startups and contribute to their early-stage growth. 

If you specialize in a different tech stack, no problem. We support multiple projects and are always looking for experienced Senior Developers. If you have 4+ years of professional software development experience, we’d love to connect and match you with the right opportunity. Simply apply, and we’ll share all the relevant details.

If your profile fits, our hiring process includes: 

  1. Submitting your CV on our website

  2. A 30-minute screening call or technical interview

  3. Completing a coding challenge

  4. Getting matched to the most suitable roles on our platform

Location: LATAM, Europe

CloudDevs also works with high-growth startups across the US and is building a strong pipeline of world-class Site Reliability Engineers (SREs) for current and future positions. You may be placed directly with a partner company or added to our vetted SRE network for upcoming projects.

This role is ideal for engineers who prioritize reliability, metrics, performance, and scalable system design. If you enjoy improving deployment processes, ensuring stability, and helping teams ship high-quality software, this is a great fit.

Key Responsibilities
 

  • Act as a hands-on engineer focused on reliability, performance, and system observability

  • Define and maintain SLIs, SLOs, and error budgets

  • Optimize monitoring costs and improve metrics, logging, and tracing quality

  • Enhance deployment safety, canary rollouts, and UAT pipelines

  • Build tools for automated and local performance testing, including benchmarks

  • Lead resilience initiatives such as failover drills, chaos testing, and redundancy checks

  • Work with engineering teams to refine scaling patterns and architecture

  • Support incident response and reduce operational noise

  • Write clean, production-ready code in Go, Python, or Node.js

  • Contribute to CI/CD automation and improvements

  • Collaborate across teams to elevate reliability standards

Requirements
 

  • 5+ years of experience in SRE, DevOps, or Platform Engineering

  • Strong background with cloud platforms (AWS preferred), Kubernetes, and Terraform

  • In-depth understanding of observability tools such as DataDog, Prometheus, or OpenTelemetry

  • Strong debugging abilities across services, networking, and data layers

  • Proven experience building and monitoring SLIs/SLOs

  • Familiarity with CI/CD tools (GitHub Actions, Jenkins, ArgoCD, etc.)

  • Ability to write production-quality code in Go, Python, or Node.js

  • Comfortable operating independently in fast-paced environments

Nice to Have
 

  • Experience optimizing observability costs and data ingestion

  • Familiarity with chaos engineering and progressive deployment strategies

  • Experience working with high-throughput or low-latency systems

  • Hands-on experience with AWS at scale (EKS, Lambda, DynamoDB, S3)

  • Exposure to regulated industries (fintech, payments, SOC2)

  • Background in performance or load-testing automation

  • Experience with systems processing tens of millions of API calls

Join Our SRE Talent Pool
 
Even if you don’t meet every requirement or don’t match the current opening, we encourage you to apply if you are a skilled SRE with real production experience. We regularly place engineers across reliability, DevOps, platform, observability, backend, and infrastructure roles.
 

CloudDevs
About Company

Assemble your dream tech team with CloudDevs’ pre-vetted LatAm talent. Seamlessly integrate our engineers into your existing team through our Team Augmentation services, and leave it to us to handle all legal, compliance, and administrative complexities.
