Senior SRE
Chipcolate Milan

Chipcolate is an Italian company of passionate engineers dedicated to orchestrating scalable fleets, from edge devices to autonomous agents and servers. Our background spans embedded systems, web applications, and 3D printing. Currently, we are focused on scaling a fleet of autonomous agents that deliver high-throughput financial services. We work in a flexible, loosely structured environment to maximize speed and quality, guided by first-principles engineering.

Learn more: chipcolate.com · LinkedIn (@chipcolate) · GitHub  

Job Summary 

Chipcolate is seeking a Senior Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our agent-based financial platform. You will design and automate resilient cloud infrastructure, maintain low latency and high uptime, and provide tooling that enables our product team to ship confidently at scale. 

  • Salary: €80,000 – €100,000 gross / year

  • Location: Remote (within ±4 hours of Central European Time)

  • Employment Type: Full-time, permanent

Disclaimer: Applications from outside the required time zone range, or from candidates without the required seniority, will be automatically rejected. Please include your GitHub profile as a portfolio; a profile image is not required.

Responsibilities
 

  • Architect, provision, and maintain a distributed, multi-cloud infrastructure with strict availability and latency requirements

  • Design long-term solutions for thousands of concurrently executing agents

  • Ensure Postgres database performance and reliability, including OLAP workloads

  • Build scalable observability stacks (Grafana / OpenTelemetry) with actionable SLOs

  • Implement automated reliability measures: blue/green deployments, canary rollouts, chaos testing, and game days

  • Collaborate with backend teams to profile services, remove bottlenecks, and enable horizontal scaling

  • Drive cost-efficient capacity planning and enforce security best practices

Experience & Qualifications

Must-Have:
 

  • 5+ years in SRE, DevOps, or Production Engineering

  • Deep knowledge of Linux and containerization

  • Strong Postgres expertise

  • Proficiency in at least one of the following: JavaScript (Node.js), Python, Go, or Rust

  • Mastery of Infrastructure-as-Code tools such as Ansible and Terraform

  • Strong monitoring and alerting experience; familiarity with RED/USE metrics

Nice-to-Have:
 

  • Experience with Grafana observability stack

  • Familiarity with event-driven or agent-based architectures

  • Experience with multi-region, active-active setups

  • Exposure to Supabase or DuckDB

  • Experience managing Kubernetes clusters at scale

  • No formal degree required; demonstrable work (code, OSS contributions, or projects) is valued

Benefits
 

  • Flexible hours & fully remote

  • Fast-growing, innovative environment

  • Exciting application domain

  • 24 days paid leave + local public holidays

  • Competitive salary

  • Home-office budget and company laptop

  • Additional special benefits discussed during the hiring process

Application Process
 

  1. Submit your CV and GitHub profile

  2. Cultural chat and technical deep-dive with our CTO (60 min, system design & live problem-solving)

  3. Offer within 7 working days of the final interview

Ready to make high-stakes infrastructure feel effortless? Apply today and let’s engineer reliability together.

About Us
Chipcolate is an Italian engineering-driven company focused on orchestrating scalable fleets across devices, agents, and servers. Our work spans embedded systems, web applications, and 3D printing. Currently, we scale autonomous agents providing high-throughput financial services, operating flexibly to deliver speed and quality.

 
