Our platform delivers millions of SMS and MMS messages every day through a distributed, event-driven architecture on Google Cloud. We run Cloud Run–based microservices, Pub/Sub event streams, Spanner-backed data stores, and Redis (Memorystore) for coordination and rate limiting. The system scales dynamically to meet campaign spikes, processes real-time delivery telemetry at high throughput, and must remain fast, secure, and dependable 24/7.
We’re hiring a Senior Infrastructure and Security Engineer to take ownership of the platform’s reliability, security posture, and operational excellence. As our first dedicated infrastructure hire, you’ll partner closely with the CTO to shape the foundation of how we scale. This is not a “follow the runbook” role—you’ll define the roadmap, set architectural direction, and build the systems that keep the platform resilient and customer data protected.
What You’ll Own
Cloud Architecture & Infrastructure as Code
- Design, operate, and evolve our Terraform-managed GCP environment across a shared VPC host project and multiple service projects
- Architect for scalability, resilience, and cost efficiency across Cloud Run, Spanner, Pub/Sub, Cloud Storage, Memorystore (Redis), and Cloud Tasks
- Own environment promotion and consistency across dev, staging, and production
Reliability & Observability
- Build robust monitoring, alerting, and incident response using Cloud Monitoring, Logging, and Trace
- Define SLIs and SLOs for critical message delivery paths and reduce MTTR
- Implement health checks and self-healing patterns for high-throughput Cloud Run services
Cloud Security
- Strengthen security across network, application, and data layers
- Design IAM policies and service account models, manage secrets with Secret Manager, and implement Cloud Armor policies for DDoS protection and rate limiting
- Own security reviews, vulnerability management, and incident response for security events
CI/CD & Developer Experience
- Maintain and improve GitHub Actions pipelines for a TypeScript monorepo deploying to Cloud Run
- Enable fast, safe releases with automated testing, linting, container builds, and environment-specific deployments
- Improve build performance and deployment reliability
Performance & Auto-Scaling
- Fine-tune Cloud Run autoscaling (concurrency, min/max instances) for public APIs and Pub/Sub workers
- Optimize Spanner performance and capacity planning
- Ensure Redis-based distributed rate limiting remains low-latency across horizontally scaled services
Compliance & Data Protection
- Help define and maintain compliance practices for a messaging platform, covering TCPA, carrier policies, encryption, data retention, and audit logging
- Ensure the platform meets enterprise customers’ expectations for security, privacy, and data handling
What We’re Looking For
Required
- 5+ years in infrastructure, DevOps, or SRE roles with increasing ownership
- Deep hands-on experience with Google Cloud (Cloud Run, VPC networking, IAM, and at least one managed database)
- Production-grade Terraform experience managing multi-environment, modular IaC
- Practical cloud security expertise: network security, IAM design, secrets management, and vulnerability assessment
- Proven experience building and running CI/CD pipelines with GitHub Actions for containerized workloads
- Experience operating distributed systems with high event/message throughput and strict reliability requirements
- Strong Linux and networking fundamentals (DNS, TLS, load balancing) and comfort debugging production systems
- Security-first mindset with a strong understanding of least privilege, encryption in transit/at rest, and incident response
- Willingness to participate in on-call and incident response in a small-team environment
Preferred
- Hands-on experience with Spanner, Pub/Sub, Memorystore (Redis), Cloud Tasks, or Cloud Armor
- Background in messaging/telecom systems (carrier APIs, throughput management, large-scale rate limiting)
- Familiarity with TypeScript/Node.js runtime environments
- Experience managing CI/CD in monorepo environments
- Exposure to compliance frameworks relevant to communications platforms (TCPA, SOC 2, carrier security standards)
- Experience as the primary infrastructure owner at a scaling company
- Google Cloud certifications (Professional Cloud Security Engineer or Professional Cloud Architect) are a plus
What Makes This Role Stand Out
- True ownership: You’ll be the first infrastructure specialist. There are no inherited playbooks; you’ll create them.
- Real scale, real impact: Millions of daily messages, carrier-level SLAs, and multi-region considerations, all handled by a small team.
- Challenging systems: Distributed rate limiting on auto-scaling Cloud Run services, high-throughput Pub/Sub pipelines, and real-time metrics on Spanner.
- Direct CTO partnership: Work alongside a technically hands-on CTO, with the context and authority you need to make meaningful architectural decisions.
- Autonomy with outcomes: We care about reliability, security, cost efficiency, and delivery speed—you’ll shape how we achieve them.
Compensation & Benefits
- Competitive base salary (range available upon request)
- Performance-based bonus
- Remote-first culture with flexible working hours
- Direct reporting line to the CTO
Our Stack at a Glance
- Cloud: Google Cloud Platform
- IaC: Terraform
- CI/CD: GitHub Actions
- Languages: TypeScript (Node.js)
- Networking: Shared VPC, Global HTTPS Load Balancer, API Gateway, Cloud Armor
- Observability: Cloud Monitoring, Logging, Trace
- Architecture: Event-driven microservices