Avani Singhal - Experience

👨‍💻 Principal Engineer

Apr 2025 – Present

As a Principal Engineer, I am responsible for driving technical excellence, making architectural decisions, and mentoring engineering teams. In this role, I focus on:

Leading technical strategy and architecture for large-scale systems
Mentoring senior engineers and technical leads
Driving innovation through emerging technologies and best practices
Ensuring system scalability, reliability, and performance

👩‍💻 SRE III – BookMyShow

BigTree Entertainment Pvt. Ltd. | Apr 2021 – Present

As a senior SRE, I've been instrumental in designing scalable infrastructure, ensuring high availability for large-scale events, and embedding reliability across the CI/CD lifecycle.

🛠️ CI/CD Architecture & Release Automation

Standardized CI/CD across teams using GitLab, Bitbucket, and Bamboo.
Integrated SonarQube for quality gates; cut production issues by 30%.
Enabled reusable deployment templates with safe rollback support.

☁️ Cloud Migration & Infra Modernization

Migrated core workloads from VMware and GCP to AWS (EKS, EC2, and RDS).
Replaced JFrog with Amazon ECR for better cost and container management.
Automated infra provisioning via CloudFormation & Ansible.

🌐 Disaster Recovery Implementation

Built a multi-region DR architecture with RDS cross-region replication, S3 backups, and Route 53 failover.
Authored DR runbooks and executed regular failover drills.
Reduced RTO from 4 hours to under 30 minutes across critical services.

📈 Scalability for High-Traffic Events

Handled peak loads (5x+ traffic) during the Cricket World Cup and concerts.
Tuned EKS with HPA, pod disruption budgets, and circuit breakers.
Monitored with Grafana, Prometheus, synthetic testing, and APM tools such as New Relic and Elastic APM (ELK Stack).

🧩 Istio-Based Service Mesh Deployment

Introduced advanced traffic routing, retries, mirroring, and observability for microservices.
Improved service resilience and debugging via sidecar telemetry and distributed tracing with Jaeger.

🧠 Reliability Culture & Team Enablement

Led incident response for P0s, with detailed postmortems and RCA reviews.
Trained new SREs on Kubernetes, observability tools, and CI/CD platforms.
Documented internal architecture and DR knowledge base.

⚙️ DevOps Engineer II – HERE Technologies

Nov 2019 – Mar 2021

📊 CI Observability Dashboards: Built real-time Grafana dashboards for CI pipelines using Python & MySQL, reducing build failures by 25%.
⚡ GitLab Runner Optimization: Improved flaky job reliability and cut CI build time by 30%.
🛠️ Infrastructure Provisioning: Automated AWS infra provisioning via Ansible, reducing manual errors.
💰 Cloud Cost Optimization: Cut AWS costs by 20% using Reserved Instances (RIs) and right-sized resources.
🚀 Automated Deployment Pipelines: Rolled out Jenkins pipelines to improve deployment speed and consistency.

🔧 DevOps Deployment Engineer – Zycus

Oct 2017 – Nov 2019

Zycus is a global leader in Source-to-Pay procurement software, empowering enterprises with automation-driven solutions.

🐍 GitLab Access Automation: Automated GitLab access control using Python scripts for streamlined onboarding.
🧪 CI Pipeline Hardening: Integrated GitLab CLI with Jenkins, SonarQube, and Nexus to enforce quality gates.
🚀 Release Management: Coordinated deployment planning across multiple non-prod and prod environments.
☁️ Hybrid Cloud Management: Provisioned and maintained infra on AWS, Navisite, and VMware platforms.
📦 AWS Services Integration: Deployed VPC, EC2, ALB, Auto Scaling, and S3 in scalable infra setups.
🐳 Docker-Based Dev Envs: Enabled isolated dev/testing using Docker Compose and shared base images.
🔄 Developer Enablement: Guided devs in creating Dockerfiles and containerizing local apps.
🔧 Ansible Configuration: Managed infra configuration and app deployments via Ansible roles/playbooks.
🧭 Consul for Service Discovery: Leveraged Consul to manage dynamic service configurations.
🌐 Web Server Config: Served applications via Apache, Nginx, and HAProxy for high availability.
🛡️ CI/CD Quality Assurance: Ensured stable deployments through robust infra testing and rollout strategies.

🩺 DevOps Engineer – Doctor Insta (via OpsTree Solutions)

Jun 2016 – Sep 2017

Doctor Insta is a telehealth platform offering digital primary care and remote doctor consultations across India.

🔧 Infra Automation with Ansible: Created reusable roles/playbooks for consistent infra provisioning.
☁️ AWS Infrastructure Management: Managed EC2, RDS, VPC, S3, and Route 53 across environments.
📦 Region Migration: Led a successful production migration across AWS regions with minimal downtime.
🔐 Git Hosting via Gitolite: Deployed and maintained secure, internal Git repositories.
🚀 CI/CD with Jenkins: Automated deployments and app builds using job-based pipelines.
💾 DB Backup Automation: Scheduled daily backups with secure uploads to AWS S3.
📈 Monitoring with Zabbix: Implemented end-to-end infra and app monitoring for alerts and health checks.
🐍 Python App Deployment: Deployed apps in isolated virtual environments to ensure dependency consistency.
