Atul Salunkhe

Introduction

My journey in tech began in December 2014 as a manual tester. I spent my early days executing 50+ test cases daily, reporting defects, and validating fixes. The work was repetitive, but it taught me one of the most important lessons of my career: attention to detail and discipline matter more than anything else.

Over time, I transitioned into automation testing, first with Java and Selenium, building regression suites of 200+ test cases, and later with Python, creating custom scripts and lightweight frameworks that reduced manual effort by 30–40%. These experiences strengthened my problem-solving skills and prepared me for the next phase of my career: Site Reliability Engineering (SRE).

"Attention to detail and discipline matter more than anything else."

My Transition to SRE

In 2022, I joined an SRE team with limited cloud and reliability experience. The first few weeks were overwhelming — learning tools, platforms, and practices while avoiding mistakes in production.

I began with small, manageable tasks:

Automating monitoring dashboards using Python
Writing alerts and remediation scripts
Supporting incident response and troubleshooting

Over time, I took ownership of critical services, implemented CI/CD pipelines with Jenkins and GitLab, and introduced automation that improved system uptime by 10–15%. Task by task, I grew into an engineer capable of resolving incidents, optimizing workflows, and mentoring junior teammates.

"Bit by bit, task by task, I grew into an engineer capable of designing solutions and resolving incidents."

Anecdotes: Lessons from QA Applied to SRE

Catching the "Silent Failure"

During my QA days, I noticed an API returning correct status codes but inconsistent payloads under certain conditions. Reporting this subtle bug prevented potential production issues.
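A check for this kind of bug has to look past the status code and inspect the payload itself. The sketch below is only an illustration of that idea, not the original test: the field names in REQUIRED_FIELDS, the helper payload_problems, and the sample responses are all hypothetical.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"id": int, "status": str, "items": list}


def payload_problems(status_code, payload):
    """Return a list of problems; an empty list means the response looks consistent."""
    problems = []
    if status_code != 200:
        problems.append(f"unexpected status code {status_code}")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field '{field}'")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(
                f"field '{field}' is {actual}, expected {expected_type.__name__}"
            )
    return problems


# Both responses come back with 200, but only the first payload is consistent.
good = {"id": 7, "status": "shipped", "items": ["a", "b"]}
bad = {"id": "7", "status": "shipped"}

print(payload_problems(200, good))
print(payload_problems(200, bad))
```

A status-only assertion would pass for both responses; the payload check is what separates them.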

Years later, as an SRE, I detected a similar silent failure in a microservice: monitoring dashboards showed "all healthy," but logs revealed intermittent latency spikes. Drawing on my QA instincts, I implemented a script to automatically restart failing pods, preventing an outage for thousands of users.
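One way to surface a failure like this is to compare tail latency against a budget instead of trusting the health check alone. The following is a sketch under assumptions, not the script from the incident: the pod names, latency samples, 500 ms budget, and helper names are hypothetical, and the code only prints a `kubectl delete pod` command rather than running it.

```python
import math

LATENCY_BUDGET_MS = 500  # hypothetical p95 latency budget


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def pods_to_restart(latencies_by_pod, budget_ms=LATENCY_BUDGET_MS):
    """Return pods whose p95 latency exceeds the budget, even if they report healthy."""
    return sorted(
        pod
        for pod, samples in latencies_by_pod.items()
        if samples and p95(samples) > budget_ms
    )


# Hypothetical latency samples in milliseconds, keyed by pod name.
latencies = {
    "checkout-7d9f-abcde": [120, 130, 110, 125, 118, 122, 119, 121, 117, 123],
    "checkout-7d9f-fghij": [115, 900, 120, 1400, 118, 125, 1100, 119, 121, 116],
}

for pod in pods_to_restart(latencies):
    # Printed only; a real script would need guardrails before deleting anything.
    print("kubectl delete pod", pod)
```

Using a high percentile rather than an average matters here: intermittent spikes barely move the median, which is part of why a dashboard can stay green.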

Automating Error Detection

Manual regression in QA taught me the value of automation. Hours spent running repetitive tests instilled a mindset of efficiency and reliability.

In SRE, I applied the same principle:

Aggregated logs and automated critical alerting
Filtered false positives to reduce alert noise by over 50%
Saved the team hours of manual work every week

python

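# Pseudo-code for monitoring service logs.
# fetch_logs, detect_anomaly, and trigger_alert are placeholders.
for service in services:
    logs = fetch_logs(service)       # pull recent log lines for the service
    if detect_anomaly(logs):         # decide whether the logs look abnormal
        trigger_alert(service)       # notify whoever is on call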
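To make the loop above concrete, here is one possible runnable version. It is a sketch, not the production tooling: the error-rate threshold, the rule of alerting only after consecutive anomalous windows (one simple way to filter false positives), the should_alert helper, and the in-memory sample logs are all assumptions.

```python
from collections import defaultdict

ERROR_RATE_THRESHOLD = 0.2   # assumed: anomalous when over 20% of lines are errors
CONSECUTIVE_BREACHES = 2     # assumed: require two anomalous windows in a row

breach_counts = defaultdict(int)


def detect_anomaly(logs):
    """Return True when the share of ERROR lines exceeds the threshold."""
    if not logs:
        return False
    errors = sum(1 for line in logs if "ERROR" in line)
    return errors / len(logs) > ERROR_RATE_THRESHOLD


def should_alert(service, logs):
    """Alert only after several consecutive anomalous windows to cut noise."""
    if detect_anomaly(logs):
        breach_counts[service] += 1
    else:
        breach_counts[service] = 0
    return breach_counts[service] >= CONSECUTIVE_BREACHES


# Hypothetical log windows for a single service.
windows = [
    ["INFO ok", "ERROR timeout", "INFO ok"],   # anomalous, first breach
    ["INFO ok", "INFO ok", "INFO ok"],         # healthy, counter resets
    ["ERROR timeout", "ERROR timeout"],        # anomalous, first breach again
    ["ERROR db down", "INFO ok"],              # anomalous, second breach in a row
]

alerts = [should_alert("payments", window) for window in windows]
print(alerts)  # [False, False, False, True]
```

The one-off spike in the first window never pages anyone; only the sustained problem at the end does, which is the kind of filtering that cuts alert noise.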

Prioritizing Reliability Like Quality

QA taught me that not all bugs are equal — severity and impact guide prioritization.

During a production database replication lag incident, I:

Prioritized customer-facing services
Applied a temporary failover
Systematically fixed the root cause

This approach minimized downtime while ensuring long-term reliability — much like triaging test cases in QA.
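The same triage logic can be written down as a sort key. This is an illustrative sketch only: the Incident fields, the severity scale (1 is most severe), and the sample incidents are hypothetical rather than taken from the actual outage.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    severity: int          # assumed scale: 1 = most severe
    customer_facing: bool


def triage(incidents):
    """Order incidents: customer-facing first, then by ascending severity number."""
    return sorted(incidents, key=lambda i: (not i.customer_facing, i.severity))


queue = [
    Incident("reporting job delayed", severity=3, customer_facing=False),
    Incident("checkout reads stale data", severity=1, customer_facing=True),
    Incident("internal dashboard lag", severity=2, customer_facing=False),
    Incident("profile page slow", severity=2, customer_facing=True),
]

for incident in triage(queue):
    print(incident.name)
```

Sorting on a tuple keeps the rule explicit: user impact decides first, and severity only breaks ties within each group.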

How My QA Mindset Strengthened My SRE Skills

Attention to detail: Scrutinizing every log and configuration to catch subtle reliability issues early
End-user perspective: Designing systems that are resilient and user-friendly
Preventive thinking: Treating minor issues seriously to prevent major incidents
Ownership & ethics: Taking responsibility for systems, pipelines, and automation
Continuous learning: Experimenting with Python, Kubernetes, Terraform, Helm, and cloud platforms

Personal Growth

Patience: QA trained me to stay calm during repetitive work; SRE reinforced it during on-call rotations and incidents
Discipline: Writing reproducible test cases evolved into reliable scripts, workflows, and runbooks
Analytical thinking: Breaking down test scenarios turned into debugging complex distributed systems

"My QA years were not a separate chapter, but the foundation of my quality-first, ethical, and reliable SRE approach."

Lessons Learned & Advice for Aspiring QA-to-SRE Engineers

Leverage Your QA Mindset – Attention to detail and focus on quality are critical in SRE
Embrace Continuous Learning – Master cloud platforms, CI/CD, and infrastructure automation gradually
Think Like the End User – Design alerts, dashboards, and automation that benefit real users
Prioritize Preventive Action – Treat small issues seriously to prevent major outages
Take Ownership with Ethics – Responsibility goes beyond completing assigned tasks; own the reliability of the systems you run
Be Patient and Persistent – Both QA and SRE demand endurance and perseverance
Document and Share Knowledge – Maintain runbooks and incident documentation for team success

Closing Thoughts

The path from QA to SRE may seem daunting, but it is full of opportunities for growth and impact. Your QA mindset — focusing on detail, ethics, and quality — is a foundation for building reliable, scalable systems.

By combining these values with SRE skills, you can excel technically, grow personally, and drive meaningful impact for your team and the end-users who rely on your systems every day.

"If you're on a QA-to-SRE journey, feel free to connect with me and share your experiences!"

From QA to SRE: How My Quality-First Mindset and Ethical Approach to Testing Helped Me Engineer Reliability at Scale