In 2017, AWS announced a feature for modifying EBS volumes in place. It allowed users to increase the volume size.
But have you ever decreased an EBS volume (scale-in) without replacing the volume or scheduling downtime? Sounds impossible? At Paytm, we do this regularly.
Yes, you read that right. Just like EC2 autoscaling, you can scale your EBS volumes as well. Say goodbye to scheduled downtime when modifying AWS EBS volume storage. As India’s largest fintech company, by Indians and for Indians, we try to build the most innovative tech stack in the world.
Why do you need EBS scale-out & scale-in?
To avoid production downtime due to full disks (scale-out)
Every application constantly adds data to its attached disk. If no one acts on the disk usage alarms, a full disk can take the application down. This leads to emergency manual increases of disk space.
To eliminate extra storage capacity allocated to machines and save cost (scale-in)
Large EBS volumes are often provisioned and never fully used. In some cases, the extra capacity is needed only for specific windows, such as ES cleanup or DB backups, and sits idle for the rest of the day or week.
Hence, there is a need for a real-time, automated EBS auto-scaler that supports both scale-in and scale-out.
Consider also the case where you have already increased a volume and are later asked to shrink it to eliminate the extra storage and save cost. You would typically do this with rsync or a similar Linux approach, which ends up causing production downtime. The diagram below depicts this case.
To minimize human intervention on every disk alarm (even at night)
When an alarm fires, a human has to step in. Disk-full alarms can be destructive to the business, so they need immediate attention in production.
The image below shows how a Prometheus alarm can wake you up even at midnight.
How can you avoid the above-mentioned scenarios?
That’s the whole point of this blog. Let’s look at the proposed solution.
Unlike native AWS volume modification, which only allows increasing the size, the EBS auto-scaler provides both scale-in and scale-out. It gives you the flexibility to increase and decrease volume size on the fly with zero downtime.
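To make the scale-in and scale-out idea concrete, here is a minimal sketch of the decision an auto-scaler has to make from a disk usage reading. The function names, the 80% and 30% thresholds, and the 60% target are illustrative assumptions, not values from Paytm's implementation.

```python
def decide_action(used_pct, scale_out_at=80, scale_in_at=30):
    """Return 'scale_out', 'scale_in', or 'none' for a disk usage percentage.

    The default thresholds are assumptions chosen for illustration only.
    """
    if not 0 <= used_pct <= 100:
        raise ValueError("used_pct must be between 0 and 100")
    if used_pct >= scale_out_at:
        return "scale_out"
    if used_pct <= scale_in_at:
        return "scale_in"
    return "none"


def target_size_gib(used_gib, target_pct=60):
    """Smallest whole-GiB size that keeps usage at or below target_pct.

    used_gib and target_pct are expected to be integers; integer ceiling
    division avoids floating-point rounding surprises.
    """
    return -(-used_gib * 100 // target_pct)


if __name__ == "__main__":
    print(decide_action(85))       # scale_out
    print(decide_action(20))       # scale_in
    print(target_size_gib(120))    # 200
```

Keeping a gap between the two thresholds helps avoid flapping, where a volume is resized back and forth around a single cut-off.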
The implementation
The complete process is automated with AWS CloudWatch, SNS, and Lambda.
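As a rough sketch of how these three services connect, the Lambda function below reads a CloudWatch alarm notification delivered through SNS. The helper name `parse_alarm_event` and the returned fields are hypothetical. The resize step itself is left out because this post does not describe how it is done.

```python
import json


def parse_alarm_event(event):
    """Extract alarm details from an SNS-delivered CloudWatch alarm event.

    Assumes the standard SNS-to-Lambda envelope, where the alarm payload is
    a JSON string under Records[0].Sns.Message.
    """
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    trigger = message.get("Trigger", {})
    # CloudWatch alarm notifications list dimensions as name/value pairs.
    dimensions = {d["name"]: d["value"] for d in trigger.get("Dimensions", [])}
    return {
        "alarm_name": message["AlarmName"],
        "state": message["NewStateValue"],
        "metric": trigger.get("MetricName"),
        "dimensions": dimensions,
    }


def lambda_handler(event, context):
    """Entry point: report whether the alarm calls for a resize."""
    alarm = parse_alarm_event(event)
    # Only act when the alarm is firing; OK and INSUFFICIENT_DATA are ignored.
    alarm["needs_resize"] = alarm["state"] == "ALARM"
    # The actual volume resize would be triggered here (not shown).
    return alarm
```

Separating the parsing from the handler keeps the logic testable locally with a hand-built event, without any AWS credentials.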
Architecture:
Paytm’s monitoring system (arguably India’s largest monitoring ecosystem) has been using the EBS auto-scaler on terabytes of disks for the past year.
The Impact
- 4.6% reduction in downtimes due to disk utilization issues.
- EBS cost optimization up to 65%.
- With 5 gp3 volumes of 200 GB each, we get 5 × 3,000 = 15,000 IOPS and 5 × 125 = 625 MiB/s of baseline throughput, which is far more than traditional volumes offer.
- 100% automated.
- No sleep disturbances.
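The gp3 figures in the list above follow from simple multiplication of the per-volume baseline. A small sketch of that arithmetic is shown here. It assumes the gp3 baseline of 3,000 IOPS and 125 MiB/s per volume, that I/O is spread evenly across the volumes, and that instance-level limits are not the bottleneck. The function name is hypothetical.

```python
# gp3 baseline performance per volume, independent of volume size.
GP3_BASELINE_IOPS = 3000
GP3_BASELINE_THROUGHPUT_MIB_S = 125


def aggregate_gp3_baseline(volume_count):
    """Return (total IOPS, total MiB/s) for volume_count gp3 volumes.

    Assumes I/O is distributed evenly across all volumes and that the
    instance's own EBS limits are higher than the totals.
    """
    if volume_count < 1:
        raise ValueError("volume_count must be at least 1")
    return (
        volume_count * GP3_BASELINE_IOPS,
        volume_count * GP3_BASELINE_THROUGHPUT_MIB_S,
    )


if __name__ == "__main__":
    iops, throughput = aggregate_gp3_baseline(5)
    print(iops, throughput)  # 15000 625
```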
The Future
- Better DevOps practices
- More cost savings