AWS EBS Auto-scaling

In 2017, AWS announced a feature (Elastic Volumes) for modifying EBS volumes in place. It allowed users to increase the volume size, but not to decrease it.

But have you ever decreased an EBS volume (scale-in) without replacing the volume or scheduling downtime? Sounds impossible? At Paytm, we do this regularly.

Yes, you read that right. Just like EC2 auto scaling, you can scale your EBS volumes as well. Say goodbye to scheduled downtime when modifying AWS EBS volume storage. As India’s largest fintech company, built by Indians for Indians, we try to build the most innovative tech stack in the world.

Why do you need EBS scale-out & scale-in?

Avoid production downtime due to a full disk (scale-out)

Every application constantly adds data to its attached disk. If no one acts on disk usage alarms, a full disk can bring the application down, which leads to emergency manual increases of disk space.

Engineering_3_AWS-EBS-Auto-scaling_2_InternalImage

Eliminate extra storage capacity allocated to machines to save cost

Large EBS volumes are often provisioned and then left underused. In some cases, the capacity is needed only for specific periods, such as ES cleanup or DB backups, and sits idle as extra provisioned capacity for the rest of the day or week.

Hence, there is a need for a real-time, automated EBS auto-scaler that supports both scale-in and scale-out.

Also, suppose you have increased a volume’s size and are later asked to reduce it to eliminate the extra storage and save cost. You would typically do this with rsync or another Linux-level approach, which ends up causing production downtime. The diagram below depicts this case.

Engineering_3_AWS-EBS-Auto-scaling_3_InternalImage

Minimize human intervention on every disk alarm (even at night)

When an alarm is triggered, Homo sapiens are the ones who come into the picture. Disk-full alarms can be destructive to the business, so they need full attention in production.

The image below shows how a Prometheus alarm can wake you up even at midnight.

Engineering_3_AWS-EBS-Auto-scaling_4_InternalImage

How can you avoid the above-mentioned scenarios?

That’s the whole point of this blog. Let’s look at the proposed solution.

Unlike native AWS volume modification, which only supports increasing the size, the EBS auto-scaler provides both scale-in and scale-out. It gives you the flexibility to increase and decrease volume size on the fly with zero downtime.
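The scale-in and scale-out decision can be pictured as a simple threshold policy on disk utilization. The sketch below is only an illustration: the thresholds, the step size, and the names `ScalingPolicy` and `plan_resize` are assumptions made for this example, not values or code from Paytm’s auto-scaler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    # All values here are assumed for illustration only.
    scale_out_above: float = 80.0  # utilization (%) at or above which we grow
    scale_in_below: float = 30.0   # utilization (%) at or below which we may shrink
    step_gib: int = 200            # capacity added or removed per action
    min_size_gib: int = 200        # never shrink below this size


def plan_resize(used_gib: float, size_gib: int,
                policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the target size in GiB for the given usage and current size."""
    utilization = 100.0 * used_gib / size_gib
    if utilization >= policy.scale_out_above:
        return size_gib + policy.step_gib
    if utilization <= policy.scale_in_below:
        candidate = size_gib - policy.step_gib
        # Shrink only if the floor is respected and the smaller size
        # would not immediately trigger a scale-out again.
        if (candidate >= policy.min_size_gib
                and 100.0 * used_gib / candidate < policy.scale_out_above):
            return candidate
    return size_gib


print(plan_resize(170, 200))  # 85% used: grow to 400
print(plan_resize(100, 400))  # 25% used: shrink to 200
print(plan_resize(100, 200))  # 50% used: stay at 200
```

The extra check before shrinking keeps the policy from flapping between scale-in and scale-out when usage sits near a boundary.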

Engineering_3_AWS-EBS-Auto-scaling_5_InternalImage

The implementation

The complete process is automated with Amazon CloudWatch, SNS, and Lambda.

Architecture:

Engineering_3_AWS-EBS-Auto-scaling_6_InternalImage
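To make the CloudWatch, SNS, and Lambda flow more concrete, here is a minimal sketch of a Lambda entry point that receives a CloudWatch alarm through SNS and hands the affected instance to a scaling routine. It assumes the alarm carries an `InstanceId` dimension; the function names and the `scale_out` callback are hypothetical, and the actual resize logic (and how scale-in is performed) is not described in this post, so it is left as a placeholder.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def alarms_from_sns_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return CloudWatch alarm payloads that are currently in the ALARM state."""
    alarms = []
    for record in event.get("Records", []):
        # SNS delivers the CloudWatch alarm as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM":
            alarms.append(message)
    return alarms


def instance_id_of(alarm: Dict[str, Any]) -> Optional[str]:
    """Find the InstanceId dimension of the alarm's metric, if present."""
    for dimension in alarm.get("Trigger", {}).get("Dimensions", []):
        if dimension.get("name") == "InstanceId":
            return dimension.get("value")
    return None


def log_only(instance_id: str) -> None:
    # Placeholder action (assumption): a real callback would resize the
    # storage attached to the instance and extend the filesystem.
    print(f"scale-out requested for {instance_id}")


def handler(event: Dict[str, Any], context: Any,
            scale_out: Callable[[str], None] = log_only) -> Dict[str, List[str]]:
    """Lambda entry point: trigger scale-out for every instance in ALARM."""
    requested = []
    for alarm in alarms_from_sns_event(event):
        instance_id = instance_id_of(alarm)
        if instance_id is not None:
            scale_out(instance_id)
            requested.append(instance_id)
    return {"scale_out_requested": requested}
```

Passing the resize action in as a callback keeps the event parsing testable without AWS credentials; in a deployed function the default would be swapped for the real scaling routine.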

Paytm’s monitoring system (arguably India’s largest monitoring ecosystem) has been using the EBS auto-scaler on terabytes of disks for the past year.

The Impact

  • 4.6% reduction in downtime caused by disk utilization issues.
  • EBS costs reduced by up to 65%.
  • Five gp3 volumes of 200 GB each provide a combined baseline of 3,000 × 5 = 15,000 IOPS and 125 × 5 = 625 MiB/s of throughput, which is far more than a traditional single volume.
  • 100% automated.
  • No sleep disturbances.
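The gp3 figures in the list above follow from the per-volume baseline of 3,000 IOPS and 125 MiB/s. A small sketch of the arithmetic is shown here; the helper name `aggregate_gp3_baseline` is made up for this example, and the totals assume I/O is spread evenly across the volumes and that instance-level EBS limits are not the bottleneck.

```python
GP3_BASELINE_IOPS = 3000             # baseline IOPS per gp3 volume
GP3_BASELINE_THROUGHPUT_MIBPS = 125  # baseline throughput per gp3 volume


def aggregate_gp3_baseline(volume_count: int, size_gib_each: int) -> dict:
    """Sum size and baseline performance across identical gp3 volumes."""
    return {
        "total_size_gib": volume_count * size_gib_each,
        "iops": volume_count * GP3_BASELINE_IOPS,
        "throughput_mibps": volume_count * GP3_BASELINE_THROUGHPUT_MIBPS,
    }


print(aggregate_gp3_baseline(5, 200))
# {'total_size_gib': 1000, 'iops': 15000, 'throughput_mibps': 625}
```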

The Future

  • Better DevOps practices
  • More cost savings

Engineering_3_AWS-EBS-Auto-scaling_7_InternalImage
