Online payments have become an integral part of our lives, and we are already making big waves with our successful transaction rates. We handle fast, secure and reliable money transfers every day, and many businesses rely on us to complete them. Our transactions complete faster than those of comparable businesses. We accept payments from Paytm Wallet, Paytm Postpaid, Paytm UPI, netbanking, debit and credit cards, and more.
We handle large database clusters for different business use cases: IoT devices deployed at different stores, the account status of different merchants, and various forms of heavy transactions. Each of these use cases has its own database cluster, which serves an unpredictable number of reads and writes, since it is not possible to predict the number of transactions at any given point in time. Moreover, we wanted to adopt an open source solution instead of an off-the-shelf product. Given this, we initially faced the following challenges:
- We required a single database architecture that can provide high throughput for both read and write operations.
- We wanted the database servers in a cluster to communicate continuously with each other so that they can serve the latest data, based on the business requirements.
- We needed to handle large clusters.
- We wanted to adopt open source solutions.
Next Level Solutions: Our Database Engineering team took ownership of these challenges. Here is an inside look at the architecture we adopted to make our digital transactions successful.
Consensus Algorithms (Paxos): Consensus means reaching an agreement when several people are discussing or debating a specific topic. When we replace the people with physical machines that talk to each other and try to reach an agreement, we call the procedure they follow a consensus algorithm for distributed systems.
We required a set of clusters that interact with each other across different physical data centers and make sure that every transaction is safely recorded on a majority of the machines involved. To achieve this, we use the Paxos algorithm, which is one of the well-known consensus algorithms, alongside Raft. In our case, we found Paxos easier to implement than Raft. We adopted the single-leader approach because, for our workload, it is more efficient than a multi-leader setup.
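The majority rule described above can be illustrated with a small sketch. This is not a Paxos implementation (it has no proposal numbers and no prepare phase), and it is not our production code. It only shows the idea that a single leader treats a transaction as committed once a strict majority of replicas acknowledge it. The `Replica` class, the `replicate` function and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical replica that may or may not be reachable."""
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)

    def accept(self, entry: str) -> bool:
        # A reachable replica appends the entry and acknowledges it.
        if self.reachable:
            self.log.append(entry)
        return self.reachable


def replicate(entry: str, replicas: list) -> bool:
    """The leader sends the entry to every replica.

    The entry counts as committed only if a strict majority acknowledges it.
    """
    acks = sum(1 for replica in replicas if replica.accept(entry))
    return acks >= len(replicas) // 2 + 1


# Three replicas across two data centers (names are illustrative only).
cluster = [Replica("dc1-a"), Replica("dc1-b"), Replica("dc2-a", reachable=False)]
committed = replicate("txn-1001", cluster)
```

With two of the three replicas reachable, the entry still reaches a majority, so losing one node does not block the write.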
Data Sharding: We wanted our system to be write-scalable, both for normal inserts and for DDL statements. We started with the standard sharding approach, where data is distributed across different nodes based on a sharding key or its hash value.
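As a minimal sketch of hash-based distribution, the function below maps a sharding key to one of a fixed set of shards. The shard names and the choice of SHA-256 are assumptions made for illustration; the actual hash function and shard layout we use are not described in this post.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(sharding_key: str, shards=SHARDS) -> str:
    """Return the shard that owns the given sharding key."""
    # Use a stable hash. Python's built-in hash() is salted per process,
    # so it would route the same key differently after a restart.
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


target = shard_for("merchant-42")
```

The same key always lands on the same shard, which is what lets reads find the data that earlier writes stored. A plain modulo scheme like this one moves most keys when the number of shards changes, which is one reason real deployments often place a logical layer in front of the physical shards.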
In our testing, we found that instead of executing changes directly on the individual shards, it is better to execute them on a logical database, which then applies the necessary changes to the shards. This removes the effort of handling the physical databases one by one. Additionally, we implemented the LSM (log-structured merge tree) concept, which buffers changes in memory and flushes them to persistent storage later. This keeps write performance relatively high, because flushing every change immediately would cause more I/O. We have modified the flushing process to suit our requirements.
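The buffer-then-flush idea behind an LSM tree can be sketched as follows. This is a simplified illustration and not our modified flushing process: the `TinyLSM` class and its `flush_threshold` parameter are hypothetical, the "persistent" segments are kept in a Python list rather than on disk, and compaction is left out.

```python
class TinyLSM:
    """A toy LSM-style store: writes go to memory, then flush in sorted batches."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold
        self.memtable = {}   # in-memory buffer for recent writes
        self.segments = []   # flushed, sorted batches; newest is last

    def put(self, key, value):
        self.memtable[key] = value
        # Flush only when the buffer is full, so many writes share one flush.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memtable:
            # A real engine would write this sorted batch to disk sequentially.
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # The newest data wins: check memory first, then segments newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None


store = TinyLSM(flush_threshold=3)
store.put("a", 1)
store.put("b", 2)
store.put("c", 3)   # third write fills the buffer and triggers one flush
store.put("a", 9)   # newer value stays in memory and shadows the flushed one
```

Three writes cost a single flush here instead of three, which is the I/O saving the paragraph above refers to.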
Deep-level Understanding of the InnoDB Engine: To build a stable environment, we wanted to adopt an open source database engine, and we chose InnoDB. We studied several areas of the engine in depth, which helped us make InnoDB work the way we required.
Data Consistencies: Strong, medium and weak are the three consistency levels that a distributed system can choose from when weighing the tradeoff between performance and data consistency.
Our main requirement is to ensure that only the latest data is sent to the client. Depending on the situation, we set a specific consistency level so that both performance and data consistency are taken care of.
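One way to picture per-request consistency levels is as a routing decision for reads. The sketch below is an assumption-heavy illustration, not a description of our system. It assumes that strong reads go to the leader, that medium means bounded staleness (a replica whose replication lag is within a limit), and that weak reads accept any replica. The `choose_node` function, the `max_lag_seconds` parameter and the node names are all hypothetical.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    MEDIUM = "medium"
    WEAK = "weak"


def choose_node(level, leader, replica_lag, max_lag_seconds=1.0):
    """Pick a node to read from.

    replica_lag maps a replica name to its replication lag in seconds.
    """
    if level is Consistency.STRONG:
        # Only the leader is guaranteed to hold the latest committed data.
        return leader
    if level is Consistency.MEDIUM:
        # Assumed meaning: accept replicas that are at most max_lag_seconds behind.
        fresh = [name for name, lag in replica_lag.items() if lag <= max_lag_seconds]
        return min(fresh, key=replica_lag.get) if fresh else leader
    # WEAK: any replica will do; fall back to the leader if there are none.
    return next(iter(replica_lag), leader)


lag = {"replica-1": 5.0, "replica-2": 0.2}
strong_target = choose_node(Consistency.STRONG, "leader", lag)
medium_target = choose_node(Consistency.MEDIUM, "leader", lag)
weak_target = choose_node(Consistency.WEAK, "leader", lag)
```

Under these assumptions, stronger levels put more read load on the leader, while weaker levels spread reads across replicas at the cost of possibly stale results.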
This blog was contributed by the Database Engineering Team at Paytm.