In the first part of this case study, we outline the purpose of performance testing at Bulgaria's DSK Bank, describe the systems selected for testing, shed light on the dilemma of sizing banking systems appropriately, and present the particular difficulties DSK Bank faced.
A merger, expansion, or acquisition is a huge challenge in the life of a financial institution. Beyond the organizational consolidation that follows an acquisition, all services, all former customers, and all their data must be transferred to the new organization. To serve the changed market, integrating the banking systems is inevitable. Whether the institution redesigns its old systems or builds entirely new ones, they must accommodate all the new customers, all the new data, all the new transaction types, and often the old transactions as well. The new, unified IT system must be designed to support the new, larger organization while also handling the legacy data flawlessly.
DSK Bank was no exception. Bulgaria's leading bank, a member of the OTP Group since 2003, grew by around 30 percent through recently completed acquisitions. During the merger, DSK Bank's systems were retained, and the data from the "legacy" IT systems was migrated to them. Preliminary estimates put the increase in load on the expanded bank's systems at 30-60 percent.
The bank's management considered it necessary to run performance tests on the systems before the migration was completed, in order to model in advance how they would behave under the extreme load expected in the first days after the migration. The work was entrusted to a joint KPMG and Proofit team with significant experience in performance testing, particularly in banking environments.
To model the expected load, we first need a baseline to compare it against. In our case, even establishing this baseline required serious investigative work, and with considerable effort we could determine only which areas and which transaction types carried the largest load.
Assessing the expected increase in load is also extremely complex, since the expected 30-60 percent of extra load is not evenly distributed across the systems. A simple example illustrates the complexity: even for a process considered simple in banking terms, such as a transfer initiated from a mobile banking app, at least 3-4 systems must be available and work in coordination, and these 3-4 systems may in turn involve 6-8 or even more separate computers. Of course, this is only a small part of the entire banking system. For such a large and complex system to withstand a sudden increase in load, every one of its components must be able to cope. Since such a big change carries significant risk, it is extremely important to model the process in advance, and performance testing is an excellent tool for this.
As the above shows, integrating banking systems is always a huge task, but DSK also posed special challenges. Some of these stemmed from the diversity of its systems: historically, Windows and Linux systems had served the core processes side by side, and legacy mainframe systems inherited from earlier were still in operation. The bank did have performance measurement and analysis tools, but they were not sufficient for such a large, suddenly arising task across such a technologically diverse landscape. On top of all this, the entire measurement task had to be completed in much less time than usual: only 3 months.
As in other integration projects, DSK Bank's management faced a fundamental dilemma. If any component of the IT system is undersized, it can cause congestion and stalled processes, leading to serious service outages. If, on the other hand, the system is oversized, it can incur unnecessary costs of up to several million euros. From a business perspective, therefore, the goal of the KPMG and Proofit team was to reduce the risk of downtime while helping to optimize spending on hardware and software.
During the integration, beyond determining the optimal sizing of the systems, the main goal of the project was to prepare for expected, predictable peak loads. Such recurring extreme loads occur before and immediately after holidays, as well as when wages and pensions are paid. At these times, the load can reach two or three times the normal daily level.
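As a rough illustration of how these factors might combine when setting a test target, the sketch below multiplies a baseline rate by the merger-driven growth (30-60 percent) and a recurring peak multiplier (2-3x). It is a sketch under stated assumptions: the baseline of 100 transactions per second (TPS) and the function name `target_load_range` are hypothetical, and compounding the two factors multiplicatively is our simplifying assumption, not necessarily how DSK derived its targets.

```python
def target_load_range(baseline_tps, growth=(0.30, 0.60), peak=(2.0, 3.0)):
    """Return a (low, high) range of target rates in transactions per second.

    Assumption: merger-driven growth and the recurring peak multiplier
    compound multiplicatively. Real targets may be set per system instead.
    """
    low = baseline_tps * (1 + growth[0]) * peak[0]
    high = baseline_tps * (1 + growth[1]) * peak[1]
    return low, high


# Hypothetical baseline: 100 TPS on an ordinary day.
low, high = target_load_range(100.0)
print(f"target peak load: {low:.0f}-{high:.0f} TPS")
```

With these illustrative inputs the range is 260-480 TPS. Keeping low and high bounds separate reflects the uncertainty in both factors; in practice each system would get its own baseline, since the extra load is not evenly distributed.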
Besides these easily predictable events, there are other kinds of peak load. For example, the load clearly jumps after every major change, such as a merger or acquisition: people check their accounts to make sure everything is fine, or simply want to see what the new interface looks like. As a result, customers use the system unusually heavily in the days, or even the first week, after the transition.
Based on experience from previous transitions and information from other financial institutions, DSK specialists also determined an expected load level for this period. One objective of the testing was to verify whether the converted systems could cope with this load.
As at most financial institutions, 30-50 of the systems operating at DSK Bank are considered highly important. Not all of them could be tested, however, as neither the time nor the resources were sufficient. From among them, we had to select the 5-8 that are truly the most critical and that provide the most important externally visible services. These are usually systems that serve external requests with 24/7 availability. It is essential that each of them can handle the surge of requests directed at it individually, and that requests served concurrently do not degrade each other's performance.
Unfortunately, because of the complexity of the prerequisites, no integrated banking system can simply replay and analyze a critical day unchanged and then have its capacity scaled up on that basis. Instead, for all such work we set up a measurement model that narrows down the set of affected systems and processes. From these, we can then derive a "load mix" that models the most typical situations and load patterns quite well.
Such a test group typically includes retail internet banking, retail mobile banking, and the systems serving bank branches. At DSK Bank, the branch systems were particularly important, as both the acquiring and the acquired bank had large branch networks that a significant part of the customer base, typically the older age group, used intensively. Card transaction processing is also critical, since it must handle a large number of transactions with very short response times. Priority systems also include those that handle large volumes of transfers from companies and public administration organizations.
In the end, the many banking systems were narrowed down to these. The measurement model then had to be built for them, taking into account transaction volumes, transaction types, and the patterns of the expected load. This approach could not capture every detail of reality, but it reproduced the load levels and patterns well.
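A load mix of this kind can be sketched as the share of each transaction type in an observed baseline, scaled to a target total rate. This is a minimal sketch under stated assumptions: the transaction-type names, the hourly counts, the 480 TPS target, and the helper names `build_load_mix` and `scale_mix` are hypothetical placeholders, not DSK data or the team's actual tooling, and a real model would also capture time-of-day patterns and per-system growth.

```python
def build_load_mix(observed_counts):
    """Convert observed per-type transaction counts into fractional shares."""
    total = sum(observed_counts.values())
    if total == 0:
        raise ValueError("no transactions observed")
    return {tx_type: count / total for tx_type, count in observed_counts.items()}


def scale_mix(mix, target_total_tps):
    """Distribute a target total rate (TPS) across transaction types by share."""
    return {tx_type: share * target_total_tps for tx_type, share in mix.items()}


# Hypothetical hourly counts from a baseline measurement window.
observed = {
    "mobile_transfer": 18_000,
    "internet_banking_login": 12_000,
    "card_authorization": 54_000,
    "branch_transaction": 6_000,
}

mix = build_load_mix(observed)
per_type_tps = scale_mix(mix, target_total_tps=480.0)
for tx_type, tps in sorted(per_type_tps.items()):
    print(f"{tx_type:>24}: {tps:6.1f} TPS")
```

Keeping the mix as shares separate from the target rate lets the same pattern be replayed at several load levels, for example an ordinary day, a pension-payment peak, and the first days after migration.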
Technically, such testing can be carried out entirely remotely, from thousands of kilometers away. Nevertheless, a few face-to-face meetings are essential for establishing effective joint work. They are especially important in the initial phase, when two teams that have never met or worked together must learn to cooperate, and competences and responsibilities must be assigned. Once trust has been built and informal channels are in place, remote work becomes much easier.
The role and number of in-person meetings are essentially independent of the project's length: establishing the basic relationships lays the foundation for the flexibility and quality of later work. Trust and informal relationships are crucial because they are the main basis for the second, remote stage of cooperation. It is a huge relief to know whom to turn to, and how, when an unexpected problem arises, without getting caught up in endless email threads.
Like other banks, DSK does not have all competences in-house. The work requires the interplay of many actors, from external developers and vendors of off-the-shelf systems to external and internal operators. We needed information from them as early as the planning phase, and during execution it was important to be able to rely on them to intervene and provide feedback.
In the initial "getting to know each other" phase, we had looser working relationships with nearly 30 DSK employees and about 20 employees of external suppliers; later, the 10-15 relationships needed for day-to-day work became closer.
We originally planned 6-7 trips of a few days each for in-person meetings over the course of the project, but only 4 took place because of the coronavirus pandemic. When the borders closed, in-person meetings stopped altogether, but by then we had already established the relationships we could rely on during the subsequent work. The testing itself was carried out entirely remotely, and we received adequate help with any problems that arose along the way.
In the second half of this case study on performance testing at Bulgaria's DSK Bank, we present how the testing was carried out under extremely tight deadlines and constraints, how we developed the measurement model, and finally what the outcome of this highly complex performance testing task was.