The modeling of the probable behaviour of insider cyber fraudsters in banks

Insider cyber fraud in the banking sector is a serious and complex issue for financial institutions. This form of cyber fraud is particularly insidious because of the access and knowledge that insiders inherently possess, which requires banks to implement comprehensive strategies for detecting, preventing and responding to these internal threats. The aim of this study is to develop a scientific and methodological approach to modelling the probable behaviour of insider cyber fraudsters in banks based on a combination of principal component analysis, k-means clustering and associative analysis. In analysing the current challenges facing the financial sector with regard to the evolution of cyber fraud and its implications, existing theoretical approaches to the study of cyber fraud in banks were systematized. This revealed an upward trend in the number of conference papers and articles indexed in the Scopus database from 2000 to 2023 under the keywords “cyber” and “frauds”. In addition, the VOSviewer software was used to systematize the keyword combinations found in scholarly publications on the topic, forming clusters that visualize and organize the main directions of scientific research. Analytical data from Google Trends on critical issues related to cyber fraud were chosen as input data. Twenty variables were formed from search queries characterizing cyberattacks and decreased trust in financial institutions. The principal component method was used to reduce the dimensionality of the input data array, which made it possible to select the nine most significant components for the study. A cluster analysis using the k-means method made it possible to form three main groups of search queries, which included 12 of the selected variables. These results formed the basis for an associative analysis of three sets of variables.
The analysis shows that potential insider cybercriminals in banks are most interested in the client’s personal financial information, access to the client’s online banking profile and access to the client’s phone data. The obtained results can be used by commercial banks to identify potential insider cyber fraudsters and to ensure a higher level of client protection against their actions, by bank clients to analyse and mitigate potential threats from insider cyber fraudsters, and by law enforcement agencies to respond promptly to potential threats posed by insider cyber fraudsters in banks.


Introduction
The banking sector is one of the sectors of the economy that generates and accumulates large amounts of information. In addition, banking became one of the most popular industries for implementing the concept of "big data". In recent years, the banking industry has continued to change under the influence of a number of innovations driven by technological progress, the psychology of consumers of financial services and regulatory requirements.
With the development of innovative approaches to transforming the financial sector, the problem of cyber fraud is becoming increasingly relevant. Between October 2021 and September 2022, malware was the most common type of cyber attack worldwide, affecting 40% of financial and insurance organizations. The second and third most common types in the financial sector were fraud via websites and mobile applications (23%) and intrasystem fraud (20%) (Statista, 2023). The long-term trend is rather alarming, as it is upward. Although 1,829 incidents of cyber fraud in the financial industry were reported worldwide in 2022, compared with 2,527 in the previous year, the total number of cyber fraud cases in the financial sector more than doubled over 2013-2022, rising by about 114% from 856 cases in 2013 to 1,829 cases in 2022. The lowest level of cyber fraud in the financial sector was observed in 2017 (598 cases) (Statista, 2023). The number of cyber fraud cases accompanied by a data leak changed in proportion to this indicator over the same period. However, in percentage terms, the largest share of cyber fraud cases accompanied by data leakage in the financial sector occurred in 2020 (64.8%). In 2021-2022, the share of such cases did not exceed 28%, which indicates an improvement in the level of cyber protection in the financial sector (Statista, 2023).
It should be noted that cyber fraud can be initiated not only by external sources but also by insiders, that is, employees of financial institutions. Insider cyber fraud is a serious and complex problem. Unlike external threats, insider cyber fraud involves individuals within an organization using their privileged access to commit fraudulent activities. This type of threat can compromise the integrity of banking systems and undermine the trust of customers and stakeholders. This form of cyber fraud is particularly insidious because of the access and knowledge that insiders inherently possess, which makes it imperative for banks to implement comprehensive strategies to detect, prevent and respond to these insider threats. As digital innovations are developed and implemented, the problem of insider cyber fraud keeps evolving and requires ever new solutions.

Literature Review
To study the current state of research on insider cyber fraud in banks, existing scientific works in this field were analysed. This analysis gives an overview of the origins of cyber fraud, its types, the most common technological vulnerabilities, the regulatory landscape, the impact of cyber threats on stakeholders, cyber security measures, etc. As of 2023, a search using the keywords "cyber" and "frauds" in the authoritative international scientific database Scopus yielded 1,262 documents, including 547 conference papers, 451 articles, 119 book chapters, 58 conference presentation abstracts and other scientific works (Scopus, 2023). To understand the relevance of scientific works related to the topic of "cyber fraud" and to form a general picture of the theoretical background, a map of concepts on this topic was constructed using the VOSviewer software (Figure 1). The map of publications (Figure 1) distinguishes eight clusters. The red cluster includes the concepts most closely related to the criminal side of cyber fraud, since its most common terms are "computerized crime", "cybercrime", "law and legalization", "criminal activity", "digital forensics", etc. The connection with cyber fraud in the financial sphere is represented in this cluster by the terms "electronic commerce", "online shopping", "financial crime", "money laundering" and "online banking". The second largest cluster (green) contains concepts such as "machine learning", "learning algorithms", "neural networks", "learning systems", "fraud detection", "social networking" and "behavioral aspects". This shows that machine learning technologies are actively used in cyber fraud research and that the behavioral aspects of subjects who may commit cyber fraud need to be taken into account. Concepts from the financial sector in this cluster include "credit cards", "phishing" and "financial transactions".
The blue cluster contains the concepts most closely related to cyber security: "cyber security", "control systems", "network security", "security systems", "cloud computing" and "security threats". From the financial sector, the following concepts appear in the blue cluster: "financial information" and "electronic money". The remaining clusters are significantly smaller than the three considered above. However, it is worth noting that the yellow cluster most often contains concepts from the financial sphere that are directly related to banking: "banking system", "credit card theft", "electronic banking", "mobile banking", "fintech" and "financial inclusion". This confirms the close connection between cyber fraud and banking. The purple, light blue, orange and brown clusters are dedicated to examples of specific types of cyber fraud.
As banks adopt increasingly sophisticated technology, they become more vulnerable to cyber attacks (Kuzior et al., 2022a). Therefore, some scientists highlight vulnerabilities related to online banking platforms (Wahab et al., 2023), mobile applications (Siano et al., 2020), the use of blockchain technologies (Ugochukwu et al., 2022) and cloud services (Setyaji et al., 2020). Studying these technological vulnerabilities is imperative for developing robust cybersecurity strategies. In addition, the behavioral aspects of financial market participants play an important role in the cyber vulnerability of financial systems (Al, 2019).
Systematizing the existing theoretical approaches to cyber fraud in the financial sphere in general, and in banks in particular, provides a basis for further research and for solving new scientific problems. Since insider cyber threats remain little researched, the questions addressed in this article are scientifically relevant.

Research Methodology and Data
In line with the purpose of this work, it is necessary to model the probable behavior of insider cyber fraudsters in a bank. Since any assessment of the behavioral aspects of human activity is largely subjective, the main difficulty in such studies is the selection of input parameters. It is not possible to predict with certainty how a person, in particular a bank's insider cyber fraudster, will behave in a particular situation, as their behavior is determined by a number of endogenous and exogenous quantitative and qualitative factors whose influence is very difficult to analyze. Taking into account the nature of the potential fraudulent actions of a bank insider cyber fraudster, it is proposed to use possible combinations of search queries in the Google search system over the last five years (2018 to 2023) as the array of input variables for evaluating their possible behavior. On this basis, two lists of search queries were formed as the input data array for the study: a list of queries describing the characteristics of cyber attacks and a list of queries characterizing the level of decreased trust in financial institutions (Table 1). The likely behavior of insider cyber fraudsters in banks will be modeled in three stages.
At the first stage, the principal component method will be used to form an array of the most relevant variables for further research from the list of 20 key queries presented in Table 1. The main goal of this method is to transform high-dimensional data into a lower-dimensional representation while retaining as much of the variance in the data as possible. The principal component algorithm consists of the following steps:
1. Data centering (subtracting the mean of each variable from each value of the corresponding indicator).
2. Calculation of the covariance matrix, which describes the relationships between all pairs of variables in the data.
3. Eigendecomposition of the covariance matrix into eigenvectors, which represent the directions of maximum variance in the data, and the corresponding eigenvalues, which indicate the amount of variance along these directions.
4. Selection of the principal components by ranking the eigenvectors in descending order of their eigenvalues. The eigenvector with the highest eigenvalue is the first principal component, the one with the second highest is the second principal component, and so on.
5. Projection of the data onto the principal components (the initial data are projected onto the selected principal components, creating a new set of variables, the principal components, which are uncorrelated and capture the most important information in the data).
6. Evaluation of the factor loadings of the input indicators within the selected components.
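The six steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. The function name `pca_steps` is hypothetical. Scaling each centred variable to unit variance is our own assumption, since the text mentions only centring. It is made so that the eigenvalues can be compared with the "eigenvalue greater than 1" rule used in the results, and so that the step-6 loadings are variable-component correlations.

```python
import numpy as np


def pca_steps(X: np.ndarray, standardize: bool = True):
    """Steps 1-6 of the principal component procedure for an (n_obs, n_vars) array.

    Returns eigenvalues (descending), eigenvectors (as columns), component
    scores and factor loadings (variables x components).
    """
    # Step 1: centre each variable; optionally scale it to unit variance (assumption)
    Z = X - X.mean(axis=0)
    if standardize:
        Z = Z / Z.std(axis=0, ddof=1)

    # Step 2: covariance matrix of the (centred) variables
    C = np.cov(Z, rowvar=False)

    # Step 3: eigendecomposition; eigh is appropriate for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(C)

    # Step 4: rank components by eigenvalue, largest first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 5: project the data onto the components; the scores are uncorrelated
    scores = Z @ eigvecs

    # Step 6: loadings = eigenvector * sqrt(eigenvalue); for standardized data
    # these equal the correlations between variables and components
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, eigvecs, scores, loadings


# Synthetic data standing in for 20 search-query series (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
eigvals, eigvecs, scores, loadings = pca_steps(X)
print(np.round(eigvals[:3], 3))
```

For standardized data the eigenvalues sum to the number of variables, which is why an eigenvalue above 1 means that a component explains more variance than a single original variable.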
At the second stage of modeling, clustering is carried out using the k-means method. This method was chosen for this study because it is widely used and groups points so as to minimize the sum of squared distances between the data points and the centroid of the cluster to which they belong.
The k-means clustering algorithm includes the following sequential steps:
1. Initial selection of the centers of k clusters (k objects are selected so that the distances between them are maximal).
2. Initial distribution of objects among clusters (each object is assigned to the cluster whose center is nearest).
3. An iterative process of recomputing the cluster centers and redistributing the objects, which continues until the optimal cluster structure is formed or the maximum number of iterations is reached.
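These three steps can be illustrated with a small NumPy sketch. It is not the authors' implementation. The farthest-point rule used to "select k objects with maximal distance between them", the convergence test and the function names `farthest_point_init`, `assign` and `kmeans` are all our own assumptions.

```python
import numpy as np


def farthest_point_init(X: np.ndarray, k: int) -> np.ndarray:
    """Step 1 (assumed rule): start from the point farthest from the data mean,
    then repeatedly add the point farthest from its nearest chosen centre."""
    first = np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))
    centers = [X[first]]
    while len(centers) < k:
        C = np.array(centers)
        # distance from every point to its nearest already-chosen centre
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).min(axis=1)
        centers.append(X[np.argmax(d)])
    return np.array(centers)


def assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Step 2: assign each object to the cluster with the nearest centre."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)


def kmeans(X: np.ndarray, k: int, max_iter: int = 100):
    centers = farthest_point_init(X, k)
    for _ in range(max_iter):
        labels = assign(X, centers)
        # Step 3: move each centre to the mean of its members (empty ones stay put)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # cluster structure no longer changes
            break
        centers = new_centers
    labels = assign(X, centers)
    # objective minimised by k-means: sum of squared distances to own centroid
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, inertia


# Three hypothetical, well-separated groups of 20 points each
rng = np.random.default_rng(1)
blob_centres = ([0.0, 0.0], [5.0, 5.0], [0.0, 5.0])
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in blob_centres])
labels, centers, inertia = kmeans(X, 3)
print(np.bincount(labels), round(inertia, 3))
```

Spreading the initial centres apart, as step 1 prescribes, makes it less likely that two starting centres fall inside the same natural group.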
At the third stage of the study, potential portraits of insider cyber fraudsters in banks are constructed from the variables selected by the principal component method and clustering, using associative rule learning. The construction of associative rules is the basis of affinity analysis, the essence of which is to identify relationships between events, that is, conditions under which certain events tend to occur together (Kovalenko et al., 2019). The general algorithm of modeling with associative rules includes the following steps:
1. Formation of a set of events (transactions), which will form the basis of modeling.
2. Definition of the structure of an associative rule, which consists of an antecedent (condition) X and a consequent (consequence) Y, written X => Y.
3. Definition of the main characteristics of the associative rule: support, confidence, lift (interest), leverage, conviction and Zhang's metric.
In association rule mining, support is a measure of how frequently a particular set of items appears together in a dataset. It helps to identify the strength of the relationship between items in a transactional database (formula 1):

$$\mathrm{supp}(X \to Y) = \mathrm{supp}(X \cup Y) = \frac{\left|\{t \in T : X \cup Y \subseteq t\}\right|}{|T|}, \tag{1}$$

where $T$ is the set of all transactions. Confidence of an associative rule is a measure of its accuracy and is equal to the ratio of the number of transactions containing both the condition and the consequence to the number of transactions containing the condition (formula 2):

$$\mathrm{conf}(X \to Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}, \tag{2}$$

where $\mathrm{supp}(X \cup Y)$ is the support of the combined itemset $X \cup Y$, representing the transactions where both $X$ and $Y$ co-occur, and $\mathrm{supp}(X)$ is the support of the antecedent itemset $X$, representing the transactions where $X$ occurs.
The higher the support and confidence, the more frequent and reliable the rule; in particular, a high confidence means a high probability that a transaction containing the condition also includes the consequence. Lift (interest) is the ratio of the confidence of the rule to the frequency of occurrence of the consequence (the larger the value, the more often the condition determines the occurrence of the consequence) (formula 3):

$$\mathrm{lift}(X \to Y) = \frac{\mathrm{conf}(X \to Y)}{\mathrm{supp}(Y)}, \tag{3}$$

where $\mathrm{conf}(X \to Y)$ is the confidence of the associative rule $X \to Y$ and $\mathrm{supp}(Y)$ is the support of the consequent itemset $Y$, representing the transactions where $Y$ occurs. If the lift is equal to 1, there is no connection between the condition and the consequence. If the value is close to 0, there is a strong inverse relationship.
Leverage is equal to the difference between the observed frequency with which the condition and the consequence occur together and the product of the frequencies of the condition and the consequence (formula 4):

$$\mathrm{leverage}(X \to Y) = \mathrm{supp}(X \cup Y) - \mathrm{supp}(X)\,\mathrm{supp}(Y), \tag{4}$$

where $\mathrm{supp}(X \cup Y)$ is the support of the combined itemset $X \cup Y$, representing the transactions where both $X$ and $Y$ co-occur; $\mathrm{supp}(X)$ is the support of the antecedent itemset $X$; and $\mathrm{supp}(Y)$ is the support of the consequent itemset $Y$. Conviction is a measure used in association rule mining to evaluate the degree of dependency between the antecedent and the consequent of a rule. It is the ratio of the frequency of incorrect predictions expected if $X$ and $Y$ were independent to the observed frequency of incorrect predictions. The formula (5) for conviction is:

$$\mathrm{conv}(X \to Y) = \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \to Y)}, \tag{5}$$

where $\mathrm{supp}(Y)$ is the support of the consequent itemset $Y$ and $\mathrm{conf}(X \to Y)$ is the confidence of the associative rule $X \to Y$.
Zhang's metric (6) makes it possible to detect both association and dissociation:

$$\mathrm{Zhang}(X \to Y) = \frac{\mathrm{conf}(X \to Y) - \mathrm{conf}(\neg X \to Y)}{\max\{\mathrm{conf}(X \to Y),\ \mathrm{conf}(\neg X \to Y)\}}. \tag{6}$$

Its value ranges from -1 to 1. A positive value indicates association and a negative value indicates dissociation.
4. Formulation of conclusions based on the obtained associative rules.
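Formulas (1)-(6) can be computed directly from a list of transactions. The plain-Python sketch below is illustrative only. Representing each transaction as a set of search-query labels is an assumption, since the text does not specify how transactions were built from the Google Trends data. The query labels and the function names `support` and `rule_metrics` are hypothetical. Zhang's metric is computed from the support-based expression $(\mathrm{supp}(X \cup Y) - \mathrm{supp}(X)\mathrm{supp}(Y)) / \max\{\mathrm{supp}(X \cup Y)(1 - \mathrm{supp}(X)),\ \mathrm{supp}(X)(\mathrm{supp}(Y) - \mathrm{supp}(X \cup Y))\}$, which is algebraically equivalent to formula (6).

```python
from typing import Dict, FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Formula (1): share of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions: List[FrozenSet[str]],
                 X: Iterable[str], Y: Iterable[str]) -> Dict[str, float]:
    """Support, confidence, lift, leverage, conviction and Zhang's metric for X -> Y."""
    X, Y = set(X), set(Y)
    s_x = support(transactions, X)
    s_y = support(transactions, Y)
    s_xy = support(transactions, X | Y)
    conf = s_xy / s_x                      # formula (2)
    lift = conf / s_y                      # formula (3)
    leverage = s_xy - s_x * s_y            # formula (4)
    # formula (5): conviction is infinite when the rule never fails (conf = 1)
    conviction = float("inf") if conf == 1 else (1 - s_y) / (1 - conf)
    # formula (6), written in terms of supports
    denom = max(s_xy * (1 - s_x), s_x * (s_y - s_xy))
    zhang = (s_xy - s_x * s_y) / denom if denom > 0 else 0.0
    return {"support": s_xy, "confidence": conf, "lift": lift,
            "leverage": leverage, "conviction": conviction, "zhang": zhang}


# Hypothetical transactions: the set of queries flagged in each period
transactions = [frozenset(t) for t in (
    {"q_a", "q_b"}, {"q_a", "q_b"}, {"q_a"}, {"q_b"}, {"q_c"},
)]
print(rule_metrics(transactions, {"q_a"}, {"q_b"}))
```

For these toy data the rule q_a -> q_b has confidence 2/3, lift 10/9 and Zhang's metric 0.25. A rule whose confidence equals 1 gets an infinite conviction, which is how `inf` entries arise in rule tables.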
Thus, the methodology for modeling the probable behavior of insider cyber fraudsters in banks is based on a combination of three statistical methods: the principal component method to identify relevant variables, k-means clustering to form research clusters, and associative rules to build potential portraits of insider cyber fraudsters in banks. All calculations in the work will be carried out in the Python 3 programming language.

Results and Discussions
Following the defined sequence of stages for modeling the probable behavior of insider cyber fraudsters in banks, the most relevant variables for further research are first selected using the principal component method. Let us analyze the eigenvalues of the components obtained for the 20 input variables (Table 2) and the scree plot (Figure 2), which reveal the optimal number of components for further analysis. The total number of obtained components equals the number of input variables. According to the eigenvalues presented in Table 2, the first nine components have eigenvalues greater than 1. The cumulative variance of these nine components is 0.633, which means that they explain more than 63% of the total variance of the studied phenomenon.
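The selection rule applied here can be sketched as follows: retain the components whose eigenvalue exceeds 1 and report their cumulative share of variance. The function name `kaiser_selection` is hypothetical. The eigenvalues in the usage line are illustrative and are not the values from Table 2.

```python
import numpy as np


def kaiser_selection(eigvals):
    """Keep components with eigenvalue > 1 and report explained-variance shares."""
    ev = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # largest first
    explained = ev / ev.sum()                              # share of total variance
    cumulative = np.cumsum(explained)                      # cumulative variance
    n_keep = int((ev > 1.0).sum())                         # eigenvalue-greater-than-1 rule
    return n_keep, explained, cumulative


# Illustrative eigenvalues (not the values from Table 2)
n_keep, explained, cumulative = kaiser_selection([4.0, 3.0, 2.0, 1.0, 0.0])
print(n_keep, round(float(cumulative[n_keep - 1]), 3))
```

In this toy case three components are retained, and together they explain 90% of the variance. In the study, the same reading of Table 2 gives nine components and a cumulative variance of 0.633.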
The scree plot (Figure 2) visualizes the results of the first stage of the principal component method, as it shows the eigenvalue of each component. The dotted line on the graph marks the optimal number of components, which in this case is 9.
To understand the degree of influence of each variable within each component, it is necessary to examine their factor loadings (Table 3). In essence, a factor loading is the correlation coefficient between a variable and a component. In the context of this study, each component represents a certain portrait of a potential insider cyber fraudster in the bank, and the largest factor loadings indicate which features determine this portrait. For a better perception of the results, Table 3 highlights only those factor loadings that exceed 0.3 in absolute value. This identifies the most relevant variables within the selected components. As can be seen, each of the components is determined by a different combination of input variables. This confirms that different potential portraits of insider cyber fraudsters in the bank can be identified.
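Highlighting the loadings that exceed 0.3 in absolute value, as done in Table 3, can be sketched with pandas. The function names `salient_variables` and `highlight` are hypothetical. The loadings, variable names and component labels in the example are invented for illustration.

```python
import pandas as pd


def salient_variables(loadings: pd.DataFrame, threshold: float = 0.3) -> dict:
    """For each component (column), list the variables whose |loading| > threshold."""
    mask = loadings.abs() > threshold
    return {comp: list(loadings.index[mask[comp].to_numpy()]) for comp in loadings.columns}


def highlight(loadings: pd.DataFrame, threshold: float = 0.3) -> pd.DataFrame:
    """Blank out (NaN) loadings that do not exceed the threshold in absolute value."""
    return loadings.where(loadings.abs() > threshold)


# Hypothetical loadings of three variables on two components
loadings = pd.DataFrame(
    [[0.72, -0.10], [-0.45, 0.05], [0.20, 0.61]],
    index=["query_1", "query_2", "query_3"],
    columns=["PC1", "PC2"],
)
print(salient_variables(loadings))
print(highlight(loadings))
```

The comparison uses the absolute value, so a strong negative loading such as -0.45 also counts as defining the component.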
To carry out the cluster analysis, the silhouette criterion was calculated for different numbers of clusters. Its maximum value indicates the optimal number of clusters for k-means clustering. The nine selected components served as the input data for the cluster analysis. The largest value of the silhouette criterion (0.357) corresponds to two clusters. However, this number of clusters is not effective for the purposes of this study. Experiments with different numbers of clusters showed that the most effective partition for this study is achieved with three clusters, which corresponds to a silhouette criterion value of 0.27. The formed clusters are shown in Figure 3. Thus, clustering, combined with the factor loadings from Table 3, made it possible to form three groups of variables that are the most important for the associative analysis. These groups are shown in Table 4.
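A scan of the silhouette criterion over candidate numbers of clusters can be sketched with scikit-learn's `KMeans` and `silhouette_score`. The candidate range 2-6, the seed and the synthetic three-group data are assumptions for illustration. In the study the input would be the nine component scores. As the text notes, the maximum silhouette is a guide rather than a rule, and three clusters were chosen even though two scored higher.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X: np.ndarray, k_values=range(2, 7), seed: int = 0) -> dict:
    """Mean silhouette coefficient of a k-means partition for each candidate k."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores


# Synthetic stand-in for the component scores: three well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in ([0, 0], [10, 0], [0, 10])])
scores = silhouette_by_k(X)
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```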
transactions. A relatively low support value is normal when detecting potential fraudulent actions of insider cyber fraudsters in banks, since detecting fraudulent actions is quite complex and may depend on a significant set of factors. The high lift values of the two rules, 26 and 22.286 respectively, mean that the consequents occur together with the antecedents 26 and about 22 times more often than would be expected if they were independent. The leverage of both rules, that is, the excess of the observed joint frequency over the frequency expected under independence, is the same and equals 1.1%. Zhang's metric is positive for both associative rules (0.98 and 0.974, respectively), which confirms the presence of an association between causes and effects.
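Because lift and leverage are both functions of the same supports, formulas (3) and (4) together determine the joint support of a rule. A short derivation shows that the reported values imply a joint support of roughly 1.1-1.2%, consistent with the relatively low support noted above. It assumes that the reported lift and leverage are rounded.

$$\mathrm{lift} = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)} \;\Rightarrow\; \mathrm{leverage} = \mathrm{supp}(X \cup Y)\left(1 - \frac{1}{\mathrm{lift}}\right) \;\Rightarrow\; \mathrm{supp}(X \cup Y) = \frac{\mathrm{leverage}\cdot \mathrm{lift}}{\mathrm{lift} - 1}$$

$$\frac{0.011 \cdot 26}{25} \approx 0.0114, \qquad \frac{0.011 \cdot 22.286}{21.286} \approx 0.0115$$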
Thus, interpreting the results of the associative analysis for the first group of variables from the perspective of a potential insider cyber fraudster in the bank, we can conclude that the reason for changing one's servicing bank is cyber fraud itself, since it is most likely followed by a search for the cyber police phone number. Thus, an insider cyber fraudster in a bank can gain access to the personal financial information of the affected client.
The second associative rule from Table 5 shows an insider cyber fraudster in the bank potential vulnerabilities in the protection of a bank client's computer that are associated with a blocked transaction.
Similarly, for a preliminary analysis of the results for the second group, we select the associative rules with the highest associative probability (confidence). Many associative rules have a confidence of 100%. However, because the search queries "How to reduce the credit limit" and "Black list of customers" previously returned few results, all the consequents of the generated associative rules correspond to a zero number of relevant queries, which does not allow the relationship between the condition and the consequence to be investigated correctly. In addition, the lift of all pairs of associative rules approaches unity, which also confirms the absence of a relationship between the condition and the consequence.
The number of obtained associative rules for the third group is large, so it is necessary to analyze their quality.
As with the previous results, this group of variables also has associative probabilities of 100%, but not all of the constructed associative rules actually confirm a cause-and-effect relationship between the condition and the consequence. Therefore, rules with lower associative probabilities also need to be analyzed in order to establish a qualitative connection between the condition and the consequence. Table 6 presents the associative rules for the set of variables whose associative probabilities are in the range from 0.6 to 1 and whose search queries are non-zero. As can be seen, the search query "Police number" is a consequence of the cause "How to protect yourself from cyber attacks" with a probability of 100%. In the other associative rules where "Police number" occurs as a consequence, alone or together with another search query, the cause remains unchanged. The search query "Bank call center number" is present in seven of the obtained associative rules, each time as a consequence. With a probability of 75%, this search query appears as a consequence of another search query: "How to find that phone is hacked". This cause-and-effect relationship is present both directly between this pair of search queries and in combination with other search queries.
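The selection behind Table 6 amounts to a simple filter: keep rules with confidence between 0.6 and 1 whose search queries are all non-zero. In the sketch below, the `Rule` record, the function name `select_rules`, the query volumes and the zero-volume query are hypothetical. Only the four real query names are taken from the text, and the confidences 1.0 and 0.75 mirror the values reported above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def select_rules(rules: List[Rule], query_totals: Dict[str, float],
                 low: float = 0.6, high: float = 1.0) -> List[Rule]:
    """Keep rules with low <= confidence <= high whose queries all have non-zero totals."""
    selected = []
    for r in rules:
        queries = r.antecedent | r.consequent
        if low <= r.confidence <= high and all(query_totals.get(q, 0) > 0 for q in queries):
            selected.append(r)
    return selected


totals = {  # hypothetical query volumes
    "How to protect yourself from cyber attacks": 12.0,
    "Police number": 30.0,
    "How to find that phone is hacked": 8.0,
    "Bank call center number": 25.0,
    "zero_volume_query": 0.0,
}
rules = [
    Rule(frozenset({"How to protect yourself from cyber attacks"}), frozenset({"Police number"}), 1.0),
    Rule(frozenset({"How to find that phone is hacked"}), frozenset({"Bank call center number"}), 0.75),
    Rule(frozenset({"zero_volume_query"}), frozenset({"Police number"}), 1.0),
    Rule(frozenset({"Police number"}), frozenset({"Bank call center number"}), 0.4),
]
selected = select_rules(rules, totals)
for r in selected:
    print(sorted(r.antecedent), "=>", sorted(r.consequent), r.confidence)
```

The third rule is dropped despite its 100% confidence because one of its queries has zero volume, which is exactly the situation the text describes for rules that cannot be interpreted.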
The remaining associative rules contain queries with zero frequency of occurrence, so there is no need to interpret them. The support of the considered associative rules ranges from 3.1% to 6.5%, which means that these rules occur in 3.1% to 6.5% of all transactions. Such values are to be expected in the analysis of potential fraudulent schemes. The high lift values of both types of associative rules, from 15.294 to 24.375, confirm that the consequents occur together with the considered antecedents much more often than would be expected if they were independent. The leverage of the obtained associations is the same and equals 1.1%. Zhang's metric is positive for both types of associative rules (from 0.964 to 0.974), which confirms the presence of an association between causes and effects.
Therefore, the results of the third associative analysis show a potential insider cyber fraudster in the bank once again that users who are vulnerable to cyber attacks search for a police number to eliminate the negative consequences. This again confirms that fraudulent actions are effective once access to the personal data of bank customers has been gained. The second associative rule from Table 6 shows the insider cyber fraudster in the bank that most banking transactions of today's users of banking services are carried out by phone, since the need to find out whether the phone has been hacked by cyber fraudsters is most likely accompanied by a call to the bank's call center. Therefore, a bank's insider cyber fraudster who gains physical access to a bank client's phone can carry out a number of fraudulent actions with the client's bank account.

Conclusion
Modeling the probable behaviour of insider cyber fraudsters in banks is one of the important problems for the financial system and for cyber security as a whole. Its solution is a necessary component of the strategy for ensuring cyber security in the banking sector. In researching this problem, this paper systematizes the existing approaches to cyber fraud in banks developed by modern experts and scientists. As a result, an upward trend in the number of conference papers and articles using the keywords "cyber" and "frauds" in the international Scopus database during 2000-2023 was revealed, which indicates growing interest in this topic in scientific circles. The analysis of publications by keywords using the VOSviewer application made it possible to form the most important research clusters and showed that the problem of insider cyber fraud is poorly studied.
In the research process, possible combinations of search queries in the Google search system were used, which made it possible to identify two sets of variables that are critically important for the topic of insider cyber fraud. The first set includes variables that directly characterize cyberattacks, and the second includes those that characterize the level of decreased trust in financial institutions. These variables provide an indirect understanding of the behavior of insiders, which was modeled in three stages. At the first and second stages of the research, the principal component method and k-means clustering were used to form an array of the most relevant variables for further research. The principal component method reduced the dimensionality of the data, and cluster analysis grouped the variables. As a result, twelve of the twenty initial variables were retained and grouped into three lists.
At the next stage of modeling, associative analysis was used to build three models of associative rules, on the basis of which the following conclusion was formulated: potential insider cyber fraudsters in banks are most interested in the client's personal financial information, access to the bank client's personal account, and access to the client's phone. Therefore, to minimize the consequences of the actions of insider cyber fraudsters in the bank, customers need to take appropriate preventive measures, namely: use multi-factor authentication for online banking transactions; communicate only with verified bank employees who can confirm that they actually work for the bank, and do not share personal passwords or other confidential information directly with bank employees; regularly update security software, including anti-virus programs; monitor the activity of their own accounts in real time to detect unusual or suspicious activity; insure themselves against potential cyber fraud; etc.

Figure 1. Distribution of scientific publications on the subject of "cyber fraud" by clusters in the international Scopus database. Source: compiled by the authors based on (Scopus, 2023).

Figure 3. Results of k-means clustering

Table 1. Input array of data: search queries. Source: compiled by the authors based on the Google Trends search engine, 2023.

Table 2. Eigenvalues, variance and cumulative variance of components

Table 3. Factor loadings of indicators

Table 3 (cont.). Factor loadings of indicators. Source: compiled by the authors.

Table 6. Results of associative analysis for the third group of variables whose associative rules have a probability in the range of 0.6-1

Table 6 (cont.). Results of associative analysis for the third group of variables whose associative rules have a probability in the range of 0.6-1. *inf denotes an infinite value (conviction is infinite when confidence equals 1). Source: compiled by the authors.