# CREDIT RISK MODELLING

## Scientific project

**Issues and problem area**

The objective is to construct a model to analyse the financial health of, and the risk associated with, the companies within the OFI perimeter. More precisely, OFI AM wants a tool to estimate the probability of default over a 3-5 year horizon.

Traditionally, this type of monitoring tool is built on the structural model theorized by Merton in 1974, itself based on the option-pricing model proposed by Black and Scholes in 1973; together these constitute foundations of modern finance. Such a model analyses a company's capital structure: default risk is linked to the uncertainty surrounding the company's ability to repay its debts and liabilities. Structural modelling has since become an important area of research, and several alternatives to the Merton model have been proposed. These models estimate the same quantities but under different dynamics; examples include the Black and Cox model, the Brockman and Turtle model and the CreditGrades model (widely used in recent years).

OFI AM wanted to explore alternative approaches and use flexible statistical tools to predict default probabilities. In this spirit, OFI AM currently uses a Generalized Additive Model (GAM) to analyse the risks associated with the different asset classes that make up the portfolios of companies in the investment universe. This type of statistical modelling exploits the data directly, looking for empirical relationships that are useful for prediction. Its underlying assumption is that each explanatory variable influences the prediction separately and additively; the form of each influence is estimated from the data.

However, given the abundance of information on companies' financials and on market dynamics, the GAM approach must select the variables that are ultimately relevant for predicting the probabilities of the events of interest. In its current version, the GAM methodology is limited by the fact that variable selection is done in a rather elementary way (testing the influence of pairs or triplets of variables), whereas modern statistics permits selection over all variables simultaneously, thus avoiding selection bias.
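To make the structural mechanics concrete, the sketch below implements the textbook Merton (1974) default probability, PD = Φ(−d₂), under the usual geometric-Brownian-motion assumption for the asset value. All figures are purely illustrative, not OFI AM data.

```python
from math import log, sqrt, erf

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def merton_default_probability(assets: float, debt: float,
                               mu: float, sigma: float,
                               horizon: float) -> float:
    """Probability that the asset value falls below the debt level at `horizon`.

    Under Merton (1974), assets follow a geometric Brownian motion with
    drift `mu` and volatility `sigma`, and default occurs if V_T < D.
    The probability is PD = Phi(-d2) with the usual Black-Scholes d2 term.
    """
    d2 = (log(assets / debt) + (mu - 0.5 * sigma ** 2) * horizon) \
         / (sigma * sqrt(horizon))
    return norm_cdf(-d2)

# Illustrative figures only: asset value 120, debt 100,
# 5% drift, 25% asset volatility, 3-year horizon.
pd_3y = merton_default_probability(120.0, 100.0, 0.05, 0.25, 3.0)
```

The structural alternatives cited above (Black-Cox, Brockman-Turtle, CreditGrades) keep this logic but change the default barrier or the dynamics.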
Selection over the full set of variables relies on penalization techniques that account for the overall complexity of the model, without necessarily favouring particular groupings of variables.
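As a sketch of what such simultaneous penalized selection looks like (illustrative only; the penalty and solver actually retained would depend on the data), the snippet below fits a lasso-penalized linear model by proximal gradient descent (ISTA) on synthetic data, shrinking the coefficients of irrelevant variables to zero in one pass over all variables:

```python
import numpy as np

def soft_threshold(z: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X: np.ndarray, y: np.ndarray, lam: float,
               n_iter: int = 500) -> np.ndarray:
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by ISTA."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, lam / L)
    return b

# Synthetic example: 10 candidate variables, only the first 2 matter.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_b = np.zeros(p)
true_b[:2] = [2.0, -1.5]
y = X @ true_b + 0.1 * rng.standard_normal(n)

b_hat = lasso_ista(X, y, lam=0.1)
```

Group or sparse-group penalties would extend the same idea when one does want to favour certain clusterings of variables.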

The data give rise to significant methodological challenges: (i) the heterogeneous nature of the available explanatory variables; (ii) their large number (the problem of selecting relevant variables, and the need for dimension-reduction techniques); and (iii) the dynamic nature of the expected solution, given, among other things, that some variables are not observed at the same frequency.
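Point (iii), mixed observation frequencies, is typically handled by aligning low-frequency accounting figures with the higher-frequency grid on which market variables are observed. A minimal pandas sketch (dates and figures are hypothetical):

```python
import pandas as pd

# Hypothetical quarterly accounting variable (e.g. a leverage ratio).
quarterly = pd.Series(
    [1.8, 2.1, 1.9],
    index=pd.to_datetime(["2023-03-31", "2023-06-30", "2023-09-30"]),
    name="leverage_ratio",
)

# Monthly grid on which market variables are observed.
monthly_index = pd.to_datetime([
    "2023-03-31", "2023-04-30", "2023-05-31", "2023-06-30",
    "2023-07-31", "2023-08-31", "2023-09-30",
])

# Carry each quarterly figure forward until the next report.
aligned = quarterly.reindex(monthly_index).ffill()
```

In practice, publication lags matter: a report is only available some time after its reporting date, so a realistic pipeline would also shift the series by the publication delay to avoid look-ahead bias.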

**Envisaged response**

Academic researchers can propose a range of modern techniques adapted to the problem and to the type of data. The proposed statistical solution should combine recent statistical learning and time series techniques. Because the prediction horizon is long, constructing indicators of predictive accuracy is particularly difficult. Statistics can provide flexible, data-driven solutions with no particular prior structure; however, such solutions are generally less interpretable, and the robustness of long-term predictions may prove unsatisfactory. Ideally, a prediction tool based on a structural model would be preferable, both for interpretation and for the robustness of predictions. In a spirit of complementarity between statistical tools and economic and financial theory, the statistical tools will in practice be useful for selecting the relevant variables and highlighting certain dynamics observed in the data. These results could then feed into the construction of an adequate structural model.

The team of academic researchers will propose lines of research. These proposals, and the final results, will depend on two factors: the profile of the new recruits and their ability to implement the model, and the precise characteristics of the data. At this stage, the various exchanges have not yet enabled the team to precisely identify the medium, format and integrity of the data that will serve as the basis for calibrating the model. Only after examining a sample of these data will it be possible to make concrete proposals for the data-processing avenues to be explored.