The aim of this project is to evaluate how reinforcement learning can be used to optimize the asset allocation of pension funds in terms of performance while simultaneously hedging the coverage ratio. Reinforcement learning is a relatively new class of algorithms within artificial intelligence and is already being used in numerous sectors. As part of the two-year research project, Dr. Patrick Walker (Head of Investment Solutions) and Dr. Gianluca De Nard (Head of Quantitative Research) are working together with Dr. Simon Broda, a proven expert in applied statistics, machine learning and deep learning from the Institute of Financial Services Zug at Lucerne University of Applied Sciences and Arts.
Systematic investment decisions in asset allocation and portfolio optimization are usually made in two steps. First, the joint distribution of the future returns of the investment instruments is estimated. In the simplest case, going back to Harry Markowitz (1952), this is a multivariate normal distribution, which is fully determined by the vector of expected returns and the covariance matrix. Second, the optimal portfolio weights are determined from this estimated distribution. Optimality here often refers to the so-called mean-variance criterion, which amounts to a trade-off between expected return and tolerable risk. Other criteria are also possible, for example minimizing the volatility of the portfolio (minimum variance) or balancing the risk contributions of the individual investment instruments (risk parity).
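The two-step procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the setup used in the project: the returns are simulated, the function names and the `risk_aversion` parameter are assumptions made for the example, and the only constraint is that the weights sum to one (short positions are allowed).

```python
import numpy as np

def estimate_moments(returns):
    """Step 1: estimate expected returns and the covariance matrix."""
    mu = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)
    return mu, sigma

def minimum_variance_weights(sigma):
    """Step 2 (minimum variance): w proportional to inv(sigma) @ 1."""
    ones = np.ones(sigma.shape[0])
    raw = np.linalg.solve(sigma, ones)
    return raw / raw.sum()

def mean_variance_weights(mu, sigma, risk_aversion):
    """Step 2 (mean-variance): maximize w'mu - risk_aversion/2 * w'sigma w
    subject to sum(w) == 1, using the closed-form Lagrangian solution."""
    ones = np.ones(len(mu))
    inv_mu = np.linalg.solve(sigma, mu)
    inv_ones = np.linalg.solve(sigma, ones)
    multiplier = (ones @ inv_mu - risk_aversion) / (ones @ inv_ones)
    return (inv_mu - multiplier * inv_ones) / risk_aversion

# Simulated monthly returns for three hypothetical assets (assumption).
rng = np.random.default_rng(0)
simulated_returns = rng.multivariate_normal(
    mean=[0.005, 0.003, 0.002],
    cov=np.diag([0.0016, 0.0009, 0.0001]),
    size=240,
)

mu_hat, sigma_hat = estimate_moments(simulated_returns)
w_min_var = minimum_variance_weights(sigma_hat)
w_mean_var = mean_variance_weights(mu_hat, sigma_hat, risk_aversion=5.0)
print("minimum variance:", np.round(w_min_var, 3))
print("mean-variance:   ", np.round(w_mean_var, 3))
```

Both weight vectors depend entirely on the estimates from step 1, so any estimation error in the expected returns or the covariance matrix feeds directly into the portfolio.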
The first of these two steps is fraught with problems in practice. In particular, it requires a number of assumptions, for example regarding the distribution family of the returns or the development of volatility over time, and every assumption carries the risk that it will not hold. Even if the assumptions are correct, estimating the distribution, and specifically the covariance matrix, can be problematic, especially if the investment universe is large and only a limited amount of data is available. These weaknesses of the mean-variance approach are usually addressed with tools such as shrinkage estimators, regularization and a carefully designed set of constraints.
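To make the covariance problem concrete, the following sketch applies a simple linear shrinkage estimator to a sample covariance matrix estimated from few observations relative to the number of assets. It is an illustration under assumptions: the data are simulated, the function name is hypothetical, and the shrinkage intensity is fixed by hand, whereas estimators such as Ledoit-Wolf derive it from the data.

```python
import numpy as np

def shrink_covariance(sample_cov, intensity):
    """Linear shrinkage towards a scaled identity matrix.

    intensity = 0 returns the sample covariance unchanged,
    intensity = 1 returns the target (average variance on the diagonal).
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    n_assets = sample_cov.shape[0]
    average_variance = np.trace(sample_cov) / n_assets
    target = average_variance * np.eye(n_assets)
    return (1.0 - intensity) * sample_cov + intensity * target

# Simulated setting: 50 assets but only 60 observations (assumption).
rng = np.random.default_rng(1)
n_assets, n_obs = 50, 60
returns = rng.normal(scale=0.01, size=(n_obs, n_assets))

sample_cov = np.cov(returns, rowvar=False)
shrunk_cov = shrink_covariance(sample_cov, intensity=0.3)

# A lower condition number means the matrix is more stable to invert,
# which matters because the optimal weights depend on its inverse.
cond_sample = np.linalg.cond(sample_cov)
cond_shrunk = np.linalg.cond(shrunk_cov)
print(f"condition number, sample: {cond_sample:.1f}")
print(f"condition number, shrunk: {cond_shrunk:.1f}")
```

Shrinkage pulls the extreme eigenvalues of the sample covariance matrix towards their average, which keeps the total variance unchanged while making the inverse far less sensitive to sampling noise.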
The novel approach of this research project, however, is to bypass this first step of estimating the distribution entirely and instead learn the optimal portfolio weights directly from the data. This is made possible by machine learning, the basis of modern artificial intelligence. We use deep reinforcement learning, a revolutionary technique that has enabled AI models to reach superhuman performance in challenging games such as Go and chess and that has applications in many areas, such as self-driving cars, robotics and the diagnosis and treatment of diseases. This technology enables our models to navigate independently in dynamic environments such as the financial market and to learn continuously from their mistakes.
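The idea of learning weights directly from data, without first estimating a return distribution, can be illustrated with a deliberately simple stand-in. The sketch below is not deep reinforcement learning and not the model used in this project: it fits a single static weight vector, parameterized through a softmax, by gradient ascent on the average log growth of the portfolio. The data are simulated, and all names, the reward and the hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def softmax(logits):
    """Map unconstrained parameters to long-only weights that sum to one."""
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()

def mean_log_growth(returns, weights):
    """Reward: average log growth rate of the portfolio."""
    return np.log1p(returns @ weights).mean()

def learn_weights(returns, steps=2000, learning_rate=1.0):
    """Gradient ascent on the reward, starting from equal weights."""
    logits = np.zeros(returns.shape[1])
    for _ in range(steps):
        weights = softmax(logits)
        growth = 1.0 + returns @ weights  # gross return per period
        grad_weights = (returns / growth[:, None]).mean(axis=0)
        # Chain rule through the softmax: J = diag(w) - w w'
        grad_logits = weights * (grad_weights - weights @ grad_weights)
        logits += learning_rate * grad_logits
    return softmax(logits)

# Simulated per-period returns for three hypothetical assets (assumption).
rng = np.random.default_rng(2)
returns = rng.normal(
    loc=[0.010, 0.002, 0.000],
    scale=[0.04, 0.02, 0.01],
    size=(500, 3),
)

learned = learn_weights(returns)
equal = np.ones(3) / 3
print("learned weights:", np.round(learned, 3))
print("reward, learned:", mean_log_growth(returns, learned))
print("reward, equal:  ", mean_log_growth(returns, equal))
```

No mean vector or covariance matrix is estimated at any point; the weights are adjusted only in response to the realized reward. A deep reinforcement learning agent extends this idea by replacing the static weight vector with a neural network policy that maps the current market state to an allocation.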
In the first part of our research project, we focus on the optimal allocation between different asset classes using liquid instruments: equities (Vanguard Total Stock Market ETF), real estate (Vanguard Real Estate Index ETF), bonds (iShares Core U.S. Aggregate Bond ETF), commodities (Invesco DB Commodity Index ETF) and gold (spot price). These instruments represent the majority of the investment universe of a typical institutional investor. Private equity and other alternative investments are not included because sufficiently long data series are not available for them.
The results of our novel AI approach are very promising. In the analyzed period from November 2017 to August 2023, our model (OLZ Reinforcement Learning Allocation) achieved an impressive total return (after costs) of 148.7%, while an equally weighted portfolio of the five asset classes achieved only 31.0%. A typical pension fund portfolio (30% equities, 30% fixed-income securities, 30% real estate and 10% commodities) achieved a cumulative return of only 20.6%. Our strategy, whose performance is shown in Figure 1, therefore clearly outperforms both benchmarks.
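For comparison across horizons, the cumulative returns quoted above can be converted into approximate annualized figures using (1 + R)^(1/years) - 1. The short sketch below does this under one assumption that is not stated in the text: the period from November 2017 to August 2023 is taken to be roughly 5.75 years, since the exact start and end dates are not given.

```python
def annualized_return(total_return, years):
    """Convert a cumulative return into a compound annual growth rate."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0

# Assumption: about 5.75 years between November 2017 and August 2023.
ASSUMED_YEARS = 5.75

# Cumulative returns as quoted in the text.
total_returns = {
    "OLZ Reinforcement Learning Allocation": 1.487,
    "Equally weighted portfolio": 0.310,
    "Typical pension fund portfolio": 0.206,
}

annualized = {
    name: annualized_return(value, ASSUMED_YEARS)
    for name, value in total_returns.items()
}

for name, value in annualized.items():
    print(f"{name}: {value:.1%} per year")
```

The ranking of the three portfolios is unaffected by the assumed length of the period; only the level of the annualized figures changes.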