Learning to trade via direct reinforcement

TLDR

Direct reinforcement (DR) treats investment decision-making as a stochastic control problem and discovers strategies directly, in contrast to dynamic programming and value-function methods such as TD-learning and Q-learning, which estimate a value function. The authors present a recurrent reinforcement learning (RRL) algorithm that optimizes portfolios, asset allocations, and trading systems for risk-adjusted return, while accounting for transaction costs and without building forecasting models. In simulations on real financial data, RRL yields a simpler problem representation, avoids Bellman's curse of dimensionality, and produces trading strategies that outperform Q-learning systems. Applications include an intra-daily currency trader and a monthly S&P 500/T-Bill asset allocation system.

Abstract

We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
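The abstract names two ingredients without giving formulas: trading returns that account for transaction costs, and the differential Sharpe ratio as an online performance signal. The sketch below illustrates both under stated assumptions. The function names, the proportional cost rate `cost`, and the adaptation rate `eta` are hypothetical choices made for illustration. The update rule follows the differential Sharpe ratio as it is commonly stated (exponential moving estimates of the first and second moments of returns) and is not taken from this excerpt.

```python
import math


def trading_returns(price_changes, positions, cost=0.001):
    """Per-period trading return with proportional transaction costs.

    Assumed form: R_t = F_{t-1} * r_t - cost * |F_t - F_{t-1}|,
    where F_t in [-1, 1] is the position chosen at time t and the
    trader starts flat (F = 0).
    """
    returns = []
    prev = 0.0
    for r, f in zip(price_changes, positions):
        returns.append(prev * r - cost * abs(f - prev))
        prev = f
    return returns


def differential_sharpe(returns, eta=0.01):
    """Online differential Sharpe ratio D_t for each return in sequence.

    A and B are exponential moving estimates of the first and second
    moments of the returns. D_t is computed from the estimates held
    before the update; it is set to 0.0 while the variance estimate
    is still numerically zero.
    """
    a = 0.0  # moving estimate of E[R]
    b = 0.0  # moving estimate of E[R^2]
    out = []
    for r in returns:
        delta_a = r - a
        delta_b = r * r - b
        variance = b - a * a
        if variance > 1e-12:
            d = (b * delta_a - 0.5 * a * delta_b) / variance ** 1.5
        else:
            d = 0.0
        out.append(d)
        a += eta * delta_a
        b += eta * delta_b
    return out


if __name__ == "__main__":
    rets = trading_returns([0.01, -0.02, 0.03], [1.0, 1.0, -1.0])
    print(rets)
    print(differential_sharpe(rets))
```

In a direct reinforcement setting, a signal such as `differential_sharpe` would serve as the per-step quantity whose gradient with respect to the trader's parameters drives the update, so no value function or return forecast has to be estimated.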
