DRL for optimal execution of portfolio transactions

  1. Based on state S, we take an action A = F(S); after taking the action, we get a reward R = G(A). Based on this, we want to maximize the future (cumulative) reward. (In the Atari Breakout demo, after 120 minutes of training the agent learns to break through the bricks on the left side.) A toy sketch of this loop follows.
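
A minimal sketch of that loop. Everything here is illustrative: the policy F, the environment dynamics, and the reward are toy stand-ins, not from any library.

```python
import random

def F(state):
    """Policy: pick an action given the current state (random placeholder)."""
    return random.choice([-1, 0, 1])        # e.g., sell, hold, buy

def step(state, action):
    """Toy environment dynamics: returns (next_state, reward)."""
    next_state = state + action
    reward = -abs(next_state)               # toy reward: stay near zero
    return next_state, reward

state, total_reward = 0, 0.0
for t in range(100):                        # one episode
    action = F(state)                       # A = F(S)
    state, reward = step(state, action)     # R = G(A)
    total_reward += reward                  # goal: maximize cumulative reward
print(total_reward)
```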

  2. Deep Q-learning uses a deep neural network to estimate which action to take (out of a finite set of actions) given the current state (e.g., an RGB input image). Abstractly, it then resembles a classification task: classify which action to take, with the reward playing the role of the label. 【To approximate the Q-value】 A sketch follows.
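
A minimal sketch of that idea in PyTorch. The layer sizes, state dimension, and action count are illustrative assumptions (an Atari-style RGB input would use convolutional layers instead of this small MLP).

```python
import torch
import torch.nn as nn

n_actions = 4                            # finite, discrete action set
q_net = nn.Sequential(                   # Q(s, .): one output per action
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, n_actions),
)

state = torch.randn(1, 8)                # stand-in for a (flattened) observation
q_values = q_net(state)                  # shape (1, n_actions)
action = int(torch.argmax(q_values))     # greedy "classification" over actions
# Training regresses q_values[action] toward r + gamma * max_a' Q(s', a')
# (the Bellman target), which is how the reward acts like a (moving) label.
```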

  3. Policy gradient and Q-learning (as described here) are limited to a finite set of discrete actions.

  4. Policy gradient: learn the policy directly, by nudging the network's parameters so that actions which led to high reward become more likely.

  5. Actor-critic 【good for continuous action spaces, i.e., effectively millions of possible actions】: state of the art, and converges faster than deep Q-learning. See the sketch after the two sub-points below.

    1. The actor predicts actions (continuous output, so infinitely many possible actions).

    2. The critic takes the state and action as input and predicts the expected future reward (the value of that state-action pair).
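
A minimal, DDPG-style sketch of that split in PyTorch; the dimensions and layer sizes are assumed for illustration.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 1             # illustrative sizes

class Actor(nn.Module):
    """Maps state -> continuous action (so the action space can be infinite)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh(),   # action in [-1, 1]
        )
    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Takes (state, action) -> scalar estimate of the expected return."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

actor, critic = Actor(), Critic()
s = torch.randn(1, STATE_DIM)
a = actor(s)                             # continuous action from the actor
q = critic(s, a)                         # critic scores the (state, action) pair
```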

  6. Optimal execution of portfolio transactions is a well-known problem in quantitative finance: sell X shares before time T while minimizing the cost of trading / market impact / implementation shortfall, etc. A worked shortfall example follows.
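
A small worked example of implementation shortfall, the standard cost measure for this problem. All prices and quantities below are made up.

```python
# Implementation shortfall for selling X shares: the gap between the value at the
# decision (arrival) price P0 and what the trades actually realize.
X = 1_000_000                       # shares to sell before time T
P0 = 50.00                          # price when the decision is made
trades = [(200_000, 49.95), (300_000, 49.90), (500_000, 49.80)]  # (shares, fill price)

shortfall = X * P0 - sum(n * p for n, p in trades)
print(f"implementation shortfall: ${shortfall:,.0f}")   # cost of trading / impact
```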

  7. The traditional method for portfolio transactions is the Almgren-Chriss model, which gives a closed-form optimal trade schedule (sketched below).
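
A hedged sketch of the Almgren-Chriss closed-form schedule for a seller, x_j = X * sinh(kappa * (T - t_j)) / sinh(kappa * T); the parameter values are illustrative, not calibrated to real data.

```python
import numpy as np

X, T, N = 1_000_000, 5.0, 5          # shares, horizon (days), trading intervals
tau = T / N                          # interval length
sigma = 0.95                         # volatility ($/share per sqrt(day))
eta = 2.5e-6                         # temporary impact coefficient
gamma = 2.5e-7                       # permanent impact coefficient
lam = 2e-6                           # risk aversion

# Discrete-time Almgren-Chriss: cosh(kappa*tau) = 1 + (kappa_tilde^2 * tau^2) / 2
eta_tilde = eta * (1 - gamma * tau / (2 * eta))
kappa_tilde_sq = lam * sigma**2 / eta_tilde
kappa = np.arccosh(kappa_tilde_sq * tau**2 / 2 + 1) / tau

t = np.arange(N + 1) * tau
holdings = X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)   # shares still held
trade_list = -np.diff(holdings)                                # shares sold per interval
print(np.round(trade_list))          # front-loaded schedule for a risk-averse seller
```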

  8. Applying RL to the Almgren-Chriss setting (the first step is to define the reward function; one possible choice is sketched below).
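
One possible per-step reward that mirrors the Almgren-Chriss objective E[cost] + lambda * Var[cost]. The exact reward is a design choice, and this particular form is an assumption for illustration, not taken from the source.

```python
def reward(shares_sold, fill_price, arrival_price, shares_remaining,
           sigma=0.95, lam=2e-6, tau=1.0):
    """Negative of (this slice's shortfall + risk penalty on remaining inventory)."""
    shortfall = shares_sold * (arrival_price - fill_price)   # cost of this slice
    risk = lam * (sigma**2) * tau * shares_remaining**2      # variance of what's left
    return -(shortfall + risk)                               # maximize => minimize cost+risk

# Example step: sold 200k shares at 49.90 vs a 50.00 arrival price, 800k left.
print(reward(200_000, 49.90, 50.00, 800_000))
```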
