The main motivation for writing this book was to provide an
accessible account of methods based on
Reinforcement Learning (closely related to what is now also called
Approximate Dynamic Programming) and
Meta-Heuristics (closely related to what is now also called
Stochastic Adaptive Search) for optimization in discrete-event systems via simulation. Reinforcement Learning (RL) is typically used for solving Markov decision problems (
MDPs), which are dynamic optimization problems where the underlying discrete-event stochastic system is driven by Markov chains, while Meta-Heuristics are used for solving static optimization problems where the underlying system is any discrete-event stochastic system (not necessarily driven by Markov chains).
This book provides a selected collection of topics, mostly focused on
model-free techniques, which are useful when one does not have access to the structure of the objective function (in static optimization) or the transition probability function (in dynamic optimization). My goal was neither to overwhelm the reader with mathematical details nor was it to cover every topic. Rather, the goal was to provide the reader with an overview of the fundamental concepts and at the same time provide the details required for solving real-world stochastic optimization problems via simulation-based techniques.
Some of the main topics covered are:
- Reinforcement learning techniques, mainly rooted in Q-Learning for discounted and average reward MDPs
- Static optimization techniques rooted in meta-heuristics (simulated annealing, genetic algorithms, and tabu search) for discrete solution spaces and simultaneous perturbation for continuous solution spaces
- Neural network algorithms useful for function approximation in response surface methods for static optimization and in reinforcement learning for MDPs with large state-action spaces
- A detailed background on dynamic programming (value and policy iteration)
- A special coverage of semi-MDPs (SMDPs) and average reward problems
- A discussion on convergence of a subset of methods enumerated above