Warren Buckler Powell
Approximate Dynamic Programming
- Hardcover
Other customers were also interested in
- George E. P. Box: Statistical Control by Monitoring and Adjustment, 160,99 €
- George E. P. Box: Improving Almost Anything, 121,99 €
- Aparna V. Huzurbazar: Flowgraph Models, 215,99 €
- Terje Aven: Misconceptions of Risk, 90,99 €
- John I. McCool: Using the Weibull Distribution, 152,99 €
- Terje Aven: Uncertainty in Risk Assessment, 102,99 €
- Jan Beirlant: Statistics of Extremes, 182,99 €
Understanding approximate dynamic programming (ADP) in large industrial settings helps develop practical and high-quality solutions to problems that involve making decisions in the presence of uncertainty. With a focus on modeling and algorithms in conjunction with the language of mainstream operations research, artificial intelligence, and control theory, this second edition of Approximate Dynamic Programming: Solving the Curses of Dimensionality uniquely integrates four distinct disciplines (Markov decision processes, mathematical programming, simulation, and statistics) to show students, practitioners, and researchers how to successfully model and solve a wide range of real-life problems using ADP.
Note: This item can only be shipped to a German delivery address.
Product details
- Wiley Series in Probability and Statistics
- Publisher: Wiley & Sons
- 2nd edition
- Number of pages: 656
- Publication date: September 27, 2011
- English
- Dimensions: 240mm x 161mm x 40mm
- Weight: 1058g
- ISBN-13: 9780470604458
- ISBN-10: 047060445X
- Item no.: 33684836
- Manufacturer identification
- Libri GmbH
- Europaallee 1
- 36244 Bad Hersfeld
- 06621 890
WARREN B. POWELL, PhD, is Professor of Operations Research and Financial Engineering at Princeton University, where he is founder and Director of CASTLE Laboratory, a research unit that works with industrial partners to test new ideas found in operations research. The recipient of the 2004 INFORMS Fellow Award, Dr. Powell has authored more than 160 published articles on stochastic optimization, approximate dynamic programming, and dynamic resource management.
Preface to the Second Edition xi
Preface to the First Edition xv
Acknowledgments xvii
1 The Challenges of Dynamic Programming 1
1.1 A Dynamic Programming Example: A Shortest Path Problem, 2
1.2 The Three Curses of Dimensionality, 3
1.3 Some Real Applications, 6
1.4 Problem Classes, 11
1.5 The Many Dialects of Dynamic Programming, 15
1.6 What Is New in This Book?, 17
1.7 Pedagogy, 19
1.8 Bibliographic Notes, 22
2 Some Illustrative Models 25
2.1 Deterministic Problems, 26
2.2 Stochastic Problems, 31
2.3 Information Acquisition Problems, 47
2.4 A Simple Modeling Framework for Dynamic Programs, 50
2.5 Bibliographic Notes, 54
Problems, 54
3 Introduction to Markov Decision Processes 57
3.1 The Optimality Equations, 58
3.2 Finite Horizon Problems, 65
3.3 Infinite Horizon Problems, 66
3.4 Value Iteration, 68
3.5 Policy Iteration, 74
3.6 Hybrid Value-Policy Iteration, 75
3.7 Average Reward Dynamic Programming, 76
3.8 The Linear Programming Method for Dynamic Programs, 77
3.9 Monotone Policies*, 78
3.10 Why Does It Work?**, 84
3.11 Bibliographic Notes, 103
Problems, 103
4 Introduction to Approximate Dynamic Programming 111
4.1 The Three Curses of Dimensionality (Revisited), 112
4.2 The Basic Idea, 114
4.3 Q-Learning and SARSA, 122
4.4 Real-Time Dynamic Programming, 126
4.5 Approximate Value Iteration, 127
4.6 The Post-Decision State Variable, 129
4.7 Low-Dimensional Representations of Value Functions, 144
4.8 So Just What Is Approximate Dynamic Programming?, 146
4.9 Experimental Issues, 149
4.10 But Does It Work?, 155
4.11 Bibliographic Notes, 156
Problems, 158
5 Modeling Dynamic Programs 167
5.1 Notational Style, 169
5.2 Modeling Time, 170
5.3 Modeling Resources, 174
5.4 The States of Our System, 178
5.5 Modeling Decisions, 187
5.6 The Exogenous Information Process, 189
5.7 The Transition Function, 198
5.8 The Objective Function, 206
5.9 A Measure-Theoretic View of Information**, 211
5.10 Bibliographic Notes, 213
Problems, 214
6 Policies 221
6.1 Myopic Policies, 224
6.2 Lookahead Policies, 224
6.3 Policy Function Approximations, 232
6.4 Value Function Approximations, 235
6.5 Hybrid Strategies, 239
6.6 Randomized Policies, 242
6.7 How to Choose a Policy?, 244
6.8 Bibliographic Notes, 247
Problems, 247
7 Policy Search 249
7.1 Background, 250
7.2 Gradient Search, 253
7.3 Direct Policy Search for Finite Alternatives, 256
7.4 The Knowledge Gradient Algorithm for Discrete Alternatives, 262
7.5 Simulation Optimization, 270
7.6 Why Does It Work?**, 274
7.7 Bibliographic Notes, 285
Problems, 286
8 Approximating Value Functions 289
8.1 Lookup Tables and Aggregation, 290
8.2 Parametric Models, 304
8.3 Regression Variations, 314
8.4 Nonparametric Models, 316
8.5 Approximations and the Curse of Dimensionality, 325
8.6 Why Does It Work?**, 328
8.7 Bibliographic Notes, 333
Problems, 334
9 Learning Value Function Approximations 337
9.1 Sampling the Value of a Policy, 337
9.2 Stochastic Approximation Methods, 347
9.3 Recursive Least Squares for Linear Models, 349
9.4 Temporal Difference Learning with a Linear Model, 356
9.5 Bellman's Equation Using a Linear Model, 358
9.6 Analysis of TD(0), LSTD, and LSPE Using a Single State, 364
9.7 Gradient-Based Methods for Approximate Value Iteration*, 366
9.8 Least Squares Temporal Differencing with Kernel Regression*, 371
9.9 Value Function Approximations Based on Bayesian Learning*, 373
9.10 Why Does It Work?**, 376
9.11 Bibliographic Notes, 379
Problems, 381
10 Optimizing While Learning 383
10.1 Overview of Algorithmic Strategies, 385
10.2 Approximate Value Iteration and Q-Learning Using Lookup Tables, 386
10.3 Statistical Bias in the Max Operator, 397
10.4 Approximate Value Iteration and Q-Learning Using Linear Models, 400
10.5 Approximate Policy Iteration, 402
10.6 The Actor-Critic Paradigm, 408
10.7 Policy Gradient Methods, 410
10.8 The Linear Programming Method Using Basis Functions, 411
10.9 Approximate Policy Iteration Using Kernel Regression*, 413
10.10 Finite Horizon Approximations for Steady-State Applications, 415
10.11 Bibliographic Notes, 416
Problems, 418
11 Adaptive Estimation and Stepsizes 419
11.1 Learning Algorithms and Stepsizes, 420
11.2 Deterministic Stepsize Recipes, 425
11.3 Stochastic Stepsizes, 433
11.4 Optimal Stepsizes for Nonstationary Time Series, 437
11.5 Optimal Stepsizes for Approximate Value Iteration, 447
11.6 Convergence, 449
11.7 Guidelines for Choosing Stepsize Formulas, 451
11.8 Bibliographic Notes, 452
Problems, 453
12 Exploration Versus Exploitation 457
12.1 A Learning Exercise: The Nomadic Trucker, 457
12.2 An Introduction to Learning, 460
12.3 Heuristic Learning Policies, 464
12.4 Gittins Indexes for Online Learning, 470
12.5 The Knowledge Gradient Policy, 477
12.6 Learning with a Physical State, 482
12.7 Bibliographic Notes, 492
Problems, 493
13 Value Function Approximations for Resource Allocation Problems 497
13.1 Value Functions versus Gradients, 498
13.2 Linear Approximations, 499
13.3 Piecewise-Linear Approximations, 501
13.4 Solving a Resource Allocation Problem Using Piecewise-Linear Functions, 505
13.5 The SHAPE Algorithm, 509
13.6 Regression Methods, 513
13.7 Cutting Planes*, 516
13.8 Why Does It Work?**, 528
13.9 Bibliographic Notes, 535
Problems, 536
14 Dynamic Resource Allocation Problems 541
14.1 An Asset Acquisition Problem, 541
14.2 The Blood Management Problem, 547
14.3 A Portfolio Optimization Problem, 557
14.4 A General Resource Allocation Problem, 560
14.5 A Fleet Management Problem, 573
14.6 A Driver Management Problem, 580
14.7 Bibliographic Notes, 585
Problems, 586
15 Implementation Challenges 593
15.1 Will ADP Work for Your Problem?, 593
15.2 Designing an ADP Algorithm for Complex Problems, 594
15.3 Debugging an ADP Algorithm, 596
15.4 Practical Issues, 597
15.5 Modeling Your Problem, 602
15.6 Online versus Offline Models, 604
15.7 If It Works, Patent It!, 606
Bibliography 607
Index 623