DeGroot, "Optimal Statistical Decisions".
Gives a good overview of Bayesian statistics and decision theory, and an introduction to Markov decision processes from a statistical perspective. Most reinforcement learning problems can be reduced to MDPs of this type.
Savage, "The foundations of statistics".
This early work gives a more verbose (but rigorous) alternative to DeGroot. Most basic ideas of statistical decision theory are laid out in this book, including minimax problems.
Robert and Casella, "Monte Carlo Statistical Methods".
Gives an excellent overview of Monte Carlo methods, including theory and many useful examples.
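The basic idea the book builds on can be sketched in a few lines: estimate an expectation by averaging over random samples. As a minimal, hedged illustration (the example and all names are my own, not from the book), here is the classic estimate of pi from uniform draws in the unit square:

```python
import random

def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Plain Monte Carlo: the fraction of uniform points in the unit
    square that land inside the quarter circle estimates pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

# The error shrinks at the usual O(1/sqrt(n)) Monte Carlo rate.
print(estimate_pi(100_000))
```

The book covers far more refined methods (importance sampling, MCMC), but they all rest on this sample-average principle.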
Puterman, "Markov Decision Processes".
Examines various types of Markov decision processes and gives detailed proofs of basic dynamic programming algorithms, as well as important cases not treated elsewhere.
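One of the basic dynamic programming algorithms analysed in such treatments is value iteration, which repeatedly applies the Bellman optimality backup until the value function converges. A minimal sketch for a finite MDP (the array layout and names are my own assumptions, not the book's notation):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.

    P: shape (A, S, S), P[a, s, s'] = transition probability.
    R: shape (A, S), expected immediate reward for action a in state s.
    Returns the optimal value function and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

For a discount factor gamma < 1 the backup is a contraction, which is why the iteration converges geometrically; the book gives the detailed proofs.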
Bertsekas and Tsitsiklis, "Neuro-dynamic programming".
This work connects basic MDP theory to reinforcement learning in the setting where we perform approximate dynamic programming via simulation. Many algorithms are described and analysed in the limit.
Sutton and Barto, "Reinforcement learning: an introduction".
Gives an overview of the reinforcement learning problem and many important basic algorithms. Contains no formal analysis, but is recommended as a first approach to the subject.
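Among the basic algorithms covered is tabular Q-learning. A hedged sketch (the toy chain environment and all parameter choices are my own, purely for illustration):

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=2000,
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning on a toy chain: action 1 moves right,
    action 0 moves left; reaching the last state yields reward 1."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the greedy next-state value.
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

After training, the greedy policy reads off as the argmax of each row of Q; the book explains why this off-policy update converges to the optimal action values.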
General sequential learning problems
Cesa-Bianchi and Lugosi, "Prediction, Learning, and Games".
Considers the case where we do not necessarily have an explicit or estimated environment model, and where we may be acting against an adversarial environment. This book requires patience, but is very rewarding.