The main objective of reservoir operation planning is to determine how much water to release from a reservoir and how much energy to trade in each time step so as to make the best use of available resources. This is done by evaluating the trade-off between the immediate and future profits of power generation while meeting a set of constraints, such as the continuity equations, transmission limits, generation and reservoir limits, flood control limits, and the load-resource balance. Another important issue in these problems is uncertainty, which arises from spatial and temporal variability, the inherent nature of a problem or parameter, measurement errors due to human or technological inaccuracy, and modeling errors. This research implements a Reinforcement Learning (RL) optimization algorithm that incorporates the flood control constraints of the Columbia River Treaty. It considers the two main sources of uncertainty in operating a large-scale hydropower system, namely market prices and inflows, by using a number of scenarios of historical inflow and energy price data in the learning process. The RL method reduces the time and computational effort needed to solve the operational planning problem.
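To make the learning loop concrete, the following is a minimal sketch of scenario-based tabular Q-learning for a single reservoir. It is not the method used in this research: the abstract does not name a specific RL algorithm, so Q-learning is a swapped-in illustration. The storage discretization, release options, `FLOOD_LIMIT`, `SPILL_PENALTY`, `ENERGY_PER_UNIT`, and the scenario data are all hypothetical. The flood control constraint is modeled only as an upper storage cap with a penalized spill, not as the actual Columbia River Treaty rules. Each episode samples one (inflow, price) scenario, and the update trades off immediate profit against the learned value of the next state.

```python
import random
from typing import List, Tuple

# All constants below are hypothetical, chosen only for illustration.
N_STORAGE = 11            # physical storage levels 0..10 (arbitrary units)
FLOOD_LIMIT = 8           # hypothetical flood-control upper storage level
MIN_STORAGE = 1           # hypothetical minimum operating storage
INITIAL_STORAGE = 5
RELEASES = [0, 1, 2, 3]   # candidate turbine releases per time step
ENERGY_PER_UNIT = 1.0     # hypothetical energy produced per unit released
SPILL_PENALTY = 5.0       # hypothetical penalty per unit spilled above the flood limit

Scenario = Tuple[List[int], List[float]]  # (inflows, prices), one entry per time step


def step(storage: int, release: int, inflow: int, price: float) -> Tuple[int, float]:
    """Apply continuity (next = storage + inflow - release) and enforce the limits."""
    available = storage + inflow
    # Never release below the minimum storage level.
    release = max(0, min(release, available - MIN_STORAGE))
    next_storage = available - release
    # Water above the flood-control limit is spilled: no energy, plus a penalty.
    spill = max(0, next_storage - FLOOD_LIMIT)
    next_storage -= spill
    reward = price * ENERGY_PER_UNIT * release - SPILL_PENALTY * spill
    return next_storage, reward


def best_action(q_row: List[float]) -> int:
    """Index of the highest-valued release option."""
    return max(range(len(RELEASES)), key=lambda i: q_row[i])


def train(scenarios: List[Scenario], episodes: int, alpha: float = 0.1,
          epsilon: float = 0.1, seed: int = 0) -> List[List[List[float]]]:
    """Finite-horizon, undiscounted Q-learning over sampled historical scenarios."""
    rng = random.Random(seed)
    horizon = len(scenarios[0][0])
    q = [[[0.0] * len(RELEASES) for _ in range(N_STORAGE)] for _ in range(horizon)]
    for _ in range(episodes):
        inflows, prices = rng.choice(scenarios)  # sample one inflow/price scenario
        s = INITIAL_STORAGE
        for t in range(horizon):
            # Epsilon-greedy exploration over release options.
            if rng.random() < epsilon:
                a = rng.randrange(len(RELEASES))
            else:
                a = best_action(q[t][s])
            s_next, r = step(s, RELEASES[a], inflows[t], prices[t])
            # Immediate profit plus the learned value of the resulting storage.
            future = max(q[t + 1][s_next]) if t + 1 < horizon else 0.0
            q[t][s][a] += alpha * (r + future - q[t][s][a])
            s = s_next
    return q


def simulate(q: List[List[List[float]]], scenario: Scenario) -> Tuple[List[int], float]:
    """Run the greedy learned policy on one scenario; return storages and total profit."""
    inflows, prices = scenario
    s = INITIAL_STORAGE
    storages, profit = [s], 0.0
    for t in range(len(inflows)):
        s, r = step(s, RELEASES[best_action(q[t][s])], inflows[t], prices[t])
        storages.append(s)
        profit += r
    return storages, profit


# Usage with two made-up four-step scenarios (not historical data).
scenarios = [([2, 3, 1, 4], [30.0, 50.0, 20.0, 60.0]),
             ([1, 2, 2, 3], [25.0, 40.0, 35.0, 55.0])]
q_table = train(scenarios, episodes=5000)
print(simulate(q_table, scenarios[0]))
```

Because the state here is only (time step, storage level), the learned policy is a compromise across the sampled scenarios rather than a plan tuned to any single one, which is how scenario sampling lets inflow and price uncertainty enter the learning.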