Long Short-Term Memory (LSTM) is the most successful neural network architecture for processing time series data. Convolutional Neural Networks (CNNs) are outstanding in many image processing tasks like e.g. object detection. Since videos are nothing else but time series of images, it is tempting to use an architecture that combines these two concepts. The Convolutional LSTM (ConvLSTM) realizes this combination and is therefore a very natural architecture to use for the next frame prediction task, whose goal is to make a prediction for the next upcoming frames in a video sequence, i.e. predicting the future in a movie. In this work, new ways of training state of the art ConvLSTM neural networks for next frame prediction are introduced and explored.