Neural networks are the state of the art in semantic object segmentation in images. One of the most important factors affecting their performance is the amount and quality of the available training data. These data have to be annotated by hand, which greatly restricts their availability. The goal of this thesis is to investigate whether this manual overhead can be reduced without reducing the quality of the final segmentation. This is done using the example of recognizing leaves of turnip plants in RGB images. To reduce the manual overhead, short video sequences consisting of tracking shots over the plants are recorded. Only the first frames of these videos are annotated by hand. For training, the annotations of these first frames are propagated over the whole video sequence in order to increase the amount of training data.
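The abstract does not fix how the first-frame annotations are carried through the sequence; purely as an illustration, the following Python sketch propagates a hand-annotated label mask frame by frame using dense optical flow (OpenCV's Farneback method and nearest-neighbour warping are assumptions, not the method of the thesis).

```python
import cv2
import numpy as np

def propagate_labels(frames, first_mask):
    """Propagate a hand-annotated label mask from the first frame
    through all frames of a short video sequence.

    frames     : list of RGB frames (numpy arrays, H x W x 3)
    first_mask : integer label mask (uint8, H x W) for frames[0]
    """
    h, w = first_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    masks = [first_mask]
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_RGB2GRAY)

    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        # Backward flow (current frame -> previous frame), so every pixel
        # of the current frame can look up its label in the previous mask.
        flow = cv2.calcOpticalFlowFarneback(gray, prev_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        map_x = grid_x + flow[..., 0]
        map_y = grid_y + flow[..., 1]
        # Nearest-neighbour interpolation keeps the label values discrete.
        mask = cv2.remap(masks[-1], map_x, map_y, cv2.INTER_NEAREST)
        masks.append(mask)
        prev_gray = gray

    # One pseudo-label mask per frame, usable as additional training data.
    return masks
```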