The amount of video data on the web and in archives is growing continuously, which motivates active research in video indexing, search, and retrieval. Videos vary in many ways: in recording devices and circumstances, compression technology, genre, and, of course, content. The research question addressed by this Ph.D. thesis is how to build robust video content analysis approaches that work reliably on arbitrary videos. For this purpose, a transductive learning framework based on feature selection and ensemble classification is developed. In addition to solutions built on this framework, some approaches employ unsupervised learning or explicitly handle compression artefacts. Solutions are presented for several video analysis problems: shot boundary detection, camera motion estimation, face recognition, semantic concept detection, and semantic indexing of computer game sequences. Experimental results on large test sets demonstrate very good performance of the proposed approaches. Finally, some areas of future work are outlined. This thesis is relevant for students and researchers interested in video content analysis and retrieval.