Multimedia research has provided sophisticated technical solutions for automatic image and video classification. Expert systems facilitate the archiving and search of digital content by emulating how human experts process visual information. However, the fact that visual information is open to interpretation has long been neglected. Unambiguous classification of visual information is very difficult, even for humans. This directly affects the effectiveness of expert systems, which rely on high-quality data as their input. This book is based on a PhD dissertation in which selected techniques for processing semantic information from digital video are presented. The author describes how human descriptions of visual content are commonly used for designing expert systems, and shows that disagreement between different human observers occurs frequently. What does this mean for an automatic classification system that is based on such information? In response to this question, the author presents an approach that combines the input of many human experts. While this book does not contain a ready-to-deploy solution, the author provides new food for thought for multimedia and data mining research.