This book presents a unified solution to the challenging problem of human action analysis in video. Human action is represented as a structured set of local features, so that different structured learning models can be applied to classify and localise action instances in video databases. A wide range of factors that contribute to good human action analysis is also examined. At the feature level, different detectors, descriptors and selectors are employed, while at the learning level, different types of models are proposed and described. Lastly, at the evaluation level, several datasets are used to analyse the applicability of each structured learning model under different test scenarios. The analysis confirms that structured learning of local features is an effective and promising approach to analysing human action. It not only helps to bridge traditional local-feature and global-feature approaches, but also provides a foundational guideline for future work in this direction. Many other potential applications in visual pattern recognition can also be developed using the frameworks presented in this book.