Human activity recognition (HAR) is a research field concerned with the automatic detection of people's daily activities from time series data collected by sensors. HAR is applied in domains such as surveillance, baby monitoring, elderly health care, and smart driving cars, where different approaches are used to solve problems efficiently and accurately. Traditional HAR systems use wearable sensors such as inertial measurement units (IMUs) and stretch sensors to recognize activity. This approach gives strong results for basic user activities such as sitting, standing, and walking. For complex activities such as running, jumping, wrestling, and swinging, however, sensor-based HAR systems have higher misclassification rates because of sensor reading errors. These errors degrade the classification results and reduce the overall performance of the HAR system. In this book, data is instead extracted and processed from videos using a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) network. A deep CNN is proposed to extract features from each frame of the input sequence (video), and the LSTM is then used to determine the temporal relationships between the frames.
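To make the data flow of such a pipeline concrete, the following is a minimal sketch in plain Python and not the architecture proposed here. It assumes grayscale frames, a single convolution layer with global average pooling as a stand-in for the deep CNN, and one LSTM layer followed by a softmax. All names (`frame_features`, `lstm_step`, `classify_video`), the layer sizes, the six output classes, and the random untrained weights are hypothetical and chosen only for illustration.

```python
import math
import random


def conv2d_valid(frame, kernel):
    """Valid 2D cross-correlation of one grayscale frame with one kernel, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(frame) - kh + 1, len(frame[0]) - kw + 1
    return [[max(0.0, sum(frame[i + a][j + b] * kernel[a][b]
                          for a in range(kh) for b in range(kw)))
             for j in range(out_w)] for i in range(out_h)]


def frame_features(frame, kernels):
    """CNN stand-in: one conv layer plus global average pooling, one value per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(frame, kernel)
        total = sum(sum(row) for row in fmap)
        feats.append(total / (len(fmap) * len(fmap[0])))
    return feats


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W):
    """One LSTM time step. W[name] has shape hidden x (len(x) + hidden + 1)."""
    z = x + h + [1.0]  # current input, previous hidden state, constant bias input

    def gate(name, act):
        return [act(sum(w * v for w, v in zip(row, z))) for row in W[name]]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def classify_video(frames, kernels, W, W_out):
    """Per-frame features -> LSTM over time -> softmax over activity classes."""
    hidden = len(W["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in frames:  # frames are processed in temporal order
        h, c = lstm_step(frame_features(frame, kernels), h, c, W)
    logits = [sum(w * v for w, v in zip(row, h)) for row in W_out]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, low=-0.5, high=0.5):
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


def demo():
    """Run the pipeline on a synthetic 5-frame clip with untrained random weights."""
    rng = random.Random(0)
    n_frames, size, n_kernels, hidden, n_classes = 5, 8, 3, 4, 6  # assumed sizes
    frames = [random_matrix(size, size, rng, 0.0, 1.0) for _ in range(n_frames)]
    kernels = [random_matrix(3, 3, rng) for _ in range(n_kernels)]
    W = {name: random_matrix(hidden, n_kernels + hidden + 1, rng) for name in "ifog"}
    W_out = random_matrix(n_classes, hidden, rng)
    return classify_video(frames, kernels, W, W_out)


if __name__ == "__main__":
    print(demo())  # one probability per hypothetical activity class
```

Because the weights are random, the printed probabilities carry no meaning. The sketch only shows that spatial features are computed independently for each frame, while the LSTM state carries information from one frame to the next. A practical system would use a deep learning framework and train both parts on labelled video.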