Emotion recognition from pictures, videos, or speech has been an active research problem in recent years. The introduction of deep learning methods such as Convolutional Neural Networks (CNNs) has allowed emotion recognition to achieve promising results. This book develops an image- and video-based emotion recognition model that uses a CNN for automatic feature extraction and classification, with MATLAB sample code. Five emotions are considered for recognition (angry, happy, neutral, sad, and surprise), and the results are compared with previous algorithms. Several pre-processing steps are applied to the data samples, followed by face detection with the popular and efficient Viola-Jones algorithm. Evaluation using the confusion matrix, accuracy, F1-score, precision, and recall shows that the video-based datasets obtained more promising results than the image-based datasets.
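The evaluation measures named above can all be derived from a single confusion matrix. The following is a minimal sketch of that derivation, written in Python rather than the book's MATLAB for brevity. The function name `per_class_metrics` and the toy counts in `example_cm` are illustrative assumptions, not code or results from the book.

```python
# Labels follow the five emotions considered in the book.
EMOTIONS = ["angry", "happy", "neutral", "sad", "surprise"]


def per_class_metrics(cm, labels=EMOTIONS):
    """Derive accuracy and per-class precision, recall, and F1.

    cm[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    accuracy = correct / total if total else 0.0

    metrics = {}
    for i, name in enumerate(labels):
        tp = cm[i][i]                                # true positives
        fp = sum(cm[r][i] for r in range(n)) - tp    # predicted as i, but wrong
        fn = sum(cm[i]) - tp                         # truly i, but missed
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, metrics


if __name__ == "__main__":
    # Hypothetical toy counts for illustration only (NOT results from the book).
    example_cm = [
        [8, 0, 1, 1, 0],
        [0, 9, 0, 0, 1],
        [1, 0, 7, 2, 0],
        [1, 0, 2, 7, 0],
        [0, 1, 0, 0, 9],
    ]
    acc, per_class = per_class_metrics(example_cm)
    print(f"accuracy: {acc:.2f}")
    for name, m in per_class.items():
        print(f"{name}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```

Reporting per-class values alongside overall accuracy is useful here because accuracy alone can hide a weak class when the emotion classes are unevenly represented.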