This research proposes deterministic and probabilistic generic multimodal models for understanding human behavior from body gestures and vocal utterances sensed with Microsoft Kinect technology. The proposed deterministic model is based on Moore machine principles, while the probabilistic model is a machine-learning approach using Hidden Markov Models. The models have been implemented in a user-friendly graphical system that encompasses detecting, recognizing, discovering, learning, predicting, and measuring human emotional behavior patterns. The system takes as input the skeletal data and voice signal from the Kinect sensor; specific gestures and interjections are then recognized, grouped, and classified depending on the context. Rule-based models, signal-alignment models, and the Microsoft Speech recognizer have been used for static body gestures, dynamic body gestures, and emotional interjections, respectively. Tests reveal an average precision of 94.67%, sensitivity of 88.89%, specificity of 99.24% (false alarm rate 0.76%), efficiency of 92.91%, positive prediction accuracy of 98.91%, reliability of 94.40%, and overall accuracy of 88.60%. The deterministic behavior classifier reached a 100% hit rate, while the probabilistic classifier averaged 90.8%.
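To make the probabilistic approach concrete, the sketch below shows how per-class Hidden Markov Models could score a discretized gesture observation sequence with the standard forward algorithm and pick the most likely gesture class. This is a minimal illustration, not the paper's implementation: the two-state HMM parameters, the class names "wave" and "point", and the idea of quantizing Kinect skeletal features into three discrete symbols are all assumptions made for the example.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm (log-space for stability)."""
    n_states = len(pi)
    # Initialization: alpha_1(i) = pi_i * B_i(o_1)
    alpha = [math.log(pi[i]) + math.log(B[i][obs[0]]) for i in range(n_states)]
    # Recursion: alpha_t(j) = [sum_i alpha_{t-1}(i) * A_ij] * B_j(o_t)
    for t in range(1, len(obs)):
        alpha = [
            math.log(sum(math.exp(alpha[i]) * A[i][j] for i in range(n_states)))
            + math.log(B[j][obs[t]])
            for j in range(n_states)
        ]
    # Termination: total probability marginalized over final states
    return math.log(sum(math.exp(a) for a in alpha))

# Hypothetical two-state HMMs for two gesture classes. The observation
# symbols 0/1/2 stand in for quantized skeletal features (e.g. joint-angle
# bins) extracted from Kinect data; real models would be trained from data.
models = {
    "wave":  ([0.6, 0.4],
              [[0.7, 0.3], [0.4, 0.6]],
              [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]),
    "point": ([0.5, 0.5],
              [[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
}

def classify(obs):
    """Assign the gesture class whose HMM gives the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

In practice, one HMM would be trained per gesture (e.g. with Baum-Welch), and classification reduces to this maximum-likelihood comparison across the trained models.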