Music is built from sound, ultimately resulting from an elaborate interaction between the sound-generating properties of physical objects (i.e., musical instruments) and the sound-perception abilities of the human auditory system. Even without formal music training, humans are typically able to extract, almost unconsciously, a great deal of relevant information from a musical signal (e.g., the beat and main melody of a piece, or the sound sources playing in a complex musical arrangement). To do so, the human auditory system relies on a variety of perceptual grouping cues, such as similarity, proximity, harmonicity, and common fate. This book proposes a flexible and extensible Computational Auditory Scene Analysis (CASA) framework for modeling perceptual grouping in music listening. Implemented using the open-source sound processing framework Marsyas, this work should be especially interesting to researchers in the Music Information Retrieval (MIR) and CASA fields, as well as to anyone developing software for the automatic analysis and processing of sound and music signals.