This work addresses the problem of recognizing objects in images. Representation, detection, and learning are the main issues to be tackled when designing an object recognition system. We propose a framework for the statistical representation of visual feature classes. The model combines several key concepts developed in computer vision, machine learning, and computational neuroscience: spatial relations between features, graphical models, and hierarchies of complex cells. The result is a compositional hierarchy of visual feature classes. Its strength is that it provides a coherent and generic model, representing both local and global aspects by combining shape and appearance modalities. Graphical models provide a convenient formalism for representing complex systems and allow the use of efficient inference mechanisms, namely Nonparametric Belief Propagation (NBP). The hierarchical model is learned iteratively and composed in a bottom-up manner. We also review the state of the art in the detection and description of local visual features.