The IMAGE framework enables unsupervised learning of complex 3D objects and their recognition in cluttered scenes, under scale changes, and under partial occlusion. Local correspondences are determined autonomously for an unordered and unlabelled training set and are used to learn the view-dependent variation of the object's features. Exhaustive matching is avoided by clustering the local features. Each feature is tracked through the object's training set, yielding a view-dependent feature model. This approach is also motivated by findings in cognitive science showing that human vision is able to interpolate between views. In contrast, most existing learning approaches store a collection of training views and neglect the relations between them. Experiments with real objects show that recognition performance is robust to cluttered scenes, partial occlusion, in-plane rotation, and scale changes. We achieve recognition performance comparable to that of a standard SIFT approach, even though the IMAGE framework learns without supervision. Experiments with the Amsterdam Library of Object Images (ALOI) show that the method also works on a different image set without any parameter changes.
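The abstract does not specify how the local features are clustered. As a minimal sketch, assuming a plain k-means codebook over descriptors (a stand-in, not necessarily the clustering used by IMAGE), the Python below shows how bucketing descriptors by their nearest cluster center restricts matching to descriptors that share a cluster, instead of comparing every pair. The helper names (`kmeans`, `nearest_center`, `clustered_matches`), the `max_dist` threshold, and the toy 2-D descriptors (standing in for real local descriptors such as SIFT) are all hypothetical.

```python
# Hypothetical illustration of clustering-based matching; not the IMAGE framework's actual code.
import math
import random
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(p, centers):
    """Index of the cluster center closest to descriptor p."""
    return min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in); returns k cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        for c, members in groups.items():
            # Move each non-empty cluster's center to the mean of its members.
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def clustered_matches(query, reference, centers, max_dist):
    """Match each query descriptor only against reference descriptors
    assigned to the same cluster, instead of against all of them."""
    buckets = defaultdict(list)
    for j, r in enumerate(reference):
        buckets[nearest_center(r, centers)].append(j)
    matches, comparisons = [], 0
    for i, q in enumerate(query):
        best_j, best_d = None, math.inf
        for j in buckets[nearest_center(q, centers)]:
            comparisons += 1  # counts descriptor-to-descriptor comparisons only
            d = squared_distance(q, reference[j])
            if d < best_d:
                best_j, best_d = j, d
        # Accept the nearest neighbour only if it is close enough.
        if best_j is not None and best_d <= max_dist ** 2:
            matches.append((i, best_j))
    return matches, comparisons


# Toy data: two well-separated groups of 2-D "descriptors".
reference = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
query = [(0.02, 0.0), (10.08, 10.0)]
centers = kmeans(reference, k=2)
matches, comparisons = clustered_matches(query, reference, centers, max_dist=1.0)
print(matches, comparisons, "vs exhaustive", len(query) * len(reference))
```

On this toy data the two query descriptors are compared against 4 reference descriptors in total rather than the 8 an exhaustive search would need, at the cost of k center lookups per descriptor. The usual trade-off of this kind of bucketing is that a descriptor whose true match lands in a neighbouring cluster can be missed.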