Object knowledge is an important cue for distinguishing between human activities, yet it is usually disregarded in video-based activity recognition systems. In contrast, the aim of this work is to explore how activity recognition performance can be boosted by augmenting motion features with object information. Instead of relying on supervised detectors, the proposed object representation is motivated by a key mechanism of visual perception: saliency detection. Saliency detection serves as a gating mechanism that selects which information to process; it thus allows us, as humans, to focus our visual attention on certain regions even before we identify them as actual objects. The proposed proto-object features are based on computational models of such an attentional process, which makes the representation independent of statistical knowledge about specific objects. A major advantage of the present approach is therefore its ability to transfer across domains without the need to learn new object models.
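As a rough illustration of such an attentional front end, the sketch below extracts proto-object regions from a video frame by thresholding a bottom-up saliency map and keeping the connected components. It uses OpenCV's spectral-residual saliency model purely as a stand-in attention model; the specific saliency model, threshold, and minimum-area values are illustrative assumptions and not the pipeline described in this work.

```python
# Minimal sketch: proto-object regions from a bottom-up saliency map.
# Uses OpenCV's spectral-residual saliency (opencv-contrib-python) as a
# stand-in attention model; the threshold and minimum-area values below
# are illustrative assumptions, not values from this work.
import cv2
import numpy as np

def proto_object_regions(frame, thresh=0.6, min_area=100):
    """Return bounding boxes of salient 'proto-object' regions in a frame."""
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = saliency.computeSaliency(frame)  # float map in [0, 1]
    if not ok:
        return []

    # Gate the input: keep only the most salient pixels, with no object labels.
    mask = (saliency_map >= thresh * saliency_map.max()).astype(np.uint8)

    # Treat each connected component of the mask as one proto-object.
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    boxes = []
    for i in range(1, num):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, w, h))
    return boxes

if __name__ == "__main__":
    cap = cv2.VideoCapture("video.mp4")  # placeholder path
    ok, frame = cap.read()
    if ok:
        print(proto_object_regions(frame))
    cap.release()
```

The resulting regions can then be described with appearance features and pooled alongside motion features, without ever training an object detector for the target domain.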