The abundance of textual data in the information age poses an immense challenge: how to perform large-scale inference to understand and utilize this overwhelming amount of information. We develop effective and efficient statistical topic models for massive text collections by taking into account information from other modalities in addition to the text itself. Text documents are not just text: research papers have author information, email messages contain social sender-recipient links, legislative resolutions are recorded with votes, and so on. These kinds of additional information are naturally interleaved with the text. Most previous work, however, pays attention to only one modality at a time and ignores the others. We present a series of probabilistic topic models that show how multiple modalities of information can be bridged in a unified fashion. Interestingly, joint inference over multiple modalities leads to many findings that cannot be discovered from any one modality alone, which is clear evidence that we can better understand and utilize massive text collections when additional modalities are modeled jointly with text.