This book gives an overview of VLSI technology constraints and introduces the architecture of the Trident processor, which aims to overcome these constraints and to satisfy the requirements of future applications. It also proposes a multi-level instruction set architecture (ISA) that expresses fine-grain data parallelism directly to the Trident processor, instead of spending a huge transistor budget on extracting that parallelism dynamically. Because the fundamental data structures of a wide variety of data-parallel applications are scalars, vectors, and matrices, the proposed Trident processor extends a scalar ISA with vector and matrix instruction sets to process such applications effectively. Like vector microarchitectures, the Trident processor consists of a set of parallel lanes (each lane contains a set of vector pipelines and a slice of a register file) combined with a fast scalar core. However, the Trident processor can effectively process not only vector but also matrix data on its parallel lanes.
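
The idea of a multi-level ISA spread over parallel lanes can be pictured with a small Python model. This is only an illustrative sketch, not the Trident instruction set: the lane count, the interleaved mapping of elements to lanes, and the names `scalar_add`, `vector_add`, and `matrix_add` are all assumptions made for the example, and the loops run sequentially where hardware lanes would operate in parallel.

```python
NUM_LANES = 4  # hypothetical lane count; the source does not give one


def scalar_add(a, b):
    """Scalar level: one operation on one pair of operands."""
    return a + b


def vector_add(x, y, num_lanes=NUM_LANES):
    """Vector level: one operation applied to every element of two vectors.

    Assumed mapping: lane k handles elements k, k + num_lanes, k + 2*num_lanes, ...
    """
    assert len(x) == len(y)
    out = [None] * len(x)
    for lane in range(num_lanes):              # lanes would run concurrently in hardware
        for i in range(lane, len(x), num_lanes):
            out[i] = scalar_add(x[i], y[i])
    return out


def matrix_add(A, B, num_lanes=NUM_LANES):
    """Matrix level: one operation applied to every row of two matrices,
    each row being handled as a vector on the same set of lanes."""
    assert len(A) == len(B)
    return [vector_add(row_a, row_b, num_lanes) for row_a, row_b in zip(A, B)]
```

In this model a single `matrix_add` call stands in for what would otherwise be many scalar operations, which is the sense in which a higher-level instruction states the parallelism up front rather than leaving it to be discovered at run time.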