Data-parallel applications, including scientific and engineering computing, multimedia, networking, and security, are growing in importance and demand ever-higher performance from hardware. At the same time, the exponential progress of fabrication technology and continuous improvements in transistor density now allow tens of billions of transistors to be integrated onto a single chip. This book therefore proposes three matrix processor microarchitectures that exploit this huge transistor budget to improve the performance of data-parallel applications: the simple matrix processor (SMP), the simple super-matrix processor (SSMP), and the multithreaded simple super-matrix processor (ThrSSMP). The book also explains in detail how the proposed SMP, SSMP, and ThrSSMP designs are implemented in VHDL, targeting the Virtex-6 FPGA device XC6VLX550T-2FF1760. Finally, the performance of SMP, SSMP, and ThrSSMP is evaluated on a set of vector/matrix kernels from the Basic Linear Algebra Subprograms (BLAS).