N-dimensional permutation is an important operation in many large-scale, data-intensive and scientific applications. Examples include the oil industry (e.g. seismic data processing), nuclear medicine (e.g. 3D and 4D computed tomography and positron emission tomography), media production (e.g. 3D TV and 4D cinema), digital signal processing, and business intelligence (e.g. OLAP cubes). This book proposes an efficient parallel in-place n-dimensional permutation algorithm. The algorithm is based on a novel 3D transpose algorithm that was invented and published by IBM in 2008. As a proof of concept, the proposed algorithm was implemented in CUDA on an NVIDIA GTS 250 GPU and tested on 3D, 4D, 5D, 6D and 7D data sets. It combines the logical and physical permutation approaches. In addition, it exploits the high bandwidth of the fast on-chip memory, which improved performance considerably. This improvement shortens the execution time of applications that depend on permutation. This research was submitted to the Faculty of Engineering, University of Alexandria, in partial fulfillment of the requirements for the degree of M.Sc. in Computers and Systems Engineering.