Regular processor array implementations lack efficiency due to limitations set by data dependences in order to enable regular data flow. Efficient processor arrays implement data flow of all variables and avoid static variables that require intensive data loads from memory introducing idle processor activity. Most of existing design methods and techniques that map algorithms onto processor arrays are based on linear mappings and just transform the algorithm dependence graphs in space-time graphs. Obtained processor arrays do not reach the required efficiency, producing bubbles when the processor is not performing a reasonable operation in alternative time moments, i.e. producing idle activity. The results in this research show implementations that can eliminate mentioned problems and can reach maximum efficiency, except for processor data load and store activities. The implementations are based on non-linear transformations that include folding, double mapping and fast systolic designs. There are theoretical and experimental proofs which designs can reach the most efficient processor array implementations by introducing the fastest processors array implementations