Multiple parallel processing cores are about to conquer embedded systems as well. The question is not whether they are coming, but how microcontroller architectures should look given the strict demands of the field. This work presents the step from one core to multiple cores, establishing coherence and consistency for different types of shared memory by both software and hardware means. Point-to-point synchronization between the processor cores is also realized. Although the theoretical approach, which uses simulations, is independent of the number of processing units, the practical examinations focus on the logical first step from single-core to dual-core systems. Best-case and worst-case results, together with intensive benchmarking of all implemented synchronization primitives, show the expected superiority of the hardware solutions.