Our method proceeds step by step. It first learns feature representations of the key facial components. It then guides the corresponding components of an input sketch toward the underlying component structures, which are defined by feature vectors extracted from a set of facial component samples. Finally, a second deep neural network learns the mapping from the embedded component features to realistic images, using multi-channel feature maps as intermediate results to improve information flow.
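The paragraph does not say how a sketch component's feature vector is guided toward the structure defined by the sample features. One plausible realization, assumed here and not taken from the text, is a locally linear projection: find the K nearest sample feature vectors, solve for affine reconstruction weights as in locally linear embedding, and replace the input feature with the weighted combination of those neighbors. The function `project_to_samples` and its parameters `k` and `reg` are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x - y for x, y in zip(a, b)]


def _solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def project_to_samples(
    query: Sequence[float],
    samples: Sequence[Sequence[float]],
    k: int = 3,
    reg: float = 1e-3,
) -> Tuple[Vector, Vector]:
    """Hypothetical helper: pull `query` toward the structure spanned by its
    k nearest sample features using an LLE-style locally linear
    reconstruction. This technique is an assumption, not the paper's stated method."""
    # k nearest samples by squared Euclidean distance.
    neighbors = sorted(samples, key=lambda s: _dot(_sub(query, s), _sub(query, s)))[:k]
    k = len(neighbors)  # tolerate fewer samples than requested
    diffs = [_sub(query, nb) for nb in neighbors]
    # Local Gram matrix of the query-to-neighbor offsets.
    C = [[_dot(di, dj) for dj in diffs] for di in diffs]
    trace = sum(C[i][i] for i in range(k))
    eps = reg * trace if trace > 0 else reg
    for i in range(k):
        C[i][i] += eps  # regularize: C is singular when k exceeds the feature dimension
    w = _solve(C, [1.0] * k)
    total = sum(w)
    w = [wi / total for wi in w]  # normalize so the weights sum to 1 (affine combination)
    projected = [sum(w[j] * neighbors[j][d] for j in range(k)) for d in range(len(query))]
    return projected, w
```

For example, `project_to_samples([1.0, 1.0], [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]], k=2)` returns `[1.0, 0.0]` with weights `[0.5, 0.5]`: the off-structure query is pulled onto the segment spanned by its two nearest samples. The sum-to-one constraint keeps the result inside the affine hull of the neighbors, and the trace-scaled regularizer keeps the local system solvable when the neighbors are collinear or outnumber the feature dimensions.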