Acquiring 3D models of real objects and scenes is indispensable in many practical applications, such as digital archiving, the game and entertainment industries, engineering, and advertising. However, a single photograph, while it records the color and brightness of a scene, cannot provide its 3D structure. Fortunately, given many photographs of the scene taken from different viewpoints, the 3D information can be recovered with a technique called multi-view stereo. This image-based technique is easier, cheaper, and faster than active range finding (laser-based techniques), especially for large-scale outdoor scenes. Nevertheless, one disadvantage of the multi-view technique compared to the laser-based technique is its lower accuracy. In this work, we address both accuracy and scalability. We significantly improve several previous multi-view methods and combine them into a remarkably effective GPU-accelerated pipeline. We produce highly complete and accurate meshes that achieve the best scores in many benchmarks. We then develop divide-and-conquer and mesh-merging methods in order to build large 3D models from thousands of high-resolution images.