The cost of transporting goods and services is a topic of broad interest in today's society. The capacitated vehicle routing problem (CVRP), which is considered in this research, is one of the variants of the vehicle routing problem. In this research we develop a reinforcement learning technique to find optimal routes from a depot to a set of customers while respecting the capacity of the vehicles, in order to reduce the cost of transporting goods and services. Our basic assumptions are that each vehicle starts at a depot, serves its customers, and returns to the depot, and that the vehicles are homogeneous. We solve the CVRP with three approaches: an exact method (column generation), Google's operations research tools (OR-Tools), and reinforcement learning, and we compare their solutions. Our objective is to solve large-scale instances of the vehicle routing problem to optimality.