Accessing and cataloging data makes it possible to use and connect to new analytical techniques and services, such as predictive analytics, data visualization, and artificial intelligence. In information technology, big data refers to a set of methods and tools for processing large volumes of structured and unstructured, dynamic, heterogeneous data so that it can be analyzed and used for decision support. To capture all the complex data streaming into their systems from various sources, businesses have turned to data lakes. Often hosted in the cloud, these are storage repositories that hold enormous amounts of data, raw or refined, structured or unstructured, until it is ready to be analyzed. A well-architected data lake should provide an environment for building data science models using open source languages such as R, Python, or Scala. Strong integration with open repositories is a must for recommending the best algorithm for a given use case.
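To make the idea of cataloging concrete, here is a minimal Python sketch of an in-memory catalog that records whether each dataset in a lake is raw or refined and structured or unstructured. It is an illustration only: the `DatasetEntry` and `Catalog` classes, their field names, and the `s3://example-lake/...` locations are all hypothetical, and a production data lake would rely on a dedicated catalog service rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetEntry:
    """One cataloged dataset (all field names are illustrative assumptions)."""
    name: str
    location: str     # hypothetical storage path or URI
    zone: str         # "raw" or "refined"
    structured: bool  # True for tabular data, False for logs, images, etc.
    source: str       # system the data arrived from


class Catalog:
    """A toy in-memory catalog keyed by dataset name."""

    VALID_ZONES = ("raw", "refined")

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Reject entries whose zone is not one of the two known values.
        if entry.zone not in self.VALID_ZONES:
            raise ValueError(f"unknown zone: {entry.zone!r}")
        self._entries[entry.name] = entry

    def find(
        self,
        zone: Optional[str] = None,
        structured: Optional[bool] = None,
    ) -> List[DatasetEntry]:
        # A filter left as None matches every entry.
        matches = [
            e for e in self._entries.values()
            if (zone is None or e.zone == zone)
            and (structured is None or e.structured == structured)
        ]
        return sorted(matches, key=lambda e: e.name)


# Example usage with made-up datasets.
catalog = Catalog()
catalog.register(DatasetEntry(
    "clickstream", "s3://example-lake/raw/clickstream/", "raw", False, "web"))
catalog.register(DatasetEntry(
    "sales_daily", "s3://example-lake/refined/sales_daily/", "refined", True, "erp"))

for entry in catalog.find(zone="refined", structured=True):
    print(entry.name, entry.location)
```

Keeping the zone and structure flags in the catalog, rather than inferring them from storage paths, lets an analyst find analysis-ready data without knowing how the lake is laid out.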