Apache Hive is a data warehouse infrastructure built on top of Hadoop. It supports data analysis, querying, and summarization through HiveQL (HQL), a language syntactically close to SQL. The goal of this document is to help you take advantage of this tool by highlighting some key aspects (storage formats, user-defined functions, data sampling) and by providing code examples that you can easily reuse.