Apache Drill – Get Started in minutes

What is Apache Drill ?

Schema-free SQL Query Engine for Hadoop, NoSQL and cloud storage.

It is a distributed processing engine that interprets SQL.  Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. ( Source – https://drill.apache.org/)

With Drill users can just query the raw data from its original place.  There isn’t a need to load the data, create and maintain schemas, or transform the data before it can be processed.

Why Drill?

  • You can get started in minutes
  • Schema free JSON model . No ETL needed
  • Query data from its original place
  • Support for ANSI SQL
  • Support for multiple datasources
  • Extremely user and developer friendly
  • Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs

Installation (Mac OSX)

First things first. Lets install Apache Drill “Embedded Mode”. This requires almost no setup configurations.

Download apache-drill-1.5.0.tar.gz from one of the mirror sites :

  • http://www.apache.org/dyn/closer.cgi/drill/drill-1.5.0/apache-drill-1.5.0.tar.gz

You can also download it using curl or wget :

  • wget http://mirrors.sonic.net/apache/drill/drill-1.5.0/apache-drill-1.5.0.tar.gz
  • curl -o apache-drill-1.5.0.tar.gz http://mirrors.sonic.net/apache/drill/drill-1.5.0/apache-drill-1.5.0.tar.gz

Ensure that you have JAVA_HOME set (1.7 is recommended)








Unzip and untar the downloaded file and go to bin folder :

cd /<PATH-TO>/apache-drill-1.5.0/bin

 Drill - Path to Drill

Now we start the drill shell


Drill Shell

Easy setup as we can see. Now that we have the shell running lets run some SQL queries on a json file.

I have a some sample jsons which can be downloaded from: https://github.com/nizantz/drill-example-jsons.git

The files are placed in the sample-data folder of drill

Nishants-MacBook-Pro:sample-data nishant$ pwd
Nishants-MacBook-Pro:sample-data nishant$ ls -lrt
drwxr-xr-x@ 3 nishant  staff   102 Feb  9 15:23 regionsSF
drwxr-xr-x@ 3 nishant  staff   102 Feb  9 15:23 regionsMF
-rwxr-xr-x@ 1 nishant  staff   455 Feb  9 15:23 region.parquet
drwxr-xr-x@ 3 nishant  staff   102 Feb  9 15:23 nationsSF
drwxr-xr-x@ 3 nishant  staff   102 Feb  9 15:23 nationsMF
-rwxr-xr-x@ 1 nishant  staff  1210 Feb  9 15:23 nation.parquet
-rw-r--r--  1 nishant  staff   487 Mar  1 13:49 positions.json
-rw-r--r--  1 nishant  staff   510 Mar  1 13:49 employee.json
-rw-r--r--  1 nishant  staff   160 Mar  1 13:49 departments.json
-rw-r--r--@ 1 nishant  staff    36 Mar  1 13:49 departments.csv
Nishants-MacBook-Pro:sample-data nishant$

From the drill shell , lets query the employee.json :

SELECT * FROM dfs.`/Users/nishant/drill/apache-drill-1.5.0/sample-data/employee.json`;

 Screen Shot 2016-03-01 at 1.57.30 PM

From the drill shell , lets query the positions.json file:

SELECT * FROM dfs.`/Users/nishant/drill/apache-drill-1.5.0/sample-data/positions.json`;  Simple Select ALL on JSON

How about getting the employee names and roles :

SELECT emp.firstName,emp.lastName, pos.roleTitle FROM dfs.`/Users/nishant/drill/apache-drill-1.5.0/sample-data/employee.json` emp , dfs.`/Users/nishant/drill/apache-drill-1.5.0/sample-data/positions.json` pos WHERE emp.roleId = pos.roleId;

Simple Joins

So we could fire a simple join SQL between data in different JSON files.

How about querying two different datasources, like say JSON and MongoDB.

Drill supports MongoDB 3.0, providing a mongodb storage plugin to connect to MongoDB using MongoDB’s latest Java driver.

Am assuming that you already have a MongoDB setup. If no please follow instructions at https://docs.mongodb.org/manual/installation/  to install

Since Drill shell is already running, open the following URL in browser : http://localhost:8047/storage and click on “Enable” button for mongo

Drill Storage Plugin Console

Before Mongo storage plugin is enabled

Screen Shot 2016-03-02 at 12.21.57 AM




After MongoDB storage plugin is enabled

After MongoDB storage plugin is enabled


Screen Shot 2016-03-02 at 12.49.21 AM



In Drill shell , we now will use the mongo.TEST schema. I have a departments table in the mongo schema;

Screen Shot 2016-03-02 at 12.58.28 AM

Lets see the data in the departments table in MongoDB:


Now we can query the employee JSON file to get the employee names and join with departments table in Mongo to get the department names:

SELECT a.firstName||' '||a.lastName,b.deptName FROM dfs.`/Users/nishant/drill/apache-drill-1.5.0/sample-data/employee.json` a,departments b WHERE a.deptId=b.deptId;

Screen Shot 2016-03-02 at 1.03.39 AM

So here we get the employee name from JSON and department name from MongoDB from a single SQL query. Isn’t that cool.

So now you can use Drill to query JSONs , join JSONs, join JSON and MongoDB.

Get going on drilling data!!

More later 🙂

Posted in Big Data, Generic, NoSQL
0 comments on “Apache Drill – Get Started in minutes
1 Pings/Trackbacks for "Apache Drill – Get Started in minutes"
  1. […] Singh explains how to get started with Apache […]

Leave a Reply

Your email address will not be published. Required fields are marked *