MongoDB Queries

MongoDB Queries

Created By: Debasis Das

This page is dedicated to be a one stop shop for all mongodb related queries.
This page is good for a quick refresher for mongodb queries for beginners or relatively experienced mongo developers.

All the below queries were run on a mongo instance running on a Mac OSX (Mavericks)

Exporting the path to the location of the mongo folder

mongoFolderLocation
export PATH=/mongoFolderLocation/mongodb/bin:$PATH

Starting the Mongo Instance

Debasiss-MacBook-Pro-2:~ debasisdas$ mongod

mongod --help for help and startup options
2014-10-07T15:34:22.453-0700 [initandlisten] MongoDB starting : pid=15341 port=27017 dbpath=/data/db 64-bit host=Debasiss-MacBook-Pro-2.local
2014-10-07T15:34:22.453-0700 [initandlisten] 
2014-10-07T15:34:22.454-0700 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
2014-10-07T15:34:22.454-0700 [initandlisten] db version v2.6.3
2014-10-07T15:34:22.454-0700 [initandlisten] git version: 255f67a66f9603c59380b2a389e386910bbb52cb
2014-10-07T15:34:22.454-0700 [initandlisten] build info: Darwin mci-osx108-3.build.10gen.cc 12.3.0 Darwin Kernel Version 12.3.0: Sun Jan  6 22:37:10 PST 2013; root:xnu-2050.22.13~1/RELEASE_X86_64 x86_64 BOOST_LIB_VERSION=1_49
2014-10-07T15:34:22.454-0700 [initandlisten] allocator: system
2014-10-07T15:34:22.454-0700 [initandlisten] options: {}
2014-10-07T15:34:22.454-0700 [initandlisten] journal dir=/data/db/journal
2014-10-07T15:34:22.455-0700 [initandlisten] recover : no journal files present, no recovery needed
2014-10-07T15:34:22.534-0700 [initandlisten] waiting for connections on port 27017

Connecting to Mongo Shell

Debasiss-MacBook-Pro-2:~ debasisdas$ mongo

MongoDB shell version: 2.6.3
connecting to: test
Server has startup warnings:
2014-10-07T15:34:22.453-0700 [initandlisten]
2014-10-07T15:34:22.454-0700 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
>

Show Databases

> show dbs
admin      (empty)
blog       0.078GB
company    4.077GB
knowstack  0.078GB
local      0.078GB
m101       0.078GB
school     0.078GB
students   0.078GB
test       0.078GB

Using a database

> use knowstack
switched to db knowstack

Show Collections

> show collections
employees
people
system.indexes
system.profile

drop an existing collection

> db.employees.drop()
true

Create a New Collection

> db.employees.insert({"employee_id":"1000","first_name":"John","last_name":"Doe","email":"JohnDoe@knowstack.com","phone_number":"408-000-0000","salary":713077,"year_exp":30,"designation":"Senior Director","department":"Finance"})
WriteResult({ "nInserted" : 1 })

findOne

> db.employees.findOne()
{
	"_id" : ObjectId("54346fc025a2c172bb14ea9b"),
	"employee_id" : "1000",
	"first_name" : "John",
	"last_name" : "Doe",
	"email" : "JohnDoe@knowstack.com",
	"phone_number" : "408-000-0000",
	"salary" : 713077,
	"year_exp" : 30,
	"designation" : "Senior Director",
	"department" : "Finance"
}

Removing a record from a Mongo Collection

> db.employees.remove({"employee_id" : "1000"})
WriteResult({ "nRemoved" : 1 })

Insert Data into a Mongo Collection from a JSON

Debasiss-MacBook-Pro-2:mongo_python debasisdas$ mongoimport -d knowstack -c employees < kstackemployees.json
connected to: 127.0.0.1
2014-10-07T16:04:39.028-0700 check 9 31
2014-10-07T16:04:39.028-0700 imported 31 objects

Create Index

db.students.ensureIndex ({student_id:1})
This creates an index on student_id in collection students in ascending order

> db.employees.ensureIndex({employee_id:1})
{
“createdCollectionAutomatically” : false,
“numIndexesBefore” : 1,
“numIndexesAfter” : 2,
“ok” : 1
}

Creating indexes on multiple keys
db.students.ensureIndex ({student_id:1,age:-1})
student_id is ascending
age is descending

Drop Index

> db.employees.dropIndex({first_name:1})
{ “nIndexesWas” : 3, “ok” : 1 }

Discover Index

db.system.indexes.find()

> db.employees.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "knowstack.employees"
	},
	{
		"v" : 1,
		"key" : {
			"employee_id" : 1
		},
		"name" : "employee_id_1",
		"ns" : "knowstack.employees"
	},
	{
		"v" : 1,
		"key" : {
			"first_name" : 1
		},
		"name" : "first_name_1",
		"ns" : "knowstack.employees"
	}
]

Creating Multi-Index

> db.employees.ensureIndex({first_name:1,last_name:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.employees.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "knowstack.employees"
	},
	{
		"v" : 1,
		"key" : {
			"first_name" : 1,
			"last_name" : 1
		},
		"name" : "first_name_1_last_name_1",
		"ns" : "knowstack.employees"
	}
]

Creating a Unique Index

> db.employees.ensureIndex({employee_id:1},{unique:true})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"ok" : 1
}
> db.employees.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "knowstack.employees"
	},
	{
		"v" : 1,
		"key" : {
			"first_name" : 1,
			"last_name" : 1
		},
		"name" : "first_name_1_last_name_1",
		"ns" : "knowstack.employees"
	},
	{
		"v" : 1,
		"unique" : true,
		"key" : {
			"employee_id" : 1
		},
		"name" : "employee_id_1",
		"ns" : "knowstack.employees"
	}
]

Creating a Unique Index and drop duplicates

db.employees.ensureIndex({employee_id:1},{unique:true,dropDups:true})

Creating a Unique Index using Sparse Option

db.employees.ensureIndex({employee_id:1},{unique:true,sparse:true})

Using Hint Option

db.employees.find().sort({employee_id:1}).hint({employee_id:1})

explain()

TO BE DONE

Stats and Index Size

db.employees.stats()
{
	"ns" : "knowstack.employees",
	"count" : 30,
	"size" : 14624,
	"avgObjSize" : 487,
	"storageSize" : 40960,
	"numExtents" : 2,
	"nindexes" : 3,
	"lastExtentSize" : 32768,
	"paddingFactor" : 1,
	"systemFlags" : 0,
	"userFlags" : 1,
	"totalIndexSize" : 24528,
	"indexSizes" : {
		"_id_" : 8176,
		"first_name_1_last_name_1" : 8176,
		"employee_id_1" : 8176
	},
	"ok" : 1
}
> db.employees.totalIndexSize()
24528
> 

Index Cardinality

Regular Index, 1:1
Sparse Index, The number of indexes is less than the number of documents as the null values are not indexed.
MultiKey, > number of Documents as it also indexes the contents of the tags array.

Using hint

hint({a:1,b:1,c:1})

If we want to use no index and use the cursor that goes through each document
hint({$natural:1})

Geospatial Index

ensureIndex({‘location':’2d’})
db.places.find({location:{$near:[74,140]}})
db.places.find({location:{$near:[74,140]}}).limit(3)

Geospatial Spherical

db.places.find({location:{$near:{$geometry:{type:”point”, coordinates:[-122.166641,37.42]},$maxDistance:2000}}})

db.stores.find({loc:{$near:{$geometry:{type:”Point”, coordinates:[-130,39]},$maxDistance:1000000}}})

Text Based Indexes

db.collectionname.ensureIndex({‘keyOnWhichIndexIsApplied':’text’})

Searching
db.collectionname.find({$text:{$search:’yourSearchCriteriaText’}})

Text Based Indexes, textScore

db.collectionname.find({$text:{$search:’text1 text2 text3′}},{score:{$meta:’textScore’}}).sort({score:{$meta:’textScore’}})

Profiling

db.system.profile.find({ns:/knowstack.employees.}).sort({ts:1}).pretty()

> db.getProfilingLevel()
0
> db.getProfilingStatus()
{ "was" : 0, "slowms" : 100 }
> db.setProfilingLevel(1,200)
{ "was" : 0, "slowms" : 100, "ok" : 1 }

> db.getProfilingStatus()
{ "was" : 1, "slowms" : 200 }
db.system.profile.find({millis:{$gt:1000}}).sort({ts:-1})

Mongostat & mongotop

mongotop will show the places where mongo is spending most of the time.

Aggregation in Mongodb

If the structure of the employee collection is as follows
> db.employees.findOne()
{
	"_id" : ObjectId("54347187dbbd79a1510148b3"),
	"employee_id" : "1000",
	"first_name" : "John",
	"last_name" : "Doe",
	"email" : "JohnDoe@knowstack.com",
	"phone_number" : "408-000-0000",
	"salary" : 713077,
	"year_exp" : 30,
	"designation" : "Senior Director",
	"department" : "Finance"
}
And we want to group by the designation and take the count of employees in each designation
> db.employees.aggregate([{$group:{_id:"$designation",num_employees:{$sum:1}}}])
{ "_id" : "Director", "num_employees" : 2 }
{ "_id" : "Senior Manager", "num_employees" : 3 }
{ "_id" : "Manager", "num_employees" : 7 }
{ "_id" : "Developer", "num_employees" : 17 }
{ "_id" : "Senior Director", "num_employees" : 2 }

similarily if we want to group the employees by the department the query will be as follows

> db.employees.aggregate([{$group:{_id:"$department",num_employees:{$sum:1}}}])
{ "_id" : "Operations", "num_employees" : 2 }
{ "_id" : "Information Systems", "num_employees" : 27 }
{ "_id" : "Finance", "num_employees" : 2 }
> db.employees.aggregate([{$group:{_id:"$department"}}])
{ "_id" : "Operations" }
{ "_id" : "Information Systems" }
{ "_id" : "Finance" }
> db.employees.aggregate([{$group:{_id:"$designation"}}])
{ "_id" : "Director" }
{ "_id" : "Senior Manager" }
{ "_id" : "Manager" }
{ "_id" : "Developer" }
{ "_id" : "Senior Director" }

using compound aggregate

> db.employees.aggregate([{$group:{_id:{"department":"$department","designation":"$designation"},num_employees:{$sum:1}}}])
{ "_id" : { "department" : "Operations", "designation" : "Director" }, "num_employees" : 1 }
{ "_id" : { "department" : "Finance", "designation" : "Director" }, "num_employees" : 1 }
{ "_id" : { "department" : "Information Systems", "designation" : "Senior Manager" }, "num_employees" : 3 }
{ "_id" : { "department" : "Operations", "designation" : "Senior Director" }, "num_employees" : 1 }
{ "_id" : { "department" : "Information Systems", "designation" : "Manager" }, "num_employees" : 7 }
{ "_id" : { "department" : "Information Systems", "designation" : "Developer" }, "num_employees" : 17 }
{ "_id" : { "department" : "Finance", "designation" : "Senior Director" }, "num_employees" : 1 }

Aggregation expressions

$sum
$avg
$min
$max
$push
$addToSet
$first
$last

In the below sample we will run a query to get the following
1. list designations
2. count of employees in each designation
3. total salary for each designation
4. average salary for each designation

> db.employees.aggregate([{$group:{_id:{"designation":"$designation"},num_employees:{$sum:1},sum_salary:{$sum:"$salary"},avg_salary:{$avg:"$salary"}}}])
{ "_id" : { "designation" : "Director" }, "num_employees" : 2, "sum_salary" : 1732139, "avg_salary" : 866069.5 }
{ "_id" : { "designation" : "Senior Manager" }, "num_employees" : 3, "sum_salary" : 1233914, "avg_salary" : 411304.6666666667 }
{ "_id" : { "designation" : "Manager" }, "num_employees" : 7, "sum_salary" : 4680630, "avg_salary" : 668661.4285714285 }
{ "_id" : { "designation" : "Developer" }, "num_employees" : 17, "sum_salary" : 6932954, "avg_salary" : 407820.82352941175 }
{ "_id" : { "designation" : "Senior Director" }, "num_employees" : 2, "sum_salary" : 1618724, "avg_salary" : 809362 }

$avg

The below query derives the average years of experience of employees in each role/designation

> db.employees.aggregate([{$group:{_id:{"designation":"$designation"},avg_experience:{$avg:"$year_exp"}}}])
{ "_id" : { "designation" : "Director" }, "avg_experience" : 28.5 }
{ "_id" : { "designation" : "Senior Manager" }, "avg_experience" : 23 }
{ "_id" : { "designation" : "Manager" }, "avg_experience" : 18.571428571428573 }
{ "_id" : { "designation" : "Developer" }, "avg_experience" : 8.647058823529411 }
{ "_id" : { "designation" : "Senior Director" }, "avg_experience" : 30 }

$addToSet in aggregate queries

> db.employees.aggregate([{$group:
... {_id:{"department":"$department"},
... designations:{$addToSet:"$designation"}
... }}])
{ "_id" : { "department" : "Operations" }, "designations" : [ "Director", "Senior Director" ] }
{ "_id" : { "department" : "Information Systems" }, "designations" : [ "Senior Manager", "Manager", "Developer" ] }
{ "_id" : { "department" : "Finance" }, "designations" : [ "Director", "Senior Director" ] }
> 

$push in aggregate

  db.employees.aggregate([{$group:
... {_id:{"department":"$department"},
... designations:{$push:"$designation"}
... }}])
{ "_id" : { "department" : "Operations" }, "designations" : [ "Senior Director", "Director" ] }
{ "_id" : { "department" : "Information Systems" }, "designations" : [ "Developer", "Manager", "Developer", "Senior Manager", "Developer", "Developer", "Manager", "Senior Manager", "Developer", "Developer", "Developer", "Manager", "Manager", "Developer", "Manager", "Developer", "Developer", "Developer", "Developer", "Manager", "Developer", "Developer", "Senior Manager", "Developer", "Developer", "Developer", "Manager" ] }
{ "_id" : { "department" : "Finance" }, "designations" : [ "Senior Director", "Director" ] }
> 
As you can see in the above result set there is repetition of the values in the designations. Push does not check for duplicates.

$max and $min in aggregate

The below sample code will derive the maximum and minimum salary by designation

   db.employees.aggregate([{$group:{_id:{"designation":"$designation"},max_salary_by_designation:{$max:"$salary"},min_salary_by_designation:{$min:"$salary"}}}])
{ "_id" : { "designation" : "Director" }, "max_salary_by_designation" : 891581, "min_salary_by_designation" : 840558 }
{ "_id" : { "designation" : "Senior Manager" }, "max_salary_by_designation" : 868923, "min_salary_by_designation" : 173005 }
{ "_id" : { "designation" : "Manager" }, "max_salary_by_designation" : 987655, "min_salary_by_designation" : 219682 }
{ "_id" : { "designation" : "Developer" }, "max_salary_by_designation" : 941026, "min_salary_by_designation" : 60314 }
{ "_id" : { "designation" : "Senior Director" }, "max_salary_by_designation" : 905647, "min_salary_by_designation" : 713077 }
> 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Recent Posts


Hit Counter provided by technology news