Kirill Zonov

Chop-chop, MongoDB! Leveraging indexes power. Part 1

November 06, 2017 | 9 Minute Read

As you may know, PostgreSQL provides you four index types: B-tree, Hash, GiST and GIN. They all named the way that if you don’t know ‘em you’ll never get which one do you need. In MongoDB indexes are named in a more human-readable way. Here they are:

  1. Single field index.
  2. Compound index.
  3. Multikey index.
  4. Text index.
  5. Hashed index.
  6. 2dsphere, 2d, geoHaystack indexes. Since I’m using Mongo for more than a year now, I worked with few of them and will elucidate you the most commonly used ones.

Hedgehog who loves Mongo

I’ll start with the most common one:

Single field index

It’s pretty obvious, nah? You have a collection, e.g. Users, they have emails. You use this field to search them, so you have to make this field indexed. And most likely when you’ll search for them by email, you won’t need to filter by other fields. So long story short - it’s a best case to use single field index. And here how is it looks:

db.users.createIndex( { email: 1 } )

As you may encounter, there is a 1 in the expression. It means that you create an index in ascending order. If you provide -1, you’ll create a descending index. Honestly, it makes no difference in case of single field index, so just consider using only ascending. If you want to get the hang of why is it - here is some explanation. Mongo uses B-Tree structure to build an index. So the searching time is always O(log(n)). And it doesn’t matter, what you’ll have on your rightest node - zzzzz@somemail.com or aaaaa@somemail.com, it will always take the same time to fetch it. Same works for ranges, if your records are close in ascending index, they will also be close to each other in descending one. And if they are in different buckets - it will be slower anyway. No more tricks here, let’s go further.

Compound index

I bet you have no idea, what does this index intend to index. What, you have?! No waaay. Ok, let’s imaging again our precious users. They have country code and city fields. Both are strings (honestly doesn’t matter for our case, just why not?). And we are building e.g. a promo website for fancy but big coffee shop network with social elements. So you want to suggest your clients when they enter your website to join other locals from their city in their coffee party in one of your cafes. You may think that I’m overcomplicating and you don’t need multikey index here, you can just index city and it would be enough. Huh, not so fast! Suppose I’m from Moscow, Russia and you want to show me my neighbors (kinda joke, Moscow has 10M+ inhabitants, so they wouldn’t be really my neighbors, but anyway). But you also have a cafe in Moscow, Idaho, USA. Bang! So, if no more doubts, let’s create an index:

db.users.createIndex( { country: 1, city: 1 } )

Try to answer the question, can you swap the order of country and city keys in the expression above? (you better do not) As you see - you can also set ascending/descending order here. It doesn’t matter if you intend to search for users in a specific city in a specific country. But if you will want to search for users in a subset of countries with sorting in ascending order and subset of cities, sorted in descending order, but your city indexed in ascended order - here is what will happen:

> db.customers.find().sort({"country": 1, "city": 1}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.customers",
		"indexFilterSet" : false,
		"parsedQuery" : {
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
	"country" : 1,
	"city" : 1
},
"indexName" : "country_1_city_1",
"isMultiKey" : false,
"multiKeyPaths" : {
	"country" : [ ],
	"city" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
	"country" : [
		"[MinKey, MaxKey]"
	],
	"city" : [
		"[MinKey, MaxKey]"
	]
}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "595d0c8f0fb2",
		"port" : 27017,
		"version" : "3.4.10",
		"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
	},
	"ok" : 1
}
> db.customers.find().sort({"country": 1, "city": -1}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.customers",
		"indexFilterSet" : false,
		"parsedQuery" : {
		},
		"winningPlan" : {
			"stage" : "SORT",
			"sortPattern" : {
"country" : 1,
"city" : -1
			},
			"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
	"stage" : "COLLSCAN",
	"direction" : "forward"
}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "595d0c8f0fb2",
		"port" : 27017,
		"version" : "3.4.10",
		"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
	},
	"ok" : 1
}

Looks verbose, but don’t be afraid, the most important thing here is that if you have an index in ascending order and try to sort in the descending - you just won’t be using the index at all :( So always pay attention to this! One more cool thing here:

> db.customers.find().sort({"country": 1}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.customers",
		"indexFilterSet" : false,
		"parsedQuery" : {
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
	"country" : 1,
	"city" : 1
},
"indexName" : "country_1_city_1",
"isMultiKey" : false,
"multiKeyPaths" : {
	"country" : [ ],
	"city" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
	"country" : [
		"[MinKey, MaxKey]"
	],
	"city" : [
		"[MinKey, MaxKey]"
	]
}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "595d0c8f0fb2",
		"port" : 27017,
		"version" : "3.4.10",
		"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
	},
	"ok" : 1
}

I currently don’t have an index for countries explicitly. But because I have the compound index for country and city - I can use it to search only for countries. The same works if you have more than two parts in the compound index. F.e. if you have {a: 1, b: 1, c: 1} index, you can search in {a:1, b: 1} and this index will be used. Lovely.

I wanted to cover the whole topic of Mongo’s indexes in this post, but it turned to be too cumbersome, so I decided to split it into a few posts. In further topics, I’ll cover text indexes, hashed indexes, indexing array values, indexing embedded documents and maybe will go somewhere deeper. If you enjoy this post - please like and share it, I luv it ^__^ UPD. Here you can read my next post about indexes in MongoDB, this time about Multikey indexes.