+N Consulting, Inc.

MongoDB - Don't be so (case) sensitive!

I have a collection of people, named peeps:

db.peeps.insert({UserName: 'BOB'})
db.peeps.insert({UserName: 'bob'})
db.peeps.insert({UserName: 'Bob'})
db.peeps.insert({UserName: 'Sally'})

And I want to be able to find the user named “Bob”. Except I don’t know if the user name is lower, upper or mixed case:

db.peeps.find({UserName:'bob'}).count() // 1 result
db.peeps.find({UserName:'BOB'}).count() // 1 result
db.peeps.find({UserName:'Bob'}).count() // 1 result

But I don’t really want to require my users to type the name the same case as was entered into the database…

db.peeps.find({UserName:/bob/i}).count() // 3 results

Me: Yey, Regex!!!

MongoDB: Ahem! Not so fast… Look at the query plan.

db.peeps.find({UserName:/bob/}).explain()
// ugg, collection scan:
// "winningPlan" : {
// "stage" : "COLLSCAN", ...

Me: Oh, I’ll create an index!


db.peeps.createIndex({UserName:1});
// query again...
db.peeps.find({UserName:/bob/}).explain()
// Ah! IXSCAN!
// "winningPlan" : {
// "stage" : "FETCH",
// "inputStage" : {
// "stage" : "IXSCAN", ...

// ignore case, still uses index
db.peeps.find({UserName:/bob/i}).explain()
// Ah! IXSCAN!
// "winningPlan" : {
// "stage" : "FETCH",
// "inputStage" : {
// "stage" : "IXSCAN", ...

Me: Yey!!!

MongoDB: Dude, dig deeper… and don’t forget to left-anchor your query.

// Run explain(true) to get full blown details:
db.peeps.find({UserName:/^bob/i}).explain(true)
// "executionStats" : {
// "executionSuccess" : true,
// "nReturned" : 3,
// "executionTimeMillis" : 0,
// "totalKeysExamined" : 4, // <<<=== Oy! Each key examined.
// "totalDocsExamined" : 3,

Me: Yey?

MongoDB: Each key in the index was examined! That’s not scalable… for a million documents, mongo will have to evaluate a million keys.

Me: But, but, but…

db.peeps.find({UserName:/^bob/}).explain(true)
//"executionStats" : {
// "executionSuccess" : true,
// "nReturned" : 1,
// "executionTimeMillis" : 0,
// "totalKeysExamined" : 1, // <<<=== Ok,
// "totalDocsExamined" : 1, // <<<=== Not Ok...

Me: This is back to exact match :-) Only one document returned. I want case insensitive match!

Old MongoDB: ¯\(ツ)… Normalize string case for that field, or add another field where you store a lowercase version just for this comparison, then do an exact match?

Me:

New MongoDB: Dude: Collation!

Me: Oh?

Me: (Googles MongoDB Collation frantically…)

Me: Ahh!

db.peeps.createIndex({UserName:-1}, { collation: { locale: 'en', strength: 2 } )

db.peeps.find({UserName:'bob'}).collation({locale:'en',strength:2}) // 3 results!

// "executionStats" : {
// "executionSuccess" : true,
// "nReturned" : 3,
// "executionTimeMillis" : 0,
// "totalKeysExamined" : 3, // <<<=== Good! Only matching keys examined
// "totalDocsExamined" : 3,

Me: Squee!

MongoDB: Indeed.


Collation is a very welcome addition to MongoDB.

You can set Collation on a whole collection, or use it in specific indexing strategies.

The main pain point it solves for me is the case-insensitive string match, which previously required either changing the schema just for that (ick!), or using regex (index supported, but not nearly as efficient as exact match).

Beyond case-sensitivity, collation also addresses character variants, diacritics, and sorting concerns. This is a very important addition to the engine, and critical for wide adoption in many languages.

Check out the docs: Collation

Notice

We use cookies to personalise content, to allow you to contact us, to provide social media features and to analyse our site usage. Information about your use of our site may be combined by our analytics partners with other information that you’ve provided to them or that they’ve collected from your use of their services. You consent to our cookies if you continue to use our website.