The aggregate command (and its predecessor, find) takes some time to get used to. This is seen as a barrier to entry both for those coming from a relational background and for those who are early in their journey of MongoDB adoption.
With all the new AI and awesome natural language tools, can we get some help for generating queries? “Yes!” says the Compass team.
Announced at the .Local event in London, Compass is rolling out a natural language text prompt which translates your natural language input into a properly stated query using either the find syntax or an aggregation pipeline (MQL).
I ran some experiments to kick the tires on this feature.
The following are prompts and results I ran against a collection with documents like the one below. Documents have different field values. The fields in the contact sub-document are sparse: a document may have one, two, or all of the fields cell, email, and fax. If missing, these fields are absent (not assigned null).
{ |
How many Elmers are there?
[ |
Success. The query generated answers the prompt as intended. Note that the name of the field is nowhere in the prompt. The tool understood that we're looking for Elmers (a proper name, plural) and figured it should look in the field name. Nice!
What are the 3 most common names of prospects with the status “customer” who have an email in the system?
[ |
Correct. The AI understood the AND filter for documents with the status set to "customer" and an email existing. It used the common trope for "top N by V", which groups by the value of the field (name) and produces a count per grouping, then $sorts by the count descending, then $limits to the intended number.
I noted that the generated queries use the old-school {$sum: 1} trick to produce the count. Understandable, since Compass may be used against older MongoDB versions that lack the $count group accumulator operator, but this is a bit more cryptic for the untrained reader of the resultant pipeline.
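The two spellings are interchangeable on MongoDB 5.0 and later. Here's a plain-JavaScript sketch (not mongosh; the sample names are made up) of what either group stage computes:

```javascript
// Two equivalent spellings of a per-group count in an aggregation:
const oldStyle = { $group: { _id: "$name", count: { $sum: 1 } } };
const newStyle = { $group: { _id: "$name", count: { $count: {} } } }; // 5.0+

// A tiny simulation of what either stage computes over sample documents:
function groupCount(docs) {
  const counts = {};
  for (const d of docs) counts[d.name] = (counts[d.name] || 0) + 1;
  return counts;
}

const docs = [{ name: "Elmer" }, { name: "Dana" }, { name: "Elmer" }];
console.log(groupCount(docs)); // { Elmer: 2, Dana: 1 }
```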
Let’s try some more:
Which customers have an odd id number, and a name that starts with D?
[ |
Nicely interpreted. The $mod operator is a bit obscure, but fits the bill. The name is compared with a regex match on the expression /^D/. Left-anchored regex matches are the lesser evil compared to non-anchored matches, and can use indexes. But maybe a range expression such as {name: {$gte: "D", $lt: "E"}} would perform just as well?
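For capitalized names the two filters select the same documents. A quick plain-JavaScript sketch (the sample names are made up) shows why the range form can stand in for the anchored regex:

```javascript
// Left-anchored regex vs. a range filter for "name starts with D".
// The range form maps to index bounds [ "D", "E" ), much like the
// anchored regex does for capitalized names.
const names = ["Dana", "Dave", "Elmer", "Carl", "Derek"];

const byRegex = names.filter(n => /^D/.test(n));
const byRange = names.filter(n => n >= "D" && n < "E");

console.log(byRegex); // [ 'Dana', 'Dave', 'Derek' ]
console.log(byRegex.join() === byRange.join()); // true
```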
I decided to change one little thing in the query: switch from an upper-case D (which, by the way, is how my data is - names are capitalized in the dataset) to a lower-case d.
Which customers have an odd id number, and a name that starts with d?
[ |
Interestingly, the tool decided to add a $project stage to the pipeline. Re-reading the prompt, I can see that I under-specified which fields to return. But what caused the previous query to not $project at all and this one to do so is a mystery to me. Not strictly wrong, but a mystery nonetheless.
A fix to this is to explicitly specify what to show. The prompt Which customers have an odd id number, and a name that starts with d? Give me their name labeled "Customer Name" and their id labeled "Customer Id" only. indeed generated the expected pipeline.
[ |
The projection renames the fields as the prompt described.
My standing conclusion is to be as explicit and concise as possible with the prompt. I guess this should be expected.
How likely are people to have both a cell phone and a fax number?
[ |
Chalk this one up to "not correct". The prompt's intent was to generate a ratio or number representing the chance or likelihood of a document containing both fields. The generator understood the AND condition correctly in the $match stage. But it failed to produce an aggregation that compares the count of documents matching the criteria vs. those which do not.
Many people speak English as a second language. But many do not. How about typing in your native tongue?
I gave it a whirl with Hebrew. It is not remotely a popular language, so I can't imagine the team built specific support for it.
This query asks to find those who are named "יוסי".
מצא לי מי ששמם יוסי
{"name": "יוסי"} |
The generator correctly figured out the pluralized word for "name" in my query, and determined the field to search against is name. It understood the value sought was "יוסי", and kept the text as is - again, correct. It did not try to translate the value into English (Yossi), but that's a good thing. I would expect that if I knew the name was in English, I'd maybe type a hybrid question such as
מצא לי מי ששמם Elmer
{"name": "Elmer"} |
Compass did the needful and produced my expected query. A note here is that BiDi editing isn't perfect... Hebrew is written right to left, and the textbox moved the word "Elmer" to the very "end" of the sentence from an English perspective, but a native Hebrew speaker would expect it on the very left - the end of the sentence from a right-to-left perspective. But the results were correct, and I won't argue with "it worked."
How about some aggregation? The Hebrew query asks how many people have an email but not a cell phone.
לכמה אנשים יש דואל אבל אין סלולרי
[ |
Correct, again. The Hebrew prompt used the acronym for "email" (דואל) and the common word for "mobile phone" or "cell phone", which is "cellular" (סלולרי). Not only did it match the terms to the fields cell and email, it also correctly figured out that the fields live in the contact sub-document and generated the dot-path to them.
I have reservations about using natural language to describe queries. For one, spoken language was not designed to be concise, and the range of misinterpretation that can be caused by imprecise language, as well as ambiguities within the language itself, can yield wild results. I was surprised how well this new feature performed. Color me impressed. Not only did the generator create my intended queries, it picked up on nuances I did not imagine it would.
Can there be test cases where it comes up with the wrong answer? Yeah, sure. But it performed remarkably well on what I gave it, and in doing so proved it is suitable for a wide enough audience with a wide enough set of reporting and analysis needs.
Am I going to use this daily? Time will tell. For simple queries? Maybe. For more complex ones? Maybe also - I can see it creating a skeleton query for me to approve or refine. Does this replace learning and knowing the breadth and depth of MQL? No. But for newcomers and occasional analysis this may just do the trick.
Give it a try, see what you think!
Wildcard indexes have improved in several ways in MongoDB 7.0.
Notably, wildcard indexes can now be compound, including additional non-wildcard fields.
Previously, a wildcard index acted only as a single-field index on each individual target field, with those fields stored together in a single index.
Newly supported is the ability to include both wildcard fields and named individual fields in the same index.
This capability supports common use cases such as the attribute-pattern document design, where a sub-document contains a set of keys but the presence of the keys varies from document to document.
Consider the case of the following documents describing a prospect record in a CRM system:
{ _id: 35, name: 'Francis', status: 'customer', contact: {} } |
One document may have both a cell and a fax, while another has only a cell phone number. A wildcard index on contact.$** would allow a single-field match against one of the contact fields.
But a query filtering on the status field and one of the wildcard fields would not be supported without a compound wildcard index.
First, create an index by supplying both a wildcard expression and one or more additional plain (non-wildcard) fields.
db.prospect.createIndex({ "contact.$**": 1, status:1 }, {name: "idx_compound_wild"}) |
As with other compound indexes, the key structure supports key prefixes that include the first, or first and second fields.
This query leverages the compound index:
db.prospect.find({'contact.cell':'07408 926850', status:'customer'}) |
The query includes a full prefix of the index: a single wildcard field, contact.cell, and the non-wildcard field status.
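My mental model of the keys such a compound wildcard index stores - a hypothetical sketch for illustration, not the actual on-disk format - is a composite of the wildcard path, the value at that path, and the trailing non-wildcard field. A full-prefix query pins all three:

```javascript
// Hypothetical composite keys for { "contact.$**": 1, status: 1 }:
// [ $_path, value-at-path, status ]. Illustrative only.
const keys = [
  ["contact.cell", "07408 926850", "customer"],
  ["contact.cell", "07700 900123", "lead"],
  ["contact.fax", "0121 062 9173", "customer"],
];

// The query { 'contact.cell': '07408 926850', status: 'customer' }
// pins the full key prefix, so the scan is maximally narrow:
const hits = keys.filter(
  ([path, value, status]) =>
    path === "contact.cell" && value === "07408 926850" && status === "customer"
);
console.log(hits.length); // 1
```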
Running .explain(true) on this query shows a winning plan stating that our index named idx_compound_wild was used. The indexBounds entry shows that both the contact.cell and status fields were considered when processing the query against the index.
"winningPlan": { |
Index intersection is a strategy where more than one indexed field is stated in a filter, and the result is computed by performing a set intersection on the keys of two indexes.
Intersection seems to make sense when two separate simple indexes are at play on two separate fields. What about wildcard indexes? Will the MongoDB query planner choose to "intersect" the same wildcard index with itself somehow? Can we expect it to be smart about scanning for two values, one found under the path contact.cell, the second under a different path, contact.fax - both of which are contained in the same index? The short answer: no!
Consider the following query:
db.prospect.find({ 'contact.email': 'madtiwvab@what.not', 'contact.fax': '0121 062 9173'}) |
The query filters on two separate wildcard fields, email and fax.
The query planner shows the following output:
"winningPlan": { |
The above is a bit awkward. The filter term mentions the contact.email field alone. The indexBounds field mentions the contact.fax field only. Though one might intuit that the strategy is to use the index to scan for contact.fax first, then filter the entries by contact.email, it is neither clear that this is the case nor expected. One would expect that the index contains the list of documents under either of the keys, and therefore an AND query would hit the index key structure twice and do some nested-loop join or something similar.
Let’s try specifying three of the wildcard fields in the index, and see if things become any clearer:
db.prospect.find({ |
The plan shown below seems to imply the same general strategy: the index bounds mention only one of the three query terms, and the filter mentions the other two.
"winningPlan": { |
Well, the plan is not lying. MongoDB does indeed choose only one of the query terms to serve as the value sought in the index. It evaluates the selectivity of the terms in the query, and picks one it deems most suitable.
This still doesn't answer the question of why it doesn't compare the other terms against the index as well. Which brings up the topic of index intersection. MongoDB introduced an implementation of index intersection at some point, then walked back its default behavior to a degree that pretty much prevents intersection from ever being exercised. It turns out the performance gains were rather low for intersection candidates compared to other strategies. Worse: the planner would often pick intersection in cases where other strategies would have performed far better. In other words: index intersection turned out to be the wrong answer too often, and would actually hurt performance.
But this is a single index, you say - wouldn't intersection make sense in this particular wildcard scenario? The way I like to think about it, a wildcard index is actually several indexes in a trench coat pretending to be a single index. Each wildcard field's values are processed into their own index structure, but all of those structures are stored together as one. A hint to this notion is that the planner outputs the $_path term, stating which wildcard field within the index is targeted.
The addition of non-wildcard fields to the index would then shape the key prefixes to contain the additional non-wildcard value so that the index is more selective and the (theoretical) need for intersection is reduced.
Compound wildcard indexes are a welcome addition to MongoDB 7.0. Having no intersection mechanism doomed many queries to only partially use indexes and resort to loading documents based on a single field, then filtering them further. With this feature, the number of documents loaded can be reduced. Reducing I/O should help many workloads. Whether it does for your workload is subject to - as always - experimentation and monitoring. Give it a try, hope it helps!
For limitations and further details, see Compound Wildcard Indexes.
This is often the case when it comes to network access security: we are quick to open up network access "just for now", but fail to close things down later. This increases the exposure surface of the protected resource (a MongoDB Atlas cluster, for example) indefinitely. But it doesn't have to be that way.
TL;DR: Use expiring network access permissions for IPs instead of permanent ones.
For a while now, Atlas has had a simple measure that helps clean up temporary access grants. Instead of a network access permission lasting until you explicitly delete it, you can set it to expire and be removed automatically.
Flip the toggle at the bottom left to enforce a time limit on the access grant. Pick a time frame: 6 hours, one day, or a week - the shorter the better. Atlas will automatically clean up the entry and close the IP access after that duration elapses.
Some might ask “But what if I want to keep it open longer?”
Here’s my cheat-sheet:
IP Address Space | Access Type | Auto-Expire? |
---|---|---|
A. Private (10. …, 192.168. …) | Production Application | No |
B. Private (10. …, 192.168. …) | Interactive | Yes! |
C. Public IP / Block | Interactive | Yes! |
D. Public IP / Block | Production Application | Yes! |
This cheat sheet takes the paranoid position "Don't allow public IP access to Atlas." But paranoia is good for security.
Here are some supporting thoughts. They all share the following core belief:
Sacrifice convenience to gain security
With that in mind, let's talk about interactive vs. application access.
An interactive user is a human running MongoDB Compass, or the shell, or a program during development.
This is the danger zone. "I just need to do my job now" motivates allowing such traffic. But tomorrow is not now anymore... better to expire it. If you are the grantor (you have an Atlas role allowing you to add the network entry), then you can just as well grant yourself permissions again after expiration. Sacrifice convenience.
If the interactive user does not have access-control privileges, the nuisance of provisioning access is higher. But then the question arises: why is access to Atlas done over the public IP?
Why expire an interactive IP anyway? Because it provides another surface for would-be attackers. Interactive users' machines are exposed to compromise via channels such as email, malware, random downloads, and just plain old neglect.
An application is a program or process running unattended.
Seek to route traffic via either peering or a private link. This would turn the traffic into non-public traffic, and therefore fall under use case #1.
Alternatives to automatic expiration have existed for a while. Shun the "I'm personally responsible, so it's OK for me to have permanent access" argument.
If the DBaaS or database doesn't auto-expire, a good sysadmin or DevOps practice would automate tearing down old entries. Still valid: the Atlas management API allows creating and removing network access entries, so that's a viable option.
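As a sketch of what such automation might submit, here is a payload for an access-list entry that self-expires. The shape follows the Atlas project IP access list API's deleteAfterDate convention, but treat the exact field names as assumptions to verify against the current Atlas documentation:

```javascript
// Build an access-list entry that expires in 6 hours.
// (Field names follow the Atlas project IP access list API as I
// understand it - verify against the Atlas docs before relying on them.)
const expireAt = new Date(Date.now() + 6 * 60 * 60 * 1000);
const entry = {
  ipAddress: "203.0.113.7", // example public IP from the TEST-NET-3 range
  comment: "temporary interactive access",
  deleteAfterDate: expireAt.toISOString(),
};
console.log(entry.deleteAfterDate > new Date().toISOString()); // true
```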
Manual "expiration" entails manual auditing and manual removal of such entries. This is both error-prone and less reliable in general. But if that is all your organization is willing to do for security - so be it. Chalk that one up as "acceptable within parameters".
As you can surmise by now, the approach in this post strongly shuns permanence, especially for public IP space.
What about permanence within private space? Those entries should also be expired! But in this case, automatic expiration is a bit trickier. A production application losing connectivity is a risk not many would take. There can be arrangements that either audit and expire entries with no observed traffic, or renew/re-provision access on a cadence as long as some sign of life is detected. But such automation is a chance for error, with higher risk to the availability of the production application. So let practicality win here: revert to manual auditing of such entries. Yes, they are internal traffic, but still another open vector.
Bottom line: Your data is priceless, consequences are high - keep it as safe as possible. Sacrifice convenience to gain security.
Schedule a periodic rollup and aggregation of historic data into a collection using $merge, then run ad-hoc reports that query cumulative live data for the current period and $unionWith the historic data.
We often need to produce reports summarizing sales over time which include the current partial period, such as "sales this year to date". While it's perfectly fine to run ad hoc queries on the data set, this can become a performance issue when report frequency increases, data sets become extremely large, or the periods are very long (e.g., all sales to date since inception).
It's been common practice to pre-aggregate historic data and persist it to some collection. This lets us query only the most recent, yet-to-be-rolled-up period and add it to the historic data queried from the cumulative collection. A trivial concept, nothing terribly new here. So where's the catch? Details (it's always the details…):
sequenceDiagram
    autonumber
    participant a as Query
    participant ms as [monthlySales]
    participant s as [Sales]
    a->>+ms: January - July?
    ms-->>-a: <>
    a->>+s: Current Month?
    s-->>-a: < >
    a-->>a: Union
The implementation of this technique relies on two high-level operations: the $merge and $unionWith pipeline stages.
The technique MongoDB used to offer for persisting historic data was mapReduce, in the old days. Very old days. Using mapReduce allowed you to persist the results into a collection entirely server-side. Before (or without) that, you'd need client code to compute a result set, then turn around and explicitly write it back into MongoDB.
With the introduction of the Aggregation Framework, more efficient query operators arrived, as well as the $out pipeline operator. The issue with $out, though, was that it did not play very well with sharded clusters and did not allow incremental data insertion. Instead, $out wiped the output collection clean and replaced its content with the new data. This implied that for a long history you would have to compute again over a growing number of documents - not very efficient, and it exacerbates the problem we were trying to avoid in the first place. You could of course run a query across just one year, say, and store the result into a yearly collection. But then summing across years would require more and more collections to be queried independently by the client, and you'd end up with many more collections just to handle this case. Not elegant.
Now, with $merge, the story becomes manageable again. $merge lets you perform incremental updates to an existing collection, so your historic data can largely remain untouched, save for the newest addition or update to the latest period.
In the example below, the sales collection contains a document per sale, with total and date fields:
{ "date" : ISODate("2021-08-10T23:54:01.028Z"), "total" : NumberDecimal("178.4349") } |
To produce the sum of sales and the count of sales for August, we run an aggregation, directing the result into the monthlySales collection:
db.sales.aggregate([ |
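The pipeline body is elided above; as a stand-in, here is a plain-JavaScript simulation of the rollup's grouping step (the year-month key format and output field names are my assumptions for illustration):

```javascript
// Group sale documents by year-month, summing totals and counting sales -
// roughly what a $group on a month-truncated date would compute.
function rollupMonthly(sales) {
  const out = {};
  for (const s of sales) {
    const key = s.date.toISOString().slice(0, 7); // e.g. "2021-08"
    out[key] = out[key] || { totalSales: 0, saleCount: 0 };
    out[key].totalSales += s.total;
    out[key].saleCount += 1;
  }
  return out;
}

const sales = [
  { date: new Date("2021-08-10T23:54:01Z"), total: 178 },
  { date: new Date("2021-08-11T10:01:00Z"), total: 22 },
];
console.log(rollupMonthly(sales)); // { '2021-08': { totalSales: 200, saleCount: 2 } }
```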
The shape of the document in monthlySales is verbatim the document(s) produced by the pipeline stage preceding $merge:
// db.monthlySales.findOne() |
What happens, though, if it is still August? A previous value for August may already exist. We have to address what should be done when a document with the same _id is not present, as well as what to do if it is.
When a document is not present, $merge defaults to inserting it. But you can also tell it to discard the new document, or to produce an error. For example, you may want to prevent accidental creation of documents in the target collection, so that nothing is written unless an approved, well-known document already exists. The full universe of choices is expressed via the whenNotMatched field, with a value of insert, discard, or fail. Note that in the case of fail, other documents may already have been inserted, so the state is somewhat non-deterministic - the operation is not wrapped in a transaction.
This example shows silently discarding any documents for which $merge found no match in the target collection. Someone would have had to create a document for August 2021; otherwise, the result for August is effectively ignored by $merge and nothing will be written.
{$merge: { |
When a document is matched, you have a few more choices. The default is to replace the existing document. The full list of options: replace, keepExisting, merge, fail, and pipeline. You can choose to keep the existing document to preserve things as they were. Replace and fail are pretty straightforward as well.
Specifying whenMatched: 'merge' will slam the fields from the new document into the existing document.
Document | Value |
---|---|
Existing | {_id:1, name: 'bob', age: 33} |
New Document | {_id:1, age: 34, height: 1.74 } |
Resultant Document | {_id:1, name: 'bob', age: 34, height: 1.74 } |
An existing field’s value is replaced and previously non-existing fields will be created. Fields not present in the new document which existed before are left as is.
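The table above maps directly onto JavaScript's object spread. A sketch of the whenMatched: 'merge' semantics on plain objects:

```javascript
// whenMatched: 'merge' behaves like a shallow field-by-field overlay:
// incoming fields win, new fields are added, untouched fields survive.
const existing = { _id: 1, name: "bob", age: 33 };
const incoming = { _id: 1, age: 34, height: 1.74 };

const merged = { ...existing, ...incoming };
console.log(merged); // { _id: 1, name: 'bob', age: 34, height: 1.74 }
```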
That's good for having multiple separate calculations affect some destination fields. But for an incremental update, we want to combine the value of the existing field in the target collection with a value from the new calculation on the origin collection.
Given the document in an origin collection, and a corresponding document already in the destination collection:
db.origin.insert({ _id:1, x:2}) |
We can now set the destination document's field x to the sum of its existing value and the value of field x from the preceding pipeline stage ($match, for simplicity, here):
db.origin.aggregate([{$match: {}}, {$merge: {into: 'destination', whenMatched: [{$set: {x: {$add: ["$x","$$new.x"]}}} ]}}] ); |
Using the special pipeline variable $$new, we can distinguish the incoming field x from the existing field x in the destination document.
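A plain-JavaScript sketch of what that whenMatched pipeline computes for one matched pair of documents (the sample values are made up):

```javascript
// "$x" resolves against the matched destination document;
// "$$new.x" against the incoming document from the pipeline.
function mergeWithPipeline(destinationDoc, newDoc) {
  return { ...destinationDoc, x: destinationDoc.x + newDoc.x };
}

console.log(mergeWithPipeline({ _id: 1, x: 3 }, { _id: 1, x: 2 })); // { _id: 1, x: 5 }
```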
The pipeline used to merge into existing documents can only contain the stages $addFields, $set, $project, $unset, $replaceRoot, or $replaceWith. This mini-pipeline only ever sees a single input document from the destination collection - the one that matched - so other pipeline operators really don't make a lot of sense. Only single-document modifiers are allowed - or at least that's how I make sense of it.
With all this syntax background, let's turn back to our original problem. We want cheaper queries over historical periods, yet combined with recent on-the-fly values. Storing the past data can be done at some frequency - monthly, weekly, or daily. Given that we can incrementally update the destination collection, we're free to pick a frequency that makes sense.
To combine documents for a partial month, though, we need to query two collections: the historic sales in the monthlySales collection, and the results of a month-to-date query computed on the fly from sales. It would be nice to do it in one command to MongoDB. Historically, we would fire two queries and combine the results client-side. But that's extra code someone had to write, and it makes populating generic graphs more involved.
With the introduction of $unionWith, a single aggregation command can now return results from more than one collection! Harnessing this for our scenario, we query the "main" sales collection for the current month-to-date, then union the results with a pipeline over monthlySales for the already-computed documents since the beginning of the year:
db.sales.aggregate([ |
Mic drop. A single query processed all on the server side, giving us everything we need in one shot.
Which leaves us with the small matter of granularity. How much of the work do we want to do on the fly vs. just digging up the computed data?
If we populate monthlySales monthly, after the end of each month, we'll need to query up to 31 days' worth of data on the ad-hoc side. If we update it daily, we'll have less than a day's worth to sum on the fly, with the onus of determining exactly what cutoff window separates the current data from the already-merged data. Whatever we choose, we just need to make sure the boundaries of the time windows don't overlap, so we don't count the same sale too many or too few times.
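One simple way to keep the boundaries safe is to treat every window as half-open, [start, end). A sketch (the dates are made up):

```javascript
// Half-open windows [start, end) count each sale exactly once:
// the rollup covers [monthStart, cutoff) and the ad-hoc query [cutoff, now).
const inWindow = (ts, start, end) => ts >= start && ts < end;

const monthStart = new Date("2021-08-01T00:00:00Z");
const cutoff = new Date("2021-08-25T00:00:00Z");
const now = new Date("2021-08-25T13:30:00Z");

// A sale stamped exactly at the cutoff lands in exactly one window:
const sale = { date: new Date("2021-08-25T00:00:00Z") };
console.log(inWindow(sale.date, monthStart, cutoff)); // false
console.log(inWindow(sale.date, cutoff, now)); // true
```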
To round things off: you might want to start accumulating pre-calculated data, but you already have a slew of individual documents in your live collection. You could write a client-side script to iterate over the months and years of the past. Or you could run a query to populate monthlySales, like so:
db.sales.aggregate([ |
There are more nuances, bells, and whistles that come along with $unionWith and $merge, but with the above techniques we can already cover a good portion of the historic+current scenarios. You will of course need a recent MongoDB version - $merge was introduced in 4.2, and $unionWith in 4.4.
Time series data is a fairly common use case and has been “solved” before. Indeed, in blog posts around the net as well as in my course on schema design and data modeling, a common approach to storing such data was discussed at length.
But not everyone knows that bit of common wisdom. Further, it’s easy to get caught up in the storage details and make silly little mistakes that wreck the exercise.
Someone at MongoDB had the brilliant idea to simplify this further (I should buy them a drink of their choice if I find out who it was). Instead of you painstakingly bucketing each "reading" or time series item, then un-bucketing it or peppering your queries with time-bucket-boundary match clauses - why not provide a new collection type for time series that does the tricky bits for you?
As far as the client is concerned, every document is a stand-alone point-in-time item which has a timestamp and some measurement - a reading - composed of one or more fields recording your data. If you insert a document, you insert one per reading. You don't have to push into arrays, or compute any time windows, or anything like that.
When reading a document from a time series collection, each point-in-time measurement returns as a single document - not a bucket with an array of readings within a window.
It appears largely un-magical, until and unless you compare it to what you had to do manually before this existed.
The "before" document was a bucket with multiple readings; for a 12-hour bucket with, say, 4 temperature readings in the temperature field per day, you would see something like:
|
Using a time series collection, the readings are presented to you individually:
|
You will note that the bucket granularity is nowhere in the results! You may have chosen buckets spanning seconds (the default), minutes, or hours. But that is a storage choice, and your queries and users don't need to care. This awesome arrangement lets users just be users: insert readings individually as they occur, read and summarize over time as if they were stored individually. On the surface this matches the business expectation and mental model of periodic measurements written and read as they are.
The internals are what's nice here. The "collection" presented to the user (sensings in the example above) is actually more of a view over internal collections that physically exist but that you do not directly interact with. The backing storage collections organize the individual readings into buckets of some fixed time-window size. If you choose hours granularity, there can be up to one bucket per hour. Multiple readings within the same hour all fall into the same bucket, and the next hour's readings fall into the next bucket. Except now you don't worry about structuring your document in any special way - Mongo does it for you!
At the time of writing, the internal physical backing collection is created when you declare your special time series collection. For my example collection named sensings, Mongo has created a backing collection system.buckets.sensings and stores physical documents under that namespace, in buckets.
show collections |
Oh, yes: you have to declare your special collection. If you let Mongo implicitly create a collection by inserting data into a yet-to-be-created collection, Mongo will assume a normal collection. The syntax requires one thing only - the timeField. I like to also include the metaField and granularity parameters, for reasons expanded on below.
To create my sensings collection, storing data at an ingestion rate that fits hours-long buckets, issue the following command:
db.createCollection("sensings", {timeseries: {timeField: "timestamp", metaField: "info", granularity: "hours"}}) |
The collection named sensings will be created (as well as a corresponding system.buckets.sensings behind the scenes). It treats the field named timestamp on your inserted documents as the time for bucketing purposes. It also lets you add extra non-measurement data, or "meta" details, in the field named info. The collection will optimize buckets at a granularity consistent with multiple readings arriving over a span of hours. For higher ingestion rates you should choose minutes or seconds.
What is the business with metaField and granularity about? They let Mongo optimize the bucket sizes internally and organize storage efficiently. The theory behind creating time buckets is that:
When doing things manually, we could shove metadata extras such as a sensor id or patient id into any field. But for Mongo to understand that the metadata is a label and not a metric, and to enable it to create buckets specific to groups of readings, we have to tell it where we store the metadata. In our case, I used the info field. In that field I can store a single value or several different fields. This way, when I query for the average temperature of a particular sensorId, since it is written inside the info field, Mongo knows that and can load buckets for that meta value only. Mongo will try to create a bucket for each time window and for each meta value or combination of values. That way, a query for sensor 123 doesn't need to read buckets for sensor 456.
But the magic ends here, at least for now: Mongo does not automatically create indexes on the meta fields. It leaves that up to you. Mongo does create an index on the bucket-boundary timestamps on the backing collection, but that is usually not enough for prime time. You will no doubt have some narrower interest in slicing the time series by some of the meta fields. I'll add a compound index on my sensorId like so:
db.sensings.createIndex({"info.sensorId": 1, "timestamp": 1}) |
My query can now look to get the average temperature across the devices on a certain day, leveraging the index I just created:
db.sensings.aggregate([ |
A query chased with an explain shows the index is used:
db.sensings.find({"timestamp" : ISODate("2021-09-01T00:18:00Z"), "info.sensorId" : 123}).explain() |
The inputStage section shows an IXSCAN leveraging my index named info.sensorId_1_timestamp_1.
It is interesting to compare the index definitions on the nominal collection sensings versus the internal collection system.buckets.sensings.
The nominal collection shows what you expect: an index on the direct structural fields timestamp and the embedded info.sensorId:
> db.sensings.getIndexes() |
But the backing collection shows a different picture:
> db.system.buckets.sensings.getIndexes() |
The 2-field index I specified to create was actually created as a 3-key field behind the scenes. The timestamp - a single field in my nominal documents - was translated into a bucket boundary defined by the control.min.timestamp
and control.max.timestamp
fields. Makes sense. The raison d'être of this whole thing is exactly that: your nominal collection facade deals with individual reading documents, but the physical storage of the documents is something entirely different and more efficient than the facade lets on. For a lesson on how the more efficient document is actually structured, I inserted a few documents:
db.sensings.insertMany([ |
Then I chased it with a query on the backing collection system.buckets.sensings
:
> db.system.buckets.sensings.findOne() |
The control
fields describe the bucket itself. They provide boundary info, minimum and maximum timestamps for the bucket. The meta
field contains my fields that I told Mongo I’d store under the original info
field in my inserted documents. In my case that's a single field, but there could be more. It would make little logical sense for Mongo to have a bucket spanning multiple sensors, so Mongo will find or create a bucket for every distinct combination of meta values for a given time period. Finally, the data. Fields not originally contained in the info
field as a sub-document and fields which are not the timestamp itself are stored under the data
field in backing storage. This includes the temperature
- as expected. It also contains the corresponding timestamp that each measurement was taken at. Since every document in Mongo has an _id, the _id of the reading is also stored here. The correlation among the triplet of values is positional, so the first reading in the bucket is spread across data._id[0]
, data.temperature[0]
and data.timestamp[0]
, the next reading in the bucket is ...[1]
and so forth.
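The positional correlation can be illustrated with a small JavaScript sketch that rebuilds nominal reading documents from a bucket's columnar data (illustrative only; the shapes loosely follow the findOne() output above):

```javascript
// Illustrative: rebuild individual reading documents from a bucket's
// columnar data fields, correlating values by array position.
function readingsFromBucket(bucket) {
  return bucket.data._id.map((id, i) => ({
    _id: id,
    timestamp: bucket.data.timestamp[i],
    temperature: bucket.data.temperature[i],
    info: bucket.meta, // the meta value is shared by every reading in the bucket
  }));
}

const bucket = {
  meta: { sensorId: 123 },
  data: {
    _id: [1, 2],
    timestamp: ["2021-09-01T00:05:00Z", "2021-09-01T00:10:00Z"],
    temperature: [71, 72],
  },
};
const docs = readingsFromBucket(bucket);
console.log(docs[1].temperature); // 72
```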
Back to my aggregation from before: db.sensings.aggregate([{$match: {timestamp: {$gte: ISODate("2021-09-01"), $lt: ISODate("2021-09-02")}}}, {$group: {_id: "$info.sensorId", avgTemperature: {$avg: "$temperature"}}}] )
. Nowhere did I include any positional magic, unwinds, or extra filtering logic. MongoDB did the work for me, taking my straightforward mental model of the nominal collection - one reading per document - and doing all the heavy lifting. Life is good.
In the course of writing this post and poking around, I found this bit interesting: when you run collection stats on the nominal collection you get a new section named timeseries
:
> db.sensings.stats().timeseries |
The new section gives some clues that the optimization, and the salient metrics to watch, have to do with buckets: their management in terms of "opening" and "closing" them with respect to the metadata, the number of items, the total size of the bucket, the time boundaries, and memory pressure. Pretty interesting, but I have no further insight into how to interpret these numbers. Time will tell (rim-shot, exit stage).
What's next for Time-Series? I don't know. Now that the abstraction and use case have been introduced, the team can presumably focus on even more efficient storage, or on ways to process time-series queries more efficiently. Judging from the past, this is how things unfold: a feature is introduced, which "makes it a thing", then future iterations make it even better and more useful. I, for one, see this as a great new feature.
Fine then: the document structure - the schema if you will - is still sound.
Where am I going with this? If the document is not broken, then you should not “touch” the whole document. You should use a surgical update, one that only sends a command to modify a field inside the document.
In MongoDB, the update
command takes 2 forms:
In the Document Replacement form, you supply the update
command with a full document value which will replace the whole document in the database. Doing so will logically target a single document (by _id
or some alternate key) and supply the future document in its entirety.
db.peeps.update({_id: 'superbob'}, {name: 'bob', likes: ['coffee','bacon']}}) |
The update
above slams in a new document with only the _id, name, and likes fields remaining. Beyond losing any previous document field values, this update also assumes you intended to replace both the name and the likes. This may be true. But usually it's just that in order to update one thing - such as adding 'bacon' to the list of things 'superbob' likes - you had to include previous unchanged values such as the name. Two problems here: one, you need to read the value from the database ahead of the update; two, the value you just read may be stale - updated by someone else - by the time you write back into the database.
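The stale read-modify-write hazard can be sketched in a few lines of JavaScript (an in-memory model, not driver code): two clients read the same document, each sends back a full replacement, and the last write silently discards the other's change:

```javascript
// Sketch of the stale read-modify-write hazard with full-document replacement.
let stored = { _id: "superbob", name: "bob", likes: ["coffee"] };

const clientA = { ...stored, likes: [...stored.likes, "bacon"] }; // read + modify
const clientB = { ...stored, likes: [...stored.likes, "eggs"] };  // concurrent read + modify

stored = clientA; // A replaces the whole document
stored = clientB; // B replaces it again, so A's 'bacon' is lost

console.log(stored.likes); // [ 'coffee', 'eggs' ]
```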
The second - and my preferred - way to update a document is using surgical updates. If all I want to do is add ‘bacon’ to the list of likes, I can issue the command:
db.peeps.update({_id: 'superbob'}, {$addToSet: {likes: 'bacon'}}) |
This form will add 'bacon' to the likes array only if it isn't already there, and will leave every other field in the document untouched.
Which brings me to the point of this post: What about REST?
REST APIs use HTTP methods to represent the action taken. The usual suspects are GET, POST, PUT, and DELETE. DELETE is probably the least controversial one. GET gains some bad-boy street cred over whether queries should be specified solely using the URI path or with query variables. POST and PUT generate lots of lengthy discussions about whether PUT should or can create original objects, and whether, in the context of databases, it is permissible to return the value of the created or modified object in full, or at least its id or URI. Lots of discussions. But my focus here is on update: I'd like to avoid having to query the current document value, and I'd like to issue a change for only a subset of the fields on my entity - umm, document.
HTTP’s PATCH method aims to do exactly that. It lets a caller supply a future state of some of the fields, against a background of the existing, current entity. The docs state:
“The difference between the PUT and PATCH requests is reflected in the way the server processes the enclosed entity to modify the resource identified by the Request-URI. In a PUT request, the enclosed entity is considered to be a modified version of the resource stored on the origin server, and the client is requesting that the stored version be replaced. With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version.”
This has “Surgical Update” written all over it, and maps naturally to the intent of the surgical update. And with MongoDB, some operators are idempotent such as $addToSet
, $set
, $unset
.
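As a sketch of how a flat PATCH body might map onto a surgical update, here is a small JavaScript helper. The "null means remove this field" convention is my own illustrative assumption, not a standard:

```javascript
// Sketch: translate a flat PATCH body into a surgical MongoDB update
// document, so only the supplied fields are touched.
function patchToUpdate(patchBody) {
  const set = {};
  const unset = {};
  for (const [field, value] of Object.entries(patchBody)) {
    if (value === null) unset[field] = ""; // assumed convention: null removes the field
    else set[field] = value;
  }
  const update = {};
  if (Object.keys(set).length) update.$set = set;
  if (Object.keys(unset).length) update.$unset = unset;
  return update;
}

console.log(patchToUpdate({ name: "bob", fax: null }));
// { $set: { name: 'bob' }, $unset: { fax: '' } }
```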
Where’s the rub?
PATCH is not always implemented as surgical update. Some API frameworks don’t explicitly support PATCH, and others implement PATCH as a PUT (read: “slam” semantics) instead of the intended surgical semantics.
Some, like Mongoose, support the surgical update semantics explicitly with the Document.prototype.update()
function, or by collecting single-field assignments to an existing document and issuing a concise update. Careful here!* Some frameworks will just load the document first, even when PATCH is supposed to handle things more efficiently, and other blogs* might advise you to build it yourself the same way. This is because Mongoose by default tries to entity-track the state of the document in memory, then do the math to update or create the document backing the entity.
The culprit here is not really the frameworks - it's the consumer. Uninformed or under-curious consumers of the frameworks will cause extra back-end round trips, concurrency issues, or even data loss. There are applications that would not be horribly impacted by these nuanced issues, but I don't like to risk it.
Resolving this requires that you build an update outside of the document instance itself. Create an update command and issue it directly to the underlying database. If using something like mongoose, this means calling updateOne()
on the Model level, something along the lines of the code below. It does not have to find()
nor save()
anything ahead of updating the document, and therefore saves that round trip. Consequently, it also makes no assumptions about the current values of other fields, and leaves any unmentioned field values intact.
// ... in your handler code: |
This handles the simple cases. To build more elaborate manipulation such as adding/removing array items or nested fields, you would want to explore more expressive PATCH parsing so that nuanced intents are clearer. You may want to take a look at jsonpatch.js for inspiration.
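For inspiration, here is a tiny, intentionally partial JavaScript sketch mapping a few JSON Patch (RFC 6902) operations onto MongoDB update operators; the path handling and operator coverage are simplified for illustration:

```javascript
// Sketch inspired by JSON Patch (RFC 6902): map a few patch operations to
// MongoDB update operators. Coverage is deliberately partial.
function jsonPatchToMongo(ops) {
  const update = { $set: {}, $unset: {}, $push: {} };
  for (const op of ops) {
    const path = op.path.replace(/^\//, "").replace(/\//g, "."); // /a/b -> a.b
    if (op.op === "replace" || op.op === "add") {
      if (path.endsWith(".-")) {
        // In JSON Patch, "-" means append to the end of an array.
        update.$push[path.slice(0, -2)] = op.value;
      } else {
        update.$set[path] = op.value;
      }
    } else if (op.op === "remove") {
      update.$unset[path] = "";
    }
  }
  return update;
}

const update = jsonPatchToMongo([
  { op: "replace", path: "/name", value: "bob" },
  { op: "add", path: "/likes/-", value: "bacon" },
  { op: "remove", path: "/contact/fax" },
]);
console.log(update.$push.likes); // 'bacon'
```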
Why doesn’t
update
clarify the semantics? AFAIK: For historic reasons. To remedy this, official MongoDB drivers expose a replaceOne command (exact naming depends on language) which more precisely describes the operation when you intend to slam the existing document, leavingupdate
andupdateOne
to convey surgical updates. This makes reading programs better, but does not prevent you from issuing a slam using update, so you will need to take care of it manually, especially since theupdate
signature still allows for “slam” values to be provided.
Hopefully, this post revealed nothing new. But if you are unsure how your REST API maps to backend DB mutation commands, it’s worth taking a look.
* Examples provided to illustrate the topic of this post only. They are not in any way meant to offend or diminish the contribution of respective referenced codebase authors.
Did you know that MongoDB lets you index a specific positional element in an array? Well, it does! I stumbled across this feature while reading the mongod source code during an unrelated exploration. It can help you create more efficient indexes for specific scenarios.
What is that good for you ask? Consider the following scenario:
Your company makes widgets as kits. Each widget is made of parts - each of which can be also sold separately. When a customer buys a “product” they are actually buying a box of items - a list of parts where the first one is by convention the “widget” and the rest are the parts of that widget. The schema for the things collection might look like this:
{ |
The schema above has a field kit_sku
which contains a unique selling id that folks can order. The parts
field lists all the items in the kit. The company sells other kits featuring the same widget, just with differing extra parts. For example, there might be an obscure manual in Klingon for one kit, and a different kit with instructions in ancient Phoenician (you know, because somebody called and complained that they need one).
The catalog application needs to call up the widget "widget a". Except it doesn't, really. It knows that the widget is sold under the sku abc in a variety of kits, and that the widget is by convention the first item in the array. So the kit_sku
is not useful here.
The query could look something like this:
db.things.find({ 'parts.0.sku': 'abc' }) |
This is a frequent enough query, so we’d want to support it with an index. But while indexing on 'parts.sku'
will produce a usable index, it is not necessarily optimal. Consider that you have a long list of parts for each kit across a large collection. The index would have to contain a key for every part, pointing to all documents. This is both wasteful and misses the point. Wasteful, because memory and disk will contain entries that are never useful to the user: we only ever look for the first element in the parts list. And it misses the point for the same reason - we create this index only to satisfy queries interested in the first element of the array, not any part in the list.
You can create an index that includes a positional path though. Here:
db.things.createIndex({'parts.0.sku':1}) |
The index created above includes the positional marker 0
, telling mongo only to index the sku
of the first element of the parts
array!
Given the index above, running our query will make use of this index, as long as we use the path to the first element.
db.things.find({ 'parts.0.sku': 'abc' }).explain() |
The query plan looks something like this:
"queryPlanner" : { |
The plan above shows a few reassuring features. First and foremost, an index is used for our query, as evident by the winning plan's "IXSCAN"
indicator. This is good.
Second, note the "isMultiKey"
reports false
. When you index an array field, MongoDB creates a multi-key index. A multi-key index includes a pointer to the document for each element in its array, hence the name. A non-multi-key index contains only one pointer to a given document from a single key, since the indexed field holds only one value. This affects query processing: with a multi-key index, Mongo needs to work a bit harder to eliminate duplicates. TL;DR: when you use the positional index on an array element, it is not a multi-key index.
Third thing to note is the path. The index will be useful for queries that include the specific path "parts.0.sku"
. Other queries that do not include the positional path part will NOT be using this index:
db.things.find({ 'parts.sku': 'abc' }).explain() |
The path used in the query above did not target the first positional element in the array, and therefore the query planner chose a full collection scan "COLLSCAN"
. It would not be correct to use the positional array index, because the general query above asks "are there any elements in the array with sku 'abc'?". The positional index we created doesn't include keys for all elements, so it would be "missing" documents if Mongo were to use it for this query.
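The difference can be modeled in a few lines of JavaScript (illustrative only, not how the server actually builds keys): a multi-key index extracts one key per array element, while the positional index extracts a key from the first element only:

```javascript
// Illustrative: contrast the keys a multi-key index on 'parts.sku' would
// hold versus a positional index on 'parts.0.sku', for one document.
function multiKeyEntries(doc) {
  return doc.parts.map((p) => p.sku); // one key per array element
}
function positionalEntry(doc) {
  return doc.parts.length ? [doc.parts[0].sku] : []; // first element only
}

const doc = { kit_sku: "kit-1", parts: [{ sku: "abc" }, { sku: "manual-klingon" }] };
console.log(multiKeyEntries(doc));  // [ 'abc', 'manual-klingon' ]
console.log(positionalEntry(doc)); // [ 'abc' ]
```

A query for any part with sku 'manual-klingon' would find this document through the multi-key entries, but the positional index simply has no key for it, which is why the planner falls back to a collection scan for the general-path query.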
If your query targets a specific known positional array element, this indexing strategy can buy you a performance boost. This index will be smaller than one on a whole array and will be a non-multiKey one, reducing the work the server has to do to process your query.
MongoDB’s support for this and other interesting indexing strategies is pretty phenomenal. This gives DBAs and developers finer surgical indexing capabilities to support a wide range of schema and query scenarios.
$unwind
is the aggregation pipeline stage that you most likely need. $unwind
lets you un-pivot array elements, so that you can perform specific operations on each (or some) array elements independently. In relational databases, we used to need to pivot rows into columns. Say we had an IoT readings
table like this one:
| Device | Measure | Reading | Time |
|---|---|---|---|
| 1 | ‘Temp’ | 42 | 1:00 |
| 1 | ‘Speed’ | 91 | 1:00 |
| 2 | ‘Speed’ | 94 | 1:00 |
| 2 | ‘Temp’ | 33 | 1:00 |
| 2 | ‘Temp’ | 37 | 2:00 |
And we want a query to return results such as:
| Device | Speed | Temp | Time |
|---|---|---|---|
| 1 | 91 | 42 | 1:00 |
| 2 | 94 | 33 | 1:00 |
| 2 | | 37 | 2:00 |
We can achieve that with a SQL pivoting function. The need to pivot stems from the model in which each "soft property" is stored as a row in the data table. We didn't want to create a column for each measure, and we didn't want to add a column to the table each time a new measure is needed.
In MongoDB (and any other document oriented database), the model for a single reading
document might look more like the one below. Each document is a point in time that contains a bunch of measures. If ever a new measure was added - no problem! An extra element can be added to the array with no schema change. If ever a reading had more or fewer readings at a point in time - no problem either. The array contains a set of readings which can grow or shrink.
{ |
With this document model, we would not need to do anything special for the report of readings by hour. The document already contains relevant measurements for each time period.
But if we wanted to calculate the average temperature across the devices, we’d need to do some digging. We don’t want to average every readings.value
field - only the ones labeled 'temp'
.
Using the aggregation framework and $unwind
we can do this in 3 steps:
1. Unwind the readings array.
2. Match only the elements whose name is 'temp'.
3. Group and compute $avg on the values.

Let's do this step by step. First, just unwind the array:
db.readings.aggregate([ |
The result of $unwind
is that for each original document’s readings
array element, a new document will be produces that contains all the fields from the original document, but only one element assigned to the value of readings
. As you can see above, the data type of the field readings
after $unwind
became a single-valued field rather than the original array. Device 1
had 2 elements in the original document, so in the $unwind
output we have 2 documents, each attached to one element from the original array. One document contains a single reading for the measurement named speed
, another document contains a reading for temperature measurement named temp
.
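What $unwind does can be modeled with a short JavaScript function (a sketch of the output shape, not the server implementation; the sample field names are illustrative):

```javascript
// A minimal sketch of $unwind's core behavior in plain JavaScript.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing, null, or empty-array fields emit nothing (the default behavior).
    if (value == null || (Array.isArray(value) && value.length === 0)) continue;
    // Since MongoDB 3.2, a scalar is treated as a single-element array.
    const elements = Array.isArray(value) ? value : [value];
    for (const el of elements) {
      out.push({ ...doc, [field]: el }); // one output document per element
    }
  }
  return out;
}

const docs = unwind(
  [{ device: 1, readings: [{ name: "temp", value: 42 }, { name: "speed", value: 91 }] }],
  "readings"
);
console.log(docs.length); // 2
console.log(docs[0].readings.name); // 'temp'
```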
Next step - we want only the temperature readings, getting rid of any speed
or other measurements that may exist. We can do this with the $match
pipeline stage. $match
filters documents and only emits ones that match the criteria supplied - just like find()
does.
db.readings.aggregate([ |
This got rid of any measurements that were not labeled temp
. With temperature-only values in hand, we can now calculate the average temperature. The $group
pipeline operator will let us calculate an average for each device:
db.readings.aggregate([ |
The behavior of $unwind
evolved over time, matching common usage patterns and expectations the community found useful. Several scenarios and special cases arise when unwinding arrays: the target field may hold a scalar rather than an array, it may be missing altogether, it may be null, or it may be an empty array.
All these need to be handled in some way, and understanding the behavior will help you avoid fluky results.
Since MongoDB version 3.2, $unwind
treats a target field containing a scalar or non-array value as if it were a single element array containing that one non-array value. These two documents are treated the same way by $unwind
:
{ |
Device 3 above has a single reading in the readings
field - not an array. Device 4 has a single-element array for readings
. Since $unwind
treats them the same, we don’t neat fancy $project
and $cond
checks to normalize input documents (Squee!). If we added these documents to the readings collection, and ran the same 3 stage pipeline as before, we’d get the results we expect. Prior to mongo 3.2, a non-array field handed to $unwind
would have produced an error.
db.readings.aggregate([ |
Now what about a missing field, an empty array, or a field containing null? Here are examples of these documents:
{ device: 10, time: ISODate('2019-01-31T06:00:00Z') } |
If we ran $unwind
on those documents, none of them would be emitted past $unwind. The example below passes documents with device
== 10
to $unwind
, but since none of them had a “useful” readings
value to attach to an output document - none of them got emitted:
db.readings.aggregate([ |
This is just a behavior, though. The word "useful" reflects our choice: in our case, we don't want an empty or missing reading to pollute our averaging math. Other use cases may exist where you actually want every document entering $unwind to be emitted forward, even if it doesn't have an array element to contribute. If you want that behavior, you can use the option preserveNullAndEmptyArrays
set to true
. To supply this option though, you would need to use a slightly different argument to $unwind
. Until now, we gave $unwind
a bare value-of-field argument $<field name>
. The more elaborate syntax is {$unwind: { path: '$<field name>'}}
. The following two forms are identical in meaning:
{ $unwind: '$readings' } |
The latter expression is where we can add our preserveNullAndEmptyArrays
option. Putting it all together looks like this:
db.readings.aggregate([ |
This time, a single document was emitted corresponding to each of the incoming device 10 documents. For each of those 3 documents, MongoDB considered that the incoming document has to be emitted out at least once. But what value should the output document have? In the case of the document containing readings: null
, mongo emitted a null in the readings
field. For the ones with missing field and with the array containing no value, $unwind
emitted a document with no readings
field at all. The notion is that a missing field should be preserved - no invention of a non-existing field. As far as an existing but empty array - think of it as “a document with no value there”. Emitting a field would have required a value to be assigned to the field. The value null
is still a value, and is not the same as having no value at all. If Mongo had emitted a {readings: null}
field in response to an empty array, that would be like saying the original document had an array element null
- which is not the case. Bottom line: the behavior is defined in a certain way. We can leverage it when we need to, and we can also manipulate the fields further using other pipeline operators.
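That decision table can be modeled in plain JavaScript (a sketch, not server code) to show both the default behavior and preserveNullAndEmptyArrays:

```javascript
// Sketch of $unwind's handling of missing fields, nulls, and empty arrays,
// with and without preserveNullAndEmptyArrays.
function unwind(docs, field, { preserveNullAndEmptyArrays = false } = {}) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    const isEmpty =
      value === undefined || value === null ||
      (Array.isArray(value) && value.length === 0);
    if (isEmpty) {
      if (!preserveNullAndEmptyArrays) continue; // default: drop the document
      const copy = { ...doc };
      // null is preserved; a missing field or empty array yields no field at all
      if (value !== null) delete copy[field];
      out.push(copy);
      continue;
    }
    for (const el of Array.isArray(value) ? value : [value]) {
      out.push({ ...doc, [field]: el });
    }
  }
  return out;
}

const input = [
  { device: 10 },                 // missing field
  { device: 10, readings: [] },   // empty array
  { device: 10, readings: null }, // null
];
console.log(unwind(input, "readings").length); // 0
console.log(unwind(input, "readings", { preserveNullAndEmptyArrays: true }).length); // 3
```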
The three-step pipeline to calculate average is not very difficult to read, understand, or produce. But the real need is expressed in the $group
stage. The stages for filtering and unwinding are cruft that adds verbiage - but not for nothing.
Tempted to shorten the syntax, one might think to use some of the projection operators, such as $filter
or $arrayToObject
. The $filter
example would go something along the lines of:
db.readings.aggregate([ |
Where we can pluck only the 'temp'
elements out of the readings
array. This works, but must be preceded by something to ensure that readings
is actually an array. Unlike the more forgiving $unwind
, $filter
insists that the input field is an array or it produces an error. That’s why I included a $match
that only passes along documents where that condition holds true (see the BSON spec for type information; 0x04 is Array). This seems almost palatable, but it would still require more syntax to transform the one-element array into something we can use directly for the math. Since the amount of text required is not reduced, and the overall pipeline does not become more readable, I see little reason to go to those lengths.
$unwind
is an important pipeline stage in the document-oriented world. Dealing with arrays and aggregation will undoubtedly get you using $unwind
sooner or later. In a previous post we focused on data scenarios where this all made sense. In this post, we focused more on the balance between syntax and the meaning of the stage and related operators. Either way, it’s well worth learning the nuanced behavior of this key pipeline stage.
But for applications spanning multiple entities with broader usage patterns this could prove cost prohibitive: each document type or entity would naturally be designed to go into its own collection. But with the minimum price for the smallest collection around $30 a month for 400 RU, an enterprise would think twice before designing an application that used Cosmos as a full back end, because low-usage collection data would start adding up in the solution cost. If nothing else, this pricing strategy flew in the face of the code-first benefits which document-oriented databases promise.
Worse, this pricing led some to use a single collection for all entities of all types. While that move saved money by not over-provisioning collections for small, infrequent documents, it fostered no separation of concerns, and did not promote separation of concerns in code the way microservices would have.
This is all in the past!
Cosmos DB now lets you provision database throughput at the database level. Collections created under that database will then share the cost based on that provisioned RU level. You can still provision specific collections with specific RU to meet your needs. But if you do not, then all the collections in that database will share the provisioned throughput.
To do this, you will start with creating a Cosmos DB account. Once an account is created, you can go to the data explorer and then hit the “New Database” link at the top.
In the dialog, type the name of your database and check the "Provision throughput" checkbox.
This will reveal a secondary input where you can enter the desired throughput. The minimum provisioning is 400RU, but the default number popping in that box is 1000RU. Choose the throughput level your application demands - that’s something you have to determine on your own.
That’s it! Collections created under the shared-throughput database will share the cost and throughput provisioned at the database level. No more kludge code to force all documents of all types into one collection. No more skimping on collection creation for low-volume / low-frequency data to save a buck. Sharing the throughput lets you architect and code your application without weirdness stemming from the pricing model.
Kudos to the team that made this possible.
TL;DR - MongoDB 4.0 supports multi-document transactions!
When you have 2 mutations (write operations), each affecting 1 document, Mongo used to apply each write as an independent commit. Consequently, there was a point in time in which mutation 1 was applied but 2 was not. If 2 failed for whatever reason, 1 was still applied.
This behavior caused some pain in attempting to manage all-or-nothing style operations which affect multiple documents. For example, if you had to lend a book to a person, you might have wanted the book marked as lent out, and the library visitor to have the book appended to their lent-out list.
If marking the book succeeded but marking the visitor failed, then the book is lent to no one. If marking the visitor first and failing to mark the book as lent, it can be double-lent. Further, there is a point in time in which the visitor “has” the book, but the book is not yet “held”.
Now, there are potential structural solutions to this problem in a document-oriented world, but let’s go with transactions. The newly available transaction support lets you wrap a transaction around the 2 operations. I like thinking of a Mongo transaction in terms of visibility: mutations made inside the transaction are not visible outside it until the transaction commits, and if the transaction aborts, none of them ever become visible.
Speaking of visibility, one of the core changes that needed to occur in the Mongo engine is marking OpLog entries with a global logical cluster time. Why is that important? Because transactions are really about controlling the visibility of written items across the Replica Set. The implementation of point-in-time reads is another key piece of the puzzle. This feature provides a view for reading such that the document states visible to the reader are of the same version they were when the read operation started. Modifications occurring during a long-running operation would not be exposed to the reader, so a consistent view is ensured.
To scope several operations into a transaction, Mongo relies on the already available session implementation. Sessions existed in 3.6, so the leap is smaller. A session groups a sequence of commands and is tracked throughout the cluster, so it is already suitable for the notion of a transaction. All that was added is syntax for a developer to pass a session into the commands themselves: all method signatures that mutate data now accept a session handle in the latest official drivers. From a client perspective (using a supported driver version), creating a transaction looks like this:
Mongo will control the visibility of the constituent operations according to the session settings.
Consider this session initiation C# code:
Demo code based on .Net driver mongodb.driver version 2.7.0
using (var session = client.StartSession()) |
A few things are readily apparent from this small code structure.
A session is a disposable object, so proper disposal is guaranteed by a using block.
A session by itself is not a transaction. We explicitly start a transaction by calling the StartTransaction()
method. Within a session, only one transaction may be “live” at a given time. Since we are within a using
scope, this code has a low risk of breaking that rule.
TransactionOptions
describe 2 key parts of the transaction treatment: read and write concerns. The write-concern describes the durability expectation of the mutations. Just like any Replica Set write, it lets us control the risk of roll-back of individual writes in case of primary transitions or other adverse events.
The read-concern describes the visibility mode of the mutation during the transaction- the time between the start and the would-be commit or abort commands. As mentioned earlier, what happens during the transaction lifetime before it is finished - successful or not - is really what transactions are about.
The particular setting of ReadConcern.Snapshot
, when paired with a write-concern WriteConcern.WMajority
guarantees that any reads occurring as part of the transaction view data that is majority committed. Those reads are “stable” and should not roll back, since a majority of nodes have already applied that data. You might be tempted to use a weaker read-concern such as ReadConcern.Local
or ReadConcern.Majority
for the sake of speed. That choice may not be treated as you expect. For one, Mongo might “upgrade” the concern to a higher one such as snapshot. Further, Mongo does not guarantee that the writes won’t be rolled back in the face of adverse cluster events. In case of a rollback, your whole transaction might be rolled back - so what’s the point, really?…
Snapshot is a read-concern relating to read-your-own-writes and causal consistency. Causal consistency describes a relationship between operations where one causes the other: a read operation returning the value of the field count right after a write operation setting count = 123 expects count to be 123. The write preceding the read caused the mutation, and the reader expects the read value to be “the one caused by the preceding operation”. An implied order is what this is about. As mentioned before, one of the underpinnings of transactions is a global timestamp, allowing a strict order of operations. Within a causally consistent session, pure read operations following a write are guaranteed to see the results of that write. It may seem trivial - desirable, certainly - but keep in mind that other concurrent writes may occur during your sequence and affect the state of a document. Causal consistency assures that the state of a read document following a write is seen as that writer’s result.
In the diagram below, a session with causal consistency ensures the reader sees the results of its preceding write. A session without causal consistency does not ensure that and, depending on timing, may result in Client One reading a document modified by Client Two rather than the result of Client One’s own write.
The default for creating a session is causal consistency. The code below creates a session with either the default value or the explicit option; both result in the same behavior.
// setting CausalConsistency explicitly |
Now we can state this: a transaction in a causally consistent session, with a read-concern of “snapshot” and a write-concern of “majority”, will view documents committed to a majority of the nodes. This guarantee extends to reads within the transaction: not only will the transaction writes succeed if majority-acknowledged, but the reads within the transaction will rely only on majority-committed documents as of the snapshot’s time. This shuts down the concern of a transaction relying on document state that might be rolled back: once majority-committed, it won’t be.
The theory above gives us the background necessary to understand what’s going on. The code below implements a multi-document transaction touching 3 documents across 3 different collections.
The scenario is that we have some Tool which can be borrowed by some Person, and the loan is then logged in the LendingLedger. We start by creating a new session. We then perform the sequence of operations inside the transaction:
Under the cover of a transaction, performing this sequence gives us assurance that all three entities would be manipulated to satisfaction, or rolled back completely. Further - other concurrent operations in parallel would not interfere with the operations happening inside this transaction.
For a more complete demonstration please see my GitHub repo.
using (var session = client.StartSession(new ClientSessionOptions { CausalConsistency = true })) |
Transactions have been long awaited by some. Others see transactions as a performance and scalability hindrance, placing an undue burden on the core engine. There are performance implications to transactions. Measuring them is tricky because the effect depends on concurrency, velocity, and size of data. Transactions also introduce more controls on timing, with defaults favoring quick transactions and opting to abort rather than consume precious resources. How much overhead will a transaction introduce? I don’t know - better measure it. The documentation currently offers only a vague warning:
…In most cases, multi-document transaction incurs a greater performance cost over single document writes, and the availability of multi-document transaction should not be a replacement for effective schema design …
Certainly something to consider, and I for one definitely model entities with a mindset of embedding where appropriate. After all, if you want completely “independent” entities with cross-references, there’s an app for that… We chose a document-oriented database for its document-oriented nature - let’s leverage that. A good rule of thumb: if your RDBMS schema was translated into one-collection-per-table in MongoDB, try again.
Lastly, I should mention that the v4.0 release includes multi-document transactions on replica sets, but not on sharded collections. Support for transactions on sharded collections is slated for v4.2.
Happy transacting!
The Aggregation Framework in MongoDB gives us a convenient option: $lookup! But $lookup is not JOIN - at least not the one we are familiar with from SQL and relational databases. This distinction is important to understand, as it may affect your modeling and performance.
$lookup queries a second collection for each document entering the pipeline stage. Any documents from the secondary collection that pass the lookup condition are attached to an array field on the entering document.
The behavior differs from SQL’s INNER and LEFT JOIN in its geometry. In SQL, a row is emitted for each match in the right table, producing a wide, flat row with all columns from the left and all columns from the right. Rows in the left table with no match in the right table are not produced at all. Similarly, a LEFT JOIN produces flat rows, with a NULL-filled right side in case no match was found in the right table.
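The geometry difference is easy to see outside the database. The following plain-JavaScript simulation (illustrative only, not driver code; the airline field is made up for the example) contrasts the $lookup shape with the SQL INNER JOIN shape:

```javascript
// One city, two matching routes.
const cities = [{ _id: 'TOL', name: 'Toledo' }];
const routes = [
  { _id: 1, airline: 'AA', arrival: 'TOL' },
  { _id: 2, airline: 'UA', arrival: 'TOL' }
];

// $lookup-like geometry: one output document per INPUT document,
// with all matches nested in an array field.
const lookedUp = cities.map(c => ({
  ...c,
  inbound: routes.filter(r => r.arrival === c._id)
}));

// SQL-INNER-JOIN-like geometry: one flat row per (left, right) MATCH PAIR.
const joined = cities.flatMap(c =>
  routes.filter(r => r.arrival === c._id)
        .map(r => ({ city: c.name, airline: r.airline })));

console.log(lookedUp.length); // 1 document, with a 2-element inbound array
console.log(joined.length);   // 2 flat rows
```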
The behavior also differs in the DDL aspects and expectations. MongoDB does not represent or maintain any kind of foreign-key relationship between collections. While relational databases expose foreign keys in the DDL and extend them with things like cascade-delete and insert/update validation of key existence, MongoDB has no such constructs.
Apart from the geometric, DML-only nature of $lookup, it is also a bit more limited: you cannot look up into a collection that is sharded. This limits the scale of database where one could apply the syntax, and should give you pause to think through your modeling carefully before relying on it.
The closest analog of $lookup
to SQL JOIN varieties is the LEFT OUTER JOIN.
At times, we want to simulate INNER JOIN: emit only documents from the first collection which have one or more matches in the second collection.
There’s a way to do this using aggregation, by chaining $lookup and $unwind.
Full props to Asya Kamsky for this technique!
The subtle point is that $lookup appends an array of values from the second collection to each input document - an empty array if no match is found. That empty array is the key to this solution. $unwind has the ability to suppress documents whose array field is empty or missing. In fact, that was the default behavior of $unwind all along.
Consider the following data sets:
db.cities.insert([ |
Now query the cities
to discover the inbound flights arriving in for each city. For each city, a list of routes is appended from the routes collection. The match is on the cities _id field against the routes arrival field, both of which contain the FAA 3 letter airport code.
{ |
The result for Toledo (‘TOL’) will show 2 inbound flights:
{ |
The result for St. Petersburg (PIE) will contain no inbound flights - an empty array - because there is no route with an arrival into it.
{ |
This behavior simulates an SQL LEFT OUTER JOIN. But we want to simulate an INNER JOIN. This is where $unwind comes in. Since version 3.2, $unwind includes the preserveNullAndEmptyArrays option. When true, documents containing empty arrays in the $unwind field will be emitted from the stage. When false, documents entering $unwind with an empty array or missing field will be suppressed - won’t be emitted. The default value of preserveNullAndEmptyArrays is false, so we can use the simplified, backwards-compatible syntax:
db.cities.aggregate([ |
This time, St. Petersburg (PIE) will be suppressed, because it had an empty array. Simulated INNER JOIN done! In addition to effectively simulating the INNER JOIN, this technique produces a geometry that matches the SQL counterpart a bit more closely. The result documents after $unwind contain a “flat” one-to-one document with all fields from the first collection and all fields from one document from the second collection (albeit demoted one level into the inbound field).
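The whole technique can be traced step by step in a plain-JavaScript simulation (illustrative only, not shell or driver code; the departure field values are made up for the example):

```javascript
// Simulate $lookup followed by $unwind with preserveNullAndEmptyArrays: false.
const cities = [
  { _id: 'TOL', name: 'Toledo' },
  { _id: 'PIE', name: 'St. Petersburg' }
];
const routes = [
  { _id: 1, departure: 'DTW', arrival: 'TOL' },
  { _id: 2, departure: 'ORD', arrival: 'TOL' }
];

// Stage 1: $lookup - attach matching routes as an array (empty if none).
const afterLookup = cities.map(c => ({
  ...c,
  inbound: routes.filter(r => r.arrival === c._id)
}));

// Stage 2: $unwind - emit one document per array element. Documents with
// an empty array (PIE here) emit nothing, simulating an INNER JOIN.
const afterUnwind = afterLookup.flatMap(c =>
  c.inbound.map(r => ({ ...c, inbound: r })));

console.log(afterUnwind.map(d => d._id)); // only 'TOL' documents remain
```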
Relational databases expose 4 varieties of JOIN: INNER, LEFT OUTER, RIGHT OUTER, and FULL OUTER.
A RIGHT OUTER JOIN is just a LEFT OUTER JOIN on the opposite collection, right? Well, not exactly. First, the limitations of $lookup with regard to sharding might prevent you from enjoying full symmetry. Second, you might need to “join” in a late pipeline stage, after a few manipulations of the first collection. In that case - again - it may be either difficult or impossible to reverse the roles of the two collections. Consider, for example, collection A on which you use a $text match, and want to leverage the full-text index for filtering. Reversing and aggregating on collection B will prevent you from using that index effectively.
FULL OUTER JOIN is flatly not supported by $lookup. It would be difficult to implement at scale, and would obviously collide with the sharding limitation on one of the collections involved. Performance questions aside, one would also need to consider the meaning of a pipeline - the basis of the aggregate() command. The pipeline feeds documents from the previous stage or a collection into a given stage. Under no circumstance do all documents from two separate collections enter a stage. $lookup is designed to bring in a subset of matched documents from a different collection, not all documents from a second collection. It may be tempting to simulate a FULL OUTER JOIN using an uncorrelated sub-query within the $lookup syntax - a pipeline option that effectively drags in every document from the second collection. This would attach an array of every document from collection B onto every document from collection A. Ouch! Even if this worked, you’d surely need to chase it with a $group or similar operator to get distinct values somehow. But very likely the pipeline will abort: the memory impact on the stage can be arbitrarily large, and even with allowDiskUse, the 16MB document size limit applies. So if collection B holds more than 16MB, it will fail. If both collections are sufficiently small, an application-side join might be your best bet. An application-side join is not very efficient, but can be acceptable for small-ish collections, such as metadata-type documents.
In general though, taking SQL operators and hoping that MongoDB or the Aggregation syntax would provide full parity is perilous. Taking the time to analyze the actual use cases and modeling properly using Document Oriented techniques would lead to better performance and cleaner designs.
Since I’ve been working with both MongoDB and CosmosDB on various projects, I found using the MongoDB driver, and pointing it at whichever database I need (either CosmosDB or MongoDB), a big productivity boost - eliminating the constant flip between shells, apps, and mindsets.
The trick is to install the Cosmos DB Support for VS Code extension. Once installed, it enables another icon on the VS Code activity bar.
Expanding that side bar gives you a tree view for the databases (yes, plural: you can have several connected at once).
To connect to a database using connection credentials, you invoke the CosmosDB: Attach Database Account command. Next, VSCode will ask you which interface you would like to use.
This is the great part: If you want to connect to a native MongoDB server, choose MongoDB as the interface, then paste in any* valid MongoDB url!
Although the extension is labeled CosmosDB all over the place, it uses the MongoDB driver for making MongoDB API connections to CosmosDB instances that have MongoDB API enabled. This means it works just as well for native MongoDB databases!
If connecting to a CosmosDB instance, the easy connection option is to attach by signing in to Azure. Hit the “sign in to Azure” link. This pops up a toast notification in VS Code, telling you to open a browser and enter a nonce code into the web page prompt. The web page then lets you log in to Azure, and once that’s done you can close the browser. If everything went well, a list of the Azure subscriptions under that account will appear in the tree view. While this option is nice and true to Azure account-provisioning form, it is not always a good fit for development shops that provision connection-string based credentialed access, but don’t allow developers direct portal or Azure resource access beyond that.
Once connected, you can see a listing of databases and collections in the tree view. To see some documents, double click a collection name. A listing of documents will appear in the tree view.
Double-click a document, and you will see its JSON representation in a docked tab, letting you visualize the document content with full JSON, outlining, and extended-type support. Extended JSON represents the underlying BSON native types with full round-trip fidelity.
{ |
In the document sample above, the field _id contains an ObjectID type. Extended JSON writes it as an encapsulated value using the $oid field name. Similarly, the field born is assigned a BSON date type, represented with a $date value field containing a canonical string date representation. This matters when you deal with money, for example: to keep the decimal resolution and math operations on the field correct, you should use $numberDecimal - a high-precision decimal - rather than a double data type.
When you open a document to view its content, you actually create a Document Tab. These tabs let you modify the single document using the full power of the IDE: JSON linting, outlining, formatting, syntax highlighting - all those help you visualize and understand your document better. But you can also modify the data! You can change fields and their content to shape the document to your liking. When done, you can update the source database in one of two ways:
If you use the Update to Cloud button, the update is carried out, and an output log line in the Output tab (under the CosmosDB category) shows the DB, collection name, and document id in a URI-like format:
10:49:12 AM: Updated entity "127.0.0.1:27017 (MongoDB)/db1/peeps/5b1eb2127f0f232ac43e0f42" |
If you use the File->Save method, a toast will appear first, to make sure you actually meant to update the database.
The subtlety is that you may have inadvertently requested saving all documents - including the edit - but do not actually want to update the database. This can prevent a class of accidental modifications. You do have the choice to dismiss these warnings by choosing the Always upload button on the toast.
Beware that when you “Update to Cloud”, the content of the document in the tab completely replaces any and all values in the target database. Fields that exist in the database but not in your tab will be removed. If anyone else made changes to the document since you loaded it into VS Code, those changes will be gone. MongoDB provides surgical, “patch”-like updates that touch individual fields, but those are not available in this single-document update.
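The difference between the tab’s whole-document replace and a surgical $set-style patch can be simulated in a few lines of plain JavaScript (illustrative only; the fields and values are made up):

```javascript
// Document as it currently exists in the database - note `reviewed`,
// added by a colleague AFTER you loaded your copy into the tab.
const inDatabase = { _id: 1, name: 'Bob', status: 'lead', reviewed: true };

// Your edited copy from the document tab (loaded before `reviewed` existed).
const tabCopy = { _id: 1, name: 'Bob', status: 'customer' };

// "Update to Cloud" semantics: full replacement - `reviewed` is lost.
const afterReplace = { ...tabCopy };

// $set-style patch semantics: only the touched field changes.
const afterPatch = { ...inDatabase, status: 'customer' };

console.log('reviewed' in afterReplace); // false - colleague's change gone
console.log('reviewed' in afterPatch);   // true  - change preserved
```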
Unlike the single-document tab, Scrapbooks let you perform arbitrary Mongo shell commands against a connected database. This is where you have more control over updates, and can perform several operations rather than just slam-update a single document. Contrary to the ephemeral connotation of the name “Scrapbook”, they can actually be persisted. Saving a scrapbook creates a new file in the root folder of your project, with the extension .mongo.
To get a scrapbook started, either choose the “plus-document” icon from the tree view banner, or invoke the VS Code command “Cosmos DB: New Mongo Scrapbook”.
This will prompt you to connect anew, or choose an attached instance.
Choosing an already attached instance will list the currently connected instances.
Choose one of those, and you’re presented with a listing of the databases.
Choose a database (or create a new one), and you’re in business!
The scrapbook surface is a persisted file containing MongoDB shell commands. Because it is persisted in the file tree, you can save it, edit it, and most importantly: version it! Source control is a great way to ensure your scripting of data access tasks is durable, repeatable, shareable and maintainable.
With the scrapbook, you are not limited to touching one document. Updates can span multiple documents using the update() or updateMany() collection syntax, and bring you the full power of advanced update operators like $addToSet, $inc, etc. Any other shell command is available, so you can also remove documents, insert some, or write some light JavaScript code to fetch and manipulate - anything you need.
Below, I scripted an aggregation query to find the number of orders per month in my “orders” collection. Hitting [CTRL] + [SHIFT] + ; executes the scrapbook in its entirety. (You can also execute only a portion of the code in the scrapbook: select the code to run, and hit [CTRL] + [SHIFT] + '.)
Overall, the tight integration into the IDE is a great convenience. I’m not ready to ditch my MongoDB shell yet, but not having to switch back and forth as much for daily simple tasks is awesome.
If you are working with CosmosDB with the graph interface, this extension lets you visualize graphs graphically. If you are working with the SQL document interface, it lets you connect in that mode as well. This extension is quite powerful, and has a much richer UI and options - certainly exceeding what its simple name suggests.
* The current extension version requires that the MongoDB url begin with mongodb://. While this works for most, it prevents using the perfectly valid prefix mongodb+srv:// used by some hosts. I submitted a pull request for this issue; once merged, mongodb+srv:// should be usable as well.