Sharded Cluster Internals

This document introduces lower level sharding concepts for users who are familiar with sharding generally and want to learn more about the internals. This document provides a more detailed understanding of your cluster’s behavior. For higher level sharding concepts, see Sharded Cluster Overview. For complete documentation of sharded clusters see the Sharding section of this manual.

Shard Keys

Shard keys are the field in a collection that MongoDB uses to distribute documents within a sharded cluster. See the overview of shard keys for a higher-level introduction.

Cardinality

Cardinality in the context of MongoDB, refers to the ability of the system to partition data into chunks. For example, consider a collection of data such as an “address book” that stores address records:

  • Consider the use of a state field as a shard key:

    The state key’s value holds the US state for a given address document. This field has a low cardinality as all documents that have the same value in the state field must reside on the same shard, even if a particular state’s chunk exceeds the maximum chunk size.

    Since there are a limited number of possible values for the state field, MongoDB may distribute data unevenly among a small number of fixed chunks. This may have a number of effects:

    • If MongoDB cannot split a chunk because all of its documents have the same shard key, migrations involving these un-splittable chunks will take longer than other migrations, and it will be more difficult for your data to stay balanced.
    • If you have a fixed maximum number of chunks, you will never be able to use more than that number of shards for this collection.
  • Consider the use of a zipcode field as a shard key:

    While this field has a large number of possible values, and thus has potentially higher cardinality, it’s possible that a large number of users could have the same value for the shard key, which would make this chunk of users un-splittable.

    In these cases, cardinality depends on the data. If your address book stores records for a geographically distributed contact list (e.g. “Dry cleaning businesses in America,”) then a value like zipcode would be sufficient. However, if your address book is more geographically concentrated (e.g “ice cream stores in Boston Massachusetts,”) then you may have a much lower cardinality.

  • Consider the use of a phone-number field as a shard key:

    Phone number has a high cardinality, because users will generally have a unique value for this field, MongoDB will be able to split as many chunks as needed.

While “high cardinality,” is necessary for ensuring an even distribution of data, having a high cardinality does not guarantee sufficient query isolation or appropriate write scaling. Please continue reading for additional information.

Write Scaling

Some possible shard keys will allow your application to take advantage of the increased write capacity that the cluster can provide, while others do not. Consider the following example where you shard by the values of the default _id field, which is ObjectID.

ObjectID is computed upon document creation, that is a unique identifier for the object. However, the most significant bits of data in this value represent a time stamp, which means that they increment in a regular and predictable pattern. Even though this value has high cardinality, when using this, any date, or other monotonically increasing number as the shard key, all insert operations will be storing data into a single chunk, and therefore, a single shard. As a result, the write capacity of this shard will define the effective write capacity of the cluster.

A shard key that increases monotonically will not hinder performance if you have a very low insert rate, or if most of your write operations are update() operations distributed through your entire data set. Generally, choose shard keys that have both high cardinality and will distribute write operations across the entire cluster.

Typically, a computed shard key that has some amount of “randomness,” such as ones that include a cryptographic hash (i.e. MD5 or SHA1) of other content in the document, will allow the cluster to scale write operations. However, random shard keys do not typically provide query isolation, which is another important characteristic of shard keys.

Querying

The mongos provides an interface for applications to interact with sharded clusters that hides the complexity of data partitioning. A mongos receives queries from applications, and uses metadata from the config server, to route queries to the mongod instances with the appropriate data. While the mongos succeeds in making all querying operational in sharded environments, the shard key you select can have a profound affect on query performance.

See also

The mongos and Sharding and config server sections for a more general overview of querying in sharded environments.

Query Isolation

The fastest queries in a sharded environment are those that mongos will route to a single shard, using the shard key and the cluster meta data from the config server. For queries that don’t include the shard key, mongos must query all shards, wait for their response and then return the result to the application. These “scatter/gather” queries can be long running operations.

If your query includes the first component of a compound shard key [1], the mongos can route the query directly to a single shard, or a small number of shards, which provides better performance. Even if you query values of the shard key reside in different chunks, the mongos will route queries directly to specific shards.

To select a shard key for a collection:

  • determine the most commonly included fields in queries for a given application
  • find which of these operations are most performance dependent.

If this field has low cardinality (i.e not sufficiently selective) you should add a second field to the shard key making a compound shard key. The data may become more splittable with a compound shard key.

See also

See

mongos Operational Overview for more information on query operations in the context of sharded clusters. Specifically the mongos Operational Overview sub-section outlines the procedure that mongos uses to route read operations to the shards.

[1]In many ways, you can think of the shard key a cluster-wide unique index. However, be aware that sharded systems cannot enforce cluster-wide unique indexes unless the unique field is in the shard key. Consider the Indexing Overview page for more information on indexes and compound indexes.

Sorting

In sharded systems, the mongos performs a merge-sort of all sorted query results from the shards. See the sharded query routing and Use Indexes to Sort Query Results sections for more information.

Operations and Reliability

The most important consideration when choosing a shard key are:

  • to ensure that MongoDB will be able to distribute data evenly among shards, and
  • to scale writes across the cluster, and
  • to ensure that mongos can isolate most queries to a specific mongod.

Furthermore:

  • Each shard should be a replica set, if a specific mongod instance fails, the replica set members will elect another to be primary and continue operation. However, if an entire shard is unreachable or fails for some reason, that data will be unavailable.
  • If the shard key allows the mongos to isolate most operations to a single shard, then the failure of a single shard will only render some data unavailable.
  • If your shard key distributes data required for every operation throughout the cluster, then the failure of the entire shard will render the entire cluster unavailable.

In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query operations to a single shard.

Choosing a Shard Key

For many data sets, there may be no single, naturally occurring key in your collection that possesses all of the qualities of a good shard key. For these cases, you may select one of the following strategies:

  1. Compute a more ideal shard key in your application layer, and store this in all of your documents, potentially in the _id field.

  2. Use a compound shard key that uses two or three values from all documents that provide the right mix of cardinality with scalable write operations and query isolation.

  3. Determine that the impact of using a less than ideal shard key, is insignificant in your use case given:

    • limited write volume,
    • expected data size, or
    • query patterns and demands.
  4. New in version 2.4: Use a hashed shard key. With a hashed shard key, you can choose a field that has high cardinality and create a hashed indexes index on that field. MongoDB then uses the values of this hashed index as the shard key values, thus ensuring an even distribution across the shards.

From a decision making stand point, begin by finding the field that will provide the required query isolation, ensure that writes will scale across the cluster, and then add an additional field to provide additional cardinality if your primary key does not have sufficient split-ability.

Shard Key Indexes

All sharded collections must have an index that starts with the shard key. If you shard a collection that does not yet contain documents and without such an index, the shardCollection command will create an index on the shard key. If the collection already contains documents, you must create an appropriate index before using shardCollection.

Changed in version 2.2: The index on the shard key no longer needs to be identical to the shard key. This index can be an index of the shard key itself as before, or a compound index where the shard key is the prefix of the index. This index cannot be a multikey index.

If you have a collection named people, sharded using the field { zipcode: 1 }, and you want to replace this with an index on the field { zipcode: 1, username: 1 }, then:

  1. Create an index on { zipcode: 1, username: 1 }:

    db.people.ensureIndex( { zipcode: 1, username: 1 } );
    
  2. When MongoDB finishes building the index, you can safely drop existing index on { zipcode: 1 }:

    db.people.dropIndex( { zipcode: 1 } );
    

Warning

The index on the shard key cannot be a multikey index.

As above, an index on { zipcode: 1, username: 1 } can only replace an index on zipcode if there are no array values for the username field.

If you drop the last appropriate index for the shard key, recover by recreating an index on just the shard key.

Hashed Shard Keys

New in version 2.4.

Hashed shard keys use a special hashed index type to store hashes of the shard key field to partition data in a cluster.

Use hashed shard keys when you want to shard using a field that increases monotonically, like an ObjectId, or has high cardinality but uneven distribution.

Example

A hashed index on an ObjectId will lead to an even distribution of documents across all shards since the hash of two sequential documents will have different hashes.

Note

Hash-based sharding does not support tag-aware sharding.

Warning

hashed indexes truncate floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3, 2.2 and 2.9. To prevent collisions, do not use a hashed index for floating point numbers that cannot be consistently converted to 64-bit integers (and then back to floating point.) hashed indexes do not support floating point values larger than 253.

Cluster Balancer

The balancer sub-process is responsible for redistributing chunks evenly among the shards and ensuring that each member of the cluster is responsible for the same volume of data. This section contains complete documentation of the balancer process and operations. For a higher level introduction see the Shard Balancing section.

Balancing Internals

A balancing round originates from an arbitrary mongos instance from one of the cluster’s mongos instances. When a balancer process is active, the responsible mongos acquires a “lock” by modifying a document in the lock collection in the Config Database.

By default, the balancer process is always running. When the number of chunks in a collection is unevenly distributed among the shards, the balancer begins migrating chunks from shards with more chunks to shards with a fewer number of chunks. The balancer will continue migrating chunks, one at a time, until the data is evenly distributed among the shards.

While these automatic chunk migrations are crucial for distributing data, they carry some overhead in terms of bandwidth and workload, both of which can impact database performance. As a result, MongoDB attempts to minimize the effect of balancing by only migrating chunks when the distribution of chunks passes the migration thresholds.

The migration process ensures consistency and maximizes availability of chunks during balancing: when MongoDB begins migrating a chunk, the database begins copying the data to the new server and tracks incoming write operations. After migrating chunks, the “from” mongod sends all new writes to the “receiving” server. Finally, mongos updates the chunk record in the config database to reflect the new location of the chunk.

Note

Changed in version 2.0: Before MongoDB version 2.0, large differences in timekeeping (i.e. clock skew) between mongos instances could lead to failed distributed locks, which carries the possibility of data loss, particularly with skews larger than 5 minutes. Always use the network time protocol (NTP) by running ntpd on your servers to minimize clock skew.

Migration Thresholds

Changed in version 2.2: The following thresholds appear first in 2.2; prior to this release, balancing would only commence if the shard with the most chunks had 8 more chunks than the shard with the least number of chunks.

In order to minimize the impact of balancing on the cluster, the balancer will not begin balancing until the distribution of chunks has reached certain thresholds. These thresholds apply to the difference in number of chunks between the shard with the greatest number of chunks and the shard with the least number of chunks. The balancer has the following thresholds:

Number of Chunks Migration Threshold
Less than 20 2
21-80 4
Greater than 80 8

Once a balancing round starts, the balancer will not stop until the difference between the number of chunks on any two shards is less than two or a chunk migration fails.

Note

You can restrict the balancer so that it only operates between specific start and end times. See Schedule the Balancing Window for more information.

The specification of the balancing window is relative to the local time zone of all individual mongos instances in the sharded cluster.

Chunk Size

The default chunk size in MongoDB is 64 megabytes.

When chunks grow beyond the specified chunk size a mongos instance will split the chunk in half. This will eventually lead to migrations, when chunks become unevenly distributed among the cluster. The mongos instances will initiate a round of migrations to redistribute data in the cluster.

Chunk size is arbitrary and must account for the following:

  1. Small chunks lead to a more even distribution of data at the expense of more frequent migrations, which creates expense at the query routing (mongos) layer.
  2. Large chunks lead to fewer migrations, which is more efficient both from the networking perspective and in terms internal overhead at the query routing layer. Large chunks produce these efficiencies at the expense of a potentially more uneven distribution of data.

For many deployments it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly less evenly distributed data set, but this value is configurable. Be aware of the following limitations when modifying chunk size:

  • Automatic splitting only occurs when inserting documents or updating existing documents; if you lower the chunk size it may take time for all chunks to split to the new size.
  • Splits cannot be “undone:” if you increase the chunk size, existing chunks must grow through insertion or updates until they reach the new size.

Note

Chunk ranges are inclusive of the lower boundary and exclusive of the upper boundary.

Shard Size

By default, MongoDB will attempt to fill all available disk space with data on every shard as the data set grows. Monitor disk utilization in addition to other performance metrics, to ensure that the cluster always has capacity to accommodate additional data.

You can also configure a “maximum size” for any shard when you add the shard using the maxSize parameter of the addShard command. This will prevent the balancer from migrating chunks to the shard when the value of mapped exceeds the maxSize setting.

Chunk Migration

MongoDB migrates chunks in a sharded cluster to distribute data evenly among shards. Migrations may be either:

  • Manual. In these migrations you must specify the chunk that you want to migrate and the destination shard. Only migrate chunks manually after initiating sharding, to distribute data during bulk inserts, or if the cluster becomes uneven. See Migrating Chunks for more details.
  • Automatic. The balancer process handles most migrations when distribution of chunks between shards becomes uneven. See Migration Thresholds for more details.

All chunk migrations use the following procedure:

  1. The balancer process sends the moveChunk command to the source shard for the chunk. In this operation the balancer passes the name of the destination shard to the source shard.

  2. The source initiates the move with an internal moveChunk command with the destination shard.

  3. The destination shard begins requesting documents in the chunk, and begins receiving these chunks.

  4. After receiving the final document in the chunk, the destination shard initiates a synchronization process to ensure that all changes to the documents in the chunk on the source shard during the migration process exist on the destination shard.

    When fully synchronized, the destination shard connects to the config database and updates the chunk location in the cluster metadata. After completing this operation, once there are no open cursors on the chunk, the source shard starts deleting its copy of documents from the migrated chunk.

If enabled, the _secondaryThrottle setting causes the balancer to wait for replication to secondaries. For more information, see Require Replication before Chunk Migration (Secondary Throttle).

Detect Connections to mongos Instances

If your application must detect if the MongoDB instance its connected to is mongos, use the isMaster command. When a client connects to a mongos, isMaster returns a document with a msg field that holds the string isdbgrid. For example:

{
   "ismaster" : true,
   "msg" : "isdbgrid",
   "maxBsonObjectSize" : 16777216,
   "ok" : 1
}

If the application is instead connected to a mongod, the returned document does not include the isdbgrid string.

Sharded Cluster Metadata

Sharded cluster metadata is contained in the Config Database and comprises information about the sharded cluster’s partitioned data sets. The config database stores the relationship between chunks and where they reside within a sharded cluster. Without a config database, the mongos instances would be unable to route queries or write operations within the cluster.

Config Database

The config database contains information about your sharding configuration and stores the information in a set of collections used by sharding.

Important

Back up the config database before performing any maintenance on the config server.

To access the config database, issue the following command from the mongo shell:

use config

In general, you should never manipulate the content of the config database directly. The config database contains the following collections:

See Config Database for full documentation of these collections and their role in sharded clusters.

Config Servers

Config servers are special mongod instances that maintain the sharded cluster metadata in the config database. A sharded cluster operates with a group of three config servers that use a two-phase commit process that ensures immediate consistency and reliability. Config servers do not run as replica sets.

For testing purposes you may deploy a cluster with a single config server, but this is not recommended for production.

All config servers must be available on initial setup of a sharded cluster. Each mongos instance must be able to write to the config.version collection.

Warning

If your cluster has a single config server, this mongod is a single point of failure. If the instance is inaccessible the cluster is not accessible. If you cannot recover the data on a config server, the cluster will be inoperable.

Always use three config servers for production deployments.

Read and Write Operations on Config Servers

The load on configuration servers is small because each mongos instance maintains a cached copy of the configuration database. MongoDB only writes data to the config server to:

  • create splits in existing chunks, which happens as data in existing chunks exceeds the maximum chunk size.
  • migrate a chunk between shards.

If one or two configuration instances become unavailable, the cluster’s metadata becomes read only. It is still possible to read and write data from the shards, but no chunk migrations or splits will occur until all three servers are accessible. At the same time, config server data is only read in the following situations:

  • A new mongos starts for the first time, or an existing mongos restarts.
  • After a chunk migration, the mongos instances update themselves with the new cluster metadata.

If all three config servers are inaccessible, you can continue to use the cluster as long as you don’t restart the mongos instances until after config servers are accessible again. If you restart the mongos instances and there are no accessible config servers, the mongos would be unable to direct queries or write operations to the cluster.

Because the configuration data is small relative to the amount of data stored in a cluster, the amount of activity is relatively low, and 100% up time is not required for a functioning sharded cluster. As a result, backing up the config servers is not difficult. Backups of config servers are critical as clusters become totally inoperable when you lose all configuration instances and data. Precautions to ensure that the config servers remain available and intact are critical.

Note

Configuration servers store metadata for a single sharded cluster. You must have a separate configuration server or servers for each cluster you administer.