A MongoDB replica set is a cluster of mongod instances that replicate amongst one another and ensure automated failover. Most replica sets consist of two or more mongod instances with at most one of these designated as the primary and the rest as secondary members. Clients direct all writes to the primary, while the secondary members replicate from the primary asynchronously.
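Database replication with MongoDB adds redundancy, helps ensure high availability, simplifies certain administrative tasks such as backups, and may increase read capacity. Most production deployments use replication.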
If you’re familiar with other database systems, you may think about replica sets as a more sophisticated form of traditional master-slave replication. [1] In master-slave replication, a master node accepts writes while one or more slave nodes replicate those write operations and thus maintain data sets identical to the master. For MongoDB deployments, the member that accepts write operations is the primary, and the replicating members are secondaries.
MongoDB’s replica sets provide automated failover. If a primary fails, the remaining members will automatically try to elect a new primary.
A replica set can have up to 12 members, but only 7 members can have votes. For information regarding non-voting members, see non-voting members.
See also
The Replication index for a list of the documents in this manual that describe the operation and use of replica sets.
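| [1] | MongoDB also provides conventional master/slave replication. Master/slave replication operates in the same way as replica sets, but lacks automated failover. Replica sets are the recommended solution for production, but a replica set supports a maximum of 12 members in total. If your deployment requires more than 11 slave members, you must use master/slave replication. |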
You can configure replica set members in a variety of ways, as listed here. In most cases, members of a replica set have the default properties.
These members have data but cannot become primary under any circumstance. To configure a member to be secondary-only, see Prevent Replica Set Member from Becoming Primary.
Delayed members copy and apply operations from the primary’s oplog with a specified delay. If a member has a delay of one hour, then the latest entry in this member’s oplog will not be more recent than one hour old, and the state of data for the member will reflect the state of the set an hour earlier.
Example
If the current time is 09:52 and the secondary is delayed by an hour, no operation on that member will be more recent than 08:52.
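The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the time calculation; the function name is hypothetical and nothing here talks to a MongoDB server.

```python
from datetime import datetime, timedelta


def newest_visible_operation(now: datetime, delay: timedelta) -> datetime:
    """Return the most recent operation time a delayed member may reflect."""
    return now - delay


# The date is arbitrary; only the time of day matters for the example.
now = datetime(2013, 1, 1, 9, 52)
cutoff = newest_visible_operation(now, timedelta(hours=1))
print(cutoff.strftime("%H:%M"))  # 08:52
```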
Delayed members may help recover from various kinds of human error. Such errors may include inadvertently deleted databases or botched application upgrades. Consider the following factors when determining the amount of slave delay to apply:
Delayed members must have a priority set to 0 to prevent them from becoming primary in their replica sets. These members should also be hidden to prevent your application from seeing or querying them.
To configure a member to be a delayed member, see Configure a Delayed Replica Set Member.
These members have no data and exist solely to participate in elections. Arbiters have the following interactions with the rest of the replica set:
Credential exchanges that authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted.
MongoDB only transmits the authentication credentials in a cryptographically secure exchange, and encrypts no other exchange.
Exchanges of replica set configuration data and of votes. These are not encrypted.
If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation Connect to MongoDB with SSL for more information. As with all MongoDB components, run arbiters on secure networks.
To add an arbiter to the replica set, see Add an Arbiter to Replica Set.
These members do not vote in elections. Non-voting members are only used for larger sets with more than 7 members, because only 7 members of a set can have votes. To configure a member as non-voting, see Configure a Non-Voting Replica Set Member.
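The limits and member rules described above (at most 12 members, at most 7 voting members, delayed members with priority 0 and hidden) can be checked mechanically. The following is a minimal sketch that works on plain Python dictionaries. The field names `priority`, `votes`, `hidden`, and `slaveDelay` are assumed to mirror the member configuration fields, the host names are hypothetical, and the function is not part of any MongoDB tool.

```python
MAX_MEMBERS = 12         # limit stated in this document
MAX_VOTING_MEMBERS = 7   # limit stated in this document


def check_replica_set_config(members):
    """Return a list of problems found in a list of member documents.

    Missing fields fall back to the defaults described in the text:
    priority 1, one vote, not hidden, no delay.
    """
    problems = []
    if len(members) > MAX_MEMBERS:
        problems.append("too many members: %d" % len(members))
    voters = sum(1 for m in members if m.get("votes", 1) > 0)
    if voters > MAX_VOTING_MEMBERS:
        problems.append("too many voting members: %d" % voters)
    for m in members:
        if m.get("slaveDelay", 0) > 0:
            if m.get("priority", 1) != 0:
                problems.append("delayed member %s must have priority 0" % m["host"])
            if not m.get("hidden", False):
                problems.append("delayed member %s should be hidden" % m["host"])
    return problems


# Hypothetical three-member set with a misconfigured delayed member.
members = [
    {"_id": 0, "host": "db0.example.net:27017"},
    {"_id": 1, "host": "db1.example.net:27017"},
    {"_id": 2, "host": "db2.example.net:27017", "slaveDelay": 3600},
]
for problem in check_replica_set_config(members):
    print(problem)
```

Setting `priority` to 0 and `hidden` to `True` on the third member makes the check return an empty list.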
Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.
While failover is automatic, replica set administrators should still understand exactly how this process works. The sections below describe failover in detail.
In most cases, failover occurs without administrator intervention seconds after the primary either steps down, becomes inaccessible, or becomes otherwise ineligible to act as primary. If your MongoDB deployment does not fail over according to expectations, consider the following operational errors:
In many senses, rollbacks represent a graceful recovery from an impossible failover and recovery situation.
Rollbacks occur when a primary accepts writes that other members of the set do not successfully replicate before the primary steps down. When the former primary begins replicating again it performs a “rollback.” If the operations replicate to another member and that member remains available and accessible to a majority of the replica set, there will be no rollback.
Rollbacks remove the operations that were never replicated to the set from the instance, so that the data set is in a consistent state. The mongod program writes rolled back data to a BSON file that you can view using bsondump and apply manually using mongorestore.
You can prevent rollbacks using a replica acknowledged write concern. These write operations require acknowledgment not only from the primary but also from other members, in some configurations a majority of the set, before returning.
See also
The Elections section in the Replica Set Fundamental Concepts document, and the Election Internals section in the Replica Set Internals and Behaviors document.
When any failover occurs, an election takes place to decide which member should become primary.
Elections provide a mechanism for the members of a replica set to autonomously select a new primary without administrator intervention. The election allows replica sets to recover from failover situations very quickly and robustly.
Whenever the primary becomes unreachable, the secondary members trigger an election. The first member to receive votes from a majority of the set will become primary. The most important feature of replica set elections is that a majority of the original number of members in the replica set must be present for an election to succeed. If you have a three-member replica set, the set can elect a primary when two or three members can connect to each other. If two members in the replica set go offline, then the remaining member will remain a secondary.
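The majority rule above reduces to a single comparison. The following sketch assumes every member has one vote (the default described in this document); the function name is hypothetical.

```python
def can_elect_primary(total_members: int, reachable_members: int) -> bool:
    """True when the reachable members form a strict majority of the set."""
    return reachable_members > total_members // 2


# The three-member example from the text.
print(can_elect_primary(3, 3))  # True
print(can_elect_primary(3, 2))  # True
print(can_elect_primary(3, 1))  # False
```

The comparison is against the configured size of the set, not the number of members currently online, which is why a lone survivor of a three-member set stays a secondary.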
Note
When the current primary steps down and triggers an election, the mongod instances will close all client connections. This ensures that the clients maintain an accurate view of the replica set and helps prevent rollbacks.
Members on either side of a network partition cannot see each other when determining whether a majority is available to hold an election.
That means that if a primary steps down and neither side of the partition has a majority on its own, the set will not elect a new primary and the set will become read only. To avoid this situation, attempt to place a majority of instances in one data center with a minority of instances in a secondary facility.
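To see how member placement affects a partition, the sketch below takes a hypothetical mapping of site names to member counts and reports which site, if any, could still elect a primary when every site is cut off from the others. It assumes one vote per member and is not a model of the actual election protocol.

```python
def site_with_majority(members_per_site):
    """Return the site holding a strict majority of all members, or None."""
    total = sum(members_per_site.values())
    for site, count in members_per_site.items():
        if count > total // 2:
            return site
    return None


# Majority in the main data center: the set survives a partition.
print(site_with_majority({"main": 2, "secondary": 1}))  # main

# Even split: neither side has a majority, so the set becomes read only.
print(site_with_majority({"main": 2, "secondary": 2}))  # None
```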
For more information on elections and failover, see the Failover and Recovery section in the Troubleshoot Replica Sets document.
In a replica set, every member has a “priority” that helps determine eligibility for election to primary. By default, all members have a priority of 1, unless you modify the priority value. All members have a single vote in elections.
Warning
Always configure the priority value to control which members will become primary. Do not configure votes except to permit more than 7 secondary members.
For more information on member priorities, see the Adjust Priority for Replica Set Member document.
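This section provides an overview of the concepts that underpin database consistency and of the MongoDB mechanisms that ensure users have access to consistent data states.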
In MongoDB, all read operations issued to the primary of a replica set are consistent with the last write operation.
If clients configure the read preference to permit secondary reads, read operations can return data from secondary members that have not yet replicated more recent updates or operations. In these situations the query results may reflect a previous state.
This behavior is sometimes characterized as eventual consistency because the secondary member’s state will eventually reflect the primary’s state and MongoDB cannot guarantee strict consistency for read operations from secondary members.
There is no way to guarantee consistency for reads from secondary members, except by configuring the client and driver to ensure that write operations succeed on all members before completing successfully.
In some failover situations, primaries will have accepted write operations that have not replicated to the secondaries by the time the failover occurs. This case is rare and typically occurs as a result of a network partition with replication lag. When the former primary rejoins the replica set and attempts to continue replication as a secondary, it must revert, or “roll back,” these operations to maintain database consistency across the replica set.
MongoDB writes the rollback data to a BSON file in the database’s dbpath directory. Use bsondump to read the contents of these rollback files and then manually apply the changes to the new primary. There is no way for MongoDB to appropriately and fairly handle rollback situations automatically. Therefore you must intervene manually to apply rollback data. Even after the member completes the rollback and returns to secondary status, administrators will need to apply or decide to ignore the rollback data. MongoDB writes rollback data to a rollback/ folder within the dbpath directory to files with filenames in the following form:
<database>.<collection>.<timestamp>.bson
For example:
records.accounts.2011-05-09T18-10-04.0.bson
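When many rollback files accumulate, it can help to split their names into parts before deciding what to apply. The sketch below parses the filename form shown above with a regular expression. It assumes the database name contains no dots and treats the number before `.bson` as an opaque numeric suffix, since this document does not say what it means; the function name is hypothetical.

```python
import re
from datetime import datetime

# <database>.<collection>.<timestamp>.bson, where the timestamp in the
# example looks like 2011-05-09T18-10-04 followed by a numeric suffix.
ROLLBACK_NAME = re.compile(
    r"^(?P<database>[^.]+)\."
    r"(?P<collection>.+)\."
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})\."
    r"(?P<suffix>\d+)\.bson$"
)


def parse_rollback_filename(name):
    """Return (database, collection, datetime) for a rollback file name."""
    match = ROLLBACK_NAME.match(name)
    if match is None:
        raise ValueError("not a rollback file name: %r" % name)
    when = datetime.strptime(match.group("timestamp"), "%Y-%m-%dT%H-%M-%S")
    return match.group("database"), match.group("collection"), when


database, collection, when = parse_rollback_filename(
    "records.accounts.2011-05-09T18-10-04.0.bson"
)
print(database, collection, when.isoformat())
```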
The best strategy for avoiding all rollbacks is to ensure write propagation to all or some of the members in the set. Using these kinds of policies prevents situations that might create rollbacks.
Warning
A mongod instance will not roll back more than 300 megabytes of data. If your system needs to roll back more than 300 MB, you will need to manually intervene to recover this data. If this is the case, you will find the following line in your mongod log:
[replica set sync] replSet syncThread: 13410 replSet too much data to roll back
In these situations you will need to manually intervene to either save data or to force the member to perform an initial sync from a “current” member of the set by deleting the content of the existing dbpath directory.
For more information on failover, see:
Client applications are indifferent to the configuration and operation of replica sets. While specific configuration depends to some extent on the client drivers, there is often minimal or no difference between applications using replica sets or standalone instances.
There are two major concepts that are important to consider when working with replica sets:
Write concern sends a MongoDB client a response from the server to confirm successful write operations. In replica sets you can configure replica acknowledged write concern to ensure that secondary members of the set have replicated operations before the write returns.
By default, read operations issued against a replica set return results from the primary. Users may configure read preference on a per-connection basis to prefer that read operations return on the secondary members.
Read preference and write concern have particular consistency implications.
For a more detailed discussion of application concerns, see Replica Set Considerations and Behaviors for Applications and Development.
This section provides a brief overview of concerns relevant to administrators of replica set deployments.
For more information on replica set administration, operations, and architecture, see:
The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary’s oplog. The secondary members then replicate this log and apply the operations to themselves in an asynchronous process. All replica set members contain a copy of the oplog, allowing them to maintain the current state of the database. Operations in the oplog are idempotent.
By default, the size of the oplog is as follows:
For 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB will allocate 5% of the available free disk space to the oplog.
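If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space.
For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog.
For 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.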
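The default sizing rules above can be summarized as a small function. This is a sketch of the rules as stated in this document, not MongoDB's actual allocation code. The platform labels are hypothetical, and treating a gigabyte as 1024³ bytes is an assumption.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def default_oplog_size(platform: str, free_disk_bytes: int) -> int:
    """Return the default oplog size in bytes for a platform label."""
    if platform == "64-bit-osx":
        return 183 * MB
    if platform == "32-bit":
        return 48 * MB  # the text says "about 48 megabytes"
    if platform == "64-bit":
        # 5% of free disk space, but never less than 1 gigabyte.
        return max(free_disk_bytes * 5 // 100, 1 * GB)
    raise ValueError("unknown platform label: %r" % platform)


print(default_oplog_size("64-bit", 100 * GB) // GB)  # 5
print(default_oplog_size("64-bit", 10 * GB) // GB)   # 1
```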
Before oplog creation, you can specify the size of your oplog with the oplogSize option. After you start a replica set member for the first time, you can only change the size of the oplog by using the Change the Size of the Oplog tutorial.
In most cases, the default oplog size is sufficient. For example, if an oplog that is 5% of free disk space fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming stale. However, most replica sets have much lower operation volumes, and their oplogs can hold a much larger number of operations.
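The relationship between oplog size and how long a secondary can fall behind is a simple ratio. The sketch below estimates that window from a measured oplog write rate. Both the function name and the sample figures are hypothetical, and a real rate would have to come from observing your own deployment.

```python
GB = 1024 ** 3


def oplog_window_hours(oplog_bytes: float, oplog_bytes_per_hour: float) -> float:
    """Estimate how many hours of operations the oplog can hold."""
    if oplog_bytes_per_hour <= 0:
        raise ValueError("oplog write rate must be positive")
    return oplog_bytes / oplog_bytes_per_hour


# An oplog of 48 GB that receives 2 GB of entries per hour holds 24 hours.
print(oplog_window_hours(48 * GB, 2 * GB))  # 24.0
```

A secondary that is offline for longer than this window can no longer catch up from the oplog alone.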
The following factors affect how MongoDB uses space in the oplog:
Update operations that affect multiple documents at once.
The oplog must translate multi-updates into individual operations, in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in disk utilization.
If you delete roughly the same amount of data as you insert.
In this situation, the database will not grow significantly in disk utilization, but the size of the operation log can be quite large.
If a significant portion of your workload entails in-place updates.
In-place updates create a large number of operations but do not change the quantity of data on disk.
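If you can predict that your replica set’s workload will resemble one of the above patterns, consider creating an oplog that is larger than the default. Conversely, if most of the activity of your MongoDB-based application is reads and you write only a small amount of data, a smaller oplog may be sufficient.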
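The multi-update case described above can be illustrated with a toy model. The sketch below expands one increment over several documents into one entry per document, and each entry records the resulting value rather than the increment, so replaying an entry twice leaves the same state. The entry format is invented for this illustration and is not the real oplog format.

```python
import copy


def expand_multi_increment(documents, matches, field, amount):
    """Translate one multi-document increment into per-document entries."""
    entries = []
    for doc in documents:
        if matches(doc):
            # Record the final value so the entry can be replayed safely.
            entries.append({"_id": doc["_id"], "set": {field: doc[field] + amount}})
    return entries


def apply_entries(documents, entries):
    """Apply per-document entries to a list of documents in place."""
    by_id = {doc["_id"]: doc for doc in documents}
    for entry in entries:
        by_id[entry["_id"]].update(entry["set"])


primary = [{"_id": 1, "qty": 5}, {"_id": 2, "qty": 7}, {"_id": 3, "qty": 1}]
secondary = copy.deepcopy(primary)

# One logical update ("add 10 where qty >= 5") becomes two entries.
entries = expand_multi_increment(primary, lambda d: d["qty"] >= 5, "qty", 10)
apply_entries(secondary, entries)
apply_entries(secondary, entries)  # replaying changes nothing
print(len(entries), secondary)
```

One statement touching two documents produced two entries, which is why this workload pattern consumes oplog space faster than it grows the data on disk.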
To view oplog status, including the size and the time range of operations, call the db.printReplicationInfo() method. For more information on oplog status, see Check the Size of the Oplog.
For additional information about oplog behavior, see Oplog Internals and Syncing.
Without replication, a standalone MongoDB instance represents a single point of failure, and any disruption of the MongoDB system will render the database unusable and potentially unrecoverable. Replication increases the reliability of the database instance, and replica sets are capable of distributing reads to secondary members depending on read preference. For database workloads dominated by read operations (i.e. “read heavy”), replica sets can greatly increase the capability of the database system.
The minimum requirements for a replica set include two members with data, for a primary and a secondary, and an arbiter. In most circumstances, however, you will want to deploy three data members.
For deployments that rely heavily on distributing reads to secondary instances, add members to the set as load increases. As your deployment grows, consider adding or moving replica set members to secondary data centers or to geographically distinct locations for additional redundancy. While many architectures are possible, always ensure that the quorum of members required to elect a primary remains in your main facility.
Depending on your operational requirements, you may consider adding members configured for a specific purpose, including a delayed member to help provide protection against human errors and change control, a hidden member to provide an isolated member for reporting and monitoring, or a secondary-only member for dedicated backups.
The process of establishing a new replica set member can be resource intensive on existing nodes. As a result, deploy new members to existing replica sets well before the current demand saturates the existing members.
Note
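Journaling provides single-instance write durability. Journaling greatly improves the reliability and durability of a database. Unless MongoDB runs with journaling, the database can end up in a corrupt and unrecoverable state when a MongoDB instance terminates ungracefully.
You should assume that a database running without journaling that suffers a crash or unclean shutdown is in a corrupt or unrecoverable state.
Use journaling; however, do not forgo proper replication because journaling is enabled.
64-bit builds of MongoDB, version 2.0 and later, have journaling enabled by default.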
In most cases, replica set administrators do not have to keep additional considerations in mind beyond the normal security precautions that all MongoDB administrators must take. However, ensure that:
For most instances, the most effective ways to control access and to secure the connection between members of a replica set depend on network-level access control. Use your environment’s firewall and network routing to ensure that only traffic from clients and other replica set members can reach your mongod instances. If needed, use virtual private networks (VPNs) to ensure secure connections over wide area networks (WANs).
Additionally, MongoDB provides an authentication mechanism for mongod and mongos instances connecting to replica sets. These instances enable authentication by specifying a shared key file that serves as a shared password.
New in version 1.8: Added support for authentication in replica set deployments.
Changed in version 1.9.1: Added support for authentication in sharded replica set deployments.
To enable authentication, add the following option to your configuration file:
keyFile = /srv/mongodb/keyfile
Note
You may choose to set these run-time configuration options using the --keyFile (or mongos --keyFile) options on the command line.
Setting keyFile enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary but must be the same on all members of the replica set and on all mongos instances that connect to the set.
The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not have group or “world” permissions on UNIX systems. Use the following command to use the OpenSSL package to generate “random” content for use in a key file:
openssl rand -base64 753
Note
Key file permissions are not checked on Windows systems.
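The key file constraints described above (under one kilobyte, base64 characters only, no group or “world” permissions on UNIX systems) can be verified before you distribute the file. The following sketch generates comparable content with the Python standard library and checks an existing file. The function names are hypothetical, and tolerating whitespace such as line breaks in the file is an assumption.

```python
import base64
import os
import re
import stat

# Base64 alphabet plus padding; whitespace is tolerated (assumption).
BASE64_CONTENT = re.compile(rb"[A-Za-z0-9+/=\s]*")


def generate_key_content(random_bytes: int = 753) -> bytes:
    """Return base64 text built from random bytes, like the openssl command."""
    return base64.b64encode(os.urandom(random_bytes))


def check_key_file(path):
    """Return a list of problems with the key file at the given path."""
    problems = []
    with open(path, "rb") as handle:
        content = handle.read()
    if len(content) >= 1024:
        problems.append("key file must be less than one kilobyte")
    if BASE64_CONTENT.fullmatch(content) is None:
        problems.append("key file may only contain base64 characters")
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Permissions are only meaningful on UNIX-like systems.
    if os.name != "nt" and mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("key file must not have group or world permissions")
    return problems


content = generate_key_content()
print(len(content))  # 1004 characters from 753 random bytes
```

Writing `content` to a file with mode 600 and passing its path to `check_key_file` should return an empty list.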
The architecture and design of the replica set deployment can have a great impact on the set’s capacity and capability. This section provides a general overview of the architectural possibilities for replica set deployments. However, for most production deployments a conventional 3-member replica set with priority values of 1 is sufficient.
While the additional flexibility discussed below is helpful for managing a variety of operational complexities, it makes sense to let complex requirements dictate complex architectures rather than add unnecessary complexity to your deployment.
Consider the following factors when developing an architecture for your replica set:
For more information regarding replica set configuration and deployments see Replica Set Architectures and Deployment Patterns.