Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

mongodb - Possibility of duplicate Mongo ObjectIds being generated in two different collections?

Is it possible for the same exact Mongo ObjectId to be generated for a document in two different collections? I realize that it's definitely very unlikely, but is it possible?

Without getting too specific, the reason I ask is that the application I'm working on shows public profiles of elected officials whom we hope to convert into full-fledged users of our site. We have separate collections for users and for the elected officials who aren't currently members of our site. Various other documents hold data about the elected officials, and they all map back to the person through the elected official's ObjectId.

After an elected official creates an account, we still highlight the data associated with them as an elected official, but they are now also part of the users collection, with a corresponding user ObjectId that maps their profile to their interactions with our application.

We began converting our application from MySQL to Mongo a few months ago. While we're in transition, we store the legacy MySQL id for both of these data types, and we're also starting to store the elected official's Mongo ObjectId in the users document to map back to the elected official data.

I was pondering just specifying the new user ObjectId as the previous elected official ObjectId to make things simpler but wanted to make sure that it wasn't possible to have a collision with any existing user ObjectId.

Thanks for your insight.

Edit: Shortly after posting this question, I realized that my proposed solution wasn't a very good idea. It would be better to just keep the current schema that we have in place and just link to the elected official '_id' in the users document.

Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back...

1 Answer

Short Answer

Just to add a direct response to your initial question: yes, a duplicate is theoretically possible, but if you use BSON ObjectId generation, then for most drivers the IDs are almost certainly going to be unique across collections. See below for what "almost certainly" means.

Long Answer

The BSON ObjectIds generated by MongoDB drivers are highly likely to be unique across collections. This is mainly because of the last 3 bytes of the ID, which most drivers generate from a static incrementing counter. That counter is collection-independent; it's global to the process. The Java driver, for example, uses a randomly initialized, static AtomicInteger.
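To make the "collection-independent counter" idea concrete, here is a minimal Python sketch. It is not any driver's actual code: the names SharedCounter and counter_for_insert are made up for illustration, and the only things taken from the description above are the 3-byte width, the random starting value, and the fact that the counter is shared by the whole process.

```python
import secrets
import threading

COUNTER_MODULUS = 1 << 24  # the counter occupies 3 bytes


class SharedCounter:
    """Hypothetical process-wide counter, loosely analogous to a static AtomicInteger."""

    _lock = threading.Lock()
    _value = secrets.randbelow(COUNTER_MODULUS)  # random starting point

    @classmethod
    def next(cls) -> int:
        # Return the current value, then advance it, wrapping at 2^24.
        with cls._lock:
            current = cls._value
            cls._value = (cls._value + 1) % COUNTER_MODULUS
            return current


def counter_for_insert(collection_name: str) -> int:
    # The collection name is deliberately ignored: every collection in the
    # process draws from the same counter.
    return SharedCounter.next()


users = [counter_for_insert("users") for _ in range(3)]
officials = [counter_for_insert("elected_officials") for _ in range(3)]
```

Because both lists draw from one sequence, none of the six counter values repeat, even though the inserts target two different collections.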

So why do the Mongo docs say that the IDs are "highly likely" to be unique, instead of outright saying that they WILL be unique? There are three cases in which you won't get a unique ID (please let me know if there are more).

Before going through them, recall that the BSON ObjectId consists of:

[4 bytes seconds since epoch, 3 bytes machine hash, 2 bytes process ID, 3 bytes counter]
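As a rough illustration of that layout, the following Python sketch packs the four fields into a 24-character hex string and splits one back apart. The function names are hypothetical, and big-endian byte order for every field is an assumption made for the sketch; real drivers may encode the machine and process parts differently.

```python
import struct
from typing import NamedTuple


class LegacyObjectIdFields(NamedTuple):
    timestamp: int  # 4 bytes, seconds since epoch
    machine: int    # 3 bytes, machine hash
    pid: int        # 2 bytes, process ID
    counter: int    # 3 bytes, counter


def pack_object_id(timestamp: int, machine: int, pid: int, counter: int) -> str:
    """Build a 12-byte ID from the four fields and return it as hex."""
    raw = (
        struct.pack(">I", timestamp)
        + machine.to_bytes(3, "big")
        + struct.pack(">H", pid)
        + counter.to_bytes(3, "big")
    )
    return raw.hex()


def unpack_object_id(hex_id: str) -> LegacyObjectIdFields:
    """Split a 24-character hex ID back into its four fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes (24 hex characters)")
    return LegacyObjectIdFields(
        timestamp=struct.unpack(">I", raw[0:4])[0],
        machine=int.from_bytes(raw[4:7], "big"),
        pid=struct.unpack(">H", raw[7:9])[0],
        counter=int.from_bytes(raw[9:12], "big"),
    )


example = pack_object_id(0x5F000000, 0xABCDEF, 0x1234, 0x000001)
fields = unpack_object_id(example)
```

Two IDs are equal only if all four fields are equal, which is why each scenario below has to line up the time, machine, process, and counter values at once.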

Here are the three possibilities, so you can judge for yourself how likely it is to get a dupe:

1) Counter overflow: there are 3 bytes in the counter. If you happen to insert over 16,777,216 (2^24) documents in a single second, on the same machine, in the same process, then you may overflow the incrementing counter bytes and end up with two Object IDs that share the same time, machine, process, and counter values.

2) Counter non-incrementing: some Mongo drivers use random numbers instead of incrementing numbers for the counter bytes. In these cases, any two given IDs have a 1/16,777,216 chance of colliding, but only if they are generated in the same second (i.e. before the time section of the ID updates to the next second), on the same machine, in the same process.

3) Machine and process hash to the same values. The machine ID and process ID values may, in some highly unlikely scenario, map to the same values for two different machines. If this occurs, and at the same time the two counters on the two different machines, during the same second, generate the same value, then you'll end up with a duplicate ID.
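Scenarios 1 and 3 can be checked with a few lines of Python. This is only a sketch under the layout given earlier: legacy_id is a hypothetical helper, and the timestamp, machine hash, and process ID values are arbitrary examples.

```python
COUNTER_BITS = 24


def legacy_id(timestamp: int, machine: int, pid: int, counter: int) -> str:
    """Hex ID built from the four fields; the counter is truncated to 3 bytes."""
    counter &= (1 << COUNTER_BITS) - 1  # wrap-around, as an overflowing counter would
    return (
        timestamp.to_bytes(4, "big")
        + machine.to_bytes(3, "big")
        + pid.to_bytes(2, "big")
        + counter.to_bytes(3, "big")
    ).hex()


second = 1_600_000_000  # both IDs are generated within this one second
start = 0xFFFFFE        # arbitrary starting counter value

# Scenario 1: after 2^24 further increments in the same second, the counter
# is back where it started, so the whole ID repeats.
first = legacy_id(second, 0xABCDEF, 4242, start)
wrapped = legacy_id(second, 0xABCDEF, 4242, start + 2**24)

# Scenario 3: IDs collide only if machine hash, process ID, and counter all
# coincide; changing any single field yields a different ID.
same_everything = legacy_id(second, 0xABCDEF, 4242, 7)
different_pid = legacy_id(second, 0xABCDEF, 4243, 7)
different_machine = legacy_id(second, 0xABCDEE, 4242, 7)
```

Here first and wrapped are identical, while same_everything differs from both different_pid and different_machine.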

These are the three scenarios to watch out for. Scenarios 1 and 3 seem highly unlikely, and scenario 2 is totally avoidable if you're using the right driver. You'll have to check the source of the driver to know for sure.
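To put a number on scenario 2, here is a small Python sketch of the standard birthday-problem calculation. It assumes the counter bytes are drawn uniformly at random from all 2^24 values and that every ID in question is generated in the same second, on the same machine, in the same process; collision_probability is a name made up for this example.

```python
import math

COUNTER_SPACE = 2 ** 24  # number of distinct 3-byte counter values


def collision_probability(ids_in_same_second: int, space: int = COUNTER_SPACE) -> float:
    """Chance that at least two of n uniformly random counter values coincide."""
    if ids_in_same_second > space:
        return 1.0  # pigeonhole: more IDs than counter values
    # P(no collision) = product over i of (1 - i / space); sum the logs for stability.
    log_no_collision = sum(
        math.log1p(-i / space) for i in range(ids_in_same_second)
    )
    return -math.expm1(log_no_collision)


two_ids = collision_probability(2)
thousand_ids = collision_probability(1000)
```

For two IDs this reproduces the 1/16,777,216 figure above. For 1,000 IDs in the same second the chance of at least one duplicate rises to around 3%, which is why a random counter is the scenario worth ruling out by checking your driver.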

