Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
262 views
in Technique[技术] by (71.8m points)

database design - How should I implement this schema in MongoDB?

I'm trying to write a tracking script and I'm having trouble with figuring out how the database should work.

In MySQL I'd create a table that looks similar to

User:
   username_name: string

Campaign:
   title: string
   description: string
   link: string

UserCampaign:
   user_id: integer
   camp_id: integer

Click:
   os: text
   referer: text
   camp_id: integer
   user_id: integer

I need to be able to:

  • See the information from each click like IP, Referer, OS, etc
  • See how many often clicks are coming from X IP, X Referer, X OS
  • Associate each click with a User and a Campaign

If I do something along the lines of

User {
     Campaigns: [
         {
           Clicks: []
         }
     ]
}

I run into two problems:

  • It creates a new campaign object for each user which is a problem because if I need to update my campaign I'd need to update the object for each user
  • I expect the Clicks array to contain a LARGE amount of data, I feel like having it a part of the User object will make it very slow to query
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

OK, I think you need to break this out into the basic "varieties".

You have two "entity"-style objects:

  • User
  • Campaign

You have one "mapping"-style object:

  • UserCampaign

You have one "transactional"-style object:

  • Click

Step 1: entity

Let's start with the easy ones: User & Campaign. These are truly two separate objects, neither one really depends on the other for its existence. There's also no implicit heirarchy between the two: Users do not belong to Campaigns, nor do Campaigns belong to Users.

When you have two top-level objects like this, they generally earn their own collection. So you'll want a Users collection and a Camapaigns collection.

Step 2: mapping

UserCampaign is currently used to represent an N-to-M mapping. Now, in general, when you have an N-to-1 mapping, you can put the N inside of the 1. However, with the N-to-M mapping, you generally have to "pick a side".

In theory, you could do one of the following:

  1. Put a list of Campaign IDs inside of each User
  2. Put a list of Users IDs inside of each Campaign

Personally, I would do #1. You probably have way more users that campaigns, and you probably want to put the array where it will be shorter.

Step 3: transactional

Clicks is really a completely different beast. In object terms you could think the following: Clicks "belong to" a User, Clicks "belong to" a Campaign. So, in theory, you could just store clicks are part of either of these objects. It's easy to think that Clicks belong under Users or Campaigns.

But if you really dig deeper, the above simplification is really flawed. In your system, Clicks are really a central object. In fact, you might even be able to say that Users & Campaigns are really just "associated with" the click.

Take a look at the questions / queries that you're asking. All of those questions actually center around clicks. Users & Campaigns are not the central object in your data, Clicks are.

Additionally, Clicks are going to be the most plentiful data in your system. You're going to have way more clicks than anything else.

This is the biggest hitch when designing a schema for data like this. Sometimes you need to push off "parent" objects when they're not the most important thing. Imagine building a simple e-commerce system. It's clear that orders would "belong to" users, but orders is so central to the system that it's going to be a "top-level" object.

Wrapping it up

You'll probably want three collections:

  1. User -> has list of campaign._id
  2. Campaign
  3. Clicks -> contains user._id, campaign._id

This should satisfy all of your query needs:

See the information from each click like IP, Referer, OS, etc

db.clicks.find()

See how many often clicks are coming from X IP, X Referer, X OS

db.clicks.group() or run a Map-Reduce.

Associate each click with a User and a Campaign

db.clicks.find({user_id : blah}) It's also possible to push click IDs into both users and campaigns (if that makes sense).

Please note that if you have lots and lots of clicks, you'll really have to analyze the queries you run most. You can't index on every field, so you'll often want to run Map-Reduces to "roll-up" the data for these queries.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
...