Mongodb Schema Design Basics

  • Uploaded by: Alvin John Richards
  • 0
  • 0
  • February 2021
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Mongodb Schema Design Basics as PDF for free.

More details

  • Words: 1,720
  • Pages: 51
Loading documents preview...
open-­‐source,  high-­‐performance,   document-­‐oriented  database  

Schema Design Basics Alvin Richards [email protected]

This talk

Part One

Part Two

‣ 

‣ 

Intro

‣ 

‣ 

Terms / Definitions

Getting a flavor

‣ 

Creating a Schema

‣ 

Indexes

‣ 

Evolving the Schema

Data modeling

‣ 

DBRef

‣ 

Single Table Inheritance

‣ 

Many – Many

‣ 

Trees

‣ 

Lists / Queues / Stacks

So why model data?

A brief history of normalization •  1970 E.F.Codd introduces 1st Normal Form (1NF)

•  1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)

•  1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)

•  2002 Date, Darween, Lorentzos define 6th Normal Form (6NF)

Goals:

•  Avoid anomalies when inserting, updating or deleting

•  Minimize redesign when extending the schema

•  Make the model informative to users

•  Avoid bias towards a particular style of query

* source : wikipedia

Relational made normalized data look like this

Document databases make normalized data look like this

Some terms before we proceed RDBMS

Document DBs

Table

Collection

Row(s)

JSON Document

Index

Index

Join

Partition

Embedding & Linking across documents

Shard

Partition Key

Shard Key

DB Considerations

How can we manipulate this data ?

•  •  •  • 

Dynamic Queries Secondary Indexes Atomic Updates Map Reduce

Access Patterns ?

•  •  •  • 

Read / Write Ratio Types of updates Types of queries Data life-cycle

Considerations •  No Joins •  Document writes are atomic

Design Session Design documents that simply map to your application post  =  {author:  “kyle”,                  date:  new  Date(),                  text:  “my  blog  post...”,                  tags:  [“mongodb”,  “intro”]}   >db.post.save(post)  

Find the document >db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ] } Notes: •  ID must be unique, but can be anything you’d like •  MongoDB will generate a default ID if one is not supplied

Add and index, find via Index Secondary index for “author” // 1 means ascending, -1 means descending >db.posts.ensureIndex({author: 1}) >db.posts.find({author: 'kyle'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", ... }

Verifying indexes exist >db.system.indexes.find() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } }

// Index on author { _id : ObjectId("4c4ba6c5672c685e5e8aabf4"), ns : "test.posts", key : { "author" : 1 }, name : "author_1" }

Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}})

Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i })

Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i }) Counting: // posts written by mike    >db.posts.find({author:  “mike”}).count()  

Extending the Schema new_comment = {author: “fred”, date: new Date(), text: “super duper”} new_info = { ‘$push’: {comments: new_comment}, ‘$inc’: {comments_count: 1}}  >db.posts.update({_id:  “...”  },  new_info)  

Extending the Schema { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My first blog", tags : [ "mongodb", "intro" ], comments_count: 1, comments : [

{

author : "Fred",

date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",

text : "Super Duper"

} ]}

Extending the Schema // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”})

Extending the Schema // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5)

Extending the Schema // create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5) // most commented post: >db.posts.find().sort({comments_count:-1}).limit(1) When sorting, check if you need an index

Map Reduce Aggregation and batch manipulation Collection in, Collection out Parallel in sharded environments

Map reduce mapFunc = function () { this.tags.forEach(function (z) {emit(z, {count:1});}); } reduceFunc = function (k, v) { var total = 0; for (var i = 0; i < v.length; i++) { total += v[i].count; } return {count:total}; } res = db.posts.mapReduce(mapFunc, reduceFunc)

>db[res.result].find() { _id : "intro", value : { count : 1 } } { _id : "mongodb", value : { count : 1 } }

Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more

Wordnik 9B records, 100M queries / week, 1.2TB {

entry : {

header: { id: 0,

headword: "m",

sourceDictionary: "GCide",

textProns : [

{text: "(em)",

seq:0}

],

syllables: [



{id: 0,

text: "m"}

],





sourceDictionary: "1913 Webster",

headWord: "m",

id: 1,

definitions: : [

{text: "M, the thirteenth letter..."},

{text: "As a numeral, M stands for 1000"}]

}

} }

Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more Observations: - Using Rich Documents works well - Simplify relations by embedding them - Iterative development is easy with MongoDB

Single Table Inheritance >db.shapes.find() { _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1} { _id: ObjectId("..."), type: "square", area: 4, d: 2} { _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}

// find shapes where radius > 0 >db.shapes.find({radius: {$gt: 0}})

// create index >db.shapes.ensureIndex({radius: 1})

One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents

One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query

One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to find latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most flexible - more queries

Many - Many Example: - Product can be in many categories - Category can have many products

Products

- product_id

Prod_Categories

-  id

-  product_id

-  category_id

Category

- category_id

Many - Many products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]}

Many - Many products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]}

categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}

Many - Many products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]}

categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}

//All categories for a given product >db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})

Many - Many products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]}

categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}

//All categories for a given product >db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})

//All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"}

Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"}

// All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"}

// All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

// All categories for a given product product = db.products.find(_id : some_id) >db.categories.find({_id : {$in : product.category_ids}})

Trees Full Tree in Document { comments: [ { author: “rpb”, text: “...”, replies: [ {author: “Fred”, text: “...”, replies: []} ]} ]}

Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 4MB limit

Trees Parent Links - Each node is stored as a document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)

Array of Ancestors - Store Ancestors of a node { { { { { { {

_id: _id: _id: _id: _id: _id: _id:

"a" } "b", ancestors: [ "a" ], parent: "a" } "c", ancestors: [ "a", "b" ], parent: "b" } "d", ancestors: [ "a", "b" ], parent: "b" } "e", ancestors: [ "a" ], parent: "a" } "f", ancestors: [ "a", "e" ], parent: "e" } "g", ancestors: [ "a", "b", "d" ], parent: "d" }

Array of Ancestors - Store Ancestors of a node { { { { { { {

_id: _id: _id: _id: _id: _id: _id:

"a" } "b", ancestors: [ "a" ], parent: "a" } "c", ancestors: [ "a", "b" ], parent: "b" } "d", ancestors: [ "a", "b" ], parent: "b" } "e", ancestors: [ "a" ], parent: "a" } "f", ancestors: [ "a", "e" ], parent: "e" } "g", ancestors: [ "a", "b", "d" ], parent: "d" }

//find all descendants of b: >db.tree2.find({ancestors: ‘b’})

Array of Ancestors - Store Ancestors of a node { { { { { { {

_id: _id: _id: _id: _id: _id: _id:

"a" } "b", ancestors: [ "a" ], parent: "a" } "c", ancestors: [ "a", "b" ], parent: "b" } "d", ancestors: [ "a", "b" ], parent: "b" } "e", ancestors: [ "a" ], parent: "a" } "f", ancestors: [ "a", "e" ], parent: "e" } "g", ancestors: [ "a", "b", "d" ], parent: "d" }

//find all descendants of b: >db.tree2.find({ancestors: ‘b’})

//find all ancestors of f: >ancestors = db.tree2.findOne({_id:’f’}).ancestors >db.tree2.find({_id: { $in : ancestors})

findAndModify Queue example //Example: find highest priority job and mark job = db.jobs.findAndModify({
 query: {inprogress: false}, sort: {priority: -1), update: {$set: {inprogress: true, started: new Date()}}, new: true})

Cool Stuff -

Aggregation Capped collections GridFS Geo

Learn More •  Kyle’s presentation + video: http://www.slideshare.net/kbanker/mongodb-schema-design http://www.blip.tv/file/3704083

•  Dwight’s presentation http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwightmerriman

•  Documentation Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command Aggregration: http://www.mongodb.org/display/DOCS/Aggregation Capped Col. : http://www.mongodb.org/display/DOCS/Capped+Collections Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification

Thank You :-)

Download MongoDB and  let  us  know  what  you  think   @mongodb   http://www.mongodb.org  

DBRef DBRef {$ref: collection, $id: id_value} - Think URL - YDSMV: your driver support may vary

Sample Schema: nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]} Dereferencing: nr.forEach(function(r) { printjson(db[r.$ref].findOne({_id: r.$id})); }

BSON Mongodb stores data in BSON internally Lightweight, Traversable, Efficient encoding Typed boolean, integer, float, date, string, binary, array...

Related Documents

Mongodb
February 2021 3
Mongodb
February 2021 2
Mongodb Aggregate Exercise2
February 2021 3

More Documents from "Dheeraj D Suvarna"