MongoDB Embedded Document Insert Performance Benchmark
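As a typical document database, MongoDB supports embedded documents and complex queries, which gives more flexibility in data modeling. Consider a blog application with blogs (Blog) and comments (Comment), where each blog post can have multiple comments. In relational modeling, blogs and comments usually map to separate tables, and the comment table holds a foreign key to the blog table. In MongoDB you can do the same and put blogs and comments in separate collections, or you can embed the comments in the blog document. In the latter case, a blog document might look like this: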
    > db.blog.findOne()
    {
        _id: 1,
        title: "No Free Lunch",
        author: "Alex",
        comments: [
            { who: "John", comment: "I agree" },
            { who: "Felix", comment: "You must be joking..." },
        ]
    }

Test script (Python 2, legacy pymongo API):

    from pymongo import Connection
    import sys, time

    # Purpose: measure the insert speed of embedded documents in MongoDB
    conn = Connection()
    db = conn.bench
    conn.drop_database('bench')

    def insert_embbded_comments(n, comment_len, count=1, safe=False):
        # Create `count` blogs and $push `n` comments into each of them
        comment_text = 'a'*comment_len
        start = time.time()
        for c in xrange(count):
            blog = {'_id': c, 'title': 'Mongodb Benchmark'}
            db.blog.insert(blog)
            for i in xrange(n):
                db.blog.update({'_id': c}, {'$push': { 'comments': {'comment': comment_text}}}, safe=safe)
        end = time.time()
        return end - start

    def insert_comments(n, comment_len, count=1, safe=False):
        # Insert the same number of comments as standalone documents
        comment_text = 'a'*comment_len
        start = time.time()
        for c in xrange(count):
            for i in xrange(n):
                db.blog.comments.insert({'comment': comment_text}, safe=safe)
        end = time.time()
        return end - start

    def bench(safe=False):
        total = 10000
        print '===== %sINSERT %s comments =====' % ('SAFE ' if safe else '', total)
        print '%12s %15s %15s %15s %15s %15s' % ('', '1(x10000)', '10(x1000)', '100(x100)', '1000(x10)', '10000(x1)')
        sys.stdout.write('%12s ' % 'Embeded')
        sys.stdout.flush()
        row_types = (1, 10, 100, 1000, 10000)
        for nrows in row_types:
            conn.drop_database('bench')
            count = total / nrows
            elapsed = insert_embbded_comments(nrows, 1000, count=count, safe=safe)
            sys.stdout.write('%15s%s' % (elapsed, '\n' if nrows==row_types[-1] else ' '))
            sys.stdout.flush()
        sys.stdout.write('%12s ' % 'Non-embeded')
        for nrows in row_types:
            count = total / nrows
            conn.drop_database('bench')
            elapsed = insert_comments(nrows, 1000, count=count, safe=safe)
            sys.stdout.write('%15s%s' % (elapsed, '\n' if nrows==row_types[-1] else ' '))
            sys.stdout.flush()

    bench()
    bench(safe=True)

Output (elapsed time in seconds; each column is comments per blog x number of blogs):

    ===== INSERT 10000 comments =====
                       1(x10000)       10(x1000)       100(x100)       1000(x10)       10000(x1)
         Embeded   2.31141519547   1.42457890511   1.34223604202    4.3767850399   35.7308151722
     Non-embeded   1.29936504364   1.30167293549   1.30044412613   1.29023313522   1.29240202904
    ===== SAFE INSERT 10000 comments =====
                       1(x10000)       10(x1000)       100(x100)       1000(x10)       10000(x1)
         Embeded   5.45804405212   4.29802298546   4.95570802689   13.7657668591   107.089906216
     Non-embeded   3.68912506104   3.65784692764   3.77990913391   3.66531991959   3.70736408234
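The script above relies on Python 2 and on pymongo calls (`Connection`, `insert`, `update`, the `safe` flag) that current pymongo releases no longer provide. A rough Python 3 port might look like the sketch below. It assumes a `mongod` listening on the default local address, maps `safe=False` / `safe=True` to write concerns `w=0` / `w=1`, and has not been used to reproduce the numbers above. The function names and the `main` helper are illustrative, not part of the original script.

```python
import time


def insert_embedded_comments(blogs, n, comment_len, count=1):
    """Create `count` blog documents and $push `n` comments into each one.

    `blogs` is any object with pymongo-style insert_one/update_one methods.
    Returns the elapsed wall-clock time in seconds.
    """
    comment_text = "a" * comment_len
    start = time.perf_counter()
    for c in range(count):
        blogs.insert_one({"_id": c, "title": "Mongodb Benchmark"})
        for _ in range(n):
            blogs.update_one(
                {"_id": c},
                {"$push": {"comments": {"comment": comment_text}}},
            )
    return time.perf_counter() - start


def insert_comments(comments, n, comment_len, count=1):
    """Insert `count * n` standalone comment documents; return elapsed seconds."""
    comment_text = "a" * comment_len
    start = time.perf_counter()
    for _ in range(count * n):
        # A fresh dict per call: pymongo adds an _id to the dict it is given.
        comments.insert_one({"comment": comment_text})
    return time.perf_counter() - start


def main(total=10000, comment_len=1000):
    """Run both variants against a local mongod (assumed at the default address)."""
    # Imported here so the timing helpers above can be used without pymongo.
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient()
    for safe in (False, True):
        # Assumption: legacy safe=False ~ unacknowledged (w=0), safe=True ~ w=1.
        concern = WriteConcern(w=1 if safe else 0)
        db = client.get_database("bench", write_concern=concern)
        print("===== %sINSERT %d comments =====" % ("SAFE " if safe else "", total))
        for n in (1, 10, 100, 1000, 10000):
            count = total // n
            client.drop_database("bench")
            embedded = insert_embedded_comments(db["blog"], n, comment_len, count)
            client.drop_database("bench")
            separate = insert_comments(db["blog.comments"], n, comment_len, count)
            print("%6d(x%d)  embedded %.3fs  non-embedded %.3fs"
                  % (n, count, embedded, separate))
```

Calling `main()` prints one line per batch size. With `w=0` the driver does not wait for a server acknowledgement, so those timings mostly measure how fast writes are handed off, which is why the acknowledged run is the more meaningful comparison.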
mongostat sample showing only update operations (the embedded $push workload):

    insert  query  update  delete  getmore  command  flushes  mapped  vsize   res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn      time
         0      0      11       0        0       12        0    128m   242m   59m       0      86.9           0    0|0    0|1    12k      2k     2  20:36:23
         0      0      10       0        0       11        0    128m   242m   56m       0       110           0    0|0    0|1    11k      2k     2  20:36:24
         0      0       7       0        0        8        0    128m   242m   59m       0      80.9           0    0|0    0|1     8k      1k     2  20:36:25
         0      0       7       0        0        8        0    128m   242m   59m       0       111           0    0|0    0|1     8k      1k     2  20:36:26
         0      0      32       0        0       33        0    128m   242m   56m       0       104           0    0|0    0|1    37k      4k     2  20:36:27
         0      0      54       0        0       55        1    128m   242m   56m       0      96.8           0    0|0    0|1    62k      6k     2  20:36:28
         0      0      54       0        0       55        0    128m   243m   52m       0      97.3           0    0|0    0|1    62k      6k     2  20:36:29
         0      0      53       0        0       54        0    128m   243m   60m       0      95.9           0    0|0    0|1    61k      6k     2  20:36:30
         0      0      53       0        0       54        0    128m   243m   60m       0      96.9           0    0|0    0|1    61k      6k     2  20:36:31
         0      0      53       0        0       54        0    128m   243m   60m       0      97.2           0    0|0    0|1    61k      6k     2  20:36:32
mongostat sample showing only insert operations (the separate-collection workload):

    insert  query  update  delete  getmore  command  flushes  mapped  vsize   res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn      time
      2582      0       0       0        0     2584        0     32m   136m   22m       5      10.2           0    0|0    0|0     2m    215k     2  20:36:53
      2746      0       0       0        0     2747        0     32m   136m   25m       1       7.5           0    0|0    0|0     3m    229k     2  20:36:54
      2728      0       0       0        0     2729        0     32m   136m   28m       4       7.6           0    0|0    0|0     3m    227k     2  20:36:55
      2713      0       0       0        0     2714        0     32m   136m   30m       2       7.5           0    0|0    0|0     3m    226k     2  20:36:56
      2618      0       0       0        0     2620        0     32m   136m   23m       4      10.2           0    0|0    0|0     2m    218k     2  20:36:57
      2756      0       0       0        0     2757        0     32m   136m   26m       2       7.6           0    0|0    0|0     3m    229k     2  20:36:58
      2711      0       0       0        0     2712        0     32m   136m   28m       4       7.4           0    0|0    0|0     3m    226k     2  20:36:59
      2417      0       0       0        0     2418        0     32m   136m   31m       1       6.6           0    0|0    0|0     2m    201k     1  20:37:00
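To put the two mongostat samples side by side, the per-second readings can be averaged. The short sketch below does that in Python 3, using the `update`, `insert`, and `locked %` values copied from the samples above. It assumes the first sample was taken during the embedded run (it shows only updates) and the second during the non-embedded run (it shows only inserts); the variable and function names are illustrative.

```python
from statistics import mean

# Values copied from the two mongostat samples (one reading per second).
embedded_updates = [11, 10, 7, 7, 32, 54, 54, 53, 53, 53]
embedded_locked = [86.9, 110, 80.9, 111, 104, 96.8, 97.3, 95.9, 96.9, 97.2]
separate_inserts = [2582, 2746, 2728, 2713, 2618, 2756, 2711, 2417]
separate_locked = [10.2, 7.5, 7.6, 7.5, 10.2, 7.6, 7.4, 6.6]


def summarize(ops_per_sec, locked_pct):
    """Average write operations per second and average locked % for one sample."""
    return {"ops_per_sec": mean(ops_per_sec), "locked_pct": mean(locked_pct)}


embedded = summarize(embedded_updates, embedded_locked)
separate = summarize(separate_inserts, separate_locked)

# How many times more writes per second the insert-only sample sustained.
throughput_ratio = separate["ops_per_sec"] / embedded["ops_per_sec"]

print("embedded: %.1f updates/s, locked %.1f%%"
      % (embedded["ops_per_sec"], embedded["locked_pct"]))
print("separate: %.1f inserts/s, locked %.1f%%"
      % (separate["ops_per_sec"], separate["locked_pct"]))
print("throughput ratio: %.1fx" % throughput_ratio)
```

On these samples the averages come to roughly 33 updates per second with the lock held about 98% of the time, against roughly 2,659 inserts per second at about 8%.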
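Comment #2, yang_44, 2012-03-11: A question. For data whose state changes frequently, such as a task with an execution status, suppose the status is stored as an embedded document and each task keeps only one status document representing the current state. Would that be faster than using update to change a key's value directly on the document?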