首页 > Berkeley DB, David Zhao > Berkeley DB Performance Test

Berkeley DB Performance Test

2009年7月10日 davidzhao

各位读者,很抱歉这篇文章是英文的,我当初做笔记的时候,写成英文了,这样才可以在同事之间交流。而现在确实没时间翻译过来了,还望大家理解,谢谢!

In this article I’d like to talk about the caveats and how-to’s when doing performance test with Berkeley DB, when the data volume is huge. For legal reasons I can not publish the result of my test without further approval, so I decided not to do so.

I. Context

I need to insert 10 billion key/data pairs to a btree database, each key item is 768 bytes, with no duplicate keys, and keys are inserted increasing only; each data item varies between KB/2 to 1KB. Thus each key/data pair varies between 1.25KB to 2KB.


After insertion, I search for some keys, some keys are in the db, others not, to find out the average time to insert a key/data pair, and to find a key/data pair. I use a very powerful Linux 64bit machine, which has 4 processers, each processor is a 3.2GB intel Xeon, 8GB memory and 8TB of storage.

II. Things to Note

The specially huge data volumn means a lot in this test, there are many things to note:

1. Integer overflow.
Loop variables will overflow if we insert so many key/data pairs in a loop, so split the job into pieces, making sure each piece of job won’t overflow an 32bit integer. If you used signed loop variables, they can go negative before reaching the limit value, and thus falls into an endless loop. In such a use case, we should be very careful about all integer variables being overflown.

Another example: If using 32bit integer as index keys, they are also overflown, causing not as many key/data pairs inserted as assumed, because later key/data pairs with keys already present in db will overwrite previous ones.

So we must use DB_SEQUENCE to generate 64bit sequential keys.

2. Random integer generator (rand() C function) loses randomness.
The RAND_MAX is only 64K, and when over 32K random integers are generated, the randomness fades, and I observed some none-randomness in my test code.

3. Huge key/data pair.

Context:
Each key/data pair can be as large as 2KB, and there can be millions of pages.

Problem:
A lot of overflow pages can be generated if using default page size.
Solution:
Set page size to 64k, so that each internal node can hold most number of keys, thus, the internal space waste, the number of times to read more internal nodes during search and the number of overflown pages is minimized. Concurrency is not harmed since each key/data pair is so big.

Problem:
Stack space insufficient.
Solution:
Do not allocate huge buffer (more than several MB) on stack, allocate it on heap, otherwise the stack may not be big enough.

4. Berkeley DB configuration

a. Do not use logs, otherwise, we will have to store another copy of the dataset, which is too much in this case as we are inserting 10 billion records each can be 3KB/4 to 2KB size.

b. Need a huge cache, otherwise the internal nodes won’t fit into the cache, btree search would be too slow.

c. When cache size is set to 6G on seneca, if using DB_PRIVATE, it is very slow, cpu usage is always less than 10%, because virtual memory (backed by disk) is used; when turned to use file mapping, cpu usage can be over 70%(later observation shows that cpu usage falls back to less than 10% after several millions of key/data pair are inserted, only when key/data pairs are inserted in order, did the insertion became fast. This is practical though, since we can always find sequentially increasing keys to use, and we can rely on secondary keys to use the various real keys.), the program is much faster.

d. May need to start multiple processes and insert simultaneously. And may need to split the cache into multiple files on some platforms which limit the maximum file size.

e. Use partitioned database, so that each database file is not so huge. If using multiple processes at the same time, each process inserting in one db file, concurrency can be promoted a lot. Though here I only used one process.

5. When changed to use ordered (increasing) keys, the insertion is much faster, as expected, because search becomes a very cheap operation in this case. In such a huge dataset, always use predictable sequentially increasing keys, like sequence numbers(1, 2, 3, …). For example we want to insert many Person objects, each of which has an ‘ID’ field, though we can not get Person objects ordered by ‘ID’, then we should use a sequence as the key for primary db containing Person objects, and create secondary databases each of which uses one of the properties of Person. The ‘ID’ secondary db is much smaller because the ‘data’ item of a key/data pair in it is only a sequence number, thus much easier and faster to find the Person object by its ID, and can insert very fast at the same time.

6. When the search phase begins, the pages are gradually loaded into cache, but since the keys were randomly chosen, the search is not fast. A lot of internal btree pages are being swapped in/out of the cache.

  1. Steve
    2009年7月10日22:05 | #1

    似乎你的文章还未完?

  2. jin
    2009年7月20日15:20 | #2

    最近在做关于BDB的读写性能测试,发现在读写多个文件(大于60)时,BDB的速度很慢,请问想要提高BDB读多个文件时的性能,有比较好的解决方案吗。如果有相关建议和想法,请mail给我,yexi-2004@tom.com,或QQ联系我312021046,谢谢。

  3. davidzhao
    2009年7月24日09:52 | #3

    @jin
    请你把问题描述清楚。我要看到你是怎么使用BDB的,怎么配置的,等等详细信息,才能给你建议。

  4. robinboy
    2009年8月3日14:55 | #4

    当数据量一大的时候,打开一个数据库很慢!

本文的评论功能被关闭了.
Դ