今回は MonetDB のパフォーマンス検証をしてみたいと思う。
- 検証用DB作成
- 検証データ投入
- クエリによるパフォーマンス検証
実施環境
- CloudCore
- OS: Ubuntu 12.04
- CPU: AMD Phenom(tm) 9550 Quad-Core Processor 2.2GHz
- Momory: 2G
1. 検証用DB作成
$ monetdbd create /usr/local/var/db/monetdb/performance $ monetdbd start /usr/local/var/db/monetdb/performance $ monetdb create testdb created database in maintenance mode: testdb $ monetdb start testdb starting database 'testdb'... doneホームディレクトリに下記内容で .monetdb ファイルを作成するとログインパスワードが不要となる。
user=monetdb password=monetdb language=sql接続してテスト用スキーマを作成
$ mclient -d testdb Welcome to mclient, the MonetDB/SQL interactive terminal (Apr2012-SP2) Database: MonetDB v11.9.7 (Apr2012-SP2), 'mapi:monetdb://shimajiro:50000/testdb' Type \q to quit, \? for a list of available commands auto commit mode: on sql>create schema "test" authorization "monetdb"; operation successful (5.994ms) sql>alter user "monetdb" set schema "test"; operation successful (2.406ms) sql>\q $ monetdb release testdb
2. 検証データ投入
テストデータとして下記3ファイルを用意したのでこれをインポートする。
- users: 100万件
- payments: 5000万件
- products: 26件
まずはテーブルの作成から。
$ mclient -d testdb Welcome to mclient, the MonetDB/SQL interactive terminal (Apr2012-SP2) Database: MonetDB v11.9.7 (Apr2012-SP2), 'mapi:monetdb://shimajiro:50000/testdb' Type \q to quit, \? for a list of available commands auto commit mode: on sql> # users sql>create table users(id int, prop1 int, prop2 int, prop3 int, prop4 varchar(8), registered_date date, primary key(id)); operation successful (6.758ms) # payments sql>create table payments(id int, user_id int, product_id int, quantity int, sale int, buy_at timestamp, primary key(id)); operation successful (5.051ms) # products sql>create table products(id int, name varchar(8), price int, primary key(id)); operation successful (4.038ms) # fact sql>create table fact(user_id int, prop1 int, prop2 int, prop3 int, prop4 varchar(8), registered_date date, quantity int, sales int, first_buy_at timestamp, last_buy_at timestamp, primary key(user_id)); operation successful (18.922ms) sql>\d TABLE test.fact TABLE test.payments TABLE test.products TABLE test.usersつづいてデータを投入。
# users sql>copy 1000000 offset 2 records into test.users from '/usr/local/var/db/data/users.csv' using delimiters ','; 1000000 affected rows (2.5s) # payments sql>copy 50000000 offset 2 records into test.payments from '/usr/local/var/db/data/payments.csv' using delimiters ','; 50000000 affected rows (1m 37s) # products sql>copy 26 offset 2 records into test.products from '/usr/local/var/db/data/products.csv' using delimiters ','; 26 affected rows (502.657ms)1秒で50万件以上(= 5000万件 / 97s)インポートできてる。相当早い印象。
3. クエリによるパフォーマンス検証
# all count: users sql>select count(id) from users; +---------+ | L1 | +=========+ | 1000000 | +---------+ 1 tuple (2.497ms) # all count: payments sql>select count(id) from payments; +----------+ | L1 | +==========+ | 50000000 | +----------+ 1 tuple (4.591ms) # all count: products sql>select count(id) from products; +------+ | L1 | +======+ | 26 | +------+ 1 tuple (2.658ms) # 100 records: users sql>select * from users order by id limit 100; +------+-------+-------+-------+--------+-----------------+ | id | prop1 | prop2 | prop3 | prop4 | registered_date | +======+=======+=======+=======+========+=================+ | 1 | 41 | 857 | 7976 | prop_W | 2012-11-09 | … | 100 | 75 | 580 | 4855 | prop_X | 2011-05-14 | +------+-------+-------+-------+--------+-----------------+ 100 tuples (30.210ms) # 100 records: payments sql>select * from payments order by id limit 100; +------+---------+------------+----------+--------+----------------------------+ | id | user_id | product_id | quantity | sale | buy_at | +======+=========+============+==========+========+============================+ | 1 | 817590 | 6 | 12 | 87960 | 2011-05-03 21:12:16.000000 | … | 100 | 468994 | 14 | 39 | 300300 | 2012-08-29 07:10:37.000000 | +------+---------+------------+----------+--------+----------------------------+ 100 tuples (235.568ms) # 100 records: products sql>select * from products order by id limit 100; +------+--------+-------+ | id | name | price | +======+========+=======+ | 1 | item_A | 3800 | … | 26 | item_Z | 8240 | +------+--------+-------+ 26 tuples (4.560ms) # a user: users sql>select * from users where id = 500000; +--------+-------+-------+-------+--------+-----------------+ | id | prop1 | prop2 | prop3 | prop4 | registered_date | +========+=======+=======+=======+========+=================+ | 500000 | 23 | 371 | 5839 | prop_N | 2012-08-22 | +--------+-------+-------+-------+--------+-----------------+ 1 tuple (7.454ms) # a user: payments sql>select * from payments where user_id = 500000; +----------+---------+------------+----------+--------+----------------------------+ | id | user_id | product_id | quantity | sale | buy_at | +==========+=========+============+==========+========+============================+ | 1316372 | 500000 | 5 | 48 | 415680 | 2012-11-07 03:52:05.000000 | … | 49065588 | 500000 | 6 | 32 | 234560 | 2011-01-23 19:02:19.000000 | +----------+---------+------------+----------+--------+----------------------------+ 40 tuples (175.989ms) # group by: users sql>select more> registered_date, more> count(id) as uu more>from users more>group by registered_date more>order by registered_date; +-----------------+------+ | registered_date | uu | +=================+======+ | 2010-01-01 | 974 | … | 2012-12-31 | 940 | +-----------------+------+ 1096 tuples (144.811ms) # group by: payments sql>select more> product_id, more> count(id) as cnt, more> sum(sale) as sale more>from payments more>group by product_id more>order by product_id; +------------+---------+--------------+ | product_id | cnt | sale | +============+=========+==============+ | 1 | 1922437 | 92240910320 | … | 26 | 1924741 | 189485558780 | +------------+---------+--------------+ 26 tuples (2.5s) # data join sql>select more> usr.id as user_id, more> pdt.name as product_name, more> pmt.quantity, more> pmt.sale, more> pmt.buy_at more>from users as usr more> left outer join payments as pmt more> on more> pmt.user_id = usr.id more> left outer join products as pdt more> on more> pdt.id = pmt.product_id more>limit 100; +---------+--------------+----------+--------+----------------------------+ | user_id | product_name | quantity | sale | buy_at | +=========+==============+==========+========+============================+ | 817590 | item_F | 12 | 87960 | 2011-05-03 21:12:16.000000 | … | 468994 | item_N | 39 | 300300 | 2012-08-29 07:10:37.000000 | +---------+--------------+----------+--------+----------------------------+ 100 tuples (1m 22s) # data join & sort sql>select more> usr.id as user_id, more> pdt.name as product_name, more> pmt.quantity, more> pmt.sale, more> pmt.buy_at more>from users as usr more> left outer join payments as pmt more> on more> pmt.user_id = usr.id more> left outer join products as pdt more> on more> pdt.id = pmt.product_id more>order by usr.id more>limit 100; +---------+--------------+----------+--------+----------------------------+ | user_id | product_name | quantity | sale | buy_at | +=========+==============+==========+========+============================+ | 1 | item_Z | 3 | 11580 | 2011-06-09 21:34:58.000000 | … | 2 | item_D | 49 | 229320 | 2012-07-07 02:08:13.000000 | +---------+--------------+----------+--------+----------------------------+ 100 tuples (1m 17s) # join count sql>select count(usr.id) more>from users as usr more> left outer join payments as pmt more> on more> pmt.user_id = usr.id more> left outer join products as pdt more> on more> pdt.id = pmt.product_id; +----------+ | L1 | +==========+ | 50000000 | +----------+ 1 tuple (54.1s) # delete a user: users sql>delete from users where id = 500000; 1 affected row (4.435ms) # delete a user: payments sql>delete from payments where user_id = 500000; 40 affected rows (427.879ms) # delete whole data sql>delete from users; 999999 affected rows (2.932ms) sql>delete from payments; 49999960 affected rows (3.132ms) sql>delete from products; 26 affected rows (2.715ms)その他、スクリプトを書いて下記SQLを実施。
insert into test.fact ( user_id, prop1, prop2, prop3, prop4, registered_date, quantity, sales, first_buy_at, last_buy_at ) select usr.id, min(usr.prop1), min(usr.prop2), min(usr.prop3), min(usr.prop4), min(usr.registered_date), sum(pmt.quantity), sum(pmt.sale), min(pmt.buy_at), max(pmt.buy_at) from test.users as usr left outer join test.payments as pmt on pmt.user_id = usr.id group by usr.id; # ⇒ 2m20s(=200s)主観としてはかなり速く感じられたが他DB、特に InnoDB と比較する事で直感的に速さを体感できるのかもしれない。 引き続き他DBの検証も実施予定。
0 件のコメント:
コメントを投稿