2012年8月5日日曜日

カラム指向DB比較 - MonetDB パフォーマンス検証の巻。

の続き。

今回は MonetDB のパフォーマンス検証をしてみたいと思う。
  1. 検証用DB作成
  2. 検証データ投入
  3. クエリによるパフォーマンス検証
という流れで実施。

実施環境
  • CloudCore 
  • OS: Ubuntu 12.04
  • CPU: AMD Phenom(tm) 9550 Quad-Core Processor 2.2GHz
  • Momory: 2G


1. 検証用DB作成
$ monetdbd create /usr/local/var/db/monetdb/performance
$ monetdbd start /usr/local/var/db/monetdb/performance
$ monetdb create testdb
created database in maintenance mode: testdb
$ monetdb start testdb
starting database 'testdb'... done
ホームディレクトリに下記内容で .monetdb ファイルを作成するとログインパスワードが不要となる。
user=monetdb
password=monetdb
language=sql
接続してテスト用スキーマを作成
$ mclient -d testdb
Welcome to mclient, the MonetDB/SQL interactive terminal (Apr2012-SP2)
Database: MonetDB v11.9.7 (Apr2012-SP2), 'mapi:monetdb://shimajiro:50000/testdb'
Type \q to quit, \? for a list of available commands
auto commit mode: on

sql>create schema "test" authorization "monetdb";
operation successful (5.994ms)
sql>alter user "monetdb" set schema "test";
operation successful (2.406ms)
sql>\q

$ monetdb release testdb

2. 検証データ投入
テストデータとして下記3ファイルを用意したのでこれをインポートする。
  • users: 100万件
  • payments: 5000万件
  • products: 26件

まずはテーブルの作成から。
$ mclient -d testdb
Welcome to mclient, the MonetDB/SQL interactive terminal (Apr2012-SP2)
Database: MonetDB v11.9.7 (Apr2012-SP2), 'mapi:monetdb://shimajiro:50000/testdb'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>

# users
sql>create table users(id int, prop1 int, prop2 int, prop3 int, prop4 varchar(8), registered_date date, primary key(id));
operation successful (6.758ms)

# payments
sql>create table payments(id int, user_id int, product_id int, quantity int, sale int, buy_at timestamp, primary key(id));
operation successful (5.051ms)

# products
sql>create table products(id int, name varchar(8), price int, primary key(id));
operation successful (4.038ms)

# fact
sql>create table fact(user_id int, prop1 int, prop2 int, prop3 int, prop4 varchar(8), registered_date date, quantity int, sales int, first_buy_at timestamp, last_buy_at timestamp, primary key(user_id));
operation successful (18.922ms)

sql>\d
TABLE  test.fact
TABLE  test.payments
TABLE  test.products
TABLE  test.users
つづいてデータを投入。
# users
sql>copy 1000000 offset 2 records into test.users from '/usr/local/var/db/data/users.csv' using delimiters ',';
1000000 affected rows (2.5s)

# payments
sql>copy 50000000 offset 2 records into test.payments from '/usr/local/var/db/data/payments.csv' using delimiters ',';
50000000 affected rows (1m 37s)

# products
sql>copy 26 offset 2 records into test.products from '/usr/local/var/db/data/products.csv' using delimiters ',';
26 affected rows (502.657ms)
1秒で50万件以上(= 5000万件 / 97s)インポートできてる。相当早い印象。

3. クエリによるパフォーマンス検証
# all count: users
sql>select count(id) from users;
+---------+
| L1      |
+=========+
| 1000000 |
+---------+
1 tuple (2.497ms)

# all count: payments
sql>select count(id) from payments;
+----------+
| L1       |
+==========+
| 50000000 |
+----------+
1 tuple (4.591ms)

# all count: products
sql>select count(id) from products;
+------+
| L1   |
+======+
|   26 |
+------+
1 tuple (2.658ms)

# 100 records: users
sql>select * from users order by id limit 100;
+------+-------+-------+-------+--------+-----------------+
| id   | prop1 | prop2 | prop3 | prop4  | registered_date |
+======+=======+=======+=======+========+=================+
|    1 |    41 |   857 |  7976 | prop_W | 2012-11-09      |
…
|  100 |    75 |   580 |  4855 | prop_X | 2011-05-14      |
+------+-------+-------+-------+--------+-----------------+
100 tuples (30.210ms)

# 100 records: payments
sql>select * from payments order by id limit 100;
+------+---------+------------+----------+--------+----------------------------+
| id   | user_id | product_id | quantity | sale   | buy_at                     |
+======+=========+============+==========+========+============================+
|    1 |  817590 |          6 |       12 |  87960 | 2011-05-03 21:12:16.000000 |
…
|  100 |  468994 |         14 |       39 | 300300 | 2012-08-29 07:10:37.000000 |
+------+---------+------------+----------+--------+----------------------------+
100 tuples (235.568ms)

# 100 records: products
sql>select * from products order by id limit 100;
+------+--------+-------+
| id   | name   | price |
+======+========+=======+
|    1 | item_A |  3800 |
…
|   26 | item_Z |  8240 |
+------+--------+-------+
26 tuples (4.560ms)

# a user: users
sql>select * from users where id = 500000;
+--------+-------+-------+-------+--------+-----------------+
| id     | prop1 | prop2 | prop3 | prop4  | registered_date |
+========+=======+=======+=======+========+=================+
| 500000 |    23 |   371 |  5839 | prop_N | 2012-08-22      |
+--------+-------+-------+-------+--------+-----------------+
1 tuple (7.454ms)

# a user: payments
sql>select * from payments where user_id = 500000;
+----------+---------+------------+----------+--------+----------------------------+
| id       | user_id | product_id | quantity | sale   | buy_at                     |
+==========+=========+============+==========+========+============================+
|  1316372 |  500000 |          5 |       48 | 415680 | 2012-11-07 03:52:05.000000 |
…
| 49065588 |  500000 |          6 |       32 | 234560 | 2011-01-23 19:02:19.000000 |
+----------+---------+------------+----------+--------+----------------------------+
40 tuples (175.989ms)

# group by: users
sql>select
more>  registered_date,
more>  count(id) as uu
more>from users
more>group by registered_date
more>order by registered_date;
+-----------------+------+
| registered_date | uu   |
+=================+======+
| 2010-01-01      |  974 |
…
| 2012-12-31      |  940 |
+-----------------+------+
1096 tuples (144.811ms)

# group by: payments
sql>select
more>  product_id,
more>  count(id) as cnt,
more>  sum(sale) as sale
more>from payments
more>group by product_id
more>order by product_id;
+------------+---------+--------------+
| product_id | cnt     | sale         |
+============+=========+==============+
|          1 | 1922437 |  92240910320 |
…
|         26 | 1924741 | 189485558780 |
+------------+---------+--------------+
26 tuples (2.5s)

# data join
sql>select
more>  usr.id as user_id,
more>  pdt.name as product_name,
more>  pmt.quantity,
more>  pmt.sale,
more>  pmt.buy_at
more>from users as usr
more>  left outer join payments as pmt
more>  on
more>    pmt.user_id = usr.id
more>  left outer join products as pdt
more>  on
more>    pdt.id = pmt.product_id
more>limit 100;
+---------+--------------+----------+--------+----------------------------+
| user_id | product_name | quantity | sale   | buy_at                     |
+=========+==============+==========+========+============================+
|  817590 | item_F       |       12 |  87960 | 2011-05-03 21:12:16.000000 |
…
|  468994 | item_N       |       39 | 300300 | 2012-08-29 07:10:37.000000 |
+---------+--------------+----------+--------+----------------------------+
100 tuples (1m 22s)

# data join & sort
sql>select
more>  usr.id as user_id,
more>  pdt.name as product_name,
more>  pmt.quantity,
more>  pmt.sale,
more>  pmt.buy_at
more>from users as usr
more>  left outer join payments as pmt
more>  on
more>    pmt.user_id = usr.id
more>  left outer join products as pdt
more>  on
more>    pdt.id = pmt.product_id
more>order by usr.id
more>limit 100;
+---------+--------------+----------+--------+----------------------------+
| user_id | product_name | quantity | sale   | buy_at                     |
+=========+==============+==========+========+============================+
|       1 | item_Z       |        3 |  11580 | 2011-06-09 21:34:58.000000 |
…
|       2 | item_D       |       49 | 229320 | 2012-07-07 02:08:13.000000 |
+---------+--------------+----------+--------+----------------------------+
100 tuples (1m 17s)

# join count
sql>select count(usr.id)
more>from users as usr
more>  left outer join payments as pmt
more>  on
more>    pmt.user_id = usr.id
more>  left outer join products as pdt
more>  on
more>    pdt.id = pmt.product_id;
+----------+
| L1       |
+==========+
| 50000000 |
+----------+
1 tuple (54.1s)

# delete a user: users
sql>delete from users where id = 500000;
1 affected row (4.435ms)

# delete a user: payments
sql>delete from payments where user_id = 500000;
40 affected rows (427.879ms)

# delete whole data
sql>delete from users;
999999 affected rows (2.932ms)
sql>delete from payments;
49999960 affected rows (3.132ms)
sql>delete from products;
26 affected rows (2.715ms)
その他、スクリプトを書いて下記SQLを実施。
insert into test.fact
(
  user_id,
  prop1,
  prop2,
  prop3,
  prop4,
  registered_date,
  quantity,
  sales,
  first_buy_at,
  last_buy_at
)
select
  usr.id,
  min(usr.prop1),
  min(usr.prop2),
  min(usr.prop3),
  min(usr.prop4),
  min(usr.registered_date),
  sum(pmt.quantity),
  sum(pmt.sale),
  min(pmt.buy_at),
  max(pmt.buy_at)
from test.users as usr
  left outer join test.payments as pmt
  on
    pmt.user_id = usr.id
group by usr.id;
# ⇒ 2m20s(=200s)
主観としてはかなり速く感じられたが他DB、特に InnoDB と比較する事で直感的に速さを体感できるのかもしれない。 引き続き他DBの検証も実施予定。

0 件のコメント:

コメントを投稿