构建百万千万级表的多种方法及其性能对比分析前两个实验是前一段自己做的。后面的实验是一本书上的。实验环
构建百万千万级表的多种方法及其性能对比分析
前两个实验是前一段自己做的。后面的实验是一本书上的。
实验环境:虚拟机,LINUX+ORACLE 11G
说明:每个实验完成后,需要及时删除表,以便进行下一个实验。 我这里节约篇幅,省略了。
删除重新开始下一个实验
drop table test1 purge;
drop table test2 purge;
truncate table test3;
drop table test3 purge;
清除缓冲区和共享池
alter system flush buffer_cache;
alter system flush shared_pool; --这里只清共享池就可以。
关于硬解析与软解析-来自网络:
一次硬解析,多次软解析,Session_cache_cursor设置为0,最容易发生,软解析的代价是,每次要在library cache中定位游标。
一次硬解析,多次软解析,Session_cache_cursor设置为非空值,PGA中保留了指向library cache的指针,直接定位子游标。
一次解析,多次执行,保持游标打开(pin住内存堆),没有定位自由表的过程,容易造成打开游标数过多,要记得代码的最后关闭游标。
select t.sql_text, t.sql_id,t.PARSE_CALLS, t.EXECUTIONS from v$sql t where sql_text like '%insert into test1 %' and rownum <10;
方法一:使用布尔积,速度很快。BYS@ bys001>create table test1 as select rownum a from dual connect by rownum<1001;
BYS@ bys001>alter system flush shared_pool;
BYS@ bys001>create table test3 as select a.* from test1 a,test1 b;
Table created.
Elapsed: 00:00:01.63
BYS@ bys001>select count(*) from test3;
COUNT(*)
----------
1000000
##########################################################################
方法二:利用自查询插入,速度比较慢。BYS@ bys001>create table test1 as select rownum a from dual connect by rownum<10001;
Table created.
Elapsed: 00:00:00.25
begin
for i in 1 .. 10 loop
insert into test1 select * from test1;
end loop;
commit;
end;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:43.62
BYS@ bys001>select count(*) from test1;
COUNT(*)
----------
1024000
Elapsed: 00:00:00.02
插入速度是每秒两万多条
BYS@ bys001>select 1024000/43 from dual;
1024000/43
----------
23813.9535
19:34:16 SQL> select t.sql_text, t.sql_id,t.PARSE_CALLS, t.EXECUTIONS
2 from v$sql t
3 where sql_text like '%INSERT INTO TEST1%' and rownum <10;
SQL_TEXT SQL_ID PARSE_CALLS EXECUTIONS
------------------------------ ------------- ----------- ----------
INSERT INTO TEST1 SELECT * FRO cwjfx8x2zfvq1 1 10
批量提交节约了每次提交时 commit耗费的时间。
但是在大DML事务时,要注意延迟块清除可能引起的ORA-01555错误。在此不多说这个问题。
########################################################
方法三:创建存储过程,未使用绑定变量。速度很慢这里只插入100万条数据来测试,因为插入1000万条数据需要时间太长(我虚拟机是一个小时左右)。
create or replace procedure proc_test1
as
begin
for i in 1 .. 1000000
loop
execute immediate
'insert into test1 values ( '||i||')';
commit;
end loop;
end;
/
Procedure created.
BYS@ bys001>create table test1(a number);
BYS@ bys001>alter system flush shared_pool;
BYS@ bys001>exec proc_test1;
PL/SQL procedure successfully completed.
Elapsed: 00:13:03.43
BYS@ bys001>select count(*) from test1;
COUNT(*)
----------
1000000
每秒插入一千多条数据,数据条数除所用时间
BYS@ bys001>select 1000000/13/60 from dual;
1000000/13/60
-------------
1282.05128
查询共享池中有缓存,可以看到每个语句都是只解析一次,执行一次。
整个存储过程解析了100万次,所以耗时很长。
SQL> select t.sql_text, t.sql_id,t.PARSE_CALLS, t.EXECUTIONS
2 from v$sql t
3 where sql_text like '%insert into test1 %' and rownum <10;
SQL_TEXT SQL_ID PARSE_CALLS EXECUTIONS
------------------------------------------------ ------------- ----------- ----------
insert into test1 values ( 991386) 4d4ywgu81n009 1 1
insert into test1 values ( 997580) 0mg9u6mx6s00j 1 1
insert into test1 values ( 997063) gfkpbx0av000w 1 1
insert into test1 values ( 992660) 3pruwwdb00026 1 1
insert into test1 values ( 997621) 27acn7hja802u 1 1
###########################################################################
方法四:使用绑定变量,一次解析,多次执行。create or replace procedure proc_test1
as
begin
for i in 1 .. 1000000
loop
execute immediate
'insert into test1 values (:aaa)' using i;
commit;
end loop;
10 end;
11 /
Procedure created.
Elapsed: 00:00:02.02
BYS@ bys001>create table test1(a number);
BYS@ bys001>alter system flush shared_pool;
BYS@ bys001>exec proc_test1;
PL/SQL procedure successfully completed.
Elapsed: 00:04:20.11
查询共享池中有缓存,可以看到每个语句都是只解析一次,执行一百万次。
SQL> select t.sql_text, t.sql_id,t.PARSE_CALLS, t.EXECUTIONS
2 from v$sql t
3 where sql_text like '%insert into test1 %' and rownum <10;
SQL_TEXT SQL_ID PARSE_CALLS EXECUTIONS
------------------------------ ------------- ----------- ----------
insert into test1 values (:aaa 7mj20sj5apmsf) 0 1000000
每秒可以插入将近4千条数据,速度是未使用绑定变量时的3倍多点。
BYS@ bys001>select 1000000/260 from dual;
1000000/260
-----------
3846.15385
###########################################################
方法五:使用静态SQL,速度和上一个实验动态SQL相比略有提升。create or replace procedure proc_test1
as
begin
for i in 1 .. 1000000
loop
insert into test1 values (i) ;
commit;
end loop;
end;
/
Warning: Procedure created with compilation errors.
Elapsed: 00:00:00.84
BYS@ bys001>show error
Errors for PROCEDURE PROC_TEST1:
LINE/COL ERROR
-------- -----------------------------------------------------------------
6/1 PL/SQL: SQL Statement ignored
6/13 PL/SQL: ORA-00942: table or view does not exist
BYS@ bys001>create table test1(a number);
BYS@ bys001>alter system flush shared_pool;
BYS@ bys001>exec proc_test1;
PL/SQL procedure successfully completed.
Elapsed: 00:04:13.53
这次没有使用execute immediate
execute immediate是动态SQL写法,常用于表名字段名是变量等情况。
动态SQL特点是执行过程中解析,静态SQL是编译过程就解析,
所以速度又有一点提升。
同样也用到了绑定变量,一次解析,多次执行。
19:04:33 SQL> select t.sql_text, t.sql_id,t.PARSE_CALLS, t.EXECUTIONS
2 from v$sql t
3 where sql_text like '%INSERT INTO TEST1%' and rownum <10;
SQL_TEXT SQL_ID PARSE_CALLS EXECUTIONS
------------------------------ ------------- ----------- ----------
INSERT INTO TEST1 VALUES (:B1 b9vjmmkg8ffhg 0 1000000)
BYS@ bys001>select 1000000/253 from dual;
1000000/253
-----------
3952.56917
##############################################################
方法六:使用批量插入后再使用一个commit的方法, 极大提高了插入数据的性能。因为前面的实验时每次插入都要做一个commit,整个存储过程就需要一百万次commit,
即使每次commit所需时间很短,提交一百万次时间也比较可观了。
create or replace procedure proc_test1
as
begin
for i in 1 .. 1000000
loop
insert into test1 values (i) ;
end loop;
commit;
end;
/
Warning: Procedure created with compilation errors.
Elapsed: 00:00:00.14
这是因为先创建过程还没创建表,不用管就可以。
BYS@ bys001>show error
Errors for PROCEDURE PROC_TEST1:
LINE/COL ERROR
-------- -----------------------------------------------------------------
6/1 PL/SQL: SQL Statement ignored
6/13 PL/SQL: ORA-00942: table or view does not exist
BYS@ bys001>create table test1(a number);
BYS@ bys001>alter system flush shared_pool;
BYS@ bys001>exec proc_test1;
PL/SQL procedure successfully completed.
Elapsed: 00:01:21.39
19:12:38 SQL> select t.sql_text, t.sql_id,t.PARSE_CALLS, t.EXECUTIONS
2 from v$sql t
3 where sql_text like '%INSERT INTO TEST1%' and rownum <10;
SQL_TEXT SQL_ID PARSE_CALLS EXECUTIONS
------------------------------ ------------- ----------- ----------
INSERT INTO TEST1 VALUES (:B1 b9vjmmkg8ffhg 0 1000000
BYS@ bys001>select 1000000/81 from dual;
1000000/81
----------
12345.679
##########################################
方法七:这种方法是把原来的过程变成了单条SQL,把一条一条插入的语句变成一个集合,成批次的写入data buffer区,再次提高了速度。
BYS@ bys001>create table test1(a number);
BYS@ bys001>alter system flush shared_pool;
BYS@ bys001>insert into test1 select rownum from dual connect by rownum<1000001;
1000000 rows created.
Elapsed: 00:00:04.46
BYS@ bys001>commit;
BYS@ bys001>select count(*) from test1;
COUNT(*)
----------
1000000
#############################################
方法八:使用CREATE TABLE as select的方法来创建表更快速。原因是insert into test1 select的方式需要先将数据写入data buffer区,再写入磁盘
create table test1 as select的方式跳过data buffer区直接写入磁盘。
BYS@ bys001>alter system flush shared_pool;
BYS@ bys001>create table test1 as select rownum a from dual connect by rownum<1000001;
BYS@ bys001>alter system flush shared_pool;
BYS@ bys001>create table test1 as select rownum a from dual connect by rownum<1000001;
Table created.
Elapsed: 00:00:03.10
############################
方法九 :设置并行度为2,再次提升速度。修改虚拟机配置为使用两个核心。查看:
[oracle@oel-01 ~]$ grep 'physical id' /proc/cpuinfo | so
physical id : 0
[oracle@oel-01 ~]$ grep 'core id' /proc/cpuinfo | sort -u | wc -l
2
比如我这里就是单CPU双核心,设置并行度为2
BYS@ bys001>alter system flush shared_pool;
System altered.
Elapsed: 00:00:00.35
BYS@ bys001>create table test1 parallel 2 as select rownum a from dual connect by rownum<1000001;
Table created.
Elapsed: 00:00:02.20