Sometimes it is useful to sort the lineitem table, in the order that Query-6 will read the data back.
Query-6 looks like so :-
set timing on;
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= to_date( '1995-01-01', 'YYYY-MM-DD')
and l_shipdate < add_months(to_date( '1995-01-01', 'YYYY-MM-DD'), 12)
and l_discount between 0.07 - 0.01 and 0.07 + 0.01
and l_quantity < 25;
exit;
and our import script looks like this :-
partition by range (l_shipdate)
subpartition by hash(l_partkey)
and the field shipdate is the 11th field
create table l_et(
l_orderkey number ,
l_partkey number ,
l_suppkey number ,
l_linenumber number ,
l_quantity number ,
l_extendedprice number ,
l_discount number ,
l_tax number ,
l_returnflag char(1) ,
l_linestatus char(1) ,
l_shipdate date ,
l_commitdate date ,
l_receiptdate date ,
l_shipinstruct char(25) ,
l_shipmode char(10) ,
l_comment varchar(44)
So we need to sort the lineitem flat file(s) on the 11th item. My TPCH dataset is spread over 15 datafiles, so I can use a simple shell script and the regular unix 'sort' command to sort the files (I am using Redhat Linux FWIW). On my sort command line I override the location of the temp files to be on a NetApp NFS share, which produces a small improvement in overall runtime (the majority of the runtime is spent in userland CPU time)
linux:[/flatfiles] $ time sort lineitem.tbl.1 -t'|' -k11 -T /flatfiles > lineitem.tbl.sorted.1
real 89m5.468s
user 81m22.279s
sys 1m9.209s
Anyhow here is the command line used to achieve the sort (there is no 'space' between the ' and the | symbol although it looks like there is, at least on my browser)
linux:[/flatfiles] $ for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> do
> sort lineitem.tbl.$i -t'|' -k11 -T /flatfiles > lineitem.tbl.sorted.$i
> done