Read Oracle Essentials Oracle Database 11g Online
Authors: Rick Greenwald
|
229
Query Optimization
Oracle first provided the ability to recognize a star schema in the query optimizer in Oracle7 and has focused on making its cost-based query optimizer smarter in response to business intelligence queries in subsequent database releases. Further improving optimizer prediction accuracy, since Oracle Database 10
g
, optimizer predictions are compared to actual runtime performance, and any errors are subsequently corrected automatically. The optimizer also can provide query rewrite transparently to summary levels commonly deployed with star schema through materialized views. Oracle Database 11
g
added query rewrite for the OLAP Option as well as improved solving of queries containing inline views.
How does the optimizer handle a query against a star schema? First, it finds a sales transactions fact table (shown in
Figure 10-2)
with a lot more entries than the surrounding dimension tables. This is the clue that a star schema exists. As Oracle7
evolved, the optimizer began to produce much smarter plans. The optimizer for a standard relational database typically would have tried to join each of the dimension tables to the fact table, one at a time. Because the fact table is usually very large, using the fact table in multiple joins takes a lot of time.
Cartesian product joins were added to Oracle7 to first join the dimension tables, with a subsequent single join back to the fact table in the final step. This technique works relatively well when there are not many dimension tables (typically six or fewer, as a rule of thumb, to keep the Cartesian product small) and when data is relatively well populated.
In some situations, there are a fairly large number of dimension tables or the data in the fact table is sparse. For joining such tables, a parallel bitmap star join may be selected by the optimizer.
In earlier releases of the Oracle database, DBAs had to set initialization parameters (e.g., STAR_TRANSFORMATION) and gather statistics, enabling the optimizer to recognize the best methods for solving such queries. Today, needed parameters are preset upon installation and statistics are automatically gathered by the Oracle database.
Bitmap Indexes and Parallelism
Bitmap indexes
, described in
Chapter 4,
were first introduced in Oracle7 to speed up the type of data retrieval and joins in data warehousing queries. Bitmap indexes in Oracle are typically considered for columns in which the data has low cardinality.
Cardinality
is the number of different values in an index divided by the number of rows. There are various opinions about what low cardinality actually is. Some consider cardinality as high as 10% to be low, but remember that if a table has a million rows, that “low” cardinality would mean 100,000 different values in a column!
230
|
Chapter 10: Oracle Data Warehousing and Business Intelligence
In a bitmap index, a value of 1 in the index indicates that a value is present in a particular row and 0 indicates that the value is not present. A bitmap is built for each of the values in the indexed columns. Because computers are built on a concept of 1s and 0s, this technique can greatly speed up data retrieval. In addition, join operations such as AND become a simple addition operation across multiple bitmaps. A side benefit is that bitmap indexes can provide considerable storage savings.
Figure 10-3
illustrates the use of a bitmap index in a compound WHERE clause. Bitmap indexes can be used together for even faster performance. The bitmap indexes are essentially stacked together, as a set of punch cards might be. Oracle simply looks for those parts of the stack with all the bits turned on (indicating the presence of the value), in the same way that you could try to stick a knitting needle through the portions of the card stack that were punched out on all of the cards.
SELECT count(*)
PARTS table
FROM parts
WHERE
partno
color
size
weight
size = ‘MED’
AND
001
GREEN
MED
98.1
002
RED
MED
1241
color = ‘RED’
003
RED
SMALL
100.1
004
BLUE
LARGE
54.9
005
RED
MED
124.1
006
GREEN
SMALL
60.1
...
...
.....
...
Index on ‘color’
color =
'BLUE'
0 0 0 1 0 0 1 0 1 0 1 0 0 0 1 0
color =
'RED'
1
1
0 1 0 0 0 1 0 0 0 0 1 0 0 1
color =
'GREEN'
1 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0
3 bits in
index entries
size =
'SMALL'
0 0 1 0 0 1 0 1 0 1 0 0 0 1 0 1
size =
'MED'
1
1
1 0 0 0 1 0 0 0 0 1 0 1 0 0
size =
'LARGE'
0 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0
Index on ‘size’
Figure 10-3. Bitmap index operation in a compound WHERE clause
In Oracle, star-query performance is improved when bitmap indexes are created on the foreign-keys columns of the fact table that link to the surrounding dimension tables. A parallel bitmap star join occurs in which the bitmaps retrieve only the necessary rows from the fact table and the rows are joined to the dimension tables.
During the join, sparseness (i.e., a large quantity of empty values) is recognized inherently in the bitmaps, and the number of dimension tables isn’t a problem. This algorithm can also efficiently handle a
snowflake schema
, which is an extension of a standard star schema in which there are multiple tables for each dimension.
Query Optimization
|
231
To further speed queries, Oracle9
i
added a bitmap join index from fact tables to dimension tables. A bitmap join index is simply the bitmap index of a join of two or more tables. The speedup in performance comes from avoiding actual table joins or reducing the amount of data joined by taking into account restrictions in advance of the joining of data. Performance speedup for star queries with multiple dimension tables can be greatly improved since bitwise operations in star transformations can now be eliminated.
Performing queries in parallel also obviously improves performance. Joins and sorts are frequently used to solve decision-support queries. Parallelism is described in
Chapter 7.
That chapter lists functions that Oracle can perform in parallel (see
Real Application Clusters, which replaced Oracle Parallel Server as of Oracle9
i
, further expands parallelism by enabling queries to transparently scale across nodes in clusters or in grids of computer systems.
Remember that these Oracle features use the cost-based optimizer and, prior to Oracle Database 10
g,
you should run statistics periodically (using the ANALYZE command) on the tables to ensure good performance. Statistics gathering can be done in parallel.
Since Oracle Database 10
g
, statistics gathering is automatic and populates the Automatic Workload Repository. For example, the SQL Access Advisor leverages this information when making tuning recommendations.
Summary Tables
Data within dimensions is usually hierarchical in nature (e.g., in the time dimension, day rolls up to week, which rolls up to month, which rolls up to quarter, which rolls up to year). If the query is simply looking for data summarized at a monthly level, why should it have to sort through more detailed daily and weekly data? Instead, it can simply view data at or above that level of the hierarchy. Formerly, data warehousing performance consultants designed these types of summary tables—
including multiple levels of precalculated summarization. For example, all the time periods listed in
Figure 10-2
can be calculated on the fly using different groupings of days. However, to speed queries based on a different time series, a data warehouse can have values precalculated for weeks and months and stored in summary tables to which queries can be redirected.
Materialized Views
Oracle8
i
introduced the concept of
materialized views
for the creation of summary tables for facts and dimensions that can represent rollup levels in the hierarchies. A materialized view provides precomputed summary data; most importantly, a
232
|
Chapter 10: Oracle Data Warehousing and Business Intelligence
materialized view is automatically substituted for a larger detailed table when appropriate. The cost-based query optimizer can perform query rewrites to these summary tables and rollup levels in the hierarchy transparently, often resulting in dramatic increases in performance. For instance, if a query can be answered by summary data based on sales by month, the query optimizer will automatically substitute the materialized view for the more granular table when processing the query. A query at the quarter level might use monthly aggregates in the materialized view, selecting the months needed for the quarter(s). Oracle Database 10
g
added query rewrite capabilities such that the optimizer can make use of multiple appropriate materialized views.
Materialized views can be managed through Oracle Enterprise Manager (see also
Chapter 5).
The SQL Advisor accessible in Enterprise Manager includes a SQL
Access Advisor that can recommend when to create materialized views.
Analytics, OLAP, and Data Mining in the Database
Analysis of large data sets is faster when it takes place in the database where the data is stored. This section describes the database functions and other features available for analytics and statistics, online analytical processing (OLAP) multidimensional deployment choices, and data mining.
It is worth noting here that the growing use of Oracle for statistical computations led to support for floating-point number types providing the precision outlined in the IEEE 754-1985 standard (with minor differences). These are provided in the datatypes BINARY_FLOAT and BINARY_DOUBLE in Oracle Database 10
g
and more recent database releases.
Analytic and Statistical Functions
Oracle releases dating back to Oracle8
i
have continued to add new analytic and statistical functions as SQL extensions to the core Oracle Enterprise Edition and Standard Edition databases. These analytic functions now include:
Ranking functions
Used to compute a record’s rank with respect to other records. Functions include RANK, DENSE_RANK, CUME_DIST, PERCENT_RANK, NTILE, and ROW_NUMBER. Hypothetical ranking is supported.
Windowing aggregate functions
Used to compute cumulative and moving averages. Functions include SUM, AVG, MIN, MAX, COUNT, VARIANCE, STDDEV, and FIRST_VALUE,
LAST_VALUE.
LAG/LEAD functions
Often used to compare values from similar time periods, such as the first quarter of 2006 and the first quarter of 2007.
Analytics, OLAP, and Data Mining in the Database
|
233
Reporting aggregate functions
Include SUM, AVG, MIN, MAX, COUNT, VARIANCE, STDDEV, and RATIO_
TO_REPORT.
Linear regression functions
Include REGR_COUNT, REGR_AVGX and REGR_AVGY, REGR_SLOPE,
REGR_INTERCEPT, REGR_R2, and other functions used in regression line fitting for a set of numbers in pairs (e.g., having X and Y values).
Also supported in Oracle are pivoting operations, histograms (using WIDTH_
BUCKET), CASE expressions, filling gaps in data, and time-series calculations.
The database includes a statistics package, DBMS_STATS_FUNCS. Functions in the statistics package support linear algebra, frequent itemsets, descriptive statistics, hypothesis testing (T-test, F-test, Binomial test, Wilcoxon Signed Ranks Test, One-Way ANOVA, Chi-square, Mann Whitney, Kolmogorov-Smirnov), crosstab statistics (% statistics, chi squared, phi coefficient, Cramer’s V, contingency coefficient, and Cohen’s kappa), and nonparametric correlation (Pearson’s correlation coefficients, and Spearman’s and Kendall’s).
MODEL Clause in SELECT
The SQL MODEL clause first appeared in Oracle Database 10
g
as an extension to the SELECT statement. This clause enables relational data to be treated as multidimensional arrays (much like spreadsheets) and is also used to define formulas for the arrays, avoiding multiple joins and UNION clauses.