A join group is a user-created dictionary object that lists two columns that can be meaningfully joined.
This chapter contains the following topics:
Topics:
CREATE INMEMORY JOIN GROUP
statement. Candidates are columns that are frequently paired in a join predicate, for example, a column joining a fact and dimension table.DBMS_SQLTUNE.REPORT_SQL_MONITOR_XML
function.Joins are an integral part of data warehousing workloads. The IM column store enhances the performance of joins when the tables being joined are stored in memory.
Because of faster scan and join processing, complex multi-table joins and simple joins that use Bloom filters benefit from the IM column store. In a data warehousing environment, the most frequently-used joins involved a fact table and one or more dimension tables.
The following joins run faster when the tables are populated in the IM column store:
Joins that are amenable to using Bloom filters
Joins of multiple small dimension tables with one fact table
Joins between two tables that have a primary key-foreign key relationship
When the IM column store is enabled, the database can use join groups to optimize joins of tables populated in the IM column store.
A join group is a set of columns on which a set of tables is frequently joined. The column set contains one or more columns; the table set contains one or more tables. The columns in the join group can be in the same or different tables. For example, if sales
and times
frequently join on the time_id
column, then you might create a join group for (times(time_id), sales(time_id))
. The maximum number of columns in a join group is 255.
Note:
A column cannot be a member of multiple join groups.
When you create a join group, the database invalidates the current In-Memory contents of the tables referenced in the join group. Subsequent repopulation causes the database to re-encode the IMCUs of the tables with the common dictionary. Thus, Oracle recommends that you first create the join group, and then populate the tables.
Create join groups using the CREATE INMEMORY JOIN GROUP
statement. To add columns to or drop columns from a join group, use an ALTER INMEMORY JOIN GROUP
statement. Drop a join group using the DROP INMEMORY JOIN GROUP
statement.
Note:
In Oracle Active Data Guard, a standby database ignores join group definitions. A standby database does not use common dictionaries, and executes queries as if join groups did not exist.
Example 6-1 Creating a Join Group
This example creates a join group named deptid_jg
that includes the department_id
column in the hr.employees
and hr.departments
tables.
CREATE INMEMORY JOIN GROUP deptid_jg (hr.employees(department_id),hr.departments(department_id));
In certain queries, join groups eliminate the performance overhead of decompressing and hashing column values.
Without join groups, if the optimizer uses a hash join but cannot use a Bloom filter, or if the Bloom filter does not filter rows effectively, then the database must decompress IMCUs and use an expensive hash join. To illustrate the problem, assume a star schema has a sales
fact table and a vehicles
dimension table. The following query joins these tables, but does not filter the output, which means that the database cannot use a Bloom filter:
SELECT v.year, v.name, s.sales_price FROM vehicles v, sales s WHERE v.name = s.name;
The following figure illustrates how the database joins the two data sets.
The database performs a hash join as follows:
Scans the vehicles
table, decompresses the rows that satisfy the predicate (in this case, all rows satisfy the predicate because no filters exist), and sends the rows to the hash join
Builds a hash table in the PGA based on the decompressed rows
Scans the sales
table and applies any filters (in this case, the query does not specify filters)
Decompresses matching rows from the IMCUs, hashes them, and then sends them to the join
Probes the hash table using the join column, which in this case is the vehicle name
If a join group exists on the v.name
and s.name
columns, however, then the database can make the preceding steps more efficient, eliminating the decompression and filtering overhead. The benefits of join groups are:
The database operates on compressed data.
In a hash join based on a join group, the database uses an array instead of building a hash table.
The database stores codes for each join column value in a common dictionary. The database joins on the codes rather than on the actual column values. This technique avoids the overhead of copying row sources.
The dictionary codes are dense and have a fixed length, which makes them space efficient.
Optimizing a query with a join group is sometimes possible when it is not possible to use a Bloom filter.
In a join group, the database compresses all columns in the join group using the same common dictionary.
This section contains the following topics:
Topics:
A common dictionary is a table-level, instance-specific set of dictionary codes.
The database automatically creates a common dictionary in the IM column store when a join group is defined on the underlying columns. The common dictionary enables the join columns to share the same dictionary codes.
A common dictionary provides the following benefits:
Encodes the values in the local dictionaries with codes from the common dictionary, which provides compression and increases the cache efficiency of the IMCU
Enables joins to use dictionary codes to construct and probe the data structures used during hash joins
Enables the optimizer to obtain statistics such as cardinality, distribution of column values, and so on
The following figure illustrates a common dictionary that corresponds to a join group created on the sales.name
and vehicles.name
tables.
Figure 6-2 Common Dictionary for a Join Group
When the database uses a common dictionary, the local dictionary for each CU does not store the original values: Audi
, BMW
, and so on. Instead, the local dictionary stores references to the values stored in the common dictionary. For example, the local dictionary might store 101
for Audi
, 220
for BMW
, and so on.
The key optimization is joining on common dictionary codes instead of column values, thereby avoiding the use of a hash table for the join.
Consider the following query, which uses a join group to join vehicles
and sales
on the name
column:
SELECT v.year, v.name, s.sales_price FROM vehicles v, sales s WHERE v.name = s.name AND v.name IN ('Audi', 'BMW', 'Porsche', 'VW');
The following figure illustrates how the join benefits from the common dictionary created on the join group.
As illustrated in the preceding diagram, the database performs a hash join on the compressed data as follows:
Scans the vehicles
table, and sends the dictionary codes (not the original column values) to the hash join: 0
(Audi), 1
(BMW), 2
(Cadillac), and so on
Builds an array of distinct common dictionary codes in the PGA
Scans the sales
table and applies any filters (in this case, the filter is for German cars only)
Sends matching rows to the join in compressed format
Looks up corresponding values in the array rather than probing a hash table, thus avoiding the need to compute a hash function on the join key columns
In this example, the vehicles
table has only seven rows. The vehicles.name
column has the following values:
Audi BMW Cadillac Ford Porsche Tesla VW
The common dictionary assigns a dictionary code to each distinct value. Conceptually, the common dictionary looks as follows:
Audi 0 BMW 1 Cadillac 2 Ford 3 Porsche 4 Tesla 5 VW 6
The database scans vehicles.name
, starting at the first dictionary code in the first IMCU and ending at the last code in the last IMCU. It stores a 1
for every row that matches the filter (German cars only), and 0
for every row that does not match the filter. Conceptually, the array might look as follows:
array[0]: 1 array[1]: 1 array[2]: 0 array[3]: 0 array[4]: 1 array[5]: 0 array[6]: 1
The database now scans the sales
fact table. To simplify the example, assume that the sales
table only has 6 rows. The database scans the rows as follows (the common dictionary code for each value is shown in parentheses):
Cadillac (2) Cadillac (2) BMW (1) Ford (3) Audi (0) Tesla (5)
The database then proceeds through the vehicles.name
array, looking for matches. If a row matches, then the database sends the matching row with its associated common dictionary code, and retrieves the corresponding column value from the vehicles.name
and sales.name
IMCUs:
2 -> array[2] is 0, so no join 2 -> array[2] is 0, so no join 1 -> array[1] is 1, so join 3 -> array[3] is 0, so no join 0 -> array[0] is 1, so join 5 -> array[5] is 0, so no join
Define join groups using the CREATE INMEMORY JOIN GROUP
statement. Candidates are columns that are frequently paired in a join predicate, for example, a column joining a fact and dimension table.
The CREATE INMEMORY JOIN GROUP
statement immediately defines a join group, which means that its metadata is visible in the data dictionary. The database does not immediately construct the common dictionary. Rather, the database builds the common dictionary the next time that a table referenced in the join group is populated or repopulated in the IM column store.
Guidelines
Creating, modifying, or dropping a join group typically invalidates all the underlying tables referenced in the join group. Thus, Oracle recommends that you create join groups before initially populating the tables.
To create a join group:
In SQL*Plus or SQL Developer, log in to the database as a user with the necessary privileges.
Create a join group by using a statement in the following form:
CREATE INMEMORY JOIN GROUP join_group_name ( table1(col1), table2(col2) );
For example, the following statement creates a join group named sales_products_jg
:
CREATE INMEMORY JOIN GROUP sales_products_jg (sales(prod_id), products(prod_id));
Optionally, view the join group definition by querying the data dictionary (sample output included):
COL JOINGROUP_NAME FORMAT a18 COL TABLE_NAME FORMAT a8 COL COLUMN_NAME FORMAT a7 SELECT JOINGROUP_NAME, TABLE_NAME, COLUMN_NAME, GD_ADDRESS FROM DBA_JOINGROUPS; JOINGROUP_NAME TABLE_NA COLUMN_ GD_ADDRESS ------------------ -------- ------- ---------------- SALES_PRODUCTS_JG SALES PROD_ID 00000000A142AE50 SALES_PRODUCTS_JG PRODUCTS PROD_ID 00000000A142AE50
Populate the tables referenced in the join group, or repopulate them if they are currently populated.
See Also:
Oracle Database SQL Language Reference to learn about the CREATE INMEMORY JOIN GROUP
statement
Example 6-2 Optimizing a Query Using a Join Group
In this example, you log in to the database as SYSTEM
, and then create a join group on the prod_id
column of sales
and products
, which are not yet populated in the IM column store:
CREATE INMEMORY JOIN GROUP sh.sales_products_jg (sh.sales(prod_id), sh.products(prod_id));
You enable the sh.sales
and sh.products
tables for population in the IM column store:
ALTER TABLE sh.sales INMEMORY; ALTER TABLE sh.products INMEMORY;
The following query indicates the tables are not yet populated in the IM column store (sample output included):
COL OWNER FORMAT a3 COL NAME FORMAT a10 COL STATUS FORMAT a20 SELECT OWNER, SEGMENT_NAME NAME, POPULATE_STATUS STATUS FROM V$IM_SEGMENTS; no rows selected
Query both tables to populate them in the IM column store:
SELECT /*+ FULL(s) NO_PARALLEL(s) */ COUNT(*) FROM sh.sales s; SELECT /*+ FULL(p) NO_PARALLEL(p) */ COUNT(*) FROM sh.products p;
The following query indicates the tables are now populated in the IM column store (sample output included):
COL OWNER FORMAT a3 COL NAME FORMAT a10 COL PARTITION FORMAT a13 COL STATUS FORMAT a20 SELECT OWNER, SEGMENT_NAME NAME, PARTITION_NAME PARTITION, POPULATE_STATUS STATUS, BYTES_NOT_POPULATED FROM V$IM_SEGMENTS; OWN NAME PARTITION STATUS BYTES_NOT_POPULATED --- ---------- ------------- -------------------- ------------------- SH SALES SALES_Q3_1998 COMPLETED 0 SH SALES SALES_Q4_2001 COMPLETED 0 SH SALES SALES_Q4_1999 COMPLETED 0 SH PRODUCTS COMPLETED 0 SH SALES SALES_Q1_2001 COMPLETED 0 SH SALES SALES_Q1_1999 COMPLETED 0 SH SALES SALES_Q2_2000 COMPLETED 0 SH SALES SALES_Q2_1998 COMPLETED 0 SH SALES SALES_Q3_2001 COMPLETED 0 SH SALES SALES_Q3_1999 COMPLETED 0 SH SALES SALES_Q4_2000 COMPLETED 0 SH SALES SALES_Q4_1998 COMPLETED 0 SH SALES SALES_Q1_2000 COMPLETED 0 SH SALES SALES_Q1_1998 COMPLETED 0 SH SALES SALES_Q2_2001 COMPLETED 0 SH SALES SALES_Q2_1999 COMPLETED 0 SH SALES SALES_Q3_2000 COMPLETED 0
Query DBA_JOINGROUPS
to get information about the join group (sample output included):
COL JOINGROUP_NAME FORMAT a18 COL TABLE_NAME FORMAT a8 COL COLUMN_NAME FORMAT a7 SELECT JOINGROUP_NAME, TABLE_NAME, COLUMN_NAME, GD_ADDRESS FROM DBA_JOINGROUPS; JOINGROUP_NAME TABLE_NA COLUMN_ GD_ADDRESS ------------------ -------- ------- ---------------- SALES_PRODUCTS_JG SALES PROD_ID 00000000A142AE50 SALES_PRODUCTS_JG PRODUCTS PROD_ID 00000000A142AE50
The preceding output shows that the join group sales_products_jg
joins on the same common dictionary address.
See Also:
Oracle Database SQL Language Reference to learn about the CREATE INMEMORY JOIN GROUP
statement
Oracle Database Reference to learn about the DBA_JOINGROUPS
view
To determine whether queries are using the join group, you can pass the SQL ID to the DBMS_SQLTUNE.REPORT_SQL_MONITOR_XML
function.
DBMS_SQLTUNE.REPORT_SQL_MONITOR_XML
output for a SQL ID. If the query returns rows, then the database used the join group for the statement associated with this SQL ID. Otherwise, the database did not use the join group.Prerequisites
To monitor join groups, you must meet the following prerequisites:
A join group must exist.
The columns referenced by the join group must have been populated after join group creation.
You must execute a join query that could potentially use the join group.
To monitor join group usage:
Log in to the database with the necessary privileges.
Obtain the SQL ID for the query you want to monitor.
For example, execute the query that you want to monitor, and then query V$SESSION.PREV_SQL_ID
.
Use the DBMS_SQLTUNE.REPORT_SQL_MONITOR_XML.EXTRACT
function to determine whether the database used the join group in the hash join.
If querying the DBMS_SQLTUNE.REPORT_SQL_MONITOR_XML.EXTRACT
function output returns rows, then the database used the join group.
Example 6-3 Monitoring a Join Group
In this example, you create a join group on the prod_id
columns of sh.products
and sh.sales
tables, and then join these tables on this column. Your goal is to determine whether the join query used the join group. You grant the sh
account administrative privileges. You log in as sh
, and then proceed as follows:
Create a SQL*Plus variable for the SQL ID as follows:
VAR B_SQLID VARCHAR2(13)
Apply the INMEMORY
attribute to the sh.products
and sh.sales
tables as follows:
ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY; ALTER TABLE products INMEMORY MEMCOMPRESS FOR QUERY;
Create a join group on prod_id
:
CREATE INMEMORY JOIN GROUP sh_jg (products(prod_id), sales(prod_id));
Scan the tables to populate them in the IM column store:
SELECT /*+ FULL(s) */ COUNT(*) FROM sales s; SELECT /*+ FULL(p) */ COUNT(*) FROM products p;
Execute a query that joins on the prod_id
column, and then aggregates product sales:
SELECT /*+ USE_HASH(sales) LEADING(products sales) MONITOR */ products.prod_id, SUM(sales.amount_sold) FROM products, sales WHERE products.prod_id = sales.prod_id GROUP BY products.prod_id;
Obtain the SQL ID of the preceding aggregation query:
BEGIN SELECT PREV_SQL_ID INTO :B_SQLID FROM V$SESSION WHERE SID=USERENV('SID'); END;
Determine whether the database used the join group:
SET LONGCHUNKSIZE 10000000 LONG 10000000 COL JOIN_GROUP_USAGE FORMAT A50 SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR_XML(sql_id=>:B_SQLID). EXTRACT(q'#//operation[@name='HASH JOIN']/rwsstats/stat[@id='9']#'). GETCLOBVAL(2,2) join_group_usage FROM DUAL; JOIN_GROUP_USAGE -------------------------------------------------- <stat id="9">1</stat>
The query returned rows, so the database used the join group for the statement associated with this SQL ID.
See Also:
Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_SQLTUNE.REPORT_SQL_MONITOR_XML
function
Oracle Database Reference to learn about the V$SESSION
view