You can use SQL to query property graph data in Oracle Spatial and Graph.
For the property graph support in Oracle Spatial and Graph, all the vertices and edges data are persisted in relational form in Oracle Database. For detailed information about the Oracle Spatial and Graph property graph schema objects, see Property Graph Schema Objects for Oracle Database.
This chapter provides examples of typical graph queries implemented using SQL. The audience includes DBAs as well as application developers who understand SQL syntax and property graph schema objects.
The benefits of querying directly property graph using SQL include:
There is no need to bring data outside Oracle Database.
You can leverage the industry-proven SQL engine provided by Oracle Database.
You can easily join or integrate property graph data with other data types (relational, JSON, XML, and so on).
You can take advantage of existing Oracle SQL tuning and database management tools and user interface.
The examples assume that there is a property graph named connections
in the current schema. The SQL queries and example output are for illustration purpose only, and your output may be different depending on the data in your connections
graph. In some examples, the output is reformatted for readability.
The examples in this topic query vertices, edges, and properties of the graph.
Example 4-1 Find a Vertex with a Specified Vertex ID
This example find the vertex with vertex ID 1 in the connections
graph.
SQL> select vid, k, v, vn, vt from connectionsVT$ where vid=1;
The output might be as follows:
1 country United States 1 name Barack Obama 1 occupation 44th president of United States of America ...
Example 4-2 Find an Edge with a Specified Edge ID
This example find the edge with edge ID 100 in the connections
graph.
SQL> select eid,svid,dvid,k,t,v,vn,vt from connectionsGE$ where eid=1000;
The output might be as follows:
1000 1 2 weight 3 1 1
In the preceding output, the K of the edge property is "weight" and the type ID of the value is 3, indicating a float value.
Example 4-3 Perform Simple Counting
This example performs simple counting in the connections
graph.
SQL> -- Get the total number of K/V pairs of all the vertices SQL> select /*+ parallel */ count(1) from connectionsVT$; 299 SQL> -- Get the total number of K/V pairs of all the edges SQL> select /*+ parallel(8) */ count(1) from connectionsGE$; 164 SQL> -- Get the total number of vertices SQL> select /*+ parallel */ count(distinct vid) from connectionsVT$; 78 SQL> -- Get the total number of edges SQL> select /*+ parallel */ count(distinct eid) from connectionsGE$; 164
Example 4-4 Get the Set of Property Keys Used
This example gets the set of property keys used for the vertices n the connections
graph.
SQL> select /*+ parallel */ distinct k from connectionsVT$; company show occupation type team religion criminal charge music genre genre name role political party country 13 rows selected. SQL> -- get the set of property keys used for edges SQL> select /*+ parallel */ distinct k from connectionsGE$; weight
Example 4-5 Find Vertices with a Value
This example finds vertices with a value (of any property) that is of String type, and where and the value contains two adjacent occurrences of a, e, i, o, or u, regardless of case.n the connections
graph.
SQL> select vid, t, k, v from connectionsVT$ where t=1 and regexp_like(v, '([aeiou])\1', 'i'); 6 1 name Jordan Peele 6 1 show Key and Peele 54 1 name John Green ...
It is usually hard to leverage a B-Tree index for the preceding kind of query because it is difficult to know beforehand what kind of regular expression is going to be used. For the above query, you might get the following execution plan. Note that full table scan is chosen by the optimizer.
------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 795 | 28 (0)| 00:00:01 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 15 | 795 | 28 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
|* 4 | TABLE ACCESS FULL| CONNECTIONSVT$ | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(INTERNAL_FUNCTION("V") AND REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
- Degree of Parallelism is 2 because of table property
If the Oracle Database In-Memory option is available and memory is sufficient, it can help performance to place the table (full table or a set of relevant columns) in memory. One way to achieve that is as follows:
SQL> alter table connectionsVT$ inmemory; Table altered.
Now, entering the same SQL containing the regular expression shows a plan that performs a "TABLE ACCESS INMEMORY FULL".
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 795 | 28 (0)| 00:00:01 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 15 | 795 | 28 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
|* 4 | TABLE ACCESS INMEMORY FULL| CONNECTIONSVT$ | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(INTERNAL_FUNCTION("V") AND REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
- Degree of Parallelism is 2 because of table property
If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column.
Oracle Text can process text that is directly stored in the database. The text can be short strings (such as names or addresses), or it can be full-length documents. These documents can be in a variety of textual format.
The text can also be in many different languages. Oracle Text can handle any space-separated languages (including character sets such as Greek or Cyrillic). In addition, Oracle Text is able to handle the Chinese, Japanese and Korean pictographic languages)
Because the property graph feature uses NVARCHAR typed column for better support of Unicode, it is highly recommended that UTF8 (AL32UTF8) be used as the database character set.
To create an Oracle Text index on the vertices table (or edges table), the ALTER SESSION privilege is required. For example:
SQL> grant alter session to <YOUR_USER_SCHEMA_HERE>;
If customization is required, also grant the EXECUTE privilege on CTX_DDL:
SQL> grant execute on ctx_ddl to <YOUR_USER_SCHEMA_HERE>;
The following shows some example statements for granting these privileges to SCOTT.
SQL> conn / as sysdba Connected. SQL> -- This is a PDB setup -- SQL> alter session set container=orcl; Session altered. SQL> grant execute on ctx_ddl to scott; Grant succeeded. SQL> grant alter session to scott; Grant succeeded.
Example 4-6 Create a Text Index
This example creates an Oracle Text index on the vertices table (V column) of the connections graph in the SCOTT schema. Note that unlike the text index capabilities provided by either Apache Lucene or Apache SolrCloud, the Oracle Text index created here is for all property keys, not just one or a subset of property keys. In addition, if a new property is added to the graph and the property value is of String data type, then it will automatically be included in the same text index.
The example uses the OPG_AUTO_LEXER lexer owned by MDSYS.
SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'MDSYS', lexer=>'OPG_AUTO_LEXER', dop=>2);
If customization is desired, you can use the ctx_ddl.create_preference API. For example:
SQL> -- The following requires access privilege to CTX_DDL SQL> exec ctx_ddl.create_preference('SCOTT.OPG_AUTO_LEXER', 'AUTO_LEXER'); PL/SQL procedure successfully completed. SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'scott', lexer=>'OPG_AUTO_LEXER', dop=>2); PL/SQL procedure successfully completed.
You can now use a rich set of functions provided by Oracle Text to perform queries against graph elements.
Note:
If you no longer need an Oracle Text index, you can use the drop_vertices_text_idx or opg_apis.drop_edges_text_idx API to drop it. The following statements drop the text indexes on the vertices and edges of a graph named connections
owned by SCOTT:
SQL> exec opg_apis.drop_vertices_text_Idx('scott', 'connections'); SQL> exec opg_apis.drop_edges_text_Idx('scott', 'connections');
Example 4-7 Find a Vertex that Has a Property Value
The following example find a vertex that has a property value (of string type) containing the keyword "Obama".
SQL> select vid, k, t, v from connectionsVT$ where t=1 and contains(v, 'Obama', 1) > 0 order by score(1) desc ;
The output and SQL execution plan from the preceding statement may appear as follows. Note that DOMAIN INDEX appears as an operation in the execution plan.
1 name 1 Barack Obama
Execution Plan
----------------------------------------------------------
Plan hash value: 1619508090
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 56 | 5 (20) | 00:00:01 | | |
| 1 | SORT ORDER BY | | 1 | 56 | 5 (20) | 00:00:01 | | |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID |
|* 3 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"=1 AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
3 - access("CTXSYS"."CONTAINS"("V",'Obama',1)>0)
Example 4-8 Fuzzy Match
The following example finds a vertex that has a property value (of string type) containing variants of "ameriian" (a deliberate misspelling for this example) Fuzzy match is used.
SQL> select vid, k, t, v
from connectionsVT$
where contains(v, 'fuzzy(ameriian,,,weight)', 1) > 0
order by score(1) desc;
The output and SQL execution plan from the preceding statement may appear as follows.
8 role 1 american business man 9 role 1 american business man 4 role 1 american economist 6 role 1 american comedian actor 7 role 1 american comedian actor 1 occupation 1 44th president of United States of America 6 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1619508090 ----------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 56 | 5 (20)| 00:00:01 | | | | 1 | SORT ORDER BY | | 1 | 56 | 5 (20)| 00:00:01 | | | |* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID | |* 3 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | | ----------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
Example 4-9 Query Relaxation
The following example is a sophisticated Oracle Text query that implements query relaxation, which enables you to execute the most restrictive version of a query first, progressively relaxing the query until the required number of matches is obtained. Using query relaxation with queries that contain multiple strings, you can provide guidance for determining the “best” matches, so that these appear earlier in the results than other potential matches.
This example searchs for "american actor" with a query relaxation sequence.
SQL> select vid, k, t, v from connectionsVT$ where CONTAINS (v, '<query> <textquery lang="ENGLISH" grammar="CONTEXT"> <progression> <seq>{american} {actor}</seq> <seq>{american} NEAR {actor}</seq> <seq>{american} AND {actor}</seq> <seq>{american} ACCUM {actor}</seq> </progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>') > 0;
The output and SQL execution plan from the preceding statement may appear as follows.
7 role 1 american comedian actor 6 role 1 american comedian actor 44 occupation 1 actor 8 role 1 american business man 53 occupation 1 actor film producer 52 occupation 1 actor 4 role 1 american economist 47 occupation 1 actor 9 role 1 american business man 9 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2158361449 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 56 | 4 (0)| 00:00:01 | | | |* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID | |* 2 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V")) 2 - access("CTXSYS"."CONTAINS"("V",'<query> <textquery lang="ENGLISH" grammar="CONTEXT"> <progression> <seq>{american} {actor}</seq> <seq>{american} NEAR {actor}</seq> <seq>{american} AND {actor}</seq> <seq>{american} ACCUM {actor}</seq> </progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>')>0)
Example 4-10 Find an Edge
Just as with vertices, you can create an Oracle Text index on the V column of the edges table (GE$) of a property graph. The following example uses the OPG_AUTO_LEXER lexer owned by MDSYS.
SQL> exec opg_apis.create_edges_text_idx('scott', 'connections', pref_owner=>'mdsys', lexer=>'OPG_AUTO_LEXER', dop=>4);
If customization is required, use the ctx_ddl.create_preference API.
A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends.
This topic provides graph navigation and pattern matching examples using the example property graph named connections. Most of the SQL statements are relatively simple, but they can be used as building blocks to implement requirements that are more sophisticated. It is generally best to start from something simple, and progressively add complexity.
Example 4-11 Who Are a Person's Collaborators?
The following SQL ststement finds all entities that a vertex with ID 1 collaborates with. For simplicity, it considers only outgoing relationships.
SQL> select dvid, el, k, vn, v from connectionsGE$ where svid=1 and el='collaborates';
Note:
To find the specific vertex ID of interest, you can perform a text query on the property graph using keywords or fuzzy matching. (For details and examples, see Text Queries on Property Graphs.)
The preceding example's output and execution plan may be as follows.
2 collaborates weight 1 1 21 collaborates weight 1 1 22 collaborates weight 1 1 .... 26 collaborates weight 1 1 10 rows selected. ------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 460 | 2 (0)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 10 | 460 | 2 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 3 | PX PARTITION HASH ALL | | 10 | 460 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 4 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 10 | 460 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 5 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates' AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V")) 5 - access("SVID"=1)
Example 4-12 Who Are a Person's Collaborators and What are Their Occupations?
The following SQL statement finds collaborators of the vertex with ID 1, and the occupation of each collaborator. A join with the vertices table (VT$) is required.
SQL> select dvid, vertices.v from connectionsGE$, connectionsVT$ vertices where svid=1 and el='collaborates' and dvid=vertices.vid and vertices.k='occupation';
The preceding example's output and execution plan may be as follows.
21 67th United States Secretary of State 22 68th United States Secretary of State 23 chancellor 28 7th president of Iran 19 junior United States Senator from New York ... -------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 525 | 7 (0)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 7 | 525 | 7 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 3 | NESTED LOOPS | | 7 | 525 | 7 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 4 | PX PARTITION HASH ALL | | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 5 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 6 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 7 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 8 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 9 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates') 6 - access("SVID"=1) 8 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 9 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'occupation') filter(INTERNAL_FUNCTION("VERTICES"."K"))
Example 4-13 Find a Person's Enemies and Aggregate Them by Their Country
The following SQL statement finds enemies (that is, those with the feuds
relationship) of the vertex with ID 1, and aggregates them by their countries. A join with the vertices table (VT$) is required.
SQL> select vertices.v, count(1) from connectionsGE$, connectionsVT$ vertices where svid=1 and el='feuds' and dvid=vertices.vid and vertices.k='country' group by vertices.v;
The example's output and execution plan may be as follows. In this case, the vertex with ID 1 has 3 enemies in the United States and 1 in Russia.
United States 3 Russia 1 ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 5 | 375 | 5 (20)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | P->S | QC (RAND) | | 3 | HASH GROUP BY | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,00 | P->P | HASH | | 6 | HASH GROUP BY | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,00 | PCWP | | | 7 | NESTED LOOPS | | 5 | 375 | 4 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 8 | PX PARTITION HASH ALL | | 5 | 125 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 9 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 5 | 125 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 10 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 11 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 12 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 13 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 9 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'feuds') 10 - access("SVID"=1) 12 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 13 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country') filter(INTERNAL_FUNCTION("VERTICES"."K"))
Example 4-14 Find a Person's Collaborators, and aggregate and sort them
The following SQL statement finds the collaborators of the vertex with ID 1, aggregates them by their country, and sorts them in ascending order.
SQL> select vertices.v, count(1) from connectionsGE$, connectionsVT$ vertices where svid=1 and el='collaborates' and dvid=vertices.vid and vertices.k='country' group by vertices.v order by count(1) asc;
The example output and execution plan may be as follows. In this case, the vertex with ID 1 has the most collaborators in the United States.
Germany 1 Japan 1 Iran 1 United States 7 --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 750 | 9 (23)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (ORDER) | :TQ10002 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | P->S | QC (ORDER) | | 3 | SORT ORDER BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | PCWP | | | 5 | PX SEND RANGE | :TQ10001 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | P->P | RANGE | | 6 | HASH GROUP BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | PCWP | | | 7 | PX RECEIVE | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | PCWP | | | 8 | PX SEND HASH | :TQ10000 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,00 | P->P | HASH | | 9 | HASH GROUP BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,00 | PCWP | | | 10 | NESTED LOOPS | | 10 | 750 | 7 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 11 | PX PARTITION HASH ALL | | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 12 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | CONNECTIONSGE$ | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 13 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 14 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 15 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 16 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 12 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates') 13 - access("SVID"=1) 15 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 16 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country') filter(INTERNAL_FUNCTION("VERTICES"."K"))
The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying.
CONNECT BY lets you navigate and find matches in a hierarchical order. To follow outgoing edges, you can use prior dvid = svid to guide the navigation.
Parallel recursion lets you perform navigation up to a specified number of hops away.
The examples use a property graph named connections.
Example 4-15 CONNECT WITH
The following SQL statement follows the outgoing edges by 1 hop.
SQL> select G.dvid from connectionsGE$ G start with svid = 1 connect by nocycle prior dvid = svid and level <= 1;
The preceding example's output and execution plan may be as follows.
2 3 4 5 6 7 8 9 10 ... ------------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 7 | 273 | 3 (67)| 00:00:01 | | | | | | |* 1 | CONNECT BY WITH FILTERING| | | | | | | | | | | | 2 | PX COORDINATOR | | | | | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 2 | 12 | 0 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 4 | PX PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 5 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 6 | FILTER | | | | | | | | | | | | 7 | NESTED LOOPS | | 5 | 95 | 1 (0)| 00:00:01 | | | | | | | 8 | CONNECT BY PUMP | | | | | | | | | | | | 9 | PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | | | | |* 10 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | | | | ------------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("SVID"=PRIOR "DVID") filter(LEVEL<=2) 5 - access("SVID"=1) 6 - filter(LEVEL<=2) 10 - access("connect$_by$_pump$_002"."prior dvid "="SVID")
To extend from 1 hop to multiple hops, change 1 in the preceding example to another integer. For example, to change it to 2 hops, specify: level <= 2
Example 4-16 Parallel Recursion
The following SQL statement uses recursion within the WITH clause to perform navigation up to 4 hops away, a using recursively defined graph expansion: g_exp
references g_exp
in the query, and that defines the recursion. The example also uses the PARALLEL optimizer hint for parallel execution.
SQL> WITH g_exp(svid, dvid, depth) as ( select svid as svid, dvid as dvid, 0 as depth from connectionsGE$ where svid=1 union all select g2.svid, g1.dvid, g2.depth + 1 from g_exp g2, connectionsGE$ g1 where g2.dvid=g1.svid and g2.depth <= 3 ) select /*+ parallel(4) */ dvid, depth from g_exp where svid=1 ;
The example's output and execution plan may be as follows. Note that CURSOR DURATION MEMORY
is chosen in the execution, which indicates the graph expansion stores the intermediate data in memory.
22 4
25 4
24 4
1 4
23 4
33 4
22 4
22 4
... ...
Execution Plan
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 801 | 31239 | 147 (0)| 00:00:01 | | | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | | | |
| 3 | UNION ALL (RECURSIVE WITH) BREADTH FIRST | | | | | | | | | | |
| 4 | PX COORDINATOR | | | | | | | | | | |
| 5 | PX SEND QC (RANDOM) | :TQ20000 | 2 | 12 | 0 (0)| 00:00:01 | | | Q2,00 | P->S | QC (RAND) |
| 6 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q2,00 | PCWP | |
| 7 | PX PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q2,00 | PCWC | |
|* 8 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q2,00 | PCWP | |
| 9 | PX COORDINATOR | | | | | | | | | | |
| 10 | PX SEND QC (RANDOM) | :TQ10000 | 799 | 12M| 12 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 11 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q1,00 | PCWP | |
|* 12 | HASH JOIN | | 799 | 12M| 12 (0)| 00:00:01 | | | Q1,00 | PCWP | |
| 13 | BUFFER SORT (REUSE) | | | | | | | | Q1,00 | PCWP | |
| 14 | PARTITION HASH ALL | | 164 | 984 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
| 15 | INDEX FAST FULL SCAN | CONNECTIONSXDE$ | 164 | 984 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
| 16 | PX BLOCK ITERATOR | | | | | | | | Q1,00 | PCWC | |
|* 17 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q1,00 | PCWP | |
| 18 | PX COORDINATOR | | | | | | | | | | |
| 19 | PX SEND QC (RANDOM) | :TQ30000 | 801 | 31239 | 135 (0)| 00:00:01 | | | Q3,00 | P->S | QC (RAND) |
|* 20 | VIEW | | 801 | 31239 | 135 (0)| 00:00:01 | | | Q3,00 | PCWP | |
| 21 | PX BLOCK ITERATOR | | 801 | 12M| 135 (0)| 00:00:01 | | | Q3,00 | PCWC | |
| 22 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6614_11CB2D2 | 801 | 12M| 135 (0)| 00:00:01 | | | Q3,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("SVID"=1)
12 - access("G2"."DVID"="G1"."SVID")
17 - filter("G2"."INTERNAL_ITERS$"=LEVEL AND "G2"."DEPTH"<=3)
20 - filter("SVID"=1)
The PIVOT clause lets you dynamically add columns to a table to create a new table.
The schema design (VT$ and GE$) of the property graph is narrow ("skinny") rather than wide ("fat"). This means that if a vertex or edge has multiple properties, those property keys, values, data types, and so on will be stored using multiple rows instead of multiple columns. Such a design is very flexible in the sense that you can add properties dynamically without having to worry about adding too many columns or even reaching the physical maximum limit of number of columns a table may have. However, for some applications you may prefer to have a wide table if the properties are somewhat homogeneous.
Example 4-17 Pivot
The following CREATE TABLE ... AS SELECT statement uses PIVOT to add four columns: ‘company’,’ occupation’,’ name’, and ‘religion’.
SQL> CREATE TABLE table pg_wide as with G AS (select vid, k, t, v from connectionsVT$ ) select * from G pivot ( min(v) for k in ('company', 'occupation', 'name', 'religion') ); Table created.
The following DESCRIBE statement shows the definition of the new table, including the four added columns. (The output is reformatted for readability.)
SQL> DESCRIBE pg_wide; Name Null? Type --------------------------------------------------- -------- -------------------- VID NOT NULL NUMBER T NUMBER(38) 'company' NVARCHAR2(15000) 'occupation' NVARCHAR2(15000) 'name' NVARCHAR2(15000) 'religion' NVARCHAR2(15000)
In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.
SQL-based analytics are especially useful if:
The computation is not bound by the physical memory.
There is no need to move the graph data outside the database.
Computation is easily done against the latest snapshot of a property graph.
There is no need to 'sync up' (synchronize) data inside and outside the database.
However, when a graph (or a subgraph) fits in memory, then running analytics provided by the in-memory analyst usually provides better better performance than using SQL-based analytics. Because many of the analytics implementation require using intermediate data structures, most SQL- (PL/SQL-) based analytics APIs have parameters for working tables (wt). A typical flow has the following steps:
Prepare the working table or tables.
Perform analytics (one or multiple calls).
Perform cleanup
Example 4-18 Shortest Path
Consider shortest path, for example. Internally, Oracle Database uses the bidirectional Dijkstra algorithm. The following code snippet shows an end-to-end flow.
set serveroutput on DECLARE wt1 varchar2(100); -- intermediate working tables n number; path varchar2(1000); weights varchar2(1000); BEGIN -- prepare opg_apis.find_sp_prep('connectionsGE$', wt1); dbms_output.put_line('working table name ' || wt1); -- compute opg_apis.find_sp( 'connectionsGE$', 1, -- start vertex ID 53, -- destination vertex ID wt1, -- working table (for Dijkstra expansion) dop => 1, -- degree of parallelism stats_freq=>1000, -- frequency to collect statistics path_output => path, -- shortest path (a sequence of vertices) weights_output => weights, -- edge weights options => null ); dbms_output.put_line('path ' || path); dbms_output.put_line('weights ' || weights); -- cleanup (commented out here; see text after the example) -- opg_apis.find_sp_cleanup('connectionsGE$', wt1); END; /
This example may produce the following output. Note that if no working table name is provided, the preparation step will automatically generate a temporary table name and create it. Because the temporary working table name uses the session ID, your output will probably be different.
working table name "CONNECTIONSGE$$TWFS12" path 1 3 52 53 weights 4 3 1 1 1 PL/SQL procedure successfully completed.
If you want to know the definition of the working table or tables, then skip the cleanup phase (as shown in the preceding example that comments out the call to find_sp_cleanup
). After the computation is done, you can describe the working table or tables.
SQL> describe "CONNECTIONSGE$$TWFS12" Name Null? Type --------- -------- ---------------------------- NID NUMBER D2S NUMBER P2S NUMBER D2T NUMBER P2T NUMBER F NUMBER(38) B NUMBER(38)
For advanced users who want to try different table creation options, such as using inmemory or advanced compression, you can pre-create the preceding working table and pass the name in.
Example 4-19 Create Working Table and Perform Analytics
The following statements first create a working table with the same column structure and basic compression enabled, then pass it to the computation.
create table connections$MY_EXP( NID NUMBER, D2S NUMBER, P2S NUMBER, D2T NUMBER, P2T NUMBER, F NUMBER(38), B NUMBER(38) ) compress nologging; DECLARE wt1 varchar2(100) := 'connections$MY_EXP'; n number; path varchar2(1000); weights varchar2(1000); BEGIN dbms_output.put_line('working table name ' || wt1); -- compute opg_apis.find_sp( 'connectionsGE$', 1, 53, wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null ); dbms_output.put_line('path ' || path); dbms_output.put_line('weights ' || weights); -- cleanup -- opg_apis.find_sp_cleanup('connectionsGE$', wt1); END; /
At the end of the computation, if the working table has not been dropped or truncated, you can check the content of the working table, as follows. Note that the working table structure may vary between releases.
SQL> select * from connections$MY_EXP; NID D2S P2S D2T P2T F B ---------- ---------- ---------- ---------- ---------- ---------- ---------- 1 0 1.000E+100 1 -1 53 1.000E+100 0 -1 1 54 1.000E+100 1 53 -1 1 52 1.000E+100 1 53 -1 1 5 1 1 1.000E+100 0 -1 26 1 1 1.000E+100 0 -1 8 1000 1 1.000E+100 0 -1 3 1 1 2 52 0 0 15 1 1 1.000E+100 0 -1 21 1 1 1.000E+100 0 -1 19 1 1 1.000E+100 0 -1 ...
Example 4-20 Multiple Calls to Same Graph
To perform multiple calls to the same graph, only a single call to the preparation step is needed. The following shows an example of computing shortest path for multiple pairs of vertices.
DECLARE wt1 varchar2(100); -- intermediate working tables n number; path varchar2(1000); weights varchar2(1000); BEGIN -- prepare opg_apis.find_sp_prep('connectionsGE$', wt1); dbms_output.put_line('working table name ' || wt1); -- find shortest path from vertex 1 to vertex 53 opg_apis.find_sp( 'connectionsGE$', 1, 53, wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null); dbms_output.put_line('path ' || path); dbms_output.put_line('weights ' || weights); -- find shortest path from vertex 2 to vertex 36 opg_apis.find_sp( 'connectionsGE$', 2, 36, wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null); dbms_output.put_line('path ' || path); dbms_output.put_line('weights ' || weights); -- find shortest path from vertex 30 to vertex 4 opg_apis.find_sp( 'connectionsGE$', 30, 4, wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null); dbms_output.put_line('path ' || path); dbms_output.put_line('weights ' || weights); -- cleanup opg_apis.find_sp_cleanup('connectionsGE$', wt1); END; /
The example's output may be as follows: three shortest paths have been found for the multiple pairs of vertices provided.
working table name "CONNECTIONSGE$$TWFS12" path 1 3 52 53 weights 4 3 1 1 1 path 2 36 weights 2 1 1 path 30 21 1 4 weights 4 3 1 1 1 PL/SQL procedure successfully completed.
PGQL is a SQL-like query language for property graph data structures that consist of nodes that are connected to other nodes by edges, each of which can have key-value pairs (properties) associated with them.
The language is based on the concept of graph pattern matching, which allows you to specify patterns that are matched against vertices and edges in a data graph.
Related Topics
Pattern matching is done using topology constraints, which describe a pattern of connections between nodes in the graph, Value constraints (similar to their SQL equivalents) let you further constrain matches by specifying properties that those connections and nodes must have.
For example, assume a graph of TCP/IP connections on a computer network, and you want to detect cases where someone logged into one machine, from there into another, and from there into yet another. You would query for that pattern like this:
SELECT host1.id(), host2.id(), host3.id() WHERE /* choose what to return */ (host1) -[c1 WITH toPort = 22 and opened = true]-> (host2) /* topology must match this pattern */ -[connection2 WITH toPort = 22 and opened = true]-> (host3), connection1.bytes > 300, /* meaningful amount of data was exchanged */ connection2.bytes > 300, connection1.start < connection2.start, /* second connection within time-frame of first */ connection2.start + connection2.duration < connection1.start + connection1.duration GROUP BY host1.id(), host2.id(), host3.id() /* aggregate multiple matching connections */ ORDER BY DESC(connection1.when) /* reverse sort chronologically */
A topological constraint has a direction, as edges in graphs do. Thus, (a) <-[]- (b)
specifies a case where b has an edge pointing at a, whereas (a) -[]-> (b)
looks for an edge in the opposite direction.
The following example finds common friends of April and Chris who are older than both of them.
SELECT friend.name, friend.dob WHERE /* note the arrow directions below */ (p1:person WITH name = 'April') -[:likes]-> (friend) <-[:likes]- (p2:person WITH name = 'Chris'), friend.dob > p1.dob AND friend.dob > p2.dob ORDER BY friend.dob DESC
Labels are a way of attaching type information to edges and nodes in a graph, and can be used in constraints in graphs where not all nodes represent the same thing. For example:
SELECT p WHERE (p:person) -[e:likes]-> (m:movie WITH title='Star Wars'), (p) -[e:likes]-> (m:movie WITH title='Avatar')
Regular path queries allow a pattern to be reused. The following example finds all of the common ancestors of Mario and Luigi.
PATH has_parent := () -[:has_father|has_mother]-> () SELECT ancestor.name WHERE (:Person WITH name = 'Mario') -/:has_parent*/-> (ancestor:Person), (:Person WITH name = 'Luigi') -/:has_parent*/-> (ancestor)
The preceding path specification also shows the use of anonymous constraints, because there is no need to define names for intermediate edges or nodes that will not be used in additional constraints or query results. Anonymous elements can have constraints, such as [:has_father|has_mother]
: the edge does not get a variable name (because it will not be referenced elsewhere), but it is constrained.