Contents
Title and Copyright Information
Preface
Audience
Documentation Accessibility
Related Documentation
Oracle Data Mining Resources on the Oracle Technology Network
Application Development and Database Administration Documentation
Conventions
Part I Introductions
1
What Is Data Mining?
What Is Data Mining?
Automatic Discovery
Prediction
Grouping
Actionable Information
Data Mining and Statistics
Data Mining and OLAP
Data Mining and Data Warehousing
What Can Data Mining Do and Not Do?
Asking the Right Questions
Understanding Your Data
The Data Mining Process
Problem Definition
Data Gathering, Preparation, and Feature Engineering
Model Building and Evaluation
Knowledge Deployment
2
Introduction to Oracle Data Mining
About Oracle Data Mining
Data Mining in the Database Kernel
Oracle Data Mining with R Extensibility
Data Mining in Oracle Exadata
About Partitioned Model
Interfaces to Oracle Data Mining
PL/SQL API
SQL Functions
Oracle Data Miner
Predictive Analytics
Overview of Database Analytics
3
Oracle Data Mining Basics
Mining Functions
Supervised Data Mining
Supervised Learning: Testing
Supervised Learning: Scoring
Unsupervised Data Mining
Unsupervised Learning: Scoring
Algorithms
Oracle Data Mining Supervised Algorithms
Oracle Data Mining Unsupervised Algorithms
Data Preparation
Oracle Data Mining Simplifies Data Preparation
Case Data
Nested Data
Text Data
In-Database Scoring
Parallel Execution and Ease of Administration
SQL Functions for Model Apply and Dynamic Scoring
Part II Mining Functions
4
Regression
About Regression
How Does Regression Work?
Linear Regression
Multivariate Linear Regression
Regression Coefficients
Nonlinear Regression
Multivariate Nonlinear Regression
Confidence Bounds
Testing a Regression Model
Regression Statistics
Root Mean Squared Error
Mean Absolute Error
Regression Algorithms
5
Classification
About Classification
Testing a Classification Model
Confusion Matrix
Lift
Lift Statistics
Receiver Operating Characteristic (ROC)
The ROC Curve
Area Under the Curve
ROC and Model Bias
ROC Statistics
Biasing a Classification Model
Costs
Costs Versus Accuracy
Positive and Negative Classes
Assigning Costs and Benefits
Priors and Class Weights
Classification Algorithms
6
Anomaly Detection
About Anomaly Detection
One-Class Classification
Anomaly Detection for Single-Class Data
Anomaly Detection for Finding Outliers
Anomaly Detection Algorithm
7
Clustering
About Clustering
How are Clusters Computed?
Scoring New Data
Hierarchical Clustering
Rules
Support and Confidence
Evaluating a Clustering Model
Clustering Algorithms
8
Association
About Association
Association Rules
Market-Basket Analysis
Association Rules and eCommerce
Transactional Data
Association Algorithm
9
Feature Selection and Extraction
Finding the Best Attributes
About Feature Selection and Attribute Importance
Attribute Importance and Scoring
About Feature Extraction
Feature Extraction and Scoring
Algorithms for Attribute Importance and Feature Extraction
Part III Algorithms
10
Apriori
About Apriori
Association Rules and Frequent Itemsets
Antecedent and Consequent
Confidence
Data Preparation for Apriori
Native Transactional Data and Star Schemas
Items and Collections
Sparse Data
Calculating Association Rules
Itemsets
Frequent Itemsets
Example: Calculating Rules from Frequent Itemsets
Aggregates
Example: Calculating Aggregates
Including and Excluding Rules
Performance Impact for Aggregates
Evaluating Association Rules
Support
Minimum Support Count
Confidence
Reverse Confidence
Lift
11
Decision Tree
About Decision Tree
Decision Tree Rules
Confidence and Support
Advantages of Decision Trees
XML for Decision Tree Models
Growing a Decision Tree
Splitting
Cost Matrix
Preventing Over-Fitting
Tuning the Decision Tree Algorithm
Data Preparation for Decision Tree
12
Expectation Maximization
About Expectation Maximization
Expectation Step and Maximization Step
Probability Density Estimation
Algorithm Enhancements
Scalability
High Dimensionality
Number of Components
Parameter Initialization
From Components to Clusters
Configuring the Algorithm
Data Preparation for Expectation Maximization
13
Explicit Semantic Analysis
About Explicit Semantic Analysis
Scoring with ESA
Scoring Large ESA Models
ESA for Text Mining
Data Preparation for ESA
14
Generalized Linear Models
About Generalized Linear Models
GLM in Oracle Data Mining
Interpretability and Transparency
Wide Data
Confidence Bounds
Ridge Regression
Configuring Ridge Regression
Ridge and Confidence Bounds
Ridge and Data Preparation
Scalable Feature Selection
Feature Selection
Configuring Feature Selection
Feature Selection and Ridge Regression
Feature Generation
Configuring Feature Generation
Tuning and Diagnostics for GLM
Build Settings
Diagnostics
Coefficient Statistics
Global Model Statistics
Row Diagnostics
Data Preparation for GLM
Data Preparation for Linear Regression
Data Preparation for Logistic Regression
Missing Values
Linear Regression
Coefficient Statistics for Linear Regression
Global Model Statistics for Linear Regression
Row Diagnostics for Linear Regression
Logistic Regression
Reference Class
Class Weights
Coefficient Statistics for Logistic Regression
Global Model Statistics for Logistic Regression
Row Diagnostics for Logistic Regression
15
k-Means
About k-Means
Oracle Data Mining Enhanced k-Means
Centroid
k-Means Algorithm Configuration
Data Preparation for k-Means
16
Minimum Description Length
About MDL
Compression and Entropy
Values of a Random Variable: Statistical Distribution
Values of a Random Variable: Significant Predictors
Total Entropy
Model Size
Model Selection
The MDL Metric
Data Preparation for MDL
17
Naive Bayes
About Naive Bayes
Advantages of Naive Bayes
Tuning a Naive Bayes Model
Data Preparation for Naive Bayes
18
Non-Negative Matrix Factorization
About NMF
Matrix Factorization
Scoring with NMF
Text Mining with NMF
Tuning the NMF Algorithm
Data Preparation for NMF
19
O-Cluster
About O-Cluster
Partitioning Strategy
Partitioning Numerical Attributes
Partitioning Categorical Attributes
Active Sampling
Process Flow
Scoring
Tuning the O-Cluster Algorithm
Data Preparation for O-Cluster
User-Specified Data Preparation for O-Cluster
20
Singular Value Decomposition
About Singular Value Decomposition
Matrix Manipulation
Low Rank Decomposition
Scalability
Configuring the Algorithm
Model Size
Performance
PCA scoring
Data Preparation for SVD
21
Support Vector Machines
About Support Vector Machines
Advantages of SVM
Advantages of SVM in Oracle Data Mining
Usability
Scalability
Kernel-Based Learning
Tuning an SVM Model
Data Preparation for SVM
Normalization
SVM and Automatic Data Preparation
SVM Classification
Class Weights
One-Class SVM
SVM Regression
Glossary
Index