Glossary of Terms



Word
Definition
Algorithm A specific technique or procedure for producing a data mining model. An algorithm uses a specific model representation and may support one or more functional areas. Examples of algorithms used by ODM include Naive Bayes, Adaptive Bayes Networks, and Support Vector Machine for classification, Support Vector Machine for regression, K-Means and O-Cluster for clustering, MDL for attribute importance, and Apriori for association models.
Analytic workspace An analytic workspace stores data in a multidimensional format where it can be manipulated by the OLAP engine. The analytic workspaces are stored in tables in a relational schema.
Apprehension Rate Apprehensions as a percentage of the number of incidents identified.
Approximation Approximation is a data mining function for predicting continuous target values for new records using a model built from records with known target values. ODM supports the Support Vector Machine algorithm for regression. Approximation is another word for regression.
Attribute In the Java interface, an instance of Attribute maps to a column with a name and data type. The attribute corresponds to a column in a database table. When assigned to a column, the column must have a compatible data type; if the data type is not compatible, a runtime exception is likely. Attributes are also called variables, features, data fields, or table columns.
Attribute importance A measure of the importance of an attribute in predicting a specified target. The measure of different attributes of a build data table enables users to select the attributes that are found to be most relevant to a mining model. A smaller set of attributes results in a faster model build; the resulting model could be more accurate. ODM uses the minimum description length principle to discover important attributes. Sometimes referred to as feature selection and key fields.
Backroom inventory Product located at the store (in the warehouse or a storage area), but not offered for sale.
Binning Binning, also known as discretization, means grouping related values together, thus reducing the number of distinct values for an attribute.
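As an illustration of binning, the sketch below groups numeric values into equal-width intervals. The function name `equal_width_bins` and the sample ages are hypothetical, and equal-width binning is only one possible strategy; it is not necessarily the method ODM applies.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would otherwise land one past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample data: customer ages reduced to three distinct bin labels.
ages = [18, 22, 25, 31, 40, 47, 58, 65]
bins = equal_width_bins(ages, 3)
print(bins)
```

Eight distinct ages are reduced to three distinct bin indexes, which is the reduction in distinct values that the definition describes.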
Bottom products Products that had the lowest product margins and sold the fewest units within their respective categories.
Business area A business area is a set of folders containing related information with a common business purpose. For example, information about all products may be stored in one business area, whereas information about all customers or employees is stored in another business area.
Campaign A grouping of individual promotions that are designed and executed for promoting the sale of one or more items.
Cash short/over The difference between what is counted in the till versus the total of the transactions by Tender Class or grouping.
Churn rate Churn rate is also sometimes called attrition rate. It is one of two primary factors that determine the steady-state level of customers a business will support. Churn rate, as applied to a customer base, refers to the proportion of contractual customers or subscribers who leave a supplier during a given time period.
Cost matrix A cost matrix is a two-dimensional, n by n table that defines the cost associated with a prediction versus the actual value. A cost matrix is typically used in classification models, where n is the number of distinct values in the target, and the columns and rows are labeled with target values. The rows are the actual values; the columns are the predicted values.
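A cost matrix can be sketched as a nested mapping indexed first by actual value (row) and then by predicted value (column), following the layout in the definition. The target values and the cost figures below are hypothetical and chosen only for illustration.

```python
# cost[actual][predicted]: rows are actual values, columns are predicted values.
# Hypothetical costs: missing a real "yes" is five times worse than a false alarm.
cost = {
    "no": {"no": 0.0, "yes": 1.0},
    "yes": {"no": 5.0, "yes": 0.0},
}


def total_cost(actual, predicted, cost_matrix):
    """Sum the cost of each (actual, predicted) pair."""
    return sum(cost_matrix[a][p] for a, p in zip(actual, predicted))


actual = ["yes", "no", "yes", "no"]
predicted = ["yes", "yes", "no", "no"]
total = total_cost(actual, predicted, cost)
print(total)
```

Correct predictions sit on the diagonal and cost nothing here; only the two misclassified cases contribute to the total.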
Cross sell Products that sell together as a result of a marketing or sales effort. Generally, this is a complementary item to the intended product sale. An example of a complementary item might be sea salt when tequila and lime juice are purchased.

Cube

A cube in a multidimensional data source has the following components: measures and dimensions. The cube contains a measure value for each possible combination of the different dimensions.

Customer target segment User-defined customer grouping that is defined by list generation. An example of customer target segment might be "yuppie", where the criteria are specific to age, income, and education.

CWM (Common Warehouse Model)

An integration approach for data warehousing, incorporating both technical and business metadata into a single model concentrating on the needs of data warehousing and decision support.

Data mining

Data mining enables companies to extract information efficiently from the very largest databases and build integrated business intelligence applications by finding patterns and insights hidden in the data. Data mining allows application developers to quickly automate the extraction and distribution of new business intelligence throughout the organization through the use of predictions, patterns and discoveries.

Oracle Data Mining (ODM) supports functionality in Oracle Database 10g for the following data mining problems: classification, prediction, regression, clustering, associations, attribute importance, feature extraction and sequence similarity searches and analysis (BLAST). All model-building, scoring, and metadata management operations are accessed via the Oracle Data Mining Client and either a PL/SQL or Java-based API and occur entirely within the relational database.

Denormalized data

Denormalized data is planned redundant data posted from one object to another for performance considerations.

Dimension

A dimension contains the textual descriptions of the business. Dimensions provide perspective regarding the "whys" and "hows" of the business and element transactions, for example, product, customer, and time dimensions.

Dimension attribute

A dimension attribute describes a characteristic that is shared by dimension members. Dimension attributes enable you to select data based on similar characteristics. For example, a Product dimension might have a Color attribute that enables you to search for all red products.

Dimension hierarchy

A dimension hierarchy describes a hierarchical relationship between two or more dimension members. Individual dimension members might be related to each other in a hierarchical way. For example, a specific day belongs to a particular month, which in turn is within a particular year. To reflect such relationships, dimension members are organized into dimension hierarchies. A dimension hierarchy is a logical structure that uses ordered levels as a means of organizing and aggregating data. For example, the Time dimension might have a hierarchy to aggregate data from the Month level to the Quarter level to the Year level. A dimension can have more than one hierarchy. For example, as well as the Month-Quarter-Year dimension hierarchy, the Time dimension might also have a Day-Month-Year dimension hierarchy. Note that where multiple dimension hierarchies exist for the same dimension, one dimension hierarchy must be specified as the default hierarchy.

Dimension measure

Measures have dimensions that categorize the data in the measure. For example, a Sales measure might have Product, Time, and Geography as its dimensions. When a measure has a particular dimension, the measure is said to be dimensioned by that dimension. For example, Sales is dimensioned by Product. The group of dimensions for a measure constitutes the dimensionality of that measure. For example, the dimensionality of Sales is Product, Time, and Geography. Each element in a dimension is a dimension member. For example, January 2001, February 2001, March 2001, Quarter 1 2001, and the year 2001 are likely members of the Time dimension.

Drilling Drilling enables you to view different levels of data by varying the amount of detail. By drilling up or down, you view less or more of the worksheet data.
EAS Electronic article surveillance (EAS) is a proven loss prevention technique that protects assets, using security tags and EAS detection equipment. EAS systems provide security for buildings, entrances, exits, and so forth by setting off an alarm when the tag passes through the equipment without previously being deactivated.
EDI Electronic Data Interchange (EDI) is the standard for electronic exchange of information between trading partners. The standards cover many transactions such as purchase orders, invoices, electronic payment, and so forth.
End-cap Promotion where the product is displayed on the end of a store aisle.
Enterprise intelligence Enterprise intelligence consists of the analysis performed by retailers to effectively manage and plan operations around the various retail lines of business (LOBs).
EPC Electronic Product Code (EPC) is a unique number that identifies a specific item moving within the supply chain. EPC is the next-generation barcode that no longer requires line-of-sight scanning.

EPM

Enterprise Performance Management (EPM) is the next generation of business intelligence. A corporate culture embraced by managers at all levels, EPM provides an infrastructure that crosses all disciplines within an organization, including sales, marketing, production, human resources (relating to staffing), and so forth. Reiterative planning, forecasting, and a clear course of corrective action drive performance improvements in all aspects of the business, ultimately leading to better decisions regarding supply chain, customer relationship management (CRM), reduction of costs, and so forth.

ERD An Entity Relationship Diagram (ERD) is a data modeling tool that assists you with building a graphical representation of your enterprise's data storage and organization needs. ERDs provide a visual representation of how your organization captures its data not only for day-to-day business requirements and processing, but also for reporting and analysis to make the business more profitable.
ETL (extraction, transformation, and loading) ETL is the process of obtaining data from one data store or source (extract), modifying it (transform), and inserting it into a different data store (load).

Fact

A fact contains a numeric value that measures an aspect of the business. Typical examples are gross sales dollars, total cost, profit, margin dollars, or quantity sold. A fact (or measure) can be additive or partially additive across dimensions.

Feature A feature is a combination of attributes in the data that is of special interest and that captures important characteristics of the data.
GTIN

Global Trade Item Number (GTIN) is an identifier for trade items developed by GS1 (comprising the former EAN International and Uniform Code Council).

Householding

Householding is the process of matching customers who belong to the same household, usually identified by the same address. Customer names are not merged; rather they are linked to the address that is stored once. The benefit of householding is the improved ability to understand and target customers.

Inheritance Also known as transference, inheritance is the process by which redundant data is posted from one object to another for performance considerations.

Integrated

Integrated data is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

Item class Item classes are groups of items that share some similar properties. Discoverer uses item classes to implement the following features: Lists of Values (LOVs), alternative sorts, and drill-to-detail links.
Inventory turns Measure of inventory velocity at a store. For example: period total sales / average stock on hand.
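The inventory turns formula above can be sketched as follows. The function name, the use of a simple arithmetic mean for average stock on hand, and the sample figures are assumptions made for illustration.

```python
def inventory_turns(period_sales, stock_on_hand_samples):
    """Period total sales divided by average stock on hand."""
    average_stock = sum(stock_on_hand_samples) / len(stock_on_hand_samples)
    if average_stock == 0:
        raise ValueError("average stock on hand is zero; turns are undefined")
    return period_sales / average_stock


# Hypothetical store: 120,000 in sales against three stock counts in the period.
turns = inventory_turns(120_000.0, [28_000.0, 32_000.0, 30_000.0])
print(turns)
```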
Invoice match rule Measure of the accuracy of the vendor’s invoice versus the retailer’s purchase order (PO). An accurate invoice occurs when the universal product codes (UPCs) ordered equal the UPCs invoiced and the total value of the receipt with the UPCs matches the total value of the PO.

KPIs

Key Performance Indicators (KPIs) are high-level snapshots of a business or organization based on specific predefined measures. KPIs typically consist of any combination of reports, spreadsheets, or charts. They may include global or regional sales figures and trends over time, or anything else that is deemed critical to a corporation's success.

Lift Lift is a measure of how much better prediction results are using a model than could be obtained by chance. For example, suppose that 2% of the customers who are mailed the Wireless plan without the rate per call would make a purchase, whereas 10% of the customers who are mailed the plan with the rate per call would make a purchase. Then the lift is 10/2, or 5. Lift may also be used as a measure to compare different data mining models. Since lift is computed using a data table with actual outcomes, lift compares how well a model performs with respect to this data on predicted outcomes. Lift indicates how well the model improved the predictions over a random selection given actual results. Lift allows a user to infer how a model will perform on new data.
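The 10/2 example above can be reproduced with a short calculation. The function name `lift` and the customer counts are hypothetical; they are chosen so that the targeted group responds at 10% and the baseline group at 2%.

```python
def lift(targeted_responses, targeted_size, baseline_responses, baseline_size):
    """Ratio of the response rate with the model to the response rate by chance."""
    targeted_rate = targeted_responses / targeted_size
    baseline_rate = baseline_responses / baseline_size
    if baseline_rate == 0:
        raise ValueError("baseline response rate is zero; lift is undefined")
    return targeted_rate / baseline_rate


# Hypothetical counts: 100 of 1,000 targeted customers buy (10%),
# compared with 200 of 10,000 customers overall (2%).
value = lift(100, 1_000, 200, 10_000)
print(round(value, 2))
```

A lift above 1 means the model selects buyers better than random selection; a lift of 1 means it does no better than chance.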
Loss prevention Loss prevention is the methodology a retail business employs to curb physical loss of property (from earthquakes, floods, hurricanes, and so forth), reduction in inventory (due to theft, damage, spoilage and so forth), and loss of money due to clerical error or theft (employee, customer, or vendor).
Market share The proportion of an entire market's revenue that the company generates. Calculation: total sales for a product / total sales for the market of the product. The same calculation can be used for category or classification of merchandise.
Marketing channel The specific instance of the media used to advertise the item. For example, if an item is advertised on television, the marketing channel might be NBC.

Materialized view

A materialized view, supported by Oracle 8.1.7 Database (or later), contains preaggregated data. Materialized views are snapshot views that are created when you define summaries by using Oracle Discoverer Administrator. Queries are redirected to the materialized views instead of the large detail tables and improve query performance in Discoverer Plus and Discoverer Viewer. Oracle 8.1.7 Database (or later) automatically recognizes when a materialized view can be used to satisfy a query request. Oracle 8.1.7 Database (or later) rewrites the query to use the materialized view. Queries are then directed to the materialized view and not to the underlying detail tables or views.

Measure

The name given to the data itself. In OLAP metadata, measures represent data that can be examined and analyzed in crosstabs and graphs. Examples include Sales, Cost, and Profit.

Media The mechanism used to execute the promotion, such as television, radio, newspaper, and so forth.
Merchandise management Merchandise management is the methodology employed by a retail business to manage the commodities offered for sale. It includes analysis, planning, acquisition, handling, and control of the merchandise investments for the retail operation.
Mining model A mining model is the result of building a model from mining function settings (Java interface) or mining settings table (PL/SQL interface). The representation of the model is specific to the algorithm specified by the user or selected by the DMS. A model can be used for direct inspection, e.g., to examine the rules produced from an ABN model or association models, or to score data.
Mining result In the Java interface, the end product(s) of a mining task is the mining result. For example, a build task produces a mining model; a test task produces a test result.
Missing value A missing value is a data value that is missing because it was not measured (that is, has a null value), not answered, was unknown, or was lost. Data mining systems vary in the way they treat missing values. There are several typical ways to treat them: ignore them, omit any records containing missing values, replace missing values with the mode or mean, or infer missing values from existing values. ODM ignores missing values during mining operations.
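Two of the typical treatments listed above, omitting missing values and replacing them with the mean or mode, can be sketched as follows. This is a generic illustration and does not describe ODM, which ignores missing values. The function names and sample data are hypothetical, and `None` stands in for a null value.

```python
from statistics import mean, mode


def omit_missing(values):
    """Drop missing entries entirely."""
    return [v for v in values if v is not None]


def impute(values, strategy="mean"):
    """Replace missing entries with the mean (numeric) or mode (categorical)."""
    known = omit_missing(values)
    fill = mean(known) if strategy == "mean" else mode(known)
    return [fill if v is None else v for v in values]


# Hypothetical columns with gaps.
incomes = [40, None, 60, 50, None]
colors = ["red", None, "blue", "red"]

filled_incomes = impute(incomes, "mean")
filled_colors = impute(colors, "mode")
print(filled_incomes)
print(filled_colors)
```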
Model (mining) An important function of data mining is the production of a model. A model can be descriptive or predictive. A descriptive model helps in understanding underlying processes or behavior. For example, an association model describes consumer behavior. A predictive model is an equation or set of rules that makes it possible to predict an unseen or unmeasured value (the dependent variable or output) from other, known values (independent variables or input). The form of the equation or rules is suggested by mining data collected from the process under study. Some training or estimation technique is used to estimate the parameters of the equation or rules.
MOLAP

The Oracle MOLAP (Multidimensional Online Analytical Processing) model is based on Cubes.

  • A Cube logically represents data in a similar way to tables, although the data is actually stored in multidimensional arrays. Like dimension tables, cube dimensions organize members into hierarchies, levels, and attributes.
  • The cube stores the fact data (in measures); the dimensions form the edges of the cube. Measure data may be stored or calculated at query time.
  • Stored measures are loaded and stored at the leaf level. Commonly, there is also a percentage of summary data that is stored. Summary data that is not stored is dynamically aggregated when queried. Calculated measures are measures whose values are calculated dynamically at query time. Only the calculation rules are stored in the database. Common calculations include measures such as ratios, differences, moving totals, and averages. Calculations do not require disk storage space, and they do not extend the processing time required for data maintenance.
Multi-record case Each case in the data is stored as multiple records in a table with columns sequenceID, attribute_name, and value. Multi-record case is also known as transactional format.
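The relationship between the multi-record (transactional) format and the single-record (nontransactional) format can be sketched by regrouping rows on the sequence ID. The sample rows and the function name `to_single_record` are hypothetical; only the three column roles (sequenceID, attribute_name, value) come from the definition above.

```python
from collections import defaultdict

# Multi-record format: one row per (sequenceID, attribute_name, value).
rows = [
    (1, "age", 34),
    (1, "income", 52000),
    (2, "age", 45),
    (2, "plan", "wireless"),
]


def to_single_record(multi_record_rows):
    """Collapse transactional rows into one record (dict) per case."""
    cases = defaultdict(dict)
    for sequence_id, attribute_name, value in multi_record_rows:
        cases[sequence_id][attribute_name] = value
    return dict(cases)


single = to_single_record(rows)
print(single)
```

Attributes that a case never mentions are simply absent from its record, which is why the transactional layout is convenient when most attributes are empty for most cases.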
Network feature A network feature is a tree-like multi-attribute structure. From the standpoint of the network, features are conditionally independent components. Features contain at least one attribute (the root attribute). Network features are used in the Adaptive Bayes Network algorithm.
Nontransactional format In a nontransactional format, each case in the data is stored as one record (row) in a table. Nontransactional format is also known as single-record case.
Normalization Normalization is the process of eliminating redundant data in your database and ensuring that relationships and dependencies are correctly stated. Typically, when you discuss normalization, you discuss three types: first, second, and third.
Operational intelligence Operational intelligence consists of the analysis performed within each functional organization: store management, merchandise management, supply chain management, CRM, and corporate administration.

Oracle OLAP

Oracle OLAP is a database option, a service, and it contains several APIs that enable open access to MOLAP data and the analytic features of the OLAP calculation engine. Also see: MOLAP

Outlier An outlier is a value that is far outside the normal range in a data set, typically a value that is several standard deviations from the mean. In other words, it is an extreme value that does not come from the typical population of data. In a normal distribution, outliers are typically at least three standard deviations from the mean.
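The three-standard-deviation rule of thumb above can be sketched as a simple check. The function name, the choice of the population standard deviation, and the sample data are assumptions; other outlier tests exist, and this sketch is not a description of how ODM treats outliers.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=3.0):
    """Return values at least `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values are identical, so nothing is extreme
    return [v for v in values if abs(v - mu) / sigma >= threshold]


# Hypothetical data: nineteen values near 10 and one extreme value.
data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
        10, 11, 9, 10, 12, 8, 10, 11, 9, 100]
outliers = find_outliers(data)
print(outliers)
```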

Parallelism

Parallelism is the transparent decomposition and simultaneous execution of multiple operations.

Partition

A partition is a logical subset of data. In most cases, data warehouses are partitioned by some date field.

Physical data In the Java interface, physical data identifies data to be used as input to data mining. Through the use of attribute assignment, attributes of the physical data are mapped to logical attributes of a model’s logical data. The data referenced by a physical data object can be used in model building, model application (scoring), lift computation, statistical analysis, etc.

Pivoting

Pivoting enables you to change the order in which columns appear in a table, or interchange items between axes. By pivoting data, you change the way a report is presented in your worksheet.

PLU Price-Look Up (PLU) codes are assigned to products to make check-out and inventory control easier.
POS Point of Sale (POS) is a transaction that is generated by a store cash register or workstation. This transaction can be initiated from a merchandise sale or product return.
Predictor A predictor is an attribute used as input to a supervised model or algorithm to build a model.
Prior probability The set of prior probabilities specifies the distribution of examples of the various classes in data. Also referred to as priors, these could be different from the distribution observed in the data.
Promotion A marketing activity planned, developed, and executed to generate sales for products and services. Promotion is the lowest level in the campaign hierarchy.

Query optimization

Query optimization is the process by which a database management system decides exactly how a query will execute.

Relational data source

A relational data source is a database in which information is stored in a number of database tables. Each database table comprises several columns, and one or more rows. The different tables in a database can be related. Having data in separate but related tables is an efficient way to store and retrieve information.

Risk management Risk management is the process of measuring or assessing financial and operational risk, and then developing strategies to control that risk.
ROI (return on investment) Calculation: (Revenue generated – Costs) / Investment. Investment is typically the value of inventory at the acquisition price.
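The ROI calculation above can be written out as a small function. The function name and the sample amounts are hypothetical.

```python
def roi(revenue_generated, costs, investment):
    """(Revenue generated - Costs) / Investment."""
    if investment == 0:
        raise ValueError("investment is zero; ROI is undefined")
    return (revenue_generated - costs) / investment


# Hypothetical figures: 150,000 revenue, 90,000 costs, and inventory valued
# at 200,000 at the acquisition price.
result = roi(150_000.0, 90_000.0, 200_000.0)
print(f"{result:.0%}")
```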
ROLAP ROLAP (Relational Online Analytical Processing) works against two-dimensional relational tables, where queries are posed and run without the assistance of cubes, providing greater flexibility for drilling down, drilling across, and pivoting results. Each row in a table holds data that pertain to some thing or a portion of some thing. Each column of the table contains data regarding an attribute.
Sales channel Channel through which a product sale was made. Examples of sales channels include: Internet, resellers, call centers, sales team, storefront, and so forth.
Score Scoring data means applying a data mining model to new data to generate predictions.
Shop.org Shop.org is the association for retailers online.
Single-record case In a nontransactional format, each case in the data is stored as one record (row) in a table. Single-record case is also known as nontransactional format.
SKU A stock keeping unit (SKU) is the unit identification (typically the UPC) that is used to track store inventory and sales.
Shrinkage Shrinkage is the difference between actual inventory on-hand and expected amounts tallied from purchase and sales orders.
Snowflake schema The snowflake schema is an extended, more normalized star model. A dimension is said to be snowflaked when the low cardinality fields in the dimension have been removed to separate tables and linked back to the original tables with artificial keys.
Sparse data In ODM, data is said to be sparse if only a small fraction (no more than 20%, often 3% or less) of the attributes are non-zero or non-null for any given case. Sparse data occurs, for example, in market basket problems.
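The sparsity criterion above can be sketched as a check on a single case. The 20% threshold comes from the definition; the function names and the market basket sample are hypothetical.

```python
def nonempty_fraction(case):
    """Fraction of attributes in a case that are neither null nor zero."""
    filled = sum(1 for v in case.values() if v is not None and v != 0)
    return filled / len(case)


def is_sparse(case, threshold=0.20):
    """True if no more than `threshold` of the attributes carry a value."""
    return nonempty_fraction(case) <= threshold


# Hypothetical market basket: one purchased item out of ten possible products.
basket = {
    "milk": 1, "bread": 0, "eggs": 0, "tea": 0, "rice": None,
    "soap": 0, "salt": 0, "jam": 0, "oil": 0, "nuts": 0,
}
sparse = is_sparse(basket)
print(nonempty_fraction(basket), sparse)
```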
SPIFF

SPIFF or push money is a cash premium, prize, or additional commission for pushing or increasing sales of a particular item or type of merchandise that generally is considered an otherwise hard item to sell, for example an item that is undesirable. SPIFF is generally paid as an incentive to an employee. (Please note there is no literal translation for the acronym.)

Star schema

A star schema is a central table containing fact data, and multiple tables radiating out from it, connected by the primary and foreign keys of the database. Every star schema design is composed of one table called the fact table, and a set of smaller tables called dimension tables. A star schema has denormalized dimensions.

Stoplight formatting

A stoplight format (or traffic light format) enables you to categorize numeric worksheet values as unacceptable, acceptable, and desirable using different colors. The default stoplight format uses the familiar red, yellow, and green color scheme to represent unacceptable, acceptable, and desirable values. (See also conditional formatting.)

Summary folders Folders that contain aggregated queried data, created by the Discoverer Administrator, that have been saved for reuse. The data is stored in the database as summary tables and materialized views.
TPR A temporary price reduction (TPR) is a reduction in price on a product, typically for a promotional event.
Tender Tender is a form of payment for goods or services that is accepted by the retail store. Forms of tender include check, manufacturer's coupon, credit card, and so forth.
Till

A Till is a logical entity denoting the cashier, shift, workstation (register), or lane.

TMF-SID TeleManagement Forum Shared Information Model
Transference Also known as inheritance, transference is the process by which redundant data is posted from one object to another for performance considerations.

Transformations

Transformations are PL/SQL functions, procedures, and packages that enable you to change data.

UCC (Uniform Code Council) GS1 US is the new name for the Uniform Code Council. GS1 US is a family of subsidiaries and partnerships that focuses on leading the establishment and implementation of global standards to drive improvement of the supply and demand chains.
Up-sell The sale of a superior product, generally in quality and price, in lieu of the advertised item.
UPC The universal product code (UPC) is a standard for electronically encoding a set of lines and spaces that can be scanned and interpreted into numbers for product identification.
Value chain Value chain refers to the full range of activities conducted by a business that increases competitive excellence and shareholder value. The activities facilitating these increases include inbound and outbound logistics (receipt, warehousing, and distribution), actual business operations, marketing and sales, and customer service.
Void transaction A transaction that was cancelled after billing was completed.