Loading documents preview...
Sales Data Warehouse Advisor : Dr. Irwin Levinstein Presented By : Kalyan Yadavalli
Contents
Mission of the project Need For a Data Warehouse Data Warehouse - Overview Sales Data Warehouse Conclusion
Mission of the project
The mission of this project is to provide strategic and tactical support to the Marketing-Sales and Advertising departments of a media company through the acquisition and analysis of data pertaining to their customers and markets.
This project helps to identify areas of readership and marketing through creation of a Data Warehouse that will provide a media company with a better understanding of its customers and markets.
Need For a Data Warehouse
To provide an environment where a relatively small amount of knowledge of the technical aspects of database technology is required to write and maintain queries and reports . To provide a means to speed up the writing and maintaining of queries and reports by technical personnel. For example, a query that requests the total sales income and quantity sold for a range of products in a specific geographical region for a specific time period can typically be answered in a few seconds or less regardless of how many hundreds of millions of rows of data are stored in the data warehouse database.
To make it easier, on a regular basis, to query and report data from multiple transaction processing systems ,external data sources for querying or reporting purposes. To prevent persons who only need to query and report transaction processing system data from having any access whatsoever to transaction processing system databases and logic used to maintain those databases.
Data Warehouse – Overview
A data warehouse is a copy of data combined from different data sources specifically structured for querying and reporting. Data warehouses support business decisions by collecting, consolidating, and organizing data for reporting and analysis with tools such as online analytical processing (OLAP) and data mining.
Dimensional Modeling VS Entity-Relationship Modeling
An OLTP system requires a normalized structure to minimize redundancy, provide validation of input data, and support a high volume of fast transactions. A transaction usually involves a single business event, such as placing an order or posting an invoice payment. An OLTP model often looks like a spider web of hundreds or even thousands of related tables. In contrast, a typical dimensional model uses a star design that is easy to understand and relate to business needs, supports simplified business queries, and provides superior query performance by minimizing table joins.
Dimensional Modeling VS Entity-Relationship Modeling
This project uses Dimensional modeling, which is the name of the logical design technique often used for data warehouses. It is different from entity-relationship modeling.
Entity relationship modeling is a logical design technique that seeks to eliminate data redundancy while Dimensional modeling seeks to present data in a standard framework that is intuitive and allows for high-performance access.
For example, a query that requests the total sales income and quantity sold for a range of products in a specific geographical region for a specific time period can typically be answered in a few seconds or less regardless of how many hundreds of millions of rows of data are stored in the data warehouse database.
Entity–Relationship Modeling Customer
CustomerSubscriptions
Payment
Demographics
Salesperson
SalesConditions
Campaign Offer
Channel
Carrier Master
Campaign History District City
Zones
Carrier History
Dimensional Modeling Dimensions
Fact Table
Dimensions
Subscription Sales Customer
Payment
Campaign
Route
EffectiveDateKey CustomerKey SubscriptionsKey PaymentKey CampaignKey SalesPersonKey RouteKey Demographics Key UnitsSold DollarsSold DiscountCost PremiumCost
Date
Subscriptions
Salesperson
Demographics
Kimball- Dimensional life cycle diagram
Technical Architecture Design
Product Selection & Installation
Dimensional Modeling
Physical Design
Business Project Planning
Requirement
Data Staging Design & Development
Definition
End-User Application Specification
End-User Application Development
Project Management
Deployment
Maintenance and Growth
Sales Data Warehouse
Business Users Requirements Technical Architecture Product [ Software] Selection Dimensional Modeling Logical Design Data Staging Design & Development Building Data Cube using SQL Analysis Services End User Application Specification & Development Deployment Maintenance & Growth
Requirements Gathering This phase involves the following steps: Collect some business questions the users want an answer for. Gather details/requirements from the business users Get user sign off on the business questions. Business Questions: Can we profile our "best subscribers" to pull lists of "like" non-subscribers that we could touch in some way? Who exists in the marketplace and have we touched them? Can we build a loyalty model based on a subscriber's payment history?
Sales Data Warehouse High Level Technical Architecture
Source Systems
Marketing/Sales Data
Extract
Data Staging Area
Presentation Area
Services Transform from source to target. Maintain conformed dimensions. Load
Demographics
Name Phone Data
Extract
Extract
Data Storage Flat files or relational tables Design Goals Staging throughput. Integrity and consistency.
Subscription Sales Dimensional. Atomic and summary data. Business process. Design Goals Ease-of-use. Query performance. Dimensional Bus: Conformed facts and dimensions
Data Access Tools
SQL Reporting Services Excel Access
Access
Product Selection Hardware Specs: AMD Opteron Processor 252 2.6 GHz, 3.83 GB RAM Operating System: Windows Server 2003 Software Specs: Kimball Data Warehouse Tool [Create staging and production databases] Microsoft ® SQL Server™ 2000 [ETL { Extract Transform Load} ] Microsoft ® SQL Server™ 2005 Integration Services [ Nightly Automation] Microsoft ® SQL Server™ 2005 Analysis Services [ Create OLAP Data Cube] Microsoft ® SQL Server™ 2005 Reporting Services[ End User Reports] Internet Information Services [ IIS 6.0] [ Web Server to Host the Reports]
Dimensional Modeling
Design Dimensions
Attributes of the dimension Hierarchy in the dimension
Dimensional Bus Matrix Design Fact Tables
Dimension Hierarchy-Subscriptions
Subscription Name
Rate
Rate Year
Rate Area
Rate Type
Discount Category
Service
Term
Publication
Frequency Groups
Term Length Groups
Business Group
Short or Long Term
Dimensional Bus Matrix Dimensions
Date
Demographics
Sales Salesperson Conditio ns
Campaign ……
Subscription Sales ( starts)
X
X
X
X
X
Subscription Tracking
X
X
X
X
Complaints
X
X
X
Stops
X
X
X
Upgrades And Downgrades
X
X
X
Business Processes
X
X
SubscriptionSales Fact Table Design Fact Tables •Choose the Business Process as the fact table •Declare the grain •Choose the dimensions •Choose the facts
EffectiveDateKey CampaignKey SalespersonKey CustomerKey DempgraphicsKey SubscriptionKey ……………….. Grain: Each subscription sold Facts Units Sold Number of Sales (=1) Dollars Sold Discount Cost Premium Cost
Dimension Model: SubscriptionSales
Date Sales Conditions
Customer
Subscription Sales
LoyaltyPayment
Grain: Each subscription sold
Campaign
Measures
Subscr. Units Sold Sales Number of Sales (=1) Dollars Sold (starts) Discount Cost Premium Cost
Demographics
Subscription
Salesperson
Address Route
Logical Design
Fact Table Design
Dimension Table Design
Slowly Changing Dimensions Type 1: The new record replaces the original record. No trace of the old record exists. Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
Fact Table -SubscriptionSales Column Names
EffectiveDateKe y
Data Type
int
N U L L Ke ? y?
FK TO Dimension
Description
N FK
DimDate
Key of effective date
int
N FK
DimDate
Key of date entered in the system
CustomerKey
int
N FK
DimCustomer
Key of customer
LoyaltyKey
int
N FK
DimLoyalty
Key of loyalty score
PaymentKey
int
N FK
DimPaymentHistory
Key of payment behavior
EnteredDateKey
int
N FK
DimSalesPerson
Key of sales person for the change
CampaignKey
int
N FK
DimCampaign
Key of campaign
SalesConditions Key
int
N FK
DimSalesConditions
Key of sales conditions
SubscriptionKey
int
N FK
DimSubscription
Key of subscription
dimPerson
Key of person on the subscription
SalesPersonKey
PersonKey int
N FK
Dimension Table - CustomerSubscriptions Column Names
Data Type
N U L L ?
Key?
FK TO Dimension
Description
CustomerSubscriptionK ey
int
N
PK ID
Surrogate Primary Key
AddressNum
int
Y
BK
Business key of the subscription summary record address
SubscriptionNum
int
Y
BK
BusinessKey
int
Y
BK
CustomerID
int
N
BillingMethod
int
OriginalStartDateKey
int
FK
StartDateKey
int
StopDateKey
ExpireDateKey
`
Slowly changin g dimensio n type
Business key of the numbered subscription at the address
Concatenated business key
Unique identifier for this customer
1
The method of delivery for the customers bill
2
DimDate
The earliest start date on record for this customer
1
FK
DimDate
The start date of this customer’s current subscription
2
int
FK
DimDate
The most recent stop date for this customer
2
int
FK
DimDate
The expiration date of this customer’s current or most recent subscription
2
Data Staging The following are the sub processes of Data Staging process. Extracting : Reading and understanding the source data, copying the parts that are needed to the staging area. Transforming: Possible transformation steps in the data staging area
Cleaning the data – correct misspellings, deal with missing data elements, parsing into standard formats. Purging selected data which is not required Combining data sources, by matching exactly on key values or performing fuzzy matches on non-key attributes. Creating surrogate keys for each dimensional record. Building aggregates to boost performance of common queries.
Loading – Loading the transformed data into the production database.
Data Staging -ETL Architecture Source Schema Creates the tables for the common source database
Source Queries Creates views (queries) that feed data to production
Common Source Database
Kimball Data Modeling tool Creates the staging and production database tables and metadata.
Staging Database
Setup files on SQL server that is running the DTS packages.
Database Configuration File Database connection information for the ETL process
Dates Configuration File Important date info for the ETL process
DTS Data Transformation Services Package
Production Data Warehouse Database
Data Staging –Development for Sales Data Warehouse The ETL packages perform the following work. Extract the full sets of dimension rows Most transformation logic occurs in the extract query, using SQL Extracted rows are stored in a staging table until the ETL package is run. There are steps for the staged rows to be fixed up, via SQL statements. There’s a statement for deleting bogus rows, and a separate statement for updating rows. Find rows that are new; insert them into the target table. Use a checksum to find rows that have seen a Type 1 change. Update the appropriate columns in the target table. Use a checksum to find rows that have seen a Type 2 change. Propagate a new dimension row. Log the number of rows extracted, staged, deleted and updated from the staging tables, inserted into target, Type 1 and Type 2 rows updated in target.
Dimension-Customer Subscription
Building Data Cube using SQL Analysis Services
SQL Server Analysis Services 2005 provides tools for developing OLAP applications
OLAP [ Online Anlytical Processing ] organizes data warehouse data into multidimensional cubes based on the dimensional model, and then preprocesses these cubes to provide maximum performance for queries that summarize data in various ways.
Build the cube using SQL Analysis Services and deploy it to SQL Analysis Services Server.
End User Application Development Reporting Services - Uses the Analysis Services Data Cube as Data Source. Sample Report Screenshots
Solicitor Sales Sales Type
Sales Channel
Sales Agent
Number Of Sales
Cost Per Unit
Retention
Carrier Sales
2
$0.00
50.0%
124
$0.13
13.7%
2005 THISISHAMPRDS FREE2WKSAMP
8
$0.00
12.5%
AD CONTRACT START
8
$0.17
62.5%
1,914
$0.04
43.5%
7
$0.10
71.4%
COLLECTIONS TEAM
15
$0.14
73.3%
COOLSAVINGS.COM
9,415
$21.55
7.1%
CUSTOMER SERVICE PROMO STARTS
1,934
$0.15
73.7%
DATA ENTRY STARTS
49
$0.14
59.2%
Carrier Sales Carrier Sales
DM Sales DM Sales DM Sales NonSolicited Other
ALLCONNECT CAN'T AFFORD DM "2 WKS FREE"
Sales Agent Details Sales Agent
Source Name
AD CONTRACT START
ULTIMATE TAN OF SMITHFIELD
AD CONTRACT START
Phone
Address
St
City
ZI P
# Sale s
Units Sold
$ Sold
Discou nt Cost
Premiu m Cost
(757) 365-9400
13412 BENNS CHURCH BLVD
VA
SMITHF IELD
234 30
1
260
$30.9 9
$37.91
$5.00
CHOREY & ASSOCIATE
(757) 539-7451
330 W CONSTANCE RD # 100
VA
SUFFOL K
234 34
1
260
$30.9 9
$37.91
$5.00
AD CONTRACT START
CHOREY AND ASSOCIATE
(757) 539-7454
804 W WASHINGTON ST
VA
SUFFOL K
234 34
1
260
$30.9 9
$37.91
$5.00
AD CONTRACT START
VIRGINIA STAGE CO
0
254 GRANBY ST
VA
NORFO LK
235 10
1
260
$30.9 9
$37.91
$5.00
AD CONTRACT START
SPINE & ORTHAPEDIC CTR, PC
0
6160 KEMPSVILLE CIR # 303A
VA
NORFO LK
235 02
1
260
$30.9 9
$37.91
$5.00
AD CONTRACT START
COUNTRYWID E HOME LOAN
0
3000 WOODLAWN DR
VA
SUFFOL K
234 34
1
260
$30.9 9
$37.91
$5.00
AD CONTRACT START
WHITE, E.D.
0
730 10TH ST
VA
VIRGINI A BEACH
234 51
1
260
$30.9 9
$37.91
$5.00
AD CONTRACT START
PERMANENT COATING SOLUTIONS IN
(757) 539-4366
434 N MAIN ST # D
VA
SUFFOL K
234 34
1
260
$30.9 9
$37.91
$5.00
8
2,080
$247. 94
$303.26
$40.00
Deployment Deploy the Reports to SQL Reporting Services 2005 server. Give Access to the users to view the reports. Desktop Installation – Dot Net Framework 2.0 For access to Report Builder.
Maintenance & Growth Training the End Users. Automated Nightly Updates to Data Warehouse.
Conclusion The reports generated from the data warehouse answered the following questions collected form the business users during the requirement gathering phase of the project. Identify their best customers/loyal customers [ customer subscriptions /subscription sales] Non-subscribers who can be reached Contact history of customers in market place [ Demographic data]
Benefits to Marketing Increased telemarketing close rates and increased direct mail response rates Reduced cost and use of outside telemarketing services and reduced print and mailing costs Identification of new product bundling and distribution opportunities Increased acquisition and retention rates, and reduced cost of acquisitions
Benefits to Advertising An increase in the annual rate of revenue growth. Increase in new advertisers Improved targeting capabilities
.
QUESTIONS ?