Resources for BSC(IT) STUDENTS

Tybsc (IT) SEM - 6

Data warehousing

Unit-I

History of data warehousing:
Database management system, personal computers and 4GL technology, spider web environment, evolution from business perspective, data warehouse environment, what is datawarehouse? , integrating data, volumes of data, different development approach, evolution to DW 2.0 environment, business impact of the data warehouse, components of datawarehouse environment, evolution of data warehouse from the business perspective, other notions about data warehouse, federated data warehouse, star
schema, data mart.

Introduction:
Lifecycle of data warehouse, reasons for different sectors, metadata, Access of data, structures data/ unstructured data, Textual analysis, blather, issue of terminology, specific text, metadata – a major component, local metadata, changing business requirements, flow of data within DW 2.0, volumes,
useful applications, DW 2.0 and referential integrity, reporting in DW 2.0

DW components:
Interactive sector, integrated sector, Near Line sector, Archival sector.

Unit-II

Metadata in DW:
Reusability of data analysis, Metadata, Active/ passive repository, enterprise metadata, metadata and the system record, Taxonomy, Internal and external taxonomy, metadata in archival sector, maintaining metadata, using metadata – an example, end user perspective.

Methodology and Approach for DW:
Spiral model methodology, seven streams approach, enterprise reference model, enterprise knowledge coordination stream, information factory development stream, Data correction stream, infrastructure stream, Total information quality management stream.

Statistical processing and DW:
Two types of transaction, statistical analysis, integrity of comparison, heuristic analysis, freezing data, exploration processing, frequency of analysis, exploration facility, sources for exploration processing, refreshing exploration data, project based data, Data marts and exploration facility, A backflow of data, using exploration data internally, perspective of business analyst.

Unit-III

Data models and DW:
Datamodel and business, scope of integration, making the distinction between granular and summarized data, levels of the data model, data models and interactive sector, corporate data model, transformation of models,
data models and unstructured data, perspective of business user.

Monitoring the DW environment:
Monitoring DW environment, transaction monitor, monitoring data quality, datawarehouse monitor, transaction monitor, peak period processing, ETL data quality monitor, Dormant data.

DW and security:
Protecting access to data, encryption, drawbacks, firewall, moving data offline, limiting encryption, direct dump, datawarehouse monitor, sensing an attack, security for near line data.

Unit-IV

Time variant data:
All data in DW, Time relativity in the interactive sector, data relativity elsewhere in DW, Transactions in integrated sector, discrete data, continuous time span data, a sequence of records, nonoverlapping records, beginning and ending a sequence of records, continuity of data, Time-collapsed data, time variance in the archival sector

Flow of data in DW:
flow of data throughout the architecture, entering the interactive sector, role of ETL, data flow into integrated sector, near line, archival sector, falling probability of data access, exception flow of data.

ETL processing and DW:
Changing states of data, Where ETL fits, application data to corporate data, ETL in online mode and batch mode, source and target, ETL mapping, more complex transformation, ETL and throughput, ETL and metadata, ETL and an audit trail, ETL and data quality, creating ETL, code creation or parametrically driven ETL, ETL and rejects, changed data capture, ETL and rejects, Changed data capture, ELT

Unit-V

DW and granularity manager:
granularity manager, raising the level of granularity, filtering data, functions of the granularity manager, homegrown versus third party granularity manager, parallelizing the granularity manager.

DW and performance:
Online response time, analytical response time, flow of data, Queues, heuristic processing, analytical productivity and response time, many facets to performance, indexing, removing dormant data, end user education, monitoring the environment, capacity planning, metadata, batch parallelization, parallelization for transaction processing, workload management, data marts, exploration facilities, separation of transactions into classes, service level agreements, protecting the interactive sector, partitioning data, choosing the proper hardware, separating farmers and explorers, physically group data.

Migration:
Migration in perfect world, adding components incrementally, adding archival sector, creating enterprise metadata, building the metadata infrastructure, swallowing source system, ETL as shock absorber, migration to the unstructured environment.

Unit-VI

Implementation And Maintenance:
Physical design process, data warehouse deployment, growth and maintenance.

Books

DW2.0 The architecture for Next Generation of Datawarehousing W.H. Inmon, Derek Strauss, Genia Neushloss, ELSEVIER. (Unit I to V)

Paulraj Ponnian, “Data Warehousing Fundamentals”, John Wiley. (Unit VI)

Practical List:

1.            Create around 25 records and design the tabular view using Excel. Create around 25 records and design the Chart view using Excel.

2.            Extract the data from excel , access and sql and integrate it in SQL server.

3.            Perform the same process using DTS package

4.            Design the star schema and create a cube using OLAP services

5.            Perform the cube analysis on MOLAP

6.            Perform the cube analysis on ROLAP

7.            Perform the cube analysis on   HOLAP

8. Consider a data warehouse storing data about sales, where the total items sold are stored, organised by customer order and product. Each customer order includes the name of the customer and the date of the order; each product includes a description of the product and its price.

i.   Devise the relational schema (specifying the relations, the attributes, the primary keys, and the foreign keys) of the above data warehouse using the star schema.

ii.   Write a SQL query to answer the following question: ”Which customer(s) made an order containing at least five products with different descriptions?”

iii.   Write a SQL query for the following report: ”W hich customer(s) made the largest order (i.e., those that would result in the largest bill)?”

iv.   Consider to add a new level product categories to the product dimension. Devise the new relational star schema, and write a SQL query for the following report: ”Select the total number of products sold per product category”.

9.            Design at least five queries for the created cube using MDX application.

10. Retrieve the cube data into the excel sheet and present the information in tabular and graphical form.

Resources for BSC(IT) STUDENTS - BY ANKUR MISHRA

Friday, 3 May 2013