Thursday 11 October 2012

Data warehousing FAQs


Data warehousing FAQs

WHAT ARE CHARACTERISTICS OF DATA IN DATA WAREHOUSE?

Ans: 1) Subject Oriented: Data warehouses are designed to help you analyze data.
 For example, to learn more about your company’s sales data, you can build a
Warehouse that concentrates on sales. Using this warehouse, you can answer
Questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case makes the data warehouse subject oriented.
     2) Integrated: Integration is closely related to subject orientation.
 Data warehouses must put data from disparate sources into a consistent format.
They must resolve such problems as naming conflicts and inconsistencies
Among units of measure. When they achieve this, they are said to be integrated.
     3) Nonvolatile: Nonvolatile means that, once entered into the warehouse, data
Should not change.This is logical because the purpose of a warehouse is to

     4) TIME variant: time variant means it will maintain history

WHAT IS DATA MODELING? WHAT ARE DIFFERENT STEPS FOR DATA MODELING?


Ans: 1) data modeling represent information in the entities, tributes and relationships.
     2) Visual representation of the information

Different steps for data modeling
     1) Define the problem and scope of the problem.
     2) Information gathering.
     3) Analysis (normalization)
     4) Create a logical data model (independent of platform).         
     5) Decision about physical platform like oracle or SQL etc.
     6) Create a physical data model which is platform specific.
     7) Database creation.

WHAT ARE DIFFERENT TYPES OF RELATIONSHIPS?

Ans: one to one, one to many, many to one, many to many.


What are non-additive facts? What are non-additive facts in detail?

A. Fact tables in which data in columns data cannot be aggregated or calculations cannot be possible for producing the expected results or known as non-additive facts.
A. A fact may be measure, metric or a dollar value. Measure and metric
Are non-additive facts.
Dollar value is additive fact. If we want to find out the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.
A non-additive fact, for e.g. measure height(s) for 'citizens by
geographical location' , when we rollup 'city
a ta to 'state' level data
we should not add heights of the citizens rather we may want to use it
to derive 'count'
Q. What is a fact less fact table? Where you have used them?
A. Fact less table means which doesn't have measures.
Used only to put relation between the elements of various dimensions.


Q. What is the difference between ODS and OLTP
A. ODS: - It is nothing but a collection of tables created in the Data
warehouse that maintains only current data whereas OLTP maintains
the data only for transactions, these are designed for recording daily
operations and transactions of a business.
Q. What are aggregate table and aggregate fact table?
A. Aggregate table contains summarized data. The materialized views
Are aggregated tables.
For ex in sales we have only date transaction .If we want to create a
report like sales by product per year. In such cases we aggregate the
date vales into week_agg, month_agg, quarter_agg, year_agg. To
retrieve date from these tables we use @aggregate function
.
Q: What is the difference between View and Materialized View?
A: Data of a Materialized View is saved in a physical table, so data
access is fast due to direct access to the table. View will perform join
on tables based on the query every time it is referred
.







Q. Explain in detail about type 1, type 2(SCD), type 3?
A. Type 1: overwrite data.
Type 2: current, recent and history data should be there.
Type 3: current and recent data should be there
Q: What is the difference between OLAP, ROLAP, MOLAP and HOLAP?

A: On Line Analytical Processing (OLAP), Relational OLAP (use RDBMS),
Multi-dimensional OLAP (cube), Hybrid OLAP (ROLAP+MOLAP).
ROLAP stands for Relational OLAP. Users see their data organized
In cubes with dimensions, but the data is really stored in a
Relational Database (RDBMS).
MOLAP stands for Multidimensional OLAP. Users see their data
Organized in cubes with dimensions, but the data is store in a
Multi-dimensional database (MDBMS) like Oracle Express Server.
In a MOLAP system lot of queries have a finite answer and
performance is usually critical and fast.
HOLAP stands for Hybrid OLAP; it is a combination of both worlds.
Seagate Software'sHolos is an example HOLAP environment. In a
HOLAP system one will find queries on aggregated data as well
as on detailed data.
Q. What is a Conformed Dimension? What is a Conformed Fact?
When the enterprise decides to create a set of common labels across
all the sources of data, the separate data mart teams (or, single
centralized team) must sit down to create master dimensions that
everyone will use for every data source. These master dimensions are
called Conformed Dimensions.
Two dimensions are conformed if the fields that you use as row
headers have the same domain.
If the definitions of measurements (facts) are highly consistent, we call Them as Conformed Facts.


Q. What are the 3 important fundamental themes in a data warehouse?

The 3 most important fundamental themes are:
1. Drilling Down
2. Drilling Across and
3. Handling Time


Q. What is the necessity of having surrogate keys?

Production may reuse keys that it has purged but that you are
Still maintaining
Production might legitimately overwrite some part of a
Product description or a customer description with new values but not change the product key or the customer key to a new value. We might be wondering what to do about the revised attribute values (slowly changing dimension crisis)
Production may generalize its key format to handle some new
Situation in the transaction system. E.g. changing the
production keys from integers to alphanumeric or may have
12-byte keys you are used to have become 20-byte keys
Acquisition of companies

Q. What is a Star Schema?
A star schema is a set of tables comprised of a single, central fact table
surrounded by de-normalized dimensions. Each dimension is
represented in a single table. Star schema implement dimensional data
structures with de- normalized dimensions. Snowflake schema is an
alternative to star schema. A relational database schema for
representing multidimensional data. The data is stored in a central fact
table, with one or more tables holding information on each dimension.
Dimensions have levels, and all levels are usually shown as columns in
each dimension table.


Q. What is a Snowflake Schema?
A snowflake schema is a set of tables comprised of a single, central
fact table surrounded by normalized dimension hierarchies. Each
dimension level is represented in a table. Snowflake schema
implements dimensional data structures with fully normalized
dimensions. Star schema is an alternative to snowflake schema.
An example would be to break down the Time dimension and create tables for each level; years, quarters, months; weeks, days… These additional branches on the ERD create more of a Snowflake shape then STAR



Q. What is the difference between OLTP and OLAP?

OLAP - Online Analytical processing, mainly required for DSS, data is in de-normalized manner and mainly used for nonvolatile data, highly
indexed, improve query response time
OLTP - Transactional Processing - DML, highly normalized to reduce
Deadlock & increase concurrency

Q. What is BI? And why do we need BI?

Business Intelligence, it is an ongoing process of various integration
Packages to analyze data.

Q. What are the important fields in a recommended Time dimension table?
Time_key
Day_of_week
Day_number_in_month
Day_number_overall
Month
Month_number_overall
Quarter
Fiscal_period
Season
Holiday_flag
Weekday_flag
Last_day_in_month_flag
Q. Which approach is better and why? Loading data from
Data marts to data warehouse or vice versa?

A. Best way is to move data from ODS --> Data warehouse to Data
Marts.
Q: What is ETL?
A: ETL is the Data Warehouse acquisition processes of Extracting (E),
Transforming or Transporting (T) and Loading (L) data from source
systems into the data warehouse
.










No comments:

Post a Comment