Data warehousing FAQs
WHAT ARE CHARACTERISTICS OF DATA IN DATA
WAREHOUSE?
Ans: 1) Subject Oriented: Data warehouses are designed to help you
analyze data.
For example, to learn more about your
company’s sales data, you can build a
Warehouse that
concentrates on sales. Using this warehouse, you can answer
Questions like
"Who was our best customer for this item last year?"
This ability to
define a data warehouse by subject matter, sales in this case makes the data
warehouse subject oriented.
2) Integrated:
Integration is closely related to subject orientation.
Data warehouses must put data from disparate
sources into a consistent format.
They must resolve
such problems as naming conflicts and inconsistencies
Among units of
measure. When they achieve this, they are said to be integrated.
3) Nonvolatile:
Nonvolatile means that, once entered into the warehouse, data
Should not
change.This is logical because the purpose of a warehouse is to
4) TIME variant:
time variant means it will maintain history
WHAT IS DATA MODELING? WHAT ARE DIFFERENT STEPS FOR DATA MODELING?
Ans: 1) data
modeling represent information in the entities, tributes and relationships.
2) Visual representation of the
information
Different steps
for data modeling
1) Define the problem and scope of the
problem.
2) Information gathering.
3) Analysis (normalization)
4) Create a logical data model
(independent of platform).
5) Decision about physical platform like
oracle or SQL etc.
6) Create a physical data model which is
platform specific.
7) Database creation.
WHAT ARE DIFFERENT TYPES OF RELATIONSHIPS?
Ans: one to one,
one to many, many to one, many to many.
What are non-additive facts? What are non-additive facts
in detail?
A. Fact tables in
which data in columns data cannot be aggregated or calculations cannot be
possible for producing the expected results or known as non-additive facts.
A. A fact may be measure,
metric or a dollar value. Measure and metric
Are non-additive facts.
Dollar
value is additive fact. If we want to find out the amount for a particular
place for a particular period of time, we can add the dollar amounts and come
up with the total amount.
A non-additive fact, for e.g. measure height(s) for 'citizens by
geographical location' , when we rollup 'city ‘a ta to 'state' level data
we should not add heights of the citizens rather we may want to use it
to derive 'count'
geographical location' , when we rollup 'city ‘a ta to 'state' level data
we should not add heights of the citizens rather we may want to use it
to derive 'count'
Q. What is a fact less fact table? Where you
have used them?
A. Fact less table means
which doesn't have measures.
Used only to put relation
between the elements of various dimensions.
Q. What is the difference between ODS and
OLTP
A. ODS: - It is nothing but a
collection of tables created in the Data
warehouse that maintains only current data whereas OLTP maintains
the data only for transactions, these are designed for recording daily
operations and transactions of a business.
warehouse that maintains only current data whereas OLTP maintains
the data only for transactions, these are designed for recording daily
operations and transactions of a business.
Q. What are aggregate table and aggregate
fact table?
A. Aggregate table contains
summarized data. The materialized views
Are aggregated tables.
For ex in sales we have only date transaction
.If we want to create a
report like sales by product per year. In such cases we aggregate the
date vales into week_agg, month_agg, quarter_agg, year_agg. To
retrieve date from these tables we use @aggregate function.
report like sales by product per year. In such cases we aggregate the
date vales into week_agg, month_agg, quarter_agg, year_agg. To
retrieve date from these tables we use @aggregate function.
Q: What is the difference between View and
Materialized View?
A: Data of a Materialized View is saved in a physical table, so data
access is fast due to direct access to the table. View will perform join
on tables based on the query every time it is referred.
access is fast due to direct access to the table. View will perform join
on tables based on the query every time it is referred.
Q. Explain in detail about type 1, type
2(SCD), type 3?
A. Type 1: overwrite data.
Type 2: current, recent and history data should be there.
Type 3: current and recent data should be there
Q: What is the difference between OLAP,
ROLAP, MOLAP and HOLAP?
A: On Line Analytical
Processing (OLAP), Relational OLAP (use RDBMS),
Multi-dimensional OLAP
(cube), Hybrid OLAP (ROLAP+MOLAP).
ROLAP stands for Relational OLAP. Users see their
data organized
In cubes with dimensions,
but the data is really stored in a
Relational Database
(RDBMS).
MOLAP stands for Multidimensional OLAP. Users see
their data
Organized in cubes with
dimensions, but the data is store in a
Multi-dimensional database (MDBMS) like Oracle Express Server.
In a MOLAP system lot of queries have a finite answer and
performance is usually critical and fast.
Multi-dimensional database (MDBMS) like Oracle Express Server.
In a MOLAP system lot of queries have a finite answer and
performance is usually critical and fast.
HOLAP stands for Hybrid OLAP; it is a combination
of both worlds.
Seagate Software'sHolos is an example HOLAP environment. In a
HOLAP system one will find queries on aggregated data as well
as on detailed data.
HOLAP system one will find queries on aggregated data as well
as on detailed data.
Q. What is a
Conformed Dimension? What is a
Conformed Fact?
When the enterprise decides to create a set of common labels across
all the sources of data, the separate data mart teams (or, single
centralized team) must sit down to create master dimensions that
everyone will use for every data source. These master dimensions are
called Conformed Dimensions.
Two dimensions are conformed if the fields that you use as row
headers have the same domain.
all the sources of data, the separate data mart teams (or, single
centralized team) must sit down to create master dimensions that
everyone will use for every data source. These master dimensions are
called Conformed Dimensions.
Two dimensions are conformed if the fields that you use as row
headers have the same domain.
If the definitions of
measurements (facts) are highly consistent, we call Them as Conformed Facts.
Q. What are the 3 important fundamental
themes in a data warehouse?
The 3 most important
fundamental themes are:
1. Drilling Down
2. Drilling Across and
3. Handling Time
Q. What is the necessity of having surrogate
keys?
Production
may reuse keys that it has purged but that you are
Still maintaining
Production
might legitimately overwrite some part of a
Product
description or a customer description with new values but not change the
product key or the customer key to a new value. We might be wondering what to
do about the revised attribute values (slowly changing dimension crisis)
Production
may generalize its key format to handle some new
Situation in the
transaction system. E.g. changing the
production keys from integers to alphanumeric or may have
12-byte keys you are used to have become 20-byte keys
production keys from integers to alphanumeric or may have
12-byte keys you are used to have become 20-byte keys
Acquisition of companies
Q. What is a Star Schema?
A star schema is a set of
tables comprised of a single, central fact table
surrounded by de-normalized dimensions. Each dimension is
represented in a single table. Star schema implement dimensional data
structures with de- normalized dimensions. Snowflake schema is an
alternative to star schema. A relational database schema for
representing multidimensional data. The data is stored in a central fact
table, with one or more tables holding information on each dimension.
Dimensions have levels, and all levels are usually shown as columns in
each dimension table.
surrounded by de-normalized dimensions. Each dimension is
represented in a single table. Star schema implement dimensional data
structures with de- normalized dimensions. Snowflake schema is an
alternative to star schema. A relational database schema for
representing multidimensional data. The data is stored in a central fact
table, with one or more tables holding information on each dimension.
Dimensions have levels, and all levels are usually shown as columns in
each dimension table.
Q. What is a Snowflake Schema?
A snowflake schema is a set
of tables comprised of a single, central
fact table surrounded by normalized dimension hierarchies. Each
dimension level is represented in a table. Snowflake schema
implements dimensional data structures with fully normalized
dimensions. Star schema is an alternative to snowflake schema.
fact table surrounded by normalized dimension hierarchies. Each
dimension level is represented in a table. Snowflake schema
implements dimensional data structures with fully normalized
dimensions. Star schema is an alternative to snowflake schema.
An example would be to break down the Time
dimension and create tables for each level; years, quarters, months; weeks,
days… These additional branches on the ERD create more of a Snowflake shape
then STAR
Q. What is the difference between OLTP and
OLAP?
OLAP - Online Analytical
processing, mainly required for DSS, data is in de-normalized manner and mainly
used for nonvolatile data, highly
indexed, improve query response time
indexed, improve query response time
OLTP - Transactional Processing
- DML, highly normalized to reduce
Deadlock & increase concurrency
Q. What is BI? And why do we need BI?
Business Intelligence, it
is an ongoing process of various integration
Packages to analyze data.
Q. What are the important fields in a
recommended Time dimension table?
Time_key
Day_of_week
Day_number_in_month
Day_number_overall
Month
Month_number_overall
Quarter
Fiscal_period
Season
Holiday_flag
Weekday_flag
Last_day_in_month_flag
Day_of_week
Day_number_in_month
Day_number_overall
Month
Month_number_overall
Quarter
Fiscal_period
Season
Holiday_flag
Weekday_flag
Last_day_in_month_flag
Q. Which approach is better and why? Loading
data from
Data marts to data warehouse or vice versa?
A. Best way is to move data
from ODS --> Data warehouse to Data
Marts.
A: ETL is the Data Warehouse acquisition processes of Extracting (E),
Transforming or Transporting (T) and Loading (L) data from source
systems into the data warehouse.
Transforming or Transporting (T) and Loading (L) data from source
systems into the data warehouse.
No comments:
Post a Comment