Abstract
One of the greatest risks to success any company
implementing business intelligence
system can make is rushing a data warehouse into service
without testing it effectively.
Even wise IT managers, who follow the Old Russian proverb,
"trust, but verify," need to,
maintain their vigilance. There are pitfalls in the testing
process, too.
Many organizations create test plans and assume that testing
is over when every single test
condition passes their expected results. In reality, this is
a very difficult bar to meet. Often,
some requirements turn out to be unattainable when tested
against production information.
Some business rules turn out to be false or incredibly more
complex than originally thought.
Data warehousing applications keep on changing with changing
requirements. This white paper shares
some of the best practices of the experiences of data
warehouse testing.
“If you torture data sufficiently, it will confess to almost
anything.”
1 About Data Warehousing Testing
We know how critical the data is in a data warehouse when it
integrates data from
different sources. For example, in the healthcare industry,
it helps users to answer
business questions about physicians, plan the performance,
market share and geographic
variations in clinical practice, health outcomes etc. Thus
if the data is so sensitive, critical
and vast, we can understand how much challenging it would
be. Thus this is a menial
effort to write about some of the best practices we learned
while doing it on ground to
share it with others.
How much confident a company can be to implement its data
warehouse in the market
without actually testing it thoroughly. The organizations
gain the real confidence once the
data warehouse is verified and validated by the independent
group of experts known as
“Data warehouse testers”.
Many organizations don’t follow the right kind of testing
methodology and are never sure
about how much testing is enough to implement their data
warehouse in the market. In
reality, this is a very difficult bar to meet. Often, some
requirements are difficult to test.
Some business rules turn out to be erroneous, wrongly
understood or highly complex than
originally thought.
Data warehousing applications keep on changing with changing
requirements. This white
paper explains the need to have a data warehouse application
testing in place and it
mentions the various steps of the testing process covering
the best practices.
2 Need for Data Warehouse testing: Best Practices
As we all know that a data warehouse is the main repository
of any organization's
historical data. It contains the material and information
for management's decision
support system.
Most of the organization runs their businesses on the basis
of collection of data for
strategic decision- making. To take a competitive edge the
organization should have the
ability to review historical trends and monitor real-time
functional data.
Most importantly, we never appreciate if the bug is detected
at the later stage of testing
cycles because it could easily lead to very high financial
losses to the project. So data
warehouse testing following the best practice is unavoidable
to remain at the top of the
business.
3 Data warehousing testing phases
While implementing the best practices at our testing we
follow the various phases in our
data warehouse testing. They are:
1) Business understanding
a. High Level Test Approach
b. Test Estimation
c. Review Business Specification
d. Attend Business Specification and Technical Specification
walkthroughs
2) Test plan creation, review and walkthrough
3) Test case creation, review and walkthrough
4) Test Bed & Environment setup
5) Receiving test data file from the developers
6) Test predictions creation, review (Setting up the
expected results)
7) Test case execution and (regression testing if required).
a. Comparing the predictions with the actual results by
testing the
business rules in the test environment.
b. Displaying the compare result in the separate worksheet.
8) Deployment
a. Validating the business rule in the production
environment.
When we test, we take sample data from the designed
architecture and the test data files
are usually provided to the testers by the developers. Of
course, the test data should be
able to cover all the possible scenarios w.r.t the
requirements while we do the predictions
to define the expected test case results, however if the
data provided to us is not
supportive enough to cover all the business rules then we go
for the data mocking that is
explained in the later section of the paper.
Before we move further let me share a very interesting
example which we came across
recently when performing our testing.
On May, 2008 our Data warehouse implemented the logic to use
a field, Calendar
Year/Plan Year (field name) which was passed by the Source,
for which the source
considered the Policy year as 'Y' and Calendar year as 'N',
however the ETL used
different notation to populate Calendar Year and Policy Year
details on the target table,
as the logic change was not properly informed. After the
project was implemented it was
discovered that since there was a miscommunication between
source system and
Business Analysts, it did not recognize the change and
millions of rows which were
populated with incorrect data, i.e.
The expected data was Policy year = Y and
Calendar year = N
The actual data was Policy year = N and
Calendar year = Y
Because of wrong notation, data is behaving oddly and in
reporting, it leads to huge loss
to the project. Thus effective communication as a best
practice could have helped prevent
this situation.
4 What is ETL?
ETL stands for extract, transform, and load. It can
consolidate the scattered data for any
organization while working with different departments. It
can very well handle the data
coming from different departments.
For example, a health insurance organization might have
information on a customer in
several departments and each department might have that
customer's information listed in
a different way. The membership department might list the
customer by name, whereas
the claims department might list the customer by number. ETL
can bundle all this data
and consolidate it into a uniform presentation, such as for
storing in a database or data
warehouse.
ETL can transform not only data from different departments
but also data from different
sources altogether. For example, any organization is running
its business on different
environments like SAP and Oracle Apps for their businesses.
If the higher management
wants to take discussion on their business, they want to
make the data integrated and used
it for their reporting purposes. ETL can take these two
source system data and make it
integrated in to single format and load it into the tables.
5 How is data warehouse testing different from normal
testing?
Generally the normal testing steps are:
• Requirements Analysis
• Testing Methodologies
• Test Plans and approach
• Test Cases
• Test Execution
• Verification and Validation
• Reviews and Walkthroughs
The main difference in testing a data warehouse (DW) is that
we basically involve the
SQL queries in our test case documents. It is vital to test
both the initial loads of the Data
Warehouse from the source i.e. when it gets extracted and
then updating it on the target
table i.e. the loading step. In specific cases, where
trouble shooting is required, we verify
intermediate steps as well.
A defect or bug detection can be appreciated if and only if
it is detected early and is fixed
at the right time without leading to a high cost. So to
achieve it, it is very important to set
some basic testing rules. They are:
• No Data losses
• Correct transformation rules
• Data validation
• Regression Testing
• Oneshot/ retrospective testing
• Prospective testing
• View testing
• Sampling
• Post implementation
We are now going to talk with reference to the
practices/strategies we implement in our
current project on each of them.
6 No Data losses
• We verify that all expected data gets loaded into the data
warehouse. This
includes validating that all records, all fields and the
full contents of each field are
loaded without any truncation occurs at any step in the
process.
• As and when required negative scenarios are also
validated. Some of the
examples are validating special characters etc
7 Correct transformation Rules
• We ensure that all data is transformed correctly according
to business rules, it
could be straight move, simple transformation or the complex
transformation.
• The best method could be to pick some sample records, use
the “EXACT”
formula in the excel worksheet and compare the results to
validate data
transformations manually. This should be ideally done once
the testers are
absolutely clear about the transformation rules in the
business specification.
8 Data Validation
• We say we have achieved the quality when we successfully
fulfill customer’s
requirements. In other words we basically achieve a value
for our customer. Since
in data warehouse testing; the test execution revolves
around the data, so it is
important to achieve the degree of excellence for the data
and for that we do the
data validation for both the data extracted from the source
and then getting loaded
at the table.
Heading level Example Font size and style
Title (centered) Title of paper 14 point, bold; U1
1st-level heading 1 Introduction 12 point, bold;
heading1
2nd-level heading 1.1 Subsection 12 point, bold;
heading2
3rd-level heading 1.1.1 Headings. 12 point, bold, italics
Text Text goes here … 12 point, justified,
single line spacing
Table 1 Give the table caption below the table. (Source:
give the reference of the source, if the table is
adopted from elsewhere)
4 Citations
The surnames of authors and year of publication should be
given to the corresponding
text. The references should be left aligned in 6-point
spacing (after) and 0-point spacing
(before) should be there for each reference.
No comments:
Post a Comment