
Tuesday, March 1, 2011

Data Warehouse Testing


Data Warehouse Testing is Different
Almost all the work of populating a data warehouse is done through batch runs, so testing it is different from testing transaction systems.
Data warehouse testing differs from typical transaction-system testing on the following counts:
User-Triggered vs. System-Triggered
Most production/source-system testing covers the processing of individual transactions, which are driven by user input (an application form, a servicing request). Very few test cycles cover system-triggered scenarios (such as billing or valuation).
In a data warehouse, most of the testing is system-triggered: the ETL ('Extraction, Transformation and Loading') scripts, the view-refresh scripts, and so on.
Data warehouse testing is therefore typically divided into two parts: 'back-end' testing, where the source-system data is compared with the end-result data in the loaded area, and 'front-end' testing, where users check the data by comparing their MIS with the data displayed by end-user tools such as OLAP.
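The 'back-end' part can be sketched as a small reconciliation script: compare row counts, a summed measure, and key coverage between the source extract and the loaded table. The field names (`txn_id`, `amount`) are illustrative assumptions, not from any particular system.

```python
# Minimal sketch of back-end reconciliation between a source extract
# and the loaded warehouse table. Key/measure names are hypothetical.

def reconcile(source_rows, loaded_rows, key="txn_id", measure="amount"):
    """Compare row counts, a summed measure, and key coverage."""
    src_total = sum(r[measure] for r in source_rows)
    tgt_total = sum(r[measure] for r in loaded_rows)
    return {
        "count_match": len(source_rows) == len(loaded_rows),
        "sum_match": abs(src_total - tgt_total) < 0.01,
        "missing_keys": {r[key] for r in source_rows} - {r[key] for r in loaded_rows},
    }

source = [{"txn_id": 1, "amount": 100.0}, {"txn_id": 2, "amount": 250.0}]
loaded = [{"txn_id": 1, "amount": 100.0}]          # one row went missing in loading
result = reconcile(source, loaded)
print(result["count_match"], result["missing_keys"])  # False {2}
```

A real script would run these checks per table and log the discrepancies to the test report rather than printing them.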
Batch vs. online gratification
This is something which makes it a challenge to retain users' interest.
A transaction system gives users instant, or at least overnight, gratification: they enter a transaction, and it is processed online or at most in an overnight batch. In a data warehouse, most of the action happens at the back end, and users have to trace individual transactions through to the MIS and the views produced by the OLAP tools. It is the same challenge you face when asking users to test the mammoth month-end reports and financial statements churned out by transaction systems.
Volume of Test Data
The test data in a transaction system is a very small sample of the overall production data. To keep matters simple, we typically include just enough test cases to cover all possible test scenarios within a limited set of test data.
A data warehouse typically needs a large volume of test data, because one tries to fill in as many combinations and permutations of dimensions and facts as possible.
For example, if you are testing the location dimension, you would want the location-wise sales revenue report to show revenue figures for most of the 100 cities and the 44 states. That means thousands of sales transactions at the sales-office level (assuming the sales office is the lowest level of granularity for the location dimension).
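One common way to build such data is to generate transactions over the cross product of dimension values, so every combination receives at least one figure. The city and product lists below are tiny stand-ins for the real dimension values:

```python
# Hypothetical sketch: generate sales rows covering every
# city x product combination of the (shortened) dimensions.
import itertools
import random

cities = ["Mumbai", "Delhi", "Chennai"]   # stand-in for the 100 cities
products = ["P1", "P2"]
random.seed(42)                           # reproducible test data

test_rows = [
    {"city": c, "product": p, "revenue": round(random.uniform(10, 500), 2)}
    for c, p in itertools.product(cities, products)
]
covered = {r["city"] for r in test_rows}
print(len(test_rows), covered == set(cities))  # 6 True
```

For real dimensions with hundreds of members, the same cross-product approach is what drives the test data volume into the thousands of rows.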
Possible scenarios/ Test Cases
If a transaction system has, say, a hundred different scenarios, the valid and possible combinations of those scenarios are still finite. In a data warehouse, however, the permutations and combinations one could test are virtually unlimited, because the core objective of a data warehouse is to allow all possible views of the data. In other words, you can never fully test a data warehouse.
Therefore one has to be creative in designing the test scenarios to gain a high level of confidence.
Test Data Preparation
This follows from the points on possible test scenarios and volume of data: since a data warehouse needs a lot of both, the effort required to prepare test data is much greater.
Programming for testing challenge
In transaction systems, users and business analysts typically test the output of the system. In a data warehouse, since most of the action happens at the back end, most of the data-quality and ETL testing is done by running separate stand-alone scripts. These scripts compare, say, pre-transformation aggregates with post-transformation aggregates and throw out the pilferages (records lost or altered along the way). Users come into play when their help is needed to analyse the discrepancies (if the designers or business analysts cannot figure them out).
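A stand-alone script of this kind might look like the following sketch: aggregate both data sets by a key and report every key whose totals disagree. The `branch`/`amt` fields are assumed for illustration.

```python
# Sketch of a pre- vs post-transformation aggregate comparison that
# "throws out the pilferages". Field names are hypothetical.
from collections import defaultdict

def aggregate(rows, key, measure):
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r[measure]
    return dict(totals)

pre  = [{"branch": "N", "amt": 100}, {"branch": "N", "amt": 50},
        {"branch": "S", "amt": 75}]
post = [{"branch": "N", "amt": 150}, {"branch": "S", "amt": 70}]  # 5 lost in S

pre_tot = aggregate(pre, "branch", "amt")
post_tot = aggregate(post, "branch", "amt")
pilferage = {k: (pre_tot[k], post_tot.get(k, 0))
             for k in pre_tot if pre_tot[k] != post_tot.get(k, 0)}
print(pilferage)  # {'S': (75.0, 70.0)}
```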

Data Warehouse Testing Categories

The categories of data warehouse testing cover the different stages of the process; testing is done both on an individual-stage and an end-to-end basis.
A good part of data warehouse testing can be linked to 'Data Warehouse Quality Assurance'. Data warehouse testing includes the following areas:
Extraction Testing
This testing checks the following:
  • The extraction is picking up the required fields.
  • The extraction logic for each source system is working.
  • Extraction scripts are granted security access to the source systems.
  • The extract audit log is being updated and time-stamped.
  • The source-to-extraction-destination flow is complete and accurate.
  • Extraction is completing within the expected window.
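Two of the points above — the audit log is stamped, and the run fits its window — can be checked mechanically. The log structure and the two-hour window below are assumptions for illustration:

```python
# Illustrative check of an extract audit-log entry: was it stamped,
# and did the extraction finish within its allotted window?
from datetime import datetime, timedelta

audit_log = {
    "source": "billing_db",                    # hypothetical source system
    "started": datetime(2011, 3, 1, 1, 0),
    "finished": datetime(2011, 3, 1, 3, 30),
    "rows_extracted": 120000,
}
window = timedelta(hours=2)                    # assumed batch window

within_window = audit_log["finished"] - audit_log["started"] <= window
stamped = audit_log["finished"] is not None and audit_log["rows_extracted"] > 0
print(within_window, stamped)  # False True  (2.5h run overran the 2h window)
```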
Transformation Testing
  • Transformation scripts are transforming the data as per the expected logic.
  • The one-time transformations for historical snapshots are working.
  • Detailed and aggregated data sets are created and are matching.
  • The transformation audit log is being updated and time-stamped.
  • There is no pilferage of data during the transformation process.
  • Transformation is completing within the given window.
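The "detailed and aggregated data sets are matching" point above amounts to recomputing the aggregate from the detail rows and diffing it against the stored aggregate. The column names below are assumed:

```python
# Sketch: recompute a monthly aggregate from detail rows and compare
# with the stored aggregate table. Schema is hypothetical.
detail = [
    {"month": "2011-01", "sales": 100.0},
    {"month": "2011-01", "sales": 200.0},
    {"month": "2011-02", "sales": 50.0},
]
stored_aggregate = {"2011-01": 300.0, "2011-02": 50.0}

recomputed = {}
for row in detail:
    recomputed[row["month"]] = recomputed.get(row["month"], 0.0) + row["sales"]

mismatches = {m for m in stored_aggregate
              if abs(stored_aggregate[m] - recomputed.get(m, 0.0)) > 1e-6}
print(mismatches)  # set()  -> detail and aggregate agree
```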
Loading Testing
  • There is no pilferage during the loading process.
  • Any transformations during the loading process are working.
  • Data sets move from staging to the loading destination completely and accurately.
  • One-time historical snapshots load correctly.
  • Both incremental and total refresh are working.
  • Loading is completing within the expected window.
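One way to test the incremental-versus-total-refresh point is to apply the day's delta incrementally and confirm the result equals what a full refresh from source would produce. The keys and values below are stand-ins:

```python
# Toy check: incremental load of a delta should leave the target
# identical to a total refresh from the full source. Data is assumed.
full_source = {1: "Alice", 2: "Bob", 3: "Carol"}
target_before = {1: "Alice", 2: "Bob"}   # state after yesterday's load
delta = {3: "Carol"}                     # today's new/changed rows

incremental_result = {**target_before, **delta}
total_refresh_result = dict(full_source)

print(incremental_result == total_refresh_result)  # True
```

Note that this simple merge does not model deletions in the source; a real incremental-refresh test would also cover deleted and updated rows.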
End User Browsing and OLAP Testing
  • The business views and dashboards are displaying the data as expected.
  • The scheduled reports are accurate and complete.
  • The scheduled reports and other batch operations (such as view refreshes) are happening within the expected window.
  • 'Analysis Functions' and 'Data Analysis' are working.
  • There is no pilferage of data between the source systems and the views.
Ad-hoc Query Testing
  • Ad-hoc queries can be created as per the expected functionality.
  • Ad-hoc query response times are as expected.
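The response-time point reduces to timing the query and comparing against an agreed service level. The query here is simulated by a local computation, and the two-second SLA is an assumption:

```python
# Toy response-time check for an ad-hoc query. The "query" is a
# stand-in computation; the SLA value is an assumed threshold.
import time

SLA_SECONDS = 2.0

start = time.perf_counter()
result = sum(x * x for x in range(100_000))  # stand-in for an ad-hoc query
elapsed = time.perf_counter() - start

print(elapsed < SLA_SECONDS)
```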
Downstream Flow Testing
  • Data is extracted from the data warehouse and updated in the downstream systems/data marts.
  • There is no pilferage.
One-Time Population Testing
  • The one-time ETL for the production data is working.
  • The production reports and the data warehouse reports are matching.
  • The time taken for the one-time processing is manageable within the conversion weekend.
End-to-End Integrated Testing
  • End-to-end data flow from the source systems to the downstream systems is complete and accurate.
Stress and Volume Testing
This part of testing involves applying maximum volumes or deliberate failure points to check the robustness and capacity of the system. The level of stress testing depends on the configuration of the test environment and the level of capacity planning done. Here are some examples from the ideal world:
  • Server shutdown during batch process.
  • Extraction, Transformation and Loading with two to three times the maximum imagined data volume (for which the capacity is planned).
  • Having 2 to 3 times more users placing large numbers of ad-hoc queries.
  • Running large number of scheduled reports.
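The concurrent-user case can be prototyped as a load generator that fires many simulated queries in parallel and confirms they all complete. The query body here is just a sleep; a real test would hit the warehouse itself:

```python
# Toy load generator for the "2 to 3 times more users" case: run 100
# simulated ad-hoc queries across 20 worker threads.
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(i):
    time.sleep(0.01)           # stand-in for real query latency
    return f"query-{i}-ok"

with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(run_query, range(100)))

print(len(results), all(r.endswith("ok") for r in results))  # 100 True
```

Dedicated load-testing tools would normally replace a hand-rolled script like this, but the shape of the test is the same.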
Parallel Testing
Parallel testing runs the data warehouse on production data just as it would run in real life, and compares its outputs with the existing set of reports to ensure they are in sync, or that any mismatches are explained.
Security Framework testing
Check all applicable aspects of the security framework.
Data Warehouse Implementation Deployment
Data warehouse implementation involves checking in the final version, productionizing, installing client applications, establishing service and support mechanisms, user communications, and the weekend conversion of the database.
After the testing phase is complete, the system is ready for implementation.
List of Tasks for Data Warehouse Implementation and Deployment
Version and freeze the DW implementation scripts and environments
  • All scripts related to Extraction, Transformation and Loading
  • All configurations of staging and Loading, and access environments
  • All Data in the source systems.
Run the DW productionization processing
  • Run the historical snapshots (sometimes these are run beforehand to save time on the implementation weekend).
  • Run the ETL scripts.
  • Run the refresh scripts for the downstream data marts and systems.
Data Quality and DW implementation Verification
Business analysis teams mainly do this before handing the system over to users for their own verification.
Data Warehouse End User Applications
End-user applications need not be implemented at the same time as the data warehouse itself; typically the warehouse goes live along with an OLAP tool and some end-user tools.
The same rigour is followed for end-user applications as for the data warehouse, in terms of versioning, production processing, and verification. Clients of the end-user applications are installed and tested.
End User Training
  • The end user training material is ready.
  • A group of power users has already been taken through it (they themselves might be involved in creation of the training material). These power users will act as the trainers for their respective functions.
  • The training material is divided by category of user: power users/designers, ad-hoc users (who typically place ad-hoc queries and use the reports generated by the designers), and recipient users (who only view the reports created by designers).
End User Support mechanisms and communication
  • A support mechanism for users is established and communicated (helpline numbers, within-department mentors/trainers, escalation matrix).
  • A support mechanism for technology staff is established (vendor helplines and escalation matrix).
  • The reporting and communication framework on defects, and user feedback is established.
Data Warehouse Test Scenarios
Data warehouse testing includes all the typical kinds of testing: exception testing, boundary testing, stress testing, etc.
Real Life Simple Scenarios
Simple scenarios are those which are relatively straightforward and can be the first step in understanding the health of the system. Examples are:
  • Extraction – Complete table extraction from a core system with a robust DBMS.
  • Transformation – Creation of simple derived attributes (creating the complete bill amount from individual billing items) or creating aggregates.
  • Loading – Loading a dimension set with fewer attributes and without any transformation during loading.
  • OLAP – Testing using basic functions.
Real Life complex scenarios
  • Extraction – Data extraction from an Excel sheet, involving filtering out customers that do not match the standard customer-code format.
  • Transformation – De-duplication and integration.
  • Loading – Loading dimensions with a large set of attributes.
  • OLAP – Testing the population of the OLAP cubes/views.
Boundary Testing
These conditions test the extreme situations the data warehouse could face. For example:
  • Extraction – No data in the source system.
  • Transformation – Creating a derived attribute with a very large or very small input figure (for example, a percentage-of-sales figure for sales of USD 10 out of total sales of USD one million, i.e. 0.001%).
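The percentage in that example is small enough that careless rounding in the transformation would silently turn it into zero, which is exactly what a boundary test should catch:

```python
# The boundary case above, worked through: USD 10 of USD 1,000,000.
sales_usd, total_usd = 10.0, 1_000_000.0
pct = sales_usd / total_usd * 100      # 0.001 percent

print(f"{pct:.6f}%")                   # 0.001000%
print(round(pct, 2))                   # 0.0 -- two-decimal rounding loses it
```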
Negative Testing
Checking on how the system handles the negative conditions:
  • Extraction – Wrong or unexpected data in the table (for example, a wrong customer-ID format, or character data in fields that should be numeric).
  • Transformation – Negative sales numbers, an age of 200 years, etc. This is important: the transformation logic should handle not only what it is meant to do, but everything it could face.
  • Loading – Wrong data sets: for example, the data set for the 'location' dimension having two columns missing, not existing at all, or containing null values. The loading system should run some fundamental checks before it proceeds with bulk loading.
  • OLAP – Users entering wrong formulae.
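The "fundamental checks" mentioned in the loading point above can be as simple as validating each incoming row against the expected schema before bulk loading. The 'location' dimension columns below are assumed for illustration:

```python
# Sketch of pre-load validation for a hypothetical 'location' dimension:
# reject rows with missing columns or nulls in required fields.
EXPECTED_COLUMNS = {"location_id", "city", "state"}

def validate_batch(rows):
    errors = []
    for i, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - row.keys()
        if missing:
            errors.append((i, f"missing columns: {sorted(missing)}"))
        elif any(row[c] is None for c in EXPECTED_COLUMNS):
            errors.append((i, "null in required column"))
    return errors

batch = [
    {"location_id": 1, "city": "Pune", "state": "MH"},   # valid
    {"location_id": 2, "city": None, "state": "MH"},     # null value
    {"location_id": 3, "state": "KA"},                   # 'city' column missing
]
print(validate_batch(batch))
```

Only a batch that comes back with an empty error list would proceed to the bulk load.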
Full Production Simulation
This can look like full-scale parallel testing, but it is something different. Whereas parallel testing runs in sync with production, production simulation does not have to: one takes a backup of the source systems from an earlier date and runs the complete ETL and end-user-tool operations to look at the results. This is typically a step before parallel testing; production simulation is more of a lab test by the technology team before the system is released to full user view in parallel testing.
