Some months ago I was in a discussion with professionals in my network about how to apply agile methods, or agility, in Business Intelligence (BI) projects. This blog post shares some of the key aspects to consider when adopting agile methods for BI or data warehousing projects. In addition to the methodology-specific ceremonies (for example, Release Planning, Sprint Planning, and the Daily Stand-up in Scrum), I think one must do certain things that are specific to BI or data warehousing projects. What are those? Read on.
1) Agile Approach: The BI life cycle has several key milestones (see images). These are ‘Tool Selection & Proof of Concept’, ‘BI Charter & High Level Planning’, ‘Data Extraction’, ‘Data Transformation’, ‘Data Loading’, ‘Reporting & Analysis’, ‘Governance’, ‘Enhancement’, and ‘DW/BI Maintenance’. Some of these can be done in parts. For example, ‘Data Extraction’ in large projects can be done one or two weeks at a time. How? Data extraction from a set of tables in DataSource-1 can be done during week 1, data extraction from the remaining tables, from DataSource-2, during week 2, and so on. The lessons learned during each week can be applied to the activities of the next week. This is a way to implement continuous improvement. This approach also has the potential to improve visibility and predictability, and it gives us an opportunity to do what matters to business users first. This is how we embrace agile principles in BI projects. (Similarly, ‘Data Transformation’, ‘Data Loading’ and the other ‘mini-projects’ can also happen in parts.)
[Image: Data Extraction in Small Chunks]
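The week-by-week extraction described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the table names, the `TableRef` type and the `plan_weekly_batches` helper are hypothetical, and the sketch assumes the backlog is already ordered by business priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    source: str  # e.g. "DataSource-1"
    table: str   # hypothetical table name, for illustration only


def plan_weekly_batches(tables, tables_per_week):
    """Split an ordered list of tables into week-sized extraction batches."""
    if tables_per_week < 1:
        raise ValueError("tables_per_week must be at least 1")
    return [
        tables[i:i + tables_per_week]
        for i in range(0, len(tables), tables_per_week)
    ]


# Assumption: tables are listed in business-priority order (most valuable first).
backlog = [
    TableRef("DataSource-1", "customers"),
    TableRef("DataSource-1", "orders"),
    TableRef("DataSource-2", "invoices"),
    TableRef("DataSource-2", "payments"),
    TableRef("DataSource-2", "refunds"),
]

for week, batch in enumerate(plan_weekly_batches(backlog, 2), start=1):
    names = ", ".join(f"{t.source}.{t.table}" for t in batch)
    print(f"Week {week}: {names}")
```

In a real project the batch size would likely change from week to week as the team applies what it learned in the previous one.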
2) Automation: When we follow an approach like this in BI projects, we must consider automation. Automation is required to improve productivity and quality, and it can happen in small steps. Some of the candidates for automation include:
a. Test Bed Setup (Data Cleanup & Loading)
b. Test Execution
c. Analysis of Test Results
d. Referential Integrity Checks
e. Validation using aggregates (sum, average, etc.)
f. Data Quality Assurance
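Two of the candidates above, referential integrity checks and validation using aggregates, lend themselves to small scripts. Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`dim_customer`, `src_orders`, `fact_orders`) and both helper functions are made up for illustration; a real warehouse would use its own driver and schema.

```python
import sqlite3


def orphan_count(conn, child, fk, parent, pk):
    """Count child rows whose foreign key has no matching parent row."""
    # Identifiers are interpolated directly, so pass trusted names only.
    sql = (
        f"SELECT COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]


def sums_match(conn, source_table, target_table, column, tolerance=0.01):
    """Compare SUM(column) between a source table and its warehouse copy."""
    totals = []
    for table in (source_table, target_table):
        sql = f"SELECT COALESCE(SUM({column}), 0) FROM {table}"
        totals.append(conn.execute(sql).fetchone()[0])
    return abs(totals[0] - totals[1]) <= tolerance


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE src_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1), (2);
    INSERT INTO src_orders VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 3, 25.0);
    INSERT INTO fact_orders VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 3, 25.0);
""")

# Order 12 points at customer 3, which is missing from the dimension table.
print("orphans:", orphan_count(conn, "fact_orders", "customer_id",
                               "dim_customer", "customer_id"))
print("sums match:", sums_match(conn, "src_orders", "fact_orders", "amount"))
```

Checks like these can be run after every load, so that each iteration adds to a growing regression suite.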
3) Planning: It is critical to invest time in planning and creating a BI road map. This planning activity may require 2 or 3 months at the start of a BI project. One may argue that this is not ‘agile’. However, one must agree that BI projects cannot have the objective of delivering something from the first week or even the first month. A good amount of planning followed by an iterative (and incremental) approach helps. So, a large BI project can be seen as a project that starts with adequate planning (to arrive at a road map), followed by several small projects that implement agile practices. In fact, there are several myths about agile. One of them is 'Agile means no planning'. Read my post 'Agile Myths and Misunderstandings'; it links to a free PDF on this topic. Happy reading!
4) Working Software and Feedback: When we adopt agile practices in BI projects, it is important to keep in mind that some iterations may not result in demonstrable software that is useful to ‘business users’. This is because of the inherent nature of BI projects, which may require several initial iterations to set up the target data store and populate it with data. When the data is ready and reports are working, end users can see the working product. Until then, the technical architect or data architect is your end user, and she consumes and provides feedback on what you deliver.
5) BI Tools: Tools used in BI projects can be of two types: a) commercial tools (for data extraction, loading, etc.) and b) homemade tools (from small scripts to big routines). Teams must be ready to see the potential of homemade tools, and to think deeply, collaborate and create small tools that can help in several ways. One approach is to identify at least 1 or 2 engineers in the team who are ‘tool smiths’ and to encourage everyone in the team to come up with new ideas and tools. Tool smiths can help in implementing these ideas.
6) Why Agile? Adopting agile practices in BI projects helps in identifying risks at early stages and also enables proactive thinking and preparedness for production release. For example, iterative and incremental approaches help BI project teams estimate, measure and optimize the downtime required to launch the warehouse. In classical or traditional approaches, this happens at the final lap of the project.
7) Budget Control: Agile adoption is a feasible way to release a BI project in parts (in an incremental manner) to the world (or to business users). This helps in budget control as well as optimization (in the form of process reuse or component reuse). Instead of delivering all 40+ reports in one stretch, you can deliver subsets of prioritized reports in batches. This helps you save the budget that would otherwise go into developing the low-priority ones.
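As a rough sketch of this idea, the snippet below walks a prioritized report backlog and funds reports until the budget runs out. The report names, priorities, cost figures and the `split_by_budget` helper are invented for illustration. Note that this simple greedy pass skips a report that does not fit and still considers cheaper, lower-priority ones; a team may prefer to stop at the first report that does not fit.

```python
def split_by_budget(reports, budget):
    """Return (funded, deferred) report names, walking in priority order.

    `reports` is a list of (name, priority, estimated_cost) tuples, where a
    lower priority number means a more important report.
    """
    funded, deferred = [], []
    remaining = budget
    for name, _priority, cost in sorted(reports, key=lambda r: r[1]):
        if cost <= remaining:
            funded.append(name)
            remaining -= cost
        else:
            deferred.append(name)
    return funded, deferred


# Hypothetical backlog: costs are in person-days and purely illustrative.
reports = [
    ("Ad hoc export", 4, 6),
    ("Sales summary", 1, 5),
    ("Churn analysis", 3, 4),
    ("Inventory aging", 2, 8),
]

funded, deferred = split_by_budget(reports, budget=15)
print("Build now:", funded)
print("Defer:", deferred)
```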
8) Data Quality Assurance (DQA): DQA is one of the key activities in BI projects. Unless we find the right steps (or processes) to assess the quality of data flowing in from different sources, the data in the BI store or warehouse may get contaminated. This may sound obvious. In practice, however, many of the defects reported by end users during the maintenance phase of BI projects turn out to be related to data quality issues. Agile practices can be leveraged to identify potential data quality issues ahead of time and to implement periodic checks that assess data quality (through automated scripts).
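A periodic data quality check of the kind mentioned above might look like the following sketch. It is plain Python over a list of dictionaries. The column names, the 5% null threshold and the helper functions are assumptions made for this example, not part of any particular tool.

```python
from collections import Counter


def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def duplicate_keys(rows, key):
    """Return the non-null key values that appear more than once."""
    counts = Counter(row.get(key) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1 and k is not None)


def run_checks(rows, key, required, max_null_rate=0.05):
    """Collect human-readable data quality findings for one batch."""
    findings = []
    for column in required:
        rate = null_rate(rows, column)
        if rate > max_null_rate:
            findings.append(f"{column}: {rate:.0%} null")
    dupes = duplicate_keys(rows, key)
    if dupes:
        findings.append(f"{key}: duplicate values {dupes}")
    return findings


# Hypothetical incoming batch with one missing email and one repeated key.
batch = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 2, "email": "b@example.com"},
    {"order_id": 3, "email": "c@example.com"},
]

for finding in run_checks(batch, key="order_id", required=["order_id", "email"]):
    print(finding)
```

Running a script like this on every incoming batch, and reviewing the findings at the end of each iteration, is one way to catch data quality problems before end users do.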
When we adopt agile practices, we come across several opportunities not only to provide predictability and visibility but also to improve the level of automation, and hence to deliver homegrown automation tools (such as scripts for data validation or data quality assurance) to customers. These automation tools have the potential to provide long-term value in BI projects.
What else do you do to improve agility in BI or data warehousing projects?
Related Post: TDD in ETL Projects