Friday, December 4, 2009

Some TLC for Your Data

Did you ever wonder why the data error was entered into the system, database, or report that’s right there in front of you?

You can look at it a hundred different ways.

-- The customer entered the data wrong on the website.
-- The call centre rep entered the data wrong on the website.
-- The sales rep forgot to enter his sales for the month of September and keyed them in for October.
-- The programmer entered the wrong statement in the data integration script.
-- The programmer put a ‘greater than’ instead of a ‘less than’ in the summarization script.
-- The business analyst did not provide the correct data retention requirements, and that’s why you have 6 months of summarized data vs. 16 months.
-- No one could come to a consensus on the definition of the value; that’s why we have 187 values for that field.

These are just a few reasons bad data is where it is.
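
A couple of the examples above (the flipped comparison and the short retention window) are the kind of slip a small automated check can catch. Here is a minimal Python sketch, assuming the summary table can be reduced to a list of month dates. The function names and the sample data are hypothetical and only for illustration; the 6 and 16 month figures come from the example above.

```python
from datetime import date


def months_covered(month_starts):
    """Count the distinct calendar months present in a summary table."""
    return len({(d.year, d.month) for d in month_starts})


def check_retention(month_starts, required_months):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    covered = months_covered(month_starts)
    # Mind the direction of the comparison: writing 'covered > required_months'
    # here would be the greater-than/less-than slip from the list above.
    if covered < required_months:
        problems.append(
            f"retention too short: {covered} months found, {required_months} required"
        )
    return problems


# Hypothetical summary table holding only 6 months of data (Jul-Dec 2009).
summary_months = [date(2009, m, 1) for m in range(7, 13)]
issues = check_retention(summary_months, required_months=16)
print(issues)
```

A check like this will not make anyone care more, but it does give the people who do care an early warning.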


What I’d like to know is what your reasons are, or the reasons you’ve heard. Feel free to send me a list of reasons why bad data existed in your system, application, database or report. I don’t want high-level reasons; let’s have the granular reasons. When I get to 101 I’ll publish the list for all to see (no names, of course).

However, here’s another reason to think about: apathy.

Really, really.

To have good, quality, accurate data, all you need is a little TLC. For data to be accurate, people need to care just a little more about what they are doing. In the above examples, if people gave a little TLC there would be no bad data.

We live in a rushed, hurried world where everything is needed yesterday so a little TLC is hard to come by.

Thursday, November 12, 2009

Book Review: Viral Data in SOA


“Virus: A microorganism smaller than a bacteria, which cannot grow or reproduce apart from a living cell. A virus invades living cells and uses their chemical machinery to keep itself alive and to replicate itself. It may reproduce with fidelity or with errors (mutations)-this ability to mutate is responsible for the ability of some viruses to change slightly in each infected person, making treatment more difficult.” Medicine.net

In the early stages of the flu season, it is only appropriate we take a quick look at viral infections.

With discussions about service-oriented architecture, concerns about data quality and data management will be highlighted in any organization. As bad data infects one portion, it will easily flow through to other modules, databases, process flows, reports and decision points of your company. One must be vigilant in monitoring data and managing it.

What did I like about this book? Everything. What didn’t I like about it? Not much.

“Service-Oriented architectures are intended to encourage solution builders to create offerings that can readily transcend point-in-time solutions.”

The author, Neal A. Fishman, talks about data governance and the critical role communication plays. He notes that data governance can be handled in both a proactive and a reactive manner. He identifies what needs to be done to enforce data governance in a SOA environment, and how the control points can govern data quality. Those control points are:

-- 1. Ensure: Controls for operating
-- 2. Assure: Controls for performing
-- 3. Insure: Controls for sustaining
-- 4. Reassure: Controls for continuity

The author describes data quality and data governance in great detail within the SOA environment, and states:

“The effectiveness of data governance depends on how the governance body reacts and adapts to the cultural environment.”

With that, the author describes the dialing system to tweak operations. ED-SODA provides the dimensions needed to adjust the data governance process. It can be used for virtually any culture, if not all.

And if you are having issues with building a data governance model, step into the reference model, for this is where you will get your basics for developing controls in data governance.

Even though the book is about data in a SOA environment, this is a book for every data analyst, particularly the sections on data quality, data governance, and how bad data can become viral, along with a myriad of thought-provoking points throughout the book. These points and examples of what others have done will provide insight into your own issues and processes.


Friday, October 2, 2009

September Festival del IDQ Bloggers


With the month of September come and gone, the leaves starting to change colour, hockey season starting and the wind getting colder by the day, we found another month filled with interesting posts about data quality. This month, I am happy to say, I'm hosting September's "Festival del IDQ Bloggers".

The festival is an annual data quality blogging carnival held by the International Association for Information and Data Quality, an international not-for-profit association dedicated to the development of the information and data quality profession.


The following is a quick list. I'll say quick, but it actually took an excruciatingly long time to compile; I spilt coffee on my keyboard, hit my head on the light and stubbed my toe in the process ;-)
On with the data quality blog round-up...
From the DoBlog (http://obriend.info/) the personal blog of Daragh O Brien, IAIDQ Director and Information Quality consultant and writer. Since 2006 Daragh has been writing about Information Quality related topics (amongst other things) on this blog and has even won an Obsessive Blogger award for his writing on Information Quality topics.

We find two posts of interest, one about the Law and the other about Market Research.

Blog Post: http://obriend.info/2009/09/25/finding-red-herrings-or-missing-a-trick/

Market Research often falls foul of poor quality information about the target sample population. In this post Colin Boylan (a freelance Market Researcher) discusses some of the issues that can lead to you chasing Red Herrings or just Missing a Trick.

Colin Boylan is a freelance market researcher living and working in Ireland. He has worked with many of the leading market research firms in Ireland and the UK, with particular experience in Pharmaceutical studies (where good quality data is essential).

Also in the same blogging journal we have an interesting tale about the law looking at data quality...



Blog Post: http://obriend.info/2009/09/29/a-game-changer-ferguson-v-british-gas/

For about 4 years, Daragh has been hammering on about how poor quality information could get an organization sued. In January it happened, with a very clear and explicit ruling in the Court of Appeal for England and Wales that sets a very interesting legal precedent (binding in England and Wales and persuasive in all other Common Law jurisdictions such as Ireland, Canada, Australia, India, USA, Pakistan...). This post (based on an article Daragh wrote for the IAIDQ in April) looks at that case and the implications for Information Quality professionals.

Daragh O Brien is a Director of IAIDQ, a Fellow of the Irish Computer Society and, after escaping from 12 years of indentured servitude in a leading Irish Telco, is in the process of establishing a specialist Information Quality Management and training practice. He is also writing a book on legal issues in Information Quality with Fergal Crehan, a prominent Irish barrister (lawyer).

No Blog Carnival is complete without a post from the Obsessed Jim Harris. In this short but sweet post, Jim talks about knowledge and the fact that we know what we know, and we don’t know the rest. Something to think about as you read Jim’s post.






Jim’s OCDQ blog is an independent blog offering a vendor-neutral perspective on data quality. It is a place where he offers a diversity of viewpoints in a collaborative environment. Jim himself is an independent consultant, speaker, writer and blogger with over 15 years of professional services and application development experience in data quality (DQ), data integration, data warehousing (DW), business intelligence (BI), customer data integration (CDI), and master data management (MDM). Jim has worked with Global 500 companies in finance, brokerage, banking, insurance, healthcare, pharmaceuticals, manufacturing, retail, telecommunications, and utilities.

Jumping across the pond and over to Sweden, I’ll take a moment to say hi to the Ericssons: Brit, Mikael, Max, Guztav and Hanah. I hope all is well. Then a quick move to Denmark, where we have DQ blogger Henrik Liliendahl Sørensen, a man of many talents who has worked for over 20 years in applications, databases and data in general. Henrik has demonstrated his expertise in business directory matching and international aspects of data quality improvement and master data management.
Henrik’s blog, Liliendahl on Data Quality, is a collection of his personal opinions, experiences and observations around data quality, accumulated over decades, and I do mean decades, of experience.

Henrik discusses the multi-use potentials of data quality...could it be...can data quality be used for increasing revenues, and for marketing...read on and find out.


The post has a follow-up post sparked by the comments: http://liliendahl.wordpress.com/2009/09/27/process-of-consolidating-master-data/

Going across the globe to that lovely locale known as Australia, we have Vincent McBurney, a manager with Deloitte Consulting in Australia who has 15 years of experience as an application programmer, database programmer, ERP implementer and information management consultant. The blog is dedicated to a tool-based approach to data integration with news and tips on IBM InfoSphere, Informatica, Oracle, Microsoft and any breaking data integration news.

This particular post is interesting because it talks about something we all like: fudge. But not that fudge, no way. This fudge is actually about fudging moments and then having to apply some kludge techniques, or go in and kludge the situation, to fix it. Fudge, kludge: all around, a great read.

Blog Post: http://it.toolbox.com/blogs/infosphere/the-data-quality-and-how-to-fudge-it-34289

From Data Quality Pro, Dylan Jones provides us with an excellent interview with Ken O’Connor, who discusses a means of creating a data issue assessment process, something every DQ team should have in place, and that’s why it’s here. Coming from the trenches, this is something any data quality analyst can use over and over again.


DataQualityPro is an online community resource that is solely dedicated to the needs and development of data quality professionals everywhere. Dylan Jones is the founder and editor of Data Quality Pro and Data Migration Pro, leading online knowledge centre and community sites for their respective professions. With a 15 year background in data quality and data migration, Dylan now supports a global community of several thousand professionals who actively collaborate and contribute to help increase the collective knowledge in these fields.

Here’s an interesting read about data governance from Gwen Thomas. Gwen Thomas is Founder and President of The Data Governance Institute, which is the premier provider of in-depth, vendor-neutral information about, and assistance with, tools, techniques, models, and best practices for the governance/stewardship of data and information. This is Gwen’s personal blog from the Data Governance Institute.

This post is here because data quality is a big piece of data governance. Data governance provides guidance in defining quality. There is a symbiotic relationship between the two.
With this post we get a high-level view of net-centric governance and the potential issues of control one may have with it. It gets the wheels turning when you begin to think of the implications that could very well follow.

Blog Post: http://datagovernancematters.com/2009/09/14/net-centric-data-governance-not-for-sissies/


Finally…
After hearing about an old friend of mine working hard in a data quality team, I was surprised to learn she still has to justify the existence of the data quality team. It’s always good to know what you’re doing and the value you bring to the table, but this being the 3rd time within a year, I believe enough is enough…so here it is. Who am I? Well, I am this guy: business analyst by day, data quality analyst by night. My blog, Data Quality Edge, is really a place to voice my opinions and to provide what I hope will be a grassroots look at data quality, something really for the data quality analyst in the trenches, because they are the ones that get the job done.


Wishing you all the best in the cooler months ahead! Good reading.


Dan



Wednesday, September 16, 2009

Stop Justifying Data Quality Programs and Do the DQ Work Already!

In a recent discussion with a good friend, I learned that they are in the middle of justifying their work in a data quality team. A few months ago they were doing the same thing, and at the beginning of the year they had just wrapped up another justification project. At the beginning of the economic downturn, it was being done as well. I also know that a few years ago, when I was with the team, we had to do it too.

It's a shame. A terrible shame! Some organizations understand the importance of data quality; sometimes that understanding has come at a cost:


• Lost thousands to millions;
• Faced national embarrassment;
• Or made significantly big policy screw-ups.

Other organizations are more proactive and have established a data quality team and program to prevent such events from happening, an activity that is considered a best practice and essential to any information technology/business intelligence structure.

However, in either case, you may have someone, traditionally a senior manager, who sees data quality as a cost, a black hole. Yes, there is a cost; however, the benefits outweigh the costs in a variety of ways:

• Reduction in re-work due to good data quality;
• Improved incoming data quality and data processing due to pro-active initiatives with incoming data migration and integration projects;
• Proactively preventing data quality issues from occurring;
• Improved decision making, using quality data, and more.


To my old team and senior management:


Stop with the justification exercises and begin looking at the benefits and what this dedicated group of data quality analysts has accomplished year after year:

• Recognized as a Best Practice finalist by TDWI in DQ;
• Hundreds of data modelling, metadata, data processing and data corrections to incoming projects per year;
• Proactively seeks data processing improvements to improve data loads - ultimately reducing costs;
• Client support to decision makers who really don't understand the technology aspects of the data and its routines;
• Dozens of change management practices each year to improve data quality and data processing, which collectively prevent lost revenues, increase sales and manage maintenance costs by reducing reruns and supporting programs such as customer profitability and other CRM initiatives;
• Estimated benefits that weigh in at an average of $1-1.5 million a year, if not more.

Another justification exercise only takes the team away from doing what needs to be done: data quality.

So to the senior management in this organization and any other: yes, there is a cost to any data quality program. Just remember that a data quality team is the vanguard of any organization that deals heavily in data. They bring in benefit. They enable your decision makers. They protect your greatest asset - data!

A good DQ team = Great Value!