We’ve all been looking at a lot of health data over the past few weeks, learning about flattened curves and infection rates. But the statistics and graphs on which major decisions of life and freedom are based are only as good as the quality of the data behind them.
Gordon Hamilton has been focused on data quality for 20 years. He now teaches the subject for BCIT Computing in the Part-time Studies course Data Quality Improvement.
“I had the idea for a course that teaches these topics when I was working at Vancouver Coastal Health. I was doing a data warehouse project for health care decision support,” explains Gordon. “Health care is trying to integrate all their information across silos, bringing in more data sets all the time.” That struggle to establish and follow best practices, and to identify opportunities for better care as well as cost control, is essential work.
“There’s so much potential for critical insight in health care, if the data is clean,” emphasizes Gordon. “In fact, in data science, probably 80% of the time is spent organizing and cleaning data, and just 20% in analysis.”
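To give a rough sense of what that 80% looks like in practice, here is a minimal sketch in Python using pandas; the file name, column names, and cleanup steps are hypothetical, not taken from Gordon’s course or any real health data set.

```python
import pandas as pd

# Hypothetical export of patient visit records (file and column names are invented).
visits = pd.read_csv("patient_visits.csv")

# Typical cleanup steps that consume most of an analyst's time:
visits = visits.drop_duplicates(subset="visit_id")              # remove double-entered visits
visits["visit_date"] = pd.to_datetime(visits["visit_date"],     # normalize inconsistent date formats
                                      errors="coerce")
visits["site"] = visits["site"].str.strip().str.upper()         # standardize free-text site codes
visits = visits.dropna(subset=["visit_date", "patient_id"])     # drop rows missing key fields

# Only after all of that does the "20%" begin: the actual analysis.
print(visits.groupby("site")["patient_id"].nunique())
```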
He tells a health data story from a neonatal critical care unit that was trying to figure out how to predict when babies might be about to have a life-threatening episode. After establishing continuous monitoring of multiple indicators – temperature, breathing, and so on – doctors learned something counter-intuitive: just before a crisis, all the indicators tended to stabilize.
“It was totally out of the blue, something they would not have been able to predict, but quality data enabled the insight. You just don’t know what’s going to come out of the data!”
“There’s so much potential for critical insight in health care, if the data is clean” – Gordon Hamilton
Six weeks, six facets of improving data quality
Over six weeks in COMP 3839, Gordon covers what data quality means, how to set rules about data, what needs to be managed, how to determine what is under control, a ten-step program for improving critical data sets, how to integrate data quality into data warehouse development, and governance.
“All data has variants, just like manufactured products. The trick is how to recognize these issues and decide if the priority ones are controllable.”
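As a loose illustration of that manufacturing-style view of quality – not material from the course itself – the sketch below expresses a few data rules as simple checks and compares a batch’s failure rate against an assumed tolerance; all names, values, and thresholds are invented.

```python
import pandas as pd

# Hypothetical batch of admissions records; column names are invented for illustration.
batch = pd.DataFrame({
    "patient_id": ["A1", "A2", None, "A4"],
    "age":        [34, -2, 51, 47],          # -2 is an obvious data entry error
    "postal":     ["V5G", "V6B", "V5G", ""],
})

# Simple data rules, expressed as boolean checks per row.
rules = {
    "patient_id present":      batch["patient_id"].notna(),
    "age in plausible range":  batch["age"].between(0, 120),
    "postal code non-empty":   batch["postal"].str.len() > 0,
}

THRESHOLD = 0.05  # assumed tolerance: at most 5% of rows may fail overall
for name, passed in rules.items():
    print(f"{name}: {(~passed).mean():.0%} of rows fail")

# A batch whose overall failure rate drifts past the tolerance is "out of control".
overall_fail = (~pd.concat(rules, axis=1).all(axis=1)).mean()
print("batch out of control" if overall_fail > THRESHOLD else "batch within limits")
```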
Gordon uses an iterative project approach, which he feels is a good technique to teach the fundamentals of continuous quality improvement.
Data for good, data for fun
Now Senior Data Quality Specialist at gaming giant Electronic Arts (EA), Gordon is also involved in the DAMA Vancouver Chapter and its Data Governance sub-chapter, as well as Data for Good Vancouver, which ran a remote hackathon this spring. The event was originally planned for the BCIT Downtown Campus, supported by the BCIT Centre of Excellence in Analytics – Powered by SAP, but COVID-19 forced a change of venue.
Data for Good Vancouver crowd-sources data expertise for progressive service organizations that don’t have the capacity to analyze their own data. They started in Vancouver by helping the Overdose Prevention Society. According to Gordon, “initially the data needed a lot of work, but eventually the team could see some patterns. Mostly this kind of work shows where there are gaps in the data, the discontinuities.”
Ultimately, the organization can improve how it collects and manages its data for better insight in the future – i.e., continuous improvement.
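As a rough sketch of how such gaps and discontinuities might be surfaced, one simple approach is to look for unusually long intervals between consecutive records; the file and field names below are hypothetical and not drawn from the actual project.

```python
import pandas as pd

# Hypothetical event log with a timestamp column; names are invented for illustration.
events = pd.read_csv("event_log.csv", parse_dates=["recorded_at"])
events = events.sort_values("recorded_at")

# Time elapsed since the previous record; large jumps suggest gaps in data collection.
elapsed = events["recorded_at"].diff()

# Flag anything longer than an assumed "normal" interval of two days.
mask = elapsed > pd.Timedelta(days=2)
gaps = pd.DataFrame({
    "gap_ends_at": events.loc[mask, "recorded_at"],
    "gap_length":  elapsed[mask],
})
print(gaps)
```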
Huge potential for long term benefit
Gordon says most organizations only have a handle on a small percentage of their data – anecdotal estimates range from about 10% down to just 1%. “With things like Internet of Things (IoT), that percentage handled could decrease exponentially.”
“Every organization that I know – big or small, from social non-profits to manufacturing – runs on data,” he emphasizes. “And statistical studies have shown that 20-35% of an organization’s revenue is likely wasted every year due to poor data quality.”
“The cleaner the data, the better your future predictions will be.”
As we all hover over predictions of what the coming months will bring, data quality is clearly paramount.
“20-35% of an organization’s revenue is likely wasted every year due to poor data quality” – Gordon Hamilton
Learn more about data quality and analysis in BCIT Computing’s Applied Data Analytics Certificate (ADAC).