Some people describe this term as a combination of many different data sources, holding different types of data (structured or unstructured), seamlessly integrated with one principal source that functions and performs close to perfection.
It does not really matter what you call it – PDW & Hadoop, APS (Analytics Platform System), or something else – it is all about interoperability in the modern world of diverse data types and sources.
There may be architectures that solve those modern data-type problems, but there is one thing that urgently needs to be addressed – Data Quality.
Almost every database has a Data Quality problem or two – nobody has a shadow of doubt about that. And when you integrate multiple different data sources, data quality issues multiply rapidly.
You can try to battle this problem with the T (transformation) in ETL, ELT, and so on, but it seems to me that by now everyone has realized that applying a couple of statements or operations cannot solve the major problems of misspellings and conflicting definitions.
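To make the point concrete, here is a minimal sketch (in Python, with invented example strings – not how DQS works internally) of why deterministic cleanup transformations fall short on misspellings, while a similarity measure at least surfaces the near-duplicate:

```python
from difflib import SequenceMatcher

# Two records a human would call the same company,
# but that no TRIM/UPPER-style transformation will ever unify.
a = "Contoso Pharmaceuticals Ltd."
b = "Contosso Pharmaceutical Ltd"

# Deterministic cleanup: still two different strings.
clean_a = a.strip().upper()
clean_b = b.strip().upper()
print(clean_a == clean_b)  # False

# A string-similarity measure exposes the near-duplicate,
# which is the kind of work a data quality layer must do at scale.
ratio = SequenceMatcher(None, clean_a, clean_b).ratio()
print(ratio > 0.9)  # True -- flag as candidates for a matching rule
```

This is exactly the step that is cheap for two strings and expensive for a whole table, which is where the performance problem below comes from.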
What kinds of solutions are there to address those issues? Data Quality Services (DQS) appeared in SQL Server 2012, with no further major improvements in SQL Server 2014.
Have you ever tried to run a data matching project on 300,000 rows? How many hours did it take?
See, that is a problem. A major one.
We need a dedicated, high-performance data quality layer to address these problems – and it would not take away any jobs; instead, it would create new ones. Every consultancy and solution provider working with the Microsoft Data Platform would get more opportunities, and users of other platforms would consider trying the Microsoft Data Platform out in order to solve the problem.
This would provide a truly MODERN outlook – and a modern Data Warehouse – to those who truly need it.