Data quality firewall

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

A data quality firewall is the use of software to protect a computer system from the entry of erroneous, duplicated or poor-quality data. Gartner estimated in 2017 that poor-quality data cost organizations an average of $15 million a year. Older technology required data quality software to be tightly integrated, whereas the same protection can now be achieved by loosely coupling components in a service-oriented architecture.

Features and functionality

A data quality firewall helps keep a database accurate and consistent. It admits only valid, high-quality data into the system and so indirectly protects the database from damage, which matters because database integrity and security are essential. It also provides real-time feedback about the quality of the data submitted to the system.

The main goal of a data quality process is to capture erroneous and invalid data, process them, eliminate duplicates and, lastly, export valid data to the user while also storing a backup copy in the database. A data quality firewall acts similarly to a network security firewall. Just as a network firewall lets packets pass only through specified ports, a data quality firewall filters out data that present quality issues and allows the remaining, valid data to be stored in the database. In other words, the firewall sits between the data source and the database and works throughout the extraction, processing and loading of data.
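
The filtering behaviour described above can be illustrated with a minimal Python sketch. It is not taken from any particular product: the record layout (an "id" and an "email" field), the validity rule and the names FirewallResult, is_valid and data_quality_firewall are all assumptions made for illustration. Invalid and duplicate records are set aside with a reason, which stands in for the real-time feedback a firewall would give, and only the remaining records are passed on for storage.

```python
from dataclasses import dataclass, field


@dataclass
class FirewallResult:
    accepted: list = field(default_factory=list)  # records allowed into the database
    rejected: list = field(default_factory=list)  # (record, reason) pairs for feedback


def is_valid(record):
    """Hypothetical rule: a record needs an id and an email containing '@'."""
    if record.get("id") is None:
        return False, "missing id"
    if "@" not in str(record.get("email") or ""):
        return False, "malformed email"
    return True, ""


def data_quality_firewall(records):
    """Sit between the data source and the database: filter, then pass on."""
    result = FirewallResult()
    seen_ids = set()
    for record in records:
        ok, reason = is_valid(record)
        if not ok:
            result.rejected.append((record, reason))
        elif record["id"] in seen_ids:
            # Duplicate elimination: keep only the first record per id.
            result.rejected.append((record, "duplicate id"))
        else:
            seen_ids.add(record["id"])
            result.accepted.append(record)
    return result
```

A caller would store result.accepted in the database and return result.rejected to the data source as feedback. Keeping the rejected records together with a reason, rather than silently dropping them, is a design choice of this sketch and not something the article prescribes.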

Data streams must pass accurate validity checks before they can be considered correct or trustworthy. These checks are temporal, formal, logical and forecasting in kind.
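
The article does not define the four kinds of check, so the following Python sketch rests on one plausible reading, stated here as an assumption: a temporal check inspects timestamps, a formal check inspects field presence and type, a logical check inspects consistency between fields, and a forecasting check compares a value with what past data would predict. The reading layout ("timestamp", "value", "low", "high"), the function names and the naive mean-based forecast are all hypothetical.

```python
from datetime import datetime


def formal_check(reading):
    """Formal: required fields are present and have the expected types."""
    return isinstance(reading.get("timestamp"), datetime) and all(
        isinstance(reading.get(key), (int, float)) for key in ("value", "low", "high")
    )


def temporal_check(reading, now):
    """Temporal: a reading must not be timestamped in the future."""
    return reading["timestamp"] <= now


def logical_check(reading):
    """Logical: the value must lie within the bounds the reading itself declares."""
    return reading["low"] <= reading["value"] <= reading["high"]


def forecasting_check(reading, history, tolerance=0.5):
    """Forecasting: the value must stay near a naive forecast (mean of history)."""
    if not history:
        return True  # nothing to forecast from, so nothing to reject
    forecast = sum(history) / len(history)
    return abs(reading["value"] - forecast) <= tolerance * abs(forecast)


def failed_checks(reading, history, now):
    """Return the names of the checks that a reading fails."""
    if not formal_check(reading):
        return ["formal"]  # the other checks assume well-formed fields
    failures = []
    if not temporal_check(reading, now):
        failures.append("temporal")
    if not logical_check(reading):
        failures.append("logical")
    if not forecasting_check(reading, history):
        failures.append("forecasting")
    return failures
```

A reading would be treated as trustworthy only when failed_checks returns an empty list. The formal check runs first because the other three rely on the fields being present and correctly typed.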

References

  1. "4 Steps to Overcome Data Quality Challenges". Gartner. Retrieved 2023-12-27.