Using Blockchain to Ensure Data Integrity for Data Science
Problems with data integrity aren't new. In fact, most sources credit an IBM programmer and instructor from the 1960s with the famous acronym GIGO, or garbage in, garbage out. Errors and even intentional manipulation of data have always plagued information systems. Modern systems may have better features for reducing errors, but the sheer speed at which data is generated today multiplies the problems. Learn how a relatively new technology, blockchain, can help limit various threats to data integrity in this age of rapid data generation.
Does Maintaining Data Integrity Still Present a Challenge?
In the early days of computing, limited technology may have made it harder to ensure data integrity. After all, operators mostly had to enter information with a keypunch machine. Certainly, people made errors, and they lacked the interfaces to catch common mistakes and enforce high-level security. At the same time, the lower level of technology limited the speed at which anyone could generate data, good or bad. Also, malicious actors had not developed the sophistication they demonstrate today. Decades ago, data may have been more vulnerable, but the problem may also have been more manageable.
In our Information Age, data scientists use data to extract actionable insights. Increasingly, they draw critical governmental or business intelligence from massive and rapidly growing datasets. To practice data science, they employ big data analytics, machine intelligence, and many other disciplines. Information may come from human input, but just as likely, it's generated at network speed by sensors, GPS signals, transaction records, smart machines, and automated systems.
In return, data science has provided solutions to critical problems. For instance, healthcare companies have relied upon it to help improve patient outcomes and manage expensive equipment. Other organizations may employ data scientists to produce business intelligence to refine their customer experience, operate more efficiently and safely, and even to make better use of energy to reduce pollution.
Data science has plenty of potential to improve government, business, and even lives. Still, a 2017 survey found that data scientists consider bad data the biggest obstacle to maximizing the benefits of their profession.
How Blockchain Can Help Maintain Data Integrity
Blockchain got a lot of attention because it was originally developed to support cryptocurrency. Even though decentralized currencies and tokens have uses, they may simply be the initial driver for this developing technology. Increasingly, data scientists have explored ways to use blockchain tech to improve data quality.
To explain the benefits of ensuring data integrity with blockchain, it's helpful to review several key features:
Unlike traditional data storage, blockchain relies upon distributed storage and decentralized processing. Since multiple servers contain identical records, corruption or mischief on one server is far less likely to damage the network as a whole. Users access blockchain systems with cryptographic keys. The system can assign various levels of authority to each key or group of keys.
Multiple nodes have to agree to process a transaction, according to set criteria. Not only does this make it much harder for one corrupted node to damage the entire network, it also makes it easier to identify problem nodes and expunge them from the blockchain.
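The agreement step described above can be illustrated with a minimal sketch. It assumes each node holds its own copy of a record and that a simple majority of matching SHA-256 digests counts as agreement. The function name `check_replicas`, the node names, and the sample records are hypothetical, and real blockchain consensus protocols are considerably more involved than a majority vote.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def digest(record: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(record).hexdigest()


def check_replicas(
    replicas: Dict[str, bytes], quorum: float = 0.5
) -> Tuple[Optional[str], List[str]]:
    """Compare each node's copy of a record (hypothetical helper).

    Returns the digest that more than `quorum` of the nodes agree on,
    plus the nodes whose copy differs from it. If no digest reaches the
    quorum, returns None and lists every node as suspect.
    """
    digests = {node: digest(data) for node, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) * quorum:
        return None, sorted(digests)
    outliers = sorted(node for node, d in digests.items() if d != winner)
    return winner, outliers


# Illustrative data only: node-c holds a corrupted copy of the record.
replicas = {
    "node-a": b"patient=17;dose=5mg",
    "node-b": b"patient=17;dose=5mg",
    "node-c": b"patient=17;dose=50mg",
}
agreed, suspects = check_replicas(replicas)
print(suspects)
```

In this sketch the two matching nodes outvote the corrupted one, so the network keeps the agreed record and can flag the dissenting node for review.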
The decentralized nature of blockchain also gives these systems plenty of processing power from multiple servers. For example, it's possible to utilize dozens or even hundreds of servers for data science, enabling calculations that would be impractical on a single centralized computer.
The blockchain contains verified and structured data, helping to simplify processing. It also helps manage data sharing, so various teams don't need to duplicate effort and can all operate with the same information. Some organizations even find ways to monetize both the data they have collected and the results of their big data analytics.
Even so, it's important to note that this technology is still developing. For instance, while blockchain can deliver a lot of processing power, it's still relatively expensive to use when compared to more traditional data storage and processing systems. Reasons for this include maintaining multiple servers and even the cost of blockchain developers, who are in short supply. When the technology evolves to address these issues, it has the potential to disrupt the way our ever-increasing supply of data can be managed and utilized.
How Does Onyx Leverage Distributed Ledger Technology?
One of our customers’ greatest challenges is data integrity, both in flight and at rest. In an effort to mitigate these concerns as they relate to customers’ most sensitive information, Onyx has deployed blockchain and related distributed ledger technologies for the following specific purposes:
- Ensuring Trust: Inherent transparency provides traceability of content, in transit and through every interaction, from origin to destination
- Preventing Tampering: The immutable nature and distributed storage of blockchain make it extremely difficult to perform malicious actions on information
- Managing Sharing: The ledger can be used to track workflows and interactions, recording access to and use of the information
In support of the above, as transactions flow to their eventual persistent store, we make use of distributed Onyx Accelerator Engines to generate hashes. These hashes are then stored in the immutable blockchain for the most sensitive payloads. The blockchain is then used by supporting tools to validate and provide chain of custody information about access to the information.
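The general pattern of hashing a payload, recording the hash in a tamper-evident ledger, and validating it later can be sketched as follows. This is a generic illustration only, not the Onyx Accelerator Engine or its API: the in-memory list stands in for a real blockchain, and the function names, payload IDs, and sample payloads are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def entry_hash(entry: dict) -> str:
    """Hash the entry's contents, including the previous entry's hash."""
    body = json.dumps(
        {key: entry[key] for key in ("payload_id", "digest", "prev")},
        sort_keys=True,
    )
    return sha256_hex(body.encode("utf-8"))


def record_payload(ledger: list, payload_id: str, payload: bytes) -> None:
    """Append the payload's digest (not the payload itself) to the ledger."""
    entry = {
        "payload_id": payload_id,
        "digest": sha256_hex(payload),
        "prev": ledger[-1]["hash"] if ledger else GENESIS,
    }
    entry["hash"] = entry_hash(entry)
    ledger.append(entry)


def ledger_is_intact(ledger: list) -> bool:
    """Recompute every hash; an edited entry breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry):
            return False
        prev = entry["hash"]
    return True


def verify_payload(ledger: list, payload_id: str, payload: bytes) -> bool:
    """Check a stored payload against the digest recorded for its ID."""
    if not ledger_is_intact(ledger):
        return False
    for entry in ledger:
        if entry["payload_id"] == payload_id:
            return hmac.compare_digest(entry["digest"], sha256_hex(payload))
    return False


# Illustrative data only.
ledger = []
record_payload(ledger, "txn-001", b"account=42;amount=100.00")
record_payload(ledger, "txn-002", b"account=42;amount=250.00")
print(verify_payload(ledger, "txn-001", b"account=42;amount=100.00"))
```

Because each entry's hash covers the previous entry's hash, changing a recorded digest after the fact invalidates the chain from that point on. Storing only digests also keeps the sensitive payload itself out of the ledger.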
Please reach out to someone on the technology team to get more information!
https://www.atlasobscura.com/articles/is-this-the-first-time-anyone-pri… - origin of GIGO
https://towardsdatascience.com/how-blockchain-will-disrupt-data-science… - client resources
https://searchengineland.com/the-importance-of-big-data-integrity-and-s… - no link