Security

Nasuni. A blockchain system for storing information

2021-01-15 | 14 mins

How to use Nasuni UniFS® as a blockchain system to solve the regulatory, scalability, and storage issues of blockchain

Intro

Nowadays, everyone wants to have a blockchain. Its virtues are widely acclaimed, while its limitations are less known. Solving these limitations in order to apply blockchain to real-world problems has led to a wide variety of blockchain implementations and types that somewhat deviate from its original focus but still maintain some of its key virtues.

Now we have public blockchains, federated blockchains, consortium blockchains, private blockchains, storage blockchains, blockchains with limited computing power, blockchains with more computing power, smart-contract blockchains, permissioned blockchains, permissionless blockchains, layer-2 blockchains, sharded blockchains, proof-of-work blockchains, proof-of-stake blockchains, proof-of-space blockchains, and so on... But can anyone tell me what the hell a blockchain actually is?

The goal of this article is to demonstrate that Nasuni implements a storage system based on the fundamental principles of blockchain technology, while improving or even completely eliminating its most significant problems for the use case of a consortium or private blockchain.

Blockchain

Blockchain data structures gained popularity with Satoshi Nakamoto's 2008 paper titled "Bitcoin: A Peer-to-Peer Electronic Cash System". In this paper, Satoshi uses a blockchain data structure as a ledger to securely store the transaction history of bitcoin. However, the ideas behind blockchain are considerably older and trace back to a 1991 paper by Haber and Stornetta.

Their proposal was a method for securely timestamping digital documents, rather than a digital money scheme. The goal of timestamping was to provide an approximate idea of when a document was created. Most importantly, the timestamp reflects the order in which these documents were created: if one existed before the other, the timestamps will reflect that. To ensure this is secure, it is required that the timestamp of a document cannot be changed after the fact.

In summary, blockchain systems use cryptographic hashing and public-key signatures to create an append-only, immutable, and timestamped chain of content. Copies of the blockchain are distributed to every participant node in the network.
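The linked-timestamp idea can be sketched in a few lines of Python. This is a minimal illustration, not the scheme from either paper: the `Block` class, its field names, and the `append` and `is_valid` helpers are assumptions made for the example, and signatures and networking are left out entirely.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    index: int
    timestamp: float
    payload: str
    prev_hash: str

    def digest(self) -> str:
        # Hash every field, including the previous block's hash,
        # so each block commits to the whole history before it.
        body = json.dumps(
            [self.index, self.timestamp, self.payload, self.prev_hash]
        ).encode()
        return hashlib.sha256(body).hexdigest()


def append(chain: List[Block], payload: str, now: Optional[float] = None) -> None:
    # The first block has no predecessor, so it points at a string of zeros.
    prev_hash = chain[-1].digest() if chain else "0" * 64
    timestamp = time.time() if now is None else now
    chain.append(Block(len(chain), timestamp, payload, prev_hash))


def is_valid(chain: List[Block]) -> bool:
    # Every block must point at the digest of its predecessor.
    return all(
        chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain))
    )


chain: List[Block] = []
append(chain, "document A", now=1.0)
append(chain, "document B", now=2.0)
append(chain, "document C", now=3.0)

# Backdating an old entry changes its digest and breaks its successor's link.
tampered = list(chain)
tampered[1] = Block(1, 1.5, "document B", chain[1].prev_hash)

print(is_valid(chain), is_valid(tampered))
```

Because each block's hash covers the previous hash, changing any past timestamp or payload invalidates every later link, which is what makes the ordering tamper-evident.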

Blockchain systems are theoretically ideal for storing highly sensitive information for three reasons:

Replication: They maintain a very high level of data replication, making the data extremely resilient and durable.
Immutability: Once information is written, it is fixed as of that date and time. It evolves only through later changes, which are stored separately and recombined when the information is requested as of a given moment.
Cryptographic Signature: Each change is cryptographically signed by its author, adding a layer of non-repudiation and ownership.

That being said, there are several issues that make it practically impossible to use a blockchain system for storing information at a production scale.

Consensus and Proof of Work: Transaction-based blockchains were made possible by decentralization, where no single participant has control. This decentralization was achieved through consensus algorithms and a noise-reducing mechanism called proof of work.

In Satoshi’s proposal, to gain the right to propose a new piece of information to be stored in the chain, one must solve a puzzle (a hash puzzle) that is computationally complex and costly, and include proof of having solved it in the proposal. This information is then validated by each node in the network based on the consensus protocol rules, and all nodes following the same protocol eventually store the piece of information and build the same chain.
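A toy version of the hash puzzle shows why it is costly to solve and cheap to check. This is a simplified sketch, not Bitcoin's actual rules: the `solve` and `verify` names, the string concatenation of data and nonce, and the 16-bit difficulty are all assumptions chosen so the example runs in well under a second.

```python
import hashlib


def _hash_value(data: str, nonce: int) -> int:
    # Interpret the SHA-256 digest as one large integer.
    digest = hashlib.sha256(f"{data}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def solve(data: str, difficulty_bits: int) -> int:
    """Brute-force the smallest nonce whose hash falls below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _hash_value(data, nonce) >= target:
        nonce += 1
    return nonce


def verify(data: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a proposed nonce costs a single hash."""
    return _hash_value(data, nonce) < (1 << (256 - difficulty_bits))


nonce = solve("block payload", difficulty_bits=16)
print(nonce, verify("block payload", nonce, 16))
```

Each extra difficulty bit doubles the expected number of attempts for the solver, while verification stays at one hash. That asymmetry is what every node relies on when it validates a proposal.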

However, this mechanism, which worked perfectly and made electronic money systems possible, is extremely slow and costly for almost everything else.

Storage Cost: Blockchain networks aim to gather many nodes together to improve reliability. They also try to keep the storage of these nodes as small as possible (the latest trend is to run nodes on Raspberry Pi devices). Everything written to a blockchain is replicated across each node in the network, which is why writing more than 1 MB is extremely expensive or simply not allowed.
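To put rough numbers on that replication cost, here is a small sketch. The node count and the fixed replica count are hypothetical figures chosen for illustration, not measurements of any real network or storage service.

```python
def network_storage_bytes(payload_bytes: int, replicas: int) -> int:
    # Total bytes held across the system when every replica stores the payload.
    return payload_bytes * replicas


GIB = 1024 ** 3

# Hypothetical: a network of 10,000 full nodes, each storing everything.
full_replication = network_storage_bytes(GIB, replicas=10_000)

# Hypothetical: a storage backend that keeps a fixed 3 copies.
fixed_replication = network_storage_bytes(GIB, replicas=3)

print(full_replication // GIB, "GiB vs", fixed_replication // GIB, "GiB")
```

Under these assumptions, one gibibyte of payload occupies 10,000 GiB across the network, against 3 GiB with a fixed replica count.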

Operational Scalability: This is probably the biggest issue, and the biggest source of debate and research, in blockchain systems at the moment. In most blockchain protocols today, each node stores the full state and processes every transaction. This provides high levels of availability and resilience, but it also severely limits scalability: a blockchain cannot process more transactions than a single node can handle. Partly because of this, Bitcoin is limited to roughly 3-7 transactions per second and Ethereum to roughly 7-15. This rate is insufficient even for a single company, where hundreds of records per second may be generated.
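The Bitcoin figure can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below assumes a 1 MB block, one block about every 10 minutes, and an average transaction size somewhere between 250 and 500 bytes. The transaction sizes are an assumption for illustration, and `max_tps` is a hypothetical helper name.

```python
def max_tps(block_size_bytes: int, avg_tx_bytes: int, block_interval_s: float) -> float:
    # Transactions that fit in one block, spread over the time between blocks.
    return block_size_bytes / avg_tx_bytes / block_interval_s


BLOCK_SIZE = 1_000_000   # 1 MB block limit
BLOCK_INTERVAL = 600     # one block roughly every 10 minutes

low = max_tps(BLOCK_SIZE, avg_tx_bytes=500, block_interval_s=BLOCK_INTERVAL)
high = max_tps(BLOCK_SIZE, avg_tx_bytes=250, block_interval_s=BLOCK_INTERVAL)

print(f"{low:.1f} to {high:.1f} transactions per second")
```

Adding nodes does not change either input, which is why the ceiling stays where it is no matter how large the network grows.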

Regulatory Compliance: If the blockchain system is going to store sensitive or personal data, the way this storage, access, and retrieval of information is handled must align with regulations such as GDPR, HIPAA, etc.

Difficult to Integrate: Almost all applications used by businesses today are compatible with file-sharing protocols like CIFS/SMB and NFS. As a result, applications can read and write to any file server or NAS device that supports these standards. This is not the case with blockchain.

Cloud Object Storage

Object storage is a data storage architecture that manages data as objects, unlike other storage architectures such as file systems, which manage data as a hierarchy of files, and block storage, which manages data as blocks within sectors and tracks. Object storage treats each piece of information (the object) as a string of ones and zeros, which can be up to 5 TB in size on some services, with a series of tags or metadata, including a globally unique identifier, associated with that string. These objects are usually files, though they are not treated as such, and once stored, the bulk is referred to as "unstructured file data."

Simple, beautiful, and highly robust.

Given this simplicity, these objects are easily replicable across disk arrays, data centers, or locations, and because these objects have no interrelationships, the infrastructure supporting them is simple and highly scalable.
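As a mental model, an object store can be pictured as a flat map from identifiers to opaque payloads plus metadata. The following is an in-memory sketch only. The `StoredObject` and `ObjectStore` names and their methods are hypothetical and do not correspond to any cloud provider's API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class StoredObject:
    key: str        # globally unique identifier
    data: bytes     # opaque payload; the store never interprets it
    metadata: Dict[str, str] = field(default_factory=dict)


class ObjectStore:
    """Flat namespace: no directories and no links between objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, data: bytes, **metadata: str) -> str:
        key = uuid.uuid4().hex
        self._objects[key] = StoredObject(key, data, dict(metadata))
        return key

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def replicate_to(self, other: "ObjectStore") -> None:
        # Objects are independent of each other, so replication
        # is a plain copy per key with nothing to reconcile.
        other._objects.update(self._objects)


primary, replica = ObjectStore(), ObjectStore()
key = primary.put(b"quarterly report", content_type="text/plain")
primary.replicate_to(replica)
print(replica.get(key).metadata)
```

Because no object refers to another, copying the store to a second location needs no ordering or dependency tracking, which is the property that keeps the underlying infrastructure simple.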

Public cloud providers typically offer a range of object storage services designed for around 99.999999999% durability and 99.99% availability of objects over a given year. That’s huge. Generally, customers choose the replication and availability levels they want through different storage classes or tiers.
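To get a feel for those percentages, the arithmetic below reads durability as the yearly probability that a single object survives, and availability as the fraction of the year the service can be reached. That reading is the usual one but is an assumption here, and the 10 million object count is a hypothetical example.

```python
ANNUAL_LOSS_PROBABILITY = 1e-11   # 1 - 0.99999999999 (eleven nines of durability)
UNAVAILABLE_FRACTION = 1e-4       # 1 - 0.9999 (99.99% availability)

stored_objects = 10_000_000       # hypothetical number of stored objects

expected_losses_per_year = stored_objects * ANNUAL_LOSS_PROBABILITY
years_per_lost_object = 1 / expected_losses_per_year
downtime_minutes_per_year = UNAVAILABLE_FRACTION * 365 * 24 * 60

print(f"about {years_per_lost_object:,.0f} years per lost object")
print(f"about {downtime_minutes_per_year:.2f} minutes unavailable per year")
```

Under these assumptions, someone storing ten million objects would expect to lose one roughly every 10,000 years, and to find the service unreachable for a little under an hour per year.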

Additionally, they offer encryption of data at rest with keys provided by them, by the customer, or even allow the customer to encrypt the data before uploading it and then re-encrypt it once it is there.

For our purposes, this gives us a robust backend for raw data storage with high replication and confidentiality levels, but we still have problems to solve: how do we efficiently handle all this information? How do we get the basic functionalities of a conventional file system? How do we make the information easily integrable with existing applications or other file systems?

Let’s add a bit of sugar on top.

Nasuni

Nasuni® Cloud File Services™ is based on UniFS®, downloadable software that is the first global file system designed for modern cloud object storage. It lets users reach their external file storage providers from anywhere, quickly and securely. Through Nasuni Cloud File Services, organizations can store, protect, synchronize, and collaborate on unstructured file data, from active data to inactive data, across all locations.

UniFS is designed around WORM principles and never overwrites an object once it is written. This means that files in the file system remain immutable to UniFS. This also applies to file versions: each file change is timestamped and assigned its own object to provide complete data protection, eliminating the need for separate backup tools or file replication processes.
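The write-once idea can be modeled in a few lines. To be clear, this is not how UniFS is implemented internally. It is a hypothetical in-memory sketch of the principle, with invented names (`WormStore`, `write`, `read`), and it assumes versions of a path are written in increasing timestamp order.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class WormStore:
    """Write once, read many: every file version becomes a new object."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._versions: Dict[str, List[Tuple[float, str]]] = {}

    def write(self, path: str, content: bytes, timestamp: float) -> str:
        object_id = hashlib.sha256(
            f"{path}:{timestamp}:".encode() + content
        ).hexdigest()
        if object_id in self._objects:
            # Existing objects are never overwritten.
            raise ValueError("objects are write-once")
        self._objects[object_id] = content
        self._versions.setdefault(path, []).append((timestamp, object_id))
        return object_id

    def read(self, path: str, at: Optional[float] = None) -> bytes:
        # Latest version by default, or the latest one written up to `at`.
        ids = [oid for ts, oid in self._versions[path] if at is None or ts <= at]
        if not ids:
            raise KeyError(f"no version of {path} at {at}")
        return self._objects[ids[-1]]


store = WormStore()
store.write("/reports/q1.txt", b"draft", timestamp=1.0)
store.write("/reports/q1.txt", b"final", timestamp=2.0)
print(store.read("/reports/q1.txt"), store.read("/reports/q1.txt", at=1.5))
```

Editing a file never touches the earlier object. It only appends a new entry to the path's history, so any past version can still be read back by timestamp.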

Each node/device includes Nasuni Continuous (and infinite) File Versioning, high-performance cache software that periodically takes snapshots of the file system. This continuous snapshotting captures changes to the files as they occur and transmits only those changes to the third-party cloud storage system, so that system always contains the latest version of each client file. It also provides highly granular file-level data protection with better recovery points and recovery times than traditional file backup, removing the need for backup hardware, software, and maintenance.
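Sending only what changed between two snapshots can be sketched by comparing content hashes. This is a generic illustration of change detection, not Nasuni's actual snapshot format or protocol. The `take_snapshot`, `changed_paths`, and `deleted_paths` names are hypothetical, and a real system would likely track changes at a finer granularity than whole files.

```python
import hashlib
from typing import Dict, List

Snapshot = Dict[str, str]  # path -> SHA-256 of the file content


def take_snapshot(files: Dict[str, bytes]) -> Snapshot:
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_paths(previous: Snapshot, current: Snapshot) -> List[str]:
    # New files and files whose content hash differs must be uploaded.
    return sorted(p for p, h in current.items() if previous.get(p) != h)


def deleted_paths(previous: Snapshot, current: Snapshot) -> List[str]:
    return sorted(p for p in previous if p not in current)


before = take_snapshot({"a.txt": b"one", "b.txt": b"two"})
after = take_snapshot({"a.txt": b"one", "b.txt": b"two!", "c.txt": b"three"})

print(changed_paths(before, after), deleted_paths(before, after))
```

Only `b.txt` and `c.txt` would be transmitted in this example. The unchanged `a.txt` costs nothing beyond computing its hash.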

Each change in every file is securely transmitted to the native cloud file system, UniFS, which keeps the immutable record of each file version in public cloud object stores such as Azure or Amazon Web Services.

That said, comparing Nasuni to a blockchain:

Immutable information (state machine): With Nasuni, once data is written, it is never modified. Blockchain systems move from state N to state N+1 every time a block is mined. With Nasuni's continuous versioning, we can say that the file system moves from state N to state N+1 every time a snapshot is taken or a file is modified. All changes are recorded and leave a trace.
A high level of information replication: With Nasuni, there is a high level of information replication "by design," both in the cloud backend and in the different edge/cache nodes of local devices, maximizing both durability and availability. These replication levels are controllable both in the backend (via native object storage replication) and at the edge (by adding or removing edge cache nodes). Blockchain systems have a simpler architecture and store the information in each node. Many nodes are required to maintain consensus, so information is replicated at an extremely high level.
Fine-grained authentication and authorization: Nasuni volumes and actions can be integrated with directory services like Microsoft AD or OpenLDAP, allowing a wide range of authentication and authorization methods for accessing information. Blockchain systems use public keys as identities, meaning real identities are essentially hidden, and they have neither the concept of a shared space or "sharing" nor a way to manage authorization beyond having or not having write access. Each user has their own space, and things are passed from one to another.
Easy application integration: Nasuni shares information through common protocols like CIFS or NFS, making it easy to integrate with all existing applications. Blockchain systems require custom development to integrate with an application (so-called dApps).

Conclusion

At their core, blockchains are application-specific state machines that over-replicate their stored information to improve its availability and resilience. Because of this high level of replication, storing information tends to be very expensive or even impossible.

With Nasuni combined with cloud object storage, we can leverage the fundamental advantages that make blockchains so praised, while eliminating the drawbacks.

Perhaps not for electronic money systems, but certainly for information storage systems. And remember: today, information is even more valuable than money.

So it makes sense to store it in a vault.

Many industries with highly critical and regulated information processes, such as healthcare, pharmaceuticals, banking, automotive, infrastructure, or transportation, can benefit from a distributed, secure, and highly available information storage system, similar to a blockchain, that enables them to meet regulatory requirements and evolve in a sustainable and healthy way.


By Javier Jiménez

CEO & Founder