Ecosystem ・ December 20, 2022

Flow Ecosystem Priorities: Core Protocol

Flow Team

The Core Protocol working group is dedicated to the security, scalability, progressive decentralization, and maintenance of the protocol's subsystems.

It operates at the intersection of Byzantine fault-tolerant distributed systems, protocol economics, the Cadence smart contract language, and governance.

Performance and Scalability Improvements

One of the top priorities is to improve network throughput, measured in transactions per second (TPS), to better support the growth and adoption of applications with high transaction volume. While increasing throughput is critical, it is essential that the network remains decentralized and secure as this goal is pursued. In particular, performance gains must not raise the barrier to entry for node operators.

There is sufficient low-hanging fruit in the performance realm to expect 1000 TPS in the near future. The Core Protocol working group is focused on several workstreams:

Improvements to the execution state code (also known as the Ledger) to reduce memory usage and speed up operations and proof verification
Parallel execution of non-conflicting transactions, while keeping a deterministic and synchronous ordering of the block
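As a rough illustration of the second workstream, the sketch below groups the transactions of a block into batches that could run in parallel, while keeping the outcome identical to executing the block in order. It assumes that each transaction's read and write sets are known up front, which is a simplification; the `Tx` type, the register names, and `schedule_batches` are hypothetical and not part of the Flow code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    tx_id: str
    reads: frozenset   # registers the transaction reads (assumed known in advance)
    writes: frozenset  # registers the transaction writes (assumed known in advance)


def conflicts(a: Tx, b: Tx) -> bool:
    """Two transactions conflict if either one writes a register the other touches."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def schedule_batches(block: list) -> list:
    """Assign each transaction to the earliest batch that comes after every
    batch containing a conflicting earlier transaction.

    Transactions inside one batch never conflict, so they can run in parallel.
    Batches run one after another, so conflicting transactions keep their
    original block order and the result stays deterministic.
    """
    batches = []
    for tx in block:
        level = 0
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                level = i + 1
        if level == len(batches):
            batches.append([])
        batches[level].append(tx)
    return batches
```

With this greedy rule, two transactions that touch disjoint registers land in the same batch, while a transaction that reads a register written earlier in the block is pushed to a later batch.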

Another important priority for the core protocol is improving the scalability and performance of the execution storage. These workstreams are being prioritized to tackle the long-term state bloating by:

Reducing memory and disk usage by optimizing and compressing stored Cadence and FVM values and by improving the structure of the Merkle tree
Optimizing storage operation performance by minimizing the bytes read and written during a transaction, consolidating stored values based on their access patterns (e.g. improvements to the Atree)
Reducing proof sizes and memory usage by reducing the depth of the Merkle trees using alternative cryptographic solutions (e.g. vector commitments)
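To see why tree depth and the commitment scheme matter for proof sizes, here is a back-of-the-envelope calculation. In a hash-based Merkle tree with branching factor k, an inclusion proof carries k - 1 sibling hashes per level, so widening the tree alone makes proofs larger. With a vector commitment, each level needs only one constant-size opening. The 32-byte hash and 48-byte opening sizes are illustrative assumptions, not Flow parameters.

```python
def tree_depth(num_leaves: int, arity: int) -> int:
    """Smallest depth d such that arity**d >= num_leaves."""
    depth, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= arity
        depth += 1
    return depth


def merkle_proof_bytes(num_leaves: int, arity: int, hash_bytes: int = 32) -> int:
    """Hash-based tree: (arity - 1) sibling hashes at every level."""
    return tree_depth(num_leaves, arity) * (arity - 1) * hash_bytes


def vector_commitment_proof_bytes(num_leaves: int, arity: int, opening_bytes: int = 48) -> int:
    """Vector-commitment tree: one constant-size opening at every level
    (opening size is an assumed, illustrative value)."""
    return tree_depth(num_leaves, arity) * opening_bytes


if __name__ == "__main__":
    n = 2**32
    print(merkle_proof_bytes(n, arity=2))               # binary Merkle tree
    print(merkle_proof_bytes(n, arity=256))             # wide Merkle tree
    print(vector_commitment_proof_bytes(n, arity=256))  # wide tree with vector commitments
```

Under these assumptions, a binary tree over 2^32 leaves needs 1,024 bytes per proof, a 256-ary hash tree needs 32,640 bytes, and a 256-ary tree with vector commitments needs 192 bytes.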

Zero-downtime rolling upgrades of the protocol

Software upgrades and maintenance are an inevitable part of any decentralized service, and for Flow, these are achieved via “sporks.” A spork usually happens about once every two months, currently with a temporary downtime of roughly 90 minutes. Community node operators and the protocol engineering team are responsible for improving the network uptime, reducing spork maintenance windows and getting to the state of rolling upgrades with zero downtime.

The current goal is to move toward zero-downtime protocol upgrades whenever no data migration is required. Migrations are normally needed only when the data format has changed. Zero-downtime upgrades can be achieved through a height-coordinated protocol upgrade, similar to strategies used by other chains. In the near term, a spork may still be required until the height-coordinated upgrade process matures, or whenever the data storage format has been updated. Because of this, the current migration and sporking processes are also being optimized, and it should be possible to complete a spork within 30 minutes when it does not involve a data storage format upgrade. Reducing spork frequency to once per quarter, combined with frequent rolling upgrades, will make upgrades more seamless without compromising the pace of innovation.
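The idea behind a height-coordinated upgrade can be sketched as follows: all nodes agree in advance on a block height at which a new protocol version becomes active, and each node switches behavior exactly at that height. The `VersionBoundary` type and the helper functions below are hypothetical names used for illustration only; they are not Flow's actual upgrade mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionBoundary:
    activation_height: int  # first block height processed under the new version
    protocol_version: int


def version_at(height: int, boundaries: list, initial_version: int = 1) -> int:
    """Return the protocol version that applies to the block at `height`."""
    version = initial_version
    for boundary in sorted(boundaries, key=lambda b: b.activation_height):
        if height >= boundary.activation_height:
            version = boundary.protocol_version
    return version


def can_process(height: int, boundaries: list, supported_versions: set) -> bool:
    """A node that has not been upgraded must stop at the boundary rather
    than process blocks under rules it does not implement."""
    return version_at(height, boundaries) in supported_versions
```

Because the switch is tied to a block height rather than to wall-clock time, operators can roll out the new software one node at a time before the boundary, and the whole network changes rules at the same block.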

The Core Protocol working group is working on more concrete ideas for achieving zero-downtime state migrations and will update this page at an appropriate time.

Scalable peer-to-peer network for data retrievability

At the moment, Flow's core protocol does not specify any means for permanently storing all of the execution state changes that it commits over time. The amount of data is too large for this, and storing historical data would waste valuable on-chain storage. Introducing observer and archival nodes would allow users to extract data locally for archival and verification purposes. Still, there is a need for an economically viable way to access historical data without running a full archival node, which currently requires data-center-grade hardware. There are a few ideas about who could store the historical data, including:

Incentivized marketplaces in decentralized protocols that can provide historical data with merkle proofs
Managed service platforms like Coinbase Cloud, Blockdaemon, QuickNode, and Infura that can run archival nodes
Clients in a peer-to-peer (P2P) network that could store random portions of chain history. All clients in this voluntary P2P network provide the data and functionality necessary to expose the standard gRPC & REST API

The P2P network can be designed to ensure that clients participating in these networks can do so with minimal networking bandwidth, CPU, RAM, and HDD resources. In the short term, historical access can be provided by Dapper Labs’ archival nodes. However, in the mid to long term, serving historical access through other protocols or community incentives will be more efficient.
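One way to picture clients storing pseudo-random portions of chain history is rendezvous hashing: the history is cut into fixed-size chunks of block heights, and each chunk is assigned to the few clients whose hashed score for that chunk is highest. This is only a sketch of one possible scheme, not a proposed design; the chunk size, replica count, and function names are assumptions.

```python
import hashlib


def chunk_of(height: int, chunk_size: int = 10_000) -> int:
    """Map a block height to the index of the history chunk that contains it."""
    return height // chunk_size


def score(peer_id: str, chunk_index: int) -> int:
    """Deterministic pseudo-random score of a peer for a chunk."""
    digest = hashlib.sha256(f"{peer_id}:{chunk_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def holders(chunk_index: int, peers: list, replicas: int = 3) -> list:
    """Peers responsible for a chunk: the `replicas` highest-scoring peers."""
    ranked = sorted(peers, key=lambda p: score(p, chunk_index), reverse=True)
    return ranked[:replicas]
```

Any client can compute `holders(chunk_of(height), peers)` locally to find out whom to ask for a given block, and each client stores only a small fraction of the full history.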

We invite the community to propose additional features for Flow’s execution-state sync protocol. The goal is to enable valuable add-on services utilizing the execution state that are beyond the scope of the core protocol but require (small) additional protocol functionality to be implemented. The community can contribute to this workstream via the Flow developer grants program. If you would like to contribute, please get in touch.

Improve byzantine fault tolerance

Byzantine fault tolerance (BFT) refers to the network's ability to operate securely even in the presence of actively malicious (aka byzantine) nodes in a decentralized network, which is key to unlocking permissionless node operations. It requires all nodes to be resilient to any conceivable malicious action originating from the permissionless nodes. Attacks can be partitioned into three classes: message-level attacks (including impersonation or masquerade attacks), protocol-level attacks (multiple individually valid messages that together constitute a protocol violation), and spamming.

As a first step, a network-wide framework for mitigating impersonation or masquerade attacks will lay the foundation. A new BFT protocol for execution-state replication will complement the network's protection and close the remaining surface for protocol-level attacks by Access Nodes, Observer Nodes, and Archival Nodes, thereby enabling their permissionless operation.
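A classic example of a protocol-level attack is equivocation: a leader signs two different block proposals for the same view. Each proposal is valid on its own, and only the pair proves misbehavior. The sketch below shows the detection logic in its simplest form; the tuple layout and the function name are hypothetical, and signature checks are omitted.

```python
def find_equivocations(proposals: list) -> set:
    """Return the (proposer, view) pairs for which two different blocks were proposed.

    `proposals` is a list of (proposer_id, view, block_id) tuples that are
    assumed to have already passed individual validity checks.
    """
    first_seen = {}
    offenders = set()
    for proposer, view, block_id in proposals:
        key = (proposer, view)
        if key in first_seen and first_seen[key] != block_id:
            offenders.add(key)
        first_seen.setdefault(key, block_id)
    return offenders
```

Receiving the same proposal twice is harmless and is not flagged; only conflicting proposals for the same view are.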

In later steps, core contributors should focus on further hardening the execution-state replication protocol against large-scale spamming and DDoS attacks, thereby allowing us to further increase the number of permissionless Access, Observer, and Archival Nodes.

At this time, the Core Protocol working group is seeking peer reviews of the execution-state replication protocol, with a focus on spamming vulnerabilities.

Improve runtime resilience of consensus

Flow uses HotStuff, a leader-based consensus algorithm. Over the last two years, considerable research advancements have been made in this domain. Specifically, the Jolteon protocol (June 2021) improves on the original HotStuff (v6) in two important areas:

On the happy path, Jolteon requires only two additional rounds to finalize a block (versus three for HotStuff v6)
Jolteon incorporates a PaceMaker

Jolteon's PaceMaker uses dedicated messages for BFT view synchronization, which substantially improves the protocol's resilience to a broad class of failure scenarios, including successive leader failures, network partitions, and unfavorable bootstrapping conditions. Jolteon's key advancements have also been adopted by the Diem team (the project formerly known as Libra, initiated by Facebook), resulting in DiemBFT v4 (August 2021).
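The view-synchronization idea can be sketched in a few lines: each node broadcasts a timeout message when its current view makes no progress, and once timeouts from a supermajority of nodes have been collected for that view, they form a Timeout Certificate (TC) that lets every node move to the next view. The sketch below counts nodes equally and omits signatures and stake weighting; `TimeoutCollector` and `next_view` are hypothetical names, not Flow's implementation.

```python
class TimeoutCollector:
    """Collects timeout messages per view and reports when a quorum is reached."""

    def __init__(self, num_nodes: int):
        # More than two thirds of all nodes, i.e. 2f + 1 when n = 3f + 1.
        self.quorum = 2 * num_nodes // 3 + 1
        self.timeouts = {}  # view -> set of node ids that timed out in that view

    def add_timeout(self, view: int, node_id: str):
        """Record a timeout message. Returns a timeout certificate, modeled here
        as (view, frozenset of signers), once a quorum exists, else None."""
        signers = self.timeouts.setdefault(view, set())
        signers.add(node_id)  # duplicates from the same node count only once
        if len(signers) >= self.quorum:
            return (view, frozenset(signers))
        return None


def next_view(current_view: int, timeout_certificate) -> int:
    """A TC for view v allows a node to enter view v + 1; never move backwards."""
    tc_view, _signers = timeout_certificate
    return max(current_view, tc_view + 1)
```

Because the certificate itself can be forwarded, a node that was partitioned or just bootstrapped can catch up to the current view from a single message.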

After a comprehensive review of cutting-edge approaches, the Core Protocol team has decided to adopt Jolteon (with some minor revisions). Furthermore, the PaceMaker-generated Timeout Certificates are also a prerequisite for byzantine-resilient Epoch switchover in Flow. Research and implementation have been ongoing since the beginning of the year.

At this time, the group is inviting help to further modularize the Flow protocol code base. A standalone consensus implementation for research purposes that shares the code base with Flow's production implementation would be particularly helpful (see details here). Furthermore, we are grateful for contributions from the academic community regarding the optimal parametrization of the Jolteon PaceMaker (see details here).

Permissionless Node Operations

The goal is for all node types to become fully permissionless. Until then, the network can enable active ownership by an increasingly broad set of community node operators through a stepwise process of progressive decentralization. In short, we are committed to fully empowering any willing participant to contribute to the Flow ecosystem and benefit from their efforts. Progressive decentralization is a process in which we relinquish control by degrees over time. Taking a step-by-step approach allows us to stay focused and create a path toward a secure network.

With the introduction of the Observer Node and, soon, the Archival Node, the network unlocks participation opportunities for everyone without any staking. Having your own node means you do not have to rely on third parties for the state of the network. You may not reap the same financial gains as the fully staked nodes, but you may see other benefits such as privacy, security, balanced load distribution, reduced reliance on third-party servers, and network decentralization.

Permissionless operation of staked Access Nodes will require a new operator selection algorithm that is transparent to the community. Extensive research has gone into implementing byzantine fault tolerance (BFT) protection against attacks that may arise from a permissionless byzantine actor in the ecosystem. This workstream can unlock permissionless operations for other node types (Verification, Collection, Consensus) in the future.

At this time, the working group is seeking peer review and inviting bounty proposals for battle-testing the feature. If you want to know more, you are invited to get in touch!

Introduce a new staked node operator selection algorithm

Historically, to become a node operator, you needed to meet both the minimum staking requirement and the program requirements before being approved. While this manual process ensured the continued security of the network, the amount of $FLOW required to become a staked node operator was extremely high, which ruled out many aspiring operators. The selection process also lacked transparency to the public. Flow should introduce a node operator selection process that relinquishes control by degrees over time. This can be done in two steps:

The introduction of automated slot assignment (staking slots)
Reducing the minimum staking requirements

The idea of staking slots is to build an automated process for including new nodes in the staking table while managing the maximum number of nodes per node type. This process can be extended in the future with an auction mechanism for selecting node operators in a fully permissionless way.

The reduction of minimum staking requirements is still being researched to ensure that the stake remains high enough not to compromise the network's security.
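A minimal sketch of automated slot assignment might look like the following: for one node type, count the open slots below the maximum, filter the candidates that meet the minimum stake, and pick among them. Random selection among eligible candidates is an assumption made for this example, as are all names and numbers; the text above does not prescribe the selection rule.

```python
import random


def fill_slots(staked: list, candidates: dict, max_nodes: int,
               min_stake: int, rng: random.Random) -> list:
    """Choose which candidates join the staking table for one node type.

    staked:     ids of nodes already in the staking table
    candidates: candidate id -> committed stake
    max_nodes:  maximum number of nodes allowed for this node type
    rng:        source of randomness (on a real network this would have to be
                a value all nodes agree on; here it is a local generator)
    """
    open_slots = max(0, max_nodes - len(staked))
    eligible = [node_id for node_id, stake in sorted(candidates.items())
                if stake >= min_stake]
    return rng.sample(eligible, min(open_slots, len(eligible)))
```

For example, with two nodes already staked, a maximum of four, and three eligible candidates, the function returns two of the three candidates; with no open slots it returns an empty list.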

Support Massive State Sizes

Flow’s execution state is stored in a Merkle trie data structure. The current implementation in the Flow node software, called mtrie, resides fully in memory and is optimized for high-performance applications on Execution nodes. Since the execution state is very large, there is a tradeoff between performance and node hardware requirements. As a result, Execution nodes require large amounts of memory (512 GB of RAM) to hold the entire trie plus a small amount of history.

For other applications like Archival nodes, storage capacity and relaxed hardware requirements are more important than performance, so a scalable trie optimized for large data sets is desired. This would allow such nodes to run on consumer-grade hardware with large disks.

We would like to invite community contributions to this workstream via the Flow Developer Grants program.

Integration of Cryptography advancements

Currently, the core group is further hardening the implemented cryptographic primitives and protocols to improve the byzantine fault tolerance of the chain. Such improvements include integrating a proof of possession (PoP) of each node operator's BLS private key into the staking process. PoP is the defense chosen by the protocol to secure the multiple BLS signature aggregations and the BLS-based SPoCK scheme. BLS signatures and BLS-based threshold-signature implementations are continuously being updated to include the latest performance advancements and improve their standards compliance. The Pedersen-based distributed key generation of the random beacon is also being updated to be more resilient against malicious behavior.

Moreover, the protocol draws on a few additional areas of cryptography research to improve the chain's security and performance. One such area is the BLS-based Specialized Proof of Confidential Knowledge (SPoCK), which is being extended to support aggregation and to optimize the protocol's sealing mechanism.

Other possible research topics include cryptographic accumulators and vector commitments.

The current chair of the Core Protocol working group is Alex Hentschel, and the core contributors are Dapper Labs, Coinbase Cloud, NCC Group, Halborn, and Metrika. The working group collaborates in the Core Protocol GitHub and in the near future will host a public R&D meeting to discuss updates to workstreams, brainstorm across functions, and exchange feedback.