What is Solana?

Solana is an open source project implementing a new, high-performance, permissionless blockchain. Solana is also the name of a company headquartered in San Francisco that maintains the open source project.

About this Book

This book describes the Solana open source project, a blockchain built from the ground up for scale. The book covers why Solana is useful, how to use it, how it works, and why it will continue to work long after the company Solana closes its doors. The goal of the Solana architecture is to demonstrate there exists a set of software algorithms that, when used in combination to implement a blockchain, removes software as a performance bottleneck, allowing transaction throughput to scale proportionally with network bandwidth. The architecture goes on to satisfy all three desirable properties of a proper blockchain: it is scalable, secure and decentralized.

The architecture describes a theoretical upper bound of 710 thousand transactions per second (tps) on a standard gigabit network and 28.4 million tps on a 40 gigabit network. Furthermore, the architecture supports safe, concurrent execution of programs authored in general purpose programming languages such as C or Rust.

Disclaimer

All claims, content, designs, algorithms, estimates, roadmaps, specifications, and performance measurements described in this project are done with the author's best effort. It is up to the reader to check and validate their accuracy and truthfulness. Furthermore, nothing in this project constitutes a solicitation for investment.

History of the Solana Codebase

In November of 2017, Anatoly Yakovenko published a whitepaper describing Proof of History, a technique for keeping time between computers that do not trust one another. From Anatoly's previous experience designing distributed systems at Qualcomm, Mesosphere and Dropbox, he knew that a reliable clock makes network synchronization very simple. When synchronization is simple the resulting network can be blazing fast, bound only by network bandwidth.

Anatoly watched as blockchain systems without clocks, such as Bitcoin and Ethereum, struggled to scale beyond 15 transactions per second worldwide when centralized payment systems such as Visa required peaks of 65,000 tps. Without a clock, it was clear they'd never graduate to being the global payment system or global supercomputer most had dreamed them to be. When Anatoly solved the problem of getting computers that don't trust each other to agree on time, he knew he had the key to bring 40 years of distributed systems research to the world of blockchain. The resulting cluster wouldn't be just 10 times faster, or 100 times, or 1,000 times, but 10,000 times faster, right out of the gate!

Anatoly's implementation began in a private codebase written in the C programming language. Greg Fitzgerald, who had previously worked with Anatoly at semiconductor giant Qualcomm Incorporated, encouraged him to reimplement the project in the Rust programming language. Greg had worked on the LLVM compiler infrastructure, which underlies both the Clang C/C++ compiler and the Rust compiler. Greg claimed that the language's safety guarantees would improve software productivity and that its lack of a garbage collector would allow programs to perform as well as those written in C. Anatoly gave it a shot and just two weeks later had migrated his entire codebase to Rust. Sold. With plans to weave all the world's transactions together on a single, scalable blockchain, Anatoly called the project Loom.

On February 13th of 2018, Greg began prototyping the first open source implementation of Anatoly's whitepaper. The project was published to GitHub under the name Silk in the loomprotocol organization. On February 28th, Greg made his first release, demonstrating 10 thousand signed transactions could be verified and processed in just over half a second. Shortly after, another former Qualcomm cohort, Stephen Akridge, demonstrated throughput could be massively improved by offloading signature verification to graphics processors. Anatoly recruited Greg, Stephen and three others to co-found a company, then called Loom.

Around the same time, the Ethereum-based project Loom Network sprang up and many people were confused about whether they were the same project. The Loom team decided it would rebrand. They chose the name Solana, a nod to a small beach town north of San Diego called Solana Beach, where Anatoly, Greg and Stephen lived and surfed for three years while working at Qualcomm. On March 28th, the team created the Solana Labs GitHub organization and renamed Greg's prototype Silk to Solana.

In June of 2018, the team scaled up the technology to run on cloud-based networks and on July 19th, published a 50-node, permissioned, public testnet consistently supporting bursts of 250,000 transactions per second. In a later release in December, called v0.10 Pillbox, the team published a permissioned testnet running 150 nodes on a gigabit network and demonstrated soak tests processing an average of 200 thousand transactions per second with bursts over 500 thousand. The project was also extended to support on-chain programs written in the C programming language and run concurrently in a safe execution environment called BPF.

What is a Solana Cluster?

A cluster is a set of computers that work together and can be viewed from the outside as a single system. A Solana cluster is a set of independently owned computers working together (and sometimes against each other) to verify the output of untrusted, user-submitted programs. A Solana cluster can be utilized any time a user wants to preserve an immutable record of events in time or programmatic interpretations of those events. One use is to track which of the computers did meaningful work to keep the cluster running. Another use might be to track the possession of real-world assets. In each case, the cluster produces a record of events called the ledger. It will be preserved for the lifetime of the cluster. As long as someone somewhere in the world maintains a copy of the ledger, the output of its programs (which may contain a record of who possesses what) will forever be reproducible, independent of the organization that launched it.

What are Sols?

A sol is the name of Solana's native token, which can be passed to nodes in a Solana cluster in exchange for running an on-chain program or validating its output. The Solana protocol defines that only 1 billion sols will ever exist, but that the system may perform micropayments of fractional sols, and that a sol may be split as many as 34 times. The fractional sol is called a lamport. It is named in honor of Solana's biggest technical influence, Leslie Lamport. A lamport has a value of approximately 0.0000000000582 sol (2^-34).
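
Because the split is a power of two, the conversion is exact integer arithmetic. A minimal sketch in Rust (the constant and function names are illustrative, not part of any Solana API; only the 2^34 split comes from the protocol description above):

// Illustrative names; only the 2^34 split is from the protocol description.
const LAMPORTS_PER_SOL: u64 = 1 << 34; // 17,179,869,184 lamports per sol

fn sols_to_lamports(sols: f64) -> u64 {
    (sols * LAMPORTS_PER_SOL as f64) as u64
}

fn main() {
    println!("1 lamport = {} sol", 1.0 / LAMPORTS_PER_SOL as f64); // ~0.0000000000582
    println!("0.5 sol = {} lamports", sols_to_lamports(0.5)); // 8589934592
}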

Terminology

The following terms are used throughout this book.

account

A persistent file addressed by public key and with lamports tracking its lifetime.

app

A front-end application that interacts with a Solana cluster.

blob

A fraction of a block; the smallest unit sent between fullnodes.

block

A contiguous set of entries on the ledger covered by a vote. A leader produces at most one block per slot.

block height

The number of blocks beneath the current block. The first block after the genesis block has height one.

block id

The entry id of the last entry in a block.

bootstrap leader

The first fullnode to take the leader role.

CBC block

The smallest encrypted chunk of the ledger; an encrypted ledger segment is made of many CBC blocks (ledger_segment_size / cbc_block_size, to be exact).

client

A node that utilizes the cluster.

cluster

A set of fullnodes maintaining a single ledger.

confirmation

The wallclock duration between a leader creating a tick entry and recognizing a supermajority of ledger votes with a ledger interpretation that matches the leader's.

control plane

A gossip network connecting all nodes of a cluster.

data plane

A multicast network used to efficiently validate entries and gain consensus.

drone

An off-chain service that acts as a custodian for a user's private key. It typically serves to validate and sign transactions.

fake storage proof

A proof with the same format as a storage proof, but whose SHA state comes from hashing a known ledger value that the storage client can reveal, and which is also easily verifiable by the network on-chain.

entry

An entry on the ledger, either a tick or a transactions entry.

entry id

A globally unique identifier that is also a proof that the entry was generated after a duration of time, after all transactions included in the entry, and after all previous entries on the ledger. See Proof of History.

epoch

The time, i.e. number of slots, for which a leader schedule is valid.

finality

When nodes representing 2/3rd of the stake have a common root.

fork

A ledger derived from common entries but then diverged.

fullnode

A full participant in the cluster, either a leader or validator node.

fullnode state

The result of interpreting all programs on the ledger at a given tick height. It includes at least the set of all accounts holding nonzero native tokens.

genesis block

The configuration file that prepares the ledger for the first block.

hash

A digital fingerprint of a sequence of bytes.

instruction

The smallest unit of a program that a client can include in a transaction.

keypair

A public key and corresponding secret key.

lamport

A fractional native token with the value of approximately 0.0000000000582 sol (2^-34).

loader

A program with the ability to interpret the binary encoding of other on-chain programs.

leader

The role of a fullnode when it is appending entries to the ledger.

leader schedule

A sequence of fullnode public keys. The cluster uses the leader schedule to determine which fullnode is the leader at any moment in time.

ledger

A list of entries containing transactions signed by clients.

ledger segment

Portion of the ledger downloaded by the replicator, from which storage proof data is derived.

ledger vote

A hash of the fullnode's state at a given tick height. It comprises a validator's affirmation that a block it has received has been verified, as well as a promise not to vote for a conflicting block (i.e. fork) for a specific amount of time, the lockout period.

light client

A type of client that can verify it's pointing to a valid cluster. It performs more ledger verification than a thin client and less than a fullnode.

lockout

The duration of time for which a fullnode is unable to vote on another fork.

native token

The token used to track work done by nodes in a cluster.

node

A computer participating in a cluster.

node count

The number of fullnodes participating in a cluster.

PoH

See Proof of History.

program

The code that interprets instructions.

program id

The public key of the account containing a program.

Proof of History

A stack of proofs, each of which proves that some data existed before the proof was created and that a precise duration of time passed since the previous proof. Like a VDF, a Proof of History can be verified in less time than it took to produce.

public key

The public key of a keypair.

replicator

A storage mining client; it stores some part of the ledger, enumerated in blocks, and submits storage proofs to the chain. Not a fullnode.

root

A block or slot that has reached maximum lockout on a validator. The root is the highest block that is an ancestor of all active forks on a validator. All ancestor blocks of a root are also transitively a root. Blocks that are not an ancestor and not a descendant of the root are excluded from consideration for consensus and can be discarded.

runtime

The component of a fullnode responsible for program execution.

secret key

The private key of a keypair.

slot

The period of time for which a leader ingests transactions and produces a block.

sol

The native token tracked by a cluster recognized by the company Solana.

stake

Tokens forfeited to the cluster if malicious fullnode behavior can be proven.

storage proof

A set of SHA hash states constructed by sampling the encrypted version of the stored ledger segment at certain offsets.

storage proof challenge

A transaction from a replicator that verifiably proves that a validator confirmed a fake proof.

storage proof claim

A transaction from a validator, submitted after the timeout period following a storage proof confirmation during which no successful challenges have been observed, that rewards the parties to the storage proofs and confirmations.

storage proof confirmation

A transaction from a validator indicating the set of real and fake proofs submitted by a replicator. The transaction contains a list of proof hash values and a bit indicating whether each hash is valid or fake.

storage validation capacity

The number of keys and samples that a validator can verify each storage epoch.

thin client

A type of client that trusts it is communicating with a valid cluster.

tick

A ledger entry that estimates wallclock duration.

tick height

The Nth tick in the ledger.

token

A scarce, fungible member of a set of tokens.

tps

Transactions per second.

transaction

One or more instructions signed by the client and executed atomically.

transactions entry

A set of transactions that may be executed in parallel.

validator

The role of a fullnode when it is validating the leader's latest entries.

VDF

See verifiable delay function.

verifiable delay function

A function that takes a fixed amount of time to execute and produces a proof that it ran, which can then be verified in less time than it took to produce.

vote

See ledger vote.

Getting Started

The Solana git repository contains all the scripts you might need to spin up your own local testnet. Depending on what you're looking to achieve, you may want to run a different variation, as the full-fledged, performance-enhanced multinode testnet is considerably more complex to set up than a Rust-only, singlenode testnet. If you are looking to develop high-level features, such as experimenting with smart contracts, save yourself some setup headaches and stick to the Rust-only singlenode demo. If you're doing performance optimization of the transaction pipeline, consider the enhanced singlenode demo. If you're doing consensus work, you'll need at least a Rust-only multinode demo. If you want to reproduce our TPS metrics, run the enhanced multinode demo.

For all four variations, you'd need the latest Rust toolchain and the Solana source code:

First, install Rust's package manager Cargo.

$ curl https://sh.rustup.rs -sSf | sh
$ source $HOME/.cargo/env

Now check out the code from GitHub:

$ git clone https://github.com/solana-labs/solana.git
$ cd solana

The demo code is sometimes broken between releases as we add new low-level features, so if this is your first time running the demo, you'll improve your odds of success if you check out the latest release before proceeding:

$ TAG=$(git describe --tags $(git rev-list --tags --max-count=1))
$ git checkout $TAG

Configuration Setup

Ensure important programs, such as the vote program, are built before any nodes are started:

$ cargo build --all

The network is initialized with a genesis ledger generated by running the following script.

$ ./multinode-demo/setup.sh

Drone

In order for the fullnodes and clients to work, we'll need to spin up a drone to give out some test tokens. The drone delivers Milton Friedman-style "air drops" (free tokens to requesting clients) to be used in test transactions.

Start the drone with:

$ ./multinode-demo/drone.sh

Singlenode Testnet

Before you start a validator, make sure you know the IP address of the machine you want to be the bootstrap leader for the demo, and make sure that UDP ports 8000-10000 are open on all the machines you want to test with.

Now start the bootstrap leader in a separate shell:

$ ./multinode-demo/bootstrap-leader.sh

Wait a few seconds for the server to initialize. It will print "leader ready..." when it's ready to receive transactions. The leader will request some tokens from the drone if it doesn't have any. The drone does not need to be running for subsequent leader starts.

Multinode Testnet

To run a multinode testnet, after starting a leader node, spin up some additional validators in separate shells:

$ ./multinode-demo/validator-x.sh

To run a performance-enhanced fullnode on Linux, CUDA 10.0 must be installed on your system:

$ ./fetch-perf-libs.sh
$ SOLANA_CUDA=1 ./multinode-demo/bootstrap-leader.sh
$ SOLANA_CUDA=1 ./multinode-demo/validator.sh

Testnet Client Demo

Now that your singlenode or multinode testnet is up and running, let's send it some transactions!

In a separate shell start the client:

$ ./multinode-demo/client.sh # runs against localhost by default

What just happened? The client demo spins up several threads to send 500,000 transactions to the testnet as quickly as it can. The client then pings the testnet periodically to see how many transactions it processed in that time. Take note that the demo intentionally floods the network with UDP packets, such that the network will almost certainly drop a bunch of them. This ensures the testnet has an opportunity to reach 710k TPS. The client demo completes after it has convinced itself the testnet won't process any additional transactions. You should see several TPS measurements printed to the screen. In the multinode variation, you'll see TPS measurements for each validator node as well.

Testnet Debugging

There are some useful debug messages in the code; you can enable them on a per-module and per-level basis. Before running a leader or validator, set the normal RUST_LOG environment variable.

For example:

  • To enable info everywhere and debug only in the solana::banking_stage module:

    $ export RUST_LOG=solana=info,solana::banking_stage=debug
    
  • To enable BPF program logging:

    $ export RUST_LOG=solana_bpf_loader=trace
    

Generally we are using debug for infrequent debug messages, trace for potentially frequent messages and info for performance-related logging.

You can also attach to a running process with GDB. The leader's process is named solana-validator:

$ sudo gdb
(gdb) attach <PID>
(gdb) set logging on
(gdb) thread apply all bt

This will dump all the threads' stack traces into gdb.txt.

Public Testnet

In this example the client connects to our public testnet. To run validators on the testnet you would need to open UDP ports 8000-10000.

$ ./multinode-demo/client.sh --entrypoint testnet.solana.com:8001 --drone testnet.solana.com:9900 --duration 60 --tx_count 50

You can observe the effects of your client's transactions on our dashboard.

Testnet Participation

This document describes how to participate in the testnet as a validator node.

Please note some of the information and instructions described here may change in future releases.

Overview

The testnet features a validator running at testnet.solana.com, which serves as the entrypoint to the cluster for your validator.

Additionally there is a blockexplorer available at http://testnet.solana.com/.

The testnet is configured to reset the ledger daily, or sooner should the hourly automated cluster sanity test fail.

There is a #validator-support Discord channel available to reach other testnet participants, https://discord.gg/pquxPsq.

Also we'd love it if you choose to register your validator node with us at https://forms.gle/LfFscZqJELbuUP139.

Machine Requirements

Since the testnet is not intended for stress testing of max transaction throughput, a higher-end machine with a GPU is not necessary to participate.

However, ensure the machine used is not behind a residential NAT to avoid NAT traversal issues. A cloud-hosted machine works best. Ensure that IP ports 8000 through 10000 are not blocked for Internet inbound and outbound traffic.

Prebuilt binaries are available for Linux x86_64 (Ubuntu 18.04 recommended). macOS or WSL users may build from source.

For a performance testnet with many transactions we have some preliminary recommended setups:

  • CPU: AMD Threadripper 1900x (low end), AMD Threadripper 2920x (medium end), AMD Threadripper 2950x (high end). Consider a 10Gb-capable motherboard with as many PCIe lanes and m.2 slots as possible.
  • RAM: 16GB (low end), 32GB (medium end), 64GB (high end).
  • OS Drive: Samsung 860 Evo 2TB (low end), Samsung 860 Evo 4TB (medium and high end), or an equivalent SSD.
  • Accounts Drive(s): none (low end), Samsung 970 Pro 1TB (medium end), 2x Samsung 970 Pro 1TB (high end).
  • GPU: 4x Nvidia 1070, 2x Nvidia 1080 Ti, or 2x Nvidia 2070 (low end); 2x Nvidia 2080 Ti (medium end); 4x Nvidia 2080 Ti (high end). Any number of CUDA-capable GPUs are supported on Linux platforms.

GPU Requirements

CUDA is required to make use of the GPU on your system. The provided Solana release binaries are built on Ubuntu 18.04 with CUDA Toolkit 10.1 update 1. If your machine is using a different CUDA version then you will need to rebuild from source.

Confirm The Testnet Is Reachable

Before attaching a validator node, sanity check that the cluster is accessible to your machine by running some simple commands. If any of the commands fail, please retry 5-10 minutes later to confirm the testnet is not just restarting itself before debugging further.

Fetch the current transaction count over JSON RPC:

$ curl -X POST -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","id":1, "method":"getTransactionCount"}' http://testnet.solana.com:8899

Inspect the blockexplorer at http://testnet.solana.com/ for activity.

View the metrics dashboard for more detail on cluster activity.

Validator Setup

Obtaining The Software

Bootstrap with solana-install

The solana-install tool can be used to easily install and upgrade the cluster software on Linux x86_64 and macOS systems.

$ curl -sSf https://raw.githubusercontent.com/solana-labs/solana/v0.17.0/install/solana-install-init.sh | sh -s

Alternatively build the solana-install program from source and run the following command to obtain the same result:

$ solana-install init

After a successful install, solana-install update may be used to easily update the cluster software to a newer version at any time.

Download Prebuilt Binaries

If you would rather not use solana-install to manage the install, you can manually download and install the binaries.

Linux

Navigate to https://github.com/solana-labs/solana/releases/latest, download solana-release-x86_64-unknown-linux-gnu.tar.bz2, then extract the archive:

$ tar jxf solana-release-x86_64-unknown-linux-gnu.tar.bz2
$ cd solana-release/
$ export PATH=$PWD/bin:$PATH

macOS

Navigate to https://github.com/solana-labs/solana/releases/latest, download solana-release-x86_64-apple-darwin.tar.bz2, then extract the archive:

$ tar jxf solana-release-x86_64-apple-darwin.tar.bz2
$ cd solana-release/
$ export PATH=$PWD/bin:$PATH

Build From Source

If you are unable to use the prebuilt binaries or prefer to build it yourself from source, navigate to https://github.com/solana-labs/solana/releases/latest, and download the Source Code archive. Extract the code and build the binaries with:

$ ./scripts/cargo-install-all.sh .
$ export PATH=$PWD/bin:$PATH

If building for CUDA (Linux only), fetch the perf-libs first then include the cuda feature flag when building:

$ ./fetch-perf-libs.sh
$ source ./target/perf-libs/env.sh
$ ./scripts/cargo-install-all.sh . cuda
$ export PATH=$PWD/bin:$PATH

Starting The Validator

Sanity check that you are able to interact with the cluster by receiving a small airdrop of lamports from the testnet drone:

$ solana-wallet airdrop 123
$ solana-wallet balance

Also try running the following command to join the gossip network and view all the other nodes in the cluster:

$ solana-gossip --entrypoint testnet.solana.com:8001 spy
# Press ^C to exit

Now configure a key pair for your validator by running:

$ solana-keygen new -o ~/validator-keypair.json

Then use one of the following commands, depending on your installation choice, to start the node:

If this is a solana-install-installation:

$ validator.sh --identity ~/validator-keypair.json --config-dir ~/validator-config --rpc-port 8899 --poll-for-new-genesis-block testnet.solana.com

Alternatively, the solana-install run command can be used to run the validator node while periodically checking for and applying software updates:

$ solana-install run validator.sh -- --identity ~/validator-keypair.json --config-dir ~/validator-config --rpc-port 8899 --poll-for-new-genesis-block testnet.solana.com

If you built from source:

$ NDEBUG=1 USE_INSTALL=1 ./multinode-demo/validator.sh --identity ~/validator-keypair.json --rpc-port 8899 --poll-for-new-genesis-block testnet.solana.com

Enabling CUDA

By default CUDA is disabled. If your machine has a GPU with CUDA installed, define the SOLANA_CUDA flag in your environment before running any of the previously mentioned commands:

$ export SOLANA_CUDA=1

When your validator is started look for the following log message to indicate that CUDA is enabled: "[<timestamp> solana::validator] CUDA is enabled"

Controlling local network port allocation

By default the validator will dynamically select available network ports in the 8000-10000 range; this may be overridden with --dynamic-port-range. For example, validator.sh --dynamic-port-range 11000-11010 ... will restrict the validator to ports 11000-11010.

Validator Monitoring

When validator.sh starts, it will output a validator configuration that looks similar to:

======================[ validator configuration ]======================
identity pubkey: 4ceWXsL3UJvn7NYZiRkw7NsryMpviaKBDYr8GK7J61Dm
vote pubkey: 2ozWvfaXQd1X6uKh8jERoRGApDqSqcEy6fF1oN13LL2G
ledger: ...
accounts: ...
======================================================================

The identity pubkey for your validator can also be found by running:

$ solana-keygen pubkey ~/validator-keypair.json

From another console, confirm the IP address and identity pubkey of your validator are visible in the gossip network by running:

$ solana-gossip --entrypoint testnet.solana.com:8001 spy

Provide the vote pubkey to the solana-wallet show-vote-account command to view the recent voting activity from your validator:

$ solana-wallet show-vote-account 2ozWvfaXQd1X6uKh8jERoRGApDqSqcEy6fF1oN13LL2G

The vote pubkey for the validator can also be found by running:

# If this is a `solana-install`-installation run:
$ solana-keygen pubkey ~/.local/share/solana/install/active_release/config-local/validator-vote-keypair.json
# Otherwise run:
$ solana-keygen pubkey ./config-local/validator-vote-keypair.json

Validator Metrics

Metrics are available for local monitoring of your validator.

Docker must be installed and the current user added to the docker group. Then download solana-metrics.tar.bz2 from the GitHub release and run:

$ tar jxf solana-metrics.tar.bz2
$ cd solana-metrics/
$ ./start.sh

A local InfluxDB and Grafana instance is now running on your machine. Define SOLANA_METRICS_CONFIG in your environment as described at the end of the start.sh output and restart your validator.

Metrics should now be streaming and visible from your local Grafana dashboard.

Timezone For Log Messages

Log messages emitted by your validator include a timestamp. When sharing logs with others to help triage issues, that timestamp can cause confusion as it does not contain timezone information.

To make it easier to compare logs between different sources we request that everybody use Pacific Time on their validator nodes. In Linux this can be accomplished by running:

$ sudo ln -sf /usr/share/zoneinfo/America/Los_Angeles /etc/localtime

Publishing Validator Info

You can publish your validator information to the chain to be publicly visible to other users.

Run the solana-validator-info CLI to populate a validator-info account:

$ solana-validator-info publish ~/validator-keypair.json <VALIDATOR_NAME> <VALIDATOR_INFO_ARGS>

Optional fields for VALIDATOR_INFO_ARGS:

  • Website
  • Keybase Username
  • Details

Keybase

Including a Keybase username allows client applications (like the Solana Network Explorer) to automatically pull in your validator public profile, including cryptographic proofs, brand identity, etc. To connect your validator pubkey with Keybase:

  1. Join https://keybase.io/ and complete the profile for your validator
  2. Add your validator identity pubkey to Keybase:
  • Create an empty file on your local computer called validator-<PUBKEY>
  • In Keybase, navigate to the Files section, and upload your pubkey file to a solana subdirectory in your public folder: /keybase/public/<KEYBASE_USERNAME>/solana
  • To check your pubkey, ensure you can successfully browse to https://keybase.pub/<KEYBASE_USERNAME>/solana/validator-<PUBKEY>
  3. Add or update your solana-validator-info with your Keybase username. The CLI will verify the validator-<PUBKEY> file.

Testnet Replicator

This document describes how to set up a replicator on the testnet.

Please note some of the information and instructions described here may change in future releases.

Overview

Replicators are specialized light clients. They download a part of the ledger (a.k.a. a segment) and store it. They earn rewards for storing segments.

The testnet features a validator running at testnet.solana.com, which serves as the entrypoint to the cluster for your replicator node.

Additionally there is a blockexplorer available at http://testnet.solana.com/.

The testnet is configured to reset the ledger daily, or sooner should the hourly automated cluster sanity test fail.

Machine Requirements

Replicators don't need specialized hardware. Anything with more than 128GB of disk space will be able to participate in the cluster as a replicator node.

Currently the disk space requirements are very low but we expect them to change in the future.

Prebuilt binaries are available for Linux x86_64 (Ubuntu 18.04 recommended), macOS, and Windows.

Confirm The Testnet Is Reachable

Before starting a replicator node, sanity check that the cluster is accessible to your machine by running some simple commands. If any of the commands fail, please retry 5-10 minutes later to confirm the testnet is not just restarting itself before debugging further.

Fetch the current transaction count over JSON RPC:

$ curl -X POST -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","id":1, "method":"getTransactionCount"}' http://testnet.solana.com:8899

Inspect the blockexplorer at http://testnet.solana.com/ for activity.

View the metrics dashboard for more detail on cluster activity.

Replicator Setup

Obtaining The Software

Bootstrap with solana-install

The solana-install tool can be used to easily install and upgrade the cluster software.

Linux and macOS
$ curl -sSf https://raw.githubusercontent.com/solana-labs/solana/v0.17.0/install/solana-install-init.sh | sh -s

Alternatively build the solana-install program from source and run the following command to obtain the same result:

$ solana-install init

Windows

Download and install solana-install-init from https://github.com/solana-labs/solana/releases/latest

After a successful install, solana-install update may be used to easily update the software to a newer version at any time.

Download Prebuilt Binaries

If you would rather not use solana-install to manage the install, you can manually download and install the binaries.

Linux

Navigate to https://github.com/solana-labs/solana/releases/latest, download solana-release-x86_64-unknown-linux-gnu.tar.bz2, then extract the archive:

$ tar jxf solana-release-x86_64-unknown-linux-gnu.tar.bz2
$ cd solana-release/
$ export PATH=$PWD/bin:$PATH

macOS

Navigate to https://github.com/solana-labs/solana/releases/latest, download solana-release-x86_64-apple-darwin.tar.bz2, then extract the archive:

$ tar jxf solana-release-x86_64-apple-darwin.tar.bz2
$ cd solana-release/
$ export PATH=$PWD/bin:$PATH

Windows

Navigate to https://github.com/solana-labs/solana/releases/latest, download solana-release-x86_64-pc-windows-msvc.tar.bz2, then extract it into a folder. It is a good idea to add this extracted folder to your Windows PATH.

Starting The Replicator

Try running the following command to join the gossip network and view all the other nodes in the cluster:

$ solana-gossip --entrypoint testnet.solana.com:8001 spy
# Press ^C to exit

Now configure the keypairs for your replicator by running:

On Windows, first navigate to the solana install location and open a cmd prompt.

$ solana-keygen new -o replicator-keypair.json
$ solana-keygen new -o storage-keypair.json

Use solana-keygen to show the public keys for each of the keypairs; they will be needed in the next step:

  • Windows
# The replicator's identity
$ solana-keygen pubkey replicator-keypair.json
# The replicator's storage account identity
$ solana-keygen pubkey storage-keypair.json
  • Linux and macOS
$ export REPLICATOR_IDENTITY=$(solana-keygen pubkey replicator-keypair.json)
$ export STORAGE_IDENTITY=$(solana-keygen pubkey storage-keypair.json)

Then set up the storage accounts for your replicator by running:

$ solana-wallet --keypair replicator-keypair.json airdrop 100000
$ solana-wallet --keypair replicator-keypair.json create-replicator-storage-account $REPLICATOR_IDENTITY $STORAGE_IDENTITY

Note: Every time the testnet restarts, run the wallet steps to set up the replicator accounts again.

To start the replicator:

$ solana-replicator --entrypoint testnet.solana.com:8001 --identity replicator-keypair.json --storage-keypair storage-keypair.json --ledger replicator-ledger

Verify Replicator Setup

From another console, confirm the IP address and identity pubkey of your replicator are visible in the gossip network by running:

$ solana-gossip --entrypoint testnet.solana.com:8001 spy

Provide the storage account pubkey to the solana-wallet show-storage-account command to view the recent mining activity from your replicator:

$ solana-wallet --keypair storage-keypair.json show-storage-account $STORAGE_IDENTITY

Example app: Web Wallet

Build and run a web wallet locally

First fetch the example code:

$ git clone https://github.com/solana-labs/example-webwallet.git
$ cd example-webwallet
$ TAG=$(git describe --tags $(git rev-list --tags --max-count=1))
$ git checkout $TAG

Next, follow the steps in the git repository's README.

Programming Model

A client app interacts with a Solana cluster by sending it transactions with one or more instructions. The Solana runtime passes those instructions to user-contributed programs. An instruction might, for example, tell a program to transfer lamports from one account to another or create an interactive contract that governs how lamports are transferred. Instructions are executed atomically. If any instruction is invalid, any changes made within the transaction are discarded.
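
As a rough illustration of that atomicity rule, the following Rust sketch applies instructions to a working copy of account balances and commits only if every instruction succeeds. All types and names are toy stand-ins, not the Solana SDK:

use std::collections::HashMap;

// Toy model of atomic instruction execution; not the Solana SDK.
type Pubkey = String;

enum Instruction {
    Transfer { from: Pubkey, to: Pubkey, lamports: u64 },
}

fn apply(balances: &mut HashMap<Pubkey, u64>, ix: &Instruction) -> Result<(), String> {
    match ix {
        Instruction::Transfer { from, to, lamports } => {
            let src = balances.get_mut(from).ok_or("unknown account")?;
            if *src < *lamports {
                return Err("insufficient funds".into());
            }
            *src -= lamports;
            *balances.entry(to.clone()).or_insert(0) += lamports;
            Ok(())
        }
    }
}

// Execute all instructions against a working copy; commit only if all succeed.
fn process_transaction(
    balances: &mut HashMap<Pubkey, u64>,
    instructions: &[Instruction],
) -> Result<(), String> {
    let mut working = balances.clone();
    for ix in instructions {
        apply(&mut working, ix)?; // any failure discards all changes
    }
    *balances = working;
    Ok(())
}

fn main() {
    let mut balances: HashMap<Pubkey, u64> = HashMap::from([("alice".into(), 100)]);
    let ixs = vec![
        Instruction::Transfer { from: "alice".into(), to: "bob".into(), lamports: 40 },
        Instruction::Transfer { from: "alice".into(), to: "carol".into(), lamports: 100 },
    ];
    // The second instruction fails, so neither transfer is committed.
    assert!(process_transaction(&mut balances, &ixs).is_err());
    assert_eq!(balances["alice"], 100);
}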

Deploying Programs to a Cluster

[diagram: SDK tools]

As shown in the diagram above, a client creates a program, compiles it to an ELF shared object containing BPF bytecode, and sends it to the Solana cluster. The cluster stores the program locally and makes it available to clients via a program ID. The program ID is a public key generated by the client and is used to reference the program in subsequent transactions.

A program may be written in any programming language that can target the Berkeley Packet Filter (BPF) safe execution environment. The Solana SDK offers the best support for C programs, which are compiled to BPF using the LLVM compiler infrastructure.

Storing State between Transactions

If the program needs to store state between transactions, it does so using accounts. Accounts are similar to files in operating systems such as Linux. Like a file, an account may hold arbitrary data and that data persists beyond the lifetime of a program. Also like a file, an account includes metadata that tells the runtime who is allowed to access the data and how. Unlike a file, the account includes metadata for the lifetime of the account. That lifetime is expressed in "tokens", which is a number of fractional native tokens, called lamports. Accounts are held in validator memory and pay "rent" to stay there. Each fullnode periodically scans all accounts and collects rent. Any account that drops to zero lamports is purged.

If an account is marked "executable", it will only be used by a loader to run programs. For example, a BPF-compiled program is marked executable and loaded by the BPF loader. No program is allowed to modify the contents of an executable account.

An account also includes "owner" metadata. The owner is a program ID. The runtime grants the program write access to the account if its ID matches the owner. If an account is not owned by a program, the program is permitted to read its data and credit the account.
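
Pulling the above together, here is a sketch of the account metadata just described. The field names are chosen to mirror the prose; this is not the runtime's exact type definition:

// Sketch of the account metadata described above; illustrative, not the
// runtime's exact type.
pub struct Account {
    /// Lamports in the account; pays rent and expresses the account's lifetime.
    pub lamports: u64,
    /// Arbitrary data persisted between transactions.
    pub data: Vec<u8>,
    /// Program ID that owns this account; only the owner may modify its data.
    pub owner: [u8; 32],
    /// If true, the account holds a program and is only used by a loader.
    pub executable: bool,
}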

In the same way that a Linux user uses a path to look up a file, a Solana client uses public keys to look up accounts. To create an account, the client generates a keypair and registers its public key using the CreateAccount instruction. Once registered, transactions reference account keys to grant programs access to accounts. The runtime grants programs read access by default. To grant write access, the client must either assign the account to a program or sign the transaction using the keypair's secret key. Since only the holder of the secret key can produce valid signatures matching the account's public key, the runtime recognizes the signature as authorization to modify account data or debit the account.

After the runtime executes each of the transaction's instructions, it uses the account metadata and transaction signatures to verify that none of the access rules were violated. If a program violates an access rule, the runtime discards all account changes made by all instructions and marks the transaction as failed.

Example app: Tic-Tac-Toe

Click here to play Tic-Tac-Toe on the Solana testnet. Open the link and wait for another player to join, or open the link in a second browser tab to play against yourself. You will see that every move a player makes stores a transaction on the ledger.

Build and run Tic-Tac-Toe locally

First fetch the latest release of the example code:

$ git clone https://github.com/solana-labs/example-tictactoe.git
$ cd example-tictactoe
$ TAG=$(git describe --tags $(git rev-list --tags --max-count=1))
$ git checkout $TAG

Next, follow the steps in the git repository's README.

Getting lamports to users

You may have noticed you interacted with the Solana cluster without first needing to acquire lamports to pay transaction fees. Under the hood, the web app creates a new ephemeral identity and sends a request to an off-chain service for a signed transaction authorizing a user to start a new game. The service is called a drone. When the app sends the signed transaction to the Solana cluster, the drone's lamports are spent to pay the transaction fee and start the game. In a real world app, the drone might request the user watch an ad or pass a CAPTCHA before signing over its lamports.

Creating Signing Services with Drones

This chapter defines an off-chain service called a drone, which acts as custodian of a user's private key. In its simplest form, it can be used to create airdrop transactions, a token transfer from the drone's account to a client's account.

Signing Service

A drone is a simple signing service. It listens for requests to sign transaction data. Once received, the drone validates the request however it sees fit. It may, for example, only accept transaction data with a SystemInstruction::Transfer instruction transferring only up to a certain amount of tokens. If the drone accepts the transaction, it returns an Ok(Signature) where Signature is a signature of the transaction data using the drone's private key. If it rejects the transaction data, it returns a DroneError describing why.
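
A minimal sketch of that validate-then-sign flow, with illustrative types. The amount cap is one example policy, and the sign function is a stand-in for a real Ed25519 signing call:

// Toy sketch of a drone's request handler; all names are illustrative.
struct Signature([u8; 64]);

enum DroneError {
    AmountTooLarge { requested: u64, cap: u64 },
}

struct TransferRequest {
    lamports: u64,
    serialized_transaction: Vec<u8>,
}

const AIRDROP_CAP: u64 = 1_000; // example policy, not a protocol constant

fn sign(_key: &[u8], _data: &[u8]) -> Signature {
    // Stand-in: a real drone would sign with its Ed25519 private key here.
    Signature([0u8; 64])
}

fn handle_request(drone_key: &[u8], req: &TransferRequest) -> Result<Signature, DroneError> {
    // Validate however the drone sees fit; here, cap the transfer amount.
    if req.lamports > AIRDROP_CAP {
        return Err(DroneError::AmountTooLarge { requested: req.lamports, cap: AIRDROP_CAP });
    }
    Ok(sign(drone_key, &req.serialized_transaction))
}

fn main() {
    let drone_key = [0u8; 32];
    let req = TransferRequest { lamports: 500, serialized_transaction: vec![1, 2, 3] };
    assert!(handle_request(&drone_key, &req).is_ok());
}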

Examples

Granting access to an on-chain game

The creator of the on-chain game tic-tac-toe hosts a drone that responds to airdrop requests containing an InitGame instruction. The drone signs the transaction data in the request and returns it, thereby authorizing its account to pay the transaction fee as well as seeding the game's account with enough tokens to play it. The user then creates a transaction from its transaction data and the drone's signature and submits it to the Solana cluster. Each time the user interacts with the game, the game pays the user enough tokens to pay the next transaction fee to advance the game. At that point, the user may choose to keep the tokens instead of advancing the game. If the creator wants to defend against that case, they could require the user to return to the drone to sign each instruction.

Worldwide airdrop of a new token

The creator of a new on-chain token (ERC-20 interface) may wish to do a worldwide airdrop to distribute its tokens to millions of users over just a few seconds. Such a drone cannot spend resources interacting with the Solana cluster. Instead, the drone should only verify the client is unique and human, and then return the signature. It may also want to listen to the Solana cluster for recent entry IDs to support client retries and to ensure the airdrop is targeting the desired cluster.

Attack vectors

Invalid recent_blockhash

The drone may prefer its airdrops only target a particular Solana cluster. To do that, it listens to the cluster for new entry IDs and ensures any requests reference a recent one.

Note: listening for new entry IDs assumes the drone is either a fullnode or a light client. At the time of this writing, light clients have not been implemented and no proposal describes them. This document assumes one of the following approaches will be taken:

  1. Define and implement a light client
  2. Embed a fullnode
  3. Query the jsonrpc API for the latest last id at a rate slightly faster than ticks are produced.

Double spends

A client may request multiple airdrops before the first has been submitted to the ledger. The client may do this maliciously or simply because it thinks the first request was dropped. The drone should not simply query the cluster to ensure the client has not already received an airdrop. Instead, it should use recent_blockhash to ensure the previous request is expired before signing another. Note that the Solana cluster will reject any transaction with a recent_blockhash beyond a certain age.
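
One way to implement that rule is to remember the blockhash of each client's last airdrop and refuse to sign again until it has aged out. A sketch, assuming an illustrative age window measured in slots (the cluster's actual maximum blockhash age is not specified here):

use std::collections::HashMap;

// Sketch: track each client's last airdrop blockhash and refuse to sign
// again until it has aged out. The age window is illustrative.
type Pubkey = [u8; 32];
type Blockhash = [u8; 32];

const MAX_BLOCKHASH_AGE: u64 = 32; // assumed window, in slots

struct DoubleSpendGuard {
    last_airdrop: HashMap<Pubkey, (Blockhash, u64)>, // client -> (blockhash, slot seen)
}

impl DoubleSpendGuard {
    fn may_sign(&mut self, client: Pubkey, blockhash: Blockhash, current_slot: u64) -> bool {
        if let Some((_, slot)) = self.last_airdrop.get(&client) {
            if current_slot - slot < MAX_BLOCKHASH_AGE {
                return false; // previous request has not yet expired
            }
        }
        self.last_airdrop.insert(client, (blockhash, current_slot));
        true
    }
}

fn main() {
    let mut guard = DoubleSpendGuard { last_airdrop: HashMap::new() };
    let client = [1u8; 32];
    assert!(guard.may_sign(client, [7u8; 32], 100));
    assert!(!guard.may_sign(client, [8u8; 32], 110)); // too soon; not expired
    assert!(guard.may_sign(client, [9u8; 32], 140)); // old request has expired
}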

Denial of Service

If the transaction data size is smaller than the size of the returned signature (or descriptive error), a single client can flood the network. Considering that a simple Transfer operation requires two public keys (each 32 bytes) and a fee field, and that the returned signature is 64 bytes (and a byte to indicate Ok), consideration for this attack may not be required.

In the current design, the drone accepts TCP connections. This allows clients to DoS the service by simply opening lots of idle connections. Switching to UDP may be preferred. The transaction data will be smaller than a UDP packet since the transaction sent to the Solana cluster is already pinned to using UDP.

A Solana Cluster

A Solana cluster is a set of fullnodes working together to serve client transactions and maintain the integrity of the ledger. Many clusters may coexist. When two clusters share a common genesis block, they attempt to converge. Otherwise, they simply ignore the existence of the other. Transactions sent to the wrong one are quietly rejected. In this chapter, we'll discuss how a cluster is created, how nodes join the cluster, how they share the ledger, how they ensure the ledger is replicated, and how they cope with buggy and malicious nodes.

Creating a Cluster

Before starting any fullnodes, one first needs to create a genesis block. The block contains entries referencing two public keys, a mint and a bootstrap leader. The fullnode holding the bootstrap leader's secret key is responsible for appending the first entries to the ledger. It initializes its internal state with the mint's account. That account will hold the number of native tokens defined by the genesis block. The second fullnode then contacts the bootstrap leader to register as a validator or replicator. Additional fullnodes then register with any registered member of the cluster.

A validator receives all entries from the leader and submits votes confirming those entries are valid. After voting, the validator is expected to store those entries until replicator nodes submit proofs that they have stored copies of them. Once the validator observes a sufficient number of copies exist, it deletes its copy.

Joining a Cluster

Validators and replicators enter the cluster via registration messages sent to its control plane. The control plane is implemented using a gossip protocol, meaning that a node may register with any existing node, and expect its registration to propagate to all nodes in the cluster. The time it takes for all nodes to synchronize is proportional to the square of the number of nodes participating in the cluster. Algorithmically, that's considered very slow, but in exchange for that time, a node is assured that it eventually has all the same information as every other node, and that that information cannot be censored by any one node.

Sending Transactions to a Cluster

Clients send transactions to any fullnode's Transaction Processing Unit (TPU) port. If the node is in the validator role, it forwards the transaction to the designated leader. If in the leader role, the node bundles incoming transactions, timestamps them creating an entry, and pushes them onto the cluster's data plane. Once on the data plane, the transactions are validated by validator nodes and replicated by replicator nodes, effectively appending them to the ledger.

Confirming Transactions

A Solana cluster is capable of subsecond confirmation for up to 150 nodes with plans to scale up to hundreds of thousands of nodes. Once fully implemented, confirmation times are expected to increase only with the logarithm of the number of validators, where the logarithm's base is very high. If the base is one thousand, for example, it means that for the first thousand nodes, confirmation will be the duration of three network hops plus the time it takes the slowest validator of a supermajority to vote. For the next million nodes, confirmation increases by only one network hop.

Solana defines confirmation as the duration of time from when the leader timestamps a new entry to the moment when it recognizes a supermajority of ledger votes.

A gossip network is much too slow to achieve subsecond confirmation once the network grows beyond a certain size. The time it takes to send messages to all nodes is proportional to the square of the number of nodes. If a blockchain wants to achieve low confirmation and attempts to do it using a gossip network, it will be forced to centralize to just a handful of nodes.

Scalable confirmation can be achieved using the following combination of techniques:

  1. Timestamp transactions with a VDF sample and sign the timestamp.
  2. Split the transactions into batches, send each to separate nodes and have each node share its batch with its peers.
  3. Repeat the previous step recursively until all nodes have all batches.

Solana rotates leaders at fixed intervals, called slots. Each leader may only produce entries during its allotted slot. The leader therefore timestamps transactions so that validators may look up the public key of the designated leader. The leader then signs the timestamp so that a validator may verify the signature, proving the signer is the owner of the designated leader's public key.

Next, transactions are broken into batches so that a node can send transactions to multiple parties without making multiple copies. If, for example, the leader needed to send 60 transactions to 6 nodes, it would break that collection of 60 into batches of 10 transactions and send one to each node. This allows the leader to put 60 transactions on the wire, not 60 transactions for each node. Each node then shares its batch with its peers. Once the node has collected all 6 batches, it reconstructs the original set of 60 transactions.
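
The batching step itself is a simple split; a sketch in Rust (function and names are illustrative):

// Sketch of the batching step: split 60 transactions into 6 batches of 10,
// one batch per peer. Names are illustrative.
fn split_into_batches<T: Clone>(transactions: &[T], num_peers: usize) -> Vec<Vec<T>> {
    let batch_size = (transactions.len() + num_peers - 1) / num_peers; // ceiling division
    transactions.chunks(batch_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let transactions: Vec<u32> = (0..60).collect();
    let batches = split_into_batches(&transactions, 6);
    assert_eq!(batches.len(), 6);
    assert!(batches.iter().all(|b| b.len() == 10));
}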

A batch of transactions can only be split so many times before it is so small that header information becomes the primary consumer of network bandwidth. At the time of this writing, the approach is scaling well up to about 150 validators. To scale up to hundreds of thousands of validators, each node can apply the same technique as the leader node to another set of nodes of equal size. We call the technique data plane fanout; learn more in the data plane fanout section.

Synchronization

Fast, reliable synchronization is the biggest reason Solana is able to achieve such high throughput. Traditional blockchains synchronize on large chunks of transactions called blocks. By synchronizing on blocks, a transaction cannot be processed until a duration called "block time" has passed. In Proof of Work consensus, these block times need to be very large (~10 minutes) to minimize the odds of multiple fullnodes producing a new valid block at the same time. There's no such constraint in Proof of Stake consensus, but without reliable timestamps, a fullnode cannot determine the order of incoming blocks. The popular workaround is to tag each block with a wallclock timestamp. Because of clock drift and variance in network latencies, the timestamp is only accurate within an hour or two. To work around the workaround, these systems lengthen block times to provide reasonable certainty that the median timestamp on each block is always increasing.

Solana takes a very different approach, which it calls Proof of History or PoH. Leader nodes "timestamp" blocks with cryptographic proofs that some duration of time has passed since the last proof. All data hashed into the proof most certainly occurred before the proof was generated. The node then shares the new block with validator nodes, which are able to verify those proofs. The blocks can arrive at validators in any order or even could be replayed years later. With such reliable synchronization guarantees, Solana is able to break blocks into smaller batches of transactions called entries. Entries are streamed to validators in realtime, before any notion of block consensus.
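
A sketch of such a hash chain using the sha2 crate: each state is the hash of the previous state, optionally mixed with observed data. It follows the description above but is not the production implementation:

use sha2::{Digest, Sha256};

// Sketch of a PoH-style hash chain; not the production code.
fn next_hash(prev: &[u8; 32], mixin: Option<&[u8]>) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(prev);
    if let Some(data) = mixin {
        hasher.update(data); // data hashed in must have existed before this proof
    }
    hasher.finalize().into()
}

fn main() {
    let mut state = [0u8; 32];
    // Tick: advance the clock with no data.
    state = next_hash(&state, None);
    // Record an event: mix a transaction digest into the chain.
    state = next_hash(&state, Some(b"transaction digest"));
    println!("{:x?}", state);
}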

Solana technically never sends a block, but uses the term to describe the sequence of entries that fullnodes vote on to achieve confirmation. In that way, Solana's confirmation times can be compared apples to apples to block-based systems. The current implementation sets block time to 800ms.

What's happening under the hood is that entries are streamed to validators as quickly as a leader node can batch a set of valid transactions into an entry. Validators process those entries long before it is time to vote on their validity. By processing the transactions optimistically, there is effectively no delay between the time the last entry is received and the time when the node can vote. In the event consensus is not achieved, a node simply rolls back its state. This optimistic processing technique was introduced in 1981 and called Optimistic Concurrency Control. It can be applied to blockchain architecture where a cluster votes on a hash that represents the full ledger up to some block height. In Solana, it is implemented trivially using the last entry's PoH hash.

Relationship to VDFs

The Proof of History technique was first described for use in blockchain by Solana in November of 2017. In June of the following year, a similar technique was described at Stanford and called a verifiable delay function or VDF.

A desirable property of a VDF is that verification time is very fast. Verifying Solana's delay function takes time proportional to the time it took to create it. Split over a 4000 core GPU, it is sufficiently fast for Solana's needs, but if you asked the authors of the paper cited above, they might tell you (and have) that Solana's approach is algorithmically slow and it shouldn't be called a VDF. We argue the term VDF should represent the category of verifiable delay functions and not just the subset with certain performance characteristics. Until that's resolved, Solana will likely continue using the term PoH for its application-specific VDF.

Another difference between PoH and VDFs is that a VDF is used only for tracking duration. PoH's hash chain, on the other hand, includes hashes of any data the application observed. That data is a double-edged sword. On one side, the data "proves history" - that the data most certainly existed before the hashes generated after it. On the other side, it means the application can manipulate the hash chain by changing when the data is hashed. The PoH chain therefore does not serve as a good source of randomness whereas a VDF without that data could. Solana's leader rotation algorithm, for example, is derived only from the VDF height and not its hash at that height.

Relationship to Consensus Mechanisms

Proof of History is not a consensus mechanism, but it is used to improve the performance of Solana's Proof of Stake consensus. It is also used to improve the performance of the data plane and replication protocols.

More on Proof of History

Leader Rotation

At any given moment, a cluster expects only one fullnode to produce ledger entries. By having only one leader at a time, all validators are able to replay identical copies of the ledger. The drawback of only one leader at a time, however, is that a malicious leader is capable of censoring votes and transactions. Since censoring cannot be distinguished from the network dropping packets, the cluster cannot simply elect a single node to hold the leader role indefinitely. Instead, the cluster minimizes the influence of a malicious leader by rotating which node takes the lead.

Each validator selects the expected leader using the same algorithm, described below. When the validator receives a new signed ledger entry, it can be certain that entry was produced by the expected leader. The order in which fullnodes are assigned slots is called a leader schedule.

Leader Schedule Rotation

A validator rejects blocks that are not signed by the slot leader. The list of identities of all slot leaders is called a leader schedule. The leader schedule is recomputed locally and periodically. It assigns slot leaders for a duration of time called an epoch. The schedule must be computed far in advance of the slots it assigns, such that the ledger state it uses to compute the schedule is finalized. That duration is called the leader schedule offset. Solana sets the offset to the duration of slots until the next epoch. That is, the leader schedule for an epoch is calculated from the ledger state at the start of the previous epoch. The offset of one epoch is fairly arbitrary and assumed to be sufficiently long such that all validators will have finalized their ledger state before the next schedule is generated. A cluster may choose to shorten the offset to reduce the time between stake changes and leader schedule updates.

While operating without partitions lasting longer than an epoch, the schedule only needs to be generated when the root fork crosses the epoch boundary. Since the schedule is for the next epoch, any new stakes committed to the root fork will not be active until the next epoch. The block used for generating the leader schedule is the first block to cross the epoch boundary.

Without a partition lasting longer than an epoch, the cluster will work as follows:

  1. A validator continuously updates its own root fork as it votes.

  2. The validator updates its leader schedule each time the slot height crosses an epoch boundary.

For example:

The epoch duration is 100 slots. The root fork is updated from a fork computed at slot height 99 to a fork computed at slot height 102. Forks with slots at heights 100 and 101 were skipped because of failures. The new leader schedule is computed using the fork at slot height 102. It is active from slot 200 until it is updated again.

No inconsistency can exist because every validator that is voting with the cluster has skipped 100 and 101 when its root passes 102. All validators, regardless of voting pattern, would be committing to a root that is either 102, or a descendant of 102.

Leader Schedule Rotation with Epoch-Sized Partitions

The duration of the leader schedule offset has a direct relationship to the likelihood of a cluster having an inconsistent view of the correct leader schedule.

Consider the following scenario:

Two partitions are generating half of the blocks each. Neither is coming to a definitive supermajority fork. Both will cross epochs 100 and 200 without actually committing to a root, and therefore without a cluster-wide commitment to a new leader schedule.

In this unstable scenario, multiple valid leader schedules exist.

  • A leader schedule is generated for every fork whose direct parent is in the previous epoch.

  • The leader schedule is valid after the start of the next epoch for descendant forks until it is updated.

Each partition's schedule will diverge once the partition lasts more than an epoch. For this reason, the epoch duration should be selected to be much larger than the slot time and the expected time for a fork to be committed to root.

After observing the cluster for a sufficient amount of time, the leader schedule offset can be selected based on the median partition duration and its standard deviation. For example, an offset longer than the median partition duration plus six standard deviations would reduce the likelihood of an inconsistent leader schedule in the cluster to 1 in 1 million.

Leader Schedule Generation at Genesis

The genesis block declares the first leader for the first epoch. This leader ends up scheduled for the first two epochs because the leader schedule is also generated at slot 0 for the next epoch. The length of the first two epochs can be specified in the genesis block as well. The minimum length of the first epochs must be greater than or equal to the maximum rollback depth as defined in Tower BFT.

Leader Schedule Generation Algorithm

The leader schedule is generated using a predefined seed. The process is as follows (a code sketch follows the list):

  1. Periodically use the PoH tick height (a monotonically increasing counter) to seed a stable pseudo-random algorithm.
  2. At that height, sample the bank for all the staked accounts with leader identities that have voted within a cluster-configured number of ticks. The sample is called the active set.
  3. Sort the active set by stake weight.
  4. Use the random seed to select nodes weighted by stake to create a stake-weighted ordering.
  5. This ordering becomes valid after a cluster-configured number of ticks.
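
As a minimal sketch of these five steps, the following Rust function derives a deterministic, stake-weighted ordering. The names (`leader_schedule`, `Stake`) and the toy xorshift PRNG are illustrative only; the real implementation uses a stronger generator seeded the same way.

```rust
/// Illustrative staked-node entry: (node identity, stake in lamports).
type Stake = (u64, u64);

/// Tiny deterministic PRNG, standing in for the real seeded generator.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Steps 1-5: derive a stake-weighted leader ordering from a PoH tick height.
fn leader_schedule(poh_tick_height: u64, mut active_set: Vec<Stake>, len: usize) -> Vec<u64> {
    // Step 3: sort by stake weight so every node derives the same ordering.
    active_set.sort_by(|a, b| b.1.cmp(&a.1));
    let total_stake: u64 = active_set.iter().map(|&(_, s)| s).sum();
    assert!(total_stake > 0, "sketch assumes a non-empty active set");

    // Step 1: the PoH tick height seeds the generator.
    let mut seed = poh_tick_height.max(1);
    (0..len)
        .map(|_| {
            // Step 4: pick a node with probability proportional to its stake.
            let mut target = xorshift(&mut seed) % total_stake;
            for &(id, stake) in &active_set {
                if target < stake {
                    return id;
                }
                target -= stake;
            }
            unreachable!("target < total_stake covers every entry")
        })
        .collect()
}
```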

Schedule Attack Vectors

Seed

The seed that is selected is predictable but unbiasable. There is no grinding attack to influence its outcome.

Active Set

A leader can bias the active set by censoring validator votes. Two possible ways exist for leaders to censor the active set:

  • Ignore votes from validators
  • Refuse to vote for blocks with votes from validators

To reduce the likelihood of censorship, the active set is calculated at the leader schedule offset boundary over an active set sampling duration. The active set sampling duration is long enough such that votes will have been collected by multiple leaders.

Staking

Leaders can censor new staking transactions or refuse to validate blocks with new stakes. This attack is similar to censorship of validator votes.

Validator operational key loss

Leaders and validators are expected to use ephemeral keys for operation, and stake owners authorize the validators to do work with their stake via delegation.

The cluster should be able to recover from the loss of all the ephemeral keys used by leaders and validators, which could occur through a common software vulnerability shared by all the nodes. Stake owners should be able to vote directly by co-signing a validator vote even though the stake is currently delegated to a validator.

Appending Entries

The lifetime of a leader schedule is called an epoch. The epoch is split into slots, where each slot has a duration of T PoH ticks.

A leader transmits entries during its slot. After T ticks, all the validators switch to the next scheduled leader. Validators must ignore entries sent outside a leader's assigned slot.

All T ticks must be observed by the next leader for it to build its own entries on. If entries are not observed (leader is down) or entries are invalid (leader is buggy or malicious), the next leader must produce ticks to fill the previous leader's slot. Note that the next leader should do repair requests in parallel, and postpone sending ticks until it is confident other validators also failed to observe the previous leader's entries. If a leader incorrectly builds on its own ticks, the leader following it must replace all its ticks.
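
As a trivial illustration of this slot arithmetic, the sketch below maps a PoH tick height to its slot and expected leader. `TICKS_PER_SLOT` is an illustrative stand-in for the cluster-configured T.

```rust
/// Illustrative stand-in for the cluster-configured slot duration T.
const TICKS_PER_SLOT: u64 = 8;

/// The slot a given PoH tick height falls in.
fn slot_for_tick(tick_height: u64) -> u64 {
    tick_height / TICKS_PER_SLOT
}

/// The expected leader at a tick height, given the epoch's leader schedule
/// (one identity per slot). Entries sent outside the assigned slot are ignored.
fn leader_for_tick(schedule: &[u64], tick_height: u64) -> u64 {
    schedule[slot_for_tick(tick_height) as usize % schedule.len()]
}
```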

Fork Generation

This chapter describes how forks naturally occur as a consequence of leader rotation.

Overview

Nodes take turns being leader and generating the PoH that encodes state changes. The cluster can tolerate loss of connection to any leader by synthesizing what the leader would have generated had it been connected but ingested no state changes. The possible number of forks is thereby limited to a "there/not-there" skip list of forks that may arise on leader rotation slot boundaries. At any given slot, only a single leader's transactions will be accepted.

Message Flow

  1. Transactions are ingested by the current leader.
  2. Leader filters valid transactions.
  3. Leader executes valid transactions updating its state.
  4. Leader packages transactions into entries based off its current PoH slot.
  5. Leader transmits the entries to validator nodes (in signed blobs)
    1. The PoH stream includes ticks: empty entries that indicate liveness of the leader and the passage of time on the cluster.
    2. A leader's stream begins with the tick entries necessary to complete the PoH back to the leader's most recently observed prior leader slot.
  6. Validators retransmit entries to peers in their set and to further downstream nodes.
  7. Validators validate the transactions and execute them on their state.
  8. Validators compute the hash of the state.
  9. At specific times, i.e. specific PoH tick counts, validators transmit votes to the leader.
    1. Votes are signatures of the hash of the computed state at that PoH tick count
    2. Votes are also propagated via gossip
  10. Leader executes the votes as any other transaction and broadcasts them to the cluster.
  11. Validators observe their votes and all the votes from the cluster.

Partitions, Forks

Forks can arise at PoH tick counts that correspond to a vote. The next leader may not have observed the last vote slot and may start its slot with generated virtual PoH entries. These empty ticks are generated by all nodes in the cluster at a cluster-configured rate of Z hashes per tick.

There are only two possible versions of the PoH during a voting slot: PoH with T ticks and entries generated by the current leader, or PoH with just ticks. The "just ticks" version of the PoH can be thought of as a virtual ledger, one that all nodes in the cluster can derive from the last tick in the previous slot.

Validators can ignore forks at other points (e.g. from the wrong leader), or slash the leader responsible for the fork.

Validators vote based on a greedy choice to maximize their reward described in Tower BFT.

Validator's View

Time Progression

The diagram below represents a validator's view of the PoH stream with possible forks over time. L1, L2, etc. are leader slots, and Es represent entries from that leader during that leader's slot. The xs represent ticks only, and time flows downwards in the diagram.

Fork generation

Note that an E appearing on 2 forks at the same slot is a slashable condition, so a validator observing E3 and E3' can slash L3 and safely choose x for that slot. Once a validator commits to a fork, other forks can be discarded below that tick count. For any slot, validators need only consider a single "has entries" chain or a "ticks only" chain to be proposed by a leader. But multiple virtual entries may overlap as they link back to a previous slot.

Time Division

It's useful to consider leader rotation over PoH tick count as time division of the job of encoding state for the cluster. The following table presents the above tree of forks as a time-divided ledger.

| leader slot      | L1 | L2 | L3 | L4 | L5 |
| ---------------- | -- | -- | -- | -- | -- |
| data             | E1 | E2 | E3 | E4 | E5 |
| ticks since prev |    |    |    | x  | xx |

Note that only data from leader L3 will be accepted during leader slot L3. Data from L3 may include "catchup" ticks back to a slot other than L2 if L3 did not observe L2's data. L4 and L5's transmissions include the "ticks to prev" PoH entries.

This arrangement of the network data streams permits nodes to save exactly this to the ledger for replay, restart, and checkpoints.

Leader's View

When a new leader begins a slot, it must first transmit any PoH (ticks) required to link the new slot with the most recently observed and voted slot. The fork the leader proposes would link the current slot to a previous fork that the leader has voted on with virtual ticks.

Managing Forks in the Ledger

The ledger is permitted to fork at slot boundaries. The resulting data structure forms a tree called a blocktree. When the fullnode interprets the blocktree, it must maintain state for each fork in the chain. We call each instance an active fork. It is the responsibility of a fullnode to weigh those forks, such that it may eventually select a fork.

A fullnode selects a fork by submitting a vote to a slot leader on that fork. The vote commits the fullnode for a duration of time called a lockout period. The fullnode is not permitted to vote on a different fork until that lockout period expires. Each subsequent vote on the same fork doubles the length of the lockout period. After some cluster-configured number of votes (currently 32), the length of the lockout period reaches what's called max lockout. Until the max lockout is reached, the fullnode has the option to wait until the lockout period is over and then vote on another fork. When it votes on another fork, it performs an operation called rollback, whereby the state rolls back in time to a shared checkpoint and then jumps forward to the tip of the fork that it just voted on. The maximum distance that a fork may roll back is called the rollback depth. Rollback depth is the number of votes required to achieve max lockout. Whenever a fullnode votes, any checkpoints beyond the rollback depth become unreachable. That is, there is no scenario in which the fullnode will need to roll back beyond the rollback depth. It therefore may safely prune unreachable forks and squash all checkpoints beyond the rollback depth into the root checkpoint.
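
The doubling rule can be stated compactly. This sketch assumes a first-vote lockout of 2 slots and the cluster-configured 32-vote maximum mentioned above; the names are illustrative.

```rust
/// Cluster-configured number of votes required to reach max lockout.
const MAX_LOCKOUT_HISTORY: u32 = 32;

/// Lockout, in slots, after `confirmations` consecutive votes on one fork.
/// Each vote doubles the previous lockout; 32 votes reach max lockout (2^32).
fn lockout_slots(confirmations: u32) -> u64 {
    2u64.pow(confirmations.min(MAX_LOCKOUT_HISTORY))
}
```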

Active Forks

An active fork is a sequence of checkpoints that has a length at least one longer than the rollback depth. The shortest fork will have a length exactly one longer than the rollback depth. For example:

Forks

The following sequences are active forks:

  • {4, 2, 1}
  • {5, 2, 1}
  • {6, 3, 1}
  • {7, 3, 1}

Pruning and Squashing

A fullnode may vote on any checkpoint in the tree. In the diagram above, that's every node except the leaves of the tree. After voting, the fullnode prunes nodes that fork from a distance farther than the rollback depth and then takes the opportunity to minimize its memory usage by squashing any nodes it can into the root.

Starting from the example above, with a rollback depth of 2, consider a vote on 5 versus a vote on 6. First, a vote on 5:

Forks after pruning

The new root is 2, and any active forks that are not descendants from 2 are pruned.

Alternatively, a vote on 6:

Forks

The tree remains with a root of 1, since the active fork starting at 6 is only 2 checkpoints from the root.

Turbine Block Propagation

A Solana cluster uses a multi-layer block propagation mechanism called Turbine to broadcast transaction blobs to all nodes with a minimal amount of duplicate messages. The cluster divides itself into small collections of nodes, called neighborhoods. Each node is responsible for sharing any data it receives with the other nodes in its neighborhood, as well as propagating the data on to a small set of nodes in other neighborhoods. This way each node only has to communicate with a small number of nodes.

During its slot, the leader node distributes blobs between the validator nodes in the first neighborhood (layer 0). Each validator shares its data within its neighborhood, but also retransmits the blobs to one node in some neighborhoods in the next layer (layer 1). The layer-1 nodes each share their data with their neighborhood peers, and retransmit to nodes in the next layer, etc, until all nodes in the cluster have received all the blobs.

Neighborhood Assignment - Weighted Selection

In order for data plane fanout to work, the entire cluster must agree on how the cluster is divided into neighborhoods. To achieve this, all the recognized validator nodes (the TVU peers) are sorted by stake and stored in a list. This list is then indexed in different ways to figure out neighborhood boundaries and retransmit peers. For example, the leader will simply select the first nodes to make up layer 0. These will automatically be the highest stake holders, allowing the heaviest votes to come back to the leader first. Layer-0 and lower-layer nodes use the same logic to find their neighbors and next layer peers.

To reduce the possibility of attack vectors, each blob is transmitted over a random tree of neighborhoods. Each node uses the same set of nodes representing the cluster. A random tree is generated from the set for each blob using randomness derived from the blob itself. Since the random seed is not known in advance, attacks that try to eclipse neighborhoods from certain leaders or blocks become very difficult, and should require almost complete control of the stake in the cluster.

Layer and Neighborhood Structure

The current leader makes its initial broadcasts to at most DATA_PLANE_FANOUT nodes. If this layer 0 is smaller than the number of nodes in the cluster, then the data plane fanout mechanism adds layers below. Subsequent layers follow these constraints to determine layer-capacity: Each neighborhood contains DATA_PLANE_FANOUT nodes. Layer-0 starts with 1 neighborhood with fanout nodes. The number of nodes in each additional layer grows by a factor of fanout.

As mentioned above, each node in a layer only has to broadcast its blobs to its neighbors and to exactly 1 node in some next-layer neighborhoods, instead of to every TVU peer in the cluster. A good way to think about this is, layer-0 starts with 1 neighborhood with fanout nodes, layer-1 adds "fanout" neighborhoods, each with fanout nodes and layer-2 will have fanout * number of nodes in layer-1 and so on.

This way each node only has to communicate with a maximum of 2 * DATA_PLANE_FANOUT - 1 nodes.
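
A sketch of that layout math follows, assuming an illustrative fanout value and a list already sorted by stake; `layer_and_neighborhood` is not the actual API.

```rust
/// Illustrative fanout; the real value is cluster-configured.
const DATA_PLANE_FANOUT: usize = 200;

/// Locate the (layer, neighborhood-within-layer) of the node at `index` in
/// the stake-sorted list. Layer 0 holds 1 neighborhood, layer 1 holds
/// DATA_PLANE_FANOUT neighborhoods, layer 2 holds DATA_PLANE_FANOUT^2, and
/// so on, each neighborhood holding DATA_PLANE_FANOUT nodes.
fn layer_and_neighborhood(index: usize) -> (usize, usize) {
    let mut layer = 0;
    let mut layer_start = 0; // index of the first node in the current layer
    let mut neighborhoods = 1; // neighborhoods in the current layer
    loop {
        let layer_size = neighborhoods * DATA_PLANE_FANOUT;
        if index < layer_start + layer_size {
            return (layer, (index - layer_start) / DATA_PLANE_FANOUT);
        }
        layer_start += layer_size;
        neighborhoods *= DATA_PLANE_FANOUT;
        layer += 1;
    }
}
```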

The following diagram shows how the Leader sends blobs with a Fanout of 2 to Neighborhood 0 in Layer 0 and how the nodes in Neighborhood 0 share their data with each other.

Leader sends blobs to Neighborhood 0 in Layer 0

The following diagram shows how Neighborhood 0 fans out to Neighborhoods 1 and 2.

Neighborhood 0 Fanout to Neighborhood 1 and 2

Finally, the following diagram shows a two layer cluster with a Fanout of 2.

Two layer cluster with a Fanout of 2

Configuration Values

DATA_PLANE_FANOUT - Determines the size of layer 0. Subsequent layers grow by a factor of DATA_PLANE_FANOUT. The number of nodes in a neighborhood is equal to the fanout value. Neighborhoods will fill to capacity before new ones are added, i.e. if a neighborhood isn't full, it must be the last one.

Currently, configuration is set when the cluster is launched. In the future, these parameters may be hosted on-chain, allowing modification on the fly as the cluster sizes change.

Neighborhoods

The following diagram shows how two neighborhoods in different layers interact. To cripple a neighborhood, enough nodes (erasure codes +1) from the neighborhood above need to fail. Since each neighborhood receives blobs from multiple nodes in a neighborhood in the upper layer, we'd need a big network failure in the upper layers to end up with incomplete data.

Inner workings of a neighborhood

Ledger Replication

At full capacity on a 1 Gbps network, Solana will generate 4 petabytes of data per year. To prevent the network from centralizing around validators that must store the full data set, this protocol proposes a way for mining nodes to provide storage capacity for pieces of the data.

The basic idea behind Proof of Replication is to encrypt a dataset with a public symmetric key using CBC encryption, then hash the encrypted dataset. The main problem with the naive approach is that a dishonest storage node can stream the encryption and delete the data as it's hashed. The simple solution is to periodically regenerate the hash based on a signed PoH value. This ensures that all the data is present during the generation of the proof, but it also requires validators to have the entirety of the encrypted data present for verification of every proof of every identity. So the space required to validate is number_of_proofs * data_size.

Optimization with PoH

Our improvement on this approach is to randomly sample the encrypted segments faster than it takes to encrypt, and record the hash of those samples into the PoH ledger. Thus the segments stay in the exact same order for every PoRep and verification can stream the data and verify all the proofs in a single batch. This way we can verify multiple proofs concurrently, each one on its own CUDA core. The total space required for verification is 1_ledger_segment + 2_cbc_blocks * number_of_identities with core count equal to number_of_identities. We use a 64-byte chacha CBC block size.

Network

Validators for PoRep are the same validators that are verifying transactions. If a replicator can prove that a validator verified a fake PoRep, then the validator will not receive a reward for that storage epoch.

Replicators are specialized light clients. They download a part of the ledger (a.k.a. a segment) and store it, and provide PoReps of storing the ledger. For each verified PoRep, replicators earn a reward of sol from the mining pool.

Constraints

We have the following constraints:

  • Verification requires generating the CBC blocks. That requires space for 2 blocks per identity and 1 CUDA core per identity for the same dataset. So identities should be batched together, with as many proofs for those identities as possible verified concurrently for the same dataset.
  • Validators will randomly sample the set of storage proofs to the set that they can handle, and only the creators of those chosen proofs will be rewarded. The validator can run a benchmark whenever its hardware configuration changes to determine what rate it can validate storage proofs.

Validation and Replication Protocol

Constants

  1. SLOTS_PER_SEGMENT: Number of slots in a segment of ledger data. The unit of storage for a replicator.
  2. NUM_KEY_ROTATION_SEGMENTS: Number of segments after which replicators regenerate their encryption keys and select a new dataset to store.
  3. NUM_STORAGE_PROOFS: Number of storage proofs required for a storage proof claim to be successfully rewarded.
  4. RATIO_OF_FAKE_PROOFS: Ratio of fake proofs to real proofs that a storage mining proof claim has to contain to be valid for a reward.
  5. NUM_STORAGE_SAMPLES: Number of samples required for a storage mining proof.
  6. NUM_CHACHA_ROUNDS: Number of encryption rounds performed to generate encrypted state.
  7. NUM_SLOTS_PER_TURN: Number of slots that define a single storage epoch or a "turn" of the PoRep game.

Validator behavior

  1. Validators join the network and begin looking for replicator accounts at each storage epoch/turn boundary.
  2. Every turn, Validators sign the PoH value at the boundary and use that signature to randomly pick proofs to verify from each storage account found in the turn boundary. This signed value is also submitted to the validator's storage account and will be used by replicators at a later stage to cross-verify.
  3. Every NUM_SLOTS_PER_TURN slots the validator advertises the PoH value. This value is also served to replicators via RPC interfaces.
  4. For a given turn N, all validations get locked out until turn N+3 (a gap of 2 turns/epochs), at which point all validations during that turn are available for reward collection.
  5. Any incorrect validations will be marked during the turn in between.

Replicator behavior

  1. Since a replicator is somewhat of a light client that does not download all the ledger data, it has to rely on other validators and replicators for information. Any given validator may or may not be malicious and give incorrect information, although there are no obvious attack vectors this could enable beyond causing the replicator to do extra wasted work. For many of the operations there are a number of options depending on how paranoid a replicator is:
    • (a) replicator can ask a validator
    • (b) replicator can ask multiple validators
    • (c) replicator can ask other replicators
    • (d) replicator can subscribe to the full transaction stream and generate the information itself (assuming the slot is recent enough)
    • (e) replicator can subscribe to an abbreviated transaction stream to generate the information itself (assuming the slot is recent enough)
  2. A replicator obtains the PoH hash corresponding to the last turn with its slot.
  3. The replicator signs the PoH hash with its keypair. That signature is the seed used to pick the segment to replicate and also the encryption key. The replicator mods the signature with the slot to get which segment to replicate.
  4. The replicator retrieves the ledger by asking peer validators and replicators. See 6.5.
  5. The replicator then encrypts that segment with the key with chacha algorithm in CBC mode with NUM_CHACHA_ROUNDS of encryption.
  6. The replicator initializes a chacha rng with a signed recent PoH value as the seed.
  7. The replicator generates NUM_STORAGE_SAMPLES samples in the range of the entry size and samples the encrypted segment with sha256 for 32 bytes at each offset value. Sampling the state should be faster than generating the encrypted segment. (Steps 6 and 7 are sketched in code after this list.)
  8. The replicator sends a PoRep proof transaction, containing its sha state at the end of the sampling operation, its seed, and the samples it used, to the current leader, and it is put onto the ledger.
  9. During a given turn the replicator should submit many proofs for the same segment and based on the RATIO_OF_FAKE_PROOFS some of those proofs must be fake.
  10. As the PoRep game enters the next turn, the replicator must submit a transaction with the mask of which proofs were fake during the last turn. This transaction will define the rewards for both replicators and validators.
  11. Finally, for a turn N, as the PoRep game enters turn N + 3, a replicator's proofs for turn N will be counted towards its rewards.
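
A minimal sketch of steps 6 and 7, written against the rand_chacha and sha2 crates; the constant is illustrative and error handling is omitted.

```rust
use rand::{Rng, SeedableRng};
use rand_chacha::ChaChaRng;
use sha2::{Digest, Sha256};

/// Illustrative value; the real NUM_STORAGE_SAMPLES is cluster-configured.
const NUM_STORAGE_SAMPLES: usize = 128;

/// Steps 6-7: seed a chacha rng from a signed PoH value, then fold
/// NUM_STORAGE_SAMPLES 32-byte reads of the encrypted segment into a single
/// running sha256 state. Assumes the segment is longer than 32 bytes.
fn sample_segment(signed_poh_value: [u8; 32], encrypted_segment: &[u8]) -> [u8; 32] {
    let mut rng = ChaChaRng::from_seed(signed_poh_value);
    let mut hasher = Sha256::new();
    for _ in 0..NUM_STORAGE_SAMPLES {
        let offset = rng.gen_range(0..encrypted_segment.len() - 32);
        hasher.update(&encrypted_segment[offset..offset + 32]);
    }
    hasher.finalize().into()
}
```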

The PoRep Game

The Proof of Replication game has 4 primary stages. For each "turn" multiple PoRep games can be in progress but each in a different stage.

The 4 stages of the PoRep Game are as follows:

  1. Proof submission stage
    • Replicators: submit as many proofs as possible during this stage
    • Validators: No-op
  2. Proof verification stage
    • Replicators: No-op
    • Validators: Select replicators and verify their proofs from the previous turn
  3. Proof challenge stage
    • Replicators: Submit the proof mask with justifications (for fake proofs submitted 2 turns ago)
    • Validators: No-op
  4. Reward collection stage
    • Replicators: Collect rewards for 3 turns ago
    • Validators: Collect rewards for 3 turns ago

For each turn of the PoRep game, both Validators and Replicators evaluate each stage. The stages are run as separate transactions on the storage program.

Finding who has a given block of ledger

  1. Validators monitor the turns in the PoRep game and look at the rooted bank at turn boundaries for any proofs.
  2. Validators maintain a map of ledger segments and corresponding replicator public keys. The map is updated when a validator processes a replicator's proofs for a segment. The validator provides an RPC interface to access this map. Using this API, clients can map a segment to a replicator's network address (correlating it via the cluster_info table). The clients can then send repair requests to the replicator to retrieve segments.
  3. Validators would need to invalidate this list every N turns.

Sybil attacks

For any random seed, we force everyone to use a signature that is derived from a PoH hash at the turn boundary. Everyone uses the same count, so the same PoH hash is signed by every participant. The signatures are then each cryptographically tied to the keypair, which prevents a leader from grinding on the resulting value for more than 1 identity.

Since there are many more client identities than encryption identities, we need to split the reward for multiple clients, and prevent Sybil attacks from generating many clients to acquire the same block of data. To remain BFT we want to prevent a single human entity from storing all the replications of a single chunk of the ledger.

Our solution to this is to force the clients to continue using the same identity. If the first round is used to acquire the same block for many client identities, the second round for the same client identities will force a redistribution of the signatures, and therefore of the PoRep identities and blocks. Thus, to get a reward, replicators need to store the first block for free, and the network can reward long-lived client identities more than new ones.

Validator attacks

  • If a validator approves fake proofs, a replicator can easily out it by showing the initial state for the hash.
  • If a validator marks real proofs as fake, no on-chain computation can distinguish who is correct. Rewards would have to rely on the results from multiple validators to catch bad actors and to keep replicators from being denied rewards.
  • A validator could steal mining proof results for itself. Because the proofs are derived from a signature from a replicator, and the validator does not know the private key used to generate the encryption key, it cannot be the generator of the proof.

Reward incentives

Fake proofs are easy to generate but difficult to verify. For this reason, PoRep proof transactions generated by replicators may require a higher fee than a normal transaction to represent the computational cost required by validators.

Some percentage of fake proofs are also necessary to receive a reward from storage mining.

Notes

  • We can reduce the costs of verification of PoRep by using PoH, and actually make it feasible to verify a large number of proofs for a global dataset.
  • We can eliminate grinding by forcing everyone to sign the same PoH hash and use the signatures as the seed.
  • The game between validators and replicators is over random blocks and random encryption identities and random data samples. The goal of randomization is to prevent colluding groups from having overlap on data or validation.
  • Replicator clients fish for lazy validators by submitting fake proofs that they can prove are fake.
  • To defend against Sybil client identities that try to store the same block we force the clients to store for multiple rounds before receiving a reward.
  • Validators should also get rewarded for validating submitted storage proofs as incentive for storing the ledger. They can only validate proofs if they are storing that slice of the ledger.

Secure Vote Signing

A validator fullnode receives entries from the current leader and submits votes confirming those entries are valid. This vote submission presents a security challenge, because forged votes that violate consensus rules could be used to slash the validator's stake.

The validator votes on its chosen fork by submitting a transaction that uses an asymmetric key to sign the result of its validation work. Other entities can verify this signature using the validator's public key. If the validator's key is used to sign incorrect data (e.g. votes on multiple forks of the ledger), the node's stake or its resources could be compromised.

Solana addresses this risk by splitting off a separate vote signer service that evaluates each vote to ensure it does not violate a slashing condition.

Validators, Vote Signers, and Stakeholders

When a validator receives multiple blocks for the same slot, it tracks all possible forks until it can determine a "best" one. A validator selects the best fork by submitting a vote to it, using a vote signer to minimize the possibility of its vote inadvertently violating a consensus rule and getting a stake slashed.

A vote signer evaluates the vote proposed by the validator and signs the vote only if it does not violate a slashing condition. A vote signer only needs to maintain minimal state regarding the votes it signed and the votes signed by the rest of the cluster. It doesn't need to process a full set of transactions.

A stakeholder is an identity that has control of the staked capital. The stakeholder can delegate its stake to the vote signer. Once a stake is delegated, the vote signer's votes represent the voting weight of all the delegated stakes and produce rewards for all the delegated stakes.

Currently, there is a 1:1 relationship between validators and vote signers, and stakeholders delegate their entire stake to a single vote signer.

Signing service

The vote signing service consists of a JSON RPC server and a request processor. At startup, the service starts the RPC server at a configured port and waits for validator requests. It expects the following types of requests:

  1. Register a new validator node
    • The request must contain validator's identity (public key)
    • The request must be signed with the validator's private key
    • The service drops the request if the signature of the request cannot be verified
    • The service creates a new voting asymmetric key for the validator, and returns the public key as a response
    • If a validator tries to register again, the service returns the public key from the pre-existing keypair
  2. Sign a vote (a sketch of the decision follows this list)
    • The request must contain a voting transaction and all verification data
    • The request must be signed with the validator's private key
    • The service drops the request if the signature of the request cannot be verified
    • The service verifies the voting data
    • The service returns a signature for the transaction
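
The decision made for the second request type can be sketched as follows; the types and the single slashing condition shown are hypothetical simplifications of the real rules.

```rust
/// Hypothetical signer-side state: the highest slot already signed.
struct SignerState {
    last_voted_slot: u64,
}

enum SignError {
    BadRequestSignature,
    SlashableVote,
}

/// Decide whether a vote may be signed. `request_verified` stands in for
/// verifying the request signature against the validator's public key.
fn approve_vote(
    state: &mut SignerState,
    request_verified: bool,
    vote_slot: u64,
) -> Result<(), SignError> {
    if !request_verified {
        return Err(SignError::BadRequestSignature); // drop unverifiable requests
    }
    // Simplified slashing condition: never sign a vote for the same or an
    // older slot, which could amount to votes on multiple forks.
    if vote_slot <= state.last_voted_slot {
        return Err(SignError::SlashableVote);
    }
    state.last_voted_slot = vote_slot;
    Ok(()) // the service would sign the transaction and return the signature
}
```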

Validator voting

A validator node, at startup, creates a new vote account and registers it with the cluster by submitting a new "vote register" transaction. The other nodes on the cluster process this transaction and include the new validator in the active set. Subsequently, the validator submits a "new vote" transaction signed with the validator's voting private key on each voting event.

Configuration

The validator node is configured with the signing service's network endpoint (IP/Port).

Registration

At startup, the validator registers itself with its signing service using JSON RPC. The RPC call returns the voting public key for the validator node. The validator creates a new "vote register" transaction including this public key, and submits it to the cluster.

Vote Collection

The validator looks up the votes submitted by all the nodes in the cluster for the last voting period. This information is submitted to the signing service with a new vote signing request.

New Vote Signing

The validator creates a "new vote" transaction and sends it to the signing service using JSON RPC. The RPC request also includes the vote verification data. On success, the RPC call returns the signature for the vote. On failure, the RPC call returns the failure code.

Stake Delegation and Rewards

Stakers are rewarded for helping to validate the ledger. They do this by delegating their stake to validator nodes. Those validators do the legwork of replaying the ledger and send votes to a per-node vote account to which stakers can delegate their stakes. The rest of the cluster uses those stake-weighted votes to select a block when forks arise. Both the validator and staker need some economic incentive to play their part. The validator needs to be compensated for its hardware and the staker needs to be compensated for the risk of getting its stake slashed. The economics are covered in staking rewards. This chapter, on the other hand, describes the underlying mechanics of its implementation.

Basic Design

The general idea is that the validator owns a Vote account. The Vote account tracks validator votes, counts validator generated credits, and provides any additional validator specific state. The Vote account is not aware of any stakes delegated to it and has no staking weight.

A separate Stake account (created by a staker) names a Vote account to which the stake is delegated. Rewards generated are proportional to the amount of lamports staked. The Stake account is owned by the staker only. Some portion of the lamports stored in this account are the stake.

Passive Delegation

Any number of Stake accounts can delegate to a single Vote account without an interactive action from the identity controlling the Vote account or submitting votes to the account.

The total stake allocated to a Vote account can be calculated as the sum of all the Stake accounts that have the Vote account pubkey as their StakeState::Stake::voter_pubkey.

Vote and Stake accounts

The rewards process is split into two on-chain programs. The Vote program solves the problem of making stakes slashable. The Stake program acts as custodian of the rewards pool and provides passive delegation. The Stake program is responsible for paying out each staker once the staker proves to the Stake program that its delegate has participated in validating the ledger.

VoteState

VoteState is the current state of all the votes the validator has submitted to the network. VoteState contains the following state information:

  • votes - The submitted votes data structure.

  • credits - The total number of rewards this vote program has generated over its lifetime.

  • root_slot - The last slot to reach the full lockout commitment necessary for rewards.

  • commission - The commission taken by this VoteState for any rewards claimed by staker's Stake accounts. This is the percentage ceiling of the reward.

  • Account::lamports - The accumulated lamports from the commission. These do not count as stakes.

  • authorized_vote_signer - Only this identity is authorized to submit votes. This field can only be modified by this identity.

VoteInstruction::Initialize

  • account[0] - RW - The VoteState.
    VoteState::authorized_vote_signer is initialized to account[0]; other VoteState members are defaulted.

VoteInstruction::AuthorizeVoteSigner(Pubkey)

  • account[0] - RW - The VoteState.
    VoteState::authorized_vote_signer is set to Pubkey; the transaction must be signed by the Vote account's current authorized_vote_signer.
    VoteInstruction::AuthorizeVoteSigner allows a staker to choose a signing service for its votes. That service is responsible for ensuring the vote won't cause the staker to be slashed.

VoteInstruction::Vote(Vec<Vote>)

  • account[0] - RW - The VoteState. VoteState::lockouts and VoteState::credits are updated according to the voting lockout rules; see Tower BFT.

  • account[1] - RO - A list of some N most recent slots and their hashes for the vote to be verified against.

StakeState

A StakeState takes one of three forms, StakeState::Uninitialized, StakeState::Stake and StakeState::RewardsPool.

StakeState::Stake

StakeState::Stake is the current delegation preference of the staker and contains the following state information:

  • Account::lamports - The lamports available for staking.

  • stake - the staked amount (subject to warm up and cool down) for generating rewards, always less than or equal to Account::lamports

  • voter_pubkey - The pubkey of the VoteState instance the lamports are delegated to.

  • credits_observed - The total credits claimed over the lifetime of the program.

  • activated - the epoch at which this stake was activated/delegated. The full stake will be counted after warm up.

  • deactivated - the epoch at which this stake will be completely de-activated, which is cool down epochs after StakeInstruction::Deactivate is issued.

StakeState::RewardsPool

To avoid a single network wide lock or contention in redemption, 256 RewardsPools are part of genesis under pre-determined keys, each with std::u64::MAX credits to be able to satisfy redemptions according to point value.

The Stakes and the RewardsPool are accounts that are owned by the same Stake program.

StakeInstruction::DelegateStake(u64)

The Stake account is moved from Uninitialized to StakeState::Stake form. This is how stakers choose their initial delegate validator node and activate their stake account lamports.

  • account[0] - RW - The StakeState::Stake instance.
    StakeState::Stake::credits_observed is initialized to VoteState::credits,
    StakeState::Stake::voter_pubkey is initialized to account[1],
    StakeState::Stake::stake is initialized to the u64 passed as an argument above,
    StakeState::Stake::activated is initialized to current Bank epoch, and
    StakeState::Stake::deactivated is initialized to std::u64::MAX

  • account[1] - R - The VoteState instance.

  • account[2] - R - syscall::current account, carries information about current Bank epoch

StakeInstruction::RedeemVoteCredits

The staker or the owner of the Stake account sends a transaction with this instruction to claim rewards.

The Vote account and the Stake account pair maintain a lifetime counter of total rewards generated and claimed. Rewards are paid according to a point value supplied by the Bank from inflation. A point is one credit * one staked lamport; rewards paid are proportional to the number of lamports staked.

  • account[0] - RW - The StakeState::Stake instance that is redeeming rewards.
  • account[1] - R - The VoteState instance, must be the same as StakeState::voter_pubkey
  • account[2] - RW - The StakeState::RewardsPool instance that will fulfill the request (picked at random).
  • account[3] - R - syscall::rewards account from the Bank that carries point value.

The reward is paid out for the difference between VoteState::credits and StakeState::Stake::credits_observed, multiplied by syscall::rewards::Rewards::validator_point_value. StakeState::Stake::credits_observed is updated to VoteState::credits. The commission is deposited into the Vote account token balance, and the reward is deposited into the Stake account token balance.

let credits_to_claim = vote_state.credits - stake_state.credits_observed;
stake_state.credits_observed = vote_state.credits;

credits_to_claim is used to compute the reward and commission, and StakeState::Stake::credits_observed is updated to the latest VoteState::credits value.
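
Putting the pieces together, here is a hedged sketch of the full redemption arithmetic; the names are illustrative, and `point_value` stands in for syscall::rewards::Rewards::validator_point_value.

```rust
/// Compute (staker reward, validator commission) for one redemption.
/// `commission_pct` is VoteState::commission, the percentage ceiling.
fn redeem(
    vote_credits: u64,
    credits_observed: u64,
    staked_lamports: u64,
    point_value: f64,
    commission_pct: u8,
) -> (f64, f64) {
    let credits_to_claim = vote_credits - credits_observed;
    // A point is one credit * one staked lamport.
    let points = credits_to_claim * staked_lamports;
    let total = points as f64 * point_value;
    let commission = total * f64::from(commission_pct) / 100.0;
    (total - commission, commission)
}
```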

StakeInstruction::Deactivate

A staker may wish to withdraw from the network. To do so, it must first deactivate its stake and wait for cool down.

  • account[0] - RW - The StakeState::Stake instance that is deactivating, the transaction must be signed by this key.
  • account[1] - R - syscall::current account from the Bank that carries current epoch

StakeState::Stake::deactivated is set to the current epoch + cool down. The account's stake will ramp down to zero by that epoch, and Account::lamports will be available for withdrawal.

StakeInstruction::Withdraw(u64)

Lamports build up over time in a Stake account and any excess over activated stake can be withdrawn.

  • account[0] - RW - The StakeState::Stake from which to withdraw, the transaction must be signed by this key.
  • account[1] - RW - Account that should be credited with the withdrawn lamports.
  • account[2] - R - syscall::current account from the Bank that carries current epoch, to calculate stake.

Benefits of the design

  • Single vote for all the stakers.

  • Clearing of the credit variable is not necessary for claiming rewards.

  • Each delegated stake can claim its rewards independently.

  • Commission for the work is deposited when a reward is claimed by the delegated stake.

Example Callflow

Passive Staking Callflow

Performance Metrics

Solana cluster performance is measured as the average number of transactions per second that the network can sustain (TPS) and as the time it takes for a transaction to be confirmed by a super majority of the cluster (Confirmation Time).

Each cluster node maintains various counters that are incremented on certain events. These counters are periodically uploaded to a cloud-based database. Solana's metrics dashboard fetches these counters, computes the performance metrics, and displays them on the dashboard.

TPS

The leader node's banking stage maintains a count of transactions that it recorded. The dashboard displays the count averaged over a 2-second period in the TPS time series graph. The dashboard also shows per second mean, maximum, and total TPS as a running counter.

Confirmation Time

Each validator node maintains a list of active ledger forks that are visible to the node. A fork is considered to be frozen when the node has received and processed all entries corresponding to the fork. A fork is considered to be confirmed when it receives a cumulative super majority vote and when one of its children forks is frozen.

The node assigns a timestamp to every new fork, and computes the time it took to confirm the fork. This time is reflected as validator confirmation time in performance metrics. The performance dashboard displays the average of each validator node's confirmation time as a time series graph.

Anatomy of a Validator

Validator block diagrams

Pipelining

The validators make extensive use of an optimization common in CPU design, called pipelining. Pipelining is the right tool for the job when there's a stream of input data that needs to be processed by a sequence of steps, and there's different hardware responsible for each. The quintessential example is using a washer and dryer to wash/dry/fold several loads of laundry. Washing must occur before drying and drying before folding, but each of the three operations is performed by a separate unit. To maximize efficiency, one creates a pipeline of stages. We'll call the washer one stage, the dryer another, and the folding process a third. To run the pipeline, one adds a second load of laundry to the washer just after the first load is added to the dryer. Likewise, the third load is added to the washer after the second is in the dryer and the first is being folded. In this way, one can make progress on three loads of laundry simultaneously. Given infinite loads, the pipeline will consistently complete a load at the rate of the slowest stage in the pipeline.

Pipelining in the Validator

The validator contains two pipelined processes, one used in leader mode called the TPU and one used in validator mode called the TVU. In both cases, the hardware being pipelined is the same, the network input, the GPU cards, the CPU cores, writes to disk, and the network output. What it does with that hardware is different. The TPU exists to create ledger entries whereas the TVU exists to validate them.

The Transaction Processing Unit

TPU Block Diagram

The Transaction Validation Unit

TVU Block Diagram

Blocktree

After a block reaches finality, all blocks from that one on down to the genesis block form a linear chain with the familiar name blockchain. Until that point, however, the validator must maintain all potentially valid chains, called forks. The process by which forks naturally form as a result of leader rotation is described in fork generation. The blocktree data structure described here is how a validator copes with those forks until blocks are finalized.

The blocktree allows a validator to record every blob it observes on the network, in any order, as long as the blob is signed by the expected leader for a given slot.

Blobs are moved to a fork-able key space: the tuple of leader slot + blob index (within the slot). This permits the skip-list structure of the Solana protocol to be stored in its entirety, without a priori choosing which fork to follow, which Entries to persist, or when to persist them.

Repair requests for recent blobs are served out of RAM or recent files and out of deeper storage for less recent blobs, as implemented by the store backing Blocktree.

Functionalities of Blocktree

  1. Persistence: the Blocktree lives at the front of the node's verification pipeline, right behind network receive and signature verification. If the blob received is consistent with the leader schedule (i.e. was signed by the leader for the indicated slot), it is immediately stored.
  2. Repair: repair is the same as window repair above, but able to serve any blob that's been received. Blocktree stores blobs with signatures, preserving the chain of origination.
  3. Forks: Blocktree supports random access of blobs, so can support a validator's need to rollback and replay from a Bank checkpoint.
  4. Restart: with proper pruning/culling, the Blocktree can be replayed by ordered enumeration of entries from slot 0. The logic of the replay stage (i.e. dealing with forks) will have to be used for the most recent entries in the Blocktree.

Blocktree Design

  1. Entries in the Blocktree are stored as key-value pairs, where the key is the concatenated slot index and blob index for an entry, and the value is the entry data. Note blob indexes are zero-based for each slot (i.e. they're slot-relative).

  2. The Blocktree maintains metadata for each slot, in the SlotMeta struct containing:

    • slot_index - The index of this slot
    • num_blocks - The number of blocks in the slot (used for chaining to a previous slot)
    • consumed - The highest blob index n, such that for all m < n, there exists a blob in this slot with blob index equal to m (i.e. the highest consecutive blob index).
    • received - The highest received blob index for the slot
    • next_slots - A list of future slots this slot could chain to. Used when rebuilding the ledger to find possible fork points.
    • last_index - The index of the blob that is flagged as the last blob for this slot. This flag on a blob will be set by the leader for a slot when they are transmitting the last blob for a slot.
    • is_rooted - True iff every block from 0...slot forms a full sequence without any holes. We can derive is_rooted for each slot with the following rules. Let slot(n) be the slot with index n, and let slot(n).is_full() be true iff the slot with index n has all the ticks expected for that slot. Let is_rooted(n) be the statement that "slot(n).is_rooted is true". Then:

    is_rooted(0)
    is_rooted(n+1) iff (is_rooted(n) and slot(n).is_full())

  3. Chaining - When a blob for a new slot x arrives, we check the number of blocks (num_blocks) for that new slot (this information is encoded in the blob). We then know that this new slot chains to slot x - num_blocks.

  4. Subscriptions - The Blocktree records a set of slots that have been "subscribed" to. This means entries that chain to these slots will be sent on the Blocktree channel for consumption by the ReplayStage. See the Blocktree APIs for details.

  5. Update notifications - The Blocktree notifies listeners when slot(n).is_rooted is flipped from false to true for any n.

Blocktree APIs

The Blocktree offers a subscription-based API that ReplayStage uses to ask for entries it's interested in. The entries will be sent on a channel exposed by the Blocktree. These subscription APIs are as follows:

  1. fn get_slots_since(slot_indexes: &[u64]) -> Vec<SlotMeta>: Returns new slots connecting to any element of the list slot_indexes.

  2. fn get_slot_entries(slot_index: u64, entry_start_index: usize, max_entries: Option<u64>) -> Vec<Entry>: Returns the entry vector for the slot starting with entry_start_index, capping the result at max if max_entries == Some(max), otherwise, no upper limit on the length of the return vector is imposed.

Note: Cumulatively, this means that the replay stage will now have to know when a slot is finished, and subscribe to the next slot it's interested in to get the next set of entries. Previously, the burden of chaining slots fell on the Blocktree.
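
A stubbed sketch of the consumption loop this note implies; `Blocktree`, `SlotMeta`, and `Entry` below are empty stand-ins for the real types.

```rust
struct Entry; // stand-in for a ledger entry
struct SlotMeta {
    slot_index: u64, // stand-in carrying only the slot index
}
struct Blocktree;

impl Blocktree {
    fn get_slots_since(&self, _slots: &[u64]) -> Vec<SlotMeta> {
        vec![] // stub
    }
    fn get_slot_entries(&self, _slot: u64, _start: usize, _max: Option<u64>) -> Vec<Entry> {
        vec![] // stub
    }
}

/// Replay entries slot by slot, subscribing to the next slot once the
/// current one is exhausted.
fn replay_from(blocktree: &Blocktree, mut slot: u64) {
    loop {
        let mut entry_index = 0;
        loop {
            let entries = blocktree.get_slot_entries(slot, entry_index, Some(64));
            if entries.is_empty() {
                break; // slot finished, or nothing received yet
            }
            entry_index += entries.len();
            // ...verify and apply `entries` to the bank here
        }
        // Ask which slots chain to this one and move to the first of them.
        match blocktree.get_slots_since(&[slot]).first() {
            Some(meta) => slot = meta.slot_index,
            None => return,
        }
    }
}
```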

Interfacing with Bank

The bank exposes to replay stage:

  1. prev_hash: which PoH chain it's working on as indicated by the hash of the last entry it processed

  2. tick_height: the ticks in the PoH chain currently being verified by this bank

  3. votes: a stack of records that contain:

    1. prev_hashes: what anything after this vote must chain to in PoH
    2. tick_height: the tick height at which this vote was cast
    3. lockout period: how long a chain must be observed to be in the ledger to be able to be chained below this vote

Replay stage uses Blocktree APIs to find the longest chain of entries it can hang off a previous vote. If that chain of entries does not hang off the latest vote, the replay stage rolls back the bank to that vote and replays the chain from there.

Pruning Blocktree

Once Blocktree entries are old enough, representing all the possible forks becomes less useful, perhaps even problematic for replay upon restart. Once a validator's votes have reached max lockout, however, any Blocktree contents that are not on the PoH chain for that vote can be pruned, expunged.

Replicator nodes will be responsible for storing really old ledger contents, and validators need only persist their bank periodically.

Gossip Service

The Gossip Service acts as a gateway to nodes in the control plane. Validators use the service to ensure information is available to all other nodes in a cluster. The service broadcasts information using a gossip protocol.

Gossip Overview

Nodes continuously share signed data objects among themselves in order to manage a cluster. For example, they share their contact information, ledger height, and votes.

Every tenth of a second, each node sends a "push" message and/or a "pull" message. Push and pull messages may elicit responses, and push messages may be forwarded on to others in the cluster.

Gossip runs on a well-known UDP/IP port or a port in a well-known range. Once a cluster is bootstrapped, nodes advertise to each other where to find their gossip endpoint (a socket address).

Gossip Records

Records shared over gossip are arbitrary, but signed and versioned (with a timestamp) as needed to make sense to the node receiving them. If a node receives two records from the same source, it updates its own copy with the record with the most recent timestamp.

Gossip Service Interface

Push Message

A node sends a push message to tell the cluster it has information to share. Nodes send push messages to PUSH_FANOUT push peers.

Upon receiving a push message, a node examines the message for:

  1. Duplication: if the message has been seen before, the node drops the message and may respond with PushMessagePrune if forwarded from a low staked node

  2. New data: if the message is new to the node

    • Stores the new information with an updated version in its cluster info and purges any previous older value
    • Stores the message in pushed_once (used for detecting duplicates, purged after PUSH_MSG_TIMEOUT * 5 ms)
    • Retransmits the messages to its own push peers
  3. Expiration: nodes drop push messages that are older than PUSH_MSG_TIMEOUT

Push Peers, Prune Message

A node selects its push peers at random from the active set of known peers. The node keeps this selection for a relatively long time. When a prune message is received, the node drops the push peer that sent the prune. Prune is an indication that there is another, higher stake-weighted path to that node than direct push.

The set of push peers is kept fresh by rotating a new node into the set every PUSH_MSG_TIMEOUT/2 milliseconds.

Pull Message

A node sends a pull message to ask the cluster if there is any new information. A pull message is sent to a single peer at random and comprises a Bloom filter that represents things it already has. A node receiving a pull message iterates over its values and constructs a pull response of things that miss the filter and would fit in a message.

A node constructs the pull Bloom filter by iterating over current values and recently purged values.

A node handles items in a pull response the same way it handles new data in a push message.

Purging

Nodes retain prior versions of values (those updated by a pull or push) and expired values (those older than GOSSIP_PULL_CRDS_TIMEOUT_MS) in purged_values (things I recently had). Nodes purge purged_values that are older than 5 * GOSSIP_PULL_CRDS_TIMEOUT_MS.

Eclipse Attacks

An eclipse attack is an attempt to take over the set of node connections with adversarial endpoints.

This is relevant to our implementation in the following ways.

  • Pull messages select a random node from the network. An eclipse attack on pull would require an attacker to influence the random selection in such a way that only adversarial nodes are selected for pull.

  • Push messages maintain an active set of nodes and select a random fanout for every push message. An eclipse attack on push would influence the active set selection, or the random fanout selection.

Time and Stake based weights

Weights are calculated based on time since last picked and the natural log of the stake weight.

Taking the ln of the stake weight gives all nodes a fairer chance of network coverage in a reasonable amount of time. It helps normalize the large possible stake weight differences between nodes. This way a node with a low stake weight, compared to a node with a large stake weight, will only have to wait a few multiples of ln(stake) seconds before it gets picked.

There is no way for an adversary to influence these parameters.
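
One plausible weighting along these lines is sketched below; the exact scaling is illustrative, not the implementation.

```rust
/// Weight a peer by the time since it was last picked and the natural log
/// of its stake. The +1 and +2 offsets keep the weight positive for peers
/// with zero elapsed time or zero stake.
fn peer_weight(seconds_since_picked: u64, stake: u64) -> f64 {
    (seconds_since_picked as f64 + 1.0) * ((stake as f64) + 2.0).ln()
}
```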

Pull Message

A node is selected as a pull target based on the weights described above.

Push Message

A prune message can only remove an adversary from a potential connection.

Just like with pull messages, nodes are selected into the active set based on weights.

Notable differences from PlumTree

The active push protocol described here is based on Plum Tree. The main differences are:

  • Push messages have a wallclock that is signed by the originator. Once the wallclock expires the message is dropped. A hop limit is difficult to implement in an adversarial setting.

  • Lazy Push is not implemented because it's not obvious how to prevent an adversary from forging the message fingerprint. A naive approach would allow an adversary to be prioritized for pull based on their input.

The Runtime

The runtime is a concurrent transaction processor. Transactions specify their data dependencies upfront and dynamic memory allocation is explicit. By separating program code from the state it operates on, the runtime is able to choreograph concurrent access. Transactions accessing only credit-only accounts are executed in parallel whereas transactions accessing writable accounts are serialized. The runtime interacts with the program through an entrypoint with a well-defined interface. The data stored in an account is an opaque type, an array of bytes. The program has full control over its contents.

The transaction structure specifies a list of public keys and signatures for those keys and a sequential list of instructions that will operate over the states associated with the account keys. For the transaction to be committed all the instructions must execute successfully; if any abort the whole transaction fails to commit.
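
A sketch of the locking rule this implies, with plain u64s standing in for account public keys; the real runtime locks accounts per batch in the same spirit.

```rust
use std::collections::{HashMap, HashSet};

/// Account locks for one batch: writable locks are exclusive,
/// credit-only locks are shared (reference counted).
#[derive(Default)]
struct AccountLocks {
    writable: HashSet<u64>,
    credit_only: HashMap<u64, usize>,
}

impl AccountLocks {
    /// Try to admit a transaction into the current parallel batch.
    /// Returns false if it conflicts and must be serialized later.
    fn try_lock(&mut self, writable: &[u64], credit_only: &[u64]) -> bool {
        let conflict = writable
            .iter()
            .any(|k| self.writable.contains(k) || self.credit_only.contains_key(k))
            || credit_only.iter().any(|k| self.writable.contains(k));
        if conflict {
            return false;
        }
        self.writable.extend(writable.iter().copied());
        for k in credit_only {
            *self.credit_only.entry(*k).or_insert(0) += 1;
        }
        true
    }
}
```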

Account Structure

Accounts maintain a lamport balance and program-specific memory.

Transaction Engine

The engine maps public keys to accounts and routes them to the program's entrypoint.

Execution

Transactions are batched and processed in a pipeline. The TPU and TVU follow a slightly different path. The TPU runtime ensures that PoH record occurs before memory is committed.

The TVU runtime ensures that PoH verification occurs before the runtime processes any transactions.

Runtime pipeline

At the execute stage, the loaded accounts have no data dependencies, so all the programs can be executed in parallel.

The runtime enforces the following rules:

  1. Only the owner program may modify the contents of an account. This means that upon assignment, the data vector is guaranteed to be zero.

  2. The total balance over all the accounts is equal before and after execution of a transaction.

  3. After the transaction is executed, the balances of credit-only accounts must be greater than or equal to their balances before the transaction.

  4. All instructions in the transaction are executed atomically. If one fails, all account modifications are discarded.

Execution of the program involves mapping the program's public key to an entrypoint which takes a pointer to the transaction, and an array of loaded accounts.

SystemProgram Interface

The interface is best described by the Instruction::data that the user encodes.

  • CreateAccount - This allows the user to create an account with an allocated data array and assign it to a Program.

  • Assign - Allows the user to assign an existing account to a program.

  • Transfer - Transfers lamports between accounts.
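
A sketch of how that Instruction::data might be encoded is below; the variant fields (lamports, space, program_id) are assumptions drawn from the descriptions above, not the codebase definition:

pub type Pubkey = [u8; 32];

// Sketch of the SystemProgram's Instruction::data encoding.
pub enum SystemInstruction {
    // Create an account with `space` bytes of zeroed data, fund it with
    // `lamports`, and assign it to `program_id`.
    CreateAccount { lamports: u64, space: u64, program_id: Pubkey },
    // Assign an existing account to `program_id`.
    Assign { program_id: Pubkey },
    // Move `lamports` from the signing account to another account.
    Transfer { lamports: u64 },
}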

Program State Security

For the blockchain to function correctly, program code must be resilient to user inputs. That is why in this design only the program-specific code can change the state of the data byte array in the Accounts that are assigned to it. It is also the reason why Assign or CreateAccount must zero out the data. Otherwise the program would have no way to distinguish recently assigned account data from a natively generated state transition without some additional metadata from the runtime indicating that this memory is assigned instead of natively generated.

To pass messages between programs, the receiving program must accept the message and copy the state over. But in practice a copy isn't needed and is undesirable. The receiving program can read the state belonging to other Accounts without copying it, and during the read it is guaranteed a consistent view of the sender program's state.

Notes

  • There is no dynamic memory allocation. Clients need to use CreateAccount instructions to create memory before passing it to another program. This instruction can be composed into a single transaction with the call to the program itself.

  • CreateAccount and Assign guarantee that when an account is assigned to a program, the Account's data is zero-initialized.

  • Once assigned to a program, an Account cannot be reassigned.

  • The runtime guarantees that a program's code is the only code that can modify the data of Accounts assigned to it.

  • The runtime guarantees that a program can only spend lamports from accounts that are assigned to it.

  • The runtime guarantees that the total of all account balances is the same before and after a transaction.

  • The runtime guarantees that all instructions executed successfully when a transaction is committed.

Future Work

Anatomy of a Transaction

Transactions encode lists of instructions that are executed sequentially, and only committed if all the instructions complete successfully. All account states are reverted upon the failure of a transaction. Each Transaction details the accounts used, including which must sign and which are credit only, a recent blockhash, the instructions, and any signatures.

Accounts and Signatures

Each transaction explicitly lists all accounts that it needs access to. This includes accounts that are transferring tokens, accounts whose user data is being modified, and the program accounts that are being called by the instructions. Each account that is not an executable program can be marked as requiring a signature and/or as credit only. All accounts marked as signers must have a valid signature in the transaction's list of signatures before the transaction is considered valid. Any account marked as credit only may only have its token value increased, and its user data is read only. Accounts are locked by the runtime, ensuring that they are not modified by a concurrent program while the transaction is running. Credit only accounts can safely be shared, so the runtime will allow multiple concurrent credit-only locks on an account, as sketched below.
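
A minimal sketch of such a lock table follows, assuming hypothetical names (Lock, AccountLocks, try_lock): a writable lock is exclusive, while credit-only locks are reference-counted so they can be shared by concurrent transactions.

use std::collections::hash_map::Entry;
use std::collections::HashMap;

pub type Pubkey = [u8; 32];

// None in the map = unlocked; a write lock is exclusive; credit-only
// locks are counted so they can be shared.
enum Lock {
    Writable,
    CreditOnly(usize),
}

#[derive(Default)]
struct AccountLocks {
    locks: HashMap<Pubkey, Lock>,
}

impl AccountLocks {
    // Try to lock one account; returns false if the request conflicts
    // with an existing lock and the transaction must wait.
    fn try_lock(&mut self, key: Pubkey, credit_only: bool) -> bool {
        match self.locks.entry(key) {
            Entry::Vacant(e) => {
                e.insert(if credit_only {
                    Lock::CreditOnly(1)
                } else {
                    Lock::Writable
                });
                true
            }
            Entry::Occupied(mut e) => match (e.get_mut(), credit_only) {
                // Credit-only locks are shared among concurrent readers.
                (Lock::CreditOnly(n), true) => {
                    *n += 1;
                    true
                }
                // A writable lock, or a write request against a
                // credit-only-locked account, conflicts.
                _ => false,
            },
        }
    }
}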

Recent Blockhash

A Transaction includes a recent blockhash to prevent duplication and to give transactions lifetimes. Any transaction that is completely identical to a previous one is rejected, so adding a newer blockhash allows multiple transactions to repeat the exact same action. Transactions also have lifetimes that are defined by the blockhash, as any transaction whose blockhash is too old will be rejected.

Instructions

Each instruction specifies a single program account (which must be marked executable), a subset of the transaction's accounts that should be passed to the program, and a data byte array that is passed to the program. The program interprets the data array and operates on the accounts specified by the instruction. The program can return successfully, or with an error code. An error return causes the entire transaction to fail immediately.

API Reference

The following sections contain API reference material you may find useful when developing applications that utilize a Solana cluster.

The Transaction

Components of a Transaction

  • Transaction:
    • message: Defines the transaction
      • header: Details the account types and signatures required by the transaction
        • num_required_signatures: The total number of signatures required to make the transaction valid.
        • num_credit_only_signed_accounts: The last num_credit_only_signed_accounts signatures refer to signing credit only accounts. Credit only accounts can be used concurrently by multiple parallel transactions, but their balance may only be increased, and their account data is read-only.
        • num_credit_only_unsigned_accounts: The last num_credit_only_unsigned_accounts pubkeys in account_keys refer to non-signing credit only accounts
      • account_keys: List of pubkeys used by the transaction, including by the instructions and for signatures. The first num_required_signatures pubkeys must sign the transaction.
      • recent_blockhash: The ID of a recent ledger entry. Validators will reject transactions with a recent_blockhash that is too old.
      • instructions: A list of instructions that are run sequentially and committed in one atomic transaction if all succeed.
    • signatures: A list of signatures applied to the transaction. The list is always of length num_required_signatures, and the signature at index i corresponds to the pubkey at index i in account_keys. The list is initialized with empty signatures (i.e. zeros), and populated as signatures are added.
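
Rendered as Rust, the layout above might look like the following sketch; the names are taken from the field list, while the integer widths are assumptions:

pub type Pubkey = [u8; 32];
pub type Hash = [u8; 32];
pub type Signature = [u8; 64];

pub struct MessageHeader {
    pub num_required_signatures: u8,
    pub num_credit_only_signed_accounts: u8,
    pub num_credit_only_unsigned_accounts: u8,
}

pub struct CompiledInstruction {
    pub program_id_index: u8, // index into account_keys
    pub accounts: Vec<u8>,    // indices into account_keys
    pub data: Vec<u8>,        // opaque input for the program
}

pub struct Message {
    pub header: MessageHeader,
    pub account_keys: Vec<Pubkey>,
    pub recent_blockhash: Hash,
    pub instructions: Vec<CompiledInstruction>,
}

pub struct Transaction {
    // Always num_required_signatures entries; signatures[i] corresponds
    // to account_keys[i].
    pub signatures: Vec<Signature>,
    pub message: Message,
}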

Transaction Signing

A Transaction is signed by using an ed25519 keypair to sign the serialization of the message. The resulting signature is placed at the index of signatures matching the index of the keypair's pubkey in account_keys.
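
As a sketch, signing a serialized message with the ed25519-dalek crate might look like this (the 1.x API and a compatible rand version are assumed; the message bytes here stand in for the bincode-serialized Message):

use ed25519_dalek::{Keypair, Signature, Signer, Verifier};
use rand::rngs::OsRng;

fn main() {
    // Generate an ephemeral keypair; a real client would load one from disk.
    let mut csprng = OsRng;
    let keypair = Keypair::generate(&mut csprng);

    // Stand-in for the bincode-serialized message bytes.
    let message: &[u8] = b"serialized transaction message";

    // Sign the serialization of the message...
    let signature: Signature = keypair.sign(message);

    // ...then place the signature at the index of `signatures` matching
    // the keypair's pubkey index in account_keys. Anyone can verify it:
    assert!(keypair.public.verify(message, &signature).is_ok());
}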

Transaction Serialization

Transactions (and their messages) are serialized and deserialized using the bincode crate with a non-standard vector serialization that uses only one byte for the length if it can be encoded in 7 bits, 2 bytes if it fits in 14 bits, or 3 bytes if it requires 15 or 16 bits. The vector serialization is defined by Solana's short-vec.
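
A sketch of that length encoding follows: each byte carries 7 bits of the length, with the high bit set when another byte follows, which yields the 1-, 2-, and 3-byte sizes described above. Solana's short-vec implementation remains the source of truth.

// Encode a vector length in 1-3 bytes: 7 bits per byte, high bit set
// when another length byte follows.
fn encode_len(mut len: u16) -> Vec<u8> {
    let mut out = Vec::with_capacity(3);
    loop {
        let byte = (len & 0x7f) as u8;
        len >>= 7;
        if len == 0 {
            out.push(byte); // final byte: high bit clear
            break;
        }
        out.push(byte | 0x80); // more length bytes follow
    }
    out
}

fn main() {
    assert_eq!(encode_len(0x0005).len(), 1); // fits in 7 bits
    assert_eq!(encode_len(0x07ff).len(), 2); // fits in 14 bits
    assert_eq!(encode_len(0xffff).len(), 3); // needs 15 or 16 bits
}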

Instructions

For the purposes of building a Transaction, a more verbose instruction format is used:

  • Instruction:
    • program_id: The pubkey of the on-chain program that executes the instruction
    • accounts: An ordered list of accounts that should be passed to the program processing the instruction, including metadata detailing if an account is a signer of the transaction and if it is a credit only account.
    • data: A byte array that is passed to the program executing the instruction

A more compact form is actually included in a Transaction:

  • CompiledInstruction:
    • program_id_index: The index of the program_id in the account_keys list
    • accounts: An ordered list of indices into account_keys specifying the accounts that should be passed to the program processing the instruction.
    • data: A byte array that is passed to the program executing the instruction
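
The verbose form might be sketched as follows; AccountMeta is an assumed name for the per-account metadata, and the compact CompiledInstruction form is sketched under The Transaction above:

pub type Pubkey = [u8; 32];

// Per-account metadata for the builder-side instruction form.
pub struct AccountMeta {
    pub pubkey: Pubkey,
    pub is_signer: bool,
    pub is_credit_only: bool,
}

pub struct Instruction {
    pub program_id: Pubkey,         // the on-chain program to invoke
    pub accounts: Vec<AccountMeta>, // accounts passed to the program
    pub data: Vec<u8>,              // opaque input for the program
}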

Blockstreamer

Solana supports a node type called a blockstreamer. This fullnode variation is intended for applications that need to observe the data plane without participating in transaction validation or ledger replication.

A blockstreamer runs without a vote signer, and can optionally stream ledger entries out to a Unix domain socket as they are processed. The JSON-RPC service still functions as on any other node.

To run a blockstreamer, include the --no-signer argument and, optionally, a blockstream socket location:

$ ./multinode-demo/validator-x.sh --no-signer --blockstream <SOCKET>

The stream will output a series of JSON objects:

  • An Entry event JSON object is sent when each ledger entry is processed, with the following fields:

    • dt, the system datetime, as RFC3339-formatted string
    • t, the event type, always "entry"
    • s, the slot height, as unsigned 64-bit integer
    • h, the tick height, as unsigned 64-bit integer
    • entry, the entry, as JSON object
  • A Block event JSON object is sent when a block is complete, with the following fields:

    • dt, the system datetime, as RFC3339-formatted string
    • t, the event type, always "block"
    • s, the slot height, as unsigned 64-bit integer
    • h, the tick height, as unsigned 64-bit integer
    • l, the slot leader id, as base-58 encoded string
    • id, the block id, as base-58 encoded string
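
For illustration only, with fabricated example values (reusing a pubkey and blockhash that appear elsewhere in this chapter) and the entry object elided, a fragment of the stream might look like:

{"dt":"2019-05-20T19:01:11Z","t":"entry","s":12,"h":192,"entry":{...}}
{"dt":"2019-05-20T19:01:12Z","t":"block","s":12,"h":192,"l":"9QzsJf7LPLj8GkXbYT3LFDKqsj2hHG7TA3xinJHu8epQ","id":"GH7ome3EiwEr7tu9JuTh2dpYWBJK3z69Xm1ZE3MEE6JC"}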

JSON RPC API

Solana nodes accept HTTP requests using the JSON-RPC 2.0 specification.

To interact with a Solana node inside a JavaScript application, use the solana-web3.js library, which gives a convenient interface for the RPC methods.

RPC HTTP Endpoint

Default port: 8899, e.g. http://localhost:8899, http://192.168.1.88:8899

RPC PubSub WebSocket Endpoint

Default port: 8900, e.g. ws://localhost:8900, ws://192.168.1.88:8900

Methods

Request Formatting

To make a JSON-RPC request, send an HTTP POST request with a Content-Type: application/json header. The JSON request data should contain 4 fields:

  • jsonrpc, set to "2.0"
  • id, a unique client-generated identifying integer
  • method, a string containing the method to be invoked
  • params, a JSON array of ordered parameter values

Example using curl:

curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"getBalance", "params":["83astBRguLMdt2h5U1Tpdq5tjFoJ6noeGwaY3mDLVcri"]}' 192.168.1.88:8899

The response output will be a JSON object with the following fields:

  • jsonrpc, matching the request specification
  • id, matching the request identifier
  • result, requested data or success confirmation

Requests can be sent in batches by sending an array of JSON-RPC request objects as the data for a single POST.
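
For example, the following single POST carries a batch of two of the requests documented below; per JSON-RPC 2.0, the response is a matching array of response objects:

curl -X POST -H "Content-Type: application/json" -d '[{"jsonrpc":"2.0", "id":1, "method":"getTransactionCount"}, {"jsonrpc":"2.0", "id":2, "method":"getBalance", "params":["83astBRguLMdt2h5U1Tpdq5tjFoJ6noeGwaY3mDLVcri"]}]' http://localhost:8899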

Definitions

  • Hash: A SHA-256 hash of a chunk of data.
  • Pubkey: The public key of an Ed25519 key-pair.
  • Signature: An Ed25519 signature of a chunk of data.
  • Transaction: A Solana instruction signed by a client key-pair.

JSON RPC API Reference

confirmTransaction

Returns a transaction receipt

Parameters:
  • string - Signature of Transaction to confirm, as base-58 encoded string
Results:
  • boolean - Transaction status, true if Transaction is confirmed
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"confirmTransaction", "params":["5VERv8NMvzbJMEkV8xnrLkEaWRtSz9CosKDYjCJjBRnbJLgp8uirBgmQpjKhoR4tjF3ZpRzrFmBV6UjKdiSZkQUW"]}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":true,"id":1}

getAccountInfo

Returns all information associated with the account of provided Pubkey

Parameters:
  • string - Pubkey of account to query, as base-58 encoded string
Results:

The result field will be a JSON object with the following sub fields:

  • lamports, number of lamports assigned to this account, as a signed 64-bit integer
  • owner, array of 32 bytes representing the program this account has been assigned to
  • data, array of bytes representing any data associated with the account
  • executable, boolean indicating if the account contains a program (and is strictly read-only)
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"getAccountInfo", "params":["2gVkYWexTHR5Hb2aLeQN3tnngvWzisFKXDUPrgMHpdST"]}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":{"executable":false,"owner":[1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"lamports":1,"data":[3,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,20,0,0,0,0,0,0,0,50,48,53,48,45,48,49,45,48,49,84,48,48,58,48,48,58,48,48,90,252,10,7,28,246,140,88,177,98,82,10,227,89,81,18,30,194,101,199,16,11,73,133,20,246,62,114,39,20,113,189,32,50,0,0,0,0,0,0,0,247,15,36,102,167,83,225,42,133,127,82,34,36,224,207,130,109,230,224,188,163,33,213,13,5,117,211,251,65,159,197,51,0,0,0,0,0,0]},"id":1}

getBalance

Returns the balance of the account of provided Pubkey

Parameters:
  • string - Pubkey of account to query, as base-58 encoded string
Results:
  • integer - quantity, as a signed 64-bit integer
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"getBalance", "params":["83astBRguLMdt2h5U1Tpdq5tjFoJ6noeGwaY3mDLVcri"]}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":0,"id":1}

getClusterNodes

Returns information about all the nodes participating in the cluster

Parameters:

None

Results:

The result field will be an array of JSON objects, each with the following sub fields:

  • pubkey - Node public key, as base-58 encoded string
  • gossip - Gossip network address for the node
  • tpu - TPU network address for the node
  • rpc - JSON RPC network address for the node, or null if the JSON RPC service is not enabled
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"getClusterNodes"}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":[{"gossip":"10.239.6.48:8001","pubkey":"9QzsJf7LPLj8GkXbYT3LFDKqsj2hHG7TA3xinJHu8epQ","rpc":"10.239.6.48:8899","tpu":"10.239.6.48:8856"}],"id":1}

getEpochInfo

Returns information about the current epoch

Parameters:

None

Results:

The result field will be an object with the following fields:

  • epoch, the current epoch
  • slotIndex, the current slot relative to the start of the current epoch
  • slotsInEpoch, the number of slots in this epoch
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getEpochInfo"}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":{"epoch":3,"slotIndex":126,"slotsInEpoch":256},"id":1}

getLeaderSchedule

Returns the leader schedule for the current epoch

Parameters:

None

Results:

The result field will be an array of leader public keys (as base-58 encoded strings) for each slot in the current epoch

Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getLeaderSchedule"}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":[...],"id":1}

getProgramAccounts

Returns all accounts owned by the provided program Pubkey

Parameters:
  • string - Pubkey of program, as base-58 encoded string
Results:

The result field will be an array of arrays. Each sub array will contain:

  • string - the account Pubkey as a base-58 encoded string, paired with a JSON object with the following sub fields:

  • lamports, number of lamports assigned to this account, as a signed 64-bit integer

  • owner, array of 32 bytes representing the program this account has been assigned to

  • data, array of bytes representing any data associated with the account

  • executable, boolean indicating if the account contains a program (and is strictly read-only)

Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"getProgramAccounts", "params":["8nQwAgzN2yyUzrukXsCa3JELBYqDQrqJ3UyHiWazWxHR"]}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":[["BqGKYtAKu69ZdWEBtZHh4xgJY1BYa2YBiBReQE3pe383", {"executable":false,"owner":[50,28,250,90,221,24,94,136,147,165,253,136,1,62,196,215,225,34,222,212,99,84,202,223,245,13,149,99,149,231,91,96],"lamports":1,"data":[]], ["4Nd1mBQtrMJVYVfKf2PJy9NZUZdTAsp7D4xWLs4gDB4T", {"executable":false,"owner":[50,28,250,90,221,24,94,136,147,165,253,136,1,62,196,215,225,34,222,212,99,84,202,223,245,13,149,99,149,231,91,96],"lamports":10,"data":[]]]},"id":1}

getRecentBlockhash

Returns a recent block hash from the ledger, and a fee schedule that can be used to compute the cost of submitting a transaction using it.

Parameters:

None

Results:

An array consisting of

  • string - a Hash as base-58 encoded string
  • FeeCalculator object - the fee schedule for this block hash
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getRecentBlockhash"}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":["GH7ome3EiwEr7tu9JuTh2dpYWBJK3z69Xm1ZE3MEE6JC",{"lamportsPerSignature": 0}],"id":1}

getSignatureStatus

Returns the status of a given signature. This method is similar to confirmTransaction but provides more resolution for error events.

Parameters:
  • string - Signature of Transaction to confirm, as base-58 encoded string
Results:
  • null - Unknown transaction
  • object - Transaction status
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"getSignatureStatus", "params":["5VERv8NMvzbJMEkV8xnrLkEaWRtSz9CosKDYjCJjBRnbJLgp8uirBgmQpjKhoR4tjF3ZpRzrFmBV6UjKdiSZkQUW"]}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":"SignatureNotFound","id":1}

getSlotLeader

Returns the current slot leader

Parameters:

None

Results:
  • string - Node Id as base-58 encoded string
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getSlotLeader"}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":"ENvAW7JScgYq6o4zKZwewtkzzJgDzuJAFxYasvmEQdpS","id":1}

getSlotsPerSegment

Returns the current storage segment size in terms of slots

Parameters:

None

Results:
  • u64 - Number of slots in a storage segment
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getSlotsPerSegment"}' http://localhost:8899
// Result
{"jsonrpc":"2.0","result":"1024","id":1}

getStorageTurn

Returns the current storage turn's blockhash and slot

Parameters:

None

Results:

An array consisting of

  • string - a Hash as base-58 encoded string indicating the blockhash of the turn slot
  • u64 - the current storage turn slot
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getStorageTurn"}' http://localhost:8899
 // Result
{"jsonrpc":"2.0","result":["GH7ome3EiwEr7tu9JuTh2dpYWBJK3z69Xm1ZE3MEE6JC", "2048"],"id":1}

getStorageTurnRate

Returns the current storage turn rate in terms of slots per turn

Parameters:

None

Results:
  • u64 - Number of slots in storage turn
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getStorageTurnRate"}' http://localhost:8899
 // Result
{"jsonrpc":"2.0","result":"1024","id":1}


getNumBlocksSinceSignatureConfirmation

Returns the current number of blocks since the signature was confirmed.

Parameters:
  • string - Signature of Transaction to confirm, as base-58 encoded string
Results:
  • integer - count, as unsigned 64-bit integer
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"getNumBlocksSinceSignatureConfirmation", "params":["5VERv8NMvzbJMEkV8xnrLkEaWRtSz9CosKDYjCJjBRnbJLgp8uirBgmQpjKhoR4tjF3ZpRzrFmBV6UjKdiSZkQUW"]}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":8,"id":1}

getTransactionCount

Returns the current Transaction count from the ledger

Parameters:

None

Results:
  • integer - count, as unsigned 64-bit integer
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getTransactionCount"}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":268,"id":1}

getTotalSupply

Returns the current total supply in Lamports

Parameters:

None

Results:
  • integer - Total supply, as unsigned 64-bit integer
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getTotalSupply"}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":10126,"id":1}

getEpochVoteAccounts

Returns the account info and associated stake for all the voting accounts in the current epoch.

Parameters:

None

Results:

The result field will be an array of JSON objects, each with the following sub fields:

  • votePubkey - Vote account public key, as base-58 encoded string
  • nodePubkey - Node public key, as base-58 encoded string
  • stake - the stake, in lamports, delegated to this vote account
  • commission, a 32-bit integer used as a fraction (commission/MAX_U32) for rewards payout
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"getEpochVoteAccounts"}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":[{"commission":0,"nodePubkey":"Et2RaZJdJRTzTkodUwiHr4H6sLkVmijBFv8tkd7oSSFY","stake":42,"votePubkey":"B4CdWq3NBSoH2wYsVE1CaZSWPo2ZtopE4SJipQhZ3srF"}],"id":1}

requestAirdrop

Requests an airdrop of lamports to a Pubkey

Parameters:
  • string - Pubkey of account to receive lamports, as base-58 encoded string
  • integer - lamports, as a signed 64-bit integer
Results:
  • string - Transaction Signature of airdrop, as base-58 encoded string
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"requestAirdrop", "params":["83astBRguLMdt2h5U1Tpdq5tjFoJ6noeGwaY3mDLVcri", 50]}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":"5VERv8NMvzbJMEkV8xnrLkEaWRtSz9CosKDYjCJjBRnbJLgp8uirBgmQpjKhoR4tjF3ZpRzrFmBV6UjKdiSZkQUW","id":1}

sendTransaction

Creates a new transaction

Parameters:
  • array - array of octets containing a fully-signed Transaction
Results:
  • string - Transaction Signature, as base-58 encoded string
Example:
// Request
curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1, "method":"sendTransaction", "params":[[61, 98, 55, 49, 15, 187, 41, 215, 176, 49, 234, 229, 228, 77, 129, 221, 239, 88, 145, 227, 81, 158, 223, 123, 14, 229, 235, 247, 191, 115, 199, 71, 121, 17, 32, 67, 63, 209, 239, 160, 161, 2, 94, 105, 48, 159, 235, 235, 93, 98, 172, 97, 63, 197, 160, 164, 192, 20, 92, 111, 57, 145, 251, 6, 40, 240, 124, 194, 149, 155, 16, 138, 31, 113, 119, 101, 212, 128, 103, 78, 191, 80, 182, 234, 216, 21, 121, 243, 35, 100, 122, 68, 47, 57, 13, 39, 0, 0, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 40, 240, 124, 194, 149, 155, 16, 138, 31, 113, 119, 101, 212, 128, 103, 78, 191, 80, 182, 234, 216, 21, 121, 243, 35, 100, 122, 68, 47, 57, 11, 12, 106, 49, 74, 226, 201, 16, 161, 192, 28, 84, 124, 97, 190, 201, 171, 186, 6, 18, 70, 142, 89, 185, 176, 154, 115, 61, 26, 163, 77, 1, 88, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]}' http://localhost:8899

// Result
{"jsonrpc":"2.0","result":"2EBVM6cB8vAAD93Ktr6Vd8p67XPbQzCJX47MpReuiCXJAtcjaxpvWpcg9Ege1Nr5Tk3a2GFrByT7WPBjdsTycY9b","id":1}

Subscription Websocket

After connecting to the RPC PubSub websocket at ws://<ADDRESS>/:

  • Submit subscription requests to the websocket using the methods below
  • Multiple subscriptions may be active at once
  • All subscriptions take an optional confirmations parameter, which defines how many confirmed blocks the node should wait before sending a notification. The greater the number, the more likely the notification is to represent consensus across the cluster, and the less likely it is to be affected by forking or rollbacks. If unspecified, the default value is 0; the node will send a notification as soon as it witnesses the event. The maximum confirmations wait length is the cluster's MAX_LOCKOUT_HISTORY, which represents the economic finality of the chain.

accountSubscribe

Subscribe to an account to receive notifications when the lamports or data for a given account public key changes

Parameters:
  • string - account Pubkey, as base-58 encoded string
  • integer - optional, number of confirmed blocks to wait before notification. Default: 0, Max: MAX_LOCKOUT_HISTORY (greater integers rounded down)
Results:
  • integer - Subscription id (needed to unsubscribe)
Example:
// Request
{"jsonrpc":"2.0", "id":1, "method":"accountSubscribe", "params":["CM78CPUeXjn8o3yroDHxUtKsZZgoy4GPkPPXfouKNH12"]}

{"jsonrpc":"2.0", "id":1, "method":"accountSubscribe", "params":["CM78CPUeXjn8o3yroDHxUtKsZZgoy4GPkPPXfouKNH12", 15]}

// Result
{"jsonrpc": "2.0","result": 0,"id": 1}
Notification Format:
{"jsonrpc": "2.0","method": "accountNotification", "params": {"result": {"executable":false,"owner":[1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"lamports":1,"data":[3,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,20,0,0,0,0,0,0,0,50,48,53,48,45,48,49,45,48,49,84,48,48,58,48,48,58,48,48,90,252,10,7,28,246,140,88,177,98,82,10,227,89,81,18,30,194,101,199,16,11,73,133,20,246,62,114,39,20,113,189,32,50,0,0,0,0,0,0,0,247,15,36,102,167,83,225,42,133,127,82,34,36,224,207,130,109,230,224,188,163,33,213,13,5,117,211,251,65,159,197,51,0,0,0,0,0,0]},"subscription":0}}

accountUnsubscribe

Unsubscribe from account change notifications

Parameters:
  • integer - id of account Subscription to cancel
Results:
  • bool - unsubscribe success message
Example:
// Request
{"jsonrpc":"2.0", "id":1, "method":"accountUnsubscribe", "params":[0]}

// Result
{"jsonrpc": "2.0","result": true,"id": 1}

programSubscribe

Subscribe to a program to receive notifications when the lamports or data for a given account owned by the program changes

Parameters:
  • string - program_id Pubkey, as base-58 encoded string
  • integer - optional, number of confirmed blocks to wait before notification. Default: 0, Max: MAX_LOCKOUT_HISTORY (greater integers rounded down)
Results:
  • integer - Subscription id (needed to unsubscribe)
Example:
// Request
{"jsonrpc":"2.0", "id":1, "method":"programSubscribe", "params":["9gZbPtbtHrs6hEWgd6MbVY9VPFtS5Z8xKtnYwA2NynHV"]}

{"jsonrpc":"2.0", "id":1, "method":"programSubscribe", "params":["9gZbPtbtHrs6hEWgd6MbVY9VPFtS5Z8xKtnYwA2NynHV", 15]}

// Result
{"jsonrpc": "2.0","result": 0,"id": 1}
Notification Format:
  • string - account Pubkey, as base-58 encoded string
  • object - account info JSON object (see getAccountInfo for field details)
{"jsonrpc":"2.0","method":"programNotification","params":{{"result":["8Rshv2oMkPu5E4opXTRyuyBeZBqQ4S477VG26wUTFxUM",{"executable":false,"lamports":1,"owner":[129,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"data":[1,1,1,0,0,0,0,0,0,0,20,0,0,0,0,0,0,0,50,48,49,56,45,49,50,45,50,52,84,50,51,58,53,57,58,48,48,90,235,233,39,152,15,44,117,176,41,89,100,86,45,61,2,44,251,46,212,37,35,118,163,189,247,84,27,235,178,62,55,89,0,0,0,0,50,0,0,0,0,0,0,0,235,233,39,152,15,44,117,176,41,89,100,86,45,61,2,44,251,46,212,37,35,118,163,189,247,84,27,235,178,62,45,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}],"subscription":0}}

programUnsubscribe

Unsubscribe from program-owned account change notifications

Parameters:
  • integer - id of account Subscription to cancel
Results:
  • bool - unsubscribe success message
Example:
// Request
{"jsonrpc":"2.0", "id":1, "method":"programUnsubscribe", "params":[0]}

// Result
{"jsonrpc": "2.0","result": true,"id": 1}

signatureSubscribe

Subscribe to a transaction signature to receive notification when the transaction is confirmed. On signatureNotification, the subscription is automatically cancelled.

Parameters:
  • string - Transaction Signature, as base-58 encoded string
  • integer - optional, number of confirmed blocks to wait before notification. Default: 0, Max: MAX_LOCKOUT_HISTORY (greater integers rounded down)
Results:
  • integer - subscription id (needed to unsubscribe)
Example:
// Request
{"jsonrpc":"2.0", "id":1, "method":"signatureSubscribe", "params":["2EBVM6cB8vAAD93Ktr6Vd8p67XPbQzCJX47MpReuiCXJAtcjaxpvWpcg9Ege1Nr5Tk3a2GFrByT7WPBjdsTycY9b"]}

{"jsonrpc":"2.0", "id":1, "method":"signatureSubscribe", "params":["2EBVM6cB8vAAD93Ktr6Vd8p67XPbQzCJX47MpReuiCXJAtcjaxpvWpcg9Ege1Nr5Tk3a2GFrByT7WPBjdsTycY9b", 15]}

// Result
{"jsonrpc": "2.0","result": 0,"id": 1}
Notification Format:
{"jsonrpc": "2.0","method": "signatureNotification", "params": {"result": "Confirmed","subscription":0}}

signatureUnsubscribe

Unsubscribe from signature confirmation notification

Parameters:
  • integer - subscription id to cancel
Results:
  • bool - unsubscribe success message
Example:
// Request
{"jsonrpc":"2.0", "id":1, "method":"signatureUnsubscribe", "params":[0]}

// Result
{"jsonrpc": "2.0","result": true,"id": 1}

JavaScript API

See solana-web3.

solana-wallet CLI

The solana crate is distributed with a command-line interface tool.

Examples

Get Pubkey

// Command
$ solana-wallet address

// Return
<PUBKEY>

Airdrop Lamports

// Command
$ solana-wallet airdrop 123

// Return
"Your balance is: 123"

Get Balance

// Command
$ solana-wallet balance

// Return
"Your balance is: 123"

Confirm Transaction

// Command
$ solana-wallet confirm <TX_SIGNATURE>

// Return
"Confirmed" / "Not found" / "Transaction failed with error <ERR>"

Deploy program

// Command
$ solana-wallet deploy <PATH>

// Return
<PROGRAM_ID>

Unconditional Immediate Transfer

// Command
$ solana-wallet pay <PUBKEY> 123

// Return
<TX_SIGNATURE>

Post-Dated Transfer

// Command
$ solana-wallet pay <PUBKEY> 123 \
    --after 2018-12-24T23:59:00 --require-timestamp-from <PUBKEY>

// Return
{signature: <TX_SIGNATURE>, processId: <PROCESS_ID>}

--require-timestamp-from is optional. If not provided, the transaction will expect a timestamp signed by this wallet's secret key.

Authorized Transfer

A third party must send a signature to unlock the lamports.

// Command
$ solana-wallet pay <PUBKEY> 123 \
    --require-signature-from <PUBKEY>

// Return
{signature: <TX_SIGNATURE>, processId: <PROCESS_ID>}

Post-Dated and Authorized Transfer

// Command
$ solana-wallet pay <PUBKEY> 123 \
    --after 2018-12-24T23:59 --require-timestamp-from <PUBKEY> \
    --require-signature-from <PUBKEY>

// Return
{signature: <TX_SIGNATURE>, processId: <PROCESS_ID>}

Multiple Witnesses

// Command
$ solana-wallet pay <PUBKEY> 123 \
    --require-signature-from <PUBKEY> \
    --require-signature-from <PUBKEY>

// Return
{signature: <TX_SIGNATURE>, processId: <PROCESS_ID>}

Cancelable Transfer

// Command
$ solana-wallet pay <PUBKEY> 123 \
    --require-signature-from <PUBKEY> \
    --cancelable

// Return
{signature: <TX_SIGNATURE>, processId: <PROCESS_ID>}

Cancel Transfer

// Command
$ solana-wallet cancel <PROCESS_ID>

// Return
<TX_SIGNATURE>

Send Signature

// Command
$ solana-wallet send-signature <PUBKEY> <PROCESS_ID>

// Return
<TX_SIGNATURE>

Indicate Elapsed Time

Use the current system time:

// Command
$ solana-wallet send-timestamp <PUBKEY> <PROCESS_ID>

// Return
<TX_SIGNATURE>

Or specify some other arbitrary timestamp:

// Command
$ solana-wallet send-timestamp <PUBKEY> <PROCESS_ID> --date 2018-12-24T23:59:00

// Return
<TX_SIGNATURE>

Usage

solana-wallet 0.12.0

USAGE:
    solana-wallet [FLAGS] [OPTIONS] [SUBCOMMAND]

FLAGS:
    -h, --help       Prints help information
        --rpc-tls    Enable TLS for the RPC endpoint
    -V, --version    Prints version information

OPTIONS:
        --drone-host <IP ADDRESS>    Drone host to use [default: same as --host]
        --drone-port <PORT>          Drone port to use [default: 9900]
    -n, --host <IP ADDRESS>          Host to use for both RPC and drone [default: 127.0.0.1]
    -k, --keypair <PATH>             /path/to/id.json
        --rpc-host <IP ADDRESS>      RPC host to use [default: same as --host]
        --rpc-port <PORT>            RPC port to use [default: 8899]

SUBCOMMANDS:
    address                  Get your public key
    airdrop                  Request a batch of lamports
    balance                  Get your balance
    cancel                   Cancel a transfer
    confirm                  Confirm transaction by signature
    deploy                   Deploy a program
    get-transaction-count    Get current transaction count
    help                     Prints this message or the help of the given subcommand(s)
    pay                      Send a payment
    send-signature           Send a signature to authorize a transfer
    send-timestamp           Send a timestamp to unlock a transfer
solana-wallet-address
Get your public key

USAGE:
    solana-wallet address

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
solana-wallet-airdrop
Request a batch of lamports

USAGE:
    solana-wallet airdrop <NUM>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <NUM>    The number of lamports to request
solana-wallet-balance
Get your balance

USAGE:
    solana-wallet balance

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
solana-wallet-cancel
Cancel a transfer

USAGE:
    solana-wallet cancel <PROCESS_ID>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <PROCESS_ID>    The process id of the transfer to cancel
solana-wallet-confirm
Confirm transaction by signature

USAGE:
    solana-wallet confirm <SIGNATURE>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <SIGNATURE>    The transaction signature to confirm
solana-wallet-deploy
Deploy a program

USAGE:
    solana-wallet deploy <PATH>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <PATH>    /path/to/program.o
solana-wallet-fees
Display current cluster fees

USAGE:
    solana-wallet fees

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
solana-wallet-get-transaction-count
Get current transaction count

USAGE:
    solana-wallet get-transaction-count

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
solana-wallet-pay
Send a payment

USAGE:
    solana-wallet pay [FLAGS] [OPTIONS] <PUBKEY> <NUM>

FLAGS:
        --cancelable
    -h, --help          Prints help information
    -V, --version       Prints version information

OPTIONS:
        --after <DATETIME>                      A timestamp after which transaction will execute
        --require-timestamp-from <PUBKEY>       Require timestamp from this third party
        --require-signature-from <PUBKEY>...    Any third party signatures required to unlock the lamports

ARGS:
    <PUBKEY>    The pubkey of recipient
    <NUM>       The number of lamports to send
solana-wallet-send-signature
Send a signature to authorize a transfer

USAGE:
    solana-wallet send-signature <PUBKEY> <PROCESS_ID>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <PUBKEY>        The pubkey of recipient
    <PROCESS_ID>    The process id of the transfer to authorize
solana-wallet-send-timestamp
Send a timestamp to unlock a transfer

USAGE:
    solana-wallet send-timestamp [OPTIONS] <PUBKEY> <PROCESS_ID>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --date <DATETIME>    Optional arbitrary timestamp to apply

ARGS:
    <PUBKEY>        The pubkey of recipient
    <PROCESS_ID>    The process id of the transfer to unlock

Proposed Architectural Changes

The following architectural proposals have been accepted by the Solana team, but are not yet fully implemented. The proposals may be implemented as described, implemented differently as issues in the designs become evident, or not implemented at all. If implemented, the descriptions will be moved from this section to earlier chapters in a future version of this book.

Ledger Replication

Replication behavior yet to be implemented.

Storage epoch

The storage epoch should be the number of slots that results in around 100GB-1TB of ledger being generated for replicators to store. Replicators will begin storing ledger when a given fork has a high probability of not being rolled back.

Validator behavior

  1. Every NUM_KEY_ROTATION_TICKS the validator validates samples received from replicators. It signs the PoH hash at that point and uses the following algorithm with the signature as the input:
    • The low 5 bits of the first byte of the signature creates an index into another starting byte of the signature.
    • The validator then looks at the set of storage proofs where the byte of the proof's sha state vector starting from the low byte matches exactly with the chosen byte(s) of the signature.
    • If the set of proofs is larger than the validator can handle, then it increases to matching 2 bytes in the signature.
    • Validator continues to increase the number of matching bytes until a workable set is found.
    • It then creates a mask of valid proofs and fake proofs and sends it to the leader. This is a storage proof confirmation transaction.
  2. After a lockout period of NUM_SECONDS_STORAGE_LOCKOUT seconds, the validator then submits a storage proof claim transaction which then causes the distribution of the storage reward if no challenges were seen for the proof to the validators and replicators party to the proofs.

Replicator behavior

  1. The replicator then generates another set of offsets for which it submits a fake proof with an incorrect sha state. The proof can be shown to be fake by providing the seed for the hash result.
    • A fake proof should consist of a replicator hash of a signature of a PoH value. That way when the replicator reveals the fake proof, it can be verified on chain.
  2. The replicator monitors the ledger; if it sees a fake proof integrated, it creates a challenge transaction and submits it to the current leader. The transaction proves the validator incorrectly validated a fake storage proof. The replicator is rewarded and the validator's staking balance is slashed or frozen.

Storage proof contract logic

Each replicator and validator will have its own storage account. The validator's account would be separate from its gossip id, similar to its vote account. These should be implemented as two programs: one which handles the validator as the keysigner and one for the replicator. In that way, when the programs reference other accounts, they can check the program id to ensure it is a validator or replicator account they are referencing.

SubmitMiningProof

SubmitMiningProof {
    slot: u64,
    sha_state: Hash,
    signature: Signature,
};
keys = [replicator_keypair]

Replicators create these after mining their stored ledger data for a certain hash value. The slot is the end slot of the segment of ledger they are storing, and the sha_state is the result of the replicator using the hash function to sample their encrypted ledger segment. The signature is the signature that was created when they signed a PoH value for the current storage epoch. The list of proofs from the current storage epoch should be saved in the account state, and then transferred to a list of proofs for the previous epoch when the epoch passes. In a given storage epoch a given replicator should only submit proofs for one segment.

The program should have a list of slots which are valid storage mining slots. This list should be maintained by keeping track of slots which are rooted slots on which a significant portion of the network has voted with a high lockout value, maybe 32 votes old. Every SLOTS_PER_SEGMENT number of slots would be added to this set. The program should check that the slot is in this set. The set can be maintained by receiving an AdvertiseStorageRecentBlockhash and checking with its bank/Tower BFT state.

The program should do a signature verification check on the signature against the public key from the transaction submitter and the message, which is the previous storage epoch's PoH value.

ProofValidation

ProofValidation {
   proof_mask: Vec<ProofStatus>,
}
keys = [validator_keypair, replicator_keypair(s) (unsigned)]

A validator will submit this transaction to indicate that a set of proofs for a given segment are valid/not-valid or skipped where the validator did not look at it. The keypairs for the replicators that it looked at should be referenced in the keys so the program logic can go to those accounts and see that the proofs are generated in the previous epoch. The sampling of the storage proofs should be verified ensuring that the correct proofs are skipped by the validator according to the logic outlined in the validator behavior of sampling.

The included replicator keys will indicate the storage samples being referenced; the length of the proof_mask should be verified against the set of storage proofs in the referenced replicator account(s), and should match the number of proofs submitted in the previous storage epoch in the state of said replicator account.

ClaimStorageReward

ClaimStorageReward {
}
keys = [validator_keypair or replicator_keypair, validator/replicator_keypairs (unsigned)]

Replicators and validators will use this transaction to get paid tokens from a program state where SubmitMiningProof, ProofValidation and ChallengeProofValidation are in a state where proofs have been submitted and validated and there are no ChallengeProofValidation transactions referencing those proofs. For a validator, it should reference the replicator keypairs for which it has validated proofs in the relevant epoch. For a replicator, it should reference the validator keypairs for which it has validated proofs and wants to be rewarded.

ChallengeProofValidation

ChallengeProofValidation {
    proof_index: u64,
    hash_seed_value: Vec<u8>,
}
keys = [replicator_keypair, validator_keypair]

This transaction is for catching lazy validators who are not doing the work to validate proofs. A replicator will submit this transaction when it sees a validator has approved a fake SubmitMiningProof transaction. Since the replicator is a light client not looking at the full chain, it will have to ask a validator or some set of validators for this information, perhaps via an RPC call, to obtain all ProofValidations for a certain segment in the previous storage epoch. The program will look in the validator account state, see that a ProofValidation was submitted in the previous storage epoch, hash the hash_seed_value, and confirm that the hash matches the SubmitMiningProof transaction and that the validator marked it as valid. If so, it will save the challenge to the list of challenges that it has in its state.

AdvertiseStorageRecentBlockhash

AdvertiseStorageRecentBlockhash {
    hash: Hash,
    slot: u64,
}

Validators and replicators will submit this to indicate that a new storage epoch has passed and that the storage proofs which are current proofs should now be for the previous epoch. Other transactions should check to see that the epoch that they are referencing is accurate according to current chain state.

Secure Vote Signing

This design describes additional vote signing behavior that will make the process more secure.

Currently, Solana implements a vote-signing service that evaluates each vote to ensure it does not violate a slashing condition. The service could potentially have different variations, depending on the hardware platform capabilities. In particular, it could be used in conjunction with a secure enclave (such as SGX). The enclave could generate an asymmetric key, exposing an API for user (untrusted) code to sign the vote transactions, while keeping the vote-signing private key in its protected memory.

The following sections outline how this architecture would work:

Message Flow

  1. The node initializes the enclave at startup
    • The enclave generates an asymmetric key and returns the public key to the node
    • The keypair is ephemeral. A new keypair is generated on node bootup. A new keypair might also be generated at runtime based on some TBD criteria.
    • The enclave returns its attestation report to the node
  2. The node performs attestation of the enclave (e.g. using Intel's IAS APIs)
    • The node ensures that the Secure Enclave is running on a TPM and is signed by a trusted party
  3. The stakeholder of the node grants ephemeral key permission to use its stake. This process is TBD.
  4. The node's untrusted, non-enclave software calls trusted enclave software using its interface to sign transactions and other data.
    • In case of vote signing, the node needs to verify the PoH. The PoH verification is an integral part of signing. The enclave would be presented with some verifiable data to check before signing the vote.
    • The process of generating the verifiable data in untrusted space is TBD

PoH Verification

  1. When the node votes on an entry X, there's a lockout period N, during which it cannot vote on a fork that does not contain X in its history.
  2. Every time the node votes on the derivative of X, say X+y, the lockout period for X increases by a factor F (i.e. the duration for which the node cannot vote on a fork that does not contain X increases).
    • The lockout period for X+y is still N until the node votes again.
  3. The lockout period increment is capped (e.g. factor F applies maximum 32 times).
  4. The signing enclave must not sign a vote that violates this policy. This means
    • Enclave is initialized with N, F and Factor cap
    • Enclave stores Factor cap number of entry IDs on which the node had previously voted
    • The sign request contains the entry ID for the new vote
    • Enclave verifies that the new vote's entry ID is on the correct fork (following rules #1 and #2 above)

Ancestor Verification

This is an alternate, albeit less certain, approach to verifying the voting fork.

  1. The validator maintains an active set of nodes in the cluster
  2. It observes the votes from the active set in the last voting period
  3. It stores the ancestor/last_tick at which each node voted
  4. It sends new vote request to vote-signing service
    • It includes previous votes from nodes in the active set, and their corresponding ancestors
  5. The signer checks if the previous votes contain a vote from the validator, and that the vote's ancestor matches with the majority of the nodes
    • It signs the new vote if the check is successful
    • It asserts (raises an alarm of some sort) if the check is unsuccessful

The premise is that the validator can be spoofed at most once to vote on incorrect data. If someone hijacks the validator and submits a vote request for bogus data, that vote will not be included in the PoH (as it'll be rejected by the cluster). The next time the validator sends a request to sign the vote, the signing service will detect that validator's last vote is missing (as part of #5 above).

Fork determination

Due to the fact that the enclave cannot process PoH, it has no direct knowledge of the fork history of a submitted validator vote. Each enclave should be initialized with the current active set of public keys. A validator should submit its current vote along with the votes of the active set (including itself) that it observed in the slot of its previous vote. In this way, the enclave can surmise the votes accompanying the validator's previous vote and thus the fork being voted on. This is not possible for the validator's initial submitted vote, as it will not have a 'previous' slot to reference. To account for this, a short voting freeze should apply until the second vote is submitted containing the votes within the active set, along with its own vote, at the height of the initial vote.

Enclave configuration

A staking client should be configurable to prevent voting on inactive forks. This mechanism should use the client's known active set N_active along with a threshold vote N_vote and a threshold depth N_depth to determine whether or not to continue voting on a submitted fork. This configuration should take the form of a rule such that the client will only vote on a fork if it observes more than N_vote votes at N_depth. Practically, this represents the client confirming that it has observed some probability of economic finality of the submitted fork at a depth where an additional vote would create a lockout for an undesirable amount of time if that fork turns out not to be live.

Challenges

  1. Generation of verifiable data in untrusted space for PoH verification in the enclave.
  2. Need infrastructure for granting stake to an ephemeral key.

Staking Rewards

A Proof of Stake (PoS) design (i.e. using the in-protocol asset, SOL, to provide secure consensus) is outlined here. Solana implements a proof of stake reward/security scheme for validator nodes in the cluster. The purpose is threefold:

  • Align validator incentives with that of the greater cluster through skin-in-the-game deposits at risk
  • Avoid 'nothing at stake' fork voting issues by implementing slashing rules aimed at promoting fork convergence
  • Provide an avenue for validator rewards as a function of validator participation in the cluster.

While many of the details of the specific implementation are currently under consideration and are expected to come into focus through specific modeling studies and parameter exploration on the Solana testnet, we outline here our current thinking on the main components of the PoS system. Much of this thinking is based on the current status of Casper FFG, with optimizations and specific attributes to be modified as is allowed by Solana's Proof of History (PoH) blockchain data structure.

General Overview

Solana's ledger validation design is based on a rotating, stake-weighted selected leader broadcasting transactions in a PoH data structure to validating nodes. These nodes, upon receiving the leader's broadcast, have the opportunity to vote on the current state and PoH height by signing a transaction into the PoH stream.

To become a Solana validator, a fullnode must deposit/lock-up some amount of SOL in a contract. This SOL will not be accessible for a specific time period. The precise duration of the staking lockup period has not been determined. However, we can consider three phases of this time for which specific parameters will be necessary:

  • Warm-up period: during which SOL is deposited and inaccessible to the node, however PoH transaction validation has not begun. Most likely on the order of days to weeks
  • Validation period: a minimum duration for which the deposited SOL will be inaccessible, at risk of slashing (see slashing rules below) and earning rewards for validator participation. Likely duration of months to a year.
  • Cool-down period: a duration of time following the submission of a 'withdrawal' transaction. During this period validation responsibilities have been removed and the funds continue to be inaccessible. Accumulated rewards should be delivered at the end of this period, along with the return of the initial deposit.

Solana's trustless sense of time and ordering provided by its PoH data structure, along with its turbine data broadcast and transmission design, should provide sub-second transaction confirmation times that scale with the log of the number of nodes in the cluster. This means we shouldn't have to restrict the number of validating nodes with a prohibitive 'minimum deposit', and we expect nodes to be able to become validators with nominal amounts of SOL staked. At the same time, Solana's focus on high throughput should create an incentive for validation clients to provide high-performance and reliable hardware. Combined with a potential minimum network speed threshold to join as a validation-client, we expect a healthy validation delegation market to emerge. To this end, Solana's testnet will lead into a "Tour de SOL" validation-client competition, focusing on throughput and uptime to rank and reward testnet validators.

Slashing rules

Unlike Proof of Work (PoW), where off-chain capital expenses are already deployed at the time of block construction/voting, PoS systems require capital-at-risk to prevent a logical/optimal strategy of multiple chain voting. We intend to implement slashing rules which, if broken, result in some amount of the offending validator's deposited stake being removed from circulation. Given the ordering properties of the PoH data structure, we believe we can simplify our slashing rules to the level of a voting lockout time assigned per vote.

I.e. each vote has an associated lockout time (PoH duration) that represents a duration during which any additional vote from that validator must be in a PoH that contains the original vote, or a portion of that validator's stake is slashable. This duration is a function of the initial vote PoH count and all additional vote PoH counts. It will likely take the form:

Lockout(PoH_i, PoH_j) = PoH_j + K * exp((PoH_j - PoH_i) / K)

Where PoH_i is the height of the vote that the lockout is to be applied to and PoH_j is the height of the current vote on the same fork. If the validator submits a vote on a different PoH fork on any PoH_k where k > j > i and PoH_k < Lockout(PoH_i, PoH_j), then a portion of that validator's stake is at risk of being slashed.

In addition to the functional form lockout described above, an early implementation may be a numerical approximation based on a First In, First Out (FIFO) data structure and the following logic (a sketch follows the list):

  • FIFO queue holding 32 votes per active validator
  • new votes are pushed on top of queue (push_front)
  • expired votes are popped off top (pop_front)
  • as votes are pushed into the queue, the lockout of each queued vote doubles
  • votes are removed from back of queue if queue.len() > 32
  • the earliest and latest height that has been removed from the back of the queue should be stored
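
A minimal sketch of that approximation follows; the names and the base lockout constant are illustrative rather than the actual implementation, and recording the earliest/latest heights removed from the back of the queue is noted but omitted:

use std::collections::VecDeque;

const MAX_VOTES: usize = 32; // FIFO queue holds 32 votes per validator
const BASE_LOCKOUT: u64 = 2; // illustrative initial lockout, in PoH counts

struct Vote {
    height: u64,  // PoH height at which the vote was cast
    lockout: u64, // duration the validator is locked to this fork
}

#[derive(Default)]
struct VoteQueue {
    votes: VecDeque<Vote>, // front = newest vote (smallest lockout)
}

impl VoteQueue {
    fn push_vote(&mut self, height: u64) {
        // Expired votes are popped off the top (pop_front); the newest
        // votes carry the smallest lockouts, so they expire first.
        while matches!(self.votes.front(), Some(v) if v.height + v.lockout < height) {
            self.votes.pop_front();
        }
        // As a vote is pushed into the queue, the lockout of each
        // queued vote doubles.
        for v in self.votes.iter_mut() {
            v.lockout = v.lockout.saturating_mul(2);
        }
        // New votes are pushed on top of the queue (push_front)...
        self.votes.push_front(Vote { height, lockout: BASE_LOCKOUT });
        // ...and votes are removed from the back if queue.len() > 32.
        // (Storing the earliest/latest removed heights is omitted here.)
        while self.votes.len() > MAX_VOTES {
            self.votes.pop_back();
        }
    }
}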

It is likely that a reward will be offered as a % of the slashed amount to any node that submits proof of this slashing condition being violated to the PoH.

Partial Slashing

In the schema described so far, when a validator votes on a given PoH stream, they are committing themselves to that fork for a time determined by the vote lockout. An open question is whether validators will be hesitant to begin voting on an available fork if the penalties are perceived too harsh for an honest mistake or flipped bit.

One way to address this concern would be a partial slashing design that results in a slashable amount as a function of either:

  1. the fraction of validators, out of the total validator pool, that were also slashed during the same time period (ala Casper)
  2. the amount of time since the vote was cast (e.g. a linearly increasing % of total deposited as slashable amount over time), or both.

This is an area currently under exploration.

Penalties

As discussed in the Economic Design section, annual validator interest rates are to be specified as a function of total percentage of circulating supply that has been staked. The cluster rewards validators who are online and actively participating in the validation process throughout the entirety of their validation period. For validators that go offline/fail to validate transactions during this period, their annual reward is effectively reduced.

Similarly, we may consider an algorithmic reduction in a validator's active staked amount in the case that they are offline. I.e. if a validator is inactive for some amount of time, either due to a partition or otherwise, the amount of their stake that is considered ‘active’ (eligible to earn rewards) may be reduced. This design would be structured to help long-lived partitions eventually reach finality on their respective chains as the % of non-voting total stake is reduced over time until a super-majority can be achieved by the active validators in each partition. Similarly, upon re-engaging, the ‘active’ amount staked will come back online at some defined rate. Different rates of stake reduction may be considered depending on the size of the partition/active set.

Economic Design Overview

Solana’s crypto-economic system is designed to promote a healthy, long term self-sustaining economy with participant incentives aligned to the security and decentralization of the network. The main participants in this economy are validation-clients and replication-clients. Their contributions to the network, state validation and data storage respectively, and their requisite remittance mechanisms are discussed below.

The main channels of participant remittances are referred to as protocol-based rewards and transaction fees. Protocol-based rewards are protocol-derived issuances from a network-controlled reserve of tokens (sometimes referred to as the ‘mining pool’). These rewards will constitute the total reward delivered to replication clients and a portion of the total rewards for validation clients, the remainder sourced from transaction fees. In the early days of the network, it is likely that protocol-based rewards, deployed based on a predefined issuance schedule, will drive the majority of participant incentives to join the network.

These protocol-based rewards, to be distributed to participating validation and replication clients, are to be specified as annual interest rates calculated per real-time Solana epoch [DEFINITION]. As discussed further below, the issuance rates are determined as a function of total network validator staked percentage and total replication provided by replicators in each previous epoch. The choice for validator and replicator client rewards to be based on participation rates, rather than a global fixed inflation or interest rate, emphasizes a protocol priority of overall economic security over monetary supply predictability. Due to Solana’s hard total supply cap of 1B tokens and the bounds of client participant rates in the protocol, we believe that global interest and supply issuance scenarios should be able to be modeled with reasonable uncertainty.

Transaction fees are market-based participant-to-participant transfers, attached to network interactions as a necessary motivation and compensation for the inclusion and execution of a proposed transaction (be it a state execution or proof-of-replication verification). A mechanism for continuous and long-term funding of the mining pool through a pre-dedicated portion of transaction fees is also discussed below.

A high-level schematic of Solana’s crypto-economic design is shown below in Figure 1. The specifics of validation-client economics are described in the sections Validation-client Economics, State-validation Protocol-based Rewards, State-validation Transaction Fees and Replication-validation Transaction Fees. Also, the chapter titled Validation Stake Delegation closes with a discussion of validator delegation opportunities and the associated marketplace. The Replication-client Economics chapter will review the Solana network design for global ledger storage/redundancy and replicator-client economics (Storage-replication rewards), along with a replicator-to-validator delegation mechanism designed to aid participant on-boarding into the Solana economy, discussed in Replication-client Reward Auto-delegation. The Economic Sustainability section dives deeper into Solana’s design for long-term economic sustainability and outlines the constraints and conditions for a self-sustaining economy. An outline of features for an MVP economic design is discussed in the Economic Design MVP section. Finally, in the chapter Attack Vectors, various attack vectors will be described and potential vulnerabilities explored and parameterized.

== Solana Economic Design Diagram ==

Figure 1: Schematic overview of Solana economic incentive design.

Validation-client Economics

Validator-clients are eligible to receive protocol-based (i.e. via the mining pool) rewards issued via stake-based annual interest rates by providing compute (CPU+GPU) resources to validate and vote on a given PoH state. These protocol-based rewards are determined through an algorithmic schedule as a function of the total amount of Solana tokens staked in the system and the duration since network launch (genesis block). Additionally, these clients may earn revenue through two types of transaction fees: state-validation transaction fees and pooled Proof-of-Replication (PoRep) transaction fees. The distribution of these two types of transaction fees to the participating validation set is designed independently, as the economic goals and attack vectors differ between the state-generation/validation mechanism and the ledger replication/validation mechanism. For clarity, we separately describe the design and motivation of the three types of potential revenue streams for validation-clients below: state-validation protocol-based rewards, state-validation transaction fees and PoRep-validation transaction fees.

State-validation protocol-based rewards

Validator-clients have two functional roles in the Solana network:

  • Validate (vote) the current global state of the PoH along with any Proofs-of-Replication (see Replication Client Economics) that they are eligible to validate

  • Be elected as ‘leader’ on a stake-weighted round-robin schedule during which time they are responsible for collecting outstanding transactions and Proofs-of-Replication and incorporating them into the PoH, thus updating the global state of the network and providing chain continuity.

Validator-client rewards for these services are to be distributed at the end of each Solana epoch. Compensation for validator-clients is provided via a protocol-based annual interest rate disbursed in proportion to the stake-weight of each validator (see below) along with leader-claimed transaction fees available during each leader rotation. I.e. during the time a given validator-client is elected as leader, it has the opportunity to keep a portion of each non-PoRep transaction fee, less a protocol-specified amount that is returned to the mining pool (see Validation-client State Transaction Fees). PoRep transaction fees are not collected directly by the leader client but are pooled and returned to the validator set in proportion to the number of successfully validated PoReps (see Replication-client Transaction Fees).

The protocol-based annual interest-rate (%) per epoch to be distributed to validation-clients is to be a function of:

  • the current fraction of staked SOLs out of the current total circulating supply,

  • the global time since the genesis block instantiation, and

  • the up-time/participation [% of available slots/blocks that validator had opportunity to vote on?] of a given validator over the previous epoch.

The first two factors are protocol parameters only (i.e. independent of validator behavior in a given epoch) and describe a global validation reward schedule designed both to incentivize early participation and to promote optimal security in the network. This schedule sets a maximum annual validator-client interest rate per epoch.

At any given point in time, this interest rate is pegged to a defined value given a specific % staked SOL out of the circulating supply (e.g. 10% interest rate when 66% of circulating SOL is staked). The interest rate adjusts as the square-root [TBD] of the % staked, leading to higher validation-client interest rates as the % staked drops below the targeted goal, thus incentivizing more participation leading to more security in the network. An example of such a schedule, for a specified point in time (e.g. network launch) is shown in Table 1.

Percentage circulating supply staked [%] | Annual validator-client interest rate [%]
  5 | 13.87
 15 | 13.31
 25 | 12.73
 35 | 12.12
 45 | 11.48
 55 | 10.80
 66 | 10.00
 75 |  9.29
 85 |  8.44
Table 1: Example interest rate schedule based on % SOL staked out of circulating supply. In this case, the interest rate is pegged at 10% when 66% of the circulating supply is staked.
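For illustration only, the pegged, square-root-adjusted rate described above might be sketched as follows. The exact adjustment is marked TBD, and these constants do not reproduce Table 1:

/// Illustrative only: peg the rate to a base value at the target staked
/// percentage and adjust as the square root of the staked percentage. The
/// exact adjustment is marked TBD above, and these constants do not
/// reproduce Table 1.
fn validator_interest_rate(pct_staked: f64) -> f64 {
    const BASE_RATE_PCT: f64 = 10.0; // rate at the target
    const TARGET_PCT_STAKED: f64 = 66.0; // % of circulating supply staked
    BASE_RATE_PCT * (TARGET_PCT_STAKED / pct_staked).sqrt()
}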

Over time, the interest rate, at any network staked percentage, will drop as described by an algorithmic schedule. Validation-client interest rates are designed to be higher in the early days of the network to incentivize participation and jumpstart the network economy. This mining-pool-provided interest rate will reduce over time until a network-chosen baseline value is reached: a fixed, long-term, interest rate to be provided to validator-clients. This value does not represent the total interest available to validator-clients, as transaction fees for both state-validation and ledger storage replication (PoReps) are not accounted for here. A validation-client interest rate schedule as a function of % network staked and time is shown in Figure 2.

== Validation-client Interest Rate Schedule ==

Figure 2: In this example schedule, the annual interest rate [%] reduces at around 16.7% per year, until it reaches the long-term, fixed, 4% rate.

This epoch-specific, protocol-defined interest rate sets an upper limit on the protocol-generated annual interest rate (not the absolute total interest rate) that can be delivered to any validator-client per epoch. The distributed interest rate per epoch is then discounted from this value based on the participation of the validator-client during the previous epoch. Each epoch comprises XXX slots. The protocol-defined interest rate is then discounted by the log [TBD] of the % of slots for which a given validator submitted a vote on a PoH branch during that epoch; see Figure XX.
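A placeholder sketch of this participation discount (the discount function itself is marked TBD above, so the shape below is illustrative only):

/// Placeholder sketch of the per-epoch participation discount. Full rate at
/// 100% participation, falling off logarithmically below it.
fn effective_rate(max_rate_pct: f64, slots_voted: u64, slots_available: u64) -> f64 {
    let participation = slots_voted as f64 / slots_available as f64; // in (0, 1]
    max_rate_pct * (1.0 + participation.ln()).max(0.0)
}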

State-validation Transaction Fees

Each message sent through the network, to be processed by the current leader validation-client and confirmed as a global state transaction, must contain a transaction fee. Transaction fees offer many benefits in the Solana economic design, for example they:

  • provide unit compensation to the validator network for the CPU/GPU resources necessary to process the state transaction,

  • reduce network spam by introducing real cost to transactions,

  • open avenues for a transaction market to incentivize validation-clients to collect and process submitted transactions in their function as leader,

  • and provide potential long-term economic stability of the network through a protocol-captured minimum fee amount per transaction, as described below.

Many current blockchain economies (e.g. Bitcoin, Ethereum) rely on protocol-based rewards to support the economy in the short term, with the assumption that the revenue generated through transaction fees will support the economy in the long term, when the protocol-derived rewards expire. In an attempt to create a sustainable economy through protocol-based rewards and transaction fees, a fixed portion of each transaction fee is sent to the mining pool, with the remaining fee going to the current leader processing the transaction. These pooled fees then re-enter the system through rewards distributed to validation-clients, through the process described above, and to replication-clients, as discussed below.

The intent of this design is to retain leader incentive to include as many transactions as possible within the leader-slot time, while providing a redistribution avenue that protects against "tax evasion" attacks (i.e. side-channel fee payments) [1]. Constraints on the fixed portion of transaction fees going to the mining pool, to establish long-term economic sustainability, are established and discussed in detail in the Economic Sustainability section.

This minimum, protocol-earmarked, portion of each transaction fee can be dynamically adjusted depending on historical gas usage. In this way, the protocol can use the minimum fee to target a desired hardware utilization. By monitoring protocol-specified gas usage with respect to a desired, target usage amount (e.g. 50% of a block's capacity), the minimum fee can be raised or lowered, which should, in turn, lower or raise the actual gas usage per block until it reaches the target amount. This adjustment process can be thought of as similar to the difficulty adjustment algorithm in the Bitcoin protocol, except that here it adjusts the minimum transaction fee to guide the transaction-processing hardware usage to a desired level.
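A minimal sketch of such an adjustment loop, assuming the 50% capacity target from the example above and a hypothetical 5% step size:

/// Sketch of the minimum-fee adjustment described above. The target
/// utilization (50% of block capacity) comes from the text; the 5% step
/// size is a hypothetical parameter.
fn adjust_min_fee(current_min_fee: u64, gas_used: u64, block_capacity: u64) -> u64 {
    let target_gas = block_capacity / 2;
    if gas_used > target_gas {
        current_min_fee + current_min_fee / 20 // usage above target: raise the fee
    } else {
        current_min_fee.saturating_sub(current_min_fee / 20) // usage below target: lower it
    }
}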

Additionally, the minimum protocol-captured fee can be a consideration in fork selection. In the case of a PoH fork with a malicious, censoring leader, we would expect the total protocol-captured fee to be less than that of a comparable honest fork, due to the fees lost from censoring. To compensate for these lost protocol fees, a censoring leader would have to replace them on its fork itself, thus potentially reducing the incentive to censor in the first place.

Replication-validation Transaction Fees

As previously mentioned, validator-clients will also be responsible for validating PoReps submitted into the PoH stream by replicator-clients. In this case, validators are providing compute (CPU/GPU) and light storage resources to confirm that these replication proofs could only be generated by a client that is storing the referenced PoH ledger block [2].

While replication-clients are incentivized and rewarded through a protocol-based reward schedule (see Replication-client Economics), validator-clients will be incentivized to include and validate PoReps in PoH through the distribution of the transaction fees associated with the submitted PoReps. As will be described in detail in Section 3.1, replication-client rewards are protocol-based and designed to reward based on a global data redundancy factor. I.e. the protocol will incentivize replication-client participation through rewards based on a target ledger redundancy (e.g. 10x data redundancy). We chose not to distribute these rewards to PoRep validators, relying only on the collection of PoRep-attached transaction fees, because combining two participation incentive modes (a state-validation inflation rate driven by global staked % and replication-validation rewards driven by a global redundancy factor) in a single network participant (a validator-client) would open a significant incentive-driven attack surface.

The validation of PoReps by validation-clients is computationally more expensive than state-validation (detailed in the Economic Sustainability chapter), thus the transaction fees are expected to be proportionally higher. However, because replication-client rewards are distributed in proportion to, and only after, submitted PoReps are validated, replication-clients are uniquely motivated to see their proofs included and validated. This pressure is expected to generate an adequate market economy between replication-clients and validation-clients. Additionally, transaction fees submitted with PoReps have no minimum amount pre-allocated to the mining pool, unlike state-validation transaction fees.

There are various attack vectors available for colluding validation and replication clients, as described in detail below in Economic Sustainability. To protect against various collusion attack vectors, for a given epoch, PoRep transaction fees are pooled, and redistributed across participating validation-clients in proportion to the number of validated PoReps in the epoch less the number of invalidated PoReps [DIAGRAM]. This design rewards validators proportional to the number of PoReps they process and validate, while providing negative pressure for validation-clients to submit lazy or malicious invalid votes on submitted PoReps (note that it is computationally prohibitive to determine whether a validator-client has marked a valid PoRep as invalid).
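A sketch of this epoch-end pooling and redistribution, with illustrative types and field names:

/// Sketch of epoch-end PoRep fee pooling. Each validator's share is weighted
/// by validated PoReps less invalidated PoReps, per the design above. The
/// type and field names are illustrative.
struct PorepTally {
    validated: u64,
    invalidated: u64,
}

fn distribute_porep_fees(fee_pool: u64, tallies: &[PorepTally]) -> Vec<u64> {
    let weight = |t: &PorepTally| t.validated.saturating_sub(t.invalidated);
    let total: u64 = tallies.iter().map(weight).sum();
    tallies
        .iter()
        .map(|t| if total == 0 { 0 } else { fee_pool * weight(t) / total })
        .collect()
}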

Validation Stake Delegation

Running a Solana validation-client requires relatively modest upfront hardware capital investment. Table 2 provides an example hardware configuration to support ~1M tx/s with estimated ‘off-the-shelf’ costs:

Component         | Example                                          | Estimated Cost
GPU               | 2x 2080 Ti                                       | $2500
or                | 4x 1080 Ti                                       | $2800
OS/Ledger Storage | Samsung 860 Evo 2TB                              | $370
Accounts storage  | 2x Samsung 970 Pro M.2 512GB                     | $340
RAM               | 32 Gb                                            | $300
Motherboard       | AMD x399                                         | $400
CPU               | AMD Threadripper 2920x                           | $650
Case              |                                                  | $100
Power supply      | EVGA 1600W                                       | $300
Network           | > 500 mbps                                       |
Network (1)       | Google webpass business bay area 1gbps unlimited | $5500/mo
Network (2)       | Hurricane Electric bay area colo 1gbps           | $500/mo

Table 2: Example high-end hardware setup for running a Solana client.

Despite the low barrier to entry as a validation-client from a capital-investment perspective, as in any developing economy there will be much opportunity and need for trusted validation services, as evidenced by node reliability, UX/UI, APIs and other software accessibility tools. Additionally, although Solana’s validator node startup costs are nominal when compared to similar networks, they may still be somewhat restrictive for some potential participants. In the spirit of developing a truly decentralized, permissionless network, these interested parties still have two options to become involved in the Solana network/economy:

  1. Delegation of previously acquired tokens to a reliable validation node to earn a portion of the interest generated

  2. Provide local storage space as a replication-client and receive rewards by submitting Proof-of-Replication (see Replication-client Economics).

    a. This participant has the additional option to directly delegate their earned storage rewards (Replication-client Reward Auto-delegation)

Delegation of tokens to validation-clients, via option 1, provides a way for passive Solana token holders to become part of the active Solana economy and earn interest rates proportional to the interest rate generated by the delegated validation-client. Additionally, this feature creates a healthy validation-client market, with potential validation-client nodes competing to build reliable, transparent and profitable delegation services.

Replication-client economics

Replication-clients should be rewarded for providing the network with storage space. Incentivization of the set of replicators provides data security through redundancy of the historical ledger. Replication nodes are rewarded in proportion to the amount of ledger data storage provided. These rewards are captured by generating and entering Proofs of Replication (PoReps) into the PoH stream which can be validated by Validation nodes as described above in the Replication-validation Transaction Fees chapter.

Storage-replication Rewards

Replicator-clients download, encrypt and submit PoReps for ledger block sections [3]. PoReps submitted to the PoH stream, and subsequently validated, function as evidence that the submitting replicator client is indeed storing the assigned ledger block sections on local hard drive space as a service to the network. Therefore, replicator clients should earn protocol rewards proportional to the amount of storage, and the number of successfully validated PoReps, that they are verifiably providing to the network.

Additionally, replicator clients have the opportunity to capture a portion of slashed bounties [TBD] of dishonest validator clients. This can be accomplished by a replicator client submitting a verifiably false PoRep which a dishonest validator client signs as valid. This reward incentive exists to deter lazy validators and minimize validator-replicator collusion attacks; more on this below.

Replication-client Reward Auto-delegation

The ability for Solana network participants to earn rewards by providing storage service is a unique on-boarding path that requires little hardware overhead and minimal upfront capital. It offers an avenue for individuals with extra storage space on their home laptops or PCs to contribute to the security of the network and become integrated into the Solana economy.

To enhance this on-boarding ramp and facilitate further participation and investment in the Solana economy, replication-clients have the opportunity to auto-delegate their rewards to validation-clients of their choice. Much like the automatic reinvestment of stock dividends, in this scenario a replicator-client can earn Solana tokens by providing some storage capacity to the network (i.e. via submitting valid PoReps), have the protocol-based rewards automatically assigned as delegation to a staked validator node, and thereby earn interest in the validation-client reward pool.

Economic Sustainability

Long term economic sustainability is one of the guiding principles of Solana’s economic design. While it is impossible to predict how decentralized economies will develop over time, especially economies with flexible decentralized governances, we can arrange economic components such that, under certain conditions, a sustainable economy may take shape in the long term. In the case of Solana’s network, these components take the form of the remittances and deposits into and out of the reserve ‘mining pool’.

The dominant remittances from the Solana mining pool are validator and replicator rewards. The deposit mechanism is a flat, protocol-specified and adjusted, % of each transaction fee.

Replicator rewards are to be delivered to replicators from the mining pool after successful PoRep validation. The per-PoRep reward amount is determined as a function of the total network storage redundancy at the time of the PoRep validation and the network goal redundancy. This function is likely to take the form of a discount from a base reward, to be delivered in full when the network has achieved and maintained its goal redundancy. An example of such a reward function is shown in Figure 3.

== PoRep Reward Curve ==

Figure 3: Example PoRep reward design as a function of global network storage redundancy.

In the example shown in Figure 3, multiple per-PoRep base rewards are explored (as a % of Tx Fee) to be delivered when the global ledger replication redundancy reaches 10X. When the global ledger replication redundancy is less than 10X, the base reward is discounted as a function of the square of the ratio of the actual ledger replication redundancy to the goal redundancy (i.e. 10X).
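One hedged reading of that discount in code (constants and shape illustrative only):

/// One reading of the Figure 3 discount: pay the base reward at or above the
/// redundancy goal, and discount by the square of the actual-to-goal ratio
/// below it.
fn porep_reward(base_reward: u64, actual_redundancy: f64, goal_redundancy: f64) -> u64 {
    if actual_redundancy >= goal_redundancy {
        base_reward
    } else {
        let ratio = actual_redundancy / goal_redundancy;
        (base_reward as f64 * ratio * ratio) as u64
    }
}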

The other protocol-based remittance goes to validation-clients as a reward distributed in proportion to stake-weight for voting to validate the ledger state. The functional issuance of this reward is described in State-validation Protocol-based Rewards and is designed to reduce over time until validators are incentivized solely through collection of transaction fees. Therefore, in the long-run, protocol-based rewards to replication-nodes will be the only remittances from the mining pool, and will have to be countered by the portion of each non-PoRep transaction fee that is directed back into the mining pool. I.e. for a long-term self-sustaining economy, replicator-client rewards must be subsidized through a minimum fee on each non-PoRep transaction pre-allocated to the mining pool. Through this constraint, we can write the following inequality:

== WIP here ==
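Until that derivation is complete, one illustrative, symbolic reading of the constraint (all symbols below are assumptions introduced here, not protocol definitions) is:

% Symbols (assumptions, not protocol definitions):
%   F : total non-PoRep transaction fees collected per epoch
%   p : protocol-captured (mining pool) fraction of each such fee
%   R : total replicator rewards remitted from the mining pool per epoch
p \cdot F \;\geq\; R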

Attack Vectors

Colluding validation and replication clients

A colluding validation-client may take the strategy of marking PoReps from non-colluding replicator nodes as invalid in an attempt to maximize the rewards for the colluding replicator nodes. In this case, it isn’t feasible for the offended-against replicator nodes to petition the network for resolution, as this would result in a network-wide vote on each offending PoRep and create too much overhead for the network to progress adequately. Also, this mitigation attempt would still be vulnerable to a >= 51% staked colluder.

Alternatively, transaction fees from submitted PoReps are pooled and distributed across validation-clients in proportion to the number of valid PoReps discounted by the number of invalid PoReps, as voted by each validator-client. Thus invalid votes are directly dis-incentivized through this reward channel. Invalid votes that are revealed by replicator nodes as fishing PoReps will not be discounted from the payout PoRep count.

Another collusion attack involves a validator-client who may take the strategy of ignoring invalid PoReps from a colluding replicator and voting them as valid. In this case, colluding replicator-clients would not have to store the data while still receiving rewards for validated PoReps. Additionally, colluding validator nodes would also receive rewards for validating these PoReps. To mitigate this attack, validators must randomly sample PoReps corresponding to the ledger block they are validating; because of this, multiple validators will receive the colluding replicator’s invalid submissions. These non-colluding validators will be incentivized to mark these PoReps as invalid, as they have no way to determine whether a proposed invalid PoRep is actually a fishing PoRep, for which a confirmation vote would result in the validator’s stake being slashed.

In this case, the proportion of time a colluding pair will be successful has an upper limit determined by the % of the network's stake claimed by the colluding validator. This also sets bounds on the value of such an attack. For example, if a colluding validator controls 10% of the total validator stake, transaction fees will be lost (likely sent to the mining pool) by the colluding replicator 90% of the time, so the attack vector is only profitable if the per-PoRep reward is at least 90% higher than the average PoRep transaction fee. While, probabilistically, some colluding replicator-client PoReps will find their way to colluding validation-clients, the network can also monitor rates of paired (validator + replicator) discrepancies in voting patterns and censor identified colluders in these cases.

Proposed MVP of Economic Design

The preceding sections, outlined in the Economic Design Overview, describe a long-term vision of a sustainable Solana economy. Of course, we don't expect the final implementation to perfectly match what has been described above. We intend to fully engage with network stakeholders throughout the implementation phases (i.e. pre-testnet, testnet, mainnet) to ensure the system supports, and is representative of, the various network participants' interests. The first step toward this goal, however, is outlining some desired MVP economic features to be available for early pre-testnet and testnet participants. Below is a rough sketch outlining basic economic functionality from which a more complete and functional system can be developed.

MVP Economic Features

  • Faucet to deliver testnet SOLs to validators for staking and dapp development.
  • Mechanism by which validators are rewarded in proportion to their stake. Interest rate mechanism (i.e. to be determined by total % staked) to come later.
  • Ability to delegate tokens to validator nodes.
  • Replicators to receive fixed, arbitrary reward for submitting validated PoReps. Reward size mechanism (i.e. PoRep reward as a function of total ledger redundancy) to come later.
  • Pooling of replicator PoRep transaction fees and weighted distribution to validators based on PoRep verification (see Replication-validation Transaction Fees). It will be useful to test this protection against attacks on testnet.
  • Nice-to-have: auto-delegation of replicator rewards to validator.

References

  1. https://blog.ethereum.org/2016/07/27/inflation-transaction-fees-cryptocurrency-monetary-policy/

  2. https://medium.com/solana-labs/how-to-create-decentralized-storage-for-a-multi-petabyte-digital-ledger-2499a3a8c281

  3. https://medium.com/solana-labs/how-to-create-decentralized-storage-for-a-multi-petabyte-digital-ledger-2499a3a8c281

Cluster Test Framework

This document proposes the Cluster Test Framework (CTF). CTF is a test harness that allows tests to execute against a local, in-process cluster or a deployed cluster.

Motivation

The goal of CTF is to provide a framework for writing tests independent of where and how the cluster is deployed. Regressions can be captured in these tests and the tests can be run against deployed clusters to verify the deployment. The focus of these tests should be on cluster stability, consensus, fault tolerance, and API stability.

Tests should verify a single bug or scenario, and should be written with the least amount of internal plumbing exposed to the test.

Design Overview

Tests are provided an entry point, which is a contact_info::ContactInfo structure, and a keypair that has already been funded.

Each node in the cluster is configured with a fullnode::ValidatorConfig at boot time, which specifies any extra cluster configuration required for the test. The cluster should boot with the same configuration whether it is run in-process or in a data center.

Once booted, the test will discover the cluster through a gossip entry point and configure any runtime behaviors via fullnode RPC.

Test Interface

Each CTF test starts with an opaque entry point and a funded keypair. The test should not depend on how the cluster is deployed, and should be able to exercise all the cluster functionality through the publicly available interfaces.

use crate::contact_info::ContactInfo;
use solana_sdk::signature::{Keypair, KeypairUtil};
pub fn test_this_behavior(
    entry_point_info: &ContactInfo,
    funding_keypair: &Keypair,
    num_nodes: usize,
) {
    // Exercise the cluster through its public interfaces only.
}

Cluster Discovery

At test start, the cluster has already been established and is fully connected. The test can discover most of the available nodes over a few seconds.

use crate::gossip_service::discover_nodes;

// Discover the cluster over a few seconds.
let cluster_nodes = discover_nodes(&entry_point_info, num_nodes);

Cluster Configuration

To enable specific scenarios, the cluster needs to be booted with special configurations. These configurations can be captured in fullnode::ValidatorConfig.

For example:

let mut validator_config = ValidatorConfig::default();
validator_config.rpc_config.enable_fullnode_exit = true;
let local = LocalCluster::new_with_config(
    num_nodes,
    10_000,
    100,
    &validator_config,
);

How to design a new test

For example, suppose a bug shows that the cluster fails when it is flooded with invalid advertised gossip nodes. Our gossip library and protocol may change, but the cluster still needs to stay resilient to floods of invalid advertised gossip nodes.

Configure the RPC service:

let mut validator_config = ValidatorConfig::default();
validator_config.rpc_config.enable_rpc_gossip_push = true;
validator_config.rpc_config.enable_rpc_gossip_refresh_active_set = true;

Wire the RPCs and write a new test:

pub fn test_large_invalid_gossip_nodes(
    entry_point_info: &ContactInfo,
    funding_keypair: &Keypair,
    num_nodes: usize,
) {
    let cluster = discover_nodes(&entry_point_info, num_nodes);

    // Poison the cluster.
    let client = create_client(entry_point_info.client_facing_addr(), FULLNODE_PORT_RANGE);
    for _ in 0..(num_nodes * 100) {
        client.gossip_push(
            cluster_info::invalid_contact_info()
        );
    }
    sleep(Duration::from_millis(1000));

    // Force refresh of the active set.
    for node in &cluster {
        let client = create_client(node.client_facing_addr(), FULLNODE_PORT_RANGE);
        client.gossip_refresh_active_set();
    }

    // Verify that spends still work.
    verify_spends(&cluster);
}

Anatomy of a Validator

History

When we first started Solana, the goal was to de-risk our TPS claims. We knew that, between optimistic concurrency control and sufficiently long leader slots, PoS consensus was not the biggest risk to TPS. It was GPU-based signature verification, software pipelining and concurrent banking. Thus, the TPU was born. After topping 100k TPS, we split the team into one group working toward 710k TPS and another to flesh out the validator pipeline. Hence, the TVU was born. The current architecture is a consequence of incremental development in that order and with those project priorities. It is not a reflection of what we ever believed was the most technically elegant cross-section of those technologies. In the context of leader rotation, the strong distinction between leading and validating is blurred.

Difference between validating and leading

The fundamental difference between the pipelines is when the PoH is present. In a leader, we process transactions, removing bad ones, and then tag the result with a PoH hash. In the validator, we verify that hash, peel it off, and process the transactions in exactly the same way. The only difference is that if a validator sees a bad transaction, it can't simply remove it like the leader does, because that would cause the PoH hash to change. Instead, it rejects the whole block. The other difference between the pipelines is what happens after banking. The leader broadcasts entries to downstream validators whereas the validator will have already done that in RetransmitStage, which is a confirmation time optimization. The validation pipeline, on the other hand, has one last step. Any time it finishes processing a block, it needs to weigh any forks it's observing, possibly cast a vote, and if so, reset its PoH hash to the block hash it just voted on.

Proposed Design

We unwrap the many abstraction layers and build a single pipeline that can toggle leader mode on whenever the validator's ID shows up in the leader schedule.

Validator block diagram

Notable changes

  • No threads are shut down to switch out of leader mode. Instead, FetchStage should forward transactions to the next leader.
  • Hoist FetchStage and BroadcastStage out of TPU
  • Blocktree renamed to Blockstore
  • BankForks renamed to Banktree
  • TPU moves to new socket-free crate called solana-tpu.
  • TPU's BankingStage absorbs ReplayStage
  • TVU goes away
  • New RepairStage absorbs Blob Fetch Stage and repair requests
  • JSON RPC Service is optional - used for debugging. It should instead be part of a separate solana-blockstreamer executable.
  • New MulticastStage absorbs retransmit part of RetransmitStage
  • MulticastStage downstream of Blockstore

Simple Payment and State Verification

It is often useful to allow low-resourced clients to participate in a Solana cluster. Be this participation economic or contract execution, verification that a client's activity has been accepted by the network is typically expensive. This proposal lays out a mechanism for such clients to confirm that their actions have been committed to the ledger state with minimal resource expenditure and third-party trust.

A Naive Approach

Validators store the signatures of recently confirmed transactions for a short period of time to ensure that they are not processed more than once. Validators provide a JSON RPC endpoint, which clients can use to query whether a transaction has been recently processed. Validators also provide a PubSub notification, whereby a client registers to be notified when a given signature is observed by the validator. While these two mechanisms allow a client to verify a payment, they are not a proof and rely on completely trusting a fullnode.

We will describe a way to minimize this trust using Merkle Proofs to anchor the fullnode's response in the ledger, allowing the client to confirm on their own that a sufficient number of their preferred validators have confirmed a transaction. Requiring multiple validator attestations further reduces trust in the fullnode, as it increases both the technical and economic difficulty of compromising several other network participants.

Light Clients

A 'light client' is a cluster participant that does not itself run a fullnode. This light client would provide a level of security greater than trusting a remote fullnode, without requiring the light client to spend a lot of resources verifying the ledger.

Rather than providing transaction signatures directly to a light client, the fullnode instead generates a Merkle Proof from the transaction of interest to the root of a Merkle Tree of all transactions in the including block. This Merkle Root is stored in a ledger entry which is voted on by validators, providing it consensus legitimacy. The additional level of security for a light client depends on an initial canonical set of validators the light client considers to be the stakeholders of the cluster. As that set is changed, the client can update its internal set of known validators with receipts. This may become challenging with a large number of delegated stakes.

Fullnodes themselves may want to use light client APIs for performance reasons. For example, during the initial launch of a fullnode, the fullnode may use a cluster provided checkpoint of the state and verify it with a receipt.

Receipts

A receipt is a minimal proof that: a transaction has been included in a block; that the block has been voted on by the client's preferred set of validators; and that the votes have reached the desired confirmation depth.

The receipts for both state and payments start with a Merkle Path from the value into a Bank-Merkle that has been voted on and included in the ledger. A chain of PoH Entries containing subsequent validator votes, deriving from the Bank-Merkle, is the confirmation proof.

Clients can examine this ledger data and compute the finality using Solana's fork selection rules.

Payment Merkle Path

A payment receipt is a data structure that contains a Merkle Path from a transaction to the required set of validator votes.

An Entry-Merkle is a Merkle Root including all transactions in the entry, sorted by signature.

Block Merkle Diagram

A Block-Merkle is a Merkle root of all the Entry-Merkles sequenced in the block. Transaction status is necessary for the receipt because the state receipt is constructed for the block. Two transactions over the same state can appear in the block, and therefore there is no way to infer from just the state whether a transaction that is committed to the ledger has succeeded or failed in modifying the intended state. It may not be necessary to encode the full status code; a single status bit indicating the transaction's success may suffice.
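For illustration, a generic binary Merkle root over 32-byte leaf hashes (e.g. hashes of the signature-sorted transactions of an entry) can be computed as below. This is a sketch using the sha2 crate, not Solana's actual implementation; the duplicate-last-node rule for odd-sized levels is an assumed convention:

use sha2::{Digest, Sha256};

/// Generic sketch: fold a level of 32-byte hashes pairwise up to a single
/// Merkle root, duplicating the last node when a level has odd length.
fn merkle_root(mut level: Vec<[u8; 32]>) -> [u8; 32] {
    assert!(!level.is_empty());
    while level.len() > 1 {
        if level.len() % 2 == 1 {
            let last = *level.last().unwrap();
            level.push(last); // duplicate the odd node (assumed convention)
        }
        level = level
            .chunks(2)
            .map(|pair| {
                let mut hasher = Sha256::new();
                hasher.update(pair[0]);
                hasher.update(pair[1]);
                hasher.finalize().into()
            })
            .collect();
    }
    level[0]
}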

State Merkle Path

A state receipt provides a confirmation that a specific state is committed at the end of the block. Inter-block state transitions do not generate a receipt.

For example:

  • A sends 5 Lamports to B
  • B spends 5 Lamports
  • C sends 5 Lamports to A

At the end of the block, A and B are in the exact same starting state, and any state receipt would point to the same value for A or B.

The Bank-Merkle is computed from the Merkle Tree of the new state changes, along with the Previous Bank-Merkle, and the Block-Merkle.

Bank Merkle Diagram

A state receipt contains only the state changes occurring in the block. A direct Merkle Path to the current Bank-Merkle guarantees the state value at that bank hash, but it cannot be used to generate a “current” receipt to the latest state if the state modification occurred in some previous block. There is no guarantee that the path provided by the validator is the latest one available out of all the previous Bank-Merkles.

Clients that want to query the chain for a receipt of the "latest" state would need to create a transaction that would update the Merkle Path for that account, such as a credit of 0 Lamports.

Validator Votes

Leaders should coalesce the validator votes by stake weight into a single entry. This will reduce the number of entries necessary to create a receipt.

Chain of Entries

A receipt has a PoH link from the payment or state Merkle Path root to a list of consecutive validation votes.

It contains the following:

  • State -> Bank-Merkle or
  • Transaction -> Entry-Merkle -> Block-Merkle -> Bank-Merkle

And a vector of PoH entries:

  • Validator vote entries
  • Ticks
  • Light entries
/// This Entry definition skips over the transactions and only contains the
/// hash of the transactions used to modify PoH.
LightEntry {
    /// The number of hashes since the previous Entry ID.
    pub num_hashes: u64,
    /// The SHA-256 hash `num_hashes` after the previous Entry ID.
    hash: Hash,
    /// The Merkle Root of the transactions encoded into the Entry.
    entry_hash: Hash,
}

The light entries are reconstructed from Entries and simply show the entry Merkle Root that was mixed into the PoH hash, instead of the full transaction set.

Clients do not need the starting vote state. The fork selection algorithm is defined such that only votes that appear after the transaction provide finality for the transaction, and finality is independent of the starting state.

Verification

A light client that is aware of the supermajority set of validators can verify a receipt by following the Merkle Path to the PoH chain. The Bank-Merkle is the Merkle Root and will appear in votes included in an Entry. The light client can simulate fork selection for the consecutive votes and verify that the receipt is confirmed at the desired lockout threshold.

Synthetic State

Synthetic state should be computed into the Bank-Merkle along with the bank generated state.

For example:

  • Epoch validator accounts and their stakes and weights.
  • Computed fee rates

These values should have an entry in the Bank-Merkle. They should live under known accounts, and therefore have an exact address in the Merkle Path.

Cross-Program Invocation

Problem

In today's implementation a client can create a transaction that modifies two accounts, each owned by a separate on-chain program:

let message = Message::new(vec![
    token_instruction::pay(&alice_pubkey),
    acme_instruction::launch_missiles(&bob_pubkey),
]);
client.send_message(&[&alice_keypair, &bob_keypair], &message);

The current implementation does not, however, allow the acme program to conveniently invoke token instructions on the client's behalf:

let message = Message::new(vec![
    acme_instruction::pay_and_launch_missiles(&alice_pubkey, &bob_pubkey),
]);
client.send_message(&[&alice_keypair, &bob_keypair], &message);

Currently, there is no way to create an instruction pay_and_launch_missiles that executes token_instruction::pay from the acme program. The workaround is to extend the acme program with the implementation of the token program, and create token accounts with ACME_PROGRAM_ID, which the acme program is permitted to modify. With that workaround, acme can modify token-like accounts created by the acme program, but not token accounts created by the token program.

Proposed Solution

The goal of this design is to modify Solana's runtime such that an on-chain program can invoke an instruction from another program.

Given two on-chain programs token and acme, each implementing instructions pay() and launch_missiles() respectively, we would ideally like to implement the acme module with a call to a function defined in the token module:

use token;

fn launch_missiles(keyed_accounts: &[KeyedAccount]) -> Result<()> {
    ...
}

fn pay_and_launch_missiles(keyed_accounts: &[KeyedAccount]) -> Result<()> {
    token::pay(&keyed_accounts[1..])?;

    launch_missiles(keyed_accounts)
}

The above code would require that the token crate be dynamically linked, so that a custom linker could intercept calls and validate accesses to keyed_accounts. That is, even though the client intends to modify both token and acme accounts, only the token program is permitted to modify the token account, and only the acme program is permitted to modify the acme account.

Backing off from that ideal cross-program call, a slightly more verbose solution is to expose token's existing process_instruction() entrypoint to the acme program:

use token_instruction;

fn launch_missiles(keyed_accounts: &[KeyedAccount]) -> Result<()> {
    ...
}

fn pay_and_launch_missiles(keyed_accounts: &[KeyedAccount]) -> Result<()> {
    let alice_pubkey = keyed_accounts[1].key;
    let instruction = token_instruction::pay(&alice_pubkey);
    process_instruction(&instruction)?;

    launch_missiles(keyed_accounts)
}

where process_instruction() is built into Solana's runtime and responsible for routing the given instruction to the token program via the instruction's program_id field. Before invoking pay(), the runtime must also ensure that acme didn't modify any accounts owned by token. It does this by calling runtime::verify_instruction() and then afterward updating all the pre_* variables to tentatively commit acme's account modifications. After pay() completes, the runtime must again ensure that token didn't modify any accounts owned by acme. It should call verify_instruction() again, but this time with the token program ID. Lastly, after pay_and_launch_missiles() completes, the runtime must call verify_instruction() one more time, where it normally would, but using all updated pre_* variables. If executing pay_and_launch_missiles() up to pay() made no invalid account changes, pay() made no invalid changes, and executing from pay() until pay_and_launch_missiles() returns made no invalid changes, then the runtime can transitively assume pay_and_launch_missiles() as a whole made no invalid account changes, and therefore commit all account modifications.
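The sequence above can be sketched with stand-in types; none of the types or helpers below are Solana's actual runtime API:

/// Illustrative stand-ins only, not Solana's runtime types.
#[derive(Clone, PartialEq)]
struct AccountState {
    owner: u8, // id of the owning program (stand-in for a Pubkey)
    lamports: u64,
    data: Vec<u8>,
}

/// Stand-in for runtime::verify_instruction(): a program may only modify
/// accounts that it owns.
fn verify_instruction(program_id: u8, pre: &[AccountState], post: &[AccountState])
    -> Result<(), &'static str>
{
    for (p, q) in pre.iter().zip(post.iter()) {
        if p != q && p.owner != program_id {
            return Err("account modified by a program that does not own it");
        }
    }
    Ok(())
}

fn run_pay_and_launch_missiles(acme: u8, token: u8, accounts: &mut Vec<AccountState>)
    -> Result<(), &'static str>
{
    let mut pre = accounts.clone();
    // ... acme executes up to the cross-program call ...
    verify_instruction(acme, &pre, accounts)?; // acme made no invalid changes so far
    pre = accounts.clone(); // tentatively commit acme's modifications

    // ... the runtime routes token_instruction::pay() to the token program ...
    verify_instruction(token, &pre, accounts)?; // token made no invalid changes
    pre = accounts.clone();

    // ... acme executes to completion ...
    verify_instruction(acme, &pre, accounts)?; // final check, as it normally would
    Ok(())
}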

Setting KeyedAccount.is_signer

When process_instruction() is invoked, the runtime must create a new KeyedAccounts parameter using the signatures from the original transaction data. Since the token program is immutable and existed on-chain prior to the acme program, the runtime can safely treat the transaction signature as a signature of a transaction with a token instruction. When the runtime sees the given instruction references alice_pubkey, it looks up the key in the transaction to see if that key corresponds to a transaction signature. In this case it does and so sets KeyedAccount.is_signer, thereby authorizing the token program to modify Alice's account.

Implemented Design Proposals

The following design proposals are fully implemented.

Blocktree

After a block reaches finality, all blocks from that one on down to the genesis block form a linear chain with the familiar name blockchain. Until that point, however, the validator must maintain all potentially valid chains, called forks. The process by which forks naturally form as a result of leader rotation is described in fork generation. The blocktree data structure described here is how a validator copes with those forks until blocks are finalized.

The blocktree allows a validator to record every blob it observes on the network, in any order, as long as the blob is signed by the expected leader for a given slot.

Blobs are moved to a fork-able key space: the tuple of leader slot + blob index (within the slot). This permits the skip-list structure of the Solana protocol to be stored in its entirety, without a priori choosing which fork to follow, which Entries to persist, or when to persist them.

Repair requests for recent blobs are served out of RAM or recent files and out of deeper storage for less recent blobs, as implemented by the store backing Blocktree.

Functionalities of Blocktree

  1. Persistence: the Blocktree sits at the front of the node's verification pipeline, right behind network receive and signature verification. If the blob received is consistent with the leader schedule (i.e. was signed by the leader for the indicated slot), it is immediately stored.
  2. Repair: repair is the same as window repair above, but able to serve any blob that's been received. Blocktree stores blobs with signatures, preserving the chain of origination.
  3. Forks: Blocktree supports random access of blobs, so can support a validator's need to rollback and replay from a Bank checkpoint.
  4. Restart: with proper pruning/culling, the Blocktree can be replayed by ordered enumeration of entries from slot 0. The logic of the replay stage (i.e. dealing with forks) will have to be used for the most recent entries in the Blocktree.

Blocktree Design

  1. Entries in the Blocktree are stored as key-value pairs, where the key is the concatenated slot index and blob index for an entry, and the value is the entry data. Note blob indexes are zero-based for each slot (i.e. they're slot-relative).

  2. The Blocktree maintains metadata for each slot in a SlotMeta struct (sketched after this list) containing:

    • slot_index - The index of this slot
    • num_blocks - The number of blocks in the slot (used for chaining to a previous slot)
    • consumed - The highest blob index n, such that for all m < n, there exists a blob in this slot with blob index equal to m (i.e. the highest consecutive blob index).
    • received - The highest received blob index for the slot
    • next_slots - A list of future slots this slot could chain to. Used when rebuilding the ledger to find possible fork points.
    • last_index - The index of the blob that is flagged as the last blob for this slot. This flag on a blob will be set by the leader for a slot when they are transmitting the last blob for a slot.
    • is_rooted - True iff every block from 0...slot forms a full sequence without any holes. We can derive is_rooted for each slot with the following rules. Let slot(n) be the slot with index n, and slot(n).is_full() is true if the slot with index n has all the ticks expected for that slot. Let is_rooted(n) be the statement that "the slot(n).is_rooted is true". Then:

    is_rooted(0)
    is_rooted(n+1) iff (is_rooted(n) and slot(n).is_full())

  3. Chaining - When a blob for a new slot x arrives, we check the number of blocks (num_blocks) for that new slot (this information is encoded in the blob). We then know that this new slot chains to slot x - num_blocks.

  4. Subscriptions - The Blocktree records a set of slots that have been "subscribed" to. This means entries that chain to these slots will be sent on the Blocktree channel for consumption by the ReplayStage. See the Blocktree APIs for details.

  5. Update notifications - The Blocktree notifies listeners when slot(n).is_rooted is flipped from false to true for any n.
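The SlotMeta fields above might look like the following sketch (illustrative; the actual struct lives in the blocktree implementation):

/// Sketch of SlotMeta with the fields described above; types are assumed.
pub struct SlotMeta {
    /// The index of this slot.
    pub slot_index: u64,
    /// The number of blocks in the slot, used for chaining to a previous slot.
    pub num_blocks: u64,
    /// The highest consecutive blob index received for this slot.
    pub consumed: u64,
    /// The highest blob index received for this slot.
    pub received: u64,
    /// Future slots this slot could chain to, used to find fork points.
    pub next_slots: Vec<u64>,
    /// The index of the blob flagged by the leader as last for this slot.
    pub last_index: Option<u64>,
    /// True iff every slot from 0..=slot_index forms a full sequence.
    pub is_rooted: bool,
}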

Blocktree APIs

The Blocktree offers a subscription-based API that ReplayStage uses to ask for entries it's interested in. The entries will be sent on a channel exposed by the Blocktree. These subscription APIs are as follows:

  1. fn get_slots_since(slot_indexes: &[u64]) -> Vec<SlotMeta>: Returns new slots connecting to any element of the list slot_indexes.

  2. fn get_slot_entries(slot_index: u64, entry_start_index: usize, max_entries: Option<u64>) -> Vec<Entry>: Returns the entry vector for the slot starting with entry_start_index, capping the result at max if max_entries == Some(max), otherwise, no upper limit on the length of the return vector is imposed.

Note: Cumulatively, this means that the replay stage will now have to know when a slot is finished, and subscribe to the next slot it's interested in to get the next set of entries. Previously, the burden of chaining slots fell on the Blocktree.

Interfacing with Bank

The bank exposes to replay stage:

  1. prev_hash: which PoH chain it's working on as indicated by the hash of the last entry it processed

  2. tick_height: the ticks in the PoH chain currently being verified by this bank

  3. votes: a stack of records that contain:

    1. prev_hashes: what anything after this vote must chain to in PoH
    2. tick_height: the tick height at which this vote was cast
    3. lockout period: how long a chain must be observed to be in the ledger to be able to be chained below this vote

Replay stage uses Blocktree APIs to find the longest chain of entries it can hang off a previous vote. If that chain of entries does not hang off the latest vote, the replay stage rolls back the bank to that vote and replays the chain from there.

Pruning Blocktree

Once Blocktree entries are old enough, representing all the possible forks becomes less useful, perhaps even problematic for replay upon restart. Once a validator's votes have reached max lockout, however, any Blocktree contents that are not on the PoH chain for that vote can be pruned, expunged.

Replicator nodes will be responsible for storing really old ledger contents, and validators need only persist their bank periodically.

Cluster Software Installation and Updates

Currently users are required to build the solana cluster software themselves from the git repository and manually update it, which is error prone and inconvenient.

This document proposes an easy to use software install and updater that can be used to deploy pre-built binaries for supported platforms. Users may elect to use binaries supplied by Solana or any other party they trust. Deployment of updates is managed using an on-chain update manifest program.

Motivating Examples

Fetch and run a pre-built installer using a bootstrap curl/shell script

The easiest install method for supported platforms:

$ curl -sSf https://raw.githubusercontent.com/solana-labs/solana/v0.16.0/install/solana-install-init.sh | sh

This script will check github for the latest tagged release and download and run the solana-install-init binary from there.

If additional arguments need to be specified during the installation, the following shell syntax is used:

$ init_args=.... # arguments for `solana-install-init ...`
$ curl -sSf https://raw.githubusercontent.com/solana-labs/solana/v0.16.0/install/solana-install-init.sh | sh -s - ${init_args}

Fetch and run a pre-built installer from a Github release

With a well-known release URL, a pre-built binary can be obtained for supported platforms:

$ curl -o solana-install-init https://github.com/solana-labs/solana/releases/download/v0.16.0/solana-install-init-x86_64-apple-darwin
$ chmod +x ./solana-install-init
$ ./solana-install-init --help

Build and run the installer from source

If a pre-built binary is not available for a given platform, building the installer from source is always an option:

$ git clone https://github.com/solana-labs/solana.git
$ cd solana/install
$ cargo run -- --help

Deploy a new update to a cluster

Given a solana release tarball (as created by ci/publish-tarball.sh) that has already been uploaded to a publicly accessible URL, the following commands will deploy the update:

$ solana-keygen new -o update-manifest.json  # <-- only generated once, the public key is shared with users
$ solana-install deploy http://example.com/path/to/solana-release.tar.bz2 update-manifest.json

Run a validator node that auto updates itself

$ solana-install init --pubkey 92DMonmBYXwEMHJ99c9ceRSpAmk9v6i3RdvDdXaVcrfj  # <-- pubkey is obtained from whoever is deploying the updates
$ export PATH=~/.local/share/solana-install/bin:$PATH
$ solana-keygen ...  # <-- runs the latest solana-keygen
$ solana-install run solana-validator ...  # <-- runs a validator, restarting it as necessary when an update is applied

On-chain Update Manifest

An update manifest is used to advertise the deployment of new release tarballs on a solana cluster. The update manifest is stored using the config program, and each update manifest account describes a logical update channel for a given target triple (eg, x86_64-apple-darwin). The account public key is well-known between the entity deploying new updates and users consuming those updates.

The update tarball itself is hosted elsewhere, off-chain and can be fetched from the specified download_url.

use solana_sdk::signature::Signature;

/// Information required to download and apply a given update
pub struct UpdateManifest {
    pub timestamp_secs: u64, // When the release was deployed in seconds since UNIX EPOCH
    pub download_url: String, // Download URL to the release tar.bz2
    pub download_sha256: String, // SHA256 digest of the release tar.bz2 file
}

/// Userdata of an Update Manifest program Account.
#[derive(Serialize, Deserialize, Default, Debug, PartialEq)]
pub struct SignedUpdateManifest {
    pub manifest: UpdateManifest,
    pub manifest_signature: Signature,
}

Note that the manifest field itself contains a corresponding signature (manifest_signature) to guard against man-in-the-middle attacks between the solana-install tool and the solana cluster RPC API.

To guard against rollback attacks, solana-install will refuse to install an update with an older timestamp_secs than what is currently installed.

Release Archive Contents

A release archive is expected to be a tar file compressed with bzip2 with the following internal structure:

  • /version.yml - a simple YAML file containing the field "target" - the target tuple. Any additional fields are ignored.
  • /bin/ -- directory containing available programs in the release. solana-install will symlink this directory to ~/.local/share/solana-install/bin for use by the PATH environment variable.
  • ... -- any additional files and directories are permitted

solana-install Tool

The solana-install tool is used by the user to install and update their cluster software.

It manages the following files and directories in the user's home directory:

  • ~/.config/solana/install/config.yml - user configuration and information about currently installed software version
  • ~/.local/share/solana/install/bin - a symlink to the current release. eg, ~/.local/share/solana-update/<update-pubkey>-<manifest_signature>/bin
  • ~/.local/share/solana/install/releases/<download_sha256>/ - contents of a release

Command-line Interface

solana-install 0.16.0
The solana cluster software installer

USAGE:
    solana-install [OPTIONS] <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -c, --config <PATH>    Configuration file to use [default: /Users/mvines/Library/Preferences/solana/install.yml]

SUBCOMMANDS:
    deploy    deploys a new update
    help      Prints this message or the help of the given subcommand(s)
    info      displays information about the current installation
    init      initializes a new installation
    run       Runs a program while periodically checking and applying software updates
    update    checks for an update, and if available downloads and applies it

solana-install-init
initializes a new installation

USAGE:
    solana-install init [OPTIONS]

FLAGS:
    -h, --help    Prints help information

OPTIONS:
    -d, --data_dir <PATH>    Directory to store install data [default: /Users/mvines/Library/Application Support/solana]
    -u, --url <URL>          JSON RPC URL for the solana cluster [default: http://testnet.solana.com:8899]
    -p, --pubkey <PUBKEY>    Public key of the update manifest [default: 9XX329sPuskWhH4DQh6k16c87dHKhXLBZTL3Gxmve8Gp]

solana-install-info
displays information about the current installation

USAGE:
    solana-install info [FLAGS]

FLAGS:
    -h, --help     Prints help information
    -l, --local    only display local information, don't check the cluster for new updates

solana-install-deploy
deploys a new update

USAGE:
    solana-install deploy <download_url> <update_manifest_keypair>

FLAGS:
    -h, --help    Prints help information

ARGS:
    <download_url>               URL to the solana release archive
    <update_manifest_keypair>    Keypair file for the update manifest (/path/to/keypair.json)

solana-install-update
checks for an update, and if available downloads and applies it

USAGE:
    solana-install update

FLAGS:
    -h, --help    Prints help information

solana-install-run
Runs a program while periodically checking and applying software updates

USAGE:
    solana-install run <program_name> [program_arguments]...

FLAGS:
    -h, --help    Prints help information

ARGS:
    <program_name>            program to run
    <program_arguments>...    arguments to supply to the program

The program will be restarted upon a successful software update.

Deterministic Transaction Fees

Transactions currently include a fee field that indicates the maximum fee a slot leader is permitted to charge to process a transaction. The cluster, on the other hand, agrees on a minimum fee. If the network is congested, the slot leader may prioritize the transactions offering higher fees. That means the client won't know how much was collected until the transaction is confirmed by the cluster and the remaining balance is checked. It smells of exactly what we dislike about Ethereum's "gas": non-determinism.

Congestion-driven fees

Each validator uses signatures per slot (SPS) to estimate network congestion and SPS target to estimate the desired processing capacity of the cluster. The validator learns the SPS target from the genesis block, whereas it calculates SPS from recently processed transactions. The genesis block also defines a target lamports_per_signature, which is the fee to charge per signature when the cluster is operating at SPS target.

Calculating fees

The client uses the JSON RPC API to query the cluster for the current fee parameters. Those parameters are tagged with a blockhash and remain valid until that blockhash is old enough to be rejected by the slot leader.

Before sending a transaction to the cluster, a client may submit the transaction and fee account data to an SDK module called the fee calculator. So long as the client's SDK version matches the slot leader's version, the client is assured that its account will be changed exactly the same number of lamports as returned by the fee calculator.

Fee Parameters

In the first implementation of this design, the only fee parameter is lamports_per_signature. The more signatures the cluster needs to verify, the higher the fee. The exact number of lamports is determined by the ratio of SPS to the SPS target. At the end of each slot, the cluster lowers lamports_per_signature when SPS is below the target and raises it when above the target. The minimum value for lamports_per_signature is 50% of the target lamports_per_signature and the maximum value is 10x the target lamports_per_signature.
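
For illustration, a minimal sketch of this end-of-slot adjustment; the step size of one lamport per slot is an assumption, while the 50% floor and 10x ceiling come from the rule above:

/// Sketch: recompute lamports_per_signature at the end of a slot.
/// The one-lamport step is an assumed choice for illustration.
fn adjust_lamports_per_signature(current: u64, target: u64, sps: f64, sps_target: f64) -> u64 {
    let floor = target / 2; // minimum: 50% of the target fee
    let ceiling = target * 10; // maximum: 10x the target fee
    let next = if sps > sps_target {
        current.saturating_add(1) // congestion: raise the fee
    } else if sps < sps_target {
        current.saturating_sub(1) // spare capacity: lower the fee
    } else {
        current
    };
    next.clamp(floor, ceiling)
}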

Future parameters might include:

  • lamports_per_pubkey - cost to load an account
  • lamports_per_slot_distance - higher cost to load very old accounts
  • lamports_per_byte - cost per size of account loaded
  • lamports_per_bpf_instruction - cost to run a program

Attacks

Hijacking the SPS Target

A group of validators can centralize the cluster if they can convince it to raise the SPS Target above a point where the rest of the validators can keep up. Raising the target will cause fees to drop, presumably creating more demand and therefore higher TPS. If a validator doesn't have hardware that can process that many transactions that fast, its confirmation votes will eventually take so long that the cluster will be forced to boot it.

Tower BFT

This design describes Solana's Tower BFT algorithm. It addresses the following problems:

  • Some forks may not end up accepted by the super-majority of the cluster, and voters need to recover from voting on such forks.

  • Many forks may be votable by different voters, and each voter may see a different set of votable forks. The selected forks should eventually converge for the cluster.

  • Reward-based votes have an associated risk. Voters should have the ability to configure how much risk they take on.

  • The cost of rollback needs to be computable. It is important to clients that rely on some measurable form of consistency. The costs to break consistency need to be computable, and should increase super-linearly for older votes.

  • ASIC speeds are different between nodes, and attackers could employ Proof of History ASICs that are much faster than the rest of the cluster. Consensus needs to be resistant to attacks that exploit the variability in Proof of History ASIC speed.

For brevity this design assumes that a single voter with a stake is deployed as an individual validator in the cluster.

Time

The Solana cluster generates a source of time via a Verifiable Delay Function we are calling Proof of History.

Proof of History is used to create a deterministic round robin schedule for all the active leaders. At any given time only 1 leader, which can be computed from the ledger itself, can propose a fork. For more details, see fork generation and leader rotation.

Lockouts

The purpose of the lockout is to force a validator to commit opportunity cost to a specific fork. Lockouts are measured in slots, and therefore represent a real-time forced delay that a validator needs to wait before breaking the commitment to a fork.

Validators that violate the lockouts and vote for a diverging fork within the lockout should be punished. The proposed punishment is to slash the validator stake if a concurrent vote within a lockout for a non-descendant fork can be proven to the cluster.

Algorithm

The basic idea of this approach is to stack consensus votes and double lockouts. Each vote in the stack is a confirmation of a fork. Each confirmed fork is an ancestor of the fork above it. Each vote has a lockout in units of slots before the validator can submit a vote that does not contain the confirmed fork as an ancestor.

When a vote is added to the stack, the lockouts of all the previous votes in the stack are doubled (more on this in Rollback). With each new vote, a validator commits the previous votes to an ever-increasing lockout. At 32 votes we can consider the vote to be at max lockout; any votes with a lockout equal to or above 1<<32 are dequeued (FIFO). Dequeuing a vote is the trigger for a reward. If a vote expires before it is dequeued, it and all the votes above it are popped (LIFO) from the vote stack. The validator needs to start rebuilding the stack from that point.
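
The following is a minimal sketch of that vote stack, with hypothetical names; it reproduces the worked example in the Rollback section below, including the rule that after rollback, lockouts do not double again until newer votes catch back up:

const MAX_LOCKOUT_HISTORY: usize = 32;

#[derive(Clone, Copy, Debug)]
struct Vote {
    slot: u64,          // the vote time, in slots
    confirmations: u32, // lockout = 2^confirmations slots
}

impl Vote {
    fn lockout(&self) -> u64 {
        1u64 << self.confirmations
    }
    fn expiration_slot(&self) -> u64 {
        self.slot + self.lockout()
    }
}

#[derive(Default)]
struct VoteStack(Vec<Vote>);

impl VoteStack {
    /// Push a vote; the return value is the dequeued max-lockout vote,
    /// if any, which is the trigger for a reward.
    fn push(&mut self, slot: u64) -> Option<Vote> {
        // If any vote has expired, pop (LIFO) it and every vote above it.
        if let Some(i) = self.0.iter().position(|v| v.expiration_slot() < slot) {
            self.0.truncate(i);
        }
        // A new vote starts at the default lockout of 2 (confirmations = 1).
        self.0.push(Vote { slot, confirmations: 1 });
        // Double lockouts: an older vote's lockout grows only once the votes
        // stacked above it have caught up to its confirmation count.
        let depth = self.0.len();
        for (i, v) in self.0.iter_mut().enumerate() {
            if depth > i + v.confirmations as usize {
                v.confirmations += 1;
            }
        }
        // At 32 votes the oldest vote reaches max lockout (1 << 32) and is
        // dequeued (FIFO).
        if self.0.len() > MAX_LOCKOUT_HISTORY {
            Some(self.0.remove(0))
        } else {
            None
        }
    }
}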

Rollback

Before a vote is pushed to the stack, all the votes whose lock expiration time is lower than the new vote's time are popped. After rollback, lockouts are not doubled until the validator catches up to the rollback height of votes.

For example, a vote stack with the following state:

| vote | vote time | lockout | lock expiration time |
|------|-----------|---------|----------------------|
| 4    | 4         | 2       | 6                    |
| 3    | 3         | 4       | 7                    |
| 2    | 2         | 8       | 10                   |
| 1    | 1         | 16      | 17                   |

Vote 5 is at time 9, and the resulting state is

| vote | vote time | lockout | lock expiration time |
|------|-----------|---------|----------------------|
| 5    | 9         | 2       | 11                   |
| 2    | 2         | 8       | 10                   |
| 1    | 1         | 16      | 17                   |

Vote 6 is at time 10

| vote | vote time | lockout | lock expiration time |
|------|-----------|---------|----------------------|
| 6    | 10        | 2       | 12                   |
| 5    | 9         | 4       | 13                   |
| 2    | 2         | 8       | 10                   |
| 1    | 1         | 16      | 17                   |

At time 10 the new votes caught up to the previous votes. But vote 2 expires at 10, so when vote 7 at time 11 is applied, the votes including and above vote 2 will be popped.

| vote | vote time | lockout | lock expiration time |
|------|-----------|---------|----------------------|
| 7    | 11        | 2       | 13                   |
| 1    | 1         | 16      | 17                   |

The lockout for vote 1 will not increase from 16 until the stack contains 5 votes.

Slashing and Rewards

Validators should be rewarded for selecting the fork that the rest of the cluster selected as often as possible. This is well-aligned with generating a reward when the vote stack is full and the oldest vote needs to be dequeued. Thus a reward should be generated for each successful dequeue.

Cost of Rollback

Cost of rollback of fork A is defined as the cost in terms of lockout time to the validator to confirm any other fork that does not include fork A as an ancestor.

The Economic Finality of fork A can be calculated as the loss of all the rewards from rollback of fork A and its descendants, plus the opportunity cost of reward due to the exponentially growing lockout of the votes that have confirmed fork A.

Thresholds

Each validator can independently set a threshold of cluster commitment to a fork before that validator commits to a fork. For example, at vote stack index 7, the lockout is 256 time units. A validator may withhold votes and let votes 0-7 expire unless the vote at index 7 has greater than 50% commitment in the cluster. This allows each validator to independently control how much risk to commit to a fork. Committing to forks at a higher frequency would allow the validator to earn more rewards.
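
A sketch of that threshold check, reusing the Vote type from the sketch above; the stake-accounting helper stake_committed_to is an assumption:

/// Sketch of the threshold rule: before committing, require that the vote
/// at the threshold depth has more than half the cluster's stake behind it.
/// `stake_committed_to` is an assumed helper that sums the stake observed
/// voting for a given slot.
fn passes_threshold(
    stack: &[Vote],
    threshold_depth: usize, // e.g. 8
    total_stake: u64,
    stake_committed_to: impl Fn(u64) -> u64,
) -> bool {
    match stack.len().checked_sub(threshold_depth).and_then(|i| stack.get(i)) {
        None => true, // the stack is not yet deep enough for the rule to apply
        Some(vote) => stake_committed_to(vote.slot) * 2 > total_stake,
    }
}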

Algorithm parameters

The following parameters need to be tuned:

  • Number of votes in the stack before dequeue occurs (32).

  • Rate of growth for lockouts in the stack (2x).

  • Starting default lockout (2).

  • Threshold depth for minimum cluster commitment before committing to the fork (8).

  • Minimum cluster commitment size at threshold depth (50%+).

Free Choice

A "Free Choice" is an unenforcible validator action. There is no way for the protocol to encode and enforce these actions since each validator can modify the code and adjust the algorithm. A validator that maximizes self-reward over all possible futures should behave in such a way that the system is stable, and the local greedy choice should result in a greedy choice over all possible futures. A set of validator that are engaging in choices to disrupt the protocol should be bound by their stake weight to the denial of service. Two options exits for validator:

  • a validator can outrun the previous validator in virtual generation and submit a concurrent fork

  • a validator can withhold a vote to observe multiple forks before voting

In both cases, the validators in the cluster have several forks to pick from concurrently, even though each fork represents a different height. In both cases it is impossible for the protocol to detect whether the validator behavior is intentional or not.

Greedy Choice for Concurrent Forks

When evaluating multiple forks, each validator should use the following rules:

  1. Forks must satisfy the Threshold rule.

  2. Pick the fork that maximizes the total cluster lockout time for all the ancestor forks.

  3. Pick the fork that has the greatest amount of cluster transaction fees.

  4. Pick the latest fork in terms of PoH.

Cluster transaction fees are fees that are deposited to the mining pool as described in the Staking Rewards section.

PoH ASIC Resistance

Votes and lockouts grow exponentially while ASIC speedup is linear. There are two possible attack vectors involving a faster ASIC.

ASIC censorship

An attacker generates a concurrent fork that outruns previous leaders in an effort to censor them. A fork proposed by this attacker will be available concurrently with the next available leader. For nodes to pick this fork it must satisfy the Greedy Choice rule.

  1. Fork must have an equal number of votes for the ancestor fork.

  2. Fork cannot be so far ahead as to cause expired votes.

  3. Fork must have a greater amount of cluster transaction fees.

This attack is then limited to censoring the previous leader's fees and individual transactions. But it cannot halt the cluster or reduce the validator set compared to the concurrent fork. Fee censorship is limited to access fees going to the leaders but not the validators.

ASIC Rollback

An attacker generates a concurrent fork from an older block to try to rollback the cluster. In this attack the concurrent fork is competing with forks that have already been voted on. This attack is limited by the exponential growth of the lockouts.

  • 1 vote has a lockout of 2 slots. Concurrent fork must be at least 2 slots ahead, and be produced in 1 slot. Therefore requires an ASIC 2x faster.

  • 2 votes have a lockout of 4 slots. Concurrent fork must be at least 4 slots ahead and produced in 2 slots. Therefore requires an ASIC 2x faster.

  • 3 votes have a lockout of 8 slots. Concurrent fork must be at least 8 slots ahead and produced in 3 slots. Therefore requires an ASIC 2.6x faster.

  • 10 votes have a lockout of 1024 slots. 1024/10, or 102.4x faster ASIC.

  • 20 votes have a lockout of 2^20 slots. 2^20/20, or 52,428.8x faster ASIC.
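
The pattern in this list reduces to a simple formula: rolling back n confirmed votes requires covering a 2^n-slot lockout in the n slots of real time it took the honest cluster to produce those votes:

/// Speedup an attacker's PoH ASIC needs to roll back `n` confirmed votes.
fn required_asic_speedup(n: u32) -> f64 {
    (1u64 << n) as f64 / n as f64
}

// required_asic_speedup(10) == 102.4
// required_asic_speedup(20) == 52_428.8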

Leader to Leader Transition

This design describes how leaders transition production of the PoH ledger between each other as each leader generates its own slot.

Challenges

The current leader and the next leader are both racing to generate the final tick for the current slot. The next leader may arrive at that slot while still processing the current leader's entries.

The ideal scenario would be that the next leader generates its own slot right after it was able to vote for the current leader. It is very likely that the next leader will arrive at its PoH slot height before the current leader finishes broadcasting the entire block.

The next leader has to make the decision of attaching its own block to the last completed block, or wait to finalize the pending block. It is possible that the next leader will produce a block that proposes that the current leader failed, even though the rest of the network observes that block succeeding.

The current leader has incentives to start its slot as early as possible to capture economic rewards. Those incentives need to be balanced by the leader's need to attach its block to a block that has the most commitment from the rest of the network.

Leader timeout

While a leader is actively receiving entries for the previous slot, the leader can delay broadcasting the start of its block in real time. The delay is locally configurable by each leader, and can be dynamically adjusted based on the previous leader's behavior. If the previous leader's block is confirmed by the leader's TVU before the timeout, the PoH is reset to the start of the slot and this leader produces its block immediately.

The downsides:

  • Leader delays its own slot, potentially allowing the next leader more time to catch up.

The upsides compared to guards:

  • All the space in a block is used for entries.

  • The timeout is not fixed.

  • The timeout is local to the leader, and therefore can be clever. The leader's heuristic can take into account turbine performance.

  • This design doesn't require a ledger hard fork to update.

  • The previous leader can redundantly transmit the last entry in the block to the next leader, and the next leader can speculatively decide to trust it to generate its block without verification of the previous block.

  • The leader can speculatively generate the last tick from the last received entry.

  • The leader can speculatively process transactions and guess which ones are not going to be encoded by the previous leader. This is also a censorship attack vector. The current leader may withhold transactions that it receives from the clients so it can encode them into its own slot. Once processed, entries can be replayed into PoH quickly.

Alternative design options

Guard tick at the end of the slot

A leader does not produce entries in its block after the penultimate tick, which is the last tick before the first tick of the next slot. The network votes on the last tick, so the time difference between the penultimate tick and the last tick is the forced delay for the entire network, as well as the next leader before a new slot can be generated. The network can produce the last tick from the penultimate tick.

If the next leader receives the penultimate tick before it produces its own first tick, it will reset its PoH and produce the first tick from the previous leader's penultimate tick. The rest of the network will also reset its PoH to produce the last tick as the id to vote on.

The downsides:

  • Every vote, and therefore confirmation, is delayed by a fixed timeout. 1 tick, or around 100ms.

  • Average case confirmation time for a transaction would be at least 50ms worse.

  • It is part of the ledger definition, so to change this behavior would require a hard fork.

  • Not all the available space is used for entries.

The upsides compared to leader timeout:

  • The next leader has received all the previous entries, so it can start processing transactions without recording them into PoH.

  • The previous leader can redundantly transmit the last entry containing the penultimate tick to the next leader. The next leader can speculatively generate the last tick as soon as it receives the penultimate tick, even before verifying it.

Leader-to-Validator Transition

A fullnode typically operates as a validator. If, however, a staker delegates its stake to a fullnode, it will occasionally be selected as a slot leader. As a slot leader, the fullnode is responsible for producing blocks during an assigned slot. A slot has a duration of some number of preconfigured ticks. The duration of those ticks is estimated with a PoH Recorder described later in this document.

BankFork

BankFork tracks changes to the bank state over a specific slot. Once the final tick has been registered the state is frozen. Any attempts to write to it are rejected.

Validator

A validator operates on many different concurrent forks of the bank state until it generates a PoH hash with a height within its leader slot.

Slot Leader

A slot leader builds blocks on top of only one fork, the one it last voted on.

PoH Recorder

Slot leaders and validators use a PoH Recorder for both estimating slot height and for recording transactions.

PoH Recorder when Validating

The PoH Recorder acts as a simple VDF when validating. It tells the validator when it needs to switch to the slot leader role. Every time the validator votes on a fork, it should use the fork's latest block id to re-seed the VDF. Re-seeding solves two problems. First, it synchronizes its VDF to the leader's, allowing it to more accurately determine when its leader slot begins. Second, if the previous leader goes down, all wallclock time is accounted for in the next leader's PoH stream. For example, if one block is missing when the leader starts, the block it produces should have a PoH duration of two blocks. The longer duration ensures the following leader isn't attempting to snip all the transactions from the previous leader's slot.

PoH Recorder when Leading

A slot leader uses the PoH Recorder to record transactions, locking their positions in time. The PoH hash must be derived from a previous leader's last block. If it isn't, its block will fail PoH verification and be rejected by the cluster.

The PoH Recorder also serves to inform the slot leader when its slot is over. The leader needs to take care not to modify its bank if recording the transaction would generate a PoH height outside its designated slot. The leader, therefore, should not commit account changes until after it generates the entry's PoH hash. When the PoH height falls outside its slot any transactions in its pipeline may be dropped or forwarded to the next leader. Forwarding is preferred, as it would minimize network congestion, allowing the cluster to advertise higher TPS capacity.

Validator Loop

The PoH Recorder manages the transition between modes. Once a ledger is replayed, the validator can run until the recorder indicates it should be the slot leader. As a slot leader, the node can then execute and record transactions.

The loop is synchronized to PoH and does a synchronous start and stop of the slot leader functionality. After stopping, the validator's TVU should find itself in the same state as if a different leader had sent it the same block. The following is pseudocode for the loop:

  1. Query the LeaderScheduler for the next assigned slot.
  2. Run the TVU over all the forks.
    1. TVU will send votes to what it believes is the "best" fork.
    2. After each vote, restart the PoH Recorder to run until the next assigned slot.
  3. When time to be a slot leader, start the TPU. Point it to the last fork the TVU voted on.
  4. Produce entries until the end of the slot.
    1. For the duration of the slot, the TVU must not vote on other forks.
    2. After the slot ends, the TPU freezes its BankFork. After freezing, the TVU may resume voting.
  5. Goto 1.

Stake Delegation and Reward

This design proposal focuses on the software architecture for the on-chain voting and staking programs. Incentives for staking are covered in staking rewards.

The current architecture requires a vote for each delegated stake from the validator, and therefore does not scale to allow replicator clients to automatically delegate their rewards.

The design proposes a new set of programs for voting and stake delegation. The proposed programs allow many stake accounts to passively earn rewards with a single validator vote, without permission or active involvement from the validator.

Current Design Problems

In the current design each staker creates their own VoteState, and assigns a delegate in the VoteState that can submit votes. Since the validator has to actively vote for each stake delegated to it, validators can censor stakes by not voting for them.

The number of votes is equal to the number of stakers, and not the number of validators. Replicator clients are expected to delegate their replication rewards as they are earned, and therefore the number of stakes is expected to be large compared to the number of validators in a long running cluster.

Proposed changes to the current design

The general idea is that instead of the staker, the validator will own the VoteState program. In this proposal the VoteState program is there to track validator votes, count validator generated credits and to provide any additional validator specific state. The VoteState program is not aware of any stakes delegated to it, and has no staking weight.

The rewards generated are proportional to the amount of lamports staked. In this proposal stake state is stored as part of the StakeState program. This program is owned by the staker only. Lamports stored in this program are the stake. Unlike the current design, this program contains a new field to indicate which VoteState program the stake is delegated to.

VoteState

VoteState is the current state of all the votes the delegate has submitted to the bank. VoteState contains the following state information:

  • votes - The submitted votes data structure.

  • credits - The total number of rewards this vote program has generated over its lifetime.

  • root_slot - The last slot to reach the full lockout commitment necessary for rewards.

  • commission - The commission taken by this VoteState for any rewards claimed by staker's StakeState accounts. This is the percentage ceiling of the reward.

  • Account::lamports - The accumulated lamports from the commission. These do not count as stakes.

  • authorized_vote_signer - Only this identity is authorized to submit votes, and this field can only be modified by this entity.
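
A sketch of what this state could look like in Rust; the concrete field types, including the stand-in Pubkey, are assumptions:

// Stand-in for the SDK's public key type.
type Pubkey = [u8; 32];

/// One entry in the submitted-votes structure; see the Tower BFT section.
pub struct Lockout {
    pub slot: u64,
    pub confirmation_count: u32,
}

/// Sketch of the proposed VoteState. Account::lamports (the accumulated
/// commission) lives on the account itself, not in this struct.
pub struct VoteState {
    pub votes: Vec<Lockout>,            // the submitted votes
    pub credits: u64,                   // lifetime rewards generated
    pub root_slot: Option<u64>,         // last slot to reach full lockout
    pub commission: u8,                 // percentage ceiling of claimed rewards
    pub authorized_vote_signer: Pubkey, // the only identity allowed to vote
}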

VoteInstruction::Initialize

  • account[0] - RW - The VoteState. VoteState::authorized_vote_signer is initialized to account[0]; other VoteState members are defaulted.

VoteInstruction::AuthorizeVoteSigner(Pubkey)

  • account[0] - RW - The VoteState. VoteState::authorized_vote_signer is set to Pubkey; the instruction must be signed by Pubkey.

StakeState

A StakeState takes one of two forms, StakeState::Stake and StakeState::MiningPool.

StakeState::Stake

Stake is the current delegation preference of the staker. Stake contains the following state information:

  • voter_pubkey - The pubkey of the VoteState instance the lamports are delegated to.

  • credits_observed - The total credits claimed over the lifetime of the program.

  • stake - The actual activated stake.

  • Account::lamports - Lamports available for staking, including any earned as rewards.

StakeState::MiningPool

There are two approaches to the mining pool. The bank could allow the StakeState program to bypass the token balance check, or a program representing the mining pool could run on the network. To avoid a single network-wide lock, the pool can be split into several mining pools. This design focuses on using StakeState::MiningPool accounts as the cluster-wide mining pools.

  • 256 StakeState::MiningPool accounts are initialized, each with 1/256 of the mining pool tokens stored as Account::lamports.

The stakes and the MiningPool are accounts that are owned by the same Stake program.
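
A sketch of the two forms in Rust; the field types and the stand-in Pubkey are assumptions:

type Pubkey = [u8; 32]; // stand-in for the SDK's public key type

/// Sketch of the proposed StakeState. Account::lamports holds the lamports
/// available for staking (for Stake) or the pool tokens (for MiningPool).
pub enum StakeState {
    Stake {
        voter_pubkey: Pubkey,  // the VoteState this stake is delegated to
        credits_observed: u64, // credits already claimed over the lifetime
        stake: u64,            // the actual activated stake
    },
    MiningPool,
}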

StakeInstruction::DelegateStake(stake)

  • account[0] - RW - The StakeState::Stake instance. StakeState::Stake::credits_observed is initialized to VoteState::credits, StakeState::Stake::voter_pubkey is initialized to account[1], and StakeState::Stake::stake is initialized to stake, as long as it's less than account[0].lamports.

  • account[1] - R - The VoteState instance.

StakeInstruction::RedeemVoteCredits

The VoteState program and the StakeState programs maintain a lifetime counter of total rewards generated and claimed. Therefore an explicit Clear instruction is not necessary. When claiming rewards, the total lamports deposited into the StakeState and as validator commission is proportional to VoteState::credits - StakeState::credits_observed.

  • account[0] - RW - The StakeState::MiningPool instance that will fulfill the reward.
  • account[1] - RW - The StakeState::Stake instance that is redeeming votes credits.
  • account[2] - R - The VoteState instance, must be the same as StakeState::voter_pubkey

Reward is paid out for the difference between VoteState::credits and StakeState::Stake::credits_observed, and credits_observed is updated to VoteState::credits. The commission is deposited into the VoteState token balance, and the reward is deposited into the StakeState::Stake token balance. The reward and the commission are weighted by the StakeState::lamports divided by the total lamports staked.

The Staker or the owner of the Stake program sends a transaction with this instruction to claim the reward.

Any random MiningPool can be used to redeem the credits.

let credits_to_claim = vote_state.credits - stake_state.credits_observed;
stake_state.credits_observed = vote_state.credits;

credits_to_claim is used to compute the reward and commission, and StakeState::Stake::credits_observed is updated to the latest VoteState::credits value.
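
A sketch of the payout math; the lamports_per_credit reward rate and the rounding are assumptions, and the stake-weighting by StakeState::lamports over total stake is omitted for brevity:

/// Sketch of the RedeemVoteCredits payout. Returns (reward, commission),
/// deposited to the StakeState::Stake and VoteState token balances.
fn redeem_vote_credits(
    vote_credits: u64,          // VoteState::credits
    credits_observed: &mut u64, // StakeState::Stake::credits_observed
    commission_percent: u64,    // VoteState::commission
    lamports_per_credit: u64,   // assumed reward rate funded by the MiningPool
) -> (u64, u64) {
    let credits_to_claim = vote_credits - *credits_observed;
    *credits_observed = vote_credits;

    let total = credits_to_claim * lamports_per_credit;
    let commission = total * commission_percent / 100; // percentage ceiling
    (total - commission, commission)
}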

Collecting network fees into the MiningPool

At the end of the block, before the bank is frozen but after it has processed all the transactions for the block, a virtual instruction is executed to collect the transaction fees.

  • A portion of the fees are deposited into the leader's account.
  • A portion of the fees are deposited into the smallest StakeState::MiningPool account.

Benefits

  • Single vote for all the stakers.

  • Clearing of the credit variable is not necessary for claiming rewards.

  • Each delegated stake can claim its rewards independently.

  • Commission for the work is deposited when a reward is claimed by the delegated stake.

This proposal would benefit from the read-only accounts proposal to allow for many rewards to be claimed concurrently.

Passive Delegation

Any number of instances of StakeState::Stake programs can delegate to a single VoteState program without an interactive action from the identity controlling the VoteState program or submitting votes to the program.

The total stake allocated to a VoteState program can be calculated by the sum of all the StakeState programs that have the VoteState pubkey as the StakeState::Stake::voter_pubkey.

Example Callflow

Passive Staking Callflow

Future work

Validators may want to split the stake delegated to them amongst many validator nodes since stake is used as weight in the network control and data planes. One way to implement this would be for the StakeState to delegate to a pool of validators instead of a single one.

Instead of a single vote_pubkey and credits_observed entry in the StakeState program, the program can be initialized with a vector of tuples.

Voter {
    voter_pubkey: Pubkey,
    credits_observed: u64,
    weight: u8,
}

  • voters: Vec<Voter> - Array of VoteState accounts that earn rewards with this stake.

A StakeState program would claim a fraction of the reward from each voter in the voters array, and each voter would be delegated a fraction of the stake.

Persistent Account Storage

The set of Accounts represent the current computed state of all the transactions that have been processed by a fullnode. Each fullnode needs to maintain this entire set. Each block that is proposed by the network represents a change to this set, and since each block is a potential rollback point the changes need to be reversible.

Persistent storage like NVMes is 20 to 40 times cheaper than DDR. The problem with persistent storage is that write and read performance is much slower than DDR, so care must be taken in how data is read and written. Both reads and writes can be split between multiple storage drives and accessed in parallel. This design proposes a data structure that allows for concurrent reads and concurrent writes of storage. Writes are optimized by using an AppendVec data structure, which allows a single writer to append while allowing access to many concurrent readers. The accounts index maintains a pointer to the spot where each account was appended for every fork, thus removing the need for explicit checkpointing of state.

AppendVec

AppendVec is a data structure that allows for random reads concurrent with a single append-only writer. Growing or resizing the capacity of the AppendVec requires exclusive access. This is implemented with an atomic offset, which is updated at the end of a completed append.

The underlying memory for an AppendVec is a memory-mapped file. Memory-mapped files allow for fast random access and paging is handled by the OS.
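
A simplified sketch of the idea over a plain byte buffer; the real implementation backs the buffer with a memory-mapped file and allows the single appender to run concurrently with readers, which this safe-Rust sketch does not attempt:

use std::sync::atomic::{AtomicUsize, Ordering};

/// Sketch of an AppendVec: an append-only buffer whose valid length is
/// published through an atomic offset after each completed append.
pub struct AppendVec {
    buf: Box<[u8]>,      // a memory-mapped file in the real implementation
    offset: AtomicUsize, // end of valid data
}

impl AppendVec {
    /// Single-writer append. Returns the offset of the stored data, or
    /// None when full; growing the buffer requires exclusive access.
    pub fn append(&mut self, data: &[u8]) -> Option<usize> {
        let start = self.offset.load(Ordering::Acquire);
        let end = start.checked_add(data.len())?;
        if end > self.buf.len() {
            return None;
        }
        self.buf[start..end].copy_from_slice(data);
        // Publish the new length only after the bytes are in place, so a
        // reader can never observe a partially written entry.
        self.offset.store(end, Ordering::Release);
        Some(start)
    }

    /// Random read of previously appended data.
    pub fn get(&self, offset: usize, len: usize) -> Option<&[u8]> {
        let valid = self.offset.load(Ordering::Acquire);
        (offset.checked_add(len)? <= valid).then(|| &self.buf[offset..offset + len])
    }
}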

Account Index

The account index is designed to support a single index for all the currently forked Accounts.

type AppendVecId = usize;

type Fork = u64;

struct AccountMap(HashMap<Fork, (AppendVecId, u64)>);

type AccountIndex = HashMap<Pubkey, AccountMap>;

The index is a map of account Pubkeys to a map of Forks and the location of the Account data in an AppendVec. To get the version of an account for a specific Fork:

/// Load the account for the pubkey.
/// This function will load the account from the specified fork, falling back
/// to the fork's parents.
/// * id - The fork to load from. Accounts keep track of their parents with
///        Forks; the persistent store is keyed by Fork.
/// * pubkey - The Account's public key.
pub fn load_slow(&self, id: Fork, pubkey: &Pubkey) -> Option<&Account>

The read is satisfied by pointing to a memory-mapped location in the AppendVecId at the stored offset. A reference can be returned without a copy.
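
A sketch of that lookup, repeating the index types from above; the parents map is an assumed stand-in for the Accounts parent-tracking:

use std::collections::HashMap;

type AppendVecId = usize;
type Fork = u64;
type Pubkey = [u8; 32]; // stand-in for the SDK's public key type

struct AccountMap(HashMap<Fork, (AppendVecId, u64)>);
type AccountIndex = HashMap<Pubkey, AccountMap>;

/// Walk the fork and then its parents until a stored version of the
/// account is found, returning (which AppendVec, offset within it).
fn locate(
    index: &AccountIndex,
    parents: &HashMap<Fork, Fork>, // assumed Fork -> parent Fork map
    mut fork: Fork,
    pubkey: &Pubkey,
) -> Option<(AppendVecId, u64)> {
    let map = index.get(pubkey)?;
    loop {
        if let Some(location) = map.0.get(&fork) {
            return Some(*location);
        }
        fork = *parents.get(&fork)?; // no parent left: account not found
    }
}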

Root Forks

Tower BFT eventually selects a fork as a root fork and the fork is squashed. A squashed/root fork cannot be rolled back.

When a fork is squashed, all accounts in its parents not already present in the fork are pulled up into the fork by updating the indexes. Accounts with zero balance in the squashed fork are removed from the fork by updating the indexes.

An account can be garbage-collected when squashing makes it unreachable.

Three possible options exist:

  • Maintain a HashSet of root forks. One is expected to be created every second. The entire tree can be garbage-collected later. Alternatively, if every fork keeps a reference count of accounts, garbage collection could occur any time an index location is updated.

  • Remove any pruned forks from the index. Any remaining forks lower in number than the root can be considered root.

  • Scan the index, migrate any old roots into the new one. Any remaining forks lower than the new root can be deleted later.

Append-only Writes

All the updates to Accounts occur as append-only updates. For every account update, a new version is stored in the AppendVec.

It is possible to optimize updates within a single fork by returning a mutable reference to an already stored account in a fork. The Bank already tracks concurrent access of accounts and guarantees that a write to a specific account fork will not be concurrent with a read to an account at that fork. To support this operation, AppendVec should implement this function:

fn get_mut(&self, index: u64) -> &mut T;

This API allows for concurrent mutable access to a memory region at index. It relies on the Bank to guarantee exclusive access to that index.

Garbage collection

As accounts get updated, they move to the end of the AppendVec. Once capacity has run out, a new AppendVec can be created and updates can be stored there. Eventually references to an older AppendVec will disappear because all the accounts have been updated, and the old AppendVec can be deleted.

To speed up this process, it's possible to move Accounts that have not been recently updated to the front of a new AppendVec. This form of garbage collection can be done without requiring exclusive locks to any of the data structures except for the index update.

The initial implementation for garbage collection is that once all the accounts in an AppendVec become stale versions, it gets reused. The accounts are not updated or moved around once appended.

Index Recovery

Each bank thread has exclusive access to the accounts during append, since the accounts locks cannot be released until the data is committed. But there is no explicit order of writes between the separate AppendVec files. To create an ordering, the index maintains an atomic write version counter. Each append to the AppendVec records the index write version number for that append in the entry for the Account in the AppendVec.

To recover the index, all the AppendVec files can be read in any order, and the latest write version for every fork should be stored in the index.

Snapshots

To snapshot, the underlying memory-mapped files in the AppendVec need to be flushed to disk. The index can be written out to disk as well.

Performance

  • Append-only writes are fast. SSDs and NVMEs, as well as all the OS level kernel data structures, allow for appends to run as fast as PCI or NVMe bandwidth will allow (2,700 MB/s).

  • Each replay and banking thread writes concurrently to its own AppendVec.

  • Each AppendVec could potentially be hosted on a separate NVMe.

  • Each replay and banking thread has concurrent read access to all the AppendVecs without blocking writes.

  • The index requires an exclusive write lock for writes. Single-thread performance for HashMap updates is on the order of 10 million per second.

  • Banking and Replay stages should use 32 threads per NVMe. NVMes have optimal performance with 32 concurrent readers or writers.

Reliable Vote Transmission

Validator votes are messages that have a critical function for consensus and continuous operation of the network. Therefore it is critical that they are reliably delivered and encoded into the ledger.

Challenges

  1. Leader rotation is triggered by PoH, which is a clock with high drift. So many nodes are likely to have an incorrect view of whether the next leader is active in realtime or not.

  2. The next leader may easily be flooded. Thus a DDOS would not only prevent delivery of regular transactions, but also consensus messages.

  3. UDP is unreliable, and our asynchronous protocol requires any message that is transmitted to be retransmitted until it is observed in the ledger. Retransmission could potentially cause an unintentional thundering herd against the leader with a large number of validators. Worst case flood would be (num_nodes * num_retransmits).

  4. Tracking whether the vote has been transmitted via the ledger does not guarantee it will appear in a confirmed block. The currently observed block may be unrolled. Validators would need to maintain state for each vote and fork.

Design

  1. Send votes as a push message through gossip. This ensures delivery of the vote to all the next leaders, not just the very next one.

  2. Leaders will read the Crds table for new votes and encode any new received votes into the blocks they propose. This allows for validator votes to be included in rollback forks by all the future leaders.

  3. Validators that receive votes in the ledger will add them to their local crds table, not as a push request, but simply by inserting them into the table. This shortcuts the push message protocol, so the validation messages do not need to be retransmitted twice around the network.

  4. The CrdsValue for a vote should look like this: Votes(Vec<Transaction>)

Each vote transaction should maintain a wallclock in its data. The merge strategy for Votes will keep the last N set of votes as configured by the local client. For push/pull the vector is traversed recursively and each Transaction is treated as an individual CrdsValue with its own local wallclock and signature.

Gossip is designed for efficient propagation of state. Messages that are sent through gossip-push are batched and propagated with a minimum spanning tree to the rest of the network. Any partial failures in the tree are actively repaired with the gossip-pull protocol while minimizing the amount of data transferred between any nodes.

How this design solves the Challenges

  1. Because there is no easy way for validators to be in sync with leaders on the leader's "active" state, gossip allows for eventual delivery regardless of that state.

  2. Gossip will deliver the messages to all the subsequent leaders, so if the current leader is flooded the next leader would have already received these votes and is able to encode them.

  3. Gossip minimizes the number of requests through the network by maintaining an efficient spanning tree, and using bloom filters to repair state. So retransmit back-off is not necessary and messages are batched.

  4. Leaders that read the crds table for votes will encode all the new valid votes that appear in the table. Even if this leader's block is unrolled, the next leader will try to add the same votes without any additional work done by the validator. Thus ensuring not only eventual delivery, but eventual encoding into the ledger.

Performance

  1. Worst case propagation time to the next leader is Log(N) hops with a base depending on the fanout. With our current default fanout of 6, it is about 6 hops to 20k nodes.

  2. The leader should receive 20k validation votes aggregated by gossip-push into 64kb blobs, which would reduce the number of packets for a 20k-node network to 80 blobs.

  3. Each validator's vote is replicated across the entire network. To maintain a queue of 5 previous votes the Crds table would grow by 25 megabytes (20,000 nodes * 256 bytes * 5).

Two step implementation rollout

Initially the network can perform reliably with just 1 vote transmitted and maintained through the network with the current Vote implementation. For small networks a fanout of 6 is sufficient, and the memory and push overhead is minor.

Sub 1k validator network

  1. Crds just maintains the validator's latest vote.

  2. Votes are pushed and retransmitted regardless of whether they appear in the ledger.

  3. Fanout of 6.

  • Worst case 256kb memory overhead per node.
  • Worst case 4 hops to propagate to every node.
  • Leader should receive the entire validator vote set in 4 push message blobs.

Sub 20k network

Everything above plus the following:

  1. CRDS table maintains a vector of 5 latest validator votes.

  2. Votes encode a wallclock. CrdsValue::Votes is a type that recurses into the transaction vector for all the gossip protocols.

  3. Increase fanout to 20.

  • Worst case 25mb memory overhead per node.
  • Sub 4 hops worst case to deliver to the entire network.
  • 80 blobs received by the leader for all the validator messages.

Repair Service

The RepairService is in charge of retrieving missing blobs that failed to be delivered by primary communication protocols like Avalanche. It is in charge of managing the protocols described in the Repair Protocols section below.

Challenges:

  1. Validators can fail to receive particular blobs due to network failures

  2. Consider a scenario where blocktree contains the set of slots {1, 3, 5}. Then Blocktree receives blobs for some slot 7, where for each of the blobs b, b.parent == 6, so the parent-child relation 6 -> 7 is stored in blocktree. However, there is no way to chain these slots to any of the existing banks in Blocktree, and thus the Blob Repair protocol will not repair these slots. If these slots happen to be part of the main chain, this will halt replay progress on this node.

  3. Validators that find themselves behind the cluster by an entire epoch struggle/fail to catch up because they do not have a leader schedule for future epochs. If nodes were to blindly accept repair blobs in these future epochs, this exposes nodes to spam.

Repair Protocols

The repair protocol makes best attempts to progress the forking structure of Blocktree.

The different protocol strategies to address the above challenges:

  1. Blob Repair (Addresses Challenge #1): This is the most basic repair protocol, with the purpose of detecting and filling "holes" in the ledger. Blocktree tracks the latest root slot. RepairService will then periodically iterate every fork in blocktree starting from the root slot, sending repair requests to validators for any missing blobs. It will send at most some N repair requests per iteration.

    Note: Validators will only accept blobs within the current verifiable epoch (epoch the validator has a leader schedule for).

  2. Preemptive Slot Repair (Addresses Challenge #2): The goal of this protocol is to discover the chaining relationship of "orphan" slots that do not currently chain to any known fork.

    • Blocktree will track the set of "orphan" slots in a separate column family.

    • RepairService will periodically make RequestOrphan requests for each of the orphans in blocktree.

    • RequestOrphan(orphan) request - orphan is the orphan slot that the requestor wants to know the parents of
    • RequestOrphan(orphan) response - the highest blobs for each of the first N parents of the requested orphan

    On receiving the responses p, where p is some blob in a parent slot, validators will:

    • Insert an empty SlotMeta in blocktree for p.slot if it doesn't already exist.
    • If p.slot does exist, update the parent of p based on parents.

    Note that once these empty slots are added to blocktree, the Blob Repair protocol should attempt to fill those slots.

    Note: Validators will only accept responses containing blobs within the current verifiable epoch (epoch the validator has a leader schedule for).

  3. Repairmen (Addresses Challenge #3): This part of the repair protocol is the primary mechanism by which new nodes joining the cluster catch up after loading a snapshot. This protocol works in a "forward" fashion, so validators can verify every blob that they receive against a known leader schedule.

    Each validator advertises in gossip:

    • Current root
    • The set of all completed slots in the confirmed epochs (an epoch that was calculated based on a bank <= current root) past the current root

    Observers of this gossip message with higher epochs (repairmen) send blobs to catch the lagging node up with the rest of the cluster. The repairmen are responsible for sending the slots within the epochs that are confirmed by the advertised root in gossip. The repairmen divide the responsibility of sending each of the missing slots in these epochs based on a random seed (simple blob.index iteration by N, seeded with the repairman's node_pubkey). Ideally, each repairman in an N node cluster (N nodes whose epochs are higher than that of the repairee) sends 1/N of the missing blobs. Both data and coding blobs for missing slots are sent. Repairmen do not send blobs again to the same validator until they see the message in gossip updated, at which point they perform another iteration of this protocol.

    Gossip messages are updated every time a validator receives a complete slot within the epoch. Completed slots are detected by blocktree and sent over a channel to RepairService. It is important to note that we know that by the time a slot X is complete, the epoch schedule must exist for the epoch that contains slot X because WindowService will reject blobs for unconfirmed epochs. When a newly completed slot is detected, we also update the current root if it has changed since the last update. The root is made available to RepairService through Blocktree, which holds the latest root.

Testing Programs

Applications send transactions to a Solana cluster and query validators to confirm the transactions were processed and to check each transaction's result. When the cluster doesn't behave as anticipated, it could be for a number of reasons:

  • The program is buggy
  • The BPF loader rejected an unsafe program instruction
  • The transaction was too big
  • The transaction was invalid
  • The Runtime tried to execute the transaction when another one was accessing the same account
  • The network dropped the transaction
  • The cluster rolled back the ledger
  • A validator responded to queries maliciously

The AsyncClient and SyncClient Traits

To troubleshoot, the application should retarget a lower-level component, where fewer errors are possible. Retargeting can be done with different implementations of the AsyncClient and SyncClient traits.

Components implement the following primary methods:

trait AsyncClient {
    fn async_send_transaction(&self, transaction: Transaction) -> io::Result<Signature>;
}

trait SyncClient {
    fn get_signature_status(&self, signature: &Signature) -> Result<Option<transaction::Result<()>>>;
}

Users send transactions and await results, either asynchronously or synchronously.
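
For example, a test helper could target any implementation of both traits; the stand-in types, the io::Result return types, and the polling cadence here are assumptions:

use std::{io, thread, time::Duration};

// Stand-ins for SDK types so the sketch is self-contained.
pub struct Transaction;
pub struct Signature;
pub type TxResult = Result<(), String>; // transaction::Result<()> above

pub trait AsyncClient {
    fn async_send_transaction(&self, transaction: Transaction) -> io::Result<Signature>;
}

pub trait SyncClient {
    fn get_signature_status(&self, signature: &Signature) -> io::Result<Option<TxResult>>;
}

/// Send a transaction and poll until the target reports a status. The same
/// helper works against a cluster, the TPU level, or the Bank, since each
/// implements the same two traits.
pub fn send_and_confirm<C: AsyncClient + SyncClient>(
    client: &C,
    transaction: Transaction,
) -> io::Result<TxResult> {
    let signature = client.async_send_transaction(transaction)?;
    loop {
        if let Some(status) = client.get_signature_status(&signature)? {
            return Ok(status);
        }
        thread::sleep(Duration::from_millis(100)); // arbitrary polling cadence
    }
}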

ThinClient for Clusters

The highest level implementation, ThinClient, targets a Solana cluster, which may be a deployed testnet or a local cluster running on a development machine.

TpuClient for the TPU

The next level is the TPU implementation, which is not yet implemented. At the TPU level, the application sends transactions over Rust channels, where there can be no surprises from network queues or dropped packets. The TPU implements all "normal" transaction errors. It does signature verification, may report account-in-use errors, and otherwise records transactions in a ledger, complete with Proof of History hashes.

Low-level testing

BankClient for the Bank

Below the TPU level is the Bank. The Bank doesn't do signature verification or generate a ledger. The Bank is a convenient layer at which to test new on-chain programs. It allows developers to toggle between native program implementations and BPF-compiled variants. No need for the Transact trait here. The Bank's API is synchronous.

Unit-testing with the Runtime

Below the Bank is the Runtime. The Runtime is the ideal test environment for unit-testing. By statically linking the Runtime into a native program implementation, the developer gains the shortest possible edit-compile-run loop. Without any dynamic linking, stack traces include debug symbols and program errors are straightforward to troubleshoot.

Credit-Only Accounts

This design covers the handling of credit-only and credit-debit accounts in the runtime. Accounts already distinguish themselves as credit-only or credit-debit based on the program ID specified by the transaction's instruction. Programs must treat accounts that are not owned by them as credit-only.

To identify credit-only accounts by program id would require the account to be fetched and loaded from disk. This operation is expensive, and while it is occurring, the runtime would have to reject any transactions referencing the same account.

The proposal introduces a num_readonly_accounts field to the transaction structure, and removes the program_ids dedicated vector for program accounts.

This design doesn't change the runtime transaction processing rules. Programs still can't write or spend accounts that they do not own, but it allows the runtime to optimistically take the correct lock for each account specified in the transaction before loading the accounts from storage.

Accounts selected as credit-debit by the transaction can still be treated as credit-only by the instructions.

Runtime handling

credit-only accounts have the following properties:

  • Can be deposited into: Deposits can be implemented as a simple atomic_add.
  • read-only access to account data.

Instructions that debit or modify the credit-only account data will fail.

Account Lock Optimizations

The Accounts module keeps track of current locked accounts in the runtime, which separates credit-only accounts from the credit-debit accounts. The credit-only accounts can be cached in memory and shared between all the threads executing transactions.

The current runtime can't predict whether an account is credit-only or credit-debit when the transaction account keys are locked at the start of the transaction processing pipeline. Accounts referenced by the transaction have not been loaded from the disk yet.

An ideal design would cache the credit-only accounts while they are referenced by any transaction moving through the runtime, and release the cache when the last transaction exits the runtime.

Credit-only accounts and read-only account data

Credit-only account data can be treated as read-only. Credit-debit account data is treated as read-write.

Transaction changes

To enable the possibility of caching accounts only while they are in the runtime, the Transaction structure should be changed in the following way:

  • program_ids: Vec<Pubkey> - This vector is removed. Program keys can be placed at the end of the account_keys vector, counted within num_readonly_accounts along with the other credit-only accounts.

  • num_readonly_accounts: u8 - The number of keys from the end of the transaction's account_keys array that is credit-only.

The following kinds of accounts may be present in a transaction:

  • paying account
  • RW accounts
  • R accounts
  • Program IDs

The paying account must be credit-debit, and program IDs must be credit-only. The first account in the account_keys array is always the account that pays for the transaction fee, therefore it cannot be credit-only. For these reasons the credit-only accounts are all grouped together at the end of the account_keys vector. Counting credit-only accounts from the end allows for the default 0 value to still be functionally correct, since a transaction will succeed with all credit-debit accounts.
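
This grouping makes the lock type a pure index computation, as in this sketch:

/// With credit-only keys grouped at the end of account_keys, an account's
/// lock type follows directly from its position.
fn is_credit_only(index: usize, num_account_keys: usize, num_readonly_accounts: u8) -> bool {
    index >= num_account_keys.saturating_sub(num_readonly_accounts as usize)
}

// With the default num_readonly_accounts = 0, is_credit_only is false for
// every index, so an all-credit-debit transaction still succeeds.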

Since accounts can only appear once in the transaction's account_keys array, an account can only be credit-only or credit-debit in a single transaction, not both. The runtime treats a transaction as one atomic unit of execution. If any instruction needs credit-debit access to an account, a copy needs to be made. The write lock is held for the entire time the transaction is being processed by the runtime.

Starvation

Read locks for credit-only accounts can keep the runtime from executing transactions requesting a write lock to a credit-debit account.

When a request for a write lock is made while a read lock is open, the transaction requesting the write lock should be cached. Upon closing the read lock, the pending transactions can be pushed through the runtime.

While a pending write transaction exists, any additional read lock requests for that account should fail. It follows that any other write lock requests will also fail. Currently, clients must retransmit when a transaction fails because of a pending transaction. This approach would mimic that behavior as closely as possible while preventing write starvation.

Program execution with credit-only accounts

Before handing off the accounts to program execution, the runtime can mark each account in each instruction as credit-only. The credit-only accounts can be passed as references without an extra copy. The transaction will abort on a write to a credit-only account.

An alternative is to detect writes to credit-only accounts and fail the transactions before commit.

Alternative design

This design attempts to cache a credit-only account after loading without the use of a transaction-specified credit-only accounts list. Instead, the credit-only accounts are held in a reference-counted table inside the runtime as the transactions are processed.

  1. Transaction accounts are locked.
    a. If the account is present in the 'credit-only' table, the TX does not fail. The pending state for this TX is marked NeedReadLock.
  2. Transaction accounts are loaded.
    a. Transaction accounts that are credit-only increase their reference count in the credit-only table.
    b. Transaction accounts that need a write lock and are present in the credit-only table fail.
  3. Transaction accounts are unlocked.
    a. Decrement the credit-only lock table reference count; remove the entry if it reaches 0.
    b. Remove the account from the lock set if it is not in the credit-only table.
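
A sketch of that reference-counted table; the names, and treating a write-lock conflict as a plain error, are assumptions:

use std::collections::HashMap;

type Pubkey = [u8; 32]; // stand-in for the SDK's public key type

/// Reference-counted cache of credit-only accounts currently in the runtime.
#[derive(Default)]
struct CreditOnlyTable(HashMap<Pubkey, u64>);

impl CreditOnlyTable {
    /// Step 2a: a loaded credit-only account bumps its reference count.
    fn acquire(&mut self, key: Pubkey) {
        *self.0.entry(key).or_insert(0) += 1;
    }

    /// Step 2b: a write lock fails while the account is cached credit-only.
    fn try_write_lock(&self, key: &Pubkey) -> Result<(), ()> {
        if self.0.contains_key(key) {
            Err(())
        } else {
            Ok(())
        }
    }

    /// Step 3a: decrement on unlock and remove the entry when it hits zero.
    fn release(&mut self, key: &Pubkey) {
        if let Some(count) = self.0.get_mut(key) {
            *count -= 1;
            if *count == 0 {
                self.0.remove(key);
            }
        }
    }
}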

The downside with this approach is that if the lock set mutex is released between lock and load to allow better pipelining of transactions, a request for a credit-only account may fail. Therefore, this approach is not suitable for treating programs as credit-only accounts.

Holding the accounts lock mutex while fetching the account from disk would potentially have a significant performance hit on the runtime. Fetching from disk is expected to be slow, but can be parallelized between multiple disks.

Embedding the Move Language

Problem

Solana enables developers to write on-chain programs in general purpose programming languages such as C or Rust, but those programs contain Solana-specific mechanisms. For example, there isn't another chain that asks developers to create a Rust module with a process_instruction(KeyedAccounts) function. Whenever practical, Solana should offer dApp developers more portable options.

Until just recently, no popular blockchain offered a language that could expose the value of Solana's massively parallel runtime. Solidity contracts, for example, do not separate references to shared data from contract code, and therefore need to be executed serially to ensure deterministic behavior. In practice we see that the most aggressively optimized EVM-based blockchains all seem to peak out around 1,200 TPS - a small fraction of what Solana can do. The Libra project, on the other hand, designed an on-chain programming language called Move that is more suitable for parallel execution. Like Solana's runtime, Move programs depend on accounts for all shared state.

The biggest design difference between Solana's runtime and Libra's Move VM is how they manage safe invocations between modules. Solana took an operating systems approach and Libra took the domain-specific language approach. In the runtime, a module must trap back into the runtime to ensure the caller's module did not write to data owned by the callee. Likewise, when the callee completes, it must again trap back to the runtime to ensure the callee did not write to data owned by the caller. Move, on the other hand, includes an advanced type system that allows these checks to be run by its bytecode verifier. Because Move bytecode can be verified, the cost of verification is paid just once, at the time the module is loaded on-chain. In the runtime, the cost is paid each time a transaction crosses between modules. The difference is similar in spirit to the difference between a dynamically-typed language like Python versus a statically-typed language like Java. Solana's runtime allows dApps to be written in general purpose programming languages, but that comes with the cost of runtime checks when jumping between programs.

This proposal attempts to define a way to embed the Move VM such that:

  • cross-module invocations within Move do not require the runtime's cross-program runtime checks
  • Move programs can leverage functionality in other Solana programs and vice versa
  • Solana's runtime parallelism is exposed to batches of Move and non-Move transactions

Proposed Solution

Move VM as a Solana loader

The Move VM shall be embedded as a Solana loader under the identifier MOVE_PROGRAM_ID, so that Move modules can be marked as executable with the VM as their owner. This will allow modules to load module dependencies, as well as allow for parallel execution of Move scripts.

All data accounts owned by Move modules must set their owners to the loader, MOVE_PROGRAM_ID. Since Move modules encapsulate their account data in the same way Solana programs encapsulate theirs, the Move module owner should be embedded in the account data. The runtime will grant write access to the Move VM, and Move grants access to the module accounts.

Interacting with Solana programs

To invoke instructions in non-Move programs, Solana would need to extend the Move VM with a process_instruction() system call. It would work the same as process_instruction() does for Rust BPF programs.