ReGenesis Explained

Originally published: https://medium.com/@mandrigin/regenesis-explained

The amount of money locked in DeFi apps has surpassed $1.500.000.000, and new DApps are being announced almost every day. At that rate, Ethereum can become a victim of its own success. One of the struggles on the infrastructure layer is unbounded state growth.

A couple of days ago Alexey published an article on the EthResearch forum with an interesting idea on how to help with exactly that. He called it ReGenesis.

The idea is relatively simple and might look very radical at first sight: let’s nuke all the state we have locally on a deterministic schedule (say, every 1.000.000 blocks), keep the root hash of the latest block, and then rebuild the state from scratch until the next 1.000.000 blocks pass and the next state extinction event happens.

ReGenesis: Resetting state every N blocks. State grows, witnesses shrink. Until the next ReGenesis.
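
To make the schedule concrete, here is a minimal sketch of a toy node that drops its local state on a fixed block interval. The `Node` class, its fields, and the dictionary-based state are hypothetical and only serve the illustration; the interval is the example value from the text, not a proposed constant.

```python
REGENESIS_INTERVAL = 1_000_000  # example value from the text, not a proposed constant


class Node:
    """Toy node (hypothetical): tracks which state entries it holds locally."""

    def __init__(self, state_root: bytes):
        self.state_root = state_root  # 32-byte root hash, always kept
        self.implicit_state = {}      # key -> value, rebuilt after every reset

    def on_block(self, number: int, new_root: bytes, touched: dict) -> None:
        # State touched by the block becomes locally available.
        self.implicit_state.update(touched)
        self.state_root = new_root
        if number % REGENESIS_INTERVAL == 0:
            # ReGenesis: drop everything except the root hash.
            self.implicit_state.clear()
```

Between two resets the local state only grows, so fewer and fewer entries need a witness; at the reset the node is back to holding just the root hash.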

To understand why this idea might work and is worth researching, one needs some context and explanation.

That is exactly what this post is about.

Current State of Affairs

Centralisation of state storage.
It is very easy to send a transaction in Ethereum; to do that you don’t really need much state at all. Transaction senders aren’t motivated to keep any state whatsoever.
If you are running an infrastructure node (a miner or an exchange), on the other hand, you need to have all the state to be able to validate a transaction and to propagate it further. Even more so if you are a miner.
That way, we have incentives to centralize storage of the current state in infrastructure nodes, which are usually run by bigger entities: Infura, big miners, exchanges, etc.
State growth continues unbounded.
The “current state” bucket in Turbo-Geth is about 40 GB at the time of writing and keeps growing.
The problem here is not the state size itself, but its uncapped growth. Sure, we might hope that the capacity of fast SSD drives keeps growing faster than the blockchain state.
We have been lucky so far, but being lucky isn’t a viable long-term strategy.
Stateless Ethereum gas repricing is hard.
The Stateless Ethereum initiative has a very tricky job: pricing block witnesses. Those witnesses are produced by miners, and it is difficult to predict, when sending a transaction, how big the block witness will be.
That way, the witness price has to be inferred somehow from smart contract behaviour without breaking existing smart contracts.
There have been a couple of proposals, notably Oil, but it is a hard problem to solve.

Running transactions post-ReGenesis

Okay, we all got the idea: you nuke all the current state except the root hash. Disk usage goes from 40 GB to 32 bytes.

Amazing, disk space saved.

But how do I send transactions?

Stateless Ethereum comes to the rescue. To be able to run your transaction, you need to provide a witness. The witness consists of the state of used accounts, code and storage as well as Merkle proofs to be able to validate the root hash (more on witnesses and Stateless Ethereum here).
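
To show how a Merkle proof lets a node check a piece of state against nothing but a root hash, here is a minimal sketch. It deliberately swaps in a plain binary Merkle tree over SHA-256 instead of Ethereum's hexary Merkle Patricia trie with Keccak-256, and the leaf encodings are made up, so treat it as an illustration of the principle rather than of the real witness format.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary Merkle tree; the last node is duplicated on odd levels."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the right."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its proof, then compare."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root
```

A node that kept only `root` can accept `leaf` if `verify(leaf, proof, root)` holds, without storing any of the other leaves.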

There is one difference between ReGenesis and Stateless Ethereum, though. To explain it, I would like to introduce two pieces of terminology: explicit state and implicit state.

Say you are running a geth node now. Whenever your peer sends you a new block N, it assumes that you already have all the state needed to validate all transactions in this block, i.e. that you have synced to block N-1.

Today’s Ethereum: We imply that both nodes have all the state needed.

That is 100% implicit state. Your peer implied that you already have the state and didn’t add anything to the block.

In Stateless Ethereum, when you run a node, and your peers send you a new block N, they assume that you have no implicit state whatsoever. They just include everything you can possibly need to run this block in the block itself. It is a Block Witness.

Stateless Ethereum: We provide all state explicitly.

Your peer is sending you explicit state.

ReGenesis does something in between.

Let’s imagine ReGenesis has happened at block 10.000.000. Right now the chain head is at block 10.001.000. We can assume that any ReGenesis node has implicit state for everything touched between blocks 10.000.000 and 10.001.000. Every account, every storage entry and every contract that was used in these blocks already exists on every node and doesn’t require a witness. That can reduce witness size substantially, as we saw in the semi-stateless sync experiment.

ReGenesis: Block 1 provides some information that does not exist on Node 1.

ReGenesis: the state from Block 1 is merged into Node 1’s implicit state.

If you, as a transaction sender, need to send a new transaction to the network that is currently at block 10.001.000, you need to follow these steps:

create a transaction that you want;
take a look at the implicit state that was generated since the last ReGenesis;
create explicit state for the entries that are NOT included in the implicit state and package it as a transaction witness (we will discuss what that is a bit later on);
send the transaction with the witness to the network.
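
The steps above can be sketched as a small helper. The function name, its parameters, and the dictionary-based state are hypothetical; a real witness would also carry Merkle proofs against the ReGenesis root, which this sketch leaves out.

```python
def build_transaction_witness(accessed_keys, implicit_keys, archived_state):
    """Return explicit state for every accessed key the network does not hold implicitly.

    accessed_keys:  keys the transaction is expected to read or write
    implicit_keys:  keys touched on-chain since the last ReGenesis
    archived_state: the sender's own copy of pre-ReGenesis state (key -> value)
    """
    missing = [k for k in accessed_keys if k not in implicit_keys]
    unknown = [k for k in missing if k not in archived_state]
    if unknown:
        # The sender cannot build a witness for data it never stored.
        raise KeyError(f"sender has no pre-ReGenesis data for: {unknown}")
    return {k: archived_state[k] for k in missing}
```

For example, if a transaction touches `alice`, `dex` and `token`, and `dex` has already been used since the last ReGenesis, the witness only needs to carry `alice` and `token`.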

Wait. That will require Transaction Senders to have some state even after ReGenesis!

Yes.

If you want to send a transaction after ReGenesis, you might need some pre-ReGenesis state information to be able to generate a witness.

For most DApps, though, that means storing the state of only the small subset of contracts and accounts they actually use (plus a Merkle proof, of course).

That makes this optimization a bit less dramatic for them. On the bright side, it potentially shifts incentives towards decentralising data storage.

Transaction Witnesses

As you have probably noticed, I’ve mentioned transaction witnesses before.

Transactional Witnesses vs. Block Witnesses

Instead of having a witness for the whole block (a block witness), we have a witness per transaction.

Transaction witnesses contain explicit state for all accounts, storage entries and code used in this transaction. They also contain a Merkle Proof to ensure we can verify the state.

Block Witnesses are generated by a miner. We need tricky gas pricing to compensate them.

Transaction witnesses have one important benefit. They are generated by the sender and are sent with the transaction. Hence, we know their size immediately and it is obvious how to price them. We don’t need EVM repricing.

Transaction Witnesses are generated by senders, which makes them much easier to compensate.
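
Because the witness size is known before the transaction is broadcast, pricing can be a function of that size alone. The sketch below assumes a linear price; both the linear rule and the per-byte constant are hypothetical and not taken from any proposal.

```python
WITNESS_GAS_PER_BYTE = 16  # hypothetical price, chosen only for illustration


def witness_gas(witness: bytes) -> int:
    """Gas charged for carrying a transaction witness, linear in its size."""
    return WITNESS_GAS_PER_BYTE * len(witness)
```

The sender can evaluate `witness_gas` locally before signing, which is exactly what is not possible with a block witness assembled later by a miner.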

One potential downside of using transaction witnesses is data duplication. If, say, all transactions in a block are between the same accounts, the transaction witnesses will contain duplicate data.
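
The duplication can be quantified with a small helper that compares the bytes carried by per-transaction witnesses with the bytes of a de-duplicated union, which is roughly what a single block witness would carry. The function and the dictionary layout of a witness are hypothetical.

```python
def witness_overhead(tx_witnesses: list[dict[str, bytes]]) -> tuple[int, int]:
    """Return (bytes sent as per-transaction witnesses, bytes in a de-duplicated union)."""
    per_tx = sum(len(value) for w in tx_witnesses for value in w.values())
    union: dict[str, bytes] = {}
    for w in tx_witnesses:
        union.update(w)
    deduplicated = sum(len(value) for value in union.values())
    return per_tx, deduplicated
```

Two transactions between the same two accounts, each proving both 32-byte entries, give `(128, 64)`: half of the transmitted witness data is redundant.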

Another downside is a somewhat more complicated algorithm for applying those witnesses.

Block witnesses are generated by a miner. The miner knows the exact ordering of transactions to include, so the witness always contains up-to-date data.

Transaction witnesses are generated by the sender, who does not know which transactions will run before theirs in the block. There should therefore be an intelligent “merging” of the witness with the implicit state generated by previous transactions in the block.

But the whole idea of ReGenesis would need it, so it isn’t that big of a deal.
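
One possible merging rule, sketched below, is that entries the node already holds always win over entries in the witness, since the local values reflect earlier transactions in the block while the witness describes older state. This rule is an assumption made for the illustration; the text does not specify the merging algorithm, and the function name is hypothetical.

```python
def merge_witness(implicit_state: dict, tx_witness: dict) -> dict:
    """Merge a transaction witness into the node's implicit state.

    Assumed rule: entries already present locally take precedence, because
    they may have been updated by earlier transactions in the same block.
    """
    merged = dict(tx_witness)
    merged.update(implicit_state)  # local (newer) values override witness values
    return merged
```

If an earlier transaction in the block already changed `alice` from 100 to 90, a later witness that still says 100 does not roll that change back; it only contributes entries the node did not have.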

Why can’t we use transactional witnesses right now?

The short answer is a mix of Dynamic State Access and potential front-running of transactions by malicious agents.

A longer answer is as follows.

In Stateless Ethereum you need to provide the full state for your transaction, because we assume that the recipient has no state whatsoever. If your transaction uses Dynamic State Access, the parts of the storage your code reads depend on values in other parts of the storage.

That opens the possibility for the following theoretical DoS attack.

Alice has a smart contract that reads either state entry A or B depending on the storage value K. Bob can change this value K, and he can front-run Alice by getting his transaction included before hers.

That way, if Alice provides a proof that contains A, Bob can change K before Alice’s transaction executes, so her transaction fails. If she provides a proof for B, he can stop the transaction again by switching K back.

In this simple case, Alice can of course just provide both A and B, but if the address is determined by a uint64 or something similar, Alice might have to include the full state in a proof to avoid this attack, which is impossible.

Of course, this attack is theoretical. But there might be more attacks possible in a similar manner. Moreover, with the current amount of money in DApps, we need to be very careful not to break anything.
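
The attack described above can be replayed in a few lines. This is a toy model under stated assumptions: the contract logic, the key names `K`, `A`, `B`, and the rule that a read outside the witness fails the transaction are all simplifications made for the illustration.

```python
class MissingState(Exception):
    pass


def run_alice_contract(read):
    """Alice's contract: read K, then read A or B depending on its value."""
    return read("A") if read("K") == 0 else read("B")


def execute_stateless(chain_state, witness_keys):
    """Execute with access limited to the keys proven in the witness."""
    def read(key):
        if key not in witness_keys:
            raise MissingState(key)
        return chain_state[key]
    try:
        return run_alice_contract(read)
    except MissingState:
        return None  # the transaction fails


chain = {"K": 0, "A": 11, "B": 22}
alice_witness = {"K", "A"}  # Alice saw K == 0 and proved path A
honest = execute_stateless(chain, alice_witness)    # succeeds and reads A

chain["K"] = 1  # Bob front-runs and flips K
attacked = execute_stateless(chain, alice_witness)  # fails: B is not in the witness
```

Whichever path Alice proves, Bob can flip `K` to the other one, and nothing Alice sent survives the failed attempt.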

How does ReGenesis help with mitigating that?

The deterministic nature of ReGenesis helps to determine how much state a node has. For state that we are sure exists on the node, we need no proof.

To mitigate the attack, we need to be sure that the proof we provide gets added to the node’s implicit state regardless of whether the transaction succeeds or fails due to insufficient state.

So, consider the scenario described before.

Alice sends a transaction with a proof for path A, but Bob makes sure that the value at address K is such that Alice’s contract has to take path B.

Alice’s transaction fails, but it adds everything that the path A needs to the node’s implicit state.

Now Alice can re-send the transaction, this time providing the proof for path B. Bob cannot stop this transaction with the same switch anymore: if he changes the variable at K so that execution goes down path A again, it is too late, because A is already in the implicit state and the transaction can go there without any proof. For B, there is a proof.
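
The same toy scenario can be replayed with the mitigation in place. The `ReGenesisNode` class below is hypothetical; the only property it models is the one the argument relies on, namely that a witness is merged into implicit state even when the transaction that carried it fails.

```python
class MissingState(Exception):
    pass


def run_alice_contract(read):
    """Alice's contract: read K, then read A or B depending on its value."""
    return read("A") if read("K") == 0 else read("B")


class ReGenesisNode:
    def __init__(self, implicit):
        self.implicit = dict(implicit)  # state known since the last ReGenesis

    def execute(self, witness):
        # The witness is merged before execution and kept even if the
        # transaction later fails.
        for key, value in witness.items():
            self.implicit.setdefault(key, value)

        def read(key):
            if key not in self.implicit:
                raise MissingState(key)
            return self.implicit[key]

        try:
            return run_alice_contract(read)
        except MissingState:
            return None  # the transaction fails, the witness stays


node = ReGenesisNode({"K": 1})        # Bob has already set K to 1
first = node.execute({"A": 11})       # fails: path B is needed, but A is now implicit
node.implicit["K"] = 0                # Bob flips K back to force path A
second = node.execute({"B": 22})      # succeeds: A is already in implicit state
```

After the second attempt both `A` and `B` are implicit, so no further flip of `K` can make Alice's transaction fail for lack of state.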

Conclusion

Okay, so let’s recap the idea and its key points:

every N blocks, we just remove all state, keeping only the root hash;
this event should be relatively rare, maybe every 1.000.000 blocks, maybe every 10.000.000 blocks or so;
to send a transaction, the sender will need to provide explicit state (as a Transaction Witness);
the sender pays gas for the transaction witness inclusion based on the witness size;
to achieve that, the sender will have to keep some pre-ReGenesis state for contracts/accounts it is interested in;
if the transaction fails due to insufficient explicit state, we add the provided state as implicit and we won’t require it on the next send;
there is a smart merging mechanism to merge explicit state from the transaction proof with implicit state of a node after running previous transactions in the block.

As we can see, after zooming out and looking at the bigger picture, the ReGenesis approach can potentially:

shift the incentives balance between Transaction Senders and Infrastructure nodes to achieve higher decentralization of state storage;
limit state growth to growth between ReGenesis events (with a few caveats);
allow the use of Transaction Witnesses and simplify gas pricing for each witness.

Of course, a lot of these things should be tested, challenged and proved first. I think it is an interesting and very promising area of research that brings a surprising amount of potential benefits.