Data Storage Layer & EVM

Artemis Week-2

Jul 19, 2022

Laying the foundations. Artemis Week-2

This was the second week of lectures at Artemis. Although we kept covering the fundamentals behind the technology, we also started to get into more practical material.

This week we covered several topics around the Ethereum ecosystem: proof-of-stake, nodes, clients, and the EVM. Again, all of them are interesting and offer plenty of side-quests to explore. Nevertheless, I will focus on the Data Storage Layer and the Ethereum Virtual Machine.

Personally, I plan to further familiarize myself with the opcodes in the upcoming weeks, since they can help provide a better understanding of the overall system and potentially help with gas optimization and even security.

Data Storage Layer

To fully comprehend Ethereum it is key to first learn how its data storage layer works. To do so, the following topics should be understood.

Merkle Patricia Trie

A trie is an easy-to-implement, memory-efficient data structure that is extremely fast at finding common prefixes. Ethereum combines this idea with the hashing of a Merkle tree in a data structure called the “Modified Merkle Patricia Trie”, which became the core of its data storage. Accordingly, the world state trie, the account storage trie, the receipts trie, and the transaction trie are all implementations of a Merkle Patricia Trie.
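To make the prefix idea concrete, here is a minimal sketch of a plain trie in Python. It is only an illustration: the class and method names are my own, and it leaves out everything that makes Ethereum's version "Merkle" and "Patricia" (node hashing, path compression, RLP encoding).

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a single character to the next node
        self.value = None   # payload stored where a key ends, if any


class Trie:
    """Plain character trie (hypothetical helper, not Ethereum's MPT)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            # Keys sharing a prefix reuse the same nodes for that prefix.
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def get(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value

    def keys_with_prefix(self, prefix):
        # Walk down to the node that represents the prefix...
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # ...then collect every key stored below it.
        found = []
        pending = [(prefix, node)]
        while pending:
            path, current = pending.pop()
            if current.value is not None:
                found.append(path)
            for ch, child in current.children.items():
                pending.append((path + ch, child))
        return sorted(found)
```

For example, after inserting the keys "a711", "a77d" and "b0", calling keys_with_prefix("a7") only has to walk two nodes before it reaches the subtree holding both matching keys.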

World State

The world state trie keeps a mapping between addresses (EOAs and contract accounts) and their account states. It can be seen as a global state that is constantly updated by transaction executions.

It is important to note that although the world state trie maps every address to its account state (nonce, balance, storage root, and code hash), a contract's actual storage data lives in a separate account storage trie, which the account state references through its storage root.

Finally, it is also important to note that none of this state data is stored in the blocks themselves. Instead, each block stores a single root hash (the state root) representing the state after that block. The state root is enough for light nodes and full nodes to request world state data from full nodes and archival nodes, respectively, and to verify what they receive.
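The reason a single hash can stand in for the whole state is that any change to any account changes the root. The sketch below shows this with a simple binary Merkle tree. Note the assumptions: Ethereum actually uses a Merkle Patricia Trie with Keccak-256 and RLP-encoded accounts, while this example uses SHA-256 from the standard library and a made-up "address:balance" encoding.

```python
import hashlib


def h(data: bytes) -> bytes:
    # SHA-256 is a stand-in here; Ethereum uses Keccak-256.
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root of a simple binary Merkle tree over a list of byte strings."""
    if not leaves:
        return h(b"")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def state_root(accounts):
    # Hypothetical encoding: sort by address so the root is deterministic.
    leaves = [f"{addr}:{bal}".encode() for addr, bal in sorted(accounts.items())]
    return merkle_root(leaves)
```

Computing state_root over the same accounts always gives the same 32 bytes, and changing a single balance gives a different root, which is what lets a node holding only the root check data served by someone else.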

Note: If you are not familiar with it yet, you can check the characteristics of the different node types here.

Transactions

Transactions are cryptographically signed instructions initiated by accounts. Transactions are the reason behind changes in the world state, and can result in message calls or the creation of new contracts.

Unlike the world state trie, which is a single global structure, each block has its own separate transaction trie. This trie stores both the transaction-related information and the transaction order decided by the miner who assembled the block.

Blocks

A block is composed of two different parts: the header and the body.

Block header: The actual “block-chain” part of Ethereum. It contains the hash of its predecessor, which makes any tampering with the chain detectable.
Block body: Contains a list of transactions that have been included in the block and a list of uncle/ommer block headers.
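The way headers chain together can be sketched in a few lines. This is a simplified model under my own assumptions: the Header fields are a small subset of the real ones, and the hash is SHA-256 over JSON, whereas Ethereum hashes the RLP-encoded header with Keccak-256.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Header:
    # Hypothetical, reduced header; real headers carry many more fields.
    number: int
    parent_hash: str
    state_root: str
    transactions_root: str


def header_hash(header):
    # Stand-in encoding and hash function (not RLP, not Keccak-256).
    encoded = json.dumps(asdict(header), sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def chain_is_linked(headers):
    """True if every header points at the hash of the one before it."""
    return all(
        child.parent_hash == header_hash(parent)
        for parent, child in zip(headers, headers[1:])
    )
```

If someone alters any field of an older header, its hash changes and the next header's parent_hash no longer matches, so chain_is_linked returns False.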

All the information presented above can be summarized in the following diagram:

Main Diagram — Relationship between blocks and the data storage layer in the Ethereum network.

EVM Basics

The EVM is a virtual, stack-based computation engine that powers the Ethereum Blockchain by enabling it to run and deploy smart contracts.

One can think of the EVM as the execution environment for the Ethereum blockchain. As such, the EVM defines a set of opcodes (operation codes), which can be thought of as instructions with associated gas costs.

Since the EVM is deterministic, Ethereum can be described as a network with a state transition function where, given a state and a set of transactions, the network will transition towards a new valid state.
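A toy version of such a state transition function might look like the sketch below. It is a deliberate simplification with assumed names: the state is just a dictionary of balances and a transaction is a plain value transfer, with no gas, nonces, signatures, or contract code.

```python
def apply_transaction(state, tx):
    """Return the state after one transfer; invalid transfers change nothing."""
    sender, recipient, value = tx["from"], tx["to"], tx["value"]
    if state.get(sender, 0) < value:
        return state  # not enough balance: the state is left untouched
    new_state = dict(state)  # never mutate the previous state
    new_state[sender] = new_state.get(sender, 0) - value
    new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state


def transition(state, transactions):
    # Same state + same ordered transactions -> same resulting state.
    for tx in transactions:
        state = apply_transaction(state, tx)
    return state
```

Because the function is deterministic, every node that starts from the same state and applies the same transactions in the same order ends up with an identical new state.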

High-level Language → Byte Code → Opcode

Smart contracts are compiled into bytecode, which is then executed by the EVM as a sequence of opcodes that perform specific tasks. In combination with enough resources, these opcodes allow the EVM to compute almost anything, granting it Turing-completeness.
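To see the bytecode-to-opcode step, here is a small disassembler sketch. The opcode table is intentionally tiny (only a handful of real opcode values are listed) and the function name is my own; anything outside the table is printed as unknown.

```python
# A few real EVM opcode values; the full table is much larger.
OPCODES = {
    0x00: "STOP",
    0x01: "ADD",
    0x02: "MUL",
    0x52: "MSTORE",
    0x55: "SSTORE",
}


def disassemble(bytecode_hex):
    """Turn a hex string of bytecode into a list of readable instructions."""
    code = bytes.fromhex(bytecode_hex)
    instructions = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 bytes of immediate data.
            size = op - 0x5F
            operand = code[pc + 1 : pc + 1 + size]
            instructions.append(f"PUSH{size} 0x{operand.hex()}")
            pc += 1 + size
        else:
            instructions.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
            pc += 1
    return instructions
```

For instance, the bytecode 6001600201 disassembles to PUSH1 0x01, PUSH1 0x02, ADD: push two numbers, then add them.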

EVM Architecture

EVM execution revolves around a stack that is at most 1024 items deep, where each item is a 256-bit word. During execution, the EVM also maintains a volatile memory, a word-addressed byte array that does not persist between transactions. Contracts can, however, access persistent data through the storage trie (addressable key-value pairs) associated with their account.
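The three data areas described above (stack, memory, storage) can be sketched as a tiny interpreter. This is a rough model under stated assumptions: it supports only five opcodes, ignores gas entirely, and represents storage as a plain dictionary passed in by the caller instead of a trie.

```python
WORD = 2**256        # stack items are 256-bit words
STACK_LIMIT = 1024   # maximum stack depth


def run(code, storage):
    """Execute a tiny opcode subset; returns the final (stack, memory)."""
    stack, memory, pc = [], bytearray(), 0

    def push(value):
        if len(stack) >= STACK_LIMIT:
            raise OverflowError("stack limit reached")
        stack.append(value % WORD)  # wrap around like 256-bit arithmetic

    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == 0x00:    # STOP
            break
        elif op == 0x01:  # ADD: pop two words, push their sum
            push(stack.pop() + stack.pop())
        elif op == 0x52:  # MSTORE: write a 32-byte word into volatile memory
            offset, value = stack.pop(), stack.pop()
            if len(memory) < offset + 32:
                memory.extend(bytes(offset + 32 - len(memory)))
            memory[offset : offset + 32] = value.to_bytes(32, "big")
        elif op == 0x55:  # SSTORE: write a word into persistent storage
            key, value = stack.pop(), stack.pop()
            storage[key] = value
        elif op == 0x60:  # PUSH1: push the next byte of code
            push(code[pc])
            pc += 1
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack, memory
```

Running the bytecode 600260030160005500 (PUSH1 2, PUSH1 3, ADD, PUSH1 0, SSTORE, STOP) against an empty dictionary leaves the value 5 at storage key 0. The memory and the stack are thrown away when run returns, while the storage dictionary keeps its contents, which mirrors the volatile versus persistent split.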

EVM State transition

As explained before, to move into a new state, the EVM needs to have an initial state to transition from, and a set of transactions in charge of the state changes.

If these requirements are fulfilled, every new transaction will either perform a message call or deploy a new contract. Regardless of the nature of the transaction, the EVM steps through the corresponding opcodes, pushing and popping data on the stack, to carry out the computation and transition into a new state.

EVM execution model showcasing how the different components interact.

Recommended Resources

If you feel like a quant today, you should definitely check the Ethereum Yellow Paper.

If you want to have a more in-depth view of how the Ethereum Virtual Machine works, check this article.

You can find all the different opcodes, their explanation, and their minimum gas consumption here.

If you want to fully understand opcodes and their correlation with stack, memory, and storage, I highly encourage you to play around with this awesome tool.

Finally, I would like to give credit to Gonçalo for explaining all these complex topics in such a comprehensive manner.

Aspiring web3 dev
