Solving Source Code Verification In Ethereum
Introducing the Verifier Alliance
One of the things that plagues users of blockchains is the abundance of hexadecimal. Apart from developers, no one in their right mind should ever want to work with the long hexadecimal strings that are standard in blockchain applications.
I've lost track of the number of times over the years I've seen proof of some activity taking place on a blockchain demonstrated to an audience as a wall of hexadecimal.
One can argue that dealing with hex is ok for developers, but for your typical non-technical user that we want to bring into this wonderful ecosystem we're all building, it's a huge no-no.
Certain initiatives have helped greatly in trying to address the hexadecimal problem for users of blockchain networks. The Ethereum Name Service provides human-readable mappings for addresses on Ethereum, such as chainlens.eth, which maps to the Ethereum address 0xF11fe81d92e2289507363dCD8a83093cD8dbcCF9.
But historically another area has been underserved: decoding the smart contracts deployed on the blockchain.
As a result, users examining contracts in blockchain explorers are confronted with meaningless hexadecimal, such as shown below.
There are established solutions to this problem, which I discuss below. However, the space is about to get a significant boost thanks to the recent launch of the Verifier Alliance, a new initiative to help address it.
Source Code Verification 101
When a developer writes a smart contract using Solidity or Vyper, it needs to be compiled before it can be deployed to the Ethereum network for execution.
This compilation is performed by the Solidity or Vyper compiler, which takes the contract source code and compiles it into bytecode. Within this bytecode is a series of instructions that the Ethereum Virtual Machine (EVM) executes when a method is called on the smart contract. Take for instance the following Greeter smart contract:
// SPDX-License-Identifier: Apache-2.0
pragma solidity ^0.7.0;
// Modified Greeter contract. Based on example at https://www.ethereum.org/greeter.
contract Mortal {
    /* Define variable owner of the type address*/
    address owner;
    /* this function is executed at initialization and sets the owner of the contract */
    constructor () {owner = msg.sender;}
    modifier onlyOwner {
        require(
            msg.sender == owner,
            "Only owner can call this function."
        );
        _;
    }
    /* Function to recover the funds on the contract */
    function kill() onlyOwner public {selfdestruct(msg.sender);}
}
contract HelloWorld is Mortal {
    /* define variable greeting of the type string */
    string greet;
    /* this runs when the contract is executed */
    constructor (string memory _greet) {
        greet = _greet;
    }
    function newGreeting(string memory _greet) onlyOwner public {
        emit Modified(greet, _greet, greet, _greet);
        greet = _greet;
    }
    /* main function */
    function greeting() public view returns (string memory) {
        return greet;
    }
    event Modified(
        string indexed oldGreetingIdx, string indexed newGreetingIdx,
        string oldGreeting, string newGreeting);
}
From this Solidity source file, the Solidity compiler generates the following bytecode:
608060405234801561001057600080fd5b5060405161074538038061074583398181016040526020
81101561003357600080fd5b81019080805160405193929190846401000000008211156100535760
0080fd5b90830190602082018581111561006857600080fd5b825164010000000081118282018810
171561008257600080fd5b82525081516020918201929091019080838360005b838110156100af57
...
8255916020019190600101906104ee565b50610515929150610519565b5090565b5b808211156105
15576000815560010161051a56fe4f6e6c79206f776e65722063616e2063616c6c20746869732066
756e6374696f6e2ea264697066735822122035d0100e0feb96ff214b61ec1e850015aec0360a78c7
5ed21a9792401e390a3464736f6c63430007060033
To run the smart contract code on the Ethereum network, it first needs to be deployed. Deployment happens by sending a contract-creation transaction, which creates a new instance of the smart contract on the network.
This transaction contains only the bytecode of the smart contract (followed by any ABI-encoded constructor arguments); it doesn't include any of the underlying source code.
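To make the last point concrete, here is a minimal sketch of how the data field of a contract-creation transaction for the HelloWorld contract could be assembled: the creation bytecode followed by the ABI encoding of its single string constructor argument. It follows the Solidity ABI encoding for one dynamic string argument (an offset word, a length word, then the UTF-8 bytes right-padded to 32 bytes). The function names are hypothetical, "0x6080604052" is a stand-in for the full creation bytecode elided above, and nothing here signs or sends a transaction.

```python
def encode_string_arg(value: str) -> bytes:
    """ABI-encode a single `string` argument, as for `constructor(string memory _greet)`."""
    data = value.encode("utf-8")
    head = (32).to_bytes(32, "big")            # offset (in bytes) to the dynamic data
    length = len(data).to_bytes(32, "big")     # length of the string in bytes
    padding = b"\x00" * (-len(data) % 32)      # right-pad contents to a 32-byte boundary
    return head + length + data + padding


def deployment_data(creation_bytecode_hex: str, greet: str) -> str:
    """Build the `data` field of a contract-creation transaction (not signed or sent)."""
    code = creation_bytecode_hex[2:] if creation_bytecode_hex.startswith("0x") else creation_bytecode_hex
    return "0x" + code + encode_string_arg(greet).hex()


if __name__ == "__main__":
    # "0x6080604052" stands in for the full (elided) creation bytecode shown above.
    print(deployment_data("0x6080604052", "Hello, world"))
```

A creation transaction has no recipient address; the EVM runs this data as init code, and the constructor reads its argument from the bytes appended after the bytecode.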
Application Binary Interface (ABI) Files
To interact with a smart contract on the network, developers need to know the interface of the code deployed there.
This is provided by the contract's Application Binary Interface (ABI) file, which describes the contract's constructor, functions and events, along with the parameters they accept.
For instance, the ABI file for our Greeter contract above describes its constructor, the Modified event, and the greeting, kill and newGreeting functions:
[
  {
    "inputs": [
      {
        "internalType": "string",
        "name": "_greet",
        "type": "string"
      }
    ],
    "stateMutability": "nonpayable",
    "type": "constructor"
  },
  {
    "anonymous": false,
    "inputs": [
      {
        "indexed": true,
        "internalType": "string",
        "name": "oldGreetingIdx",
        "type": "string"
      },
      {
        "indexed": true,
        "internalType": "string",
        "name": "newGreetingIdx",
        "type": "string"
      },
      {
        "indexed": false,
        "internalType": "string",
        "name": "oldGreeting",
        "type": "string"
      },
      {
        "indexed": false,
        "internalType": "string",
        "name": "newGreeting",
        "type": "string"
      }
    ],
    "name": "Modified",
    "type": "event"
  },
  {
    "inputs": [],
    "name": "greeting",
    "outputs": [
      {
        "internalType": "string",
        "name": "",
        "type": "string"
      }
    ],
    "stateMutability": "view",
    "type": "function"
  },
  {
    "inputs": [],
    "name": "kill",
    "outputs": [],
    "stateMutability": "nonpayable",
    "type": "function"
  },
  {
    "inputs": [
      {
        "internalType": "string",
        "name": "_greet",
        "type": "string"
      }
    ],
    "name": "newGreeting",
    "outputs": [],
    "stateMutability": "nonpayable",
    "type": "function"
  }
]
Using this information, developers can create transactions to deploy new contracts and to interact with existing ones.
ABI files can be generated at compile time by the smart contract language compilers (for example, with solc's --abi option).
This approach creates a trust conundrum for developers when working with smart contracts on Ethereum. How can you be sure that the smart contract you are interacting with is the one that you think it is and that the ABI file you have for it is correct?
Because neither the contract source code nor the ABI file is stored on the Ethereum network when a contract is deployed, users have to trust whoever supplies them.
Metadata Files
The Solidity compiler developers were aware of this issue and introduced a metadata file to help address it.
The metadata file contains crucial information about the smart contract: the compiler version, settings and source file hashes needed to reproduce the same compilation, along with the ABI.
In other words, if you have the original source files and the metadata file, you have the information you need to reproduce the bytecode of a deployed smart contract. And because the metadata file also contains the ABI, you have all the information you need to interact with the contract.
The metadata file can be generated at compile time by passing the --metadata command line argument to the compiler. Our Greeter example from above produces a file like the one below:
{
  "compiler": {
    "version": "0.7.6+commit.7338295f"
  },
  "language": "Solidity",
  "output": {
    "abi": [
      {
        "inputs": [
          {
            "internalType": "string",
            "name": "_greet",
            "type": "string"
          }
        ],
        "stateMutability": "nonpayable",
        "type": "constructor"
      },
      {
        "anonymous": false,
        "inputs": [
          {
            "indexed": true,
            "internalType": "string",
            "name": "oldGreetingIdx",
            "type": "string"
          },
          {
            "indexed": true,
            "internalType": "string",
            "name": "newGreetingIdx",
            "type": "string"
          },
          {
            "indexed": false,
            "internalType": "string",
            "name": "oldGreeting",
            "type": "string"
          },
          {
            "indexed": false,
            "internalType": "string",
            "name": "newGreeting",
            "type": "string"
          }
        ],
        "name": "Modified",
        "type": "event"
      },
      {
        "inputs": [],
        "name": "greeting",
        "outputs": [
          {
            "internalType": "string",
            "name": "",
            "type": "string"
          }
        ],
        "stateMutability": "view",
        "type": "function"
      },
      {
        "inputs": [],
        "name": "kill",
        "outputs": [],
        "stateMutability": "nonpayable",
        "type": "function"
      },
      {
        "inputs": [
          {
            "internalType": "string",
            "name": "_greet",
            "type": "string"
          }
        ],
        "name": "newGreeting",
        "outputs": [],
        "stateMutability": "nonpayable",
        "type": "function"
      }
    ],
    "devdoc": {
      "kind": "dev",
      "methods": {},
      "version": 1
    },
    "userdoc": {
      "kind": "user",
      "methods": {},
      "version": 1
    }
  },
  "settings": {
    "compilationTarget": {
      "src/main/solidity/HelloWorld.sol": "HelloWorld"
    },
    "evmVersion": "istanbul",
    "libraries": {},
    "metadata": {
      "bytecodeHash": "ipfs"
    },
    "optimizer": {
      "enabled": true,
      "runs": 200
    },
    "remappings": []
  },
  "sources": {
    "src/main/solidity/HelloWorld.sol": {
      "keccak256": "0x786fd9cb6e787fce5c67fdbc3055d5ec70f6df4276053be6b7272316d6c18b4a",
      "license": "Apache-2.0",
      "urls": [
        "bzz-raw://b1a5ddc13ff494eaa74500cc5c04600d4e68354b0ca7c3f7a1b9770694793c12",
        "dweb:/ipfs/QmfMBjzeSaJZLjzD7oTpMVoKmJ7RxXL4GvKzPHDULvQH3f"
      ]
    }
  },
  "version": 1
}
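To see how a metadata file like this captures "the information you need to reproduce the bytecode", here is a rough sketch that turns it back into a solc Standard JSON input. The function name and the choice of outputSelection are assumptions for illustration; it drops compilationTarget (which records the compiled contract but is not a Standard JSON setting), does not handle linked libraries, and leaves checking each source file against its "keccak256" entry to the caller.

```python
def standard_json_from_metadata(metadata: dict, sources: dict) -> tuple:
    """Return (compiler_version, standard_json_input) rebuilt from a metadata file.

    `sources` maps every path listed in metadata["sources"] to that file's content.
    The contents should be checked against the "keccak256" hashes in the metadata
    before compiling (not shown here).
    """
    missing = sorted(set(metadata["sources"]) - set(sources))
    if missing:
        raise ValueError(f"missing source files: {missing}")

    settings = dict(metadata["settings"])  # shallow copy; the input is not modified
    # compilationTarget names the contract that was compiled; it is not a
    # Standard JSON setting, so remove it and ask for output for all contracts.
    settings.pop("compilationTarget", None)
    if settings.get("libraries"):
        # Linked libraries would need converting to Standard JSON's per-file layout.
        raise NotImplementedError("linked libraries are not handled in this sketch")
    settings.pop("libraries", None)
    settings["outputSelection"] = {
        "*": {"*": ["evm.bytecode.object", "evm.deployedBytecode.object", "metadata"]}
    }

    standard_input = {
        "language": metadata["language"],
        "sources": {path: {"content": sources[path]} for path in metadata["sources"]},
        "settings": settings,
    }
    return metadata["compiler"]["version"], standard_input
```

The resulting dictionary could then be serialized with json.dumps and fed to the matching compiler release (0.7.6 here) via solc --standard-json, and the bytecode it produces compared with what is deployed on-chain.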
To link the metadata file to the bytecode produced by the compiler, a hash of the metadata file is CBOR-encoded and appended to the end of the bytecode.
This enables you to say with certainty whether a contract metadata file matches the contract code deployed on the network.
By default, this is the IPFS hash of the metadata file, the intent being that the metadata file is stored on IPFS. In our example above, decoding the appended data (obtained using the Sourcify Playground) gives the values below, where ipfs is the IPFS hash of the metadata file and solc encodes the compiler version, 0.7.6:
{
  "ipfs": "0x122035d0100e0feb96ff214b61ec1e850015aec0360a78c75ed21a9792401e390a34",
  "solc": "0x000706"
}
If you search for the above IPFS hash you'll see it appears toward the end of the compiled bytecode below:
608060405234801561001057600080fd5b5060405161074538038061074583398181016040526020
81101561003357600080fd5b81019080805160405193929190846401000000008211156100535760
0080fd5b90830190602082018581111561006857600080fd5b825164010000000081118282018810
171561008257600080fd5b82525081516020918201929091019080838360005b838110156100af57
...
8255916020019190600101906104ee565b50610515929150610519565b5090565b5b808211156105
15576000815560010161051a56fe4f6e6c79206f776e65722063616e2063616c6c20746869732066
756e6374696f6e2ea264697066735822
> 122035d0100e0feb96ff214b61ec1e850015aec0360a78c75ed21a9792401e390a34 <-- IPFS hash
64736f6c63430007060033
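The decoding step can be sketched in a few lines. Per the Solidity documentation, the last two bytes of the bytecode give the length of the CBOR-encoded map that precedes them. This minimal decoder assumes the map contains only short text keys, byte strings and booleans, which is enough for the ipfs and solc fields shown above; it is not a general CBOR parser. It also base58-encodes the multihash into the familiar "Qm..." IPFS form. The function names are hypothetical.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _read_length(buf: bytes, pos: int, info: int):
    """Decode a CBOR length from the 5-bit additional info (small cases only)."""
    if info < 24:
        return info, pos
    if info == 24:
        return buf[pos], pos + 1
    if info == 25:
        return int.from_bytes(buf[pos:pos + 2], "big"), pos + 2
    raise ValueError("unsupported CBOR length")


def _read_item(buf: bytes, pos: int):
    """Decode one byte string, text string or boolean starting at pos."""
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if major in (2, 3):  # 2 = byte string, 3 = text string
        n, pos = _read_length(buf, pos, info)
        raw = bytes(buf[pos:pos + n])
        return (raw if major == 2 else raw.decode("utf-8")), pos + n
    if major == 7 and info in (20, 21):  # CBOR false / true
        return info == 21, pos
    raise ValueError(f"unsupported CBOR item 0x{buf[pos - 1]:02x}")


def decode_metadata_tail(bytecode_hex: str) -> dict:
    """Decode the CBOR map appended to the end of Solidity bytecode."""
    code = bytes.fromhex(bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex)
    cbor_len = int.from_bytes(code[-2:], "big")  # last 2 bytes: length of the CBOR map
    cbor = code[-2 - cbor_len:-2]
    if cbor[0] >> 5 != 5:  # major type 5 = map
        raise ValueError("no CBOR map found at end of bytecode")
    count, pos = _read_length(cbor, 1, cbor[0] & 0x1F)
    result = {}
    for _ in range(count):
        key, pos = _read_item(cbor, pos)
        result[key], pos = _read_item(cbor, pos)
    return result


def solc_version(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)  # e.g. bytes 00 07 06 -> "0.7.6"


def to_base58(data: bytes) -> str:
    """Base58btc-encode bytes, e.g. an IPFS sha2-256 multihash into a CIDv0 string."""
    n = int.from_bytes(data, "big")
    digits = ""
    while n:
        n, rem = divmod(n, 58)
        digits = B58_ALPHABET[rem] + digits
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits
```

Applied to the tail of the example bytecode, this yields the same ipfs and solc values as the Sourcify Playground output above.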
Full Versus Partial Contract Verification
This type of match is considered a full contract verification: the IPFS hash embedded in the deployed bytecode matches the IPFS hash of the metadata file produced by recompiling the source.
There is another type of match, a partial verification, where the recompiled bytecode matches the deployed bytecode in its entirety except for the metadata hash at the end.
A partial match implies that the contract bytecode behaves identically; however, the source files used to create it were not identical, for example due to different comments, variable names, whitespace or paths to source files.
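The distinction between the two kinds of match can be sketched as a comparison with and without the appended metadata. This is a simplified illustration with hypothetical function names. It assumes both inputs are runtime bytecode (for example from eth_getCode and the compiler's deployedBytecode output) and ignores the constructor arguments, immutable variables and linked libraries that a real verifier such as Sourcify also has to account for.

```python
def _strip0x(hex_str: str) -> str:
    return hex_str[2:] if hex_str.startswith("0x") else hex_str


def strip_metadata(code: bytes) -> bytes:
    """Remove the CBOR metadata (and its 2-byte length) from the end of bytecode."""
    if len(code) < 2:
        return code
    cbor_len = int.from_bytes(code[-2:], "big")
    if cbor_len == 0 or cbor_len + 2 > len(code):
        return code  # no plausible metadata section
    if not 0xA0 <= code[-2 - cbor_len] <= 0xBF:
        return code  # section does not start with a CBOR map header
    return code[:-2 - cbor_len]


def match_status(deployed_hex: str, recompiled_hex: str) -> str:
    deployed = bytes.fromhex(_strip0x(deployed_hex))
    recompiled = bytes.fromhex(_strip0x(recompiled_hex))
    if deployed == recompiled:
        return "full"  # the metadata hash matches too
    if strip_metadata(deployed) == strip_metadata(recompiled):
        return "partial"  # only the appended metadata differs
    return "no match"
```

Comparing the whole bytecode first means a full match is only reported when the embedded metadata hash, and therefore the exact source files, also agree.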
Challenges With This Approach
Although this elegant solution exists for capturing the compilation environment and source code of a smart contract deployed to Ethereum, practical challenges remain:
- Developers need to ensure that contract metadata is always generated and stored before they deploy a contract
- Contract metadata doesn't always end up on IPFS, as IPFS is slower than a traditional file store or object store and requires developers to jump through additional hoops
- Users need to manually upload metadata files to blockchain explorers to verify contracts
- Blockchain explorers don't share verification data with one another
Sourcify was created to help address these challenges.
Sourcify For Contract Verification
Sourcify is a web service that allows users to verify the source code of smart contracts deployed to an Ethereum network. It supports the Ethereum Mainnet and a number of Layer 2 networks (the full list of supported chains is available in the Sourcify documentation).
With Sourcify, you provide the address of a deployed smart contract on a supported network, along with the contract source code and metadata file. Sourcify then uses this information to recompile the contract and responds with a verification status: a full match, a partial match or no match.
The provided files and status are stored by Sourcify and pinned to IPFS so that verification statuses against contract addresses can easily be retrieved again in the future.
Sourcify is a very useful service for anyone working with on-chain data, as it enables them to take smart contract bytecode and transactions and decode them with confidence for users.
Blockchain explorers are one place where this is widely embraced. Where smart contracts are fully or partially verified, explorers can fully decode the transactions they take part in and the events they emit.
This allows a user's view of a contract and the activity taking place on it to go from this:
To this:
Sourcify is used by many of the leading blockchain explorers, including Chainlens, and is integrated with developer tools such as Hardhat, helping ensure that a higher proportion of contracts on Ethereum blockchains are verified.
However, Sourcify itself has some limitations such as:
- Verified contracts are associated with a single blockchain deployment: if you deploy the same contract to two different blockchains, each deployment requires its own verification on Sourcify
- It only supports smart contracts written in Solidity
- It is not possible to query any information about a contract other than whether it's been verified. For instance, you cannot query data contained in the metadata file, such as the compilation environment or source code hashes.
In order to provide a collaborative solution to some of these challenges, the Verifier Alliance was recently formed.
The Verifier Alliance
The Verifier Alliance is an ecosystem collective aiming for easy, unified and open access to the source code of EVM smart contracts.
Its members include Sourcify and a number of blockchain explorer teams, data providers and ecosystems.
The Verifier Alliance intends to build on the work being done by Sourcify and blockchain explorers and provide a common approach to contract verification that can be contributed to and embraced by any teams producing or consuming smart contracts.
The initial approach is to create a shared database schema for storing information about smart contracts deployed to networks.
This schema provides:
- A registry of all deployments of verified smart contracts published across different blockchain networks
- Information from the associated metadata file such as the compiler used, source code hashes and ABI
- Mappings from code hashes to bytecode facilitating easy search for deployments of specific contracts across different networks
- Support for Solidity and other languages including Vyper and Yul
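The benefit of keying verification data by a hash of the bytecode, rather than by a single address, can be sketched with a small in-memory registry: verify once, and every deployment of the same bytecode on any chain can find the result. This is a hypothetical illustration, not the Verifier Alliance schema. SHA-256 is used only because it is in Python's standard library, and a real design would also need to normalise parts of the bytecode that legitimately differ between deployments, such as immutable values.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    chain_id: int
    address: str


class CodeRegistry:
    """Hypothetical in-memory registry keyed by a hash of deployed bytecode."""

    def __init__(self):
        self._by_address = {}   # (chain_id, address) -> code hash
        self._deployments = {}  # code hash -> set of Deployment
        self._verified = {}     # code hash -> verification data (e.g. metadata)

    @staticmethod
    def code_hash(bytecode: bytes) -> str:
        # SHA-256 is an illustrative choice, not necessarily what any real schema uses.
        return hashlib.sha256(bytecode).hexdigest()

    def add_deployment(self, chain_id: int, address: str, bytecode: bytes) -> str:
        h = self.code_hash(bytecode)
        deployment = Deployment(chain_id, address.lower())
        self._by_address[(deployment.chain_id, deployment.address)] = h
        self._deployments.setdefault(h, set()).add(deployment)
        return h

    def add_verification(self, bytecode: bytes, data: dict) -> None:
        # Verify once per bytecode; every deployment sharing it benefits.
        self._verified[self.code_hash(bytecode)] = data

    def verification_for(self, chain_id: int, address: str):
        h = self._by_address.get((chain_id, address.lower()))
        return self._verified.get(h) if h else None

    def deployments_of(self, bytecode: bytes) -> list:
        found = self._deployments.get(self.code_hash(bytecode), set())
        return sorted(found, key=lambda d: (d.chain_id, d.address))
```

For example, after registering the same bytecode on chain 1 and chain 10 and verifying it once, verification_for on either chain returns the stored data.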
In creating this richer dataset and having a shared repository in place, the intent is that consumers of blockchain data are much less likely to run into contracts without verified source code in place.
The number of teams getting behind this initiative is very promising. A single source of truth for verification data that works from contract bytecode, rather than requiring a separate Sourcify verification for every deployment, would be a useful step forward.
However, this idea of a widely shared database (in its initial planned form) is not a web3 native approach to solve this problem. No doubt the Verifier Alliance is aware of this and will take steps to provide a decentralised solution once available, but in the meantime, it's got off to a promising start with a lot of industry support.
As far as the Chainlens team is concerned, we will ensure that our verification service continues to work with Sourcify, and we will integrate with the Verifier Alliance repository shortly.
Users are unlikely to notice a significant difference, given that Chainlens already uses Sourcify's verification service. But behind the scenes, the Verifier Alliance should result in an ever-increasing number of verified contracts, helping users gain a better understanding of activity taking place on-chain.