Bits, Bytes, shifting and masking in Assembly (Yul)
Performing bitwise manipulations on byte strings in assembly
Bit
A bit is the smallest unit of data in a computer and can take only two values, 1 or 0, which can represent on/off, yes/no or true/false.
But a bit is too small to represent any meaningful data.
Byte
A byte is a collection of 8 bits.
A byte can represent 256 possible patterns. To see why, let’s look at how many patterns 2 bits and 3 bits can represent, with the base 10 value of each binary pattern included.
2 bits: 00 = 0, 01 = 1, 10 = 2, 11 = 3
3 bits: 000 = 0, 001 = 1, 010 = 2, 011 = 3, 100 = 4, 101 = 5, 110 = 6, 111 = 7
2 bits can represent 4 different patterns and 3 bits can represent 8, so the number of patterns that n bits can represent is \(2^n\): 2 bits = \(2^2\) = 4, 3 bits = \(2^3\) = 8 and 8 bits = \(2^8\) = 256.
An Ethereum address is 20 bytes long, which means it can take \(2^{160}\) (that is, \(2^{20 \times 8}\)) possible values.
An Ethereum contract stores its data in 32-byte storage slots, usually displayed as hexadecimal. What does that mean? It means that each slot is a collection of 32 bytes, which is 32 * 8 = 256 bits. Hexadecimal simply means the binary representation of those bytes is written in base 16 instead of base 2, with each hex digit standing for 4 bits.
Storage in EVM
Ex.
Convert 13901371 from base 10 to a 32-byte hexadecimal representation.
Convert 13901371 to base 2 = 110101000001111000111011
The base 2 representation above is 24 bits long, meaning it can fit into 3 bytes just fine.
Convert the base 2 to base 16 = d41e3b
Let’s see how this is represented in the EVM.
contract test {
    uint val = 13901371;

    function getVal() public view returns (bytes32 _val) {
        uint _slot;
        assembly {
            _slot := val.slot    // storage slot index of val (0)
            _val := sload(_slot) // load the raw 32-byte word from that slot
        }
    }
}
0x0000000000000000000000000000000000000000000000000000000000d41e3b
The EVM right-aligns numbers in the slot and pads them with zeros on the left to complete 32 bytes.
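As a quick cross-check that does not touch storage, the sketch below casts the same number to bytes32 in plain Solidity. The value ends up at the right-hand end of the word with zeros filling the left, matching the output above. The contract and function names are hypothetical, chosen only for this illustration, and the second function is included purely as a contrast.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

// Hypothetical helper contract, for illustration only.
contract PaddingCheck {
    // An unsigned integer is right-aligned in its 32-byte word:
    // the value sits in the low-order bytes and zeros fill the left.
    function asWord() public pure returns (bytes32) {
        return bytes32(uint256(13901371));
        // 0x0000000000000000000000000000000000000000000000000000000000d41e3b
    }

    // Fixed-size byte arrays behave the other way round: a bytes3 is
    // left-aligned, so widening it to bytes32 appends zero bytes on the right.
    function asLeftAligned() public pure returns (bytes32) {
        return bytes32(bytes3(0xd41e3b));
        // 0xd41e3b followed by 29 zero bytes
    }
}
```

Only the integer case matters for the rest of this article, since every packed variable used here is an unsigned integer.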
Ex 2.
Convert the uint256 max value from base 10 to base 16.
Base 10 = 115792089237316195423570985008687907853269984665640564039457584007913129639935
Binary = 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
Base 16 = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
The binary representation is 256 bits long, which is exactly 32 bytes.
contract test {
    uint public val = type(uint256).max;

    function getVal() public view returns (bytes32 _val) {
        assembly {
            _val := sload(val.slot)
        }
    }
}
0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
It’s important to note that everything stored in a bytes32 is represented in binary, as this will come in handy in the next section.
Packed Data
Just as a single value can occupy one slot, multiple smaller values can be packed into a single storage slot.
contract test {
    uint16 public home;
    uint24 public apartments;
    uint104 public beach;
    uint104 public house;
    uint8 public skyscrapper;
}
The above is just one slot. 16 + 24 + 104 + 104 + 8 = 256 bits, which makes up 32 bytes.
Representation in a single bytes32
contract test {
    uint16 public home = 11;
    uint24 public apartments = 291;
    uint104 public beach = 171;
    uint104 public house = 890;
    uint8 public skyscrapper = 39;

    function getSlots() public pure returns (uint _home, uint _apartments, uint _beach, uint _house, uint _skyscrapper) {
        assembly {
            _home := home.slot
            _apartments := apartments.slot
            _beach := beach.slot
            _house := house.slot
            _skyscrapper := skyscrapper.slot
        }
    }

    function getValues() public view returns (bytes32 values) {
        assembly {
            values := sload(home.slot)
        }
    }
}
All the return values from getSlots are 0, which means the values are all packed in one slot, so in getValues() loading the slot of any of the variables returns the same word.
Let’s break down the values from base 10 numbers to their respective base16 and their representation in a single bytes32 string, which is in a single slot.
home: base10 = 11, base16 = b
apartments: base10 = 291, base16 = 123
beach: base10 = 171, base16 = ab
house: base10 = 890, base16 = 37a
skyscrapper: base10 = 39, base16 = 27
Note: values are packed from the bottom up. The first declared variable, home, occupies the rightmost bytes, and the last declared variable, in this context skyscrapper, is the first byte represented in the bytes32 string.
The function getValues() returns:
0x270000000000000000000000037a000000000000000000000000ab000123000b
From the returned bytes32 we can clearly see the values from each of the base16 numbers represented in the string, starting from skyscrapper.
Remember, as mentioned earlier, the EVM takes a bottom-up approach to packing the values. Since all the packed values are integers, each one is right-aligned within its own bytes, and the leading zeros on its left are just padding.
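The position of each variable inside the slot can also be read directly rather than inferred from the hex string. The sketch below mirrors the article's variables and adds a getOffsets function (a hypothetical helper, not part of the original contract) that returns each variable's byte offset, counted from the right-hand end of the slot. The expected values in the comments follow from adding up the sizes of the preceding variables.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

// Hypothetical contract mirroring the article's variables.
contract PackedOffsets {
    uint16 public home = 11;
    uint24 public apartments = 291;
    uint104 public beach = 171;
    uint104 public house = 890;
    uint8 public skyscrapper = 39;

    // Byte offset of each variable, counted from the low-order (right) end.
    function getOffsets()
        public
        view
        returns (uint256 _home, uint256 _apartments, uint256 _beach, uint256 _house, uint256 _skyscrapper)
    {
        assembly {
            _home := home.offset               // 0
            _apartments := apartments.offset   // 2  (after the 2 bytes of home)
            _beach := beach.offset             // 5  (2 + 3 bytes of apartments)
            _house := house.offset             // 18 (5 + 13 bytes of beach)
            _skyscrapper := skyscrapper.offset // 31 (18 + 13 bytes of house)
        }
    }
}
```

The function is marked view here as a conservative choice; the offsets are compile-time constants, so the compiler may suggest a stricter mutability.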
Reading from packed Data
To recover values from a bytes32 string in Yul, masking and shifting operations are used. Each is applied to a different part of the string. A shift removes the bytes after the wanted value, which we will call the postfix. For example, given a byte string 0xxxx444xxxx, the bytes after the wanted value 444 are shifted out, and a mask then clears whatever remains in front of it.
Solve:
Shift (4 hex digits * 4 bits = 16 bits)
0x00004440000 >> 16
= 0x00000000444
Mask
0x00000000fff & 0x00000000444
= 0x444
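The same two steps can be tried in Yul on a small constant. This is a minimal sketch: the contract name and the input 0xb4440000 are made up for the illustration, with a leading b added so that the mask has something to remove.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

// Hypothetical toy contract, for illustration only.
contract ShiftMaskToy {
    function extract() public pure returns (uint256 shifted, uint256 masked) {
        assembly {
            // Drop the four trailing hex digits: 4 digits * 4 bits = 16 bits.
            shifted := shr(16, 0xb4440000) // 0xb444
            // Keep only the lowest three hex digits (12 bits), clearing the b.
            masked := and(0xfff, shifted)  // 0x444
        }
    }
}
```

Note that shr() takes the shift amount first and the value second, and that the amount is always in bits.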
uint16 public home = 11;
uint24 public apartments = 291;
uint104 public beach = 171;
uint104 public house = 890;
uint8 public skyscrapper = 39;

function getValues() public view returns (bytes32 _value, uint _slot, uint _offset) {
    assembly {
        _slot := apartments.slot
        _offset := apartments.offset
        _value := sload(_slot)
    }
}
bytes32: _value 0x270000000000000000000000037a000000000000000000000000ab000123000b
uint256: _slot 0
uint256: _offset 2
Suppose now we want to fetch the value of apartments from the bytes32 string. We have to perform a few operations using shifts and masks.
Right Shift to unset values after wanted Variable.
The bitwise operator we will be using here is right shift.
Ex.
1111111 >> 2 = 0011111
To get apartments, we have to right shift the bytes32 string to clear everything after apartments. A right shift removes bits on the right and pads the left with zeros to compensate, so the word keeps its 32-byte length. Here we need to shift out the bits that represent home so that apartments becomes the rightmost value. The shift is performed with the assembly function shr(), which takes the shift amount in bits. To find that amount we read the assembly property offset, which gives the position of apartments counted from the right, in bytes. In this case it is 2, so 2 bytes * 8 bits = 16 bits.
pragma solidity ^0.8.19;

contract test {
    uint16 public home = 11;
    uint24 public apartments = 291;
    uint104 public beach = 171;
    uint104 public house = 890;
    uint8 public skyscrapper = 39;

    function getApartment() public view returns (bytes32 _shift) {
        assembly {
            let _value := sload(apartments.slot)
            // offset is in bytes, shr() expects bits: 2 * 8 = 16
            _shift := shr(mul(apartments.offset, 8), _value)
        }
    }
}
0x0000270000000000000000000000037a000000000000000000000000ab000123
Now that apartments is clearly the rightmost value, we need to mask the bytes32 string to unset the values before apartments.
Mask To get wanted Variable.
The bitwise operator we would be using here is & (and).
Ex.
100111 & 111111 = 100111
Mask
0x00000000fff & 0xb0000000444
= 0x444
Remember, zeros before an integer are just padding.
Solve.
0x0000270000000000000000000000037a000000000000000000000000ab000123 & 0x0000000000000000000000000000000000000000000000000000000000ffffff
Each f marks a hex digit we want to keep and each 0 marks one we discard. Underneath, these are purely binary operations, as described initially.
We use the assembly function and() to mask the shifted value, which leaves the apartments value as required.
pragma solidity ^0.8.19;

contract test {
    uint16 public home = 11;
    uint24 public apartments = 291;
    uint104 public beach = 171;
    uint104 public house = 890;
    uint8 public skyscrapper = 39;

    function getApartment() public view returns (uint _apartments) {
        assembly {
            let _value := sload(apartments.slot)
            // move apartments to the rightmost position
            let _shift := shr(mul(apartments.offset, 8), _value)
            // keep the lowest 3 bytes (uint24), clear everything to the left
            _apartments := and(_shift, 0x0000000000000000000000000000000000000000000000000000000000ffffff)
        }
    }
}
291
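The shift-then-mask pattern is not specific to apartments. As a hedged sketch, the contract below adds a hypothetical helper, extractField, that builds the mask from a size in bytes instead of hard-coding it, and uses it to read house. The helper name, its parameters and getHouse are assumptions made for this example and do not appear in the original contract.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

// Hypothetical contract mirroring the article's variables.
contract PackedReader {
    uint16 public home = 11;
    uint24 public apartments = 291;
    uint104 public beach = 171;
    uint104 public house = 890;
    uint8 public skyscrapper = 39;

    // Extracts byteSize bytes that start byteOffset bytes from the right of word.
    function extractField(bytes32 word, uint256 byteOffset, uint256 byteSize)
        public
        pure
        returns (uint256 field)
    {
        assembly {
            // (1 << (byteSize * 8)) - 1 is a run of byteSize 0xff bytes.
            let mask := sub(shl(mul(byteSize, 8), 1), 1)
            // Shift the field to the right-hand end, then clear what is left of it.
            field := and(shr(mul(byteOffset, 8), word), mask)
        }
    }

    // Reads house (a uint104, so 13 bytes) from the packed slot.
    function getHouse() public view returns (uint256 _house) {
        bytes32 word;
        uint256 byteOffset;
        assembly {
            word := sload(house.slot)
            byteOffset := house.offset
        }
        _house = extractField(word, byteOffset, 13);
    }
}
```

With the values above, getHouse() should return 890, the same number the public getter for house reports.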
Writing to packed Data
Writing a value to a packed slot uses the same strategies as reading.
To change apartments from 291 to 25, we first need to mask out the bytes that apartments occupies.
The entire packed bytes32 storage slot currently holds
0x270000000000000000000000037a000000000000000000000000ab000123000b
and apartments, as we know from the previous examples, occupies the space marked with brackets below for clarity.
0x270000000000000000000000037a000000000000000000000000ab[000123]000b
From the previous examples we also know how to mask a byte string, and here we want to do just that: reset the apartments byte space to zero.
Note: in the mask, 0 means clear this hex digit and f means keep it, so we use 0 wherever we want to reset apartments.
0x270000000000000000000000037a000000000000000000000000ab000123000b & 0xffffffffffffffffffffffffffffffffffffffffffffffffffffff000000ffff
function setSingleValue() public view returns (bytes32 reformed) {
    assembly {
        let slot := apartments.slot
        let value := sload(slot)
        reformed := and(0xffffffffffffffffffffffffffffffffffffffffffffffffffffff000000ffff, value)
    }
}
0x270000000000000000000000037a000000000000000000000000ab000000000b
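The clearing mask does not have to be typed out by hand. One possible way to derive it, sketched below under the assumption that the field is a uint24 (3 bytes, so 0xffffff), is to shift a run of f digits over the field and invert the result with not(). The contract and function names are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

// Hypothetical contract mirroring the article's variables.
contract ClearMask {
    uint16 public home = 11;
    uint24 public apartments = 291;
    uint104 public beach = 171;
    uint104 public house = 890;
    uint8 public skyscrapper = 39;

    function clearMask() public view returns (bytes32 mask) {
        assembly {
            // 0xffffff covers the 3 bytes of a uint24. Shift it over the
            // apartments field (2 bytes * 8 = 16 bits), then flip every bit
            // so only the apartments bytes are zero.
            mask := not(shl(mul(apartments.offset, 8), 0xffffff))
            // 0xffffffffffffffffffffffffffffffffffffffffffffffffffffff000000ffff
        }
    }
}
```

Deriving the mask this way avoids miscounting digits in a 64-character literal.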
The space assigned to apartments has now been reset, so it is time to write the new value.
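function setSingleValue(uint24 newVal) public view returns (bytes32 reformedVal) {
    assembly {
        let slot := apartments.slot
        let value := sload(slot)
        // slot contents with the apartments bytes cleared (combined with the new value later)
        let reformed := and(0xffffffffffffffffffffffffffffffffffffffffffffffffffffff000000ffff, value)
        // move newVal into the apartments position: 2 bytes * 8 = 16 bits to the left
        reformedVal := shl(mul(apartments.offset, 8), newVal)
    }
}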
newVal would be 25. Note that inside the assembly block newVal is held as a full 32-byte word,
0x0000000000000000000000000000000000000000000000000000000000000019
where 19 is the hexadecimal representation of 25.
Back in the setSingleValue function, reformedVal evaluates to
0x0000000000000000000000000000000000000000000000000000000000190000
which is the result of shifting the bits to the left by 8 * 2 = 16. shl() performs the same operation as shr(), only in the opposite direction: left.
Combine and store
The bitwise operator we would be using here is | (or).
Ex.
100111 | 111111 = 111111
So now we have to combine the two variables reformed and reformedVal to get the new bytes32 value for the slot.
function getApartment() public view returns (uint _apartments) {
    assembly {
        let _value := sload(apartments.slot)
        let _shift := shr(mul(apartments.offset, 8), _value)
        _apartments := and(_shift, 0x0000000000000000000000000000000000000000000000000000000000ffffff)
    }
}

function setSingleValue(uint24 newVal) public {
    assembly {
        let slot := apartments.slot
        let value := sload(slot)
        // 1. clear the apartments bytes
        let reformed := and(0xffffffffffffffffffffffffffffffffffffffffffffffffffffff000000ffff, value)
        // 2. shift the new value into the apartments position
        let reformedVal := shl(mul(apartments.offset, 8), newVal)
        // 3. combine both and write the word back to the slot
        let result := or(reformedVal, reformed)
        sstore(slot, result)
    }
}
The combined hex should give us
0x270000000000000000000000037a000000000000000000000000ab000019000b
sstore() writes the new value to the given slot. It takes two parameters: the slot and the data to store.
When we call getApartment, it now returns the new value of apartments, which is 25 (or whatever is passed to the setter)!
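The clear, shift and combine steps can be folded into one reusable routine. The sketch below is a hypothetical pure helper, replaceField, that works on a word passed in as an argument, so it can be tried without touching storage. Its name and parameters are assumptions for this example. It also truncates the new value to the field width, a precaution the original setter gets from its uint24 parameter type.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

// Hypothetical helper contract, for illustration only.
contract PackedWriter {
    // Returns word with the byteSize-byte field at byteOffset (counted from
    // the right) replaced by newVal.
    function replaceField(bytes32 word, uint256 byteOffset, uint256 byteSize, uint256 newVal)
        public
        pure
        returns (bytes32 result)
    {
        assembly {
            let bits := mul(byteOffset, 8)
            // Run of byteSize 0xff bytes, right-aligned.
            let fieldMask := sub(shl(mul(byteSize, 8), 1), 1)
            // Clear the field in the original word.
            let cleared := and(word, not(shl(bits, fieldMask)))
            // Truncate newVal to the field width so it cannot spill into
            // neighbouring fields, move it into place and combine.
            result := or(cleared, shl(bits, and(newVal, fieldMask)))
        }
    }
}
```

Called with the original slot value from this article, a byteOffset of 2, a byteSize of 3 and a newVal of 25, it should return the same word that setSingleValue stores.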
Resources
Bytes, numbers, and characters