Blockchain Study Notes for developers
Sharing our notes from a Blockchain study session with our new developers. We run these induction sessions when new coders join our team, covering key tech topics like AI, Blockchain, Cloud, Big Data, etc. They spend a week researching the topic with some starting questions, then meet up to discuss it for an hour or so, and then take another week or so to write it up. We will update and edit these notes every now and then, and we welcome feedback and comments. Thanks.
About Us: We are the OnDemandWorld team, working on new blockchain and AI app solutions: Niftnack NFT and TechHub Jobs. The MVP is already on the App Store and Google Play, and we are building the first scalable version now. We are looking for new junior devs, blockchain devs, full-stack devs, etc. Contact us for more information. Remote working mostly in 2022 unless you are in Shanghai already.
(First week, aim at around 13 hours or so in total, to research into the topic. Try to answer the following questions. Jot down notes for the meeting. 5–10 Pages.)
Tech Topic 1: Blockchain
- What is blockchain? Why is it special?
- What is Bitcoin? And why is it important?
- What’s Ethereum? What’s ERC20? How about defi and NFT?
- What’s common hashing and cypher? How does it work? Why is hashing often used in programming? How about collisions?
- What’s PoS? Why is it needed? What are the alternatives?
- Is blockchain unhackable? Any security concerns?
- What’s public key cryptography?
- How does blockchain use these?
- dApp — what is dApp and do we need it? How about defi? Uniswap.
(And here are our notes and summary to the questions above, covered in about an hour or so of discussion.)
Blockchain, as defined in the paper by Satoshi (the person who invented blockchain), is a solution to the double-spending problem using a peer-to-peer network. You could think of a blockchain as a singly linked list of blocks, in which each child block contains the hash of its parent block. You “can’t” change a previously committed block because copies of the chain are distributed across more than 15k nodes. (The Satoshi paper: https://bitcoin.org/bitcoin.pdf).
From a technological standpoint, blockchain is actually not that special, since cryptography, hashing, etc. are not new. Those ideas have been around for years. However, the Satoshi paper is not only about technology, but also about economics and how this peer-to-peer network could be put to practical use. This makes blockchain a very special technology trend because it is probably the first time ever that people have used a decentralized system to create such a large economic impact (about 1–2 trillion dollars). Although blockchain is best known for its application in cryptocurrency, it could be used in many other fields.
To better understand blockchain, we first need to examine some of the technical foundations it is built upon. As you might already know, blockchain is basically a linked list of blocks. What’s special about it is how the data inside each block is stored, and how the data in each block interacts with the data in the others. To understand that, we need a general idea of what hashing and encryption are.
Hashing is a method that uses a special function, called a hash function, to map a value to a key so that elements can be accessed faster. It is also an irreversible conversion of an input source. For example, hashing is used when creating a hash table: suppose you have a list of integers [11, 12, 13, 14, 15] and the hash function is H(x) = x % 10. Here x % 10 is the position at which x will be placed in the hash table (you can think of the hash table as an array for simplicity). After applying the hash function to all integers in the input list, each element lands in its own position: 11 goes to position 1, 12 to position 2, and so on. A simple example of how hashing is used in the hash table data structure is here: https://www.geeksforgeeks.org/hashing-data-structure/
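The hash table example above can be sketched in a few lines of Python. This is only an illustration: the function names `h` and `build_table` are made up here, and the sketch simply raises an error on a collision instead of resolving it the way a real hash table would.

```python
def h(x: int) -> int:
    """Toy hash function from the text: H(x) = x % 10."""
    return x % 10


def build_table(values, size=10):
    """Place each value in the slot chosen by h(); None marks an empty slot."""
    table = [None] * size
    for v in values:
        slot = h(v)
        if table[slot] is not None:
            # Two different inputs mapped to the same slot: a collision.
            raise ValueError(
                f"collision: {v} and {table[slot]} both map to slot {slot}"
            )
        table[slot] = v
    return table


table = build_table([11, 12, 13, 14, 15])
print(table)  # [None, 11, 12, 13, 14, 15, None, None, None, None]
```

Adding 21 to the list would trigger the collision error, because 21 % 10 and 11 % 10 are both 1. With such a small output range, collisions are easy to produce.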
The output of a hash function is called a hash. In the previous example, the hash is the position in the hash table. One property of cryptographic hash functions is that they produce a hash of a fixed length, for example, 256 bits. Therefore, when you hash a video file, the hash is 256 bits, and when you hash a .txt file, the hash is also 256 bits. Additionally, with a modern hash method such as SHA-256, it is safe in practice to treat the input and the hash as a 1–1 mapping: the same input always gives the same hash, and two different inputs have never been observed to give the same hash. For instance, if you compare the hashes of the passwords “abcdef” and “abcde”, you will see that they are different. (Strictly speaking, the mapping is not 1–1. There is no known collision of SHA-256 at the moment, so most people can safely assume that it produces a unique hash. But a 256-bit hash has only 2²⁵⁶ possible values, so if there are over 2²⁵⁶ unique inputs, of course there must be collisions; that is just a very, very large number of inputs.)
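To see the fixed-length property for yourself, here is a small sketch using Python's standard `hashlib` module. The helper name `sha256_hex` and the sample inputs are our own choices for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short = sha256_hex(b"abcde")
longer = sha256_hex(b"abcdef")
big = sha256_hex(b"x" * 1_000_000)  # a one-megabyte input

# Each hex character encodes 4 bits, so 64 characters = 256 bits.
for digest in (short, longer, big):
    print(len(digest) * 4, "bits:", digest)

# One extra character in the input gives a completely different hash.
print(short != longer)
```

Whatever the input size, every hash printed is 64 hexadecimal characters long, and hashing the same input twice always gives the same result.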
Why is hashing important to blockchain? Hashing maps arbitrarily large data to fixed-length data. It is easier and faster to do computations and operations on fixed-size data. Furthermore, blockchain uses hashing to ensure immutability. Why? In blockchain, every block has the hash of its parent block, as you can see in the graph below. So if there is any mutation in the parent block’s data, the hash of the parent block changes and no longer matches the hash stored in the current block. You can then quickly identify that something is invalid by comparing the hashes. Therefore, hashing ensures the security and immutability of the blockchain. In this article, the “light bulbs” example illustrates how hashes and bits work: https://medium.com/@ConsenSys/blockchain-underpinnings-hashing-7f4746cbd66b
To further explain how hashing is used in blockchain: each block contains a hash tree called a Merkle tree, which stores the hash outputs of the raw data. Each leaf node consists of a hash of its original data, and every parent node is a hash of the combination of its child node hashes. The root node is the combination of all hashes in the left and right sub-trees. Because of this, if any data, say, a specific transaction in the block, is changed, the Merkle root will be completely different. Therefore, one can quickly identify whether any data in a block has been tampered with. A more detailed introduction is here: https://en.bitcoinwiki.org/wiki/Merkle_tree
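The parent-hash check described above can be sketched as follows. This is a simplified model, not how any real blockchain serializes its blocks: the block layout (a dict with `data` and `parent_hash`) and the function names are assumptions made for this example.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full contents (assumed layout: data + parent_hash)."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def make_chain(records):
    """Build a list of blocks where each block stores its parent's hash."""
    chain = []
    parent = "0" * 64  # placeholder parent hash for the very first block
    for data in records:
        block = {"data": data, "parent_hash": parent}
        chain.append(block)
        parent = block_hash(block)
    return chain


def is_valid(chain) -> bool:
    """Recompute every parent's hash and compare it with the stored one."""
    for parent, child in zip(chain, chain[1:]):
        if child["parent_hash"] != block_hash(parent):
            return False
    return True


chain = make_chain(["record 1", "record 2", "record 3"])
print(is_valid(chain))  # True

chain[0]["data"] = "record 1 (edited)"  # tamper with an old block
print(is_valid(chain))  # False
```

The tampering is caught because the second block still stores the hash of the original first block, which no longer matches the recomputed one.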
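A minimal sketch of computing a Merkle root, assuming transactions are plain strings and using a single round of SHA-256 per node. Bitcoin itself hashes twice and works on serialized transactions, so treat this as an illustration of the tree shape only; `merkle_root` is a name chosen for this example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions) -> str:
    """Hash the leaves, then hash pairs level by level until one hash is left."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [sha256(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: pair the last one with a copy of itself.
            level.append(level[-1])
        level = [
            sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)
        ]
    return level[0].hex()


txs = ["Alice pays Bob 10", "Bob pays Catherine 50", "Catherine pays David 100"]
root = merkle_root(txs)

tampered = list(txs)
tampered[0] = "Alice pays Bob 100"
print(root != merkle_root(tampered))  # True
```

Changing a single transaction changes its leaf hash, which changes every parent on the path up to the root.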
Public Key Encryption
After hashing, the raw data cannot be recovered from the hash, since the hashing process is irreversible. With encryption, however, we can use a public key to encrypt a message, and then use a private key to decrypt the message.
In public key encryption, also known as asymmetric encryption, the public key is public to everyone and the private key is only known to the user. For example, A wants to send a message to B. A can encrypt the message using B’s public key, and B can use its private key to decrypt the message and read it. Note that the private key is only known to B, so B is the only one who can decrypt the message. (It might also be interesting to learn a little about symmetric-key algorithms, where you have one key to both encrypt and decrypt a file. It has long been common to combine symmetric-key encryption with public key encryption, since it is more efficient to use public key encryption only for the symmetric keys. The idea of stream and block ciphers has been around for a long time. It is just that in blockchain, a block refers to a block of transactions; it could have been a streamchain, but that doesn’t sound as cool somehow.)
Here is a simple example of the public key encryption process: https://en.wikibooks.org/wiki/Cryptography/A_Basic_Public_Key_Example. It shows you how the public key and the private key are generated and how encryption and decryption work. One of its main theoretical foundations is Fermat’s Little Theorem, which Fermat stated in the mid-17th century. Fermat’s Little Theorem states: if p is a prime and a is any integer not divisible by p, then a^(p−1) − 1 is divisible by p. A detailed proof of the correctness of RSA encryption can be found here: https://www.cse.cuhk.edu.hk/~taoyf/course/bmeg3120/notes/rsa-proof.pdf
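As a quick sanity check of the theorem and of the encrypt/decrypt round trip, here is a minimal Python sketch. The primes 61 and 53, the exponent 17 and the message 65 are tiny illustrative values chosen for this example. They are far too small for real use, and real RSA also adds padding, which this sketch leaves out.

```python
def fermat_holds(a: int, p: int) -> bool:
    """Check that a^(p-1) - 1 is divisible by p (a must not be divisible by p)."""
    return (pow(a, p - 1, p) - 1) % p == 0


print(all(fermat_holds(a, 13) for a in range(1, 13)))  # True

# Toy RSA key generation (illustrative numbers only).
p, q = 61, 53
n = p * q                # 3233, part of both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, must share no factor with phi
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)

message = 65
cipher = pow(message, e, n)  # encrypt with the public key (e, n)
plain = pow(cipher, d, n)    # decrypt with the private key (d, n)
print(plain == message)      # True
```

Anyone who knows (e, n) can encrypt, but decrypting requires d, and computing d requires knowing the prime factors of n.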
You could use a public key to encrypt a message so that only the person who knows the private key can decrypt and read it, and the reverse is also true. This means that you could use a private key to encrypt a message and then use the public key to decrypt it. Both ways “encrypt to different hashes but each key can decrypt the other’s encryption.” (A more detailed explanation is on Stack Overflow: https://stackoverflow.com/questions/18257185/how-does-a-public-key-verify-a-signature)
This could be explained by Elliptic curve cryptography and it makes the following conditions possible:
- A public key can be mathematically generated from a private key
- A private key cannot be mathematically generated from a public key
- A private key can be verified by a public key
You could either spend hours trying to understand how it works or you could, as I would recommend, “accept the properties above, just like you accept Newton’s 3 laws of motion without needing to derive them yourself.” (Stack Overflow)
Since statement 3 is true, by using a private key you can digitally “sign” data, so that anyone with the public key can verify that the data was created by the owner of the private key and has not been modified.
How does blockchain use public key encryption? When you create a wallet, the wallet software generates a pair of public and private keys. The address of the wallet/person is generated from the public key. The public key is visible globally and anyone can see it. Only the owner of the wallet knows the private key, which is used to access the data at that address and to authorize any actions for the address, which are generally transactions. (One way to create a new wallet is simply to enter some random text yourself as the private key; the address it can unlock is the wallet address.)
Is Blockchain Unhackable?
Given that blockchain uses technologies like hashing and public key encryption, it seems that blockchain is very secure and that there is no way someone could possibly hack into it. However, there are still ways in which a blockchain can be hacked.
The most well-known method is called the “51% attack”. It means that if hackers gain a majority of the hash power available on a given blockchain system, they can rewrite transactions and create double spends, so that tokens can be used again after they are “spent”. However, this is hard to achieve because it is difficult to gain the majority of the hash power of all the computers mining on that blockchain.
It is also important to note that a 51% attack is not as “hard” as finding a collision of a 256-bit hash, because it has been done before. For example, in May 2018, more than $18 million was stolen through double spending in a 51% attack on Bitcoin Gold. On Ethereum Classic, a 51% attack was used to perform double spending, and the trading platform Coinbase halted all Ethereum Classic transactions. In May 2019, Bitcoin Cash experienced a 51% attack, in which the combined hashing power of the BTC.com and BTC.top mining pools was used. Since every network is different, the computing power hackers need to mount a 51% attack is different. More detailed data on the theoretical cost can be found on this website: https://www.crypto51.app/
Another way to hack blockchain is to exploit bugs, either in the blockchain protocol code or in other parts such as smart contracts. In 2010, hackers found a bug in the Bitcoin protocol code: if the output amounts of a transaction were too large, summing them up caused an overflow. Hackers ended up using this bug to generate 184.467 billion coins. In 2016, hackers found a bug in a smart contract that allowed a user to request money from the DAO account without the withdrawal being recorded on the ledger. $60 million was stolen from the Ethereum network. After this happened, Ethereum hard forked to reset the system. So, the Ethereum you see now is actually the result of that hard fork. However, some people still use the old chain, and that’s called Ethereum Classic.
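The 184.467 billion figure is what you get when 2⁶⁴ of the smallest Bitcoin units (satoshis, 10⁸ per coin) are converted to coins, which fits an overflow of a 64-bit integer. The sketch below simulates a signed 64-bit sum in Python. The two output amounts are illustrative values chosen here to trigger the wrap-around, not the amounts from the real transaction, and `wrap_int64` is a helper written for this example.

```python
INT64_MAX = 2**63 - 1
SATOSHI_PER_COIN = 10**8


def wrap_int64(x: int) -> int:
    """Value a signed 64-bit integer would hold after overflowing."""
    return (x + 2**63) % 2**64 - 2**63


# Two huge outputs (illustrative, not the real transaction's values).
outputs = [INT64_MAX, INT64_MAX]

true_total = sum(outputs)                # Python integers never overflow
wrapped_total = wrap_int64(true_total)   # what a 64-bit sum would see

print(wrapped_total)  # -2
# A check such as "total output <= total input" passes on the wrapped
# value even though the true total is astronomically large.
print(wrapped_total <= 50 * SATOSHI_PER_COIN)  # True

# 2^64 satoshis expressed in whole coins: about 184.467 billion.
print(2**64 // SATOSHI_PER_COIN)  # 184467440737
```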
We have discussed the technical foundations of blockchain as well as some of its security concerns. Let’s take a look at blockchain’s practical application in cryptocurrencies, including Bitcoin and Ethereum.
Generations of Blockchains
Gen1: Bitcoin — Cryptographic Decentralised Storage of Digital Asset
The first generation of blockchain technology is the famous bitcoin. So, how does bitcoin work exactly? How does it use the blockchain technologies discussed above? In this part, we will illustrate how bitcoin works with an example.
Imagine there are four people: Alice, Bob, Catherine, and David, who exchange money frequently. And whenever they make a transfer at the bank, the bank charges them some service fee.
The idea comes naturally: is there a way to bypass the bank? Why don’t they create their own version of a bank that serves only the four of them? If they can interact with each other directly, no service fees will be needed for any transaction.
This is the key idea of bitcoin and blockchain technology: building a system without centralized management (decentralization).
To have a better understanding of how this system works, let’s think about the record of the transactions between the four people.
1. Alice pays Bob 10 dollars.
2. Bob pays Catherine 50 dollars.
3. Catherine pays David 100 dollars.
Imagine a document that has these three lines stored in it. And several problems need to be addressed.
1. Who’s in charge of keeping this document? Namely, who’s in charge of this system? Is it one of the four people? But as long as there is a manager, it is not a decentralized system. It only changes the manager of the original system.
2. Suppose the system runs autonomously: the document is put somewhere on a sacred desk, and whenever someone makes a transaction, they just add a line to the document without anyone’s supervision. How can the people using this system actually trust the algorithm and trust that the system is working correctly? How can we prevent, say, Alice from adding a line that says “Bob pays Alice 10000 dollars”?
To address these two problems, we apply blockchain technologies discussed in previous parts of this article.
1. Public key cryptography
To make sure that it is impossible for Alice to add lines that say “Bob pays Alice 10000 dollars”, we give each user of the system two keys, one public and one private. The public key is visible to all users of the system, and people use someone’s public key to make a transaction to them. The private key is kept secret from the public, and is also called a secret key.
Whenever a line that says “Bob pays Alice 10000 dollars” is added, Bob needs to sign this transaction. His signature is produced by a function call:
Signature = sign(message, Bob’s secret key).
And then the system will verify the validity of the transaction by calling another function that returns either true or false:
Verify(message, Signature, Bob’s public key).
Through this process, it is impossible for Alice to produce Bob’s signature because she doesn’t have access to Bob’s secret key. Additionally, all numbers are designed to be large so that it is impossible for someone to guess another user’s secret key.
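The sign and verify calls above can be imitated with a toy RSA signature in Python. Note the swap: Bitcoin actually uses elliptic curve signatures (ECDSA), which are not in Python's standard library, so this sketch uses textbook RSA instead. The key is built from two small known primes, has no padding, and is not secure; all names and constants are assumptions made for this example.

```python
import hashlib

# Toy key pair built from two known Mersenne primes (illustration only).
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q
PUBLIC_KEY = 65537                                  # Bob's public exponent
SECRET_KEY = pow(PUBLIC_KEY, -1, (P - 1) * (Q - 1))  # Bob's secret exponent


def digest(message: str) -> int:
    """Hash the message and reduce it to a number smaller than N."""
    h = hashlib.sha256(message.encode()).digest()
    return int.from_bytes(h, "big") % N


def sign(message: str, secret_key: int) -> int:
    """Signature = (hash of message) raised to the secret key, modulo N."""
    return pow(digest(message), secret_key, N)


def verify(message: str, signature: int, public_key: int) -> bool:
    """Undo the signature with the public key and compare with the hash."""
    return pow(signature, public_key, N) == digest(message)


message = "Bob pays Alice 10 dollars"
signature = sign(message, SECRET_KEY)

print(verify(message, signature, PUBLIC_KEY))  # True
# The same signature does not validate a different message.
print(verify("Bob pays Alice 10000 dollars", signature, PUBLIC_KEY))  # False
```

Because the signature depends on both the message and the secret key, Alice cannot reuse Bob's signature on a line she wrote herself.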
2. Hashing
The documents that keep track of all the transactions are the blocks in a blockchain. To ensure that no documents are added and that no content of a document is changed without the agreement of all, we give each document a unique identifier, which is produced by a special function that returns the same id for the same content and different ids for different contents. Each document, or block, stores its own id and the id of the previous document, or block. The alteration of one document will change its own id, and since later blocks store the ids of previous blocks, a whole chain of ids needs to be changed, or else the later blocks will no longer point to the altered block through their previous block hash.
For example, if Bob changes the line that says “Alice pays Bob 10 dollars” to “Alice pays Bob 100 dollars,” even though the content of the document is changed only slightly, it will result in a completely different hash, and Bob will need to update all blocks that come after this one, which would take an extremely long time and would be practically impossible with the power of today’s computers.
3. Proof of work
If changing the contents of a document is impossible only because there are too many blocks following the one that’s being altered, would changing the content of a more recent document be possible? To address this problem, bitcoin introduces the idea of proof of work. In order to prove the validity of a document, a special number needs to be added at the end of the document so that, when the block goes through the SHA-256 function, it gets a special id that starts with 30 zero bits.
The probability that a given guess produces a hash starting with 30 zero bits is extremely small, 1/(2³⁰), and there’s no good way of finding the proof of work other than guessing and checking. In order to incentivise users to find the special number and keep the blockchain running, the bitcoin system awards a small amount of bitcoin to the people who find the proof of work. The process of finding the special number by constant guessing and checking, and then being awarded some money for the hard work, is similar to mining, and the people who help to prove the validity of every document are called miners.
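The guess-and-check loop can be sketched as follows. To keep the run time short, this example asks for only 16 leading zero bits instead of 30 (about 65,000 guesses on average rather than about a billion). The way the number is appended to the document is a simplification chosen here; real Bitcoin hashes a structured block header twice.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mine(document: str, difficulty_bits: int):
    """Try numbers 0, 1, 2, ... until the hash has enough leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{document}{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest.hex()
        nonce += 1


nonce, block_id = mine("Alice pays Bob 10 dollars", difficulty_bits=16)
print("special number:", nonce)
print("block id:", block_id)  # starts with at least four hex zeros
```

Finding the number takes many guesses, but checking it takes a single hash, which is what makes the proof cheap to verify. Each extra zero bit doubles the expected number of guesses.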
Finding the proof of work, that is, the process of mining, ensures the proper functioning of the blockchain through the rule that the blockchain keeps the blocks along the longest chain. For example, Catherine wants to change the document to include more transactions that award her money, so she creates her own document with the proof of work that she finds and adds it to the existing chain. Her goal is to keep adding to the branch of the blockchain that she created, hoping that one day people will trust the information on this chain. But while she is adding to the chain with her own secret document that other users don’t know about, Alice, Bob, and David are also guessing the proof of work with their computers and keep adding to the correct chain. Now the blockchain will find that two blocks reference the same block as their parent. Whenever this scenario occurs, the blockchain keeps the two records separately, keeps adding to the separate branches, and sees which one becomes the longer chain; that is the chain it will record. Because of this feature, in order to maintain her own branch, Catherine would need to compete with all other users combined to find proofs of work fast enough to keep her branch winning in length.
In this example where there are only four people, it might be possible for Catherine to keep up for a few blocks if she gets lucky, but in the real world, the bitcoin blockchain has vastly more users, and thus it is not possible for an individual to keep their own branch running.
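Section 11 of the Satoshi paper puts a number on this race: if the attacker controls a fraction q of the hash power and the honest users control p = 1 − q, the probability that the attacker ever catches up from z blocks behind is 1 when q ≥ p and (q/p)^z otherwise. A small sketch of that formula follows; the function name is ours, and giving Catherine a quarter of the hash power is an assumption for the four-person example.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Chance that an attacker with hash-power share q erases a z-block lead."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if z < 0:
        raise ValueError("z must not be negative")
    p = 1 - q  # honest share of the hash power
    if q >= p:
        return 1.0  # with half or more of the power, catching up is certain
    return (q / p) ** z


# Catherine against three equally powerful honest miners: q = 0.25.
for z in (1, 2, 6):
    print(z, catch_up_probability(0.25, z))

# A small player on a big network, 6 blocks behind.
print(catch_up_probability(0.01, 6))

# Controlling half the hash power or more (the "51% attack" case).
print(catch_up_probability(0.51, 6))  # 1.0
```

The probability shrinks exponentially with every block the honest chain gets ahead, unless the attacker holds at least half of the hash power.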
Gen2: Ethereum — Smartcontracts, beyond Digital Asset storage, enabling transaction/application logic.
Mission of Ethereum: build a “world computer”. They charge you on the number of operations you use.
Ethereum is a platform built on this inspiration. It is a DIY platform for users to create their own decentralized applications (dApps). It is worth noting here that Ethereum is not a currency, but a platform. Ether is the currency used to incentivise the network.
Smart contracts on Ethereum are most commonly written in a language called Solidity, and this second generation of blockchains is different from the first one because it can run smart contracts.
Smart contracts are able to simulate real life situations based on a set of well crafted logical clauses, which aligns with how people interact and work together: if something happens, do something. For example, a smart contract for renting a house might be written as something like this:
1. If the renter pays rent, then, he can open the door of the house.
2. If the tenancy is up, then, the renter can no longer open the door of the house.
3. If the air conditioner is no longer functioning, then, the landlord should pay for the repair cost.
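The three clauses above can be written down as plain if-then logic. The sketch below is ordinary Python, not a real smart contract (those are typically written in a contract language such as Solidity and run on the blockchain); the class, field and method names are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RentalContract:
    lease_end_day: int                    # last day of the tenancy
    rent_paid: bool = False
    air_conditioner_working: bool = True

    def pay_rent(self) -> None:
        self.rent_paid = True

    def can_open_door(self, today: int) -> bool:
        # Clause 1: rent must be paid. Clause 2: the tenancy must not be over.
        return self.rent_paid and today <= self.lease_end_day

    def repair_payer(self) -> Optional[str]:
        # Clause 3: a broken air conditioner is the landlord's cost.
        return None if self.air_conditioner_working else "landlord"


contract = RentalContract(lease_end_day=30)
print(contract.can_open_door(today=1))   # False, rent not paid yet
contract.pay_rent()
print(contract.can_open_door(today=1))   # True
print(contract.can_open_door(today=31))  # False, tenancy is up
contract.air_conditioner_working = False
print(contract.repair_payer())           # landlord
```

Notice that the code has no way to know who broke the air conditioner or whether the renter needs a few extra days; it only applies the clauses exactly as written.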
In theory, we would be able to simulate every real life situation using Ethereum, if one can write a thorough enough smart contract without bugs. We are getting to something big here: if this becomes a reality, we will be able to get rid of all middlemen and centralized control. However, a smart contract that can handle the complexity of real life situations would be hard to formulate. Take the simple smart contract we wrote above as an example: What if the tenancy is up but the renter needs a few days to move everything out of the house? What if the renter is the one who broke the air conditioner, should the landlord still pay? What if the renter refused to admit that he broke the air conditioner? How can the system tell whether the user is lying?
The what-ifs listed above are still simple situations, but from these, we can get a glimpse of the complexity that would be hard to untangle without the intervention of some third party. And to make matters worse, smart contracts written on Ethereum are unchangeable once they are deployed, which means that it is possible for some people to find loopholes in a smart contract and take advantage of them, and all other users, including the original author of the smart contract, can do nothing except watch it happen, since no one has the ability to change even a word of the smart contract.
A real case of someone finding a loophole in a smart contract is known as the DAO event, and this event caused the Ethereum platform to abandon its original idea that code is law.
Gen3: Into the Unknown
The idea of blockchain is truly fascinating, but this doesn’t cover up the fact that there are a lot of existing problems in blockchain that need to be addressed before blockchain technologies can move on to the next level.
One problem that’s gradually drawing attention is the inefficiency of proof of work. Proof of work ensures the security of the blockchain, but it requires miners to be constantly guessing and checking numbers, and the numbers do nothing except make the hash of a block special. In other words, people are wasting electricity on finding numbers that are themselves meaningless. The amount of electricity spent on mining is astonishing. According to Digiconomist, bitcoin miners use about 54 TWh of electricity per year, which is enough to power millions of households or even a small country. Electricity is not the only drawback of this approach. Because of the bonus system that rewards the miners who validate a block, miners that are richer and have better equipment are more likely to get the reward and, further, to be the validators of more blocks, which could potentially lead to the rich controlling the blockchain system.
One of the solutions to this problem was proposed in 2011 by a bitcointalk forum user named QuantumMechanic. He called the solution proof of stake: the validator of the next block is chosen among all users, with users that hold a higher stake in the network having a higher chance of being elected. The validator will lose part of his stake if he approves a fraudulent transaction. This approach incentivizes the rich not to spend money on building better computers that guess numbers fast, but to put more money into the network and keep it running. However, proof of stake does not fully resolve the problem either. It resolves the problem of wasting electricity, but rich people could still end up controlling the system, which leads to centralization.
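A stake-weighted election can be sketched like this. It is a simplified illustration rather than the selection rule of any particular proof of stake network: the stake amounts, the function names and the 50% penalty are all assumptions made for this example.

```python
import random


def pick_validator(stakes: dict, r: float) -> str:
    """Pick a user with probability proportional to stake.

    r is a random number in [0, 1); each user owns a slice of that
    interval whose width is their share of the total stake.
    """
    if not 0 <= r < 1:
        raise ValueError("r must be in [0, 1)")
    threshold = r * sum(stakes.values())
    running = 0.0
    for user, stake in stakes.items():
        running += stake
        if threshold < running:
            return user
    raise ValueError("stakes must contain a positive amount")


def slash(stakes: dict, validator: str, fraction: float) -> dict:
    """Return new stakes after the validator loses a fraction of theirs."""
    updated = dict(stakes)
    updated[validator] = stakes[validator] * (1 - fraction)
    return updated


stakes = {"Alice": 10, "Bob": 30, "Catherine": 60}
validator = pick_validator(stakes, random.random())
print("validator:", validator)

# If the validator approves a fraudulent transaction, part of the stake is lost.
print(slash(stakes, "Catherine", fraction=0.5))
```

With these stakes Catherine is chosen about 60% of the time, which shows both the incentive to stake more and the centralization worry raised above.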
So, what will be the next step of blockchain technology? What will the third generation of blockchain look like? What ideas will it be built on, and what missions will it be able to accomplish? No one knows at this stage. What we do know is that we are witnessing the formation of something big. A glimpse into a future.
To play around with Ethereum blockchain on your own computer, download this: https://trufflesuite.com/ganache/
Listed in recommended viewing order
- Blockchains introduction https://www.youtube.com/watch?v=SSo_EIwHSd4&t=115s
- How Bitcoins work https://www.youtube.com/watch?v=bBC-nXj3Ng4
- Ethereum Introduction https://www.youtube.com/watch?v=jxLkbJozKbY
- Smart Contract https://www.youtube.com/watch?v=ZE2HxTmxfrI
- Ethereum wallet https://www.youtube.com/watch?v=qLZ1IoezucE
- ERC20 https://www.youtube.com/watch?v=cqZhNzZoMh8
- Proof of stake (VS proof of work) https://www.youtube.com/watch?v=M3EFi_POhps&t=234s
- Full MIT Course on Blockchain, by the now SEC Chair Gary Gensler. https://www.youtube.com/playlist?list=PLUl4u3cNGP63UUkfL0onkxF6MYgVa04Fn
I hope you find this article useful to get your head around the Blockchain tech buzz. I won’t be writing a conclusion in this article but will leave you with a few questions to think about. (Might share that in another article eventually.)
What did blockchain replace? How did we establish trust before the digital age? How did we do such proof of ownership and timestamp with analogue technologies? And how did we price it?
A little about us: OnDemandWorld Team, currently building a blockchain & AI based recruitment platform with tokenomics called TechHub Jobs, early prototype already on App Store and Google Play. We will be releasing these on Github soon. Stay tuned.