Slicing the Onion: An introduction to Onion Routing

When you first searched for terms like “Tor network”, “deep web” or “dark web”, you probably learned two key things about this world: 1) the Tor network is the best-known mechanism for gaining anonymity on the Internet, and 2) plenty of articles and sites describe the huge number of hidden, non-indexed websites and services in this initially unfamiliar environment called the “deep web”.

In this article, we will take a brief technical dive into the protocol behind what we know today as the Tor network (a.k.a. “The Onion Router”). Understanding it is key to understanding both the security features it offers you as a user and the vulnerabilities you may be exposed to. Please make sure you have a basic understanding of cryptography and networking before continuing with this article.

There are several terms I would like to define and clarify to avoid misunderstandings with other articles that you may read on the Internet:

The deep web and the Tor network are not the same thing. The deep web is a term for the collection of websites not indexed by common search engines (like Google or Bing). The Tor network, by contrast, is an actual infrastructure used to route traffic anonymously, whether to the dark web or to any conventional website.
It is important to note that the “deep web” and the “dark web” differ in one key respect. From a technical perspective, the “dark web” is a subset of the deep web. It consists of websites and services, also known as hidden services, that cannot be accessed unless you know their specific address and use specialised software such as the Tor Browser. Hidden services are explained later in this article.
The deep web is not made up of several layers (sometimes called “Charter Web” or “Mariana’s Web”), as many videos and articles suggest. Some resources are easier to find and access than others. Some require an invitation, specific software, additional encryption mechanisms, or a combination of all of these. But the basis for accessing all those resources is the same: onion routing.

What is the Tor network?

Before heading into a deeper explanation of the Onion Router, we need to briefly explain the Tor network. In one sentence, it is a decentralised network that provides anonymity by routing traffic through volunteer-run nodes.

To explain this further, it is important to understand that almost any computer can be part of the Tor network. As a regular user, you can contribute a portion of your bandwidth and run a node (also called a relay). Other users’ data will then be transmitted through your computer. This leads to the next property. Since the network is made up of volunteer nodes all around the world, its architecture is decentralised, and no single central node routes or controls the traffic of the rest of the network.

Last but not least, before heading into the main topic of this article, we need to talk about two additional key elements:

As with every network, data jumps from one point to another, and each of these points is called a “node”. There are three types of nodes: entry nodes (also called guard nodes), middle nodes and exit nodes.
Even a decentralised network needs a “source of truth” that holds vital information about the network. In Tor, this role is played by a small set of trusted servers called “Directory Authorities”. They publish a signed list of the available nodes, along with information about each of them such as their addresses and public keys. As you may have already guessed, these keys are really important for onion routing.

Tor Network’s secret: The Onion Router

Now that we have everything in place, let’s dive into onion routing, the protocol used to transmit data over the Tor network. Hidden services have their own section later on. For now, let’s explain this protocol as if we were accessing a conventional website, such as Hakin9 magazine’s site, using the Tor Browser.

What is the difference between using the Tor Browser and a regular browser when making a request to https://hakin9.org? Well, we are basically adding one extra step for the data transmission:

Data will be transmitted through the Tor network.
Data will leave the Tor network and be transmitted as usual through the public Internet.
Data will reach the destination, and the response will take the same path back to the origin.

We will now focus on step 1, which is the core of this article. Before data is transmitted through the Tor network, a circuit is created. A circuit is simply a sequence of nodes used as a transmission channel. The nodes are chosen from the list of available nodes, which is downloaded from the directory system. A standard circuit contains three nodes: the entry node, the middle node and the exit node. Tor can build longer circuits in some situations, but three hops is the default for reaching conventional websites. The number of nodes in a circuit directly affects both the security and the performance of the transmission channel. If you have ever used the Tor Browser to access a conventional website, you probably noticed that it was much slower than a regular browser. That is mainly because of this circuit.
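To make circuit selection more concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory relay list: the `Relay` class, nicknames, bandwidth values and flags are invented for illustration. It loosely mirrors the idea of weighting choices by bandwidth and requiring suitable flags for the entry and exit positions. Real Tor path selection has many more rules, such as keeping a long-lived entry guard and avoiding relays from the same family.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    nickname: str
    bandwidth: int          # hypothetical bandwidth weight
    flags: frozenset        # e.g. frozenset({"Guard"}) or frozenset({"Exit"})


def pick(relays, required_flag, exclude):
    """Pick one relay, weighted by bandwidth, that carries the required flag."""
    candidates = [
        r for r in relays
        if r not in exclude and (required_flag is None or required_flag in r.flags)
    ]
    if not candidates:
        raise ValueError(f"no relay available for flag {required_flag!r}")
    return random.choices(candidates, weights=[r.bandwidth for r in candidates], k=1)[0]


def build_path(relays):
    """Return [entry, middle, exit] made of three distinct relays."""
    exit_node = pick(relays, "Exit", exclude=set())
    entry = pick(relays, "Guard", exclude={exit_node})
    middle = pick(relays, None, exclude={exit_node, entry})
    return [entry, middle, exit_node]


# Hypothetical relays, standing in for the list fetched from the directory system.
RELAYS = [
    Relay("alpha", 900, frozenset({"Guard"})),
    Relay("bravo", 500, frozenset({"Guard", "Exit"})),
    Relay("charlie", 300, frozenset()),
    Relay("delta", 700, frozenset({"Exit"})),
    Relay("echo", 400, frozenset({"Guard"})),
]

print([relay.nickname for relay in build_path(RELAYS)])
```

Each run may print a different circuit, because the choice is random but biased towards high-bandwidth relays.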

After the nodes have been selected, the actual onion routing protocol starts. Its main objective is to keep the message as secure and anonymous as possible. To achieve that, the protocol follows the steps below:

The public keys of each of these nodes are retrieved. These are normally RSA or Ed25519 keys, the latter being an elliptic-curve digital signature scheme. These public keys are used to authenticate the nodes of the network and to verify messages signed by them. Asymmetric cryptography is the basis of the network’s trust and health.
One common mistake when talking about circuit establishment is thinking that these previously fetched public keys are used to encrypt a session key (or shared secret) for each node. This is not correct. Instead, a shared secret is negotiated with each node using a Diffie-Hellman key exchange. Current Tor versions use a variant called ntor, based on Curve25519. The keys are negotiated incrementally: the client first agrees on a key with the entry node, then, through that encrypted hop, with the middle node, and finally with the exit node.
The message is then encrypted with each shared key using AES (Advanced Encryption Standard). Tor uses AES-128 in counter mode. The order in which the encryption is applied is crucial here. The last node’s shared key is used first. The result is then encrypted with the penultimate node’s shared key, and the process continues until the first node’s shared key has been applied.
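The key negotiation and layering steps can be sketched in Python. This is a toy model under loud assumptions:

- It uses a small, insecure finite-field Diffie-Hellman group (the prime 2^127 - 1) instead of Tor’s Curve25519-based ntor handshake.
- It uses a SHA-256 keystream instead of AES.
- It negotiates all three keys directly rather than extending the circuit hop by hop.

It only illustrates that both sides derive the same per-hop keys, and that the exit node’s layer is applied first so the entry node’s layer ends up outermost.

```python
import hashlib
import secrets

# Toy DH parameters: 2**127 - 1 is prime, but this group is NOT secure.
P = 2**127 - 1
G = 3


def dh_keypair():
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def derive_key(shared_secret: int) -> bytes:
    # Hash the DH result into a 32-byte symmetric key.
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


def keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher standing in for AES: SHA-256(key || counter) blocks.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same key undoes it.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def negotiate_hop_keys(num_hops=3):
    """Run one DH exchange per hop; return (client's keys, nodes' keys)."""
    client_keys, node_keys = [], []
    for _ in range(num_hops):
        c_priv, c_pub = dh_keypair()   # client's ephemeral key pair
        n_priv, n_pub = dh_keypair()   # node's ephemeral key pair
        client_keys.append(derive_key(pow(n_pub, c_priv, P)))
        node_keys.append(derive_key(pow(c_pub, n_priv, P)))
    return client_keys, node_keys


def wrap(message: bytes, hop_keys) -> bytes:
    """hop_keys are ordered entry, middle, exit; the exit's layer goes on first."""
    for key in reversed(hop_keys):
        message = xor_layer(key, message)
    return message


client_keys, node_keys = negotiate_hop_keys()
print(client_keys == node_keys)          # True: both sides share each hop key
onion = wrap(b"GET / HTTP/1.1", client_keys)
```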

Let’s pause here. At this point, our original data is wrapped in several layers of encryption. Because every shared key was used to encrypt the data, each node is responsible for removing its corresponding layer when the message passes through it. These layers led the creators of the protocol to compare the transmitted data with an onion, which has several layers that need to be peeled. This is where the name “The Onion Router” comes from. Let’s continue:

The data is transmitted to the first node. The first node, or entry node, uses its shared key to decrypt the outermost layer of the packet. After removing it, the node still cannot see the original data, as there are more encryption layers underneath, but it does know which node comes next. The data is then sent to that node, the first middle node of the circuit, and the process repeats until there are no more middle nodes.
Finally, the exit node is reached. Here, the last layer of encryption is removed with the exit node’s shared key, revealing the original packet crafted by the client. After reviewing the packet’s information, the exit node sends the data through the conventional network to its destination.
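The peeling process can be simulated with the sketch below. It follows this article’s simplified model, in which each decrypted layer tells the node where to send the data next. In real Tor, nodes instead forward fixed-size cells along the circuit they set up earlier.

Several details are assumptions made for illustration: the fixed 16-byte next-hop header, the node names, and the hash-derived keys, which stand in for the negotiated shared keys. A SHA-256 keystream replaces AES.

```python
import hashlib

ADDR_LEN = 16  # hypothetical fixed-size header carrying the next hop's name


def keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher standing in for AES.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def build_onion(payload: bytes, path, keys, destination: str) -> bytes:
    """Wrap the payload so that each node's layer reveals only the next hop."""
    next_hops = path[1:] + [destination]
    data = payload
    for next_hop, key in reversed(list(zip(next_hops, keys))):
        header = next_hop.encode().ljust(ADDR_LEN, b"\0")
        data = xor_layer(key, header + data)
    return data


def peel(key: bytes, data: bytes):
    """What a node does: remove its layer, read the next hop, keep the rest."""
    plain = xor_layer(key, data)
    next_hop = plain[:ADDR_LEN].rstrip(b"\0").decode()
    return next_hop, plain[ADDR_LEN:]


path = ["entry", "middle", "exit"]
keys = {name: hashlib.sha256(name.encode()).digest() for name in path}
data = build_onion(b"GET / HTTP/1.1", path, [keys[n] for n in path], "hakin9.org")

hop = "entry"
while hop in keys:                       # stop once the data leaves the circuit
    next_hop, data = peel(keys[hop], data)
    print(f"{hop} forwards {len(data)} bytes to {next_hop}")
    hop = next_hop

print(hop, data)                         # hakin9.org b'GET / HTTP/1.1'
```

Notice that the data shrinks at every hop as a layer (and its header) is removed, and only the exit node sees the original request.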

It’s been a long journey, but we still have one more thing to do: receive the response. After the server sends back its response, the path used for the request is reused in reverse to route it back to the origin.

After travelling across the conventional network, the response reaches the exit node. Since each node keeps track of the previous node in the circuit, the exit node knows the address of the penultimate node, but nothing else. Before sending the data on, the exit node encrypts it with its shared key. This continues all the way back to the origin, with each node adding a layer of encryption with its own shared key.
When the response finally arrives at its destination (the host that created the request), the client uses all the shared keys to decrypt its content, once again applying the onion concept. This time, the first layer is removed using the entry node’s shared key, then the first middle node’s shared key, and so on. This continues until the last layer, created by the exit node, is decrypted, finally revealing the original response from the server.
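The return trip can be sketched the same way. Each node adds its layer in turn, starting at the exit node, and the client removes the layers starting with the entry node’s. The SHA-256 keystream stands in for AES, and the hash-derived keys stand in for the negotiated shared keys. Real Tor also derives separate keys for each direction, which this toy ignores.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher standing in for AES.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def send_back(response: bytes, hop_keys) -> bytes:
    """hop_keys ordered entry, middle, exit: exit adds its layer first, entry last."""
    for key in reversed(hop_keys):
        response = xor_layer(key, response)
    return response


def client_unwrap(data: bytes, hop_keys) -> bytes:
    """The client strips the entry node's (outermost) layer first."""
    for key in hop_keys:
        data = xor_layer(key, data)
    return data


hop_keys = [hashlib.sha256(name).digest() for name in (b"entry", b"middle", b"exit")]
on_the_wire = send_back(b"HTTP/1.1 200 OK", hop_keys)
print(client_unwrap(on_the_wire, hop_keys))   # b'HTTP/1.1 200 OK'
```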

As you can see, although many steps are involved, the cryptography behind message transmission through the Tor network should now be clearer. Please be aware that the Tor network itself performs many other cryptographic operations every second. These happen between nodes, bridges and services to maintain the network’s health, stability and trust, but they are out of scope for this article.

Accessing hidden services

Hidden services, also known as onion services, are services that can only be accessed via the Tor network. Traffic to these hidden services never leaves the Tor network. It is routed entirely through Tor, making full use of its security and anonymity features.

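They are identified by a unique address ending in “.onion”, and can be accessed by pasting the address into the Tor Browser’s address bar. An example of a hidden service address might be “y6xkkogwj99u34ca.onion”. Please note this is a fake address, written in the legacy 16-character version 2 format; current version 3 addresses are 56 characters long. The address itself is not random: it is derived from the hidden service’s public key. Version 2 addresses were a truncated hash of the key, while version 3 addresses encode the service’s full Ed25519 public key.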
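Deriving an address from a key can be sketched with Python’s standard library. The version 3 function follows the Tor rendezvous specification (version 3) as we understand it: the address is the base32 encoding of the public key, a two-byte SHA3-256 checksum and a version byte. The version 2 function takes the first 80 bits of a SHA-1 hash of the DER-encoded RSA public key. The input bytes used below are arbitrary placeholders, not real keys.

```python
import base64
import hashlib


def onion_v3_address(ed25519_pubkey: bytes) -> str:
    """v3: base32(PUBKEY | CHECKSUM | VERSION) + '.onion'."""
    if len(ed25519_pubkey) != 32:
        raise ValueError("an Ed25519 public key is 32 bytes long")
    version = b"\x03"
    checksum = hashlib.sha3_256(b".onion checksum" + ed25519_pubkey + version).digest()[:2]
    return base64.b32encode(ed25519_pubkey + checksum + version).decode().lower() + ".onion"


def onion_v2_address(der_rsa_pubkey: bytes) -> str:
    """Legacy v2: base32 of the first 10 bytes of SHA-1(DER-encoded public key)."""
    digest = hashlib.sha1(der_rsa_pubkey).digest()[:10]
    return base64.b32encode(digest).decode().lower() + ".onion"


placeholder_key = bytes(range(32))                 # NOT a real Ed25519 key
print(onion_v3_address(placeholder_key))           # 56 characters + ".onion"
print(onion_v2_address(b"placeholder DER bytes"))  # 16 characters + ".onion"
```

Because the version byte is always 3, every version 3 address ends in the letter “d” before “.onion”.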

From a technical perspective, how does accessing a hidden service differ from accessing a conventional website as explained above? Instead of explaining everything again, we will focus on the main changes:

When a hidden service is published in the Tor network, it selects a few nodes as “introduction points”. These act as publicly known entry points that any user who wants to connect to the service can use.

When you enter an onion service’s address, the Tor Browser retrieves the service’s descriptor from the network; descriptors are stored on nodes acting as hidden service directories. The descriptor contains the service’s public keys and its list of introduction points, among other information.

Initially, the circuit concept explained above works the same way: a series of nodes is selected and session keys are negotiated. This time, however, there is no exit node, since we are not leaving the Tor network. Instead, there is what’s called a rendezvous point.

The rendezvous point exists so that neither the user’s nor the hidden service’s IP address is disclosed. It acts as an intermediary node through which the two connect, so that they never communicate directly. Note that any node in the network can act as a rendezvous point.

Before contacting the service, the user builds a circuit to a rendezvous point of their choice and registers a one-time secret with it, known as the “rendezvous cookie”. The user then sends what’s called an “introduction request” to one of the service’s introduction points. Among other data needed for the connection, this request contains the chosen rendezvous point, the rendezvous cookie and the user’s half of a new key exchange. These fields are encrypted with a public key published in the hidden service’s descriptor, so only the hidden service can read them; the introduction point simply forwards the request. The request itself travels over an ordinary Tor circuit, so it is also protected by the layered encryption described earlier.

“Why is this introduction request important?” you may ask. It is how the hidden service learns that a user wants to connect. When the introduction point receives the request, it forwards it to the hidden service. The service decrypts it and learns which rendezvous point the user has chosen.

Now that the hidden service knows which rendezvous point was chosen, it builds its own circuit to it. The introduction point plays no further part in the connection.
Through that circuit, the hidden service sends what’s called a “rendezvous request” to the rendezvous point. This request carries the one-time secret (the “rendezvous cookie”) that the user registered with the rendezvous point beforehand and included in the introduction request. It also carries the service’s half of the key exchange. The rendezvous point matches the cookie, joins the two circuits and relays the service’s key-exchange data to the user.
With that information, the user and the hidden service derive a shared key. They can finally use a working circuit (which works like the first scenario described in the article) to exchange end-to-end encrypted information through the rendezvous point.
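The rendezvous point’s job can be illustrated with a small Python model. The `RendezvousPoint` class, its method names, the string circuit identifiers and the 20-byte cookie length are hypothetical choices for this sketch. It only shows the core idea. The relay matches a one-time cookie registered by the user with the one presented by the hidden service, then joins the two circuits. Because each side reaches the rendezvous point through its own multi-hop circuit, the relay only sees neighbouring nodes, not the endpoints.

```python
import secrets


class RendezvousPoint:
    """Toy model of a relay acting as a rendezvous point."""

    def __init__(self):
        self._waiting = {}   # cookie -> user's circuit
        self.joined = []     # (user circuit, service circuit) pairs

    def establish(self, cookie: bytes, user_circuit: str) -> None:
        # The user registers a one-time cookie before contacting the service.
        self._waiting[cookie] = user_circuit

    def rendezvous(self, cookie: bytes, service_circuit: str, handshake: bytes):
        # The service presents the same cookie; each cookie can be used only once.
        user_circuit = self._waiting.pop(cookie, None)
        if user_circuit is None:
            raise KeyError("unknown or already used rendezvous cookie")
        self.joined.append((user_circuit, service_circuit))
        # Relay the service's key-exchange data to the user's circuit.
        return user_circuit, handshake


rp = RendezvousPoint()
cookie = secrets.token_bytes(20)             # chosen by the user
rp.establish(cookie, "user-circuit")
# ...the cookie reaches the service inside the encrypted introduction request...
print(rp.rendezvous(cookie, "service-circuit", b"service key-exchange data"))
```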

This process might feel a little confusing at first. It makes total sense, though, once you keep in mind that the connection is established this way so that neither the user nor the hidden service learns the other’s real address. It is like relying on a third person (the rendezvous point) to help two anonymous people meet in a room where they cannot see each other’s faces.

Security Implications

Lastly, let’s talk about security. I am not going to dive into vulnerabilities related to web applications, but you should know that using Tor does not protect you against vulnerabilities that already exist on the conventional web. These include the lack of HTTPS, browser-based attacks and vulnerable software. Many people even recommend disabling JavaScript for security reasons. That is why most hidden services in the Tor network are not fancy-looking: they avoid JavaScript so that users who disable it can still use them. And, needless to say, be careful about trusting websites published in the Tor network or entering personal information on them.

Having said that, there are two main points that we need to talk about:

Anonymity: Now that we know how the Onion Router works, we can see how it can be used to remain anonymous on the Internet. Although this is largely true, there are techniques to uncover the real identity of a person using the Tor network. When you connect through Tor, your ISP (Internet Service Provider) does not know what you are connecting to, but it does know that you are using Tor. This matters in an investigation, whether by an individual, a group or a law enforcement agency. Anyone who knows when a user connected to Tor, and can also observe traffic leaving the network, may correlate the timing and volume of both to discover the user’s real identity.
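To illustrate the idea behind such correlation attacks, here is a deliberately simplified Python sketch. It compares per-second traffic volumes observed entering Tor with candidate flows observed leaving it. It then picks the exit flow whose pattern matches best, using Pearson correlation. The traffic numbers are invented for illustration. Real attacks must cope with noise, padding and many simultaneous users.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_match(entry_flow, exit_flows):
    """Name of the exit flow whose volume pattern best tracks the entry flow."""
    return max(exit_flows, key=lambda name: pearson(entry_flow, exit_flows[name]))


# Hypothetical KB-per-second observations, invented for this example.
user_to_entry = [0, 12, 3, 40, 2, 0, 25]
exit_flows = {
    "exit flow A": [5, 5, 6, 5, 4, 6, 5],
    "exit flow B": [0, 11, 4, 38, 1, 0, 24],
}
print(best_match(user_to_entry, exit_flows))   # exit flow B
```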

Apart from that, malicious actors can run or compromise Tor nodes. The cryptographic operations involved in Tor transmission prevent an attacker who controls a relay from seeing the data beneath the remaining encryption layers. However, the more relays attackers control, the more traffic they can observe and correlate to find the original sender. This is especially true if they control both the entry node and the exit node of the same circuit.
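A back-of-the-envelope model shows why controlling many relays matters. Assume, as a simplification, that an attacker controls a fraction `g` of the entry-node selection weight and a fraction `e` of the exit weight. Assume also that every circuit picks its nodes independently. Then a single circuit has both ends compromised with probability `g * e`. Across `n` independent circuits, the chance that at least one is compromised is `1 - (1 - g * e) ** n`. Tor’s long-lived entry guards exist partly to keep this number down, since a user does not keep picking fresh entry nodes; this sketch ignores that.

```python
def p_circuit_compromised(g: float, e: float) -> float:
    """Probability that one circuit has both a malicious entry and a malicious exit."""
    return g * e


def p_any_compromised(g: float, e: float, n: int) -> float:
    """Probability that at least one of n independent circuits is compromised."""
    return 1 - (1 - p_circuit_compromised(g, e)) ** n


# Hypothetical attacker holding 10% of entry weight and 10% of exit weight.
print(round(p_circuit_compromised(0.1, 0.1), 4))     # 0.01
print(round(p_any_compromised(0.1, 0.1, 100), 3))    # 0.634
```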

Exit nodes: If you read the circuit creation process carefully, a question has probably come to mind: “Doesn’t the exit node remove the last layer of encryption, hence revealing the original request from the user, and also the response from the server?” Yes, that is absolutely correct, and it is the main weak point of the Tor network. Any attacker who runs or compromises an exit node may be able to see the data passing through it, and even modify it.

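For this reason, HTTPS is strongly recommended, especially when connecting to conventional websites. TLS encryption is applied end to end on top of onion routing, so a malicious exit node cannot see the application-layer data, although it can still see which server is being contacted. Connections to hidden services never leave the Tor network and have no exit node, so this particular risk does not apply to them in the same way.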
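The sketch below shows why an extra end-to-end layer protects against a curious exit node. A layer applied before the onion layers, with a key that only the client and the server share, stands in for TLS. The SHA-256 keystream and the hash-derived keys are assumptions that replace real TLS and AES. After each relay removes its own layer, the exit node is left holding only the end-to-end ciphertext.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher standing in for both TLS and AES in this sketch.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


tls_key = hashlib.sha256(b"client<->server session").digest()  # stand-in TLS key
hop_keys = [hashlib.sha256(n).digest() for n in (b"entry", b"middle", b"exit")]

request = b"POST /login user=alice&pass=secret"
inner = xor_layer(tls_key, request)       # end-to-end layer, applied first

onion = inner
for key in reversed(hop_keys):            # onion layers: exit's first, entry's last
    onion = xor_layer(key, onion)

seen_by_exit = onion
for key in hop_keys:                      # each relay removes its own layer
    seen_by_exit = xor_layer(key, seen_by_exit)

print(seen_by_exit == request)            # False: the exit sees only ciphertext
print(xor_layer(tls_key, seen_by_exit))   # only the server can recover the request
```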

To further protect their anonymity and the data being transmitted, some people also use a VPN together with Tor. This can add some protection, but it also slows down data transmission even more. Tor over VPN is not the same as VPN over Tor, but their usage, advantages and disadvantages are a topic for a different article.

Conclusion

Onion routing is complex and secure in many ways. Although it has weak points, no protocol can be perfectly hardened. What matters is that you, as a user, have the knowledge required to use this technology properly and understand its risks. I personally encourage you to investigate the Tor network further and learn about the incredible world of the deep web.