Most people assume this is common knowledge, but quite a few tech people out there still do not know what happens when you trace to an IP address. Troubleshooting with traceroutes can be even more complicated. We will cover the basics here, and if you want to delve into this topic further, I have provided a link to the full presentation.
Traceroutes and pings are how we troubleshoot the Internet. It’s how we see where our traffic is going, how close our game servers are, and how we quickly identify issues. Ping and traceroute programs rely on a protocol called ICMP (Internet Control Message Protocol). Some traceroute implementations send UDP or TCP probes rather than ICMP echo requests, but they still depend on the ICMP error messages that routers send back. Traceroute relies on this protocol because of its diagnostic and error-reporting features. These features, combined with the TTL (Time To Live) field and response times, give the user a great deal of information about their packets.
TTL is used to identify router hops as your packets are being sent around the Internet. Traceroute reveals the hops by setting a specific TTL on each probe. It sends the first packet out with a TTL of 1, and each router along the way decrements the TTL by one. Once the TTL reaches 0, the router drops the packet and responds with a TTL exceeded error message. The source host receives this error message and knows that the device that reported it is the first router “hop” on the path. It repeats this process, increasing the TTL by 1 for each hop, to display the path. Typically, if the destination has not answered after 30 hops, traceroute treats it as unreachable and quits.
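The TTL walk described above can be sketched as a small simulation. This is only an illustration, not how any real traceroute program is written: it sends no packets, and the router names in PATH, the functions send_probe and trace, and the reply labels are all made up for the example. The 30-hop limit is the typical default mentioned above.

```python
from typing import List, Tuple

# Hypothetical path: three made-up routers followed by the destination.
PATH = ["router-a", "router-b", "router-c", "destination"]


def send_probe(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one probe and return (responder, reply_type)."""
    if not path:
        raise ValueError("path must not be empty")
    for index, node in enumerate(path):
        if index == len(path) - 1:
            # The probe survived every router and reached the destination.
            return node, "reached"
        ttl -= 1  # each router along the way decrements the TTL by one
        if ttl == 0:
            # This router drops the probe and reports "TTL exceeded".
            return node, "ttl-exceeded"
    raise AssertionError("unreachable: the last node always replies")


def trace(path: List[str], max_hops: int = 30) -> Tuple[List[Tuple[int, str]], bool]:
    """Probe with TTL 1, 2, 3, ... and record who answers each time."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, reply = send_probe(path, ttl)
        hops.append((ttl, responder))
        if reply == "reached":
            return hops, True   # destination answered, trace is complete
    return hops, False          # gave up after max_hops probes


if __name__ == "__main__":
    hops, reached = trace(PATH)
    for ttl, responder in hops:
        print(ttl, responder)
    print("destination reached" if reached else "gave up")
```

Each pass through the loop in trace corresponds to one line of traceroute output: the TTL that was set on the probe and the device that answered it.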
Some facts about traceroutes:
You only see the path in the direction your packets are being sent. This does not mean the replies are sent back the same way. Return traffic could be sent back through another network, and you would never know unless you did a traceroute in the other direction.
What you see identified in the trace is the port your packets entered the router on. You are never able to see how traffic exits the specific router; you can only see how it enters the router and then enters the next one in the path. Even a traceroute in the other direction may not show you this if the packets take a different path.
There are ways to use the name of the device to identify the port type and location of each hop. This information can be crucial in identifying routing loops and suboptimal routing. More information on this can be found in the detailed presentation.
Just because there is a spike in latency does not mean there is an issue. Remember how I said the router must respond to a TTL of 0 with the exceeded error? Well, this requires precious CPU cycles. Normal packets going through the router are switched in hardware, bypassing the CPU entirely. So if you see a spike in latency at one hop, it is typically because the router was busy doing something else. Pretty much everything takes priority over ICMP packets, which is why you are likely to see spikes in latency from hop to hop when running a traceroute.
ICMP packets are often blocked or rate-limited. This could be for security reasons or as part of DDoS protection measures. So don’t be worried if you can ping the destination but the traceroute never completes. The best way to spot a legitimate issue in a traceroute is to look for a consistent, above-average latency increase that carries through the rest of the path. This would normally show up alongside pings to the destination that show packet loss or above-average latency.
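The difference between a harmless spike at one hop and an increase that carries through the rest of the path can be sketched in a few lines. This is a rough heuristic for illustration only: the function name classify_hops, the labels, and the 50 ms jump threshold are assumptions made for this example, not values taken from any tool or from the presentation.

```python
from typing import List


def classify_hops(latencies_ms: List[float], jump_ms: float = 50.0) -> List[str]:
    """Label each hop as 'ok', 'isolated spike', or 'sustained increase'.

    latencies_ms holds one round-trip time per hop, in path order.
    jump_ms is an arbitrary threshold chosen for this sketch.
    """
    labels = []
    for i, latency in enumerate(latencies_ms):
        previous = latencies_ms[i - 1] if i > 0 else 0.0
        if latency - previous < jump_ms:
            labels.append("ok")
            continue
        later = latencies_ms[i + 1:]
        if later and min(later) < previous + jump_ms:
            # A later hop is fast again, so this router was probably just
            # slow to generate its reply; forwarded traffic is unaffected.
            labels.append("isolated spike")
        else:
            # Every later hop stays slow (or this is the final hop), so the
            # delay may be real and is worth confirming with pings.
            labels.append("sustained increase")
    return labels


if __name__ == "__main__":
    print(classify_hops([1, 5, 120, 8, 10, 12]))   # spike at hop 3 only
    print(classify_hops([1, 5, 120, 125, 130]))    # increase that persists
```

A hop labeled as a sustained increase is only a candidate. As noted above, it should be checked against pings to the destination before calling it a problem.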
At the end of the day, if you ever need to submit a ticket, we always recommend traceroutes in both directions and 100 pings. This way we will have the information we need to effectively troubleshoot the issue for you. If you have pings that show your baseline latency, this is always helpful too. We hope that this cleared up some questions you have had about traceroutes and pings. If you want to know more about this topic, there is a great NANOG presentation about it at the link below: