The Internet. What it is, where it comes from, and where it’s going

A few weeks ago Nimit did a presentation on ‘The Internet’. All of it. I’ve been meaning to blog about it since then as a way to internalize everything we went over, but I have been putting it off because I hate writing. But yesterday David’s friend and fellow programmer, Andy Brett, came to FullStack and spoke about many topics, including coding interviews. He mentioned that one of his favorite questions to ask an interviewee is how the Internet works. After today you and I will both be able to answer that question. And here we go!

The Internet is a big network of servers. Servers are computers that many people can access at the same time and that respond to Internet requests.

URL: A URL has five parts.

Ex: http://sports.yahoo.com/nhl

The first part of this URL, ‘http’, is the transfer protocol. More on this later. The ‘sports’ part is a sub-domain. Sub-domains go before the domain name and are free: once you have a domain you can create as many as you like. I believe they have become more popular recently. ‘yahoo’ is the domain name. Great name! Domain names cost money. You register them (really, rent them by the year) from sites like godaddy.com. The ‘com’ is the top-level domain. There are a ton of top-level domains but com is by far the most popular. The coolest top-level domain out there right now is io, which is officially the country code for the British Indian Ocean Territory. It makes me think of the iPhone for some reason, probably because of iOS. ‘/nhl’ is the sub-directory or file, also called the path. This part of the URL, like the sub-domain, is free. This is where all the routing parts of a URL go.
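To make the five parts concrete, here is a minimal sketch that pulls them out of the example URL with Python's standard urllib.parse module. The url_parts function and its dictionary keys are names I made up for illustration, and the splitting rule is naive: it assumes the last dot-separated label is the top-level domain and the one before it is the domain name.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the five parts described above (naive sketch)."""
    pieces = urlsplit(url)
    labels = pieces.hostname.split(".")
    return {
        "protocol": pieces.scheme,
        # Assumption: the last label is the top-level domain and the one
        # before it is the domain name; anything further left is sub-domain.
        "sub_domain": ".".join(labels[:-2]),
        "domain_name": labels[-2],
        "top_level_domain": labels[-1],
        "path": pieces.path,
    }


print(url_parts("http://sports.yahoo.com/nhl"))
```

This simple rule gets confused by two-part endings like co.uk, so treat it as a picture of the idea rather than a real URL library.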

A request: What happens when we type google.com into the browser’s address bar? The browser first resolves the host name by making a DNS request. DNS requests usually travel over UDP/IP rather than TCP/IP, but we’ll be coming back to TCP/IP. DNS stands for Domain Name System. This set of servers is responsible for translating web addresses into IP addresses. There are a few steps in this process. When you make a request, your browser first checks your local cache for the IP address. The IP addresses of websites you visit often are stored on your computer so you don’t even have to make a DNS lookup request. Small things like this make web browsing faster.

If you do not have the site cached on your box, your browser sends a request to the local DNS server. Local here usually means the DNS server your current network hands you, typically one run by your ISP, so your DNS server will change as you change locations. This DNS server, like your browser, has a cache of web addresses and IP addresses. This cache is much larger than your local box’s cache. If the server has the IP address, it sends the address back to your browser.

If the DNS server does not have an answer for your request, it becomes a DNS client and requests the IP address from other DNS servers. It starts at the top of the DNS hierarchy with one of the 13 ‘root’ servers. The root servers do not hold the entire database of web addresses. They only know which servers are responsible for each top-level domain, like com. The com servers in turn know which name server is authoritative for google.com, and that server finally hands back the IP address. The 13 root servers are really 13 named addresses (a through m). At one time 10 of them were located in the US, one in Japan, one in London, and one in Sweden, but today each one is mirrored by many machines around the world. Kind of a big deal.

Anyways, back to our request. Once the authoritative server answers, the IP address comes back through your local DNS server, which caches it, to your browser. This all happens at crazy fast speeds, and that was only the first step of the process. We haven’t even requested any data from the website we are actually interested in. But now that we have an IP address, we can make that request.
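The check-my-cache-then-ask-the-next-server idea can be sketched in a few lines of Python. This is a toy model, not real DNS: the CachingResolver class, the three links in the chain, and the address 192.0.2.10 for www.example.com are all made up for illustration.

```python
class CachingResolver:
    """One link in the lookup chain: answer from cache or ask the next link."""

    def __init__(self, name, upstream=None, records=None):
        self.name = name
        self.upstream = upstream          # who to ask on a cache miss
        self.cache = dict(records or {})  # hostname -> IP address
        self.asked_upstream = 0           # how often we had to ask

    def resolve(self, hostname):
        if hostname in self.cache:
            return self.cache[hostname]   # cache hit: no further lookups
        if self.upstream is None:
            raise KeyError(f"{hostname} not found")
        self.asked_upstream += 1
        address = self.upstream.resolve(hostname)
        self.cache[hostname] = address    # remember it for next time
        return address


# Hypothetical chain: my computer -> local DNS server -> rest of DNS.
rest_of_dns = CachingResolver("rest of DNS", records={"www.example.com": "192.0.2.10"})
local_dns = CachingResolver("local DNS server", upstream=rest_of_dns)
my_box = CachingResolver("my computer", upstream=local_dns)

print(my_box.resolve("www.example.com"))  # walks the whole chain
print(my_box.resolve("www.example.com"))  # answered from the local cache
```

The second lookup never leaves my_box, which is the whole point of caching. For a real lookup, Python's standard library has socket.gethostbyname("google.com").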
The most common request is a GET request. This means we are only asking for data from the server, not sending any to the server. This GET request travels to your ISP (Internet service provider, e.g. Time Warner, Verizon). Your ISP will basically send this message along until it reaches the correct server. Your request can travel all around the country on the way there. Generally each machine along the way knows a little more about where the IP address is than the previous one and routes the request as efficiently as it knows how. These machines, called routers, are different from the web server we are trying to reach. They only route Internet traffic. They are owned by ISPs, and ISPs allow traffic to pass between their networks without much restriction. This has allowed the Internet to become much faster. Eventually (and when I say eventually, I mean within a second and probably closer to 200 milliseconds) our GET request reaches the correct server. There, the server takes the HTTP verb (ours is GET) and the URL, processes the request, and returns data to your browser. This is just an overview of a request; there is definitely a lot more depth than what I described here.
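It helps to see what a GET request actually looks like on the wire: it is just a few lines of text. Here is a minimal sketch that builds one. The build_get_request function is my own name, and the headers are the bare minimum; a real browser sends many more.

```python
def build_get_request(host, path="/"):
    """Return the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, route, protocol version
        f"Host: {host}",         # which site on that server we want
        "Connection: close",     # ask the server to hang up when done
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("sports.yahoo.com", "/nhl"))
```

The server reads the first line to get the verb and the route, which is exactly the pair it uses to decide what to send back.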

Fun things to do to learn more about the Internet: open a command prompt, terminal, etc., and type in nslookup google.com. This returns the address of the DNS server that answered and google’s IP addresses. Your box has an internal IP address assigned by the router you are currently using. This router also has an IP address. This IP address is an external IP address and is used so web servers know where to send information. You can find out your router’s external IP address by going to google and typing, what’s my ip address. Thanks google! Other fun things: in the terminal type traceroute google.com. This shows you the IP address of each stop your request passes through and the time it takes to reach each one. Very interesting. Type ping yahoo.com in the terminal. This will continually ping yahoo.com to see how long it takes to receive a response from their servers. Press ctrl C to end the request cycle. Once ended, ping will also tell you the average amount of time it took to ping the server. One more fun one. Type whois kyledorman.com. Boooom! It’s me. whois will give you a bunch of information about the person or organization who is currently using the web address. You can see that they messed mine up and put my state as North Carolina, not New York. Oh well. Some companies will pay to have this information hidden but I left it up there. It’s kind of fun. If you want to see some weird stuff type out whois google.com. People have tacked their websites onto the end of google somehow so they also show up on the whois list.
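If you want to tell an internal address from an external one without asking google, Python's standard ipaddress module knows which ranges are reserved for private networks. This is a small sketch: the describe function is my own name and the two addresses are just examples.

```python
import ipaddress


def describe(address):
    """Say whether an IP address is internal (private) or external."""
    ip = ipaddress.ip_address(address)
    return "internal" if ip.is_private else "external"


print(describe("192.168.1.10"))  # a range home routers commonly hand out
print(describe("8.8.8.8"))       # a public address
```

Note that is_private also covers a few other reserved ranges (loopback, for example), so "internal" here is a loose label.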

IP/TCP: IP stands for Internet Protocol, TCP for Transmission Control Protocol. IP is the principal communications protocol of the Internet for relaying data across network boundaries. It is tasked with delivering packets of data from the source host to the destination host based on the IP addresses in the packet headers. IP uses an end-to-end principle in its design. This basically means the network is assumed to be unreliable at any one point, so most error-checking takes place at the end points of the process. For our purposes this means our computer and the web server. There is a ton more I could learn about this subject but for now I will let it be. One last note: MTU stands for maximum transmission unit and is the max size one packet of data can be. When google first started out, in 1998, its search page fit entirely in one packet, i.e. it was no bigger than one MTU, meaning it would load very quickly. They don’t do this anymore. Too much information to send out. But it would be fun if they reverted to this for a day or two some time.
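Here is a rough sketch of what the MTU means for a page: anything bigger than one packet has to be cut up, numbered, and put back together at the other end. The numbers are assumptions on my part (a 1500-byte MTU, which is typical for Ethernet, minus 40 bytes of IP and TCP headers), and to_packets and reassemble are made-up names.

```python
MTU = 1500                   # assumed: bytes, typical for Ethernet
HEADERS = 40                 # assumed: 20-byte IP header + 20-byte TCP header
MAX_PAYLOAD = MTU - HEADERS  # room left for actual data in each packet


def to_packets(data, max_payload=MAX_PAYLOAD):
    """Cut data into numbered chunks that each fit in one packet."""
    return [
        (seq, data[start:start + max_payload])
        for seq, start in enumerate(range(0, len(data), max_payload))
    ]


def reassemble(packets):
    """Put the chunks back in order, even if they arrived shuffled."""
    return b"".join(chunk for _, chunk in sorted(packets))


page = b"x" * 3000
packets = to_packets(page)
print(len(packets))                              # how many packets it took
print(reassemble(list(reversed(packets))) == page)
```

The reassembly happening at the receiving end is a small taste of the end-to-end idea: the network in the middle just moves packets, and the end points sort out the rest.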

Webserver: Above I explained a web request and talked briefly about web servers, but there is much more to a web server than simply serving a web request (Server: a computer that ‘serves’ web requests, ohhhhhh). A web server usually runs a UNIX-like OS (such as Linux) or Windows Server. Web servers respond to HTTP requests. HTTP defines nine standard verbs (more if you count extensions such as WebDAV), but the most common 4 are GET, POST, PUT, and DELETE. The HTTP verb and the route you are requesting tell the web server exactly what you want it to do. As I said above, the web server serves lots of GET requests for viewing a page but will also serve many PUT and POST requests for adding or updating content. PUT requests are great because they are idempotent: sending the same PUT once or five times leaves the server in the same state, no matter how many times you hit enter or click a button. But they are not well supported by browser forms, which can only send GET and POST, so Rails emulates PUT requests using a POST with a hidden _method field. I don’t have too much more on that for now. Oftentimes DELETE is also emulated using a POST request. To some extent I think Rails is trying to use more HTTP verbs, not less.
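To see how the verb and the route together decide what a server does, here is a toy in-memory "server" in Python. Everything in it is hypothetical: TinyServer, the /notes routes, and effective_verb, which only imitates the Rails habit of sending PUT and DELETE as a POST with a _method field. It is not how Rails is implemented.

```python
def effective_verb(verb, form):
    """Imitate the Rails trick: a POST carrying _method=put counts as a PUT."""
    if verb == "POST" and "_method" in form:
        return form["_method"].upper()
    return verb


class TinyServer:
    """Stores notes in memory and answers with (status code, body)."""

    def __init__(self):
        self.notes = {}
        self.next_id = 1

    def handle(self, verb, path, form=None):
        form = form or {}
        verb = effective_verb(verb, form)
        if path == "/notes" and verb == "POST":
            note_id = self.next_id        # every POST creates a new note
            self.next_id += 1
            self.notes[note_id] = form.get("text", "")
            return 201, f"/notes/{note_id}"
        prefix = "/notes/"
        if not (path.startswith(prefix) and path[len(prefix):].isdigit()):
            return 404, "no such route"
        note_id = int(path[len(prefix):])
        if verb == "PUT":                 # same PUT twice, same end state
            self.notes[note_id] = form.get("text", "")
            return 200, self.notes[note_id]
        if note_id not in self.notes:
            return 404, "no such note"
        if verb == "GET":
            return 200, self.notes[note_id]
        if verb == "DELETE":
            del self.notes[note_id]
            return 204, ""
        return 405, "verb not allowed here"


server = TinyServer()
print(server.handle("POST", "/notes", {"text": "hi"}))
print(server.handle("PUT", "/notes/1", {"text": "edited"}))
print(server.handle("POST", "/notes/1", {"_method": "delete"}))
```

Sending the same POST to /notes twice makes two notes, while sending the same PUT to /notes/1 twice leaves exactly one edited note.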

Lastly, response codes. When a web server sends a response to our request, it includes a response code that tells us the result of the request. Codes in the 200s indicate success. Codes in the 300s indicate redirection. Codes in the 400s indicate an error of some kind in the request. I’ve been seeing a lot of those lately. Whomp. Codes in the 500s mean the server itself ran into an error.
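Since the first digit of a response code tells you its family, a tiny lookup is enough to read one. This sketch leans on Python's standard http.HTTPStatus for the official phrase; status_family and explain are names I made up.

```python
from http import HTTPStatus

FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_family(code):
    """Map a response code to its family using the first digit."""
    return FAMILIES.get(code // 100, "unknown")


def explain(code):
    """Combine the standard phrase for a code with its family."""
    return f"{code} {HTTPStatus(code).phrase}: {status_family(code)}"


print(explain(200))
print(explain(404))
```

explain only works for codes that HTTPStatus knows about; status_family works for any number.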

Ok so that’s the Internet in a nutshell. I wanted to spend more time talking about the history of the Internet and where I think it’s headed, but this turned out to be quite meaty on its own. Let me know if you have any questions.

Fast and Firm, Kyle (KED)