This book is licensed under a Creative Commons by-nc-sa 3.0 license. See the license for more details, but that basically means you can share this book as long as you credit the author (but see below), don't make money from it, and do make it available to everyone else under the same terms.
This content was accessible as of December 29, 2012, and it was downloaded then by Andy Schmitz in an effort to preserve the availability of this book.
Normally, the author and publisher would be credited here. However, the publisher has asked for the customary Creative Commons attribution to the original publisher, authors, title, and book URI to be removed. Additionally, per the publisher's request, their name has been removed in some passages. More information is available on this project's attribution page.
For more information on the source of this book, or why it is available for free, please see the project's home page. You can browse or download additional books there. To download a .zip file containing this book to use offline, simply click here.
OK, we know how to read a Web address, we know that every device connected to the Net needs an IP address, and we know that the DNS can look at a Web address and find the IP address of the machine that you want to communicate with. But how does a Web page, an e-mail, or an iTunes download actually get from a remote computer to your desktop?
For our next part of the Internet journey, we’ll learn about two additional protocols: TCP and IP. These protocols are often written as TCP/IP and pronounced by reading all five letters in a row, “T-C-P-I-P” (sometimes they’re also referred to as the Internet protocol suite). TCP and IP are built into any device that a user would use to connect to the Internet—from handhelds to desktops to supercomputers—and together TCP/IP make Internet working happen.
Figure 12.4 TCP/IP in Action
In this example, a server on the left sends a Web page to the user on the right. The application (the Web server) passes the contents of the page to TCP (which is built into the server’s operating system). TCP slices the Web page into packets. Then IP takes over, forwarding packets from router to router across the Internet until it arrives at the user’s PC. Packets sometimes take different routes, and occasionally arrive out of order. TCP running on the receiving system on the right checks that all packets have arrived, requests that damaged or lost packets be resent, puts them in the right order, and sends a perfect, exact copy of the Web page to your browser.
TCP and IP operate below http and the other application transfer protocols mentioned earlier. TCP (transmission control protocol)Works at both ends of most Internet communication to ensure a perfect copy of a message is sent. works its magic at the start and endpoint of the trip—on both your computer and on the destination computer you’re communicating with. Let’s say a Web server wants to send you a large Web page. The Web server application hands the Web page it wants to send to its own version of TCP. TCP then slices up the Web page into smaller chunks of data called packets (or datagrams)A unit of data forwarded by a network. All Internet transmissions—URLs, Web pages, e-mails—are divided into one or more packets.. The packets are like little envelopes containing part of the entire transmission—they’re labeled with a destination address (where it’s going) and a source address (where it came from). Now we’ll leave TCP for a second, because TCP on the Web server then hands those packets off to the second half of our dynamic duo, IP.
It’s the job of IP (Internet protocol)Routing protocol that is in charge of forwarding packets on the Internet. to route the packets to their final destination, and those packets might have to travel over several networks to get to where they’re going. The relay work is done via special computers called routersA computing device that connects networks and exchanges data between them., and these routers speak to each other and to other computers using IP (since routers are connected to the Internet, they have IP addresses, too. Some are even named). Every computer on the Internet is connected to a router, and all routers are connected to at least one (and usually more than one) other router, linking up the networks that make up the Internet.
Routers don’t have perfect, end-to-end information on all points in the Internet, but they do talk to each other all the time, so a router has a pretty good idea of where to send a packet to get it closer to where it needs to end up. This chatter between the routers also keeps the Internet decentralized and fault-tolerant. Even if one path out of a router goes down (a networking cable gets cut, a router breaks, the power to a router goes out), as long as there’s another connection out of that router, then your packet will get forwarded. Networks fail, so good, fault-tolerant network design involves having alternate paths into and out of a network.
Once packets are received by the destination computer (your computer in our example), that machine’s version of TCP kicks in. TCP checks that it has all the packets, makes sure that no packets were damaged or corrupted, requests replacement packets (if needed), and then puts the packets in the correct order, passing a perfect copy of your transmission to the program you’re communicating with (an e-mail server, Web server, etc.).
This progression—application at the source to TCP at the source (slice up the data being sent), to IP (for forwarding among routers), to TCP at the destination (put the transmission back together and make sure it’s perfect), to application at the destination—takes place in both directions, starting at the server for messages coming to you, and starting on your computer when you’re sending messages to another computer.
TCP is a perfectionist and that’s what you want for Web transmissions, e-mail, and application downloads. But sometimes we’re willing to sacrifice perfection for speed. You’d make this sacrifice for streaming media applications like Windows Media Player, Real Player, Internet voice chat, and video conferencing. Having to wait to make sure each packet is perfectly sent would otherwise lead to awkward pauses that interrupt real-time listening. It’d be better to just grab the packets as they come and play them, even if they have minor errors. Packets are small enough that if one packet doesn’t arrive, you can ignore it and move on to the next without too much quality disruption. A protocol called UDP (user datagram protocol)Protocol that operates instead of TCP in applications where delivery speed is important and quality can be sacrificed. does exactly this, working as a TCP stand-in when you’ve got the need for speed, and are willing to sacrifice quality. If you’ve ever watched a Web video or had a Web-based phone call and the quality got sketchy, it’s probably because there were packet problems, but UDP kept on chugging, making the “get it fast” instead of “get it perfect” trade-off.
The increasing speed and reliability of the Internet means that applications such as Internet phone calls (referred to as VoIP, or voice over Internet protocolTransmission technologies that enable voice communications (phone calls) to take place over the Internet as well as private packet-switched networks.) are becoming more reliable. That doesn’t just mean that Skype becomes a more viable alternative for consumer landline and mobile phone calls; it’s also good news for many businesses, governments, and nonprofits.
Many large organizations maintain two networks—one for data and another for POTS (plain old telephone service). Maintaining two networks is expensive, and while conventional phone calls are usually of a higher quality than their Internet counterparts, POTS equipment is also inefficient. Old phone systems use a technology called circuit switching. A “circuit” is a dedicated connection between two entities. When you have a POTS phone call, a circuit is open, dedicating a specific amount of capacity between you and the party on the other end. You’re using that “circuit” regardless of whether you’re talking. Pause between words or put someone on hold, and the circuit is still in use. Anyone who has ever tried to make a phone call at a busy time (say, early morning on Mother’s Day or at midnight on New Year’s Eve) and received an “all circuits are busy” recording has experienced congestion on an inefficient circuit-switched phone network.
But unlike circuit-switched counterparts, Internet networks are packet-switched networks, which can be more efficient. Since we can slice conversations up into packets, we can squeeze them into smaller spaces. If there are pauses in a conversation or someone’s on hold, applications don’t hold up the network. And that creates an opportunity to use the network’s available capacity for other users. The trade-off is one that swaps circuit switching’s quality of service (QoS) with packet switching’s efficiency and cost savings. Try to have a VoIP call when there’s too much traffic on a portion of the network and your call quality will drop. But packet switching quality is getting much better. Networking standards are now offering special features, such as “packet prioritization,” that can allow voice packets to gain delivery priority over packets for applications like e-mail, where a slight delay is OK.
When voice is digitized, “telephone service” simply becomes another application that sits on top of the Internet, like the Web, e-mail, or FTP. VoIP calls between remote offices can save long distance charges. And when the phone system becomes a computer application, you can do a lot more. Well-implemented VoIP systems allow users’ browsers access to their voice mail inbox, one-click video conferencing and call forwarding, point-and-click conference call setup, and other features, but you’ll still have a phone number, just like with POTS.
Routers are connected together, either via cables or wirelessly. A cable connecting a computer in a home or office is probably copper (likely what’s usually called an Ethernet cable), with transmissions sent through the copper via electricity. Long-haul cables, those that carry lots of data over long distances, are usually fiber-optic lines—glass lined cables that transmit light (light is faster and travels farther distances than electricity, but fiber-optic networking equipment is more expensive than the copper-electricity kind). Wireless transmission can happen via Wi-Fi (for shorter distances), or cell phone tower or satellite over longer distances. But the beauty of the Internet protocol suite (TCP/IP) is that it doesn’t matter what the actual transmission media are. As long as your routing equipment can connect any two networks, and as long as that equipment “speaks” IP, then you can be part of the Internet.
In reality, your messages likely transfer via lots of different transmission media to get to their final destination. If you use a laptop connected via Wi-Fi, then that wireless connection finds a base station, usually within about three hundred feet. That base station is probably connected to a local area network (LAN) via a copper cable. And your firm or college may connect to fast, long-haul portions of the Internet via fiber-optic cables provided by that firm’s Internet service provider (ISP).
Most big organizations have multiple ISPs for redundancy, providing multiple paths in and out of a network. This is so that if a network connection provided by one firm goes down, say an errant backhoe cuts a cable, other connections can route around the problem (see Figure 12.1).
In the United States (and in most deregulated telecommunications markets), Internet service providers come in all sizes, from smaller regional players to sprawling international firms. When different ISPs connect their networking equipment together to share traffic, it’s called peeringWhen separate ISPs link their networks to swap traffic on the Internet.. Peering usually takes place at neutral sites called Internet exchange points (IXPs), although some firms also have private peering points. Carriers usually don’t charge one another for peering. Instead, “the money is made” in the ISP business by charging the end-points in a network—the customer organizations and end users that an ISP connects to the Internet. Competition among carriers helps keep prices down, quality high, and innovation moving forward.
When many folks think of Wall Street trading, they think of the open outcry pit at the New York Stock Exchange (NYSE). But human traders are just too slow for many of the most active trading firms. Over half of all U.S. stock trades and a quarter of worldwide currency trades now happen via programs that make trading decisions without any human intervention.H. Timmons, “A London Hedge Fund that Opts for Engineers, Not M.B.A.’s,” New York Times, August 18, 2006. There are many names for this automated, data-driven frontier of finance—algorithmic trading, black-box trading, or high-frequency trading. And while firms specializing in automated, high-frequency trading represent only about 2 percent of the trading firms operating in the United States, they account for about three quarters of all U.S. equity trading volume.R. Iati, “The Real Story of Trading Software Espionage,” Advanced Trading, July 10, 2009.
Programmers lie at the heart of modern finance. “A geek who writes code—those guys are now the valuable guys” says the former head of markets systems at Fidelity Investments, and that rare breed of top programmer can make “tens of millions of dollars” developing these systems.A. Berenson, “Arrest Over Software Illuminates Wall St. Secret,” New York Times, August 23, 2009. Such systems leverage data mining and other model-building techniques to crunch massive volumes of data and discover exploitable market patterns. Models are then run against real-time data and executed the instant a trading opportunity is detected. (For more details on how data is gathered and models are built, see Chapter 11 "The Data Asset: Databases, Business Intelligence, and Competitive Advantage".)
Winning with these systems means being quick—very quick. Suffer delay (what techies call latency) and you may have missed your opportunity to pounce on a signal or market imperfection. To cut latency, many trading firms are moving their servers out of their own data centers and into colocation facilitySometimes called a “colo,” or carrier hotel; provides a place where the gear from multiple firms can come together and where the peering of Internet traffic can take place. Equipment connecting in colos could be high-speed lines from ISPs, telecom lines from large private data centers, or even servers hosted in a colo to be closer to high-speed Internet connections.. These facilities act as storage places where a firm’s servers get superfast connections as close to the action as possible. And by renting space in a “colo,” a firm gets someone else to manage the electrical and cooling issues, often providing more robust power backup and lower energy costs than a firm might get on its own.
Equinix, a major publicly traded IXP and colocation firm with facilities worldwide, has added a growing number of high-frequency trading firms to a roster of customers that includes e-commerce, Internet, software, and telecom companies. In northern New Jersey alone (the location of many of the servers where “Wall Street” trading takes place), Equinix hosts some eighteen exchanges and trading platforms as well as the NYSE Secure Financial Transaction Infrastructure (SFTI) access node.
Less than a decade ago, eighty milliseconds was acceptably low latency, but now trading firms are pushing below one millisecond into microseconds.I. Schmerken, “High-Frequency Trading Shops Play the Colocation Game,” Advanced Trading, October 5, 2009. So it’s pretty clear that understanding how the Internet works, and how to best exploit it, is of fundamental and strategic importance to those in finance. But also recognize that this kind of automated trading comes with risks. Systems that run on their own can move many billions in the blink of an eye, and the actions of one system may cascade, triggering actions by others.
The spring 2010 “Flash Crash” resulted in a nearly 1,000-point freefall in the Dow Jones Industrial Index, it’s biggest intraday drop ever. Those black boxes can be mysterious—months after the May 6th event, experts were still parsing through trading records, trying to unearth how the flash crash happened.E. Daimler and G. Davis, “‘Flash Crash’ Proves Diversity Needed in Market Mechanisms,” Pittsburgh Post-Gazette, May 29, 2010; H. Moore, “‘Flash Crash’ Anniversary Leaves Unanswered Questions,” Marketplace Radio, May 5, 2011. Regulators and lawmakers recognize they now need to understand technology, telecommunications, and its broader impact on society so that they can create platforms that fuel growth without putting the economy at risk.
Want to see how packets bounce from router to router as they travel around the Internet? Check out a tool called traceroute. Traceroute repeatedly sends a cluster of three packets starting at the first router connected to a computer, then the next, and so on, building out the path that packets take to their destination.
Traceroute is built into all major desktop operating systems (Windows, Macs, Linux), and several Web sites will run traceroute between locations (traceroute.org and visualroute.visualware.com are great places to explore).
The message below shows a traceroute performed between Irish firm VistaTEC and Boston College. At first, it looks like a bunch of gibberish, but if we look closely, we can decipher what’s going on.
The table above shows ten hops, starting at a domain in vistatec.ie and ending in 126.96.36.199 (the table doesn’t say this, but all IP addresses starting with 136.167 are Boston College addresses). The three groups of numbers at the end of three lines shows the time (in milliseconds) of three packets sent out to test that hop of our journey. These numbers might be interesting for network administrators trying to diagnose speed issues, but we’ll ignore them and focus on how packets get from point to point.
At the start of each line is the name of the computer or router that is relaying packets for that leg of the journey. Sometimes routers are named, and sometimes they’re just IP addresses. When routers are named, we can tell what network a packet is on by looking at the domain name. By looking at the router names to the left of each line in the traceroute above, we see that the first two hops are within the vistatec.ie network. Hop 3 shows the first router outside the vistatec.ie network. It’s at a domain named tinet.net, so this must be the name of VistaTEC’s Internet service provider since it’s the first connection outside the vistatec.ie network.
Sometimes routers names suggest their locations (oftentimes they use the same three character abbreviations you’d see in airports). Look closely at the hosts in hops 3 through 7. The subdomains dub20, lon11, lon01, jfk02, and bos01 suggest the packets are going from Dublin, then east to London, then west to New York City (John F. Kennedy International Airport), then north to Boston. That’s a long way to travel in a fraction of a second!
Hop 4 is at tinet.net, but hop 5 is at cogentco.com (look them up online and you’ll find out that cogentco.com, like tinet.net, is also an ISP). That suggests that between those hops peering is taking place and traffic is handed off from carrier to carrier.
Hop 8 is still cogentco.com, but it’s not clear who the unnamed router in hop 9, 188.8.131.52, belongs to. We can use the Internet to sleuth that out, too. Search the Internet for the phrase “IP address lookup” and you’ll find a bunch of tools to track down the organization that “owns” an IP address. Using the tool at whatismyip.com, I found that this number is registered to PSI Net, which is now part of cogentco.com.
Routing paths, ISPs, and peering all revealed via traceroute. You’ve just performed a sort of network “CAT scan” and looked into the veins and arteries that make up a portion of the Internet. Pretty cool!
If you try out traceroute on your own, be aware that not all routers and networks are traceroute friendly. It’s possible that as your trace hits some hops along the way (particularly at the start or end of your journey), three “*” characters will show up at the end of each line instead of the numbers indicating packet speed. This indicates that traceroute has timed out on that hop. Some networks block traceroute because hackers have used the tool to probe a network to figure out how to attack an organization. Most of the time, though, the hops between the source and destination of the traceroute (the steps involving all the ISPs and their routers) are visible.
Traceroute can be a neat way to explore how the Internet works and reinforce the topics we’ve just learned. Search for traceroute tools online or browse the Internet for details on how to use the traceroute command built into your computer.
If you’re a student at a large research university, there’s a good chance that your school is part of Internet2. Internet2 is a research network created by a consortium of research, academic, industry, and government firms. These organizations have collectively set up a high-performance network running at speeds of up to one hundred gigabits per second to support and experiment with demanding applications. Examples include high-quality video conferencing; high-reliability, high-bandwidth imaging for the medical field; and applications that share huge data sets among researchers.
If your university is an Internet2 member and you’re communicating with another computer that’s part of the Internet2 consortium, then your organization’s routers are smart enough to route traffic through the superfast Internet2 backbone. If that’s the case, you’re likely already using Internet2 without even knowing it!