The Computer and the Internet

I. Introduction

The Internet as we know it in 1998, although vast, is still a new and developing communications technology. It is based on a number of ingenious engineering accomplishments. This paper will look at some of the most important. Any quantitative description of the Internet includes the number of networks interconnected (hence the name Internet, from internetworking), the number of computers among which electronic data can be exchanged, and ultimately the number of people who can communicate with this vast computer and network resource and also with each other.

The elements that comprise the Internet are computers and networks of computers. Being physical entities, they require, in order to perform reliably, careful design based on solid engineering principles. The Internet itself is more than the sum of its elements. It too requires careful and evolving design, based on principles similar to those for computers and networks and on some unique to the Internet.

II. The Computer

The engineering breakthrough that set the basis for much of modern life, but especially for the Internet, was the creation of a high speed, digital, program controlled, calculating and logic machine. For at least the last 350 years, efforts had been made to create mechanical devices for performing tedious arithmetic calculations. The names associated with the best known of these efforts include Blaise Pascal (1623-1662), Gottfried Wilhelm von Leibniz (1646-1716), Per Georg Scheutz (1785-1873) and his son Edvard (1821-1881), and Charles Babbage (1791-1871). These efforts depended upon and added to the technological developments of their times and so were limited to mechanical (as opposed to electric or optical) devices.

The rapid development of telecommunications in the 1920s demonstrated the operating reliability of electromagnetic relays. By means of such relays a growing level of automation was being introduced into telephone networks.
Independently, a number of people in various countries began to conceive of the use of such relays in calculators and eventually as the basis of multipurpose computing machines. The operation of the telephone relay depends upon the ability of an electric current to temporarily magnetize a metal core. Essentially, a relay is a switch whose state is open or closed depending on the absence or presence of a current in a controlling coil.

[Fig. 1 The Principle of the Electromagnetic Relay: a coil c wound around a metal core d, a hinged metal strip e, and a gap h in the circuit between points a and b]

Current will be able to flow between point a and point b (the closed state) if there is enough current in coil c to magnetize core d sufficiently to attract the metal strip e, closing gap h. Otherwise current will not be able to flow from a to b (the open state). With gap sizes in fractions of a millimeter, the state of such a relay can be changed on the order of 1000 times per second.

Thinking about the all or none, on or off, two state possibilities of simple relays suggested the use of binary digits (0,1) to represent all data and instructions, rather than, for example, decimal digits (0,1,2,3,4,5,6,7,8,9). Requiring a device to represent and respond to only two digits keeps the design of the device simple. Charles Babbage, in his efforts to build his Difference and Analytical Engines, needed wheels and gears with 10 possible distinct states or positions. Had he chosen binary instead of decimal representation, his gears and wheels would have been much simplified. There is one complication introduced by an all binary computer: it needs to have decimal-to-binary and binary-to-decimal converters to make life easier for its human users. Binary representation of numbers also allows for binary representation of the operations and simplifies what elements are needed.
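The decimal-to-binary and binary-to-decimal conversion described above is mechanical enough to sketch in a few lines. Python is used here purely as modern illustration; the function names are ours, not part of any historical machine:

```python
def decimal_to_binary(n):
    """Repeatedly divide by 2; the remainders, read in reverse, are the bits."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))
        n //= 2
    return "".join(reversed(bits))

def binary_to_decimal(bits):
    """Each bit doubles the running total and adds itself (positional value)."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total

print(decimal_to_binary(13))      # the decimal number 13 is 1101 in binary
print(binary_to_decimal("1101"))  # and back again
```

The divide-by-two procedure is exactly what a converter circuit at a computer's input would perform, so that human users can keep working in decimal.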
In 1936, in his master's degree thesis at MIT, Claude Shannon showed the link between Boolean logic (named after George Boole, 1815-1864) and electrical circuits. Three simple switching circuits are equivalent to the AND, OR and NOT operators, which can be shown with their corresponding (truth) tables: two switches in series implement AND, two switches in parallel implement OR, and a switch that breaks the circuit when activated implements NOT. Each switch can be replaced by a relay, triode or transistor.

[Fig. 2 The Three Basic Logic Operations and Their Circuit Equivalents]

Out of these three logic circuits can be built the basic arithmetic operations that comprise the functional elements of the computer. For example, a circuit for adding two binary digits is called a "half-adder" because it does not allow for a carry from a previous addition. If the inputs at A and B take on all combinations of 0s and 1s, the resulting table is the table for A+B in binary notation.

[Fig. 3 A Half-Adder Constructed from Logic Elements]

There would need to be one such set of logic elements for each digit of accuracy being calculated. So for 8 decimal places of accuracy, which corresponds to 27 binary places, an addition of two 27 binary digit numbers would require at least 27 full adders. In a similar fashion, all basic arithmetic operations can be built out of simple circuits of switches, tubes or transistors.

Konrad Zuse (1910-1995), working in Germany, was the first engineer to systematically incorporate relays into a calculating device. He built a series of machines (Z1, Z2, Z3 and Z4) which used relays progressively more exclusively. His Z3, operational in 1941, and Z4, operational in 1945, are given credit as being the first successful electric, arithmetical machines. Others working on similar machines at about the same time include George Stibitz at AT&T Bell Laboratories and Howard Aiken at Harvard with support from IBM.
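The half-adder, and the full adders chained from it, can be expressed directly in terms of the three basic operators. The following sketch (Python used only for illustration) builds the sum bit from AND, OR and NOT alone, and then ripples a carry through a row of full adders exactly as the text describes for 27-digit addition:

```python
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a

def half_adder(a, b):
    """Sum = (a AND NOT b) OR (NOT a AND b); Carry = a AND b."""
    s = OR(AND(a, NOT(b)), AND(NOT(a), b))
    return s, AND(a, b)

def full_adder(a, b, cin):
    """Two half-adders plus an OR handle the carry from a previous digit."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, OR(c1, c2)

def ripple_add(abits, bbits):
    """Add two equal-length bit lists (least significant bit first),
    one full adder per digit of accuracy."""
    carry, out = 0, []
    for a, b in zip(abits, bbits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)
    return out

# 5 + 3 = 8, with bits written least significant first
print(ripple_add([1, 0, 1], [1, 1, 0]))
```

Each call to `full_adder` corresponds to one set of logic elements; a machine accurate to 27 binary places simply repeats this hardware 27 times.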
A significant innovation began with the work of John Atanasoff and continued with work on the ENIAC at the University of Pennsylvania supervised by J. Presper Eckert and John Mauchly. That was the use of vacuum tubes (what the British call "thermionic valves") in place of electromagnetic relays. The operation of the simple triode vacuum tube is fully analogous to the operation of the electromagnetic relay, but the vacuum tube contains no moving mechanical parts.

[Fig. 4 The Principle of the Triode Vacuum Tube: an anode d, a grid g and a heated cathode c in the circuit between points a and b]

Current will flow between point a and point b (the closed state) because the heater h drives negatively charged electrons off of the cathode c, and these can reach the anode d. The electrons are attracted to the anode d because the anode is at a positive potential with respect to the cathode. However, if the grid g is given a negative potential relative to the cathode, the electrons will be repelled and will not reach the anode (the open state). The grid potential bias can be switched as fast as a million times per second. Therefore the state of the tube can be changed in a time on the order of microseconds.

The first computing device based on vacuum tubes was the Electronic Numerical Integrator and Computer. The ENIAC, incorporating more than 18,000 vacuum tube switches, was operational in 1946. What the relay and vacuum tube machines demonstrated was that high speed automatic calculating machines were feasible with the technology available in the mid 1940s. That technology was, in not too long a time, superseded by technology based on a device invented in 1947 by John Bardeen, Walter Brattain and William Shockley. The three, working at Bell Labs, produced the first transistor in December 1947.
The transistor had the same functionality as the triode vacuum tube, but in a solid state device smaller, faster, more reliable and more economical than the vacuum tube.

The operation of the ENIAC suggested a set of principles for the construction of all subsequent electronic general purpose calculating machines, what have come to be called computers. Some of the principles had been understood at least 100 years earlier by Charles Babbage and had been periodically rediscovered in the intervening 100 years by various people who set out to build a multipurpose general computing machine. The principles at the foundation of today's computers were summarized by John von Neumann in the "First Draft of a Report on the EDVAC". The EDVAC was planned as the successor to the ENIAC, with the improvement of having its instructions contained in its electronic storage or memory element. What von Neumann presented was the summary of the work and thinking he had been part of with Eckert, Mauchly and others around the ENIAC. The principles include:

1) To be an all purpose computer, a device had to operate at very high speed, and thus the elements of the device had to be electronic and not mechanical.

2) To be an all purpose computer, a device had to operate on numerical, not analog, representations of the quantities and problems involved.

3) To achieve high speed of operation and relative simplicity of design, the elements had to represent and operate on binary digits.

4) The device had to include a mechanism for the storage of data and instructions.

5) To achieve high speeds, the instructions had to be stored in the memory element.

6) To achieve relative simplicity of construction, the operations should be sequential rather than simultaneous wherever possible, using a fixed clock or pulse to time cycles.
The requirement of high speed was to fulfill the primary driving force behind the work to produce computers: to do in a short period of time what it would take an unaided person a long time to do, if it was doable at all. By 1949 IBM was marketing computers that did in 1/50 of a second a multiplication that would take an average adult 20 minutes. That is, the machine could do 60,000 multiplications in the time it would take one person to do one.

Since the computer's principle was high speed operation, feeding it instructions could not be done manually or even mechanically (by cards or tape). Since it had to receive its instructions at the same speed as it did its operations, the instructions had to be stored internally along with the data. Since both the data and the instructions were in binary digital form, they could be stored on the same memory device. Also, an internal clock or voltage pulse generator is needed to cycle the computer through its operations. The pulse marks the beginning and end of each cycle so that the operations are synchronized. For example, during one cycle one instruction is read or one partial addition is performed.

von Neumann also envisioned that instructions must be given in exhaustive detail, to be carried out completely without any need for further intelligent human intervention. The computer does nothing it has not been instructed or wired to do. It can do no calculation a person could not theoretically do given enough time. The essence of the computer is its great speed of operation. But that great speed can give the appearance of doing things people could not do. The discipline of putting together the proper instructions for a computer is called software engineering, the string of coded instructions being software. Flawless functioning is aimed for, but since there will always be inevitable errors from poorly designed software or from mechanical malfunction, human oversight of computers and intervention will be necessary.
Architecturally, a computer consists of a set of arithmetic circuits; an all purpose logical control unit; storage devices, called stores by Babbage and memory in today's usage; input devices such as keyboards and mice; output units such as oscilloscope or TV like monitors and printers; and mixed input/output devices like magnetic tape reader/writers. The computers that make possible today's Internet are descendants of the computer von Neumann summarized. But there has been a spectacular miniaturization, based on the ability to embed millions of electronic components almost at the molecular level onto wafers of silicon called microchips. Operating pulse frequencies on the order of 100 million per second are becoming available on personal computers. Also, memory technology continually advances, making possible the use of increasingly complex instruction sequences, called operating systems, that generate the functionality of the computer for the user.

III. Time-Sharing

Manufacture of computers began in the late 1940s. The first computers to become available were very large devices costing hundreds of thousands or millions of dollars. Only government bodies, large universities and the largest corporations could afford them. The standard input devices were punch card readers. The programmers, people who wrote the software, put their coded instructions and the needed data onto cards by punching holes to represent the instructions and data. Punch card readers sensed the holes and converted the information into electronic form usable by the computer circuits. Typically, each programmer submitted a deck of punched cards containing his or her program and the necessary data, and the decks were read sequentially by the computer. Sometime later the programmer could pick up his or her deck together with any printed output. This mode of scheduling computer use was known as batch processing.
These early computers acted in isolation from each other and were not operated directly by the programmers. The next breakthrough that made the Internet possible was to look at the computer in a different way. John McCarthy at MIT and others saw the need for programmers to interact with computers directly so they could try out and correct their programs more quickly. But to allow one programmer to monopolize such a large and expensive computer was inefficient. McCarthy's solution was to use the big computers efficiently in interactive mode by attaching more than one terminal to the computer. He conceived of restructuring computer design, or at least the design of the operating system (the set of instructions which controls how a computer performs its operations). Because of the great speed of computer operation, McCarthy believed it was possible to serve each terminal in turn with a short burst of operating time, so that the composite result would be the illusion for each user that he or she had sole use of the computer. This mode of computer use came to be called time-sharing.

McCarthy suggested in a 1959 memo that MIT begin to experiment with time-sharing. He predicted that all computers would one day operate that way. Engineering professor Fernando Corbato began working to test the feasibility of the time-sharing mode of computing. By 1962, he and his staff had succeeded in creating CTSS (Compatible Time-Sharing System), which became the prototype for other time-sharing experiments. Under the encouragement of JCR Licklider, who had just become director of the Information Processing Techniques Office of the Advanced Research Projects Agency (see below), time-sharing and interactive computing experiments were undertaken at a number of places. Engineers at MIT experimented with using leased telephone lines to allow professors to use their terminals from home. The users of the CTSS computer began to share files and programs and even messages among each other.
The experiment had another effect. It drew the users together into a community of mutual help, collaboration and resource sharing.

IV. Packet Switching

The success of CTSS was matched by other time-sharing experiments, making computer programming and error correction (called debugging) much more efficient. The interactive mode of computer use made computing accessible to a much larger body of users than previously. The vision of an Intergalactic Computer Network began to be spread by Licklider, summarizing the growing desire to connect up not only a few people around one computer but all the people and resources on many computers. Such connecting of computers to each other is called computer networking. If the computers are close by each other, the network is a Local Area Network (LAN). If the computers are geographically separated, their interconnection is called a Wide Area Network (WAN).

The first cross country computer network experiment was tried using the telephone network. The telecommunications technology used for telephone transmission requires setting up a path between the calling and answering parties. That path is set up by activating switches in the telephone network, either manually or automatically, and keeping those switches and the wires that connect the calling and answering parties active as a path for the duration of the call. The setting up of a dedicated path of wires and switches is called circuit switching. An experiment was conducted in 1967 connecting two time-sharing computers over a circuit switched telephone network path. The experiment seemed a success, except that the dial communications based on the telephone network were too slow and unreliable. For the short bursts of computing time allowed each computer, it took too long to set up and connect a circuit switched path. Time-sharing itself suggested the solution.
The solution found was to break up each data burst that constitutes the communication between time-shared computers into small data strings, include with each string some address and sequence information, and send the resulting data packet onto the network toward the receiving computer. The data packets that make up one message are interspersed with other such data packets as they travel on the network. The original message gets reassembled at the receiving computer end. This technology is known as packet switching. The great speed of the computers, doing as many as 100 million operations per second, coupled with the speed of transport of electric signals on wires, at 1/10 or more of the speed of light (30,000 km per second), makes the communication appear continuous even when the computers are far apart geographically. The greater efficiency of packet switching over circuit switching comes from interspersing packets so that the shared connections are not monopolized by big messages or by one or another computer. A packet switched network, like a time-shared computer, is a democratic technology in that messages of all sizes get broken up into packets which get interspersed until reconstructed at the other end.

The first large scale packet switching experiment was the ARPANET, set up to connect university and other Defense Department contractors. Beginning in 1962, the US government funded scientific and engineering research specifically in information processing and computer networking technology. The funding was distributed by the Advanced Research Projects Agency (ARPA), an interbranch agency of the US Department of Defense. The ARPANET is so named because ARPA was the source of the funding and encouragement of the experiments that resulted in the successful network. A confusion has arisen over whether the ARPANET and other research funded by ARPA had military purposes.
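The break-up, interspersal and reassembly just described can be sketched very simply. This Python fragment (an illustration of the idea, not any historical implementation) tags each small data string with a sequence number so the message can be rebuilt no matter what order the packets arrive in:

```python
import random

def packetize(message, size=8):
    """Break a message into small strings, each tagged with a sequence number."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Packets may arrive in any order; the sequence numbers restore it."""
    return "".join(data for seq, data in sorted(packets))

msg = "Packets may take different paths across the network."
packets = packetize(msg)
random.shuffle(packets)            # simulate out-of-order arrival
assert reassemble(packets) == msg  # the original message is recovered
```

Because each packet carries its own address and sequence information, the shared lines can intersperse packets from many conversations at once, which is the efficiency gain over circuit switching.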
ARPA was created in the wake of the launch in 1957 of the Sputnik earth orbiting satellite by the then Soviet Union. ARPA's mandate was, under civilian leadership, to find and encourage advanced research projects that had the promise or potential of making substantial scientific or engineering breakthroughs which in the long run would strengthen the US. The experimental work that led to the ARPANET was initiated purely as an experiment with packet switching on a relatively large scale. Contrary to military principles, all work on the ARPANET was made public, and all experimenters were encouraged, and in some cases required, to write uncensored and unclassified descriptions of their work and to collaborate with as many other researchers as possible. The open principle on which the ARPANET was based continues as the principle on which its child the Internet thrives.

The ARPANET was constructed out of a subnetwork of minicomputers called Interface Message Processors (IMPs), each connected via leased telephone lines to at least two other such computers. Each IMP was connected to a nearby local time-sharing computer called a host, to which were connected terminals (once called guests). Each host was assigned a number, with 8 binary digits allowed for that purpose. Therefore the hosts could be numbered from 00000000 to 11111111, which in decimal notation is from 0 to 255. As an experimental network it was not expected to expand beyond 256 hosts.

The agreement on how to number ARPANET hosts and how to include that information as an address in the packets intended for each host is an example of a communications protocol. A communications protocol is the set of specifications and conventions that must be observed to make communications possible. Such specifications usually include how to initiate and how to terminate a communications session, how to add an address to the communicated message, etc.
The details of the ARPANET protocols were worked out by graduate students and the staff of BBN, the contractor who produced the first IMPs. The graduate students formed a group called the Network Working Group (NWG), which was responsible for all the host-host protocols. They started a process of documenting and making public the protocols and other network specifications. The information was put into a standard form named a Request For Comment (RFC), made available first by surface mail to members of the NWG, later made available on the ARPANET itself, and today available over the Internet or by surface mail.

The success of the ARPANET was accompanied by other networking experiments. In particular, ARPA funded experiments with radio wave carried data packets and with a satellite node packet switching network. Later, in 1979, graduate students Tom Truscott, Steve Bellovin and others in North Carolina launched a users' news network, eventually called Usenet, that first traveled via telephone calls between computers. Mark Horton and others at the University of California at Berkeley devised a way for a computer that was both an ARPANET host and being used for Usenet to send messages from the ARPANET host to Usenet. This demonstrated the possibility of interconnecting networks with different characteristics. By this time ARPA was well on its way to funding research to arrive at protocols and specifications that would make possible the interconnection of many different networks. Out of that research and experimentation came the Internet.

V. The Internet

The Internet is the successful interconnecting of many different networks to give the illusion of being one big computer network. What the networks have in common is that they all use packet switching technology. On the other hand, each of the connected networks may have its own addressing mechanism, packet size, speed, etc.
Any of the computers on the connected networks, no matter what its operating system or other characteristics, can communicate via the Internet if it has software implemented on it that conforms to the set of protocols which resulted from the ARPA funded research in the late 1970s. That set of protocols is built around the Internet Protocol (IP) and the Transmission Control Protocol (TCP). Informally, the set of protocols is called TCP/IP (pronounced by saying the names of the letters T-C-P-I-P).

The Internet Protocol is the common agreement to have software on every computer on the Internet add a bit of additional information to each of its packets. Without such software a computer cannot be connected to the Internet, even if Internet traffic passes over the network the computer is attached to. A packet that has the additional information required by IP is called an IP datagram. To each IP datagram the computer adds its own network addressing information. The whole package is called a network frame. It is a network frame containing an IP datagram, rather than an ordinary packet, that a computer must send onto its local packet switching network in order to communicate with a computer on another network via the Internet.

[Fig. 5 A Network Frame: local network addressing information wrapped around an IP datagram]

If the communication is between computers on the same network, the network information is enough to deliver the frame to its intended destination computer. If the communication is intended for a computer on a different network, the network information sends the frame to the closest computer that serves to connect the local network with a different network. Such a special purpose computer is called a router (sometimes a gateway). It is such routers that make internetworking possible. The Internet is not a single giant network of computers. It is hundreds of thousands of networks interconnected by routers. A router is a high speed, electronic, digital computer very much like all the other computers in use today.
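The layering of frame around datagram can be made concrete with a small sketch. In this Python illustration (the dictionary layout is our own invention, not IP's actual header format), a router unwraps the local frame, leaves the datagram untouched, and re-wraps it for the next network:

```python
def make_datagram(src, dst, payload):
    """IP-style information added to a packet: source and destination addresses."""
    return {"src": src, "dst": dst, "data": payload}

def make_frame(network_addr, datagram):
    """A network frame: local addressing wrapped around an unchanged datagram."""
    return {"net_addr": network_addr, "datagram": datagram}

def route(frame, next_hop_addr):
    """A router removes the old network information, reads the datagram,
    and wraps it in a fresh frame for the next network."""
    datagram = frame["datagram"]        # the datagram itself is never modified
    return make_frame(next_hop_addr, datagram)

dg = make_datagram("128.59.40.130", "192.0.2.7", "hello")
f1 = make_frame("net-a:12", dg)         # frame on the first network
f2 = route(f1, "net-b:7")               # re-framed for the second network
assert f2["datagram"] is dg             # same datagram, new wrapping
```

Only the outer frame changes at each hop; this is why networks with completely different frame formats can carry the same IP datagram.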
What makes a router special is that it has all the hardware and connections necessary to be able to connect to and communicate on two or more different networks. It also has the software to create and interpret network frames for each network it is attached to. In addition, it must have the capabilities required by IP. It must have software that can remove network information from the network frames that come to it and read the IP information in the datagrams. Based on the IP information, it can add new network information to create an appropriate network frame and send it out on that different network. But how does it know where to send the IP datagram?

The entire process of Internet communication requires that each computer participating in the Internet have a unique digital address. The unique addresses of the source and destination are part of the IP information added to packets to make IP datagrams. The unique number assigned to a computer is its Internet or IP address. The IP address is a binary string of 32 digits. Therefore the Internet can provide communication among 2 to the 32nd power, or about 4 billion 300 million, computers (2 for every three people in the world). Internet addresses are written, for example, like 128.59.40.130. Each such address has two parts, a network ID and a host ID. In this example 128.59 (the network ID) identifies that this computer is part of a Columbia University network and 40.130 (the host ID) identifies which particular computer (on the cunix cluster) it is.

A router's IP software examines the IP information to determine the destination network from the network ID of the destination address. Then the software consults a routing table to pick the next router to send the IP datagram to so that it takes the "shortest" path. A path is short only if it is active and it is not congested. Ingenious software programs called routing daemons send and receive short messages among adjacent routers characterizing the condition of each path.
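The dotted-decimal notation and the network/host split can be computed directly. This Python sketch follows the document's example, where the first two numbers are the network ID and the last two the host ID (the split used for class B addresses in the addressing scheme of that era; other addresses split at different points):

```python
def ip_to_int(addr):
    """Convert dotted-decimal notation to the underlying 32-bit number."""
    a, b, c, d = (int(part) for part in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def split_class_b(addr):
    """Split an address at the class B boundary:
    first 16 bits are the network ID, last 16 bits the host ID."""
    value = ip_to_int(addr)
    return value >> 16, value & 0xFFFF

net, host = split_class_b("128.59.40.130")
# network ID 128.59 is 128*256 + 59; host ID 40.130 is 40*256 + 130
print(net, host)
```

A router needs only the network ID to choose the next hop; the host ID matters only once the datagram reaches the destination network.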
These messages are analyzed and the routing table is continually updated. In this way IP datagrams pass from router to router over different networks until they reach a router connected to their destination network. That router puts into the network frame the network information that delivers the datagram to its destination computer. The IP datagram is unchanged by this whole process. Each router has put next router information, along with the IP datagram, into the next network frame. When the IP datagram finally reaches its destination, it carries no information about how it got there, and different packets from the original source may have taken different paths to the same destination.

IP as described above requires nothing of the interconnected networks except that they are packet switching networks with IP compliant routers. If the transmitting network uses a very small frame size, the IP software can even fragment an IP datagram into a few smaller ones to fit the network's frame size. It is this minimum requirement by the Internet Protocol that makes it possible for a great variety of networks to participate in the Internet. But this minimum requirement also results in little or no error detection. IP arranges for a best-effort process but has no guarantee of reliability. The remainder of the TCP/IP set of protocols adds a sufficient level of reliability to make the Internet useful.

There are problems that IP does not solve. For example, interspersed network frames from many computers can sometimes arrive faster than a router can route them. A small backlog of data can be stored on most routers, but if too many frames keep arriving some must be discarded. This possibility was anticipated. On most computers on the Internet except routers, software behaving according to the Transmission Control Protocol (TCP) is installed. When IP datagrams arrive at the destination computer, the TCP compliant software scans the IP information put into the IP datagram at the source.
From this information the software can put packets together again if they are all there. If there are duplications, the software will discard any but the first packet to arrive. What if some IP datagrams have been lost? As a destination computer receives data, the TCP software sends a short message back over the Internet to the original source computer specifying what data has arrived. Such a message is called an acknowledgement. Every time TCP and IP software send out data, the TCP software starts a timer (sets a number and decreases it periodically using the computer's internal clock) and waits for an acknowledgement. If an acknowledgement arrives first, the timer is cancelled. If the timer expires before an acknowledgement is received back, the TCP software retransmits the data. In this way missing data can usually be replaced at the destination computer in a reasonable time.

To achieve efficient data transfer, the timeout interval cannot be preset. It needs to be longer for more distant destinations and for times of greater network congestion, and shorter for closer destinations and times of normal network traffic. TCP automatically adjusts the timeout interval based on current delays and on the distance it calculates from the network address of the destination. This ability to dynamically adjust the timeout interval contributes greatly to the success of the Internet.

Having been designed together and engineered to perform two separate but related and needed tasks, TCP and IP complement each other. IP makes possible the travel of packets over different networks, but it, and thus the routers, are not concerned with data loss or data reassembly. The Internet is possible because so little is required of the intervening networks. TCP makes the Internet reliable by detecting and correcting duplications, out of order arrival and data loss, using an acknowledgement and timeout mechanism with dynamically adjusted timeout intervals.
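The dynamic timeout adjustment can be sketched as a running average of observed delays. The following Python fragment is a simplified illustration in the spirit of TCP's method (real TCP implementations also track the variation of the samples and use carefully chosen constants; the class name and parameters here are ours):

```python
class RetransmitTimer:
    """Adaptive timeout: a weighted moving average of round-trip times.
    Each acknowledgement yields a new delay sample; the timeout is set
    to a safety margin above the smoothed estimate."""

    def __init__(self, initial_rtt=1.0, alpha=0.875, factor=2.0):
        self.srtt = initial_rtt   # smoothed round-trip time estimate (seconds)
        self.alpha = alpha        # weight given to the old estimate
        self.factor = factor      # safety margin over the estimate

    def observe(self, rtt_sample):
        """Fold one measured round-trip time into the estimate."""
        self.srtt = self.alpha * self.srtt + (1 - self.alpha) * rtt_sample

    def timeout(self):
        """How long to wait for an acknowledgement before retransmitting."""
        return self.factor * self.srtt

timer = RetransmitTimer(initial_rtt=1.0)
for sample in (0.4, 0.5, 0.45):   # acknowledgements from a nearby host
    timer.observe(sample)
print(timer.timeout())            # shrinks toward the observed delays
```

Fast acknowledgements pull the estimate down, so retransmission to a nearby host is prompt; slow acknowledgements push it up, so a distant or congested path is not flooded with needless retransmissions.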
VI. Conclusion

The Internet is a wonderful engineering achievement. Since January 1, 1983, the cutoff date of the old ARPANET protocols, TCP/IP technology has successfully dealt with tremendous increases in usage and in the speed of connecting computers. This is a testament to the success of the TCP/IP protocol design and implementation process. Douglas Comer highlighted the features of this process as follows:

* TCP/IP protocol software and the Internet were designed by talented dedicated people.

* The Internet was a dream that inspired and challenged the research team.

* Researchers were allowed to experiment, even when there was no short-term economic payoff. Indeed, Internet research often used new, innovative technologies that were expensive compared to existing technologies.

* Instead of dreaming about a system that solved all problems, researchers built the Internet to operate efficiently.

* Researchers insisted that each part of the Internet work well in practice before they adopted it as standard.

* Internet technology solves an important, practical problem; the problem occurs whenever an organization has multiple networks.

(from The Internet Book)

The high speed, electronic, digital, stored program controlled computer and the TCP/IP Internet are major historic breakthroughs in engineering technology. Every such breakthrough in the past, like the printing press, the steam engine, the telephone and the airplane, has had profound effects on human society. The computer and the Internet have already begun to have such effects, and this promises to be just the beginning. In the long run, despite the growing pains and dislocations, every great technological breakthrough serves to make possible a more fulfilling and comfortable life for more people. The computer and the Internet have the potential to speed up this process, although it may take a hard fight for most people to experience any of the improvement.
We live, however, in a time of great invention and great potential.

Bibliography

Augarten, Stan. Bit by Bit: An Illustrated History of Computers. New York. Ticknor & Fields. 1984.

Berkeley, Edmund C. Giant Brains or Machines that Think. New York. Science Editions. 1961.

Comer, Douglas E. Internetworking with TCP/IP Vol. I: Principles, Protocols, and Architecture. 2nd Edition. Englewood Cliffs, NJ. Prentice Hall. 1991.

Comer, Douglas E. The Internet Book. Englewood Cliffs, NJ. Prentice Hall. 1995.

Computer Basics: Understanding Computers. Alexandria, VA. Time-Life Books. 1989.

Hauben, Michael and Ronda Hauben. Netizens: On the History and Impact of Usenet and the Internet. Los Alamitos, CA. IEEE Computer Society Press. 1997.

Lynch, Daniel C. and Marshall T. Rose, Editors. Internet Systems Handbook. Reading, MA. Addison-Wesley. 1993.

Randell, Brian, Editor. The Origins of Digital Computers: Selected Papers. Berlin. Springer-Verlag. 1973.

Stevens, W. Richard. TCP/IP Illustrated, Vol. 1: The Protocols. Reading, MA. Addison-Wesley. 1994.

Strandh, Sigvard. The History of the Machine. New York. Dorset Press. 1989 (Copyright 1979, AB NORDBOK, Gothenburg, Sweden).

Zuse, Konrad. The Computer - My Life. Berlin. Springer-Verlag. 1993. (English translation of Der Computer - Mein Lebenswerk.)