Grok the Web

A Programmer's Guide to the New Software Development Paradigm

by Andrew Schulman


Chapter 1

The Crude Beginnings of a New Operating System

Last revised: April 15, 1997


Introduction
Why Would You Want to Do That?
sidebar: Rich Text and Email Bombs
sidebar: Shopping Carts and Cookies
URLs: Handles to Data
sidebar: ISBN
sidebar: A FORM and Its ACTION are Distinct
HTML
Distributed Computation: CGI
Creating Synapses: HTTP
A New Type of Application
A Lesson for the Software Industry

Now that web addresses such as http://www.ups.com appear on the sides of buses and trucks, and http://www.msnbc.com and www.cnn.com (and, bizarrely, www.heavensgate.com) show up on the nightly TV news, these odd-looking web incantations are almost becoming familiar and taken for granted, like phone numbers. There are probably millions of people who know the term "URL", and perhaps a million or so for whom even HTML tags such as <IMG SRC> and <A HREF> hold no terrors. I recently overheard a bus driver give out his URL to one of his passengers; he was very casual in his use of the seemingly-arcane Uniform Resource Locator. He then proceeded to discuss the use of interlaced GIF files in an HTML <IMG> tag.

Rather than laugh at the seeming incongruity of a bus driver involving himself in the details of web-page construction, I hope that you'll instead agree this isn't incongruous at all: it is great! Web programming is accessible to millions of people in a way that, say, Windows programming never was, and never could be. HTML coding is so simple that you might well dispute whether it is programming at all.

But in this chapter, I hope to have you stare at some fairly ordinary-looking HTML code, long enough to see, not only that it is programming, but also that it represents something very important: the beginnings of a new platform for software development.

To most readers of this book, the following source code for a web page likely seems no big deal:

<HTML>
<A HREF="http://www.amazon.com/exec/obidos/ISBN=1568843054">
<IMG SRC="http://www.idgbooks.com/images/smallcovers/1-56884-305-4.gif"
ALT="Click here to order"></A>
</HTML>
This web page (Example 1-1) is written, of course, in HTML, the Hypertext Markup Language. The <IMG> tag causes a web browser to display an inline image; in this case, it happens to be a GIF (Graphics Interchange Format) file of the cover of my previous book, located at the publisher's web site, www.idgbooks.com. (I'm going to be mentioning my previous book a lot in this chapter, but merely as an example that I know intimately; many other examples would work just as well, and I don't intend this as any sort of advertisement for Unauthorized Windows 95 -- especially when, as you'll see, I now find the intricacies of Windows programming a lot less interesting and important than I once did.)

If someone clicks on the image, the <A HREF> tag takes them to an order form for the book at an online bookstore, www.amazon.com.

As I said, this HTML code is No Big Deal. And when displayed by a web browser, it certainly doesn't look like much either. It's just a picture of a book cover; if the mouse hovers over the picture, the note "Click here to order" (from the <IMG ALT> attribute) briefly appears, at least in some browsers (see Figure 1-1). If the reader has turned off graphics in their browser, that note is all that appears.

Figure 1-1: from
http://www.sonic.net/~undoc/book/ex11.html
Figure 1-1

But look again. This HTML page is located on one machine (www.sonic.net); the image is located on a second machine (www.idgbooks.com); clicking on the image takes you to a third machine (www.amazon.com). Using any number of tools (described later in this chapter), one can readily find out that one of these machines is running the NCSA/1.5.2 web server, another is running Apache/1.1.1, and the third is running Netscape-Commerce/1.12. They have little relation to each other. Yet they have been lashed together to form some sort of odd hybrid.
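How does one "readily find out" what server software a site runs? A web server will often announce itself in the Server: header of its reply, so a tool need only ask. The sketch below (in Python, with a canned response standing in for the network so it is self-contained; the real version would open a TCP connection to the host on port 80) shows the idea behind such tools:

```python
# Sketch of a server-identification tool. A canned response stands in for
# an actual network round trip, so the example is self-contained.

def head_request(host, path="/"):
    # HEAD asks for a resource's headers only, with no body
    return "HEAD %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)

def server_header(response_text):
    # Scan the response headers for the Server: line
    for line in response_text.split("\r\n"):
        if line.lower().startswith("server:"):
            return line.split(":", 1)[1].strip()
    return None

canned = ("HTTP/1.0 200 OK\r\n"
          "Server: Apache/1.1.1\r\n"
          "Content-Type: text/html\r\n"
          "\r\n")
print(server_header(canned))  # prints: Apache/1.1.1
```

Services such as the Netcraft survey (used later in this chapter) do essentially this on your behalf.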

That all these comprise a single hyper-document may not be apparent from this example. After all, a reader must explicitly click on the image to go to the order form. So perhaps the following variation (Example 1-2) will make clearer what is going on here:

<HTML>
<FRAMESET FRAMEBORDER=no BORDER=0 COLS=135,*>
<FRAME SRC="http://www.idgbooks.com/images/smallcovers/1-56884-305-4.gif">
<FRAME SRC="http://www.amazon.com/exec/obidos/ISBN=1568843054">
</FRAMESET>
</HTML>
Using the <FRAMESET> and <FRAME> tags introduced by Netscape, and now supported by other web browsers such as the Microsoft Internet Explorer (MSIE), this small HTML page on one machine visually joins together the components from the other two machines. As can be seen in Figure 1-2, the "FRAMEBORDER=no BORDER=0" attributes make the join seamless. The "COLS=135,*" allocates space for the image in the first (borderless) frame; the amazon.com order form nicely flows into whatever space we've left over for it in the second frame. {{This works less smoothly since amazon.com redesigned its site: its pages now lay themselves out with TABLE (as a GETURL inspection instructively shows -- they do not use FRAME), leaving less real estate in our frame for the book description. The older, simpler layout is still reachable by putting "change-style" in the URL before the ISBN: /change-style/ISBN turns the graphics off, and change-style/ISBN/t/ turns them back on. So the example still works.}}

Figure 1-2: from
http://www.sonic.net/~undoc/book/ex12.html
Figure 1-2

Thus, two components from two different vendors, with no coordination between them, have been happily married together -- or, at least, hustled together in a shotgun wedding. How can this work without explicit coordination? In a way, that's the subject of this book, but briefly the answer is: industry standards. So long as these vendors run software that adheres to standards such as the Hypertext Transfer Protocol (HTTP), things work out pretty well, most of the time.

To put it one way, what these few lines of HTML represent is a "compound document," with its components located on different machines, running different brands of web server, and probably running different operating systems as well. Furthermore, as we'll see later, one of the components is not even a file, but the dynamically-generated output of a program. Thus, we not only have a compound document in Example 1-2, but also distributed computation: both of these were supposed to be "hard" problems, yet here they appear in the form of hypertext/hypermedia links, which even many non-programmers feel comfortable constructing -- which my bus driver enjoys constructing.

To put it another way, we have here the crude beginnings of a platform for a new type of software. Ok, make that very crude, because as we'll see there are plenty of problems. And those industry standards are odd things: every vendor seems to have its own version. But we really are witnessing the beginnings of a new type of software: less generic than the software we've become used to, tailored to a specific task and a specific body of content, with more emphasis on that content; software that is truly "document-centric," to use a phrase that Microsoft popularized but could never seem to follow through on. And the ratio of effort to effect seems amazingly low, at least to someone coming from a C and Windows programming background, where, notoriously, it took one hundred lines of code just to display the string "hello world!"

Why Would You Want to Do That?

But I'm getting ahead of myself here. Many computer books, including the ones I've written, tend to explain how to do something, without ever explaining why you would want to do it in the first place. In my books on Windows programming, I would explain how to call undocumented Windows functions such as PrestoChangoSelector() or TabTheTextOutForWimps() (both genuine names, by the way). A natural question would be "Why would you want to do that?" Indeed, it was sometimes difficult to explain why someone would want to call a function named TabTheTextOutForWimps(). A lot of times, I knew about solutions, without knowing what problems they were good for.

Now I'm writing about very widely documented HTML tags such as <IMG> and <FRAME>, but all the same, "Why would you want to do that?" remains a good question. So what if two separate documents can be lashed together with frames into something that looks like a single document? Perhaps this is just another solution in search of a problem, another hammer to which everything looks like a nail?

Well, this time around I actually had the problem before I knew the solution. Let me step back and explain.

One of the frustrations of being an author is that there are always people who apparently would like to buy your book, but who can't seem to find it in any bookstore. When you're an author on email, you hear about it directly, like: "I want to get your book, but none of the bookstores near me carry it. Could you tell me where to find it, or better yet, send me a copy? I'll send you a check by return mail." I receive a fair number of emails like this: as if the publisher's warehouse were located in my garage! As if an author has anything to do with selling his or her own book. That's the publisher's job, right? Well, as we'll see, the web has a way of turning this sort of assumption on its head. In fact, these customers were absolutely right to ask me; now I need to support them.

I got tired of sending out individual emails telling people about different places they could try to find my previous book, so I decided to put up a web page with this information. At first, the page just had some names of likely bookshops, with their web addresses (Example 1-3):

{{B&N new site accessible via ISBN? Time, 14 April 1997; no, only AOL: http://biz.yahoo.com/prnews/97/03/18/aol_bks_x_1.html}}

<HTML>
A few places online where you can purchase <I>Unauthorized Windows 95</I> and related books:
<UL><LI><A HREF="http://www.amazon.com/exec/obidos/ats-query-page">amazon.com</A>
<LI><A HREF="http://www.books.com/scripts/search1.exe">Book Stacks</A> 
<LI><A HREF="http://www.compubooks.com/bin/shop/compubooks/e/shop.html#search">CompuBooks</A> 
<LI><A HREF="http://www.staceys.com">Stacey's</A> (San Francisco CA)
<LI><A HREF="http://www.softproeast.com/">Softpro</A> (Burlington MA; Denver CO)
<LI><A HREF="http://www.clbooks.com/cgi-bin/searchform">Computer Literacy</A> (San Jose CA)
<LI><A HREF="http://www.quantumbooks.com/find.html">Quantum Books</A> (Cambridge MA)
<LI><A HREF="http://www.powells.portland.or.us/cgi-bin/mk-search.pl">Powell's Technical Books</A> (Portland OR)
<LI><A HREF="http://www.compbook.co.uk/cgi-win/search.exe/single">Computer Bookshop</A> (London)
<LI><A HREF="http://secure.bookshop.co.uk/search.htm">Internet Book Shop</A> (UK)
<LI><A HREF="http://www.lmet.fr/Rechercht.html">Le monde en 'tique</A> (Paris)
<LI><A HREF="http://www.buchkatalog.de/kod-bin/isuche.exe?lang=deutsch&dbname=Buchkatalog&PARAM=LNKUSERID&Aktion=Suche">KNO-K&V Buch Katalog</A> (Germany)
<LI><A HREF="http://www.hotline.com.au/hotkey.html">Hotline Books</A> (Australia)
<LI><A HREF="http://www.yahoo.com/Business_and_Economy/Companies/Books/Computers/">
Other online stores for computer books</A> (from Yahoo!)
</UL>
</HTML>
I'd email them the web address of this page. If more people used HTML-enabled mail readers like the Netscape Messenger Mailbox, or if more vendors would support HTML-based email (as Microsoft does in Outlook Express 4.0), I could email them the actual web page, as shown in Figure 1-3: note how my little email message is joined to the contents of the web page I'm sending, and how all the links in the email message are clickable.

Figure 1-3: ex13.html in mail message
Figure 1-3

Rich Text and Email Bombs

HTML-based email is one example of how HTML is becoming a "lowest common denominator" file format; perhaps it will eventually replace plain-vanilla ASCII. The possibilities for "richer" text bring with them additional worries, however.

For example, the ability to construct "unsafe URLs" (see chapter 2), coupled with the implicit URL loading that's triggered by the <IMG> tag, means we'll not only be seeing rich-text email, but also email "bombs." As a particularly nasty example, Windows users might consider the following line of HTML:

<IMG SRC="file://AUX">

A web browser or HTML-enabled email reader, on encountering the <IMG SRC> tag, will try to load the image whose URL is given in the SRC= attribute. A file:// URL points to a file on the recipient's local file system (again, see chapter 2). Unfortunately, Microsoft's MS-DOS Programmer's Reference notes that "If the auxiliary device is not present or not ready to receive or send data, a program that reads or writes data to the device may hold indefinitely." Since someone receiving this one-line HTML page is unlikely to have an AUX device ready to deliver data, much less an image, "hold indefinitely" is precisely what the browser or emailer does. In Windows 95, this can halt everything, leaving the unfortunate recipient with no choice but to turn their machine off and on again. Given that the Windows 95 file system is aggressively cached, this can cause serious data loss. Richard Smith of Phar Lap Software has been beating the drums about this one, and hopefully PC browser vendors will do something to plug this nasty security hole. (Yes, an HTML page's ability to crash the user's browser, and possibly the operating system, is a security hole.)

Add the facts that HTML can now include JavaScript code, and that the <APPLET>, <EMBED> and <OBJECT> tags can automatically load some piece of ("safe," yeah, right) binary code, and you might think twice about the mere act of opening an email from an unknown or untrusted source in an HTML-enabled mail reader, because merely opening such a document means executing the unknown code.

{{Not to be confused with "Good Times" email virus hoax/urban legend. See http://www.physics.uiuc.edu/~weitzen/humor/hoax.html, an FAQ which includes: "Is an email virus possible? The short answer is no, not the way Good Times was described. The longer answer is that this is a difficult question that's open to nitpicking." Goes on to acknowledge that "There are some email programs that can be set to automatically download a file attachment, decode it, and execute the file attachment. If you use such a program, you would be well advised to disable the option to automatically execute file attachments." That's essentially what we're dealing with here.}}

HTML may well become the ASCII of the 21st century (the first few years of it, anyway), and that will mean far more "powerful" documents, but blurring the old false division between documents and programs does come at a price: increased risk. Only those who believe in Santa Claus and free lunches will be surprised at this. Perhaps HTML-based email readers (and all HTML browsers, for that matter) will require an option to disable all implicit loading (IMG, FRAME, APPLET, etc.) and instead present the user with a selection menu.

{{http://www.infoworld.com/cgi-bin/displayTC.pl?/reviews/970324antivirus.htm: Norton AntiVirus for Internet Email Gateways; description at http://www.symantec.com/nav/wpnavieg.pdf}}

From the links on this page, I figured people could easily find the book. About the only assistance I provided was to try to give them a link for each bookstore's search form, rather than for the bookstore's home page (for example, http://www.compbook.co.uk/cgi-win/search.exe/single rather than just http://www.compbook.co.uk). Not exactly door-to-door service; it was more like dropping them off at the corner, and waving vaguely in the direction of my book.

Still, it seemed rather obvious how to fill out a search form at an online bookstore. Figure 1-4, for example, shows part of the search form at Le monde en 'tique, a wonderful computer bookstore in Paris:

Figure 1-4: from http://www.lmet.fr/Rechercht.html
Figure 1-4

Filling out a form like this generally brings you to a list of books by an author you've specified, or of books whose title includes words you've specified. From there, you click on the book you want, and you're taken to a page with an order form, and sometimes with reviews or descriptive material. For example, Figure 1-5 shows part of a page for my book at the Computer Literacy chain of bookstores in Silicon Valley; back in Figure 1-2, the right frame shows a similar page from amazon.com's large online bookstore.

Figure 1-5: from http://www.clbooks.com/cgi-bin/displayinfo
Figure 1-5

Having just said that such pages typically contain an "order form," you can see from both Figure 1-2 and Figure 1-5 that, in fact, there's really just a button that says "Add this book to your shopping basket" or "Put in shopping cart," or perhaps something cute like "Put this in my book bag."

Shopping Carts and Cookies

Why isn't there an order form located on the page for each book? Because all these online stores want to make it relatively easy for you to buy more than one book at a time. This requires the ability to browse around, select a book or video cassette or CD or whatever to buy, browse around some more, pick more items to buy, and when you're done shopping, to proceed to the check-out counter.

This presents an interesting technical problem, because HTTP, the Hypertext Transfer Protocol upon which much of the web is currently based, is a "stateless" protocol: the server treats each request as a self-contained transaction, keeping no memory ("state") of any earlier request from the same client. A client, such as a web browser, connects to a server, asks for a page, gets the page, and then in HTTP/1.0 is even disconnected by the server. Even if the server does keep the connection open for additional requests from the client (as in HTTP/1.1 and the "Connection: Keep-Alive" option of HTTP/1.0), the server is still not supposed to "remember" anything from one request to another. Meanwhile, a virtual shopping cart clearly requires memory, or "state": whatever you put in the cart must somehow survive from one request to the next.

By now, the solution to the shopping-cart problem is common, and shows the flexibility of Internet standards. An application can either carry "state" around in hidden fields of an HTML form (chapter 4 will discuss "<INPUT TYPE=hidden>" in detail), or it can use the "persistent cookies" solution introduced by Netscape (and now supported by other vendors, including Microsoft). "Cookies" are really just an extension to HTTP headers. So shopping carts did not require a major overhaul of web standards; they could be built right on top of stateless HTTP, by making clever use of HTML or by adding a few new HTTP headers.
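Both tricks amount to the same thing: the server hands the client a token, and the client hands it back with each later request. A sketch in Python, with hypothetical field and cookie names (real stores pick their own):

```python
# Two ways to fake "state" on top of stateless HTTP. The "cart-id" name
# is hypothetical; the value is the sort of transaction ID a store assigns.

def hidden_field(name, value):
    # State tucked into the HTML form itself; the browser sends it back
    # (unseen by the user) when the form is next submitted
    return '<INPUT TYPE=hidden NAME="%s" VALUE="%s">' % (name, value)

def set_cookie_header(name, value):
    # State as an extra HTTP response header (Netscape's extension) ...
    return "Set-Cookie: %s=%s; path=/" % (name, value)

def cookie_header(name, value):
    # ... which the browser then echoes back on every later request
    return "Cookie: %s=%s" % (name, value)

cart = "2503-2540908-490803"   # a transaction ID, as in Figure 1-6
print(hidden_field("cart-id", cart))
print(set_cookie_header("cart-id", cart))
print(cookie_header("cart-id", cart))
```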

{{Privacy concerns over cookies: http://www.news.com/News/Item/0,4,8770,00.html.}}

{{Hidden form fields regarded as kludge: Orfali/Harkey, "Client/Server Programming with JAVA and CORBA," ch. comparing to CGI, p. 226: "the kludge is to use hidden fields within a form to maintain state on the client side.... in essence, the CGI program stores the state of the transaction in the forms it sends back to the client instead of storing it in its own memory. What do you think of this workaround? We did warn you that it was going to be a real work of art." But what actually so bad? Must be the sheer simplicity they dislike?}}

Hidden fields and cookies paper over HTTP's statelessness, but statelessness is hardly the protocol's only well-known problem. It could be argued that HTTP has become a victim of its own success: it was too simple and didn't "scale well," is not a "good network citizen," and is now in part responsible for the "World Wide Wait." According to http://www.w3.org/pub/WWW/Protocols/HTTP/1.0/HTTPPerformance.html, "The effects of HTTP/1.0's use of TCP on the Internet have resulted in major problems caused by congestion and unnecessary overhead." Yet the solutions are by no means obvious. Another paper, http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Pipeline.html, reports on tests done with persistent connections, pipelined requests, and data compression, using as sample data a combination of the Microsoft and Netscape home pages ("Microscape"!). The results are somewhat surprising; for example: "An HTTP/1.1 implementation that does not implement pipelining will perform worse (have higher elapsed time) than an HTTP/1.0 implementation using multiple connections." In other cases, the improvements were merely modest. While completely new binary protocols such as HTTP-NG have been proposed, any successor to HTTP is likely to be a text-based superset of HTTP.
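To see what "pipelined requests" means at the byte level, note that an HTTP/1.1 client may write several requests back-to-back on one connection, without waiting for the first response. A sketch (the host and paths below are hypothetical):

```python
# Pipelining at the byte level: several GET requests in one buffer,
# written on one connection. Under HTTP/1.1 the connection is persistent
# by default, so no "Connection: Keep-Alive" header is needed.

def pipelined_requests(host, paths):
    # Build a single buffer holding one GET request per path
    buf = ""
    for path in paths:
        buf += "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (path, host)
    return buf

buf = pipelined_requests("www.example.com", ["/index.html", "/logo.gif"])
print(buf.count("GET "))  # prints: 2 -- two requests, one write
```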

Incidentally, if you want to learn more about web technologies such as cookies or HTTP/1.1, a good place to start is the web itself. In particular, the Yahoo! site has imposed some organization on the chaos of the web, by introducing logically-structured URLs. For example, you can find some key links to information about cookies at:

http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/HTTP/Protocol_Specification/Persistent_Cookies/

This page includes a link to a tutorial "How to make cookies and shopping cart" (http://www.ids.net/~oops/tech/make-cookie.html), by a company that has made a business out of supplying shopping-cart software to other vendors on the web (http://www.rent-a-cart.com).

Figure 1-6 shows part of a shopping basket at amazon.com. (The number "2503-2540908-490803" in the URL is not a credit card number, incidentally, but a transaction ID assigned automatically by the store.) Somewhat awkwardly, you can use your browser's "Back" button to step back and buy more books. Eventually, you say you're done, and you press a button that says something like "Buy items now." In the case of amazon.com and other stores, this takes you to a secure server (note the "https://" in the URL in Figure 1-7), which provides a form into which you can enter your credit-card number (though, as seen in Figure 1-7, it's also easy enough to instead phone in your credit-card number). When a URL contains https:// rather than http://, the Secure Sockets Layer (SSL) encrypts all inbound and outbound packets; we'll look at SSL in more detail later.

Figure 1-6: from http://www.amazon.com/exec/obidos/shopping-basket
Figure 1-6

Figure 1-7: from https://www.amazon.com/exec/obidos/order2
Figure 1-7

If you've made it this far (it's definitely a more complex process than one would like), you enter the shipping address, press a last confirmation button, and you're done. You soon receive an email confirmation, your credit card is debited, you receive another email when the items are shipped (this email will often include a package-tracking number; see below), and with any luck, a couple of days later you receive the items themselves. Sometimes I've had two-day turnaround (order over the web on Monday night, and the package arrives Wednesday); other times I've had to wait six weeks.

I've tried to give a fairly complete picture here of the user's experience of online commerce. It's definitely more complicated than it should be: all those fields to enter and buttons to press. Using the browser's Back button as a major navigational device feels unnatural. Still, online shopping often beats a trip to the local bookstore or CD shop (for example, you can search for books whose title you only vaguely remember, without trying to get an overworked clerk to help you).

More important, from my perspective as an author trying to support my books online, I could tap into this whole process with just a few lines of HTML. As Figure 1-8 shows, it can even be made to appear seamlessly as part of my own page (the entire right side is part of the form for sending credit-card information to amazon.com's secure server; incidentally, if you're wondering why the "https://" URL does not show up in the browser's location bar, see the sidebar on "Naive URL Re-Use" in chapter 2).

Figure 1-8: from http://www.sonic.net/~undoc/book/ex12.html
Figure 1-8

So why would I, someone who has messed around for the past ten years in the hidden, undocumented internals of MS-DOS and Windows, want to get involved with such a high-level, almost non-programming activity as HTML coding? It was things like those credit-card images in Figure 1-7 that did it. Once I saw how easy it was to tap into the hard work that others had done, and adapt it for my own applications, I was hooked. It became clear that web sites have the potential to be reusable software components (see chapter 5, "The Tools Approach to the Web").

True, the results are somewhat primitive, and the web has many problems. But these only make things more interesting: problems are, of course, merely opportunities in disguise -- so long as the fundamentals work well. The display in Figure 1-8, enabled by the few lines of code in Example 1-2, shows, I think, how well the fundamentals of the web do indeed work.

If you've developed software before, you should be absolutely thrilled, amazed, and excited that the ridiculously simple code in Example 1-2 produces the display in Figure 1-8. True, the code for the credit-card form on the right side of Figure 1-8 appears nowhere in Example 1-2; all I've done is link to someone else's code, which I've neglected to show. Am I therefore minimizing the actual amount of work that's required to use HTML to build an application such as the one shown in Figure 1-8? Not at all. Aside from the fact that building forms in HTML is trivial, the real point is precisely that I didn't have to concern myself with the work involved in producing the form: it behaves as a truly reusable software component, even re-formatting itself to fit the frame into which I've coerced it.

URLs: Handles to Data

As you'll recall, my attempt to nudge along sales of my book on the web had got as far as posting the list of bookstores in Example 1-3. As I said, I would take potential customers and drop them off at the corner, and from there they could hopefully find their own way to my book.

At an online store called Book Stacks Unlimited, though, I noticed this at the bottom of the page for one of my books:

Link URL = http://www.books.com/scripts/view.exe?isbn~1568843054

The phrase "Link URL" was highlighted; clicking on it produced this explanation:

If you have a reference about a book on your Web site, you can now link it to our bookstore. Why would you want to do this? Well if you're a publisher or an author, it's the perfect way to sell your titles online without having to deal with any of the ordering or shipping details.

But what if you're neither the publisher or author, and you just happen to have put something up on your site about a good book? Partner with Book Stacks and we will share 8% of all of the resulting sales with you.

I'll get into that 8% business later; for now, I just want to focus on the fact that, using the ISBN number (see below), one could link into the store's database, directly to the page for almost any book in print. How odd, that each book in print had its own "home page," so to speak, on the web. One could have a link, not just to a bookstore, but directly to an order form for a specific book. I didn't have to drop potential customers off at the corner; I could give them door-to-door service.

Perhaps this all seems obvious, but to me it was a revelation. I've seen others have this same sort of blinding revelation, the day it dawns on them how powerful URLs really are. Note that it takes only about 50 bytes of text to pinpoint my book. This same pointer works from any web-capable machine in the world. The HTML in Example 1-1, which additionally pinpoints a graphic of the book, located on a completely different machine, takes a total of perhaps 200 bytes. Once you have laboriously found your way to some resource on the web that you want to reuse, it is frequently trivial to then save away a handle to that specific resource: not only save it away as a "bookmark" in your web browser (yawn), but far more significantly, incorporate it as a link, image, form, or frame into your own projects.

Now, I know there are notorious problems with URLs, and we'll examine these in chapter 2, but first take a moment to consider how well they often do work. The following line is quite amazing, if you stop to think about it:

Buy it!

<A HREF="http://www.books.com/scripts/view.exe?isbn~1568843054">Buy it!</A>

Nor was this ability to link directly into very specific parts of their database unique to the Book Stacks site. Several other online bookstores provided the same capability. ISBN linking didn't work at every store listed in Example 1-3, but it worked at enough of them that I could now drive potential buyers directly to the book at several different stores:

http://www.amazon.com/exec/obidos/ISBN=1568843054

http://www.viamall.com/softpro/1-56884-305-4.html

http://www.hotline.com.au/cgi-bin/title_search?keyword=1568843054 (Australia)

You'll immediately notice that, while all four of these URLs refer to the same ISBN, 1-56884-305-4, the syntaxes are otherwise quite different. Evidently, there are few rules for the formation of URLs, once you get past the protocol://hostname part. Each site has set up its system differently. This is to be expected, but it means that rules learned for one system don't help much when trying to link into a different one. As we'll discuss in more detail in chapter 2, "educated guesses" can be a poor way of generating URLs.
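One way to cope with this variety is to treat each store's URL syntax as a template keyed by ISBN. A sketch, using the store URLs quoted above (treat the patterns as a snapshot: any site can change its URL scheme without notice):

```python
# Each store's URL syntax, captured as a template keyed by ISBN.

def hyphenate_isbn(isbn10):
    # One store wants the hyphenated form; this particular split
    # (group-publisher-title-check) is specific to this publisher prefix
    return "%s-%s-%s-%s" % (isbn10[0], isbn10[1:6], isbn10[6:9], isbn10[9])

TEMPLATES = {
    "amazon.com":  "http://www.amazon.com/exec/obidos/ISBN=%(plain)s",
    "Book Stacks": "http://www.books.com/scripts/view.exe?isbn~%(plain)s",
    "Softpro":     "http://www.viamall.com/softpro/%(hyphens)s.html",
}

def store_urls(isbn10):
    forms = {"plain": isbn10, "hyphens": hyphenate_isbn(isbn10)}
    return dict((store, tmpl % forms) for store, tmpl in TEMPLATES.items())

for store, url in store_urls("1568843054").items():
    print("%s: %s" % (store, url))
```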

In fact, one might suspect that the efficiency of the URLs for my book, and what little regularity they have, owe more to the ISBN numbering scheme (see sidebar) than to the URL scheme.

ISBN

The International Standard Book Number (ISBN) consists of ten digits and is made up of four parts: Group identifier, Publisher identifier, Title identifier, and Checksum digit. (The Group identifier, assigned by the ISBN Agency in Berlin, stands for language and country groups; for instance, English-speaking countries use 0 or 1.) {{Is ORA prefix 1565 or 15659?}} The ISBN system was established in 1968; according to http://www.reedref.com/Standards/isbn.html, "virtually every item sold in a bookstore requires an ISBN as increasing numbers of publishing systems base their entire inventory on the ISBN."
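The checksum digit, incidentally, can be verified mechanically: each of the first nine digits is weighted from 10 down to 2, and the check digit brings the weighted sum up to a multiple of 11, with the letter X standing in for a check value of ten. A quick sketch:

```python
# ISBN-10 check-digit computation: weight the first nine digits 10 down
# to 2; the check digit makes the total a multiple of 11 ('X' = ten).

def isbn10_check_digit(first_nine):
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

print(isbn10_check_digit("156884305"))  # prints: 4 -- so 1-56884-305-4 checks out
```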

If it seems remarkable that each book in print has an address on the web, it's additionally noteworthy that www.amazon.com recently appears to have added about a million out of print books to its catalog. This includes even some books published before the introduction of ISBN numbers in 1968: for example, http://www.amazon.com/exec/obidos/ISBN=0849534313 has a publication date of June 1923!? I asked info@amazon.com about this, and was told: "We used a variety of data sources and selected only books for which ISBNs were available as a method of discriminating easier to fill out of print titles. If we had chosen books w/o ISBNs, the task would be substantially harder to acquire and receive these titles."

{{Placed order for some OOP books on March 17, as of XXX hadn't received any notice. Placed orders for these same books with ABE on April 20, next day received email about three of them: two (Edelman, Disney) being sent to me, another not available (Boktin; but other copies at ABE). B&N also says carry OOP books? Time, 17 April 1997}}

{{Books out of print at http://www.reedref.com/bop/. The Lane Egypt ISBN comes from Bowker, see http://www.reedref.com/bop/BOPformsrch.html}}

{{OOP books at http://www.abebooks.com/cgi/abe.exe/routera^_pr=inventoryKeys^phase=1; listed in http://www.bookwire.com/index/antiquarian-booksellers.html. Looks excellent! Also http://www.antiquarian.com/bookworm/: Antiquarian BookWorm. Also InterLoc (http://daniel.interloc.com): last week went "public access," previously had been limited to dealers and libraries. Huge database! Placing search form in one frame, results in another (including email form) makes interactive.}}

But it really isn't just items with ISBN numbers that work this way. As another example, I mentioned earlier that after one of these online stores ships an item to you, it will typically send you an email containing a package-tracking number. If the store ships via United Parcel Service (UPS), for example, you can go to the form at http://www.ups.com/tracking/tracking.html, enter the package-tracking number, and find out where your package is ("what do they mean, they already delivered it?! oh, there it is, under the newspapers on the back porch"). The HTML source for the form looks like this:

<!-- from http://www.ups.com/tracking/tracking.html -->
<FORM METHOD="POST" ACTION="http://wwwapps.ups.com/tracking/tracking.cgi">
<P><I>Tracking number:</I> <INPUT TYPE="TEXT" NAME="tracknum" SIZE="40"></P>
<P><INPUT TYPE="submit" VALUE="Track this package"> <INPUT TYPE="reset" VALUE="Clear form and start over"></P>
</FORM>
In other words, this form takes the text field named "tracknum", and POSTs it to http://wwwapps.ups.com/tracking/tracking.cgi. That's it.

The question is, can we dispense with this form and instead construct a URL directly to our package? For reasons we'll get into in chapter 2, taking data meant to be POSTed and instead tacking it onto the end of the URL (the equivalent of METHOD=GET) doesn't always work. In this case, however, it does; you can place the tracking number directly in a URL:

Where's my package?!

<A HREF="http://wwwapps.ups.com/tracking/tracking.cgi?tracknum=1Z742E220310270799"> Where's my package?!</A>
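Constructing such a URL by hand amounts to appending a URL-encoded name/value pair to the form's ACTION address. Here is a minimal sketch in Python; the address and the "tracknum" parameter name come straight from the UPS form shown above, and the tracking number is just the example from the text:

```python
# Build a "tracking URL" by hand: ACTION address + "?" + encoded form data.
from urllib.parse import urlencode

base = "http://wwwapps.ups.com/tracking/tracking.cgi"
params = {"tracknum": "1Z742E220310270799"}
url = base + "?" + urlencode(params)
print(url)
# http://wwwapps.ups.com/tracking/tracking.cgi?tracknum=1Z742E220310270799
```

This is exactly the transformation from METHOD=POST to METHOD=GET: the same data, moved from the request body into the URL itself.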

Figure 1-9

{{Fedex similar, e.g., http://www.fedex.com/cgi-bin/track_it?trk_num=4364020496&dest_cntry=U.S.A.&ship_date=040797}}

A FORM and Its ACTION are Distinct

You may be wondering why the form is located at http://www.ups.com, yet the URL I manufactured goes to http://wwwapps.ups.com. These really are different machines: http://www.netcraft.com/cgi-bin/Survey/whats?host=www.ups.com&port=80 reveals that www.ups.com is running Netscape-Enterprise/2.01, while http://www.netcraft.com/cgi-bin/Survey/whats?host=wwwapps.ups.com&port=80 reveals that wwwapps.ups.com is running Netscape-Communications/1.1.

Note that the location of a form -- the "front-end" or user interface to a program -- and the location of the program that acts as the "back end" to this form, need have nothing in common. Because the ACTION= attribute of a <FORM> tag often contains a relative URL, it is frequently forgotten that ACTION= can be a full URL. This is an important point for chapter 4, "Snarfing Forms," which will show how a form at one site can act as the user interface for software running at a completely different site.

For now, it's worth mentioning one example: if METHOD=POST, a form's ACTION can even be a "mailto:" URL; submitting the form then simply sends an email message to the specified address. For example:

<FORM ACTION="mailto:andrew@ora.com" 
	ENCTYPE="application/x-www-form-urlencoded"
	METHOD=POST>
var1: <INPUT TYPE=text NAME="var1">
var2: <INPUT TYPE=text NAME="var2"> <INPUT TYPE=submit> </FORM>
Typing "this is a test" into the var1 text field, and "666" into the var2 text field, will result in something like the following mail message:
From: John Doe 
X-Mailer: Mozilla 4.0b2 (Win95; I)
MIME-Version: 1.0
To: andrew@ora.com
Subject: Form posted from Mozilla
Content-type: application/x-www-form-urlencoded

var1=this+is+a+test&var2=666
While this is actually sometimes useful -- you can have a form POST data to yourself to see exactly how the posted data would look to a program -- the point here is merely that, quite obviously in this example, the front-end form and its back-end ACTION are distinct.
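The body of that mail message is ordinary application/x-www-form-urlencoded data, so you can reproduce it with any URL-encoding routine. A sketch, using the same field names and values as the example above (note how the spaces become "+" signs):

```python
# Reproduce the urlencoded form body from the mail message above.
from urllib.parse import urlencode

body = urlencode({"var1": "this is a test", "var2": "666"})
print(body)   # var1=this+is+a+test&var2=666
```

Whether the encoded string travels over HTTP or SMTP, the format is the same; that is the sense in which the form and its ACTION are distinct.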

Now, I don't know about you, but it strikes me as very strange that every package has its own globally-accessible URL, its own "home page" on the web, as it were (see Figure 1-9). {{See more like this in chapter 2: "An Address for Everything" on URLs}}

{{rough from here on}}

I want to make sure the reader understands that I'm not showing them how to "surf" the web; that is not the purpose of URLs here. A URL is like the address of a function that returns data. "Returns"? Or just displays? You may dispute this, along with the whole assumption of this chapter that sending back a document to display in a browser is somehow equivalent to returning a value to a caller. I seem to be claiming that, for example, www.altavista.digital.com is really a world-wide altavista() function, callable via some sort of RPC. Yes, that is exactly what I'm claiming. Note that HTTP does not display a document; it really does return data to a client, and what the caller does with that data is its business. Browsers happen to be the clients we're most familiar with, but a client is not necessarily a browser. Browsers are clients that display, using HTML tags as instructions on how to display. Another piece of software, however, might fetch the same document (in the same way a browser does) and then do something entirely different with it -- such as extract a single piece of information. See the Clinton Wong book.
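The point that a client need not be a browser can be sketched in a few lines. Here the network fetch is deliberately left out and replaced with a canned page, since the point is only that the returned document is data a program can pick apart; the HTML fragment is invented for illustration, loosely following the amazon.com pages used later in this chapter:

```python
import re

def extract_price(page_html):
    """Treat a fetched document as a return value: pull one field out of it."""
    # Anchored pattern: a line consisting solely of "List: $<number>".
    m = re.search(r"^List: \$([0-9.]+)$", page_html, re.MULTILINE)
    return float(m.group(1)) if m else None

# A canned stand-in for what a real client would fetch over HTTP:
page = "<TITLE>Some Book</TITLE>\nList: $39.99\nAvailability: in stock\n"
print(extract_price(page))  # 39.99
```

A browser renders this page; this "client" instead returns 39.99 to its caller. Same document, different use.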

Named pipes?

An example should make this clearer: a DOS batch file that, given an ISBN, returns the book's price. It uses the geturl utility from chapter 5, together with grep. To anticipate, geturl works like this:

usage: geturl [options] <http://whatever or -stdin>
  options:
  -noloc : don't do HTTP relocations (default on)
  -base <addr> : use addr as base for all relative URLs
  -head : do HTTP HEAD (default GET)
  -post <data> : do HTTP POST of data
  -input <file> : get all HTTP headers from file
  -stdin : get URLs from stdin
  -split : break HTML output into lines on tags
C:\>type isbn2price.bat
@echo off
geturl -split http://www.amazon.com/exec/obidos/ISBN=%1 | grep List:

C:\>isbn2price 1568843054
List: $39.99
Since the word "List:" might appear elsewhere in the text (in reader annotations, for example; see below), for robustness we should probably make this egrep "^List: \$[0-9.]*$" instead. (See the O'Reilly book on regular expressions.) If you don't have grep, you can use the DOS find command, though grep is better:
geturl -split http://www.amazon.com/exec/obidos/ISBN=%1 | \windows\command\find "List:"
Note that geturl.exe automatically follows Location: redirections.
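The value of anchoring the pattern with ^ and $ can be demonstrated directly. Here is a sketch using Python's regex engine in place of egrep; the annotation line is an invented example of the kind of false match the anchors are meant to reject:

```python
import re

# Anchored: the entire line must be "List: $<digits and dots>".
anchored = re.compile(r"^List: \$[0-9.]+$")

real_line = "List: $39.99"
annotation = "A reader wrote: the List: price here is too high"

print(bool(anchored.match(real_line)))    # True
print(bool(anchored.match(annotation)))   # False
```

An unanchored search for "List:" would have matched both lines; the anchors keep the program from being fooled by stray occurrences in page text.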

There's no error handling here (bad ISBNs, out-of-print books, and so on aren't handled). We can use egrep to search for multiple strings at once. I'm using -split to separate all HTML tags from the text, but the result still contains some long lines (e.g., for the out-of-print book below): {{Use ^ and $ to make egrep search more robust}}

C:\>type isbn2price.bat
@echo off
geturl -split http://www.amazon.com/exec/obidos/ISBN=%1 | \
   egrep "(List:)|(Sorry)|(Availability)|(wrong)"
echo.

C:\>isbn2pri 1
Something seems to have gone wrong...

C:\>isbn2pri 0198226357
Availability: This item is out of print, but if you place an order we
may be able to find you a used copy within 2-6 months. We can't
guarantee a specific condition, binding, or edition. If we find a
copy, we will notify you via e-mail and request your approval of the
price. We'll also notify you if we can't find a copy.  PLEASE NOTE:
Each out of print item is shipped and billed separately.

C:\>isbn2pri 0060922753
List: $12.50
Availability: This item usually shipped within 2-3 days.
Maybe for this chapter we should stop here, and refer to the C code in chapter 5. To really make clear that data on the web is a reusable component, we need to show a C function that takes an ISBN and returns a price, built on top of the new http function in http.c. But that requires putting redirection handling inside http().

Here is a more sophisticated version: an isbn2price() function in AWK. It could as easily have been done in Perl or C. {{Make search patterns more robust with ^ and $}}

In AWK, grep can be implemented in a few lines {{show grep.awk?}}, so obviously any grep pattern can be turned into an AWK program.

C:\>isbn2pri 1568843054
ISBN #1568843054: List price: $39.99 (US)

C:\>isbn2pri 0140177388
ISBN #0140177388: List price: $5.95 (US)

C:\>isbn2pri 038508031X
ISBN #038508031X: List price: $11.00 (US)

C:\>isbn2pri 0849534313
ISBN #0849534313: Out of print

C:\>isbn2pri 1
ISBN #1: Incorrect ISBN format?
# isbn2price.awk

BEGIN {
    CANT_FIND = -1;
    INTERNAL_ERROR = -2;
    OUT_OF_PRINT = -3;
    SOMETHING_WRONG = -4;
    }

function isbn2price(isbn)
{
    cmd = "geturl http://www.amazon.com/exec/obidos/ISBN=" isbn;
    while (cmd | getline)
    {
        if ($0 ~ /^List: \$/)
        {
            nf = split($2, arr, "$");
            return arr[2];
        }
        else if ($0 ~ /^Sorry,/)
            return CANT_FIND;
        else if ($0 ~ /^Availability: This item is out of print, but/)
            return OUT_OF_PRINT;
        else if ($0 ~ /Something seems to have gone wrong.../)
            return SOMETHING_WRONG;
    }

    # still here
    print "List error: amazon.com format may have changed!" > stderr;
    return INTERNAL_ERROR;
}

function print_price(isbn)
{
    printf("ISBN #%s: ", isbn);
    price = isbn2price(isbn);
    if (price == INTERNAL_ERROR) print "Internal error";
    else if (price == CANT_FIND) print "Couldn't locate book";
    else if (price == OUT_OF_PRINT) print "Out of print"; 
    else if (price == SOMETHING_WRONG) print "Incorrect ISBN format?";
    else printf("List price: $%.02f (US)\n", price);
}

BEGIN {
    if (ARGC < 2)
        exit;
    for (i=1; i<ARGC; i++)
        print_price(ARGV[i]);
    ARGC = 1;
    }
Notice that a pattern/action language is good for this kind of work: you expect certain strings, and specify what to do when you get them. (Compare the language Expect, which is designed for building applications like this.)
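The same expect-certain-strings structure carries over to other languages. Here is a hedged Python sketch of the pattern/action loop from isbn2price.awk, operating on page text that a real client would have fetched with something like geturl; the sample inputs are invented but mimic the amazon.com responses shown above:

```python
import re

CANT_FIND, INTERNAL_ERROR, OUT_OF_PRINT, SOMETHING_WRONG = -1, -2, -3, -4

def isbn2price(page_text):
    # Mirror the AWK pattern/action pairs: scan line by line,
    # act on the first recognized pattern.
    for line in page_text.splitlines():
        m = re.match(r"List: \$([0-9.]+)", line)
        if m:
            return float(m.group(1))
        if line.startswith("Sorry,"):
            return CANT_FIND
        if line.startswith("Availability: This item is out of print"):
            return OUT_OF_PRINT
        if "Something seems to have gone wrong" in line:
            return SOMETHING_WRONG
    return INTERNAL_ERROR   # none matched: page format may have changed

print(isbn2price("List: $39.99\n"))                           # 39.99
print(isbn2price("Something seems to have gone wrong...\n"))  # -4
```

Each pattern is paired with an action, and falling off the end of the loop is itself a meaningful result (the site's format has probably changed).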

Name/title to ISBN:

C:\>type amazsearch.bat
@echo off
geturl -post author=%1&author-mode=full&title=%2&title-mode=word \
   http://www.amazon.com/exec/obidos/ats-query/ | egrep "ISBN(=|:)"

C:\>amazsearch Schulman Unauthorized
<a href="/exec/obidos/ISBN=1568841698/0712-5505930-011155">Unauthorized Windows
95 : A Developer's Guide to Exploring the Foundations of Windows 'Chicago'</a>;
<a href="/exec/obidos/ISBN=1568843054/0712-5505930-011155">Unauthorized Windows
95 : Developer's Resource Kit/Book and 2 Disks</a>;
<a href="/exec/obidos/ISBN=1568847076/0712-5505930-011155">Unauthorized Windows
95 CD-ROM</a>;

C:\>amazsearch Kisseloff Box
<a href="/exec/obidos/ISBN=0140252657/2304-8300827-761158">The Box : An Oral His
tory of Television, 1920-1961</a>;
<a href="/exec/obidos/ISBN=0670864706/2304-8300827-761158">The Box : An Oral His
tory of Television, 1929-1961</a>;

C:\>amazsearch Waldmeir Mammon
ISBN: 0933951647<br>
{{Need to output something more useful if response is actual page, not list of pages?}}
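Stripping the ISBN back out of those anchor tags is itself a one-line pattern. A sketch (the href format is exactly the one shown in the output above; the trailing number after the ISBN is the temporary shopping-cart ID, which we simply ignore):

```python
import re

html = ('<a href="/exec/obidos/ISBN=1568841698/0712-5505930-011155">'
        'Unauthorized Windows 95</a>;')
# ISBNs are digits, possibly ending in X (the ISBN check character).
isbns = re.findall(r"ISBN=([0-9X]+)", html)
print(isbns)   # ['1568841698']
```

With the ISBNs in hand, each one can be fed right back into the isbn2price lookup: one site's output becomes another query's input.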

Of course, this program is now dependent upon the format of amazon.com's pages. {{Especially risky because, as we'll see later, any user can add annotations to an amazon page. An annotation might include, either accidentally or maliciously, the very patterns our software is depending on!}}

But such dependencies are nothing new: compare DLL "versionitis". Microsoft is quite strict about linkages in COM and so on, but the fact is that this never worked too well with DLLs (WinSock, etc.). These are loosely coupled systems. We do need to do something about changes to the underlying site, though: perhaps add an error page with a mailto: URL?

{{Make following into sidebar??}}

In case this seems contrived, here's another, genuine example: the Amazon Top 50 computer books list (http://www.amazon.com/exec/obidos/bestsellers/computer50). {{SHOW screen shot}} I wanted to know the publisher rankings, and could throw an app together very quickly.

The source code for the top 50 list looks like this (the number after the ISBN is a temporary shopping-cart number, assigned via redirection):

<B>1.</B> <a href="/exec/obidos/ISBN=1568302894/1270-0422490-440341">Creating Killer Web Sites : The Art of Third-Generation Site Design</a>;
 David Siegel; Paperback; $27.00; <i>Descriptive information available.</i><p>
<B>2.</B> <a href="/exec/obidos/ISBN=0062514792/1270-0422490-440341">What Will Be : How the New World of Information Will Change Our Lives</a>;
 Michael L. Dertouzos; Hardcover; $15.00; <i>Descriptive information available.</i><p>
<B>3.</B> <a href="/exec/obidos/ISBN=1562057154/1270-0422490-440341">Designing Web Graphics .2</a>;
 Lynda Weinman; Paperback; $33.00; <i>Descriptive information available.</i><p>
<B>4.</B> <a href="/exec/obidos/ISBN=1565921496/1270-0422490-440341">Programming Perl</a>;
 Larry Wall, et al; Paperback; $23.97; <i>Descriptive information available.</i><p>
...
First, a little program to generate the list of publishers, in order. On the first pass, get the list of URLs:
geturl -split http://www.amazon.com/exec/obidos/bestsellers/computer50 | \
   grep ISBN= > amaz50.lst
amaz50.lst looks like this:
<a href="/exec/obidos/ISBN=0062514792/2076-6105754-531263">
<a href="/exec/obidos/ISBN=1568302894/2076-6105754-531263">
<a href="/exec/obidos/ISBN=1565921496/2076-6105754-531263">
<a href="/exec/obidos/ISBN=0471117099/2076-6105754-531263">
<a href="/exec/obidos/ISBN=0201633612/2076-6105754-531263">
...
Now submit this list to geturl -stdin and get back 50 pages, one for each book. The list contains relative URLs, so use GETURL's -base option to resolve each of them against www.amazon.com; GETURL can peel off the A HREF:
geturl -base http://www.amazon.com -stdin < amaz50.lst
In each page there is information like the following. {{Show extract from amaz50.big for Programming Perl.}}

We want the "Published by" line. (Later, we'll want the price too.) Since "Published by" might appear in annotations, grep for "Published by" at the beginning of a line, with <BR> at the end:

geturl -split http://www.amazon.com/exec/obidos/bestsellers/computer50 | \
   grep ISBN= > amaz50.lst
geturl -base http://www.amazon.com -stdin < amaz50.lst | \
   grep "^Published by .* <br>"
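The whole two-pass pipeline (fetch the list page, pull out the book URLs, fetch each book page, keep only the anchored "Published by" lines) can be sketched end-to-end. Here the fetch step is stubbed with canned pages, since the point is the data flow; the page fragments are invented but follow the formats shown above:

```python
import re

# Stub: maps URL path -> canned page text (a real client would fetch these).
pages = {
    "/exec/obidos/ISBN=1565921496/x":
        "<TITLE>Programming Perl</TITLE>\nPublished by O'Reilly & Associates <br>\n",
    "/exec/obidos/ISBN=1568302894/x":
        "<TITLE>Creating Killer Web Sites</TITLE>\nPublished by Hayden Books <br>\n",
}
list_page = "".join('<a href="%s">' % path for path in pages)

# Pass 1: pull the book URLs out of the best-seller list page.
urls = re.findall(r'href="([^"]*ISBN=[^"]*)"', list_page)

# Pass 2: fetch each page, keep only anchored "Published by ... <br>" lines.
publishers = []
for u in urls:
    for line in pages[u].splitlines():
        m = re.match(r"Published by (.*) <br>$", line)
        if m:
            publishers.append(m.group(1))

print(publishers)   # ["O'Reilly & Associates", 'Hayden Books']
```

This is the same structure as the geturl/grep pipeline: one program's output (a list of URLs) piped straight into the next fetch.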
{{But standard publisher prefixes! E.g., ORA is 56592. Silly to fetch doc from amazon for each one?}}

The output looks like the following: an in-order list of the publishers of the top 50 computer books:

Published by Harpercollins<br>
Published by Hayden Books<br>
Published by O'Reilly & Associates<br>
Published by John Wiley & Sons<br>
Published by Addison-Wesley Pub Co<br>
Published by Harvard Business School Pr<br>
Published by Ap Professional<br>
Published by Addison-Wesley Pub Co<br>
Published by Microsoft Pr<br>
Published by O'Reilly & Associates<br>
Published by Prentice Hall<br>
Published by Microsoft Pr<br>
Published by O'Reilly & Associates<br>
Published by Harpercollins (Paper)<br>
...
We need a more robust program that also gets the prices, so I did this in AWK. See a50.awk; it still needs a way to sort the last three tables. It produces HTML output (see also chapter 7 on the web browser as a display engine): you get a GUI via printf!

Now I have a simple sort function, but I'm still having problems. There must be some major bug in a50.awk!

function sort_table(table) {
	local x, tx, tmp, arr, DELIM, y;
	DELIM = "!@#@!";  # some random pattern we don't expect to see in table
	for (x in table) {
		tx = table[x];
		tmp[tx] = (tx in tmp) ? (tmp[tx] DELIM x) : x;
		}
	for (x in tmp) {
		split(tmp[x], arr, DELIM);
		for (y in arr)
			sorted[x][y] = arr[y];
		}
	return sorted;
	}

function do_html_table(title, array) {
	local x, y, srt;
    print "<H1>", title, "</H1>";
    print "<TABLE BORDER=1>";
	srt = sort_table(array);
	REVERSE = 8;
	SORTTYPE += REVERSE;
    for (x in srt)
		for (y in srt[x]) {
			print srt[x][y], x > stderr;
        	print "<TR><TD ALIGN=RIGHT>", srt[x][y], "<TD>", x, "</TR>";
			}
    print "</TABLE>";
    }
# a50.awk

# TODO: figure out a good way to SORT table!
function do_html_table(title, array) {
    print "<H1>", title, "</H1>";
    print "<TABLE BORDER=1>";
    for (x in array)
        print "<TR><TD ALIGN=RIGHT>", array[x], "<TD>", x, "</TR>";
    print "</TABLE>";
    }

BEGIN {
    amaz50big = "amaz50.big";
    if (filesize(amaz50big) == -1) {
        cmd = "geturl -split ";
        cmd += "http://www.amazon.com/exec/obidos/bestsellers/computer50";
        cmd += " | grep ISBN= | geturl -base http://www.amazon.com -stdin";
        cmd += " > " amaz50big;
        system(cmd);
        }

    while (getline < amaz50big) {
        if ($0 ~ /^Published by .*<br>/i) {
            sub("Published by ", "", $0);
            sub("<br>", "", $0);
            publisher[++i] = $0;
            count[$0]++;
            }
        else if ($0 ~ /^<NOBR>List: \$[0-9.]*<\/NOBR>/i) {
            sub(/<\/?NOBR>/i, "", $2);
            sub(/\$/, "", $2);
            price[++j] = $2;
            }
        else if ($0 ~ /^<TITLE>.*<\/TITLE>$/i) {
            gsub(/<\/?TITLE>/i, "", $0);
            title[++k] = $0;
            }
        }
    close(amaz50big);

    if ((i != j) || (i != k)) {
        print "Something wrong! Publisher/price mismatch!";
        exit;
        }
    top = i;

    print "<H1>Top 50 computer books</H1>";
    print "<TABLE BORDER=1>";
    for (x in publisher)
        print "<TR><TD ALIGN=RIGHT>", x, "<TD>", publisher[x], "<TD>", 
              price[x], "<TD>", title[x], "</TR>";
    print "</TABLE>";

    # figure out "weight" for each publisher (ranking, price)
    for (x in publisher) {
        p = publisher[x];
        rank = (top - x) / 10;
        weight[p] += rank;
        wtpr[p] += (rank * (price[x] / 10));
        }

    do_html_table("Number of books per publisher", count);
    do_html_table("Weighted rankings", weight);
    do_html_table("Weighted rankings, with price", wtpr);
    }
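The weighting scheme at the end of a50.awk reduces to a few lines in any language. A sketch with invented sample data, using the same formulas as the AWK code (rank = (top - position)/10, and a price-weighted variant using price/10):

```python
# Invented sample: (publisher, price) in best-seller order; position 1 = top.
books = [("Hayden Books", 27.00), ("Harpercollins", 15.00),
         ("O'Reilly & Associates", 23.97), ("Hayden Books", 33.00)]

top = len(books)
count, weight, wtpr = {}, {}, {}
for pos, (pub, price) in enumerate(books, start=1):
    rank = (top - pos) / 10.0          # earlier in the list = higher rank
    count[pub] = count.get(pub, 0) + 1
    weight[pub] = weight.get(pub, 0) + rank
    wtpr[pub] = wtpr.get(pub, 0) + rank * (price / 10.0)

# Publishers sorted by weighted ranking, descending:
print(sorted(weight, key=weight.get, reverse=True))
```

The choice of /10 scale factors is arbitrary (as it is in a50.awk); what matters is that a publisher with several books high on the list outweighs one with a single low entry.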
Show sample output after running a50 > a50.html, and compare with the earlier shot of the amazon page. (Note that the tables could retain the links from the original amazon file.) Then load a50.html into a browser; it would be nice for this step to be automatic (see chapter xxx on "local CGI").

The next step is to host this app on the web itself (by porting it to Unix).

{{Can also apply to Amazon Top 100 (not just computer books), or any list at amazon? http://www.amazon.com/exec/obidos/subst/amazon500-1.html is top 100, has link to amazon500-2.html, etc.}}

{{end possible sidebar}}

Treating the web as a file system (though often there is no actual file at the server; see CGI below), we can apply tools to it and write programs to manipulate it, just as one writes programs to manipulate files on the user's hard disk.

For more on programming like this, see Clinton Wong, Web Client Programming with Perl: Automating Tasks on the Web. Also see the avsubmit example later. We'll look at this further in chapter 2 on URLs, and in chapter 5.

The realization here is that a web site returns data; it doesn't display it. WOW! That data can be manipulated in other ways, besides merely displaying it. Many others have had the same realization. For example, Jon Udell, Byte magazine's executive editor for new media, wrote a great article on this topic for the November 1996 issue of Byte (of course, the article is available on the web: http://www.byte.com/art/9611/sec9/art1.htm):

On-Line Componentware: I use AltaVista to build BYTE's Metasearch application and realize that every Web site is a software component.

Software components can turn up in the unlikeliest places. In our May 1994 cover story ("Componentware," http://www.byte.com/art/9405/sec5/sec5.htm), for instance, we pointed out that object-oriented programming (OOP) technology had failed to produce a rich harvest of plug-and-play software objects. However, we showed that Visual Basic custom control (VBX) technology -- a hastily conceived mechanism for Visual Basic plug-ins -- had, to everyone's surprise, jump-started a thriving component-software industry.

Fast-forward to 1996. I want to prototype a Web-search application that embraces BYTE and five fellow McGraw-Hill publications. I have only a few hours to spend on the task. What component can I pull off the shelf and use? Java or ActiveX components? They're coming, but they're not here yet. Distributable search engines? They exist, but deployment across six Web sites will take more than the allotted few hours.

As I drove home from work, I suddenly knew where to find the right component for the job. It was sitting in plain view at http://www.altavista.digital.com/. That's right -- Digital Equipment's AltaVista, a public Web site, is also the software component that let me prototype the McGraw-Hill Metasearch application before I went to bed that night.

A powerful capability for ad hoc distributed computing arises naturally from the architecture of the Web.

{{Need summary here}}

HTML

{{Need subtitle; chapter 3 is "Compound Documents: HTML"}}


Distributed Computation: CGI


Creating Synapses: HTTP


A New Type of Application


A Lesson for the Software Industry