Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



XML: Structuring Data for the Web: An Introduction

May 3rd 1998

Fixing the Web

If it ain't broke, don't fix it. That's what some people might say with respect to the World Wide Web. After all, millions of people surf the Web every day -- from homemakers searching for a recipe, to investors seeking the latest stock quotes, to students researching the assassination of Abraham Lincoln, to readers purchasing the latest novel from an on-line e-commerce site. The Web works well for them.

Or does it? Let's kick the tires a bit and see if the Web can take it. Here are a few of the problems that hinder the current Web and beg for solutions with the Next Generation of Internet technologies.

  • HTML standards change too slowly - During most of the Web's history, there have been essentially only two versions of the HTML specification, HTML 2.0 and HTML 3.2. (HTML 1.0 pre-dates most Web sites, and HTML 4.0, although it has been the current standard since December 18, 1997, is only slowly appearing on popular sites.) When HTML 3.2 was finally approved in January 1997, it was more of a rubber stamp of then-current practices than an innovation, since nearly all of the elements it defined had been in use unofficially for as long as a year. It simply took too long for the World Wide Web Consortium (W3C) to agree on the specification (presumably due largely to the browser-specific extensions discussed next).

  • Browser-specific tags ("extensions") - Prior to HTML 3.2, Netscape and Microsoft began the unfortunate practice of introducing their own extensions to the language. This was an endless cause of headaches for content developers who struggled to make their pages accessible to all users while needing (wanting?) to use the latest features introduced by the browser vendors. Less ambitious authors succumbed to the "This site best viewed with {Netscape/Microsoft}" virus which has contributed to some truly horrible sites. These authors forgot that the Web isn't truly "World Wide" if authors entrench themselves in different camps and embrace extensions which aren't universally supported.

  • Can't mark up data in any meaningful way - HTML was originally intended to provide a simple way to mark up any type of document to reflect its structure (title, major headings, minor headings, lists, and so on) as well as some stylistic aspects (bold, italics, and so forth). Add to this the hypertext linking capability HTML offered, as well as browser support for a long list of MIME types, and it isn't hard to understand the phenomenal rate at which the Web developed, especially since Web authoring fell within the capabilities of grade school students. HTML was (and still is) great for marking up documents. However, businesses and scientists also need to exchange data. A new language is needed to express the hierarchical relationship of data values, such as that represented by database records and object hierarchies. HTML reflects structure and presentation, but conveys nothing about the meaning of the marked-up document.

  • Browser paradigm is too constraining - With the advent of Java and JavaScript, the Web browser quickly became far more than merely a tool for surfing the Web; it became the launcher of applications. However, the browser often gets in the way. Customers want applications that look and feel more like their familiar desktop applications, such as spreadsheets. While MIME type content handling helps in this regard, there are times when the browser paradigm just doesn't make sense. Even if you can "lose the chrome" (i.e., browser menus and controls), sometimes there is a need to pass information between two or more cooperating applications. What we really need are web-enabled applications (programs that understand common Internet protocols such as HTTP) so we can access Web resources without using a browser at all. (This is not science fiction; companies such as webMethods, Inc. have already achieved this goal.)

  • Search engines return far too many hits - Unless you become a master of your favorite search engines by learning their similar yet annoyingly different query syntax, you'll undoubtedly receive hundreds or thousands more hits than you have time or patience to examine. If you're incredibly lucky (or skillful), the reference you're looking for may be in the first page or two of results -- but don't count on it. The problem is that search engines typically can index only the frequency of words, document titles, and, in some cases, meta tags that describe the contents of a page. What is needed is a way to mark up the significant portions of a document and to convey the semantics of documents so search engines can ignore all of the noise and focus instead on the signal. Sometimes searches require a finer granularity of control than most search engines permit. For example, how would you search for books written by Paul McCartney, rather than books that refer to him, the Beatles, or Wings? If the words "Paul McCartney" could be tagged as <AUTHOR> to indicate a specific meaning, such finely-tuned searches would become possible.

  • Can't specify collections of related pages - You often encounter a Web page which is obviously part of a larger collection. If you're lucky enough to find a link to a table of contents, a home page, or some other means of listing the collection, then you're halfway there. But how do you print the collection? (Current answer: one HTML file at a time.) There has to be a better way to express the interrelationship of a set of pages so they can be processed as a group. We need to be able to attach metadata ("information about information" or "machine understandable information") to Web pages to express interrelationships.

  • One-way linking is somewhat limited - Although the Web's current one-way hypertext link capability has proven extremely useful, did you know far more flexible schemes have existed for many years in the publishing industry? Since 1992, Hypermedia/Time-based Structuring Language (HyTime) and the Text Encoding Initiative (TEI) have enabled publishers to express complex link relationships, such as links with multiple targets, multi-directional links, and automatically updated link databases. We need a richer linking language for the Web.
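To make the search-engine point above concrete, here is a minimal sketch of the difference between matching words anywhere in a document and matching only text tagged as <AUTHOR>. Everything in it is an assumption made for illustration: the element names (CATALOG, BOOK, TITLE, BLURB), the book titles, the second author, and both helper functions are hypothetical, and Python's standard xml.etree.ElementTree module stands in for whatever parser a real search tool might use.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: element names, titles, and the second author are
# invented for illustration only.
CATALOG = """
<CATALOG>
  <BOOK>
    <TITLE>Sample Memoir</TITLE>
    <AUTHOR>Paul McCartney</AUTHOR>
    <BLURB>A hypothetical book written by the musician himself.</BLURB>
  </BOOK>
  <BOOK>
    <TITLE>A Band History</TITLE>
    <AUTHOR>Jane Example</AUTHOR>
    <BLURB>Refers to Paul McCartney, the Beatles, and Wings.</BLURB>
  </BOOK>
</CATALOG>
"""


def titles_mentioning(root, phrase):
    """Word-matching search: any book whose text contains the phrase anywhere."""
    return [
        book.findtext("TITLE")
        for book in root.iter("BOOK")
        if phrase in "".join(book.itertext())
    ]


def titles_by_author(root, name):
    """Tag-aware search: only books whose AUTHOR element equals the name."""
    return [
        book.findtext("TITLE")
        for book in root.iter("BOOK")
        if name in [author.text for author in book.findall("AUTHOR")]
    ]


root = ET.fromstring(CATALOG)
mentions = titles_mentioning(root, "Paul McCartney")
authored = titles_by_author(root, "Paul McCartney")
print("Mentions:", mentions)
print("Authored:", authored)
```

The word-matching search returns both books, because the name appears somewhere in each; the tag-aware search returns only the book whose <AUTHOR> element carries the name. That is the "signal versus noise" distinction the bullet describes.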

HTML 3.2, together with CGI scripts, Java applets, and JavaScript (and its derivatives), plus plug-ins such as Shockwave, RealPlayer, and QuickTime, provides Web authors and commercial sites with a rich array of techniques for displaying content that is visually compelling and possibly even informative. However, these techniques do little, if anything, for the representation of structured data unless one introduces middleware solutions.
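As a rough illustration of what "structured data" means here, the sketch below marks up a database-style record as a hierarchy of named values and then computes with it directly. It is a sketch under stated assumptions, not a prescribed format: the ORDER, CUSTOMER, NAME, ITEM, QUANTITY, and PRICE names, the sample values, and the order_total helper are all hypothetical, and Python's standard library is used only as a convenient way to read the markup.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order record: every element name and value is invented
# to illustrate a hierarchy of data values (order -> customer, items).
ORDER = """
<ORDER id="1001">
  <CUSTOMER>
    <NAME>Pat Example</NAME>
  </CUSTOMER>
  <ITEM sku="A-1">
    <QUANTITY>2</QUANTITY>
    <PRICE>9.50</PRICE>
  </ITEM>
  <ITEM sku="B-7">
    <QUANTITY>1</QUANTITY>
    <PRICE>20.00</PRICE>
  </ITEM>
</ORDER>
"""


def order_total(order_xml):
    """Sum quantity * price over every ITEM child of the ORDER element."""
    root = ET.fromstring(order_xml)
    total = Decimal("0")
    for item in root.findall("ITEM"):
        quantity = int(item.findtext("QUANTITY"))
        price = Decimal(item.findtext("PRICE"))
        total += quantity * price
    return total


total = order_total(ORDER)
customer = ET.fromstring(ORDER).findtext("CUSTOMER/NAME")
print(customer, total)
```

Because each value is labeled with what it means, a program can find the quantities and prices without guessing from layout, which is exactly what presentation-oriented HTML markup does not allow.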
