
The Web Librarian
Locating information on the web is becoming more and more problematic.
Search engines overwhelm users with vast quantities of information,
much or most of which is not precisely what was wanted;
and browsable catalogs ('virtual libraries') take a lot of time to use,
and now often index only a fraction of the relevant material.
*
An automated 'Web Librarian' (WLn) should help address these problems.
Traditional notions of simple hierarchical classification need to be
augmented or replaced with more powerful methods, e.g. concept analysis
and faceted classification.
Authors and publishers should be encouraged to provide resource
meta-information.
*
A WLn is a synthesis of humans and robots, databases, FAQs, and smart
software, designed to enable people to receive precise answers to
precisely formulated queries.
Current Problems
Finding Stuff
Whenever we develop a new skill or extend an old one, we have to
emphasize the relative importance of some aspects and features over
others. We can then place these into neat levels only when we discover
systematic ways to do so. Then our classifications can resemble
level-schemes and hierarchies. But the hierarchies always end up
getting tangled and disorderly because there are also exceptions and
interactions to each classification scheme.
-- Marvin Minsky, The Society of Mind
If you want to find something on the web, what do you do:
browse a catalog, or use a search engine (or something else)?
I find I'm using Alta Vista increasingly - even for web development
topics (my VL).
Why? Because
1) it's fast, and 2) even at nearly 2,000 entries, my VL only catalogs
a fraction of the relevant material available on the web. The drawback
is that I may get swamped with results that have to be carefully
screened for relevance.
For example, I tried to locate any web information on Faceted
Classification (more on this later).
First, I tried the
WWW Virtual Library.
I looked in "Information Science" - only 3
links! There is, apparently, no "Library Science" (!);
"Libraries" just points into Libraries. I expected to find
"Software Engineering" under something like "Computers" but no, it's
under "Engineering". An example of the problem of strict hierarchy.
I then found
a couple of pointers to software reusability, but nothing on faceted
classification. After 20 minutes I gave up.
Next, I entered "faceted classification" into Alta Vista
(with the quotes). Within seconds I had 155 results,
and a few minutes of checking
through them confirmed I had got some good hits.
Of course, this may have been just an unfortunate example.
But the point I want to make is that I believe the concept of the
VL as a static browsable hierarchy needs serious rethinking.
We've all had great fun putting our hotlists on public display,
but web technology is superseding us.
I know some will say that these catalogs are hand-crafted
by domain experts and are therefore of very high quality. This has some
merit - but it's not enough. We have a Library, but no Librarian.
Users may
browse, but they can't ask for help. There isn't even a card index.
User Entry of Classification Data
Some of the problems with user-entered URLs are:
- Annotations given are too short, too long, or not very helpful.
- People enter inappropriate URLs, not related to the catalog's
subject matter (I reject 1/3rd of all entries for this alone).
- People want their listing to appear in multiple categories. This is
equivalent to wanting to associate multiple keywords with the
entries - which would be a useful extension.
- People classify their entries poorly. This depends to some extent
on how clear and intuitive the classification system is, although
some people apparently refuse to spend any time trying to
understand it. The greatest number of mis-classifications are from
commercial entities seeking to publicise their products and
services.
In addition, other means of populating a catalog may also be used:
spiders, surfing, and reading newsgroups, mailing lists, etc.
This could be partially automated by a program that extracts URLs from
these sources, compares them with the catalog URLs, and adds those not
found to a list for later investigation by a human.
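The semi-automated step described above might look something like the following. This is only a minimal sketch: the URL pattern and the function names (extract_urls, find_candidates) are my own assumptions, and the source texts are taken to be already fetched as plain strings.

```python
import re

# Assumed pattern: http, https and ftp URLs, ending at whitespace or a
# common delimiter. Real newsgroup text would need a more careful pattern.
URL_PATTERN = re.compile(r"""(?:https?|ftp)://[^\s<>"')\]]+""")


def extract_urls(text):
    """Return the set of URLs found in a block of text."""
    # Trailing punctuation usually belongs to the sentence, not the URL.
    return {match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)}


def find_candidates(source_texts, catalog_urls):
    """Return URLs seen in the sources that are not yet in the catalog."""
    seen = set()
    for text in source_texts:
        seen |= extract_urls(text)
    # Sorted so that the list handed to the human reviewer is stable.
    return sorted(seen - set(catalog_urls))
```

The result is only a list of candidates. Deciding whether each one belongs in the catalog is still left to a human.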
Classification Systems
There are three general types of classification schemes: enumerative,
synthetic and analytico-synthetic. The enumerative scheme is based on
the concept of a universe of knowledge which is divided into
successively narrower and more specific subjects. Theoretically, all
topics are to be represented.
The Library of Congress Classification (LC) is an enumerative scheme.
A synthetic scheme is one in which new class numbers can be developed
for new topics not already listed. The Dewey Decimal Classification
(DDC), although primarily enumerative,
approaches a synthetic scheme with each revision.
Faceted Classification
Faceted classification is an analytico-synthetic scheme. It is
analytic because it subdivides broader elements into single concepts
that are clearly defined through facet analysis. It is synthetic in
that new elements can be developed. The classification was
originated by S.R. Ranganathan in the 1930s with the Colon
Classification.
Note that the process of facet analysis can also be used to construct
thesauri.
There is renewed interest in this system,
because some believe that older
systems such as DDC and LC do not provide enough detail to accurately
describe all subjects in all media, may not meet the needs of the
individual or special library, may not provide for enough coordination
of terms, may require complex or lengthy notation, and are often
difficult to use to locate materials.
Basically, the facet development process begins by defining the
subject to be covered by examining existing classifications or
thesauri, or titles or objects in the prospective database. The
derived topics are broken down into facets each with a distinct label.
Items are organized so that they are in homogeneous, mutually
exclusive groups that differ from the main group by one
characteristic.
Within each facet, subfacets or more specific topics are listed. The
breakdown continues into subfacets within subfacets. The items in each
subfacet, in general, are ordered from more general to more specific,
complex or concrete.
I don't think a hierarchical classification scheme is good enough for a
modern web-based catalog of any substantial size. Entries rarely fit
exactly into one leaf node. Ruben Prieto-Diaz has proposed
"faceted classification" for a reusable software library - a concept
he found in library science.
In a faceted classification scheme, the facets may be considered to be
dimensions in a cartesian classification space, and the value of a
facet is the position of the artifact in that dimension.
For software, one might have facets with values such as "Operand",
"Functionality", "Platform", "Language", .... Prieto-Diaz claims that a
fixed (and small) number of facets is sufficient for classifying all
software.
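To make the "dimensions" idea concrete, here is a small sketch in which each entry is a point with one value per facet, and a query fixes the value of some facets while leaving the others free. The facet names come from the text above; the catalog entries, their values, and the function names are invented for illustration.

```python
# Facet names from the text; every entry has one value per facet.
FACETS = ("operand", "functionality", "platform", "language")

# Invented example catalog: entry name -> position in facet space.
CATALOG = {
    "quicksort.c": {"operand": "array", "functionality": "sort",
                    "platform": "unix", "language": "c"},
    "mergesort.py": {"operand": "list", "functionality": "sort",
                     "platform": "any", "language": "python"},
    "grep": {"operand": "text", "functionality": "search",
             "platform": "unix", "language": "c"},
}


def matches(entry_facets, query):
    """True if the entry agrees with the query on every facet it fixes."""
    return all(entry_facets.get(facet) == value
               for facet, value in query.items())


def search(catalog, **query):
    """Return the names of all entries matching the given facet values."""
    unknown = set(query) - set(FACETS)
    if unknown:
        raise ValueError("unknown facets: " + ", ".join(sorted(unknown)))
    return [name for name, facets in catalog.items()
            if matches(facets, query)]
```

For example, search(CATALOG, platform="unix", language="c") finds both quicksort.c and grep. A strict hierarchy would have forced each of them into a single leaf node.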
Implementation of a Web Librarian
At the bare minimum, this is a classification system, a database, and
a means to populate the database.
But instead of blindly indexing all the words in a zillion web pages,
it should distill or encapsulate domain intelligence and structure.
A user's query should not just shoot keywords at an index,
but should be "understood" by the librarian,
sufficiently that it can direct you to the appropriate library section.
The Librarian should be thesaurus-based so that it can suggest synonyms
and
related concepts. The Librarian should be an active participant in the
user's exploration of the library.
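As a rough illustration of the thesaurus-based behaviour, the sketch below looks a query term up in a small thesaurus and returns synonyms first, then related concepts. The thesaurus contents and the suggest function are invented for this example. A real thesaurus would come out of facet analysis of the subject domain.

```python
# Invented thesaurus fragment: term -> synonyms and related concepts.
THESAURUS = {
    "catalog": {
        "synonyms": ["directory", "virtual library"],
        "related": ["index", "search engine"],
    },
    "metadata": {
        "synonyms": ["meta-information"],
        "related": ["keywords", "abstract"],
    },
}


def suggest(term):
    """Return alternative terms for a query term, synonyms first."""
    entry = THESAURUS.get(term.strip().lower())
    if entry is None:
        return []
    return entry["synonyms"] + entry["related"]
```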
Most web indexing systems don't have any provision for the author of a
web resource to offer any guidance on how it should be indexed.
The WLn system should not only allow, but positively encourage authors
to provide some meta information.
Although full text searches of a Web archive are an important way of
identifying relevant information, sometimes it can be very useful to
base searches on document attributes such as
author, keywords, language, etc.
The HTML specification defines a special markup element for this
purpose: the <META> element.
This tag can be used to augment documents with information that is not
normally displayed by browsers. It provides document authors with a
mechanism for identifying information that should be included in the
response headers for an HTTP request. The markup is stored as
attributes of the <META> tag and is not displayed if the document is
loaded into a browser. It can however be extracted by servers and
clients for use in identifying, indexing, and cataloging documents.
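As a sketch of how an indexer might pull this information out of a page, the following uses the HTMLParser class from Python's standard library. The class and function names (MetaExtractor, extract_meta) are my own, and the choice to key each value by its NAME or HTTP-EQUIV attribute is an assumption about what the indexer wants.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the CONTENT of every <META> tag, keyed by NAME or HTTP-EQUIV."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("http-equiv")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key.lower()] = content


def extract_meta(html_text):
    """Return a dict of the meta information found in an HTML document."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.meta
```

Feeding it a page whose HEAD contains <META NAME="keywords" CONTENT="..."> gives back a dict with a "keywords" entry, which could then be stored alongside the URL in the catalog.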
The full system would be something like:
- Internet resources (web, ftp, news, ...)
- resource meta information
- classification system(s)
- databases
- search engines
- NL parser
- learning system
- expert system
- network of WLns
- human support
The fundamental architecture of this system would be based on a series
of levels where a query might be resolved, rather like memory management
levels:
- cache
- "FAQ"
- index db
- internet resources
- subdomain WLns
- human support
The basic algorithm then would be:
- Parse query
- Identify possible subject domains
- Pass on query to other WLns or human if inappropriate
- Search levels from top till query resolved
- Record answer in the levels above the one where it was resolved.
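The last two steps of the algorithm can be sketched as below. Query parsing, domain identification, and hand-off to other WLns are left out. Each level is assumed to be a (name, lookup, store) triple, where lookup returns None when it has no answer and store may be None for a level that cannot record answers. These conventions and the resolve function are my own, not part of any existing system.

```python
def resolve(query, levels):
    """Search the levels from the top until one of them answers the query.

    levels is an ordered list of (name, lookup, store) triples, fastest
    first. Returns (level_name, answer), or (None, None) if no level could
    resolve the query.
    """
    for depth, (name, lookup, _store) in enumerate(levels):
        answer = lookup(query)
        if answer is not None:
            # Record the answer in every level above the one that resolved it.
            for _name, _lookup, store in levels[:depth]:
                if store is not None:
                    store(query, answer)
            return name, answer
    return None, None


# Usage sketch with plain dicts standing in for the cache, FAQ and index.
cache, faq = {}, {}
index = {"faceted classification": "http://example.org/facets"}
levels = [
    ("cache", cache.get, cache.__setitem__),
    ("faq", faq.get, faq.__setitem__),
    ("index", index.get, None),
]
```

The first resolve("faceted classification", levels) is answered by the index and copied into the cache and the FAQ. A repeat of the same query is then answered by the cache. A query that falls through every level returns (None, None), which is where the human would come in.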
Quality Control
The issues of quality control need consideration, e.g.
criteria for acceptance into the
catalog; and whether some kind of rating system would be useful.
Related to this is the possibility of assigning multiple
keywords to each entry,
perhaps with relevance weightings so that search results could be
sorted to help the user select the entries closest to their needs.
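One simple way to use such weightings, sketched here under my own assumptions: each entry carries a weight between 0 and 1 for each of its keywords, and an entry's score for a query is the sum of the weights of the query keywords it carries. The rank function and the scoring rule are illustrative only. Other scoring rules would do as well.

```python
def rank(entries, query_keywords):
    """Return the URLs of matching entries, best match first.

    entries maps each URL to a dict of keyword -> relevance weight (0..1).
    Entries that carry none of the query keywords are left out.
    """
    scored = []
    for url, weights in entries.items():
        score = sum(weights.get(keyword, 0.0) for keyword in query_keywords)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _score, url in scored]
```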
Weeding the list must also be done to remove URLs that become
misleading, obsolete,
or are a lesser quality duplication of another URL.
Navigation
Having collected a lot of good, up-to-date URLs, it is of course
essential for users to be able to locate what they need very
quickly with high precision and recall. The two main methods are by
browsing and searching. The browse hierarchy is currently
only two levels deep (excluding the root), and could be deeper.
Alternatively, a faceted classification scheme (multiple keywords)
could replace the hierarchy with a directed acyclic graph
permitting multiple links from category parents, which would
improve the likelihood that relevant entries are found from a given
starting point.
Database Design
Information about the resources will be stored in a relational database.
The following information may be used to search the database:
- Title:
  The name of the object.
  This will normally be the Title as given in the HEAD of the HTML file.
- URL:
  A URL from which an instance of the resource can be fetched.
  It is also the string used to uniquely identify this object:
  this is the key field and must be unique.
- Author:
  The person(s) and/or organization(s) primarily responsible
  for the intellectual content of the work.
- Abstract:
  A description or annotation of the object.
- Publisher:
  The agent or agency responsible for making the object available.
- Date:
  The date of publication.
- Other Agent:
  Other person(s) and/or organization(s), such as editors,
  transcribers, sponsors, etc. who have made significant
  contributions to the work. Author and Publisher are special cases
  of Other Agent.
- Keywords:
  The abstract category of the object, defined by a fixed set of
  keywords. The keywords are partitioned into the following facets:
  - Function:
    The main activity in which the object applies.
  - Context:
    The setting or environment in which the object is used.
  - Object:
    The object itself.
  - Medium:
    Stuff the object is built from.
- Type:
  The particular manifestation or data representation of the
  object, such as PostScript file or Windows executable. For URCs,
  this will typically be specified as an Internet Media Type -
  formerly known as the MIME Content-type.
- Relation:
  Relationship to other objects. This element should
  identify the role of the relationship, as well as the related
  objects.
- Status:
  An indicator for the state of the object in the db, e.g. new entry;
  to be deleted; etc.
- Language:
  Natural language of the intellectual content.
- Email:
  Electronic address of the resource maintainer.
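The fields above could be laid out in a relational database roughly as follows. This is a sketch using SQLite through Python's sqlite3 module. The table and column names, and the decision to keep the faceted keywords in a separate table, are my own assumptions rather than a settled design.

```python
import sqlite3

# One row per resource, keyed by URL, plus one row per (facet, keyword)
# pair so that an entry can carry several keywords in each facet.
SCHEMA = """
CREATE TABLE resource (
    url         TEXT PRIMARY KEY,
    title       TEXT,
    author      TEXT,
    abstract    TEXT,
    publisher   TEXT,
    pub_date    TEXT,
    other_agent TEXT,
    media_type  TEXT,
    relation    TEXT,
    status      TEXT,
    language    TEXT,
    email       TEXT
);
CREATE TABLE keyword (
    url   TEXT NOT NULL REFERENCES resource(url),
    facet TEXT NOT NULL
          CHECK (facet IN ('function', 'context', 'object', 'medium')),
    value TEXT NOT NULL,
    PRIMARY KEY (url, facet, value)
);
"""


def open_catalog(path=":memory:"):
    """Open (or create) a catalog database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_by_facet(conn, facet, value):
    """Return (url, title) for every resource with the given facet keyword."""
    rows = conn.execute(
        "SELECT r.url, r.title FROM resource r "
        "JOIN keyword k ON k.url = r.url "
        "WHERE k.facet = ? AND k.value = ? ORDER BY r.url",
        (facet, value),
    )
    return rows.fetchall()
```

Keeping keywords in their own table is what lets one entry appear under several keywords, which a single category column could not do.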
Bibliography
in WebWeek.
- Christian Neuss, Robert E. Kent.
  Conceptual Analysis of Resource Meta-information.
- R. Prieto-Diaz and P. Freeman.
  Classifying Software for Reusability. IEEE Software,
  4(1):6-16, January 1987.
- Ron Daniel and Michael Mealling.
  An SGML-based URC Service.
- D. Cohen.
  A Format for E-Mailing Bibliographic Records, RFC 1357.
- Stuart Weibel, Jean Godby, Eric Miller, and Ron Daniel (eds).
  OCLC/NCSA Metadata Workshop Report.