Feature Article: Posted 01/09/02

Resource Discovery Using Metadata

by George Rogers

What is Metadata?

Metadata is structured information that describes an information resource; on the web it is usually expressed with HTML tags. The most common example of a metadata system is a library card catalog, which describes resources and helps searchers locate them. Metadata elements can be put into many document formats. They can be filled with descriptive information about the document being created and can then be read by a search engine. The www.uth.tmc.edu web site currently contains HTML files, Word documents, Excel spreadsheets, and PowerPoint slide shows. All of these files need metadata elements added for proper searching to take place: HTML tags in web pages, and document properties in the Microsoft Office files. The following are examples of metadata elements for each document type:

Example Metadata Elements:

HTML Tags for web pages: title, subject, description, publisher, creator, date, and keywords.

Document properties for MS applications:

  • MS Word - title, subject, author, and keywords
  • MS Excel - title, subject, author, and keywords
  • MS PowerPoint - title, subject, author, and keywords
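
As a rough illustration of how the web-page elements listed above end up in a page, the following Python sketch renders a metadata record as HTML <meta> tags for the page's <head>. The function name build_meta_tags and its optional prefix parameter are hypothetical. The sketch assumes each element name is used directly as the tag's name attribute; schemes based on Dublin Core often prefix the names instead (for example DC.title), which the prefix parameter allows.

```python
from html import escape

# Element names taken from the article's list of web-page metadata elements.
WEB_PAGE_ELEMENTS = ("title", "subject", "description", "publisher",
                     "creator", "date", "keywords")


def build_meta_tags(record: dict[str, str], prefix: str = "") -> str:
    """Render a metadata record as HTML <meta> tags for a page's <head>.

    Only elements in WEB_PAGE_ELEMENTS are emitted, in that order; missing
    or blank values are skipped. `prefix` (e.g. "DC.") is a hypothetical
    option for schemes that namespace their element names.
    """
    lines = []
    for element in WEB_PAGE_ELEMENTS:
        value = record.get(element, "").strip()
        if not value:
            continue
        # escape(..., quote=True) makes the value safe inside a quoted attribute.
        lines.append(
            f'<meta name="{escape(prefix + element)}" '
            f'content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    page = {
        "title": "Resource Discovery Using Metadata",
        "creator": "George Rogers",
        "keywords": "metadata, Dublin Core, search",
    }
    print(build_meta_tags(page))
```

Because the tags are generated from one record, the same description can be reused for every page an author publishes, which keeps the element names consistent across the site.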

In an environment such as the traditional library, where cataloguing and acquisition are the sole preserve of trained professionals, complex metadata schemes such as MARC (MAchine-Readable Cataloging) are, perhaps, acceptable means of resource description. In the more chaotic online world, however, new resources appear all the time, often created and maintained by interested individuals rather than large, centrally funded organizations. As a result, it is difficult for anyone to locate information and data of value to them, and the large search engines, with all their faults, are often the only means by which new information can be found.

In such an environment there is an obvious need for metadata. That metadata must be in a form that both search engines and human beings can interpret, and it must be simple to create, so that any web page author can describe the contents of a page and immediately make it more accessible and more useful. Compromises are therefore necessary: the scheme should give the searcher as much useful information as possible while remaining simple enough to be used by the largest number of people with the least inconvenience.

Why do we need to add metadata to our documents?

The UT-Houston Meta Data Set is a simple set of 11 descriptive elements, based on the Dublin Core metadata element set, that can be used to describe network resources such as web pages and other document types. It is particularly useful for web page authors because:

  • It is very simple to learn
  • It can be extended for more complex applications
  • It can be embedded invisibly in web pages
  • It is recognized by the World Wide Web Consortium
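
To show what "embedded invisibly" means in practice, here is a small Python sketch, using only the standard library's html.parser module, of how an indexing program might pull metadata out of a page. The <meta> tags sit in the page's <head> and are never displayed by the browser, yet a parser can read them. The class and function names are hypothetical, and the DC. prefix in the sample page follows a common convention for expressing Dublin Core in HTML; it is assumed here, not taken from the UT-Houston Meta Data Set.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags, as an indexer might."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes lowercase tag names and attrs as (name, value) pairs.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name] = content


def extract_metadata(html_text: str) -> dict[str, str]:
    """Return the metadata found in html_text; visible body text is ignored."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


# Hypothetical sample page: the <meta> tags are in <head>, so browsers never show them.
PAGE = """<html><head>
<title>Resource Discovery Using Metadata</title>
<meta name="DC.title" content="Resource Discovery Using Metadata">
<meta name="DC.creator" content="George Rogers">
<meta name="keywords" content="metadata, Dublin Core, search">
</head><body><p>Only this paragraph is displayed.</p></body></html>"""

if __name__ == "__main__":
    for name, content in extract_metadata(PAGE).items():
        print(f"{name}: {content}")
```

A reader of the page sees only the paragraph in the body, while the indexer sees the three metadata elements.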

If web authors do not use metadata elements, their documents will not be found by focused searches that query specific metadata fields with an advanced query language. Some metadata elements are required; others are optional and can be used to describe a document further, making search results more relevant.
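
The article does not list which elements are required. The sketch below assumes, purely for illustration, that title, creator, and date are required and the remaining web-page elements are optional, and shows how a simple check could report missing required elements and unrecognized names before a page is published. The function check_metadata and the REQUIRED/OPTIONAL split are hypothetical.

```python
# Hypothetical split for illustration only; the article does not say which
# elements of the UT-Houston Meta Data Set are required.
REQUIRED = {"title", "creator", "date"}
OPTIONAL = {"subject", "description", "publisher", "keywords"}


def check_metadata(record: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (missing required elements, unrecognized element names), sorted."""
    # An element counts as present only if it has a non-blank value.
    present = {name for name, value in record.items() if value.strip()}
    missing = sorted(REQUIRED - present)
    unknown = sorted(set(record) - REQUIRED - OPTIONAL)
    return missing, unknown


if __name__ == "__main__":
    draft = {"title": "Campus Parking", "date": "", "keywords": "parking"}
    missing, unknown = check_metadata(draft)
    print("missing required:", missing)
    print("unrecognized:", unknown)
```

Treating a blank value as missing catches the common case of a template tag that was added but never filled in.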

One of the biggest problems facing web authors is the ever-growing volume of web pages created daily. How can web authors make sure that users find the information they need within an ever-growing web site?

As our university web resources continue to grow, searching for information will become more challenging. The use of metadata elements combined with a robust search engine will make information retrieval more accurate and efficient.
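
A minimal Python sketch of why metadata makes retrieval more accurate: a fielded search looks at one metadata element instead of the whole page text. The Document type, the fielded_search function, the comma-separated keywords format, and the example.edu URLs are all hypothetical; a real search engine's query language will differ.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    metadata: dict[str, str] = field(default_factory=dict)


def fielded_search(docs: list[Document], element: str, term: str) -> list[str]:
    """Return URLs of documents whose metadata `element` matches `term`.

    Keywords are assumed to be comma-separated and must match a whole
    keyword; other elements match on a case-insensitive substring.
    """
    term = term.lower()
    hits = []
    for doc in docs:
        value = doc.metadata.get(element, "")
        if element == "keywords":
            matched = term in [k.strip().lower() for k in value.split(",")]
        else:
            matched = term in value.lower()
        if matched:
            hits.append(doc.url)
    return hits


# Hypothetical pages; the last one was published without any metadata.
DOCS = [
    Document("https://example.edu/parking.html",
             {"title": "Campus Parking", "keywords": "parking, permits"}),
    Document("https://example.edu/construction.html",
             {"title": "Parking Garage Construction Schedule",
              "keywords": "construction"}),
    Document("https://example.edu/untagged.html"),
]

if __name__ == "__main__":
    print(fielded_search(DOCS, "keywords", "parking"))
    print(fielded_search(DOCS, "title", "parking"))
```

Here a keyword search for "parking" returns only the page whose author tagged it with that keyword, a title search casts a wider net, and the page with no metadata is never matched by a fielded search at all.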


University of Texas-Health Science Center at Houston

Office of Academic Computing
George J. Rogers - Web Site Content Coordinator