An Introduction to Web Farming

Information Refining

The paradigm of the Web is radically different from that of the data warehouse. Adapting an old programming term, you might say that Web content is spaghetti data: it can link to anything, with little discipline. Web content is also highly volatile. The Web's diversity challenges our imagination and our appreciation for new forms of creative expression. The problem is extracting from that diversity the few nuggets with real business value.

To do so requires a discipline that transforms raw data into validated information, much as a farmer transforms seed into a harvest. In web farming, this discipline is called Information Refining, and it consists of four processes: discovery, acquisition, structuring, and dissemination.

Discovery is the exploration of available Web resources to find those items that relate to specific topics. Discovery involves considerable "detective" work, far beyond searching generic directory services (such as Yahoo) or indexing services (such as AltaVista). Furthermore, discovery must be a continuous process, because data sources are continually appearing on (and disappearing from) the Web. A business analyst is the central figure in this activity and requires advanced search and indexing tools to be productive.

Acquisition is the collection and maintenance of content identified by its source. The main goal of acquisition is to preserve historical context so that you can analyze content in light of past changes. Acquisition requires a secure server platform with large storage capacity.
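As a rough illustration of keeping historical context per source, here is a minimal Python sketch. It assumes content arrives as text and that a source is identified by a string such as a URL. The names `Snapshot`, `AcquisitionStore`, and `record` are hypothetical, and the in-memory dictionary stands in for whatever storage platform is actually used.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    fetched_at: datetime  # when this version was collected
    digest: str           # SHA-256 of the content, used to detect changes
    content: str


@dataclass
class AcquisitionStore:
    # Hypothetical in-memory store: source identifier -> snapshots, oldest first.
    history: dict = field(default_factory=dict)

    def record(self, source, content, fetched_at=None):
        """Keep content for a source; return True if a new version was stored."""
        fetched_at = fetched_at or datetime.now(timezone.utc)
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        versions = self.history.setdefault(source, [])
        if versions and versions[-1].digest == digest:
            return False  # unchanged since the last visit; nothing new to keep
        versions.append(Snapshot(fetched_at, digest, content))
        return True

    def versions(self, source):
        """Return all stored versions for a source, oldest first."""
        return list(self.history.get(source, []))


store = AcquisitionStore()
store.record("http://example.com/prices", "widget: 10")
store.record("http://example.com/prices", "widget: 10")  # same content, skipped
store.record("http://example.com/prices", "widget: 12")  # changed, kept
```

Storing a new version only when the content differs is one possible design choice; it keeps every change available for later analysis without duplicating identical pages.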

Structuring is the analysis, validation, and transformation of content into a more useful format and a more meaningful structure. The formats can be Web pages, spreadsheets, word processing documents, and database tables. As we move toward loading data into a warehouse, the structures must be compatible with the warehouse's star-schema design and with its key identifier values.
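To make the star-schema requirement concrete, the following Python sketch validates one raw record and maps it to a fact row that refers to dimensions by key. Everything specific here is an assumption for illustration: the field names (`company`, `price`, `observed`), the `COMPANY_DIM` lookup, and the YYYYMMDD date key are hypothetical and not taken from any particular warehouse.

```python
from datetime import date

# Hypothetical company dimension: natural key -> surrogate key.
COMPANY_DIM = {"ACME CORP": 1, "GLOBEX": 2}


def date_key(d):
    """Surrogate key for a date dimension, in YYYYMMDD form."""
    return d.year * 10000 + d.month * 100 + d.day


def structure(raw):
    """Validate one raw record; return a fact row, or None if it is rejected."""
    name = raw.get("company", "").strip().upper()
    company_key = COMPANY_DIM.get(name)
    if company_key is None:
        return None  # no matching key identifier in the company dimension
    try:
        price = float(raw["price"])
        observed = date.fromisoformat(raw["observed"])
    except (KeyError, ValueError):
        return None  # missing or malformed field
    if price < 0:
        return None  # fails a basic sanity check
    return {
        "company_key": company_key,
        "date_key": date_key(observed),
        "price": price,
    }


row = structure({"company": " Acme Corp ", "price": "19.50", "observed": "1998-03-02"})
```

Records that cannot be matched to an existing key identifier are rejected rather than loaded, which is one way to keep unvalidated content out of the warehouse.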

Dissemination is the packaging and delivery of information to the appropriate consumers, either directly or through a data warehouse. It requires a range of delivery mechanisms, from predetermined schedules to ad hoc queries. Newer technologies such as information brokering and preference matching may be desirable.
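Preference matching can be pictured, in its simplest form, as comparing the topics attached to each item with the topics each consumer has asked for. The Python sketch below assumes that items carry a set of topic labels and that consumers declare their interests as sets. The function name `match_preferences` and the sample data are hypothetical.

```python
def match_preferences(items, preferences):
    """Map each consumer to the titles of items that share a topic with them."""
    deliveries = {}
    for consumer, wanted in preferences.items():
        # An item qualifies when its topics overlap the consumer's interests.
        deliveries[consumer] = [
            item["title"] for item in items if wanted & item["topics"]
        ]
    return deliveries


items = [
    {"title": "Competitor price change", "topics": {"pricing", "competitors"}},
    {"title": "New supplier site", "topics": {"suppliers"}},
]
preferences = {
    "sales": {"pricing"},
    "purchasing": {"suppliers", "pricing"},
}
deliveries = match_preferences(items, preferences)
```

The same function could be run on a predetermined schedule or called on demand for an ad hoc request; only the trigger differs.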

The flow through these processes is bidirectional. The left-to-right flow, from discovery toward dissemination, refines the content of information, which becomes more structured and validated. The right-to-left flow, from dissemination back toward discovery, refines the control of the processes, which become more selective and discriminating.

Copyright ©1998-2003 Bolder Technology, Inc. dba  WebFarming.com.
All rights reserved worldwide. Revised 2003-06-02 04:56 PM
Site Design by A Net Presence, Inc.