The Internet has shaken up the world
of business intelligence, especially in the retail sector. Want to find out
what your competition is doing? Just log on to the Internet, and browse their online catalog.
Never has retail competition been so
fierce, and never has business-intelligence data been so accessible. Online
catalogs are a mother lode of your competition's pricing and inventory data,
and it's in a machine-readable form. But getting at the data in online catalogs
in a consistent way isn't always easy. Online catalogs are often vast and have
relatively complex layouts.
The idea of extracting business
intelligence from your competition's Web pages can be described as "Web
farming," a term created by Dr. Richard Hackathorn. In his book, Web
Farming for the Data Warehouse (Morgan Kaufmann, 1998), Hackathorn defines Web
farming as "the systematic refining of information resources on the Web
for business intelligence."
Pricing and product information is
core intelligence data for retail-product industries, and many retail
E-businesses employ intelligence gatherers who track the online catalogs of the
competition. Automating this gathering effort is a necessity, and various
software applications are emerging that make it fairly straightforward.
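As a rough illustration of what automating this gathering might look like, here is a minimal sketch in Python that pulls product names and prices out of a catalog page. The sample markup, the `name` and `price` class names, and the `CatalogParser` and `extract_prices` helpers are all hypothetical; a real competitor's catalog will be laid out differently and will change without notice.

```python
from html.parser import HTMLParser

# Hypothetical catalog markup, invented for this sketch.
SAMPLE_PAGE = """
<ul>
  <li class="item"><span class="name">Widget</span> <span class="price">$19.99</span></li>
  <li class="item"><span class="name">Gadget</span> <span class="price">$5.50</span></li>
</ul>
"""


class CatalogParser(HTMLParser):
    """Collects (name, price) pairs from spans with the assumed class names."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished (name, price) tuples
        self._field = None   # "name" or "price" while inside such a span
        self._current = {}   # fields seen so far for the current <li>

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Text outside the spans of interest is ignored.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # Keep the row only if both fields were found.
            if self._current.keys() >= {"name", "price"}:
                price = float(self._current["price"].lstrip("$"))
                self.items.append((self._current["name"], price))
            self._current = {}


def extract_prices(html):
    """Return a list of (product name, price) tuples found in the page."""
    parser = CatalogParser()
    parser.feed(html)
    return parser.items


for name, price in extract_prices(SAMPLE_PAGE):
    print(name, price)
```

The sketch uses only the standard library, so it can be tried without installing anything. A production gatherer would also need to fetch pages, cope with layout changes, and respect the site's terms of use.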
This category of products and services
is still quite young. Hackathorn's
book is the definitive reference, and his Web site (www.webfarming.com)
tracks many of the products and related standards. He argues eloquently that the "benefits of
Web farming can be global in scope for the enterprise." And I agree.
InternetView: Farming The Web By Jason
Levitt, InformationWeek Columnist, November 1, 1999.
Richard Hackathorn clearly remembers
the moment the central idea for his new book Web Farming for the Data
Warehouse: Exploiting Business Intelligence and Knowledge Management (Morgan
Kaufmann Publishers, Inc.: San Francisco 1999) came to him. He was sitting
through a tedious presentation at a conference for chief information officers
in 1997 when he jotted on his notepad that "the Web is the mother of all
data warehouses." Hackathorn, a former professor who worked on fundamental
concepts of enterprise systems, database management, decision support and data
warehousing, is a
well-known industry innovator and consultant. He defines Web farming as
the systematic refining of information resources on the Web for business
intelligence. To achieve that goal, relevant content found on the Web must be
refined into a form that is compatible with a data warehouse.
Although the Web is a dynamic and
expansive information space, it is a database designer’s worst nightmare,
Hackathorn notes. It is a free-form combination of text, images and virtually
any kind of information object for which somebody can develop a browser. It has
no structure at all, just a series of links and pointers, many of which no
longer work. Finding information on the Web, Hackathorn observes, is like
trying to find a needle in a haystack while people are constantly adding and
subtracting from the pile.
And while the Web has been
productively used to distribute information from data warehouses to users for
analysis, nobody has seriously addressed the possibility of using Web data as
input for a data warehouse. Hackathorn makes just that argument in this book.
While significant barriers must be overcome to make Web content suitable for a
data warehouse, the benefits, he suggests, outweigh the costs.
Hackathorn has divided his book into
four parts. Acknowledging the negative reaction many IT professionals have to
even trying to incorporate Web content into a data warehouse, his first section
is primarily motivational. He provides the business argument for Web farming --
noting that efficient processes to turn data into information and then
knowledge are essential to the well-being of any enterprise.
The second section of the book lays
out a strategy for initiating Web farming efforts inside an enterprise.
Adopting a structure first articulated in the 1970s by Richard Nolan, a
professor at the Harvard Business School, Hackathorn lays out a four-step
process. First, the business case based on the objectives and business
environment of the enterprise must be made for Web farming. Then, the concept
has to be accepted and an infrastructure built. Next, pipelines to users have
to be established. Finally, the Web content must be structured for the warehouse.
The final two sections of the book
look at some of the tools and sources of content available to implement Web
farming projects -- there currently is no single solution for Web farming --
and the social and cultural ramifications of these efforts. In an extremely
interesting section at the end of the book, Hackathorn proposes a code of
ethics for Web farming.
Hackathorn has a vision. As he notes, Web
farming is not about technology or the Web. It is about basic business
practices in the contemporary environment. Moreover, he has the certainty that a visionary needs.
Indeed, he writes, "The
development of Web farming is certain. It will become a standard function
within data warehousing systems as companies strive in desperation for their next
competitive advantage." Doing business as usual, he writes, will no longer
be a viable strategy.
Down: Who Wrote the Books for DW? by Elliot King, an Associate Professor of Communications at
Loyola College in Maryland. Book review in Enterprise Systems Journal,
June 11, 1999
Thorough: enough theory and plenty of examples. Dr. Hackathorn's
compendium of data farming theory, techniques, and resources is about the most useful guide you can
find for understanding the mining possibilities of the sprawling Internet.
The not-too-technical first half is readable, and the second half is a treasure-trove of tools and
- A reader from Maryland, Amazon.com
Review, June 4, 1999
Ever hear of Web farming? Neither did
we, until tipped off to this site. Web farming takes the best of data mining, intelligent agents and push
technology, and creates a whole new discipline. Consultants and those
selling just-named technology ought to learn more about Web farming -- it just
might turn out to be the next "knowledge management."
Web farming is best defined as
business intelligence using Web-based information resources. Similar to data
mining, Web farming deals not with internally stored information, but the
collection and internalization of information from external sources.
In layman's terms, Web farming is Web
surfing, polished and with a purpose. Instead of searching the Web for
information, Web farmers look for reliable data "seeds" which, when
combined and watered daily, yield a harvest of usable information. The
agriculture metaphor becomes tiresome but we think you get the point.
Why Web farming? In today's speed-of-light
market twists and turns, internal data becomes less of a factor. In fact, by
focusing only on internal data you may actually be more vulnerable to outside
factors. This site creates the wonderful analogy of a person "serenely
contemplating his navel" while an unseen lion moves in for the kill.
At Webfarming.com you'll find lots of
background information on this emerging practice, including links to Web
Farming articles in other magazines. A thin discussion forum offers the chance
to share ideas on Web farming. Information on upcoming presentations, seminars,
and workshops is to come in the next year. We'd like to see more content, and
the design needs work (canary-yellow links on a white background?).
The site also represents a novel idea:
write a book and promote it online with resources and discussion forums. While
we expect Random House isn't beating down the barn doors to get at
Webfarming.com's secrets, this use of the Web to promote the print medium is
The future of Web farming is up for
debate, though the idea has merit. Even if you have little interest in Web
farming, you should check
out this site, if for no other reason than to say you knew it before it was big
- KM World Website Reviews
(April 6, 1999)
In Web Farming for the Data Warehouse,
author Richard D. Hackathorn applies his 30-plus years of information expertise
to the novel concept of "Web farming." He lays out the methodology of
cultivating the global Web for information relevant to an enterprise's operation.
Although this title is targeted at information managers in large organizations,
the basic ideas contained within can easily be applied to businesses of all sizes to some degree.
The first part of the book, aptly
titled "Plowing the Soil," presents the importance of gathering
information to build institutional knowledge for competitive reasons. The
author explains how Web farming fits in with the more established concept of
data warehousing and emphasizes the way high-capacity information gathering can
change business processes.
In the central portion of the book,
the author explains the process of moving an organization to Web farming from
both technological and managerial perspectives. Then he gets into the details,
explaining all of the related Internet standards, information tools, and online
databases waiting to be tapped. This section is amazingly comprehensive.
The book finishes with a discussion of
privacy and the effects of new information technology on society. If you're
interested in Web farming or simply want a taste of where the Internet is
likely to take us, this
title is sure to provide a fresh perspective.
-Stephen W. Plain, Amazon.Com
The gap between Internet and data warehouse
is an important one to be bridged. This book does a very good job in spanning it.
-Bill Inmon, Founder, Pine Cone Systems,
This book is about using the web as a source
of data for the data warehouse. It
gives good advice on separating useful web information from the useless
"flaky-free" data. As Richard
points out, the internet is the "mother of all data warehouses" but
few corporations have begun to harvest the hundreds of sources of data that
will allow them to look at what is going on outside the enterprise. His vision and the simplicity of explanation
make this book a road map
for the next five years of Business Intelligence.
What makes this book doubly useful, aside
from the easy to read writing style, is that Richard has melded together the three
biggest trends in our industry into a single strategy. Combining the Internet, data warehousing,
and knowledge management into one vision, Richard gives us insight into the next wave that will crash upon
been preaching this message to our customers only to find someone has written
an entire book on it!
- Dan Graham, Strategy & Solutions
Executive, IBM Global Business Intelligence Solutions
Hackathorn delivers a creative, thorough, and
down-to-earth guide that lets you harvest Web information to improve
decision-making and to enrich data warehouses.
-Maurice Frank, Business Intelligence
Services National Practice, IBM Global Services
I have reviewed Dick Hackathorn's
book, Web Farming for the Data Warehouse,
and am very impressed. He has done an
outstanding job of surveying the many aspects of this new but rapidly emerging
field. His insight that "...the
Web is the mother of all data warehouses" was only the starting point for
the intellectual journey he describes in easy-to-digest prose. Known to many as the "father of middleware" because of his pioneering
work early in this decade, Dr. Hackathorn is once again in the forefront of
another technological wave.
-Dr. Donald R. Deutsch, Vice
President, Middleware Development and Interoperability Center, Sybase
book on an important subject. . . Dr. Hackathorn's new book on web Farming is an
important look at the merger of two major technologies - data warehousing and
the World Wide Web. Readers will see the enormous value that can be gained from
a systematic approach to collecting web information. Hackathorn's writing style
makes the subject understandable to both the business manager and IT
professional. The extensive list of resources is helpful to those who wish to
quickly implement Web Farming systems.
-Christopher Ryan, President and CEO,
* Brings the reader to a new frontier of information
processing ... a tantalizing proposal for those interested in the dynamics of
being market responsive...
* The synthesis of many years of study
in information warehousing surprisingly leads the reader to yet another
plateau of applicability...
researched, and thought-stimulating book ... if you want to learn what
will be happening in the Information Warehouse space over the next few years,
get started by reading the book!
Now that I have had the opportunity to
review the book, I can say that Dick's work and the book based on that work
represent one of the most exciting enterprises in the new world of human
function and work. The process of identifying web content, acquiring it as
validated sources, structuring it for storage and retrieval, disseminating it
effectively based on tight customer profiles, and managing these tasks as part
of a new data center service agency is indeed best practice in value-added
knowledge management. This book
provides a cogent primer and description of the motivation, perspectives,
foundations, methodology, architecture, management, standards, tools,
resources, techniques, information landscape, challenges and exciting opportunities
associated with web farming. It
describes a journey (not a destination), its emerging technology as applied to
the web, and the many potential ways that this journey (web farming) will transform business
function and management today and beyond.
Feel free to use any part of this, including
my name, as you deem appropriate.
-James F. Williams,II, Dean of Libraries,
University of Colorado
Web Farming for the Data Warehouse is
a compendium of information that is best described as systematically making
intelligent use of the Web. The author makes a strong case for the Web as a
valuable source of information for data warehousing and business intelligence.
He explains the methodology for farming the Web -- creating a plan, building
the infrastructure, identifying information sources, extracting data, analyzing
it, and presenting information.
The book contains a wealth of
information about content-providers, protocols, standards, tools, discovery
services, knowledge management, Web agents, and data mining software. The book
also addresses topics such as leveraging knowledge and creating information
markets. The author, Richard Hackathorn, is a recognized expert in enterprise
computing and database connectivity. This is his second book on data warehousing.
He co-authored Using the Data Warehouse (Wiley) with Bill Inmon.
Hackathorn's book is as close to
leisure reading as I expect to find in an IT book. It belongs on the must-read list for
anyone having an interest in exploring the Web's potential as an information
- Ken North, consultant and database
This is a breakthrough book about gaining competitive advantage
through effective use of information-based technology.
-Jerry Donahue, President, BTI, the NBIA
winner of International Technology Incubator-1998, and U.S. SBA National award
for Technology Development.
The book is an excellent survey on the state
of the art in web-based information gathering.
Dick covers the gamut; it's an A-to-Z resource for effective tools, data,
and techniques for anyone who bears the dubious title of "knowledge
worker". WebFarming is well researched by an industry
veteran on the cutting edge of web-based information gathering and business
-Jim Harding, Chief Technology Officer,
Dr. Richard Hackathorn presents a different
philosophy and more effective manner of collecting data using the vast World
Wide Web, in his concept of Web Farming.
His idea encourages the user to look at the big picture, using greater
vision, and long-term focus. The
practical process of data collection used in Web Farming is presented in an
easy manner in Web Farming for the Data Warehouse, as Dr. Hackathorn's writing
style is conversational and simple to follow.
The illustrator he selected complements this manner, as the pictures are
clear, humorous and entertaining. I do
not possess a hard technical background, yet I have worked in the high-tech
industry for the past 6 years. I was
able to read this book and understand the message Dr. Hackathorn was conveying. As a manager, understanding the concept of
Web Farming and knowing the tools necessary to set up a web farm will play a
great role in my career in high technology.
I highly recommend
this book to professionals in the data warehouse and research community,
managers in the high-tech industry, or any industry for that matter, and those
visionaries looking towards the horizon.
-Angenette N. Rider, Manager, Access Graphics
In early 1998, at the Fortune IT Strategy
Forum, Peter Drucker told his audience that "the single biggest challenge
you face is to organize outside data,
because change occurs from the outside" and went on to observe that
today's management, while swamped with inside
data, doesn't have any more real information than it did 40 years ago. Richard
Hackathorn's book addresses Drucker's point in spades. Frankly, the book is ahead of its time. I
think that not only will it help readers think outside the proverbial box, but
also give them the roadmap
for implementing their own Web farming. Outstanding list of annotated resources as well.
-Karen Watterson, data and knowledge
warehouse design consultant