For information on the current workshop, go to
Building Web Farming Systems: Architecture and
Methodology
Overview and update to the book Web Farming for the Data
Warehouse: Exploiting Business Intelligence and Knowledge
Management. Covers the business concepts and justification
for Web Farming. Introduces the four-stage methodology and
explains the respective architecture required for each
stage. The role of the Information Analyst is
highlighted.
Enterprise Information Portals: Personalization and
Collaboration
Reviews the latest developments in the emerging
Enterprise Information Portal (EIP) area. In particular,
positions EIP as a required component for Web Farming
systems, and highlights the functions of personalization and
collaboration.
Searching the Web: Dealing with Global Search Engines
Examines the practices and tools for finding business
information by using global search engines. The strengths
and weaknesses of the top 6-10 services are outlined, along
with specific usage tips.
Searching the Web: Dealing with the Invisible
Dark-Matter
Only a fraction of web-based information is indexed by the
global search engines. The rest is hidden in six types of
Web 'dark-matter' and often accounts for the more valuable
business information. Tools and resources are surveyed.
Text and Hyper-Text Mining: Concepts and Tools
Text is often regarded as a single blob with few definable
features. New techniques for mining the information from
text and especially hyper-text are surveyed. Covered are the
techniques for: Language Identification, Feature Extraction
(multi-term, abbreviations, acronyms), Noun Categorization
(person, place, duration), Document Categorization
(predefined topics), Document Clustering and Similarity
(discovered topics), Text Summarization, and Text Search
(Boolean, fuzzy, relevance ranking). Techniques for analyzing
explicit and implicit links are also surveyed.
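As a rough illustration of the difference between Boolean search and relevance ranking mentioned above, the following Python sketch runs both over a tiny document set. The documents, the function names, and the use of plain term-frequency cosine similarity are assumptions made for this example, not the specific techniques covered in the session.

```python
import math
import re
from collections import Counter

# Hypothetical mini-collection, invented for illustration.
DOCS = {
    "d1": "Data warehouse design for business intelligence",
    "d2": "Web farming feeds the data warehouse with web content",
    "d3": "Gardening tips for the home",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boolean_and_search(query, docs):
    """Boolean AND: return ids of documents containing every query term."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items()
            if terms <= set(tokenize(text))]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def ranked_search(query, docs):
    """Relevance ranking: order matching documents by cosine similarity."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)
            if score > 0]

boolean_hits = boolean_and_search("data warehouse", DOCS)
ranked_hits = ranked_search("web data warehouse", DOCS)
```

The Boolean search only reports which documents match, while the ranked search also orders them, so a document that mentions "web" repeatedly rises above one that merely shares two of the three query terms.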
Information Visualization: Concepts and Tools
Information Visualization or InfoViz has great potential
for enhancing the value of web-based information. However,
this technology is still in its infancy. Covered are various
approaches and concepts for InfoViz, along with a survey of
current tools.
In-Depth XML: Vocabularies for Information Exchange
A detailed examination of XML-related standards, such as DTD,
namespaces, XSL/XSLT, XLink, DOM, RDF, and Dublin Core,
along with several under review (XHTML, XML-Data). Survey of
10-15 XML vocabularies that are being formed or are
established. Approaches to using XML to create
"information communities."
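To make the idea of a shared XML vocabulary concrete, here is a minimal Python sketch that reads Dublin Core elements out of a small XML record. The record itself, the `record` wrapper element, and the function name are hypothetical; only the Dublin Core element namespace URI is taken from the published standard.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record, invented for illustration.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Competitor Price List</dc:title>
  <dc:creator>Example Corp</dc:creator>
  <dc:date>1999-06-01</dc:date>
</record>"""

def dublin_core_fields(xml_text):
    """Return a dict of Dublin Core element names to their text values."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for child in root:
        # ElementTree expands prefixed names to "{namespace-uri}localname".
        if child.tag.startswith(prefix):
            fields[child.tag[len(prefix):]] = (child.text or "").strip()
    return fields

fields = dublin_core_fields(SAMPLE)
```

Because both sides agree on the namespace rather than on a prefix, a receiver can pick out the title, creator, and date without knowing anything else about the sender's document layout.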
Acquiring Web-Based Information: Techniques and Tools
Discovering web-based information is only half the
battle. Acquiring this information so that it has business
value is the challenge. Covered are the issues of capturing
link versus content, time-varying content (movie rather than
photo), and data extraction. Examples of custom
'web-scraping' programs are given, along with a survey of
current data acquisition tools. Advances in intelligent
structuring of text into XML documents are reviewed.
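The distinction between capturing links and capturing content can be sketched with a small 'web-scraping' routine. The Python example below is a minimal sketch under stated assumptions: it parses an inline HTML string rather than fetching a live page, and the class name, sample page, and URL are invented for illustration.

```python
from html.parser import HTMLParser

class LinkAndTextExtractor(HTMLParser):
    """Collect hyperlink targets and visible text separately."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values of <a> tags
        self.text_parts = []   # visible text fragments
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script and style blocks.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

def extract(html):
    """Return (list of links, visible text) for an HTML string."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)

# Hypothetical page, invented for illustration.
SAMPLE = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Quarterly Results</h1>"
    '<p>Revenue rose. See the <a href="http://example.com/report">'
    "full report</a>.</p></body></html>"
)

links, text = extract(SAMPLE)
```

Storing only `links` records where the information lives, while storing `text` preserves what the page said at capture time, which matters when the content changes later.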
Privacy and Intellectual Property Rights: The Deeper
Issues of BI
Any Web Farming effort should include an awareness of
privacy, confidentiality, and intellectual property rights.
The legal and ethical boundaries surrounding these issues are
constantly changing. Every organization engaged in Web
Farming should determine its policies for these deeper
issues. A suggested Code of Ethics is discussed.
Analysis of Hubs and Authorities on the Web
Based on Kleinberg's work (as utilized in the IBM Clever
Project), multiple virtual information communities for
several topics will be examined. See the June 1999 issue of
Scientific American for a description of this technique.
Discussion of the practical application of this technique
and a comparison with the new Google website.
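The core of the hubs-and-authorities idea can be sketched in a few lines of Python. This is a simplified illustration, not the Clever Project's implementation: the link graph, page names, and iteration count are assumptions made for the example. A page earns authority from the hubs that point to it, and earns hub weight from the authorities it points to.

```python
import math

def hits(links, iterations=50):
    """Iteratively score hubs and authorities.

    links maps each page to the list of pages it links to.
    Returns (hub_scores, authority_scores) as dicts.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph, invented for illustration.
LINKS = {
    "portal_a": ["vendor_x", "vendor_y"],
    "portal_b": ["vendor_x", "vendor_y"],
    "portal_c": ["vendor_x"],
    "vendor_x": [],
    "vendor_y": [],
}

hub_scores, auth_scores = hits(LINKS)
top_authority = max(auth_scores, key=auth_scores.get)
```

In this toy graph the page linked from all three portals ends up as the strongest authority, and the portals that link to both vendors score higher as hubs than the one that links to only one.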
Web Farming with Commercial Content Providers
Survey of 15-20 popular content vendors, with a more
detailed examination of several innovative firms.
Examination of approaches to XML delivery. Discussion of
licensing terms, especially for database archiving.
Survey of Information Analysis Tools
Survey of browser enhancements (Alexa, InfoSeek Express)
and separate personal tools (BullsEye, Copernic, WebCompass).
Discussion of the requirements for advanced analysis and
collaboration within a group setting.