Finding Out About

Richard K. Belew

CURRENT STATUS (27 Mar 01)

Finding Out About [R.K. Belew, Cambridge Univ. Press, 2000] has been assigned ISBN 0-521-63028-2.

You can get it through the Cambridge University Press website.

Amazon.com also knows about it.

FOA "Users"

That is, those of you who have already purchased copies of FOA: I encourage you to register your copy using the FOA User Registration form. Benefits include:

Participation in the RAVE relevance assessment data collection effort
Access to the FOA discussion board
Specification of the beneficiary of "your" reader royalties (cf. FOA's Active Colophon)
Typo, bug fix, etc. alerts

This information will be used only by me, and only for these FOA-reader-related purposes!

Other FOA resources

The following resources may be of some use:

I am collecting Errata that I and others have found in FOA. I am sure there are more, so please let me know if you find other bugs. I currently have no plans for a second edition, so this list will have to suffice.
I have developed an HTML version of the FOA manuscript, missing figures and most equations but with all intra-FOA hyperlinks and links to other related WWW resources. This is NOT the same HTML version of FOA as on the CD-ROM (which has all figures and math and many typo fixes). Rather, this version is intended as a simpler, public version that we are using for research purposes.
Overview of CD-ROM contents
Table of contents (GZip'd PostScript, 420K) and (PDF, 690K)
Chapter 1 (Overview) (GZip'd PostScript, 778K) and (PDF, 6.6M)
Chapter 8 (Conclusions) (GZip'd PostScript, 196K) and (PDF, 4.0M)

Instructors

INSTRUCTORS interested in using FOA with classes should see the additional resources listed on the FOA instructors' page and email rik@cs.ucsd.edu for access to additional materials.

Abstract

People have been producing books, technical papers, judicial opinions, newspaper articles, and personal correspondence since before the printing press. But just as that technology changed what was produced and who could read it, computers and networks have much more recently transformed the way we can communicate with one another.

This text is focused on the problem of "finding out about" (FOA): identifying documents that help someone learn more about a topic of interest. "Information retrieval" (IR) is the name of a sub-discipline within computer science that has developed a number of core technologies for constructing a statistical characterization of the words occurring in each document. This characterization is used to search efficiently through very large textual corpora for documents that a user is likely to find "relevant." Constructing an IR "search engine" and evaluating its effectiveness will be our first focus.

The recent confluence of Internet, HTML, WWW, browser, agent, and related technologies with the exploding range of digital media beyond text makes IR techniques essentially important to every computer scientist. These same changes have stretched traditional IR methods, and the second focus for the course moves to a number of advanced techniques that extend the basic IR search engine's capabilities. Chief among these are probabilistic and statistical methods shared with other areas of computer science, particularly machine learning. A second source is artificial intelligence, which contributes methods for reasoning about the relations among keywords (thesauri, classification taxonomies), relations within documents (e.g., chapter/section/subsection structures, footnotes, prerequisites), and relations connecting documents to one another (citation). Computational linguistics is a third useful source, providing techniques for parsing sentences and identifying parts of speech, phrasal units, etc. We will also analyze the WWW and other recent innovations directly, to see which aspects of the FOA problem are changed by these new technologies and which have remained the same.

With these technical foundations in place, we consider two of the most intellectually difficult issues underlying the FOA problem. First, exactly what does it mean for language to "mean" something? Questions of semantics like this one have always been central to the philosophy of language, but which variety of meaning concerns us most when we attempt to find documents "about" some topic? Second, what is the social context within which the FOA activity operates? How have economic, legal, political, and other forces shaped the publication activity, and how are new technologies changing these social forces? The central premise of the FOA text is that it is possible to give students some "job skills" that are quite immediately rewarding and at the same time use this opportunity to acquaint them with some of the most exciting, intellectually stimulating, and open questions of modern science and philosophy.

Last modified by: rik@cs.ucsd.edu 1 Sept 01