Richard K. Belew


CURRENT STATUS (27 Mar 01)

FOA IS OUT!!

  • Finding Out About [R.K. Belew, Cambridge Univ. Press, 2000] has been assigned ISBN 0-521-63028-2.
  • You can get it through the Cambridge University Press website.
  • Amazon.com also knows about it.

    FOA "Users"

    That is, those of you who have already purchased copies of FOA: I encourage you to register your copy using the FOA User Registration form. Benefits include:

    This information will be used only by me, and only for these FOA-reader-related purposes!

    Other FOA resources

    The following resources may be of some use:

    Instructors

    INSTRUCTORS interested in using FOA with classes should see the additional resources listed on the FOA instructors' page and email rik@cs.ucsd.edu for access to additional materials.

    Abstract

    People have been producing books, technical papers, judicial opinions, newspaper articles, and personal correspondence since before the printing press. But just as that technology changed what was produced and who could read it, computers and networks have much more recently transformed the way we can communicate with one another.

    This text is focused on the problem of "finding out about" (FOA): identifying documents that help someone learn more about a topic of interest. "Information retrieval" (IR) is the name of a sub-discipline within computer science that has developed a number of core technologies for constructing a statistical characterization of the words occurring in each document. This characterization is used to efficiently search through very large textual corpora for documents that a user is likely to find "relevant." Constructing an IR "search engine" and evaluating its effectiveness will be our first focus.
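
    As a rough illustration of what a "statistical characterization of words" can look like, the following Python sketch builds an inverted index of term frequencies and ranks documents by a simple TF-IDF score. It is only a minimal sketch under stated assumptions: the function names (tokenize, build_index, search), the toy documents, and the particular weighting are hypothetical choices made for this example, not the method developed in the FOA text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    """Build an inverted index mapping term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Rank documents by summed tf * idf over the query terms (assumed weighting)."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        # Rare terms get a higher weight; a term in every document gets weight 0.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    docs = {
        "d1": "information retrieval finds relevant documents",
        "d2": "machine learning uses statistical methods",
        "d3": "statistical retrieval of documents from large corpora",
    }
    index = build_index(docs)
    for doc_id, score in search("statistical retrieval", index, len(docs)):
        print(doc_id, round(score, 3))
```

    In this toy corpus, d3 contains both query terms and so ranks above d1 and d2, which contain one each. Real search engines refine every step of this (stemming, stop words, length normalization), and measuring how well such a ranking matches users' relevance judgments is a separate problem.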

    The recent confluence of Internet technologies (HTML, the WWW, browsers, agents, etc.) with the exploding range of digital media beyond text makes IR techniques essential to every computer scientist. These same changes have stretched traditional IR methods, and the second focus of the course is a number of advanced techniques that extend the basic IR search engine's capabilities. Chief among these are probabilistic and statistical methods shared with other areas of computer science, particularly machine learning. A second source is artificial intelligence, which contributes techniques for reasoning about the relations among keywords (thesauri, classification taxonomies), relations within documents (e.g., chapter/section/subsection structures, footnotes, prerequisites), and relations connecting documents to one another (e.g., citation). Computational linguistics is a third useful source, offering techniques for parsing sentences and identifying parts of speech, phrasal units, etc. We will also analyze the WWW and other recent innovations directly, to see which aspects of the FOA problem are changed by these new technologies and which have remained the same.

    With these technical foundations in place, we consider two of the most intellectually difficult issues underlying the FOA problem. First, exactly what does it mean for language to "mean" something? Questions of semantics like this one have always been central to the philosophy of language, but which variety of meaning concerns us most when we attempt to find documents "about" some topic? Second, what is the social context within which the FOA activity operates? How have economic, legal, political, and other forces shaped publication, and how are new technologies changing these social forces? The central premise of the FOA text is that it is possible to give students some "job skills" that are immediately rewarding and, at the same time, to use this opportunity to acquaint them with some of the most exciting, intellectually stimulating, and open questions of modern science and philosophy.


    Last modified by: rik@cs.ucsd.edu 1 Sept 01