To provide a framework for managing and annotating a very large corpus for a search engine for linguists, a two-stage process was designed. The first stage uses a series of XML filters to extract data from the Internet and select the relevant sentences. In the second stage, these sentences are loaded into a central database that holds the annotations and tracks which sentences still need to be annotated; geographically distributed nodes can then annotate the sentences in parallel.
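
The second stage can be illustrated with a minimal sketch, which is not the system's actual implementation. It assumes a single SQLite database standing in for the central database. The table layout, the status values, and the function names (`open_store`, `load_sentences`, `claim_batch`, `submit_annotation`) are all hypothetical and chosen only for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) central store and create its tables."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sentences (
            id         INTEGER PRIMARY KEY,
            text       TEXT NOT NULL,
            status     TEXT NOT NULL DEFAULT 'pending',  -- pending | claimed | done
            claimed_by TEXT
        );
        CREATE TABLE IF NOT EXISTS annotations (
            sentence_id INTEGER NOT NULL REFERENCES sentences(id),
            node        TEXT NOT NULL,
            annotation  TEXT NOT NULL
        );
        """
    )
    return conn


def load_sentences(conn, sentences):
    """Load filtered sentences; each starts out as 'pending'."""
    with conn:
        conn.executemany(
            "INSERT INTO sentences (text) VALUES (?)",
            [(s,) for s in sentences],
        )


def claim_batch(conn, node, limit):
    """Hand up to `limit` pending sentences to an annotation node."""
    with conn:
        rows = conn.execute(
            "SELECT id, text FROM sentences "
            "WHERE status = 'pending' ORDER BY id LIMIT ?",
            (limit,),
        ).fetchall()
        conn.executemany(
            "UPDATE sentences SET status = 'claimed', claimed_by = ? WHERE id = ?",
            [(node, row[0]) for row in rows],
        )
    return rows


def submit_annotation(conn, node, sentence_id, annotation):
    """Store a node's annotation and mark the sentence as done."""
    with conn:
        conn.execute(
            "INSERT INTO annotations (sentence_id, node, annotation) "
            "VALUES (?, ?, ?)",
            (sentence_id, node, annotation),
        )
        conn.execute(
            "UPDATE sentences SET status = 'done' WHERE id = ?",
            (sentence_id,),
        )
```

Because claimed sentences leave the pending pool, two nodes calling `claim_batch` in turn receive disjoint batches. A deployment with truly concurrent remote nodes would presumably need a server database with row-level locking rather than this single-process sketch.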

Aaron Elkiss 2003-05-14