ht://Dig Copyright © 1995-2002 The ht://Dig Group. Please see the file COPYING for license information.
	
The system performs three major tasks, which should be run in the following order: digging, merging, and searching.
Digging is the first step in creating a search database. This system uses the word "digging" where other systems call it harvesting or gathering. In the ht://Dig system, the program htdig performs the information gathering stage. In this process, the program acts like a regular web user, except that it follows the hyperlinks it comes across. (It does not actually follow all of them, only those within the domain it needs to gather information on.) Each document it visits is examined, and all the unique words in the document are extracted and stored, except those that are too short, too long, or excluded by the configuration.
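The word filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the htdig implementation: the function name, its parameters (`min_len`, `max_len`, `excluded`), the default limits, and the tokenizing rule are all assumptions made for the example.

```python
import re


def extract_unique_words(text, min_len=3, max_len=32, excluded=frozenset()):
    """Return the unique lowercase words in text that pass the filters.

    min_len, max_len and excluded are hypothetical stand-ins for the
    "too short", "too long" and "excluded" settings in the configuration.
    """
    # Assumed tokenizing rule: a word is a run of letters and digits.
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return {
        w for w in words
        if min_len <= len(w) <= max_len and w not in excluded
    }


sample = "The dig program visits a page and stores the unique words of the page."
unique_words = extract_unique_words(sample, excluded={"the", "and"})
print(sorted(unique_words))
```

Here "a" and "of" are dropped as too short, "the" and "and" are dropped because they are on the exclusion list, and "page" is stored only once even though it appears twice.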
	
The digging process creates at least two files: a list of all the words, and a database of URLs and information about those URLs. Other files may also be created, such as a list of all URLs seen, a list of all images seen, or ASCII versions of the databases.
Once the digging process is complete, the data must be converted into something the search engine can actually use. The htmerge program does this.
The term "merge" is used because data from several databases is gathered together and merged into several other databases. The source databases include those created by the latest dig as well as any previously merged databases. The latest dig produces a database with information on new pages and on changes to previously existing pages; this new information is merged with the unchanged information to create up-to-date databases.
Other, optional tasks are also categorized under the merge phase.
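As a rough illustration of the merge idea, the sketch below combines a previously merged database with the output of the latest dig. It assumes each database can be modeled as a Python dictionary that maps a URL to its record; the real htmerge file formats are not described here, and the function name and record fields are hypothetical.

```python
def merge_databases(previous, latest):
    """Combine a previously merged database with the latest dig output.

    Both arguments map a URL to its record. Records from the latest dig
    (new pages and changed pages) replace older records for the same URL;
    records for unchanged pages are carried over as they are.
    """
    merged = dict(previous)   # start from the unchanged information
    merged.update(latest)     # add new pages, overwrite changed ones
    return merged


previous = {
    "http://example.com/a.html": {"title": "A", "words": ["alpha"]},
    "http://example.com/b.html": {"title": "B", "words": ["beta"]},
}
latest = {
    "http://example.com/b.html": {"title": "B v2", "words": ["beta", "bravo"]},
    "http://example.com/c.html": {"title": "C", "words": ["gamma"]},
}

merged = merge_databases(previous, latest)
for url in sorted(merged):
    print(url, merged[url]["title"])
```

In this example a.html is unchanged and kept, b.html is replaced by its newer record, and c.html is added as a new page.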
Searching is where all the information gathered and organized during the dig and merge stages gets put to use. The htsearch program performs the actual searches. It is a CGI program that takes the HTML search form on the website as input, performs the search, and produces the HTML output (or the "failed search" result) that users see.
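To make the search step concrete, here is a minimal sketch of looking up query words in a word list that maps each word to the URLs containing it. It is an assumption-laden illustration rather than a description of htsearch: the function names are hypothetical, the index is held in memory, and requiring every query word to match is a choice made for the example. The CGI and HTML output parts are left out.

```python
def build_word_index(documents):
    """Map each word to the set of URLs whose word list contains it."""
    index = {}
    for url, words in documents.items():
        for word in words:
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


documents = {
    "http://example.com/a.html": ["search", "engine", "overview"],
    "http://example.com/b.html": ["search", "form", "example"],
}
index = build_word_index(documents)

print(sorted(search(index, "search")))
print(sorted(search(index, "search form")))
print(sorted(search(index, "missing")))  # empty list: the "failed search" case
```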