docco-0.3

Project Description

Personal document management using Formal Concept Analysis

The tool can index local hard drives and anything mounted into the local file system, such as Windows or Unix network drives. It scans for a number of document formats and builds a database recording which words occur in which documents. This allows very fast lookup by keyword and by other information such as author, title, or location. The keywords are generated from the bodies of the documents, so no manual annotation is required.
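The word-to-document database described above is, in general terms, an inverted index. The following is a minimal sketch of that idea, not Docco's actual code: the function names `build_index` and `lookup`, the tokenizing regular expression, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it.

    `documents` is a dict of {document name: extracted body text}.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        # Hypothetical tokenizer: runs of ASCII letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def lookup(index, *keywords):
    """Return the documents that contain every one of the given keywords."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


# Made-up sample data standing in for text extracted from real files.
docs = {
    "notes.txt": "Lattice theory and concept analysis",
    "report.html": "Quarterly report on document analysis",
    "todo.txt": "Buy milk",
}
index = build_index(docs)
```

Because each keyword maps directly to a set of documents, a multi-keyword lookup such as `lookup(index, "concept", "analysis")` reduces to a set intersection, which is why this kind of structure gives fast keyword queries.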

Docco supports the following formats:

* plain text
* HTML
* XML
* OpenOffice/StarOffice 6.0 documents
* Word (with POI plugin)
* Excel (with POI plugin)
* PDF (with PDFbox or Multivalent plugin)
* UNIX man pages (with Multivalent plugin)

Once an index has been created, the query interface lets you ask for all documents containing certain keywords and shows how these keywords combine. Once a set of interesting documents is found, they can be selected and displayed as a tree view, from which they can be opened in the default application.
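In Formal Concept Analysis, "how keywords combine" is usually expressed as a set of formal concepts: pairs of a document set (the extent) and the query keywords that all of those documents share (the intent). The sketch below illustrates that general technique with a brute-force enumeration over keyword subsets. It is an assumption-laden illustration, not a description of how Docco computes its diagrams; the function name `concepts` and the sample context are hypothetical.

```python
from itertools import combinations


def concepts(context, keywords):
    """Return the formal concepts of `context` restricted to `keywords`.

    `context` maps each document name to the set of words it contains.
    Each concept is a pair (extent, intent) of frozensets:
    extent = documents, intent = query keywords common to all of them.
    """
    keywords = set(keywords)
    found = set()
    for size in range(len(keywords) + 1):
        for subset in combinations(sorted(keywords), size):
            # Extent: every document containing all keywords in the subset.
            extent = frozenset(
                doc for doc, words in context.items() if set(subset) <= words
            )
            # Intent: every query keyword shared by all documents in the extent.
            intent = frozenset(
                k for k in keywords if all(k in context[doc] for doc in extent)
            )
            found.add((extent, intent))
    return found


# Made-up context for illustration only.
context = {
    "a.txt": {"lattice", "concept"},
    "b.txt": {"concept", "analysis"},
    "c.txt": {"lattice", "concept", "analysis"},
}
result = concepts(context, ["lattice", "analysis"])
```

Ordering the resulting concepts by extent inclusion yields the lattice that a query diagram would display. The brute-force loop is exponential in the number of keywords, which is tolerable only because a query typically names just a few.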

Capabilities


This page is part of the FWeb package.
It derives from the Robotics Institute projects page.
Last updated Mon Jan 15 08:47:40 CST 2007.