
Crawler

Architecture diagram (GIF)


Project Description

The seeker algorithm is relatively straightforward. The search is seeded with both keywords and URLs. Keywords are submitted to online search engines to retrieve web pages, through a module that learns effective queries. URLs are spidered. Speculative fetching is performed based on the expectation that a site is a project URL or a metasite, as classified by WebKB tools. In this way, a database of project URLs is built. Next, we use information extraction to populate KBs about software systems, and then use these KBs to initiate further searches. Eventually we would like to extend this into a set of tactics for retrieving all information related to packaging and systems integration.
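The seeding, spidering and feedback loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the FRDCSA implementation: the search engine, fetcher, WebKB classifier and information-extraction step are replaced by hypothetical callables, the query-learning module is omitted, and the toy web, URLs and keyword classifier are made up so the control flow runs offline. It reads "speculative fetching" as following links only from pages classified as project pages or metasites.

```python
import re
from collections import deque
from typing import Callable, Iterable


def seek(
    keywords: Iterable[str],
    seed_urls: Iterable[str],
    search: Callable[[str], Iterable[str]],         # hypothetical: keyword -> result URLs
    fetch: Callable[[str], tuple[str, list[str]]],  # hypothetical: url -> (text, outlinks)
    classify: Callable[[str, str], str],            # hypothetical: (url, text) -> label
    extract: Callable[[str], Iterable[str]],        # hypothetical: text -> system names
    max_pages: int = 100,
) -> set[str]:
    """Return the URLs classified as software project pages."""
    project_urls: set[str] = set()
    seen_urls: set[str] = set()
    seen_keywords: set[str] = set()
    kw_queue = deque(keywords)
    url_queue = deque(seed_urls)
    fetched = 0
    while fetched < max_pages:
        # Turn any pending keywords into candidate URLs via the search engine.
        while kw_queue:
            kw = kw_queue.popleft()
            if kw not in seen_keywords:
                seen_keywords.add(kw)
                url_queue.extend(search(kw))
        if not url_queue:
            break  # nothing left to search for or fetch
        url = url_queue.popleft()
        if url in seen_urls:
            continue
        seen_urls.add(url)
        text, links = fetch(url)
        fetched += 1
        label = classify(url, text)
        if label == "project":
            project_urls.add(url)
            # Feed system names found on the page back in as new queries.
            kw_queue.extend(extract(text))
        if label in ("project", "metasite"):
            # Speculative fetching: follow links only from pages that look
            # like project pages or metasites (directories of projects).
            url_queue.extend(links)
    return project_urls


# Toy, in-memory "web" for trying the loop offline (all data is made up).
TOY_WEB = {
    "http://example.org/list": ("Directory of software projects",
                                ["http://example.org/foo"]),
    "http://example.org/foo": ("foo project home page. Uses libbar.", []),
    "http://example.org/bar": ("libbar project home page.", []),
}
TOY_INDEX = {"software directory": ["http://example.org/list"],
             "libbar": ["http://example.org/bar"]}


def toy_classify(url: str, text: str) -> str:
    # Crude keyword heuristic standing in for a trained WebKB classifier.
    if "project home page" in text:
        return "project"
    if "Directory" in text:
        return "metasite"
    return "other"


found = seek(
    ["software directory"], [],
    search=lambda kw: TOY_INDEX.get(kw, []),
    fetch=lambda url: TOY_WEB.get(url, ("", [])),
    classify=toy_classify,
    extract=lambda text: re.findall(r"Uses (\w+)", text),
)
print(sorted(found))  # ['http://example.org/bar', 'http://example.org/foo']
```

Passing the components in as callables keeps the feedback cycle visible: a name extracted from one project page (here libbar) becomes a query that surfaces the next project page.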

Capabilities

  • Extract version numbers from HTML (e.g. vbolten-3.0). SNA, software. Vulnerability assessment: the crawler determines who is running vulnerable software. Graph isomorphisms.
  • Come up with a new name for the crawler.
  • Sorcerer, a crawler
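The version-extraction and vulnerability-assessment items could start from something like the following minimal sketch. It assumes versions appear in page text as name-version tokens in the style of vbolten-3.0; the regex, the helper names `extract_versions` and `vulnerable`, and the advisory table (including the package name examplepkg) are hypothetical. The standard-library `HTMLParser` is used only to strip tags, and it does not skip script or style contents.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern for name-version tokens such as "vbolten-3.0": the name
# may contain hyphens, the version needs at least two dot-separated numbers.
VERSION_RE = re.compile(r"\b([A-Za-z][\w+-]*?)-(\d+(?:\.\d+)+)\b")


class _TextCollector(HTMLParser):
    """Gathers the text content of an HTML page, discarding the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def extract_versions(html: str) -> list[tuple[str, tuple[int, ...]]]:
    """Return (name, version-tuple) pairs found in the page text."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return [(name.lower(), tuple(int(p) for p in ver.split(".")))
            for name, ver in VERSION_RE.findall(text)]


def vulnerable(found, fixed_in):
    """Yield (name, version) pairs older than the first fixed release.

    fixed_in maps a package name to the first version without the flaw;
    tuples compare element by element, so (1, 4, 2) < (1, 5).
    """
    for name, version in found:
        if name in fixed_in and version < fixed_in[name]:
            yield name, version


page = """<html><body>
<p>Running <b>vbolten-3.0</b> and examplepkg-1.4.2 on this host.</p>
</body></html>"""
found = extract_versions(page)
print(found)  # [('vbolten', (3, 0)), ('examplepkg', (1, 4, 2))]
# Hypothetical advisory table: examplepkg releases before 1.5 are affected.
print(list(vulnerable(found, {"examplepkg": (1, 5)})))
# [('examplepkg', (1, 4, 2))]
```

Comparing integer tuples rather than strings avoids treating "1.10" as older than "1.9"; real-world version schemes (suffixes like "rc1", epochs) would need a richer parser.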


This page is part of the FWeb package.
It derives from the Robotics Institute projects page.
Last updated Mon Jan 15 08:34:48 CST 2007.