
Overview

XML Crawler is a tool for indexing and searching the XML web. It collects XML documents and stores them in a native XML database (either Tamino or XIndice). The end user can then search the XML repository for keywords through an HTML form. Besides conventional keyword searches, XML Crawler can search for keywords inside specific tags, in either a top-down or a bottom-up fashion. The search can then be refined interactively as the user navigates through the XML structure.
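To make the two tag-aware search modes more concrete, here is a minimal sketch in plain Java using only the standard DOM API. It is not the XML Crawler code, and it does not touch Tamino or XIndice. The class name `TagSearchSketch`, the method names, the sample document, and the reading of the two modes are all assumptions made for illustration: "top-down" is taken to mean starting from a chosen tag and looking for the keyword in its content, and "bottom-up" is taken to mean starting from the text that matches the keyword and reporting the chain of enclosing tags.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; names and behaviour are assumptions, not the project's API.
final class TagSearchSketch {

    // Small in-memory document standing in for one entry of the XML repository.
    static final String SAMPLE =
        "<library><book><title>XML Databases</title><author>Silva</author></book>"
      + "<article><title>Crawling the web</title><note>about XML</note></article></library>";

    // Parse an XML string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Top-down (assumed meaning): visit every element named `tag` and keep the
    // text content of those that contain the keyword.
    static List<String> topDown(Document doc, String tag, String keyword) {
        List<String> result = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName(tag);
        for (int i = 0; i < elements.getLength(); i++) {
            String text = elements.item(i).getTextContent();
            if (text.contains(keyword)) {
                result.add(text.trim());
            }
        }
        return result;
    }

    // Bottom-up (assumed meaning): find every text node containing the keyword
    // and report the path of tags that encloses it, from the root down.
    static List<String> bottomUp(Document doc, String keyword) {
        List<String> paths = new ArrayList<>();
        collect(doc.getDocumentElement(), keyword, paths);
        return paths;
    }

    // Depth-first walk that records the tag path of each matching text node.
    private static void collect(Node node, String keyword, List<String> paths) {
        if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().contains(keyword)) {
            paths.add(pathOf(node.getParentNode()));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), keyword, paths);
        }
    }

    // Build "/root/.../element" by walking up through the element's ancestors.
    static String pathOf(Node element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(topDown(doc, "title", "XML"));
        System.out.println(bottomUp(doc, "XML"));
    }
}
```

Under this reading, a refinement step could take one of the tag paths returned by the bottom-up search and run a top-down search restricted to its last tag. How the real search engine performs refinement against the database is not described here.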

It comprises the following tools:

1 - a command-line crawler tool

2 - a search/query engine with a JSP interface

The links below show the demo interfaces:

1 - Initial search interface

2 - Search results interface

3 - Refinement interface

In the next few days, I'll be posting the first alpha version (0.0.1) of the Crawler, initially for the XIndice database only. The search engine is almost finished, though it still needs polishing before it is production-ready. Development of the crawler has begun, but it is still at a very early stage.

I'm developing this software entirely by myself. If you would like to contribute code, especially to the crawler component, please drop me an e-mail.



Developed with: XML Starter Kit, SourceForge, NetBeans, XIndice, Apache
Hosted on: SourceForge

General Public License
© 2002 by Alessandro Coelho Ribeiro
alessandro.ribeiro@ajato.com.br