Introduction
searchdb is an ASP.NET search engine written in VB.NET version 2. It combines a web crawler, an indexer and a site search engine. The program stores the crawled pages and the extracted words in a database, and it displays search results in a similar way to popular internet search engines.
The program can index static web pages as well as dynamic pages, which are typically generated from a database and have URLs of the form 'default.asp?name=value'.
Other formats, such as Adobe PDF, Microsoft Word and Macromedia Flash, are not supported.
The engine can handle relatively large web sites if SQL Server is used as the database. With MS Access it tends to run more slowly, so Access is better suited to smaller sites.
Features
- Crawls and indexes static and dynamic web pages.
- Able to crawl multiple sites.
- Stores the crawled URLs and associated words in database tables.
- The word indexer extracts title, meta data, alt text and visible text from the web page.
- Common words are excluded by the word indexer and search engine.
- Search results are displayed in order of word hits, in a similar way to popular internet search engines.
- Written in VB.NET version 2.
- Works with either Microsoft Access or SQL Server databases.
- The crawler identifies itself with the user agent "http://www.webconcerns.co.uk crawler".
- Set up via password-protected management displays.
The Crawler
The web crawler starts from a given page and extracts its URL links. It then visits each link in turn, extracting further links, until every page of the domain that can be reached by following links has been listed in the database.
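The crawl described above can be sketched as a breadth-first traversal that stays on the starting host. This is a minimal illustration in Python, not searchdb's VB.NET code: `crawl`, `LinkExtractor` and the `fetch` callable are hypothetical names, and an in-memory dictionary stands in for real HTTP requests so the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl of every page reachable from start_url on the same host.

    fetch is a hypothetical callable: fetch(url) returns the page's HTML,
    or None if the page cannot be retrieved.
    Returns the URLs in the order they were crawled.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    crawled = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        crawled.append(url)  # searchdb would record this URL in its database
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop any #fragment. Query strings are
            # kept, so default.asp?name=value is treated as a page of its own.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Only follow links on the starting host (mailto: links have no host).
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return crawled


if __name__ == "__main__":
    # An in-memory "site" stands in for real HTTP requests.
    site = {
        "http://example.com/": '<a href="about.htm">About</a> '
                               '<a href="default.asp?name=value#top">Item</a>',
        "http://example.com/about.htm": '<a href="/">Home</a> '
                                        '<a href="http://other.com/">Elsewhere</a>',
        "http://example.com/default.asp?name=value": "<p>No links here</p>",
    }
    print(crawl("http://example.com/", site.get))
```

The `seen` set ensures each URL is queued only once, so pages that link back to each other do not cause an endless loop, and the link to other.com is ignored because it is on a different host.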
As each page is crawled, its words are extracted, including the meta keywords, the meta description and image alt text. Each word is stored in the database together with the number of times it occurs on the page.
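The extraction step can be sketched with Python's standard `html.parser`: collect the title and visible text, the content of the keywords and description meta tags, and image alt text, then count each word. The names `PageTextExtractor` and `word_counts` are hypothetical, splitting on whitespace and commas is an assumption, and punctuation removal and the exclude list are deliberately left out here.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Gathers title, meta keywords/description, image alt text and visible text."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._hidden = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._hidden += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.chunks.append(attrs.get("content") or "")
        elif tag == "img":
            self.chunks.append(attrs.get("alt") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        # Text nodes, including the <title> text; script and style bodies are skipped.
        if not self._hidden:
            self.chunks.append(data)


def word_counts(html):
    """Count how often each word occurs in the indexable text of a page."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Split on whitespace and commas (meta keywords are usually comma-separated).
    return Counter(re.findall(r"[^\s,]+", text))


if __name__ == "__main__":
    page = ('<html><head><title>Cycling Scotland</title>'
            '<meta name="keywords" content="cycling, routes">'
            '<meta name="description" content="Cycling in Scotland">'
            '<script>var x = 1;</script></head>'
            '<body><img src="map.jpg" alt="Scotland map"><p>Cycling routes</p></body></html>')
    print(word_counts(page).most_common(3))
```

Each (word, count) pair in the resulting `Counter` corresponds to one stored word occurrence for the page.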
All words longer than one character are indexed, except those in the exclude word list. Punctuation is removed, so a word such as asp.net is stored in the database as aspnet. Because search queries are parsed in exactly the same way as indexed pages, a search for asp.net still returns the correct results.
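Because the same parsing runs on both the indexing side and the search side, it fits naturally in one shared function. A minimal sketch, assuming lower-casing and a small hypothetical `EXCLUDE_WORDS` set (searchdb's own exclude list is maintained by the administrator and its exact contents are not given here):

```python
import string

# Hypothetical exclude list for illustration only.
EXCLUDE_WORDS = {"in", "the", "and", "them", "they"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalise(text):
    """Turn raw text into index terms: lower-cased (an assumption), punctuation
    removed, single-character words and excluded words dropped."""
    terms = []
    for token in text.lower().split():
        word = token.translate(_STRIP_PUNCTUATION)
        if len(word) > 1 and word not in EXCLUDE_WORDS:
            terms.append(word)
    return terms


if __name__ == "__main__":
    print(normalise("Learn ASP.NET in a day"))  # ['learn', 'aspnet', 'day']
    print(normalise("asp.net"))                 # ['aspnet']
```

Calling `normalise` on page text when indexing and on the query when searching guarantees that both sides produce the same term, aspnet, for asp.net.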
The current version does not obey the noindex and nofollow directives of the robots meta tag that may appear in the head of a web page. To exclude parts of your site, enter their directory names in the list of directories to be excluded; all files in those directories and their subdirectories will then not be indexed.
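Excluding a directory together with all of its subdirectories amounts to a path-prefix test on each URL before it is indexed. The sketch below is an assumption about how such a check could work, not searchdb's code; `is_excluded` and `EXCLUDED_DIRECTORIES` are hypothetical, and the case-insensitive comparison assumes a server such as IIS where URL paths are not case-sensitive.

```python
from urllib.parse import urlparse

# Hypothetical exclusion list, as an administrator might enter it.
EXCLUDED_DIRECTORIES = ["admin", "/private/drafts/"]


def is_excluded(url, excluded_dirs=EXCLUDED_DIRECTORIES):
    """True if url lies in an excluded directory or any of its subdirectories."""
    path = urlparse(url).path.lower()  # assumes case-insensitive paths
    for directory in excluded_dirs:
        prefix = "/" + directory.strip("/").lower()
        # Match the directory itself or anything below it, but not, for
        # example, /administration.htm when "admin" is excluded.
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


if __name__ == "__main__":
    for url in ("http://example.com/admin/default.aspx",
                "http://example.com/administration.htm"):
        print(url, is_excluded(url))
```

Comparing against the directory name followed by a slash is what keeps a similarly named page such as /administration.htm from being excluded by accident.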
As each page is indexed, its file size is stored in the database. This lets you re-index only the pages whose file size has changed, rather than re-indexing the complete site.
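Choosing which pages to re-index then reduces to comparing the stored sizes with the current ones. A minimal sketch with hypothetical names; treating pages that have no stored size as new pages to index is an assumption, and any edit that leaves a page's size exactly unchanged will not be detected by this test.

```python
def pages_to_reindex(stored_sizes, current_sizes):
    """Return, in sorted order, the URLs whose size differs from the stored one.

    stored_sizes:  {url: size in bytes recorded at the last indexing run}
    current_sizes: {url: size in bytes found now}
    URLs with no stored size (new pages) are included as well.
    """
    return sorted(
        url for url, size in current_sizes.items()
        if stored_sizes.get(url) != size
    )


if __name__ == "__main__":
    stored = {"/index.htm": 5120, "/news.htm": 2048}
    current = {"/index.htm": 5120, "/news.htm": 2300, "/events.htm": 900}
    print(pages_to_reindex(stored, current))  # ['/events.htm', '/news.htm']
```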
The Search Engine
To search the site, enter one or more words into the text box. Single-character words are ignored, as are common noise words such as 'them' and 'they'.
Ranking is based on the word counts within the pages. A search for 'cycling in Scotland' runs an SQL GROUP BY query on 'cycling' and 'Scotland', sorted by word count; 'in' is dropped because it is in the exclude word list. A page that contains both cycling and Scotland several times therefore has a higher word count, and hence higher relevance, and appears nearer the top of the search results.
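The GROUP BY ranking can be illustrated with an in-memory SQLite database. The `pages` and `words` tables below are a hypothetical schema chosen for the example, since searchdb's actual table and column names are not documented here, and the query would need small syntax adjustments for SQL Server or Access. The idea is the same: sum each page's counts for the query terms and sort by that total.

```python
import sqlite3

# Hypothetical schema for illustration; not searchdb's actual tables.
SCHEMA = """
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE words (
    page_id    INTEGER NOT NULL REFERENCES pages(page_id),
    word       TEXT    NOT NULL,
    word_count INTEGER NOT NULL
);
"""


def search(conn, terms):
    """Rank pages by the summed counts of the query terms they contain."""
    if not terms:
        return []
    # One "?" per term, so the terms are passed as parameters, never pasted into SQL.
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT p.url, SUM(w.word_count) AS hits
        FROM words AS w
        JOIN pages AS p ON p.page_id = w.page_id
        WHERE w.word IN ({placeholders})
        GROUP BY p.page_id, p.url
        ORDER BY hits DESC, p.url
    """
    return conn.execute(sql, list(terms)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO pages VALUES (?, ?)", [
        (1, "/scotland-cycling.htm"), (2, "/cycling.htm"), (3, "/scotland-walks.htm"),
    ])
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", [
        (1, "cycling", 5), (1, "scotland", 4),
        (2, "cycling", 3),
        (3, "scotland", 2), (3, "walking", 6),
    ])
    # 'cycling in Scotland' after 'in' has been removed as an exclude word.
    print(search(conn, ["cycling", "scotland"]))
    # [('/scotland-cycling.htm', 9), ('/cycling.htm', 3), ('/scotland-walks.htm', 2)]
```

The page containing both words scores 5 + 4 = 9 and comes first, which matches the behaviour described above: more occurrences of the query words means a higher position in the results.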
A search usually takes less than 0.2 seconds. Search time does grow as the number of indexed pages increases, but not significantly, because most of the processing is done at indexing time and the search itself is based on an efficient SQL query.
Management Displays
A set of password-protected web pages is provided for setting up and configuring the system.
They are available at http://www.yourservername.com/searchdb/admin/default.aspx
The default user name is 'admin' and the default password is 'admin'.
Requirements
You need a web server with Microsoft .NET Framework version 2 installed.
Trial version
The trial version is provided so that you can check that the program works on your server. It is identical to the full version except that it is restricted to 200 URLs, displays 'this is an unregistered version' on various displays, and does not include the source code.
If the trial version works on your system, the registered version should work too.
The purchased version is supplied with full source code.
Pricing
searchdb costs 49 US dollars per installation, including the complete VB.NET source code.
| Copyright © 2008 | Page updated March 2008 |