Installation
Installation of searchdb is the same as for any ASP.NET 2.0 application and is normally straightforward.
Using IIS, create a virtual directory where you want the search application to be installed, usually searchdb.
Copy or FTP all the files to that folder.
The web.config file should be placed at the root of the site. If you already have a web.config file at the root of your site that is used by another application, copy the entries in the 'appSettings' section of the searchdb web.config file into the 'appSettings' section of your existing web.config file.
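The copy step described above can also be scripted. The following is a minimal sketch in Python (not part of searchdb), assuming both files are plain XML without a namespace on the configuration element. The function name merge_app_settings is hypothetical. Keys that already exist in your own file are left untouched.

```python
import xml.etree.ElementTree as ET


def merge_app_settings(existing_xml: str, searchdb_xml: str) -> str:
    """Copy <add> entries from the searchdb appSettings into an existing config."""
    existing_root = ET.fromstring(existing_xml)
    searchdb_root = ET.fromstring(searchdb_xml)

    target = existing_root.find("appSettings")
    if target is None:
        # The existing file has no appSettings section yet, so create one.
        target = ET.SubElement(existing_root, "appSettings")

    # Keys already present are kept as they are, so the other application keeps working.
    present = {add.get("key") for add in target.findall("add")}

    source = searchdb_root.find("appSettings")
    if source is not None:
        for add in source.findall("add"):
            if add.get("key") not in present:
                target.append(ET.Element("add", dict(add.attrib)))

    return ET.tostring(existing_root, encoding="unicode")
```

Note that ElementTree drops XML comments and the XML declaration when it re-serialises the file, so treat the output as a starting point and review it before replacing a live web.config.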
The database
The application may use either a Microsoft Access or a SQL Server database.
Microsoft Access Database
If you are using a Microsoft Access database (searchdb.mdb), it may be located wherever you want, as long as you define its location in the web.config file. Depending on the host server you are using, you may be restricted to a particular directory for the Access database. Some hosts define a specific location for the database because of security and permissions.
You have to give the Access database write permission, otherwise you will get a database update error when you attempt to crawl sites.
Make sure that the FOLDER where the MS Access file is located also has write access, because MS Access has to create a lock file there.
If you are using a hosting organisation, they usually identify a folder outside the root of the web site which has the correct permissions.
Otherwise you have to set the permissions yourself: right-click the Access database, select Properties and then the Security tab, and change the permissions for the appropriate user.
I have created a button on the define site admin page which tests if the database can be written to.
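If you want to check the folder permissions before opening the admin page, a quick test is to try creating a file in the database folder. The sketch below is in Python and is not part of searchdb. The function name folder_is_writable is hypothetical, and the check only reflects the permissions of the account that runs the script, which may differ from the account used by ASP.NET.

```python
import os
import tempfile


def folder_is_writable(folder: str) -> bool:
    """Return True if a file can be created and deleted in the folder."""
    try:
        # Creating and removing a temporary file is similar to what Access
        # does when it creates its lock file next to the database.
        with tempfile.NamedTemporaryFile(dir=folder):
            pass
        return True
    except OSError:
        # Covers a missing folder as well as a permission error.
        return False
```

For example, folder_is_writable(r"d:\inetpub\aspnet\db") would test the folder used in the sample Access connection string.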
If you are using SQL Server, a SQL installation script is provided (searchdb.sql). First create a database called searchdb, then create a new user for this database, and run the SQL script against the database. The script will create all the tables and set up default values where appropriate.
The connection string for the database must be entered into the web.config file.
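Because the connection string is stored in an XML attribute, characters such as & or " in a password have to be escaped. The following Python sketch (not part of searchdb) assembles the entry from its parts and escapes it. The function name crawler_connect_entry is hypothetical. It assumes that none of the values contain a semicolon.

```python
from xml.sax.saxutils import quoteattr


def crawler_connect_entry(parts: dict) -> str:
    """Build the pg_CrawlerConnect <add> element from connection string parts."""
    # An OLEDB connection string is a list of name=value pairs separated by semicolons.
    connection_string = ";".join(f"{name}={value}" for name, value in parts.items())
    # quoteattr adds the surrounding quotes and escapes &, < and > for XML.
    return f"<add key=\"pg_CrawlerConnect\" value={quoteattr(connection_string)} />"


access_entry = crawler_connect_entry({
    "Provider": "Microsoft.Jet.OLEDB.4.0",
    "Data Source": r"d:\inetpub\aspnet\db\searchdb.mdb",
    "Persist Security Info": "False",
})
```

Here access_entry holds a line that can be pasted into the appSettings section of web.config.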
web.config file
The application settings are within the web.config file.
You must include the database connection string to identify the database. The other settings are optional, and default settings are used if they are not included.
<configuration>
  <appSettings>
    <add key="pg_CrawlerConnect" value="your provider string" />
    <add key="pg_ButtonText" value="Search" />
    <add key="pg_Instructions" value="Enter one or two words here" />
    <add key="pg_NoRecordsFoundText" value="No records found - try again" />
    <add key="pg_XRecordsFoundText" value="records found" />
    <add key="pg_CodePage" value="1252" />
    <add key="pg_ErrorLogging" value="true" />
  </appSettings>
</configuration>
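The rule "connection string required, everything else optional" can be illustrated with a short Python sketch (not the code used by searchdb). The function name load_settings is hypothetical, the section is assumed to be named appSettings, and only the defaults that this documentation states are filled in. Representing the disabled error logging as "false" is an assumption.

```python
import xml.etree.ElementTree as ET

# Defaults stated in this documentation. Settings without a documented
# default are left out rather than guessed.
DOCUMENTED_DEFAULTS = {
    "pg_ButtonText": "Search",
    "pg_CodePage": "1252",
    "pg_ErrorLogging": "false",  # assumption: "no logging" written as "false"
}


def load_settings(config_xml: str) -> dict:
    """Read appSettings entries and fall back to the documented defaults."""
    root = ET.fromstring(config_xml)
    settings = dict(DOCUMENTED_DEFAULTS)
    section = root.find("appSettings")
    if section is not None:
        for add in section.findall("add"):
            settings[add.get("key")] = add.get("value")
    if "pg_CrawlerConnect" not in settings:
        # The connection string is the only mandatory entry.
        raise ValueError("pg_CrawlerConnect is required")
    return settings
```

A file that contains only pg_CrawlerConnect therefore still yields a button text of "Search" and a code page of 1252.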
Settings
The application settings in the web.config file allow you to define your own static text so that other languages may be accommodated.
The file search.aspx is a basic search page displaying the search engine results. Modify search.aspx to fit in with your site design layout.
The user options in web.config are:
| pg_CrawlerConnect | This is the data provider connection string for the database. You must construct the provider string, which depends on the type of database you are using. The provider string is just a standard OLEDB connection string. If it is an Access database, it will be of the form: <add key="pg_CrawlerConnect" value="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=d:\inetpub\aspnet\db\searchdb.mdb;Persist Security Info=False" /> To help identify the location of the mdb directory on your server, I have included a file called a_map.aspx which displays its own file name and path. From that it should be possible to work out the file name and path for the Access database. If it is SQL Server, it will be of the form: <add key="pg_CrawlerConnect" value="Provider=SQLOLEDB.1;Password=people;Persist Security Info=True;User ID=searchuser;Initial Catalog=searchDB;Data Source=169.254.219.170" /> If it is MSDE, it will be of the form: <add key="pg_CrawlerConnect" value="Provider=MSDASQL;Persist Security Info=False;User ID=user_name;Password=user_password;Initial Catalog=user_catalog;Data Source=user_databasename;Connect Timeout=15" /> The admin page includes a button which you may use to check that the database connection is correct. |
| pg_ButtonText | A string identifying what text the search button should display. This is 'Search' by default. |
| pg_Instructions | A string which appears above the text box and is intended to display a few words of brief instructions. |
| pg_NoRecordsFoundText | A string which is displayed when no records are found. |
| pg_XRecordsFoundText | A string which is displayed following the number of records. For example, in "32 records found", pg_XRecordsFoundText contains the string "records found". |
| pg_CodePage | This is an optional entry which defines the code page used when the web pages are read. If this setting is left out, the standard code page 1252 is used, which should be correct for Western character sets. If you wish to use a different code page, the value is an integer, for example value="950". In some circumstances you may see ? characters when the pages are indexed. This is an indication that the code page is incorrect. |
| pg_ErrorLogging | This is an optional entry which defines whether error logging is enabled. If this setting is left out, no error logging occurs. Logging includes any database update errors and any web crawler and indexing errors. It also logs the search words which are entered in the search dialog box. Logging details are stored in database tables and can be viewed using the administration web pages. |
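The effect of pg_CodePage can be seen by decoding the same bytes with two different code pages. The Python sketch below is only an illustration and not the code used by searchdb. The function name decode_page is hypothetical. Python names Windows code pages cp followed by the number, and it marks undecodable bytes with a replacement character rather than the ? seen in the indexer.

```python
def decode_page(raw: bytes, code_page: int = 1252) -> str:
    """Decode downloaded page bytes using a Windows code page number."""
    # Python exposes Windows code pages as "cp<number>", e.g. cp1252 or cp950.
    return raw.decode(f"cp{code_page}", errors="replace")


# Bytes of a page written in Traditional Chinese (code page 950).
raw_page = "中文".encode("cp950")

correct = decode_page(raw_page, 950)  # decoded with the matching code page
garbled = decode_page(raw_page)       # decoded with the default of 1252
```

Only the first call returns the original text. The second returns unrelated Western characters, which is the kind of symptom that points to an incorrect code page.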
The search page
You may use either search.aspx or searchUC.aspx as the main search page, which also displays the results from the search engine. The file searchUC.aspx contains a user control.
Some users wish to have a search box on each page, so that clicking the search button displays the search engine results on a separate page. The page default.htm is an example of how to do this. default.htm contains a form which you should place on each of your web pages. When you click on the search button, it directs the output to search.aspx.
Operation
You log on to the management display system through a simple logon form at http://www.yourserver.com/searchdb/admin/default.aspx. The user names and passwords are stored in the database.
The default username / password is admin / admin.
Once logged in, you are able to set up the system:
- Base URL: The root of the domain. This is in the form http://www.yourserver.com
- The page from where you want crawling to start. This is in the form http://www.yourserver.com/default.htm. Usually this is the home page of the site, but it may be any page, such as a site map.
- A list of exclude directories. This is a comma separated list of directories which are to be excluded.
- Defines if this URL is to be crawled. Tick the box to indicate that you want this URL to be crawled.
- A list of exclude words. This is a comma separated list of words which you want the indexer to ignore. Single character words are not indexed.
- A comma separated list of valid file extensions.
- Start of extract. In many web sites, the first few characters of text do not give information which would be of value as an extract. This parameter effectively shifts the start point. The default value is 1.
- Length of extract. Up to 255 characters can be used as the extract text. The fewer the characters, the smaller the database needed to store it. The default value is 150.
- A timeout period for crawling the site. If the site is large, this period may have to be extended, otherwise the program will stop and an error message will be displayed.
- The use meta description as extract tick box defines whether the meta description will be used as the extract text. The meta description is of the form <meta name="description" content="Javascript code for downloading"> and appears in the head of the web page. If the web page does not contain a meta description tag, the extract will be made from the body of the web page. When unticked, the extract text always comes from the body of the web page. The problem with taking the extract from the body of the web page is that sometimes the text may not be sensible, as you may have menu entries etc. Using the meta description tag should give more sensible and controllable results.
- The meta data tick box defines whether meta data such as keywords, description etc., which can appear in the head of a web page, will be indexed into the database. By default, this is set to not index meta data. The reason to stop meta data being indexed is that quite often the meta data may not have a great deal of relevance to the rest of the text on the page and will cause search results to display incorrect values.
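The way the extract settings interact can be sketched as follows. This is a Python illustration and not the code used by searchdb. The function name make_extract is hypothetical, the start position is assumed to be 1-based (so the default of 1 begins at the first character), and the start position is assumed not to apply to the meta description.

```python
MAX_EXTRACT_LENGTH = 255  # upper limit stated in the documentation


def make_extract(body_text: str, meta_description: str = "", start: int = 1,
                 length: int = 150, use_meta: bool = False) -> str:
    """Pick the extract text that is stored for a crawled page."""
    length = min(length, MAX_EXTRACT_LENGTH)
    if use_meta and meta_description:
        # Assumption: the start offset is not applied to the meta description.
        return meta_description[:length]
    # No meta description, or the tick box is off: fall back to the body text.
    # Assumption: start is 1-based, so 1 means the first character.
    begin = max(start, 1) - 1
    return body_text[begin:begin + length]
```

For a page whose body begins with menu text such as "Home | About | Welcome to the site", a start value of 16 skips the menu entries so that the extract begins at "Welcome".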
The search form
Once the site has been crawled and indexed, you search it using the search form, which is available at http://www.yourserver.com/searchdb/search/search.aspx, http://www.yourserver.com/searchdb/search/searchUC.aspx or http://www.yourserver.com/searchdb/search/default.htm
| Copyright © 2008 | Page updated March 2008 |