'); //FTP password account
6. Update PhpDig
==============================
6.1. Database update
-------------------------
The [PHPDIG_DIR]/sql/update_db_to[version].sql file contains all the SQL
statements required to update an existing PhpDig installation.
You can also use the installation interface and choose the database
update option. This function only works from the immediately preceding
version of the database.
6.2. Scripts update
-------------------------
Back up your configuration files, then replace the existing scripts with the
new ones.
7. Indexing with web interface
==============================
7.1. Index a new host
-------------------------
Open the admin interface in your browser : [PHPDIG_DIR]/admin/index.php.
Just fill in the URL field ; PhpDig recognizes whether it is a new host or an existing one.
You can also specify a path and/or a file, which is the starting point of the robot.
Select the maximum search depth in levels and click on the "Dig This !" button.
A new page opens showing the indexing and spidering process.
If a duplicate is reported, it means that PhpDig has detected that the current document,
although reached through a new URL, is a copy of one already in the database.
Each "+" sign means that a new link was detected and will be followed at the next
spidering level.
For each level, PhpDig displays the number of new links it has found.
If no new link is found, PhpDig stops browsing and displays the list of documents.
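The level-by-level behaviour described above can be pictured with a small sketch.
It is not PhpDig's code : the function name, the array keys and the in-memory link
graph (standing in for real HTTP fetches) are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of level-by-level spidering; nothing here is PhpDig code.
// $linkGraph maps a URL to the links found in that document.
function crawlByLevels(array $linkGraph, string $start, int $maxDepth): array
{
    $seen = [$start => true];   // every URL already discovered
    $current = [$start];        // URLs to read at the current level
    $newPerLevel = [];          // number of new links found at each level

    for ($level = 1; $level <= $maxDepth && $current !== []; $level++) {
        $next = [];
        foreach ($current as $url) {
            foreach ($linkGraph[$url] ?? [] as $link) {
                if (!isset($seen[$link])) {   // would be shown as a "+"
                    $seen[$link] = true;
                    $next[] = $link;          // followed at the next level
                }
            }
        }
        $newPerLevel[$level] = count($next);
        $current = $next;       // an empty list ends the loop
    }

    return ['discovered' => array_keys($seen), 'new_links' => $newPerLevel];
}

$graph = [
    '/index.html' => ['/a.html', '/b.html'],
    '/a.html'     => ['/b.html', '/c.html'],
    '/b.html'     => ['/index.html'],
];
$report = crawlByLevels($graph, '/index.html', 5);
print_r($report['new_links']);
```

With this sample graph the sketch finds two new links at level 1, one at level 2
and none at level 3, where it stops even though the maximum depth is 5.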
7.2. Update an existing host
-------------------------
From the admin page, you can reach the update interface
by choosing a site and clicking on the [update form] button.
A two-part interface appears.
On the left side of the screen is the client-side folder structure of the site.
The blue arrow displays the content of a "folder", so that its documents can be reindexed individually.
The list of documents in a folder is on the right side of the screen.
On both sides, the red cross deletes the selected branch or file from the engine,
including sub-folders when a branch is deleted.
The green check mark reindexes the selected branch or document if it was last indexed
more than [LIMIT_DAYS] days ago. It also searches for new links in documents which
have changed.
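As a rough illustration of the [LIMIT_DAYS] rule, the check amounts to an age
comparison. The function name and parameters below are hypothetical ; only the rule
itself comes from the text above.

```php
<?php
// Sketch only: $limitDays plays the role of [LIMIT_DAYS]; names are hypothetical.
function isDueForReindex(int $lastIndexedTs, int $limitDays, int $nowTs): bool
{
    $ageInSeconds = $nowTs - $lastIndexedTs;
    return $ageInSeconds > $limitDays * 86400;   // 86400 seconds per day
}

$now = strtotime('2024-01-31 00:00:00 UTC');
var_dump(isDueForReindex(strtotime('2024-01-01 00:00:00 UTC'), 7, $now)); // indexed 30 days ago
var_dump(isDueForReindex(strtotime('2024-01-28 00:00:00 UTC'), 7, $now)); // indexed 3 days ago
```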
7.3. Index maintenance
-------------------------
Three scripts delete useless data from the PhpDig database.
Links to them are on the admin page.
- Clean index deletes index records not linked to any page.
Useful if manual deletes are done in the database.
- Clean dictionary deletes keywords which are not used by the index.
Useful for reducing the size of the dictionary,
particularly when a large site containing many technical words is deleted from the
engine.
- Clean common words must be run when new common words are added to the
[PHPDIG_DIR]/includes/common_words.txt file. It deletes all references to those common words.
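To make the three clean-up operations concrete, here is a sketch on in-memory
arrays. The array layout is invented for the example and is not the PhpDig schema ;
the order in which the steps run is also just one possible choice.

```php
<?php
// Hypothetical in-memory stand-ins for the engine's tables; the real schema may differ.
$pages      = [1 => 'http://host/a.html'];                    // page id => URL
$dictionary = [10 => 'spider', 11 => 'the', 12 => 'orphan'];  // keyword id => word
$index      = [                                               // one row per (page, keyword)
    ['page' => 1, 'keyword' => 10],
    ['page' => 1, 'keyword' => 11],
    ['page' => 2, 'keyword' => 12],                           // page 2 no longer exists
];
$commonWords = ['the'];

// "Clean index": drop index rows whose page is gone.
$index = array_values(array_filter($index, fn($row) => isset($pages[$row['page']])));

// "Clean common words": drop index rows that reference a common word.
$index = array_values(array_filter(
    $index,
    fn($row) => !in_array($dictionary[$row['keyword']], $commonWords, true)
));

// "Clean dictionary": drop keywords that no index row uses any more.
$used = array_flip(array_column($index, 'keyword'));
$dictionary = array_intersect_key($dictionary, $used);

print_r($dictionary);
```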
8. Indexing by command line interface
==============================
The [PHPDIG_DIR]/admin/spider.php script can be launched from the shell
so as not to overload the web server.
Launching the script :
#php -f [PHPDIG_DIR]/admin/spider.php [option]
List of options :
- all (default) : Update all hosts ;
- forceall : Force update all hosts ;
- http://mondomaine.tld : Add or update the url ;
- path/file : Add or update all urls listed in the given file.
Examples :
#php -f [PHPDIG_DIR]/admin/spider.php http://host.mydomain.com
#php -f [PHPDIG_DIR]/admin/spider.php [File containing an urls list]
As with any shell command, the output can be redirected to a text file
(if you want some logs).
#php -f [PHPDIG_DIR]/admin/spider.php all >> /var/log/phpdig.log
The [PHPDIG_DIR]/admin/spider.php script can also be launched by a cron task, in order to
update the index automatically. The recommended interval is 7 days. Updated documents you want
to see immediately in the searches can be updated manually.
Such pages can also contain a "revisit-after" meta tag with a short delay.
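When the command is assembled by another script, quoting each argument avoids
surprises with unusual paths. The helper below is a sketch ; its name and the
example paths are hypothetical, and only the command shape comes from the text above.

```php
<?php
// Sketch: builds the spider command line with shell-safe quoting.
// buildSpiderCommand() and the example paths are hypothetical.
function buildSpiderCommand(string $phpdigDir, string $option = 'all', ?string $logFile = null): string
{
    $cmd = 'php -f ' . escapeshellarg($phpdigDir . '/admin/spider.php')
         . ' ' . escapeshellarg($option);
    if ($logFile !== null) {
        $cmd .= ' >> ' . escapeshellarg($logFile);   // append to the log file
    }
    return $cmd;
}

echo buildSpiderCommand('/var/www/phpdig', 'all', '/var/log/phpdig.log'), "\n";
```

For the recommended weekly update, the printed command could be placed in a crontab
line starting with "0 3 * * 0" (every Sunday at 03:00) ; the schedule itself is only
an example.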
9. Templates
==============================
9.1. Description of templates
-------------------------
Templates are HTML files containing some XML-like tags which are replaced with
the dynamic PhpDig content.
See the source code of the provided templates for examples.
The <phpdig:results></phpdig:results> tags enclose the
results table : all content between the tags is repeated once for each
result on the results page.
Two CSS classes are used by PhpDig :
.phpdigHighlight : class for highlighting of search terms.
a.phpdig : class for phpdig results and navigation links.
All template tags look like : <phpdig:parameter/>.
Except for the <phpdig:results></phpdig:results>
tag, all are stand-alone tags.
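The following sketch shows how such a template could be expanded. It is not
PhpDig's own code : renderTemplate() is a hypothetical function, and it assumes
that stand-alone tags are written <phpdig:name/> and that the results block is
written <phpdig:results>...</phpdig:results>. Tags carrying attributes are not
handled and are left untouched.

```php
<?php
// Sketch of template expansion under the assumptions stated above.
function renderTemplate(string $template, array $globals, array $results): string
{
    // Replace each <phpdig:name/> tag with the matching value, if any.
    $fill = function (string $text, array $values): string {
        return preg_replace_callback(
            '~<phpdig:(\w+)\s*/>~',
            function (array $m) use ($values): string {
                // Unknown tags are left in place so they stay visible.
                return array_key_exists($m[1], $values) ? (string) $values[$m[1]] : $m[0];
            },
            $text
        );
    };

    // Repeat the content of the results block once per result.
    $template = preg_replace_callback(
        '~<phpdig:results>(.*?)</phpdig:results>~s',
        function (array $m) use ($fill, $results): string {
            $rows = '';
            foreach ($results as $row) {
                $rows .= $fill($m[1], $row);
            }
            return $rows;
        },
        $template
    );

    return $fill($template, $globals);
}

$template = "<h1><phpdig:title_message/></h1>\n"
          . "<phpdig:results><p><phpdig:n/>. <phpdig:page_link/></p>\n</phpdig:results>";
$html = renderTemplate(
    $template,
    ['title_message' => 'Search results'],
    [
        ['n' => 1, 'page_link' => 'First page'],
        ['n' => 2, 'page_link' => 'Second page'],
    ]
);
echo $html;
```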
9.2. Tags outside the results table
-------------------------
phpdig:title_message                   Page title
phpdig:form_head                       Start of the search form
phpdig:form_title                      Form title
phpdig:form_field                      Text field of the form
phpdig:form_button                     Submit button of the form
phpdig:form_select                     Select list to choose the number of results per page
phpdig:form_radio                      Radio button to choose the parsing of search keys
phpdig:form_foot                       End of the search form
phpdig:result_message                  Number of results message
phpdig:ignore_message                  Too short words message
phpdig:ignore_commess                  Too common words message
phpdig:nav_bar                         Navigation bar to browse results
phpdig:pages_bar                       Navigation bar without previous/next links
phpdig:previous_link src='[img src]'   "Previous" icon
phpdig:next_link src='[img src]'       "Next" icon
9.3. Results table tags
-------------------------
phpdig:results         Contains the results list
phpdig:img_tag         Relevance bar graph
phpdig:weight          Relevance of the page (in percent)
phpdig:page_link       Result title and link to the document
phpdig:limit_links     Links restricting the search to a host / path
phpdig:text            Highlighted text extract or summary
phpdig:n               Result rank, starting at 1
phpdig:complete_path   Complete URL of the document
phpdig:update_date     Last update of the document
phpdig:filesize        Size of the document (kilobytes)
10. Insert PhpDig in a website
==============================
10.1. The index.php script
-------------------------
The index.php script is only an example of using PhpDig with the
template of the same name. This script can be inserted in any part of your website,
provided the configuration files and libraries are included.
The $relative_script_path variable must contain the relative path of PhpDig's root
directory from the current script.
The phpdigSearch() function must always be called like this :
extract(phpdigHttpVars(
    array('query_string'=>'string',
          'template_demo'=>'string',
          'refine'=>'integer',
          'refine_url'=>'string',
          'site'=>'integer',
          'limite'=>'integer',
          'option'=>'string',
          'search'=>'string',
          'lim_start'=>'integer',
          'browse'=>'integer',
          'path'=>'string'
         )
));
phpdigSearch($id_connect, $query_string, $option, $refine,
             $refine_url, $lim_start, $limite, $browse,
             $site, $path, $relative_script_path, $template);
The last parameter, $template, sets how PhpDig displays its output :
- Use 'classic' for the static look of PhpDig. All HTML tags are
included.
- Use the $template variable to use a template.
The variable can be set in the config.php file or anywhere else, in
order to have a different look in distinct parts of your website.
- Use 'array' to do what you want with the search form and results.
10.2. Using 'classic' mode
-------------------------
There is nothing to do : all display is done by the phpdigSearch()
function. However, the display cannot be modified in this mode.
10.3. Using templates
-------------------------
With the templates described earlier, it is very easy to fit PhpDig into the
look of an existing website.
A template can be an entire HTML page, like the sample templates provided,
or only a part of a page. In that case only this part is described by the template,
and it appears where you call the phpdigSearch() function.
You just have to add a .phpdigHighlight CSS class to your existing CSS.
10.4. Using PHP
-------------------------
In 'array' mode, the phpdigSearch() function returns an
array containing both results and search form elements.
Use the following code to display the whole content of this array, giving the script
a full search results URL :
print '<pre>';
print_r(
    phpdigSearch($id_connect, $query_string, $option, $refine,
                 $refine_url, $lim_start, $limite, $browse,
                 $site, $path, $relative_script_path, 'array')
);
print '</pre>';
Then do what you want with the results (the first one in large type, the following three
in medium size, and only the titles of the others at the right side : anything is possible !).
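As an illustration of that idea, the sketch below renders a result list in three
tiers. The array layout is an assumption made for the example (a list of entries
keyed by the template tag names page_link and text), and renderTiered() is a
hypothetical helper ; inspect the real structure with print_r() before relying on it.

```php
<?php
// ASSUMED layout, for illustration only: each result is an array with
// 'page_link' and 'text' keys. The array really returned by phpdigSearch()
// in 'array' mode may be organised differently.
function renderTiered(array $results): string
{
    $html = '';
    $rank = 0;
    foreach ($results as $result) {
        $rank++;
        if ($rank === 1) {
            // First result: large title plus text extract.
            $html .= '<h2>' . $result['page_link'] . '</h2><p>' . $result['text'] . "</p>\n";
        } elseif ($rank <= 4) {
            // Next three results: medium title only.
            $html .= '<h3>' . $result['page_link'] . "</h3>\n";
        } else {
            // Remaining results: title only, meant for a side column.
            $html .= '<div class="side">' . $result['page_link'] . "</div>\n";
        }
    }
    return $html;
}

// Sample data standing in for real search results.
$sample = [];
foreach (['A', 'B', 'C', 'D', 'E'] as $name) {
    $sample[] = [
        'page_link' => "<a href=\"/$name.html\">Page $name</a>",
        'text'      => "Extract from page $name",
    ];
}
echo renderTiered($sample);
```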
11. Inside PhpDig
==============================
11.1. Spidering and Indexing
-------------------------
PhpDig reads the first page entered for indexing and adds the links it finds to a
list of links to follow.
When the robot finds no more valid links, it stops the process.
To decide what to do with a new link, PhpDig follows this procedure :
- It requests the HTTP header for the current URI. If the returned MIME type
can be parsed by the robot, it continues its process.
If the server returns a redirection, PhpDig checks whether the redirection goes
to another host or not.
Then the robot compares the "last-modified" header with the previously stored
date. If they are the same, the URI is not processed.
Finally, the robot compares the URI with the exclude list.
- If the document type is HTML, PhpDig reads the META Robots content to know
if it is allowed to index and/or follow links from the current document.
- Then PhpDig downloads the document to a temporary file.
First the document is indexed : the text content is stored in a file in order
to display snippets on the results page, then parsed in order to get the keywords.
For an HTML document, the exclude and include comments are searched for
(PHPDIG_EXCLUDE_COMMENT and PHPDIG_INCLUDE_COMMENT constants) to exclude parts
from indexing.
- Finally, PhpDig reads the temporary file again (in the case of an HTML document)
in order to extract new links. All links are tested and parsed to decide
which to index, which give a 404 error, which link to another host and so on.
- The indexing process is exclusive : a host is locked by the spider during
indexing, update or delete. No operation on this host is permitted (except
search, of course) as long as the host is locked.
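The header checks in the first step can be sketched as a single decision function,
applied in the order the text lists them. This is not PhpDig's code : the function
name, the array keys, the list of parsable MIME types and the prefix-based exclude
test are all assumptions made for illustration.

```php
<?php
// Sketch of the header-based decision; every name here is hypothetical.
// $head holds already-fetched response headers with lower-case keys.
function decideLink(array $head, string $uri, string $host, ?string $storedLastModified, array $excludedPaths): string
{
    $parsable = ['text/html', 'text/plain'];   // assumed list of readable types
    $mime = strtolower(trim(explode(';', $head['content-type'] ?? '')[0]));
    if (!in_array($mime, $parsable, true)) {
        return 'skip: unsupported type';
    }

    // A redirection is only followed when it stays on the same host.
    if (isset($head['location'])) {
        $target = parse_url($head['location'], PHP_URL_HOST);
        if (is_string($target) && strcasecmp($target, $host) !== 0) {
            return 'skip: redirects to another host';
        }
        return 'follow redirect';
    }

    // Same "last-modified" value as the stored one: nothing to do.
    if ($storedLastModified !== null && ($head['last-modified'] ?? null) === $storedLastModified) {
        return 'skip: not modified';
    }

    // Finally, compare the URI path with the exclude list (prefix match assumed).
    $path = (string) parse_url($uri, PHP_URL_PATH);
    foreach ($excludedPaths as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return 'skip: excluded';
        }
    }
    return 'index';
}

$head = ['content-type' => 'text/html; charset=utf-8', 'last-modified' => 'Mon, 01 Jan 2024 10:00:00 GMT'];
echo decideLink($head, 'http://www.example.com/doc/page.html', 'www.example.com', null, ['/private/']), "\n";
```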
11.2. Notes on search
-------------------------
The search form is simple enough that it needs little explanation.
But it can be useful to know that :
- An AND operator is applied between each search key ;
- Putting a '-' sign before a word excludes it from the search results.
No document containing this word would be displayed ;
- Search is case-insensitive and accent-insensitive.
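These three rules can be sketched as follows. This is not PhpDig's implementation :
the function names are hypothetical, the accent table covers only a few characters
for the sake of the example, and the text is assumed to be UTF-8.

```php
<?php
// Sketch of the matching rules above; names and accent table are illustrative.
function normalizeText(string $text): string
{
    // Map a few accented letters (both cases) to plain lower-case letters.
    $accents = ['é' => 'e', 'è' => 'e', 'ê' => 'e', 'à' => 'a', 'ç' => 'c', 'ù' => 'u',
                'É' => 'e', 'È' => 'e', 'Ê' => 'e', 'À' => 'a', 'Ç' => 'c', 'Ù' => 'u'];
    return strtolower(strtr($text, $accents));
}

function matchesQuery(string $query, string $document): bool
{
    // Set of normalized words contained in the document.
    $words = array_flip(preg_split('/\W+/u', normalizeText($document), -1, PREG_SPLIT_NO_EMPTY));

    foreach (preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY) as $key) {
        $excluded = ($key[0] === '-');                 // leading '-' excludes the word
        $term = normalizeText($excluded ? substr($key, 1) : $key);
        if ($term === '') {
            continue;
        }
        // AND semantics: fail if a required word is missing
        // or if an excluded word is present.
        if (isset($words[$term]) === $excluded) {
            return false;
        }
    }
    return true;
}

$doc = 'Un Événement spécial';
var_dump(matchesQuery('EVENEMENT special', $doc));   // both keys present
var_dump(matchesQuery('evenement -special', $doc));  // excluded word present
```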
12. Getting help with PhpDig
==============================
A messageboard dedicated to PhpDig can be found at :
http://www.phpdig.net/
Post any questions you have about this script there.