The most important programs that have to be installed are
All the required programs should be available in common Linux distributions. If you do not have them and would rather not compile them from source (available via the appropriate homepage), check the RPM search engines (rpm.pbone.net or rpmseek.com) to see whether a prebuilt package for your architecture is available.
Further requirements can be found in the section about Perl.
After unpacking the Document Archive gzipped tarball, run install.pl from the console (running it as root causes the fewest problems) to copy the files and configure docarc and the MySQL database. The installer needs to know some facts about your MySQL and Apache installations. A sample session may look like this:

    Install Document Archive v0.9.4
    -------------------------------

    Configuration by user input ...

    HTTP-Server configuration
    Document Root of your webserver (eg. /srv/www/htdocs) []: /SERVER/httpd/htdocs
    subdirectory where to store images, .css data and other static content (relative to Document Root) [docarc]:
    cgi-bin directory of your webserver (eg. /srv/www/cgi-bin) []: /SERVER/httpd/cgi-bin
    subdirectory where to store the scripts (relative to cgi-bin) [docarc]:
    user id the apache server is run under [wwwrun]:
    group id the apache server is run under [www]:

    MySQL configuration
    docarc's MySQL user (will be created) [docarc]:
    docarc's user's MySQL password : password1
    MySQL host [localhost]:
    MySQL superuser (the one that is allowed to create databases etc.) [root]:
    MySQL superuser's password : password2
    if MySQL runs on another server: how the http server is called from there [localhost]:
    docarc's MySQL database (will be created) [docarc]:

    optional ht://Dig search engine integration
    is ht://Dig installed and you want to use it for full text search? [y]:
    complete absolute path to htsearch binary (eg. /srv/www/cgi-bin/htsearch) [/SERVER/httpd/cgi-bin/htsearch]:
    where ht://Dig's docarc related files should go (eg. /srv/www/htdig/docarc) []: /SERVER/httpd/htdig/docarc
    ht://Dig's common directory (where the dictionaries etc. are, eg. /srv/www/htdig/common) []: /SERVER/httpd/htdig/common
    docarc's ht://Dig user id (max. 8 characters) [htdig]:
    docarc's ht://Dig user password : password3
    local hostname (eg. this.host.org) [ed004]:

    Configure Document Archive (create admin user)
    admin's docarc user id (max. 8 characters) [admin]:
    admin's password (max. 8 characters) : password4
    admin's firstname []: Konrad
    admin's lastname []: Kieling
    admin's email address []: kkieling@users.sourceforge.net
    user id for public access to docarc [public]:

    Storing configuration into ~/.docarc ...
    Templates parsing and files creation ...
    Directory creation and copying of files ...
    database creation ...

    The Document Archive is installed.

    Installation steps:
    + config (Configuration by user input): run, succeeded
    + templates (Templates parsing and files creation): run, succeeded
    + copy (Directory creation and copying of files): run, succeeded
    + db (database creation): run, succeeded

    Ensure that Apache allows Basic Authentication for the docarc cgi-bin
    directory. If that's not the case you can enable it with the lines

    <Directory "/SERVER/httpd/cgi-bin/docarc">
    AllowOverride AuthConfig
    </Directory>

    in your apache configuration files.

    ht://Dig configuration has been installed at "/docarc.conf"
    To index your Document Archive you have run
    htdig -c /docarc.conf
    htmerge -c /docarc.conf
    This can be done either manually or you put it in your crontab.
    For having ht://Dig search through Portable Document Files (PDF) or PostScript (PS) files, you need to install xpdf (see online documentation).

    To uninstall Document Archive (eg. if you want to get rid of it before installing new versions) you have to
    + drop the MySQL database "docarc"
    + delete the MySQL user "docarc"
    + remove the directories /SERVER/httpd/htdocs/docarc and /SERVER/httpd/cgi-bin/docarc and /SERVER/httpd/htdig/docarc

    This text will be saved as "install.txt".

All this information (except the passwords) is stored in ~/.docarc so that it is available when updating or reinstalling Document Archive. When install.pl encounters already existing files, it asks you what to do with them. During installation, new subdirectories containing scripts, templates and static content are created in Apache's cgi-bin and htdocs directories.
If anything goes wrong, install.pl will tell you. Such errors may be caused by a misconfiguration of other software (eg. MySQL) or by incorrect input. To repeat only some of the installation steps (config, templates, copy and db), for example to skip database creation because it already succeeded, run install.pl with the desired steps as command line arguments.
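As a minimal sketch of such a partial run, the helper below only assembles the command line and rejects step names that the installer does not list. The function name `build_install_cmd` is hypothetical, and calling the installer as `perl install.pl` from the unpacked directory is an assumption; only the four step names come from the installer output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the install.pl call for a subset of steps.
# Valid step names (config, templates, copy, db) are taken from the
# "Installation steps" summary printed by install.pl.
build_install_cmd() {
    cmd="perl install.pl"   # assumption: installer is started via perl
    for step in "$@"; do
        case "$step" in
            config|templates|copy|db)
                cmd="$cmd $step"
                ;;
            *)
                echo "unknown step: $step" >&2
                return 1
                ;;
        esac
    done
    echo "$cmd"
}

# Example: repeat template parsing and file copying, skip config and db.
build_install_cmd templates copy
```

The printed command can then be run from the directory that contains install.pl.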
After installation, the directory containing install.pl may be deleted.
When updating to a newer version that has a changed database structure, you have two choices to save your documents:
Back up your data. This can be done with the command line interface. Assuming you have installed it and configured it properly via environment variables, the following two commands store the database in the version-independent .bib representation (in backup.bib, with the corresponding documents in the subdirectory backup) and save the categories (in backup.cat):

    docarc -d -r backup fetch backup.bib '*'
    docarc cfetch > backup.cat

Category saving (the second line) only works for Document Archive 0.9.3 or later. After the update you of course have to install the command line interface of the new Document Archive version. The corresponding restore commands (run from the same directory as the backup) look like this:

    docarc -r backup add backup.bib
    docarc cset backup.cat

If you go this way, delete the documents directory of your old Document Archive installation after the backup and let install.pl delete the MySQL database before creating the new one.
This procedure does not save any changes you may have made to the document type/field structure. It is also problematic if you have not given the documents a BibTeX id, because in that case the document number was used for citations. Since document numbers are not guaranteed to be the same after the backup/restore procedure, you had better use real BibTeX ids.
Since the database structure does not change with every version, check the CHANGES file to see whether it changed and whether you need to back up your data.
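The backup and restore commands above can be wrapped in two small shell functions so that a failing step stops the sequence. This is only a sketch: the function names and the `DOCARC` override variable are hypothetical, and it assumes the docarc command line interface is installed and configured as described.

```shell
#!/bin/sh
# DOCARC is a hypothetical override for the client binary; it defaults to
# the documented "docarc" command.
DOCARC="${DOCARC:-docarc}"

# Store all entries in backup.bib, the document files in ./backup and
# the categories in backup.cat.
backup_docarc() {
    "$DOCARC" -d -r backup fetch backup.bib '*' || return 1
    # Category export needs Document Archive 0.9.3 or later.
    "$DOCARC" cfetch > backup.cat || return 1
}

# Re-import entries, documents and categories from the current directory.
restore_docarc() {
    "$DOCARC" -r backup add backup.bib || return 1
    "$DOCARC" cset backup.cat || return 1
}
```

Call `backup_docarc` with the old client before the update and `restore_docarc` with the new client afterwards, both from the same directory.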
install.pl will ask for the user and group ids Apache runs under. It needs them because the CGI script can only access the files and directories where it stores its data and configuration if they have the right owner ids.
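If you are unsure which ids to enter, they can usually be read from the `User` and `Group` directives in the Apache configuration. The following sketch prints those directives from a given file; the function name `apache_ids` is hypothetical, and which file actually holds the directives depends on your distribution.

```shell
#!/bin/sh
# Hypothetical helper: print the User and Group directives found in an
# Apache configuration file. Commented-out lines and other directives
# such as UserDir are not matched.
apache_ids() {
    grep -E '^[[:space:]]*(User|Group)[[:space:]]' "$1"
}
```

For example, `apache_ids /path/to/httpd.conf` (placeholder path) might print `User wwwrun` and `Group www`, matching the defaults offered by the installer.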
install.pl assumes that Apache allows per-directory overriding of the authorization configuration through .htaccess files. If you are not asked for a password when browsing Document Archive's page, you either have to rename the installed .htaccess files (there are three of them) to whatever the AccessFileName directive is set to, or you have to change or add <Directory> sections in your Apache configuration files that allow the overriding for docarc's cgi-bin directory. Such a section may look like this (Apache 2.0):

    AccessFileName .htaccess
    <Directory "/srv/www/cgi-bin/docarc">
        AllowOverride AuthConfig Limit
    </Directory>

where /srv/www/cgi-bin/docarc should be replaced with docarc's cgi-bin directory. For additional information on how to configure Apache please see the Apache Documentation Project on Authorization (1.3) or CGI scripts (1.3).
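Whether the password prompt is active can also be checked from the command line: an unauthenticated request to a protected location should be answered with HTTP status 401. The sketch below assumes curl is available; the function names are hypothetical, and the exact URL of your docarc scripts is something you have to supply.

```shell
#!/bin/sh
# Interpret the HTTP status of an unauthenticated request.
classify_status() {
    case "$1" in
        401) echo "password required: authentication is active" ;;
        200) echo "no password requested: .htaccess may be ignored" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}

# Request the given URL without credentials and classify the answer.
# -s silences progress output, -o discards the body, -w prints the status.
probe_auth() {
    status=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    classify_status "$status"
}
```

A call could look like `probe_auth "http://localhost/cgi-bin/docarc/"`, where the URL is only an assumed example.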
To let install.pl create the necessary MySQL database and its tables, a MySQL user must exist that has the rights to create databases, tables and new users. This user must be allowed to access the MySQL server from the machine you are installing Document Archive on. This is the MySQL superuser install.pl asks for. After installation this user is no longer used by Document Archive, and its password is not stored.
docarc's MySQL user is the one that will be created during installation. When running Document Archive it will be used for all access to the database. After this user is created a manual restart of MySQL may be necessary.
The CGI script uses some Perl libraries that have to be installed. Some of them are well known and included in many Linux distributions by default, while others are not, which is why some Perl modules are bundled in docarc's directory. If you want to install them globally on your system, you may download and install them and delete the corresponding files in the docarc directory. The bundled packages are
Package | Files and Directories | Homepage |
---|---|---|
CGI.pm-3.04 | CGI.pm, CGI | http://search.cpan.org/~lds/CGI.pm/ |
CGI-Application-3.21 | CGI/Application.pm, CGI/Application | http://search.cpan.org/~markstos/CGI-Application/ |
Config-General-2.24 | Config | http://search.cpan.org/~tlinden/Config-General/ |
Since docarc's command line interface also uses the last one (Config-General), you should install it on the client machines as well.
MySQL access is done with the following packages which are not contained in the docarc package:
Package | Homepage |
---|---|
DBI | http://search.cpan.org/~timb/DBI/ |
DBD-mysql | http://search.cpan.org/~rudy/DBD-mysql/ |
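
Whether these two modules are already present can be checked by asking Perl to load them. The following is a minimal sketch; the function name `check_perl_module` is hypothetical, and the module names `DBI` and `DBD::mysql` are the Perl names of the packages listed above.

```shell
#!/bin/sh
# Report whether a Perl module can be loaded by the perl found in PATH.
check_perl_module() {
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

for module in DBI DBD::mysql; do
    check_perl_module "$module"
done
```

On the client machines the same function can be used with `Config::General`.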
Since Document Archive version 0.9.4, integration of the ht://Dig search engine allows easy access to full text search. install.pl generates a configuration file for ht://Dig and creates an htdig user account, so index creation can be started simply by running

    htdig -c configfile
    htmerge -c configfile

This can be done manually or automatically, for example as a cron job.
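For the automatic variant, the two commands can be put into a small wrapper that a cron job calls. This is a sketch under assumptions: the function name `reindex`, the script location and the schedule in the comment are hypothetical, and the configuration file path must be the one install.pl reported.

```shell
#!/bin/sh
# Rebuild the ht://Dig index for the given configuration file.
# htmerge only runs if htdig succeeded.
reindex() {
    config="$1"
    htdig -c "$config" && htmerge -c "$config"
}

# Run only when a configuration file is passed on the command line.
if [ -n "$1" ]; then
    reindex "$1"
fi

# Possible crontab entry (assumed schedule and paths, nightly at 03:00):
# 0 3 * * * /usr/local/bin/docarc-reindex.sh /path/to/docarc.conf
```

Chaining with `&&` keeps htmerge from running on a half-built index when htdig fails.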
During index creation ht://Dig downloads all document files within your Document Archive. Since these are probably not plain text, they have to be converted using appropriate conversion utilities. The automatically generated configuration recognizes the most common file formats, namely .ps and .pdf. These documents are converted using pstotext and pdftotext from the xpdf package. For conversion rules for other document formats, see the ht://Dig configuration file manual, ht://Dig FAQ 4.8 and ht://Dig FAQ 4.9.
The Document Archive is © Konrad Kieling, 2004