Installation

Requirements

The following programs have to be installed:

Apache
Perl
MySQL
ht://Dig (optional)
xpdf (optional)
pstotext (optional)

All the required programs should be available in common Linux distributions. If you do not have them and do not want to compile them from source (available from the respective homepages), check the RPM search engines (rpm.pbone.net or rpmseek.com) to see whether a prebuilt package for your architecture is available.

Further requirements can be found in the section about Perl.
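
A quick way to see which of the programs listed above are missing from the PATH is sketched below. The binary names are assumptions and differ between distributions (Apache, for example, may be installed as apache2 instead of httpd).

```shell
# Print every given program name that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed binary names; adjust them to your distribution.
missing_tools httpd perl mysql htdig pdftotext pstotext
```

Empty output means that all listed programs were found. Only the first three are mandatory.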

The Installer

After unpacking the Document Archive gzipped tarball, run install.pl from a console (running it as root causes the fewest problems) to copy the files and to configure docarc and the MySQL database. The installer needs to know some facts about your MySQL and Apache installations. A sample session may look like this:


Install Document Archive v0.9.4
-------------------------------

Configuration by user input ...

HTTP-Server configuration
  Document Root of your webserver (eg. /srv/www/htdocs) []: /SERVER/httpd/htdocs
  subdirectory where to store images, .css data and other static content (relative to Document Root) [docarc]:
  cgi-bin directory of your webserver (eg. /srv/www/cgi-bin) []: /SERVER/httpd/cgi-bin
  subdirectory where to store the scripts (relative to cgi-bin) [docarc]:
  user id the apache server is run under [wwwrun]:
  group id the apache server is run under [www]:

MySQL configuration
  docarc's MySQL user (will be created) [docarc]:
  docarc's user's MySQL password : password1
  MySQL host [localhost]:
  MySQL superuser (the one that is allowed to create databases etc.) [root]:
  MySQL superuser's password : password2
  if MySQL runs on another server: how the http server is called from there [localhost]:
  docarc's MySQL database (will be created) [docarc]:

optional ht://Dig search engine integration
  is ht://Dig installed and you want to use it for full text search? [y]:
  complete absolute path to htsearch binary (eg. /srv/www/cgi-bin/htsearch) [/SERVER/httpd/cgi-bin/htsearch]:
  where ht://Dig's docarc related files should go (eg. /srv/www/htdig/docarc) []: /SERVER/httpd/htdig/docarc
  ht://Dig's common directory (where the dictionaries etc. are, eg. /srv/www/htdig/common) []: /SERVER/httpd/htdig/common
  docarc's ht://Dig user id (max. 8 characters) [htdig]:
  docarc's ht://Dig user password : password3
  local hostname (eg. this.host.org) [ed004]:

Configure Document Archive (create admin user)
  admin's docarc user id (max. 8 characters) [admin]:
  admin's password (max. 8 characters) : password4
  admin's firstname []: Konrad
  admin's lastname []: Kieling
  admin's email address []: kkieling@users.sourceforge.net
  user id for public access to docarc [public]: 

Storing configuration into ~/.docarc ...

Templates parsing and files creation ...

Directory creation and copying of files ...

database creation ...

The Document Archive is installed.

Installation steps:
  + config (Configuration by user input):  run, succeeded
  + templates (Templates parsing and files creation):  run, succeeded
  + copy (Directory creation and copying of files):  run, succeeded
  + db (database creation):  run, succeeded



Ensure that Apache allows Basic Authentication for the docarc cgi-bin
directory. If that's not the case you can enable it with the lines

  <Directory "/SERVER/httpd/cgi-bin/docarc">
          AllowOverride AuthConfig
  </Directory>

in your apache configuration files.

ht://Dig configuration has been installed at "/docarc.conf"
To index your Document Archive you have run
  htdig -c /docarc.conf
  htmerge -c /docarc.conf
This can be done either manually or you put it in your crontab. For having
ht://Dig search through Portable Document Files (PDF) or PostScript (PS) files,
you need to install xpdf (see online documentation).


To uninstall Document Archive (eg. if you want to get rid of it before
installing new versions) you have to
  + drop the MySQL database "docarc"
  + delete the MySQL user "docarc"
  + remove the directories
      /SERVER/httpd/htdocs/docarc
    and
      /SERVER/httpd/cgi-bin/docarc
    and
      /SERVER/httpd/htdig/docarc

This text will be saved as "install.txt".

All this information (except the passwords) is stored in ~/.docarc so that it is available when updating or reinstalling Document Archive. When install.pl encounters files that already exist, it asks you what to do with them. During installation, new subdirectories are created in Apache's cgi-bin and htdocs directories, containing scripts, templates and static content.

If anything goes wrong, install.pl will tell you. Errors may be caused by a misconfiguration of other software (e.g. MySQL) or by incorrect input. To have install.pl repeat only some of the installation steps (config, templates, copy and db), run it with the desired steps as command line arguments. For example, if database creation was successful, you do not have to run the db step again.
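
As a sketch, repeating only the copy and db steps could look like the following. It assumes that the current directory is the unpacked tarball and that install.pl is started through the perl interpreter. The step names are the ones printed in the installer's summary.

```shell
# Steps to repeat, separated by spaces.
steps="copy db"

# Run from the unpacked directory that contains install.pl.
if [ -f ./install.pl ]; then
    # $steps is left unquoted on purpose so that each step becomes its own argument.
    perl ./install.pl $steps
else
    echo "install.pl not found in $(pwd)" >&2
fi
```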

After installation, the directory containing install.pl may be deleted.

Update

If the newer version has a changed database structure, you have two ways to preserve your documents when updating:

Trust install.pl: it will ask whether to update the database, and you let it make the changes. It knows the database structures of older versions and tries to update the tables. In this case only structural changes are made; no new datasets (such as new document types or fields) are inserted.
Back up your data. This can be done with the command line interface. Assuming you have installed it and configured it properly via environment variables, the following two commands store the database in the version-independent .bib representation (in backup.bib, with the corresponding documents in the subdirectory backup) and save the categories (in backup.cat):
```
docarc -d -r backup fetch backup.bib '*'
docarc cfetch > backup.cat
```
Category saving (the second line) works only for Document Archive versions 0.9.3 and higher. After the update you have to install the command line interface of the new Document Archive version. The corresponding restore commands, run from the same directory as the backup, would look like this:
```
docarc -r backup add backup.bib
docarc cset backup.cat
```
If you choose this approach, delete the documents directory of your old Document Archive installation after the backup and let install.pl delete the MySQL database before creating the new one.
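
Before deleting anything, it may be worth checking that the backup files were actually written. The function below is only a sketch; its name is hypothetical, and it assumes the file names used in the commands above. On versions older than 0.9.3 there is no backup.cat, so that test has to be dropped there.

```shell
# Succeed only if the backup in directory $1 looks complete:
# non-empty backup.bib and backup.cat plus the backup/ document directory.
backup_complete() {
    [ -s "$1/backup.bib" ] && [ -s "$1/backup.cat" ] && [ -d "$1/backup" ]
}

if backup_complete .; then
    echo "backup looks complete"
else
    echo "backup incomplete, keep the old installation" >&2
fi
```

The check only confirms that the files exist and are not empty; it says nothing about whether their contents can be restored.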

This procedure does not preserve any changes you may have made to the document type/field structure. It is also problematic if you have not given the documents a BibTeX id, because in that case you used the document number for citations. Since the document numbers are not guaranteed to be the same after the backup/restore procedure, you should use real BibTeX ids.

The database structure does not change with every version, so check the CHANGES file to see whether there were any changes and whether you have to back up your data.

Apache

install.pl will ask for the user and group ids that Apache runs under. It needs them because the CGI script can only access certain files and directories, where it stores its data and configuration, if they have the right owner ids.
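
If you do not know these ids, one way to find them is to look at the owners of the running Apache processes. This is a sketch only: the process names httpd and apache2 are assumptions, and the parent process usually runs as root while the worker processes run under the id you are looking for.

```shell
# List the distinct users that own running Apache processes.
# The process names httpd and apache2 are assumptions; adjust as needed.
apache_users() {
    ps -eo user=,comm= | awk '$2 ~ /^(httpd|apache2)$/ { print $1 }' | sort -u
}

apache_users
```

The primary group of a user found this way can then be printed with id -gn followed by the user name.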

install.pl assumes that Apache allows per-directory overriding of the authorization configuration through .htaccess files. If you are not asked for a password when browsing Document Archive's page, you have two options. Either rename the installed .htaccess files (there are three of them) to whatever name the AccessFileName directive is set to, or change or add <Directory> sections in your Apache configuration files that allow the overriding for docarc's cgi-bin directory. Such a section may look like this (Apache 2.0):

AccessFileName .htaccess
<Directory "/srv/www/cgi-bin/docarc">
  AllowOverride AuthConfig Limit
</Directory>

where /srv/www/cgi-bin/docarc should be replaced with docarc's cgi-bin directory. For additional information on how to configure Apache please see the Apache Documentation Project on Authorization (1.3) or CGI scripts (1.3).
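
Whether the server asks for a password can also be checked from the command line by looking at the HTTP status code, where 401 means that authentication is requested. The sketch below assumes that curl is installed; the function name is hypothetical.

```shell
# Print the HTTP status code the server returns for the URL in $1.
# 401 means that the server requests authentication.
auth_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

Call it with the URL of your docarc installation as its only argument. A result of 200 without any password prompt suggests that the .htaccess files are being ignored.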

MySQL

For install.pl to create the necessary MySQL database and its tables, a MySQL user must exist that has the rights to create databases, tables and new users. This user must also be allowed to access the MySQL server from the machine on which you are installing Document Archive. This is the MySQL superuser that install.pl asks for. After installation this user is no longer used by Document Archive, and its password is not stored.

docarc's MySQL user is created during installation. Document Archive uses it for all access to the database at runtime. After this user has been created, a manual restart of MySQL may be necessary.
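
To find out whether the new user can log in (and thus whether a restart is still needed), a login test with the mysql command line client may help. This is a sketch with a hypothetical function name; it assumes the mysql client is installed on the machine where it is run.

```shell
# Try to log in as docarc's MySQL user and list the tables of its database.
# Arguments: host, user, database. mysql prompts for the password.
check_docarc_login() {
    mysql -h "$1" -u "$2" -p "$3" -e 'SHOW TABLES;'
}
```

With the defaults from the sample session the call would be check_docarc_login localhost docarc docarc.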

Perl

The CGI script uses some Perl libraries that have to be installed. Not all of them are included in Linux distributions by default, which is why some Perl modules are bundled in docarc's directory. If you prefer to install them globally on your system, download and install them and delete the corresponding files in the docarc directory. The bundled packages are:

Package	Files and Directories	Homepage
CGI.pm-3.04	CGI.pm, CGI	http://search.cpan.org/~lds/CGI.pm/
CGI-Application-3.21	CGI/Application.pm, CGI/Application	http://search.cpan.org/~markstos/CGI-Application/
Config-General-2.24	Config	http://search.cpan.org/~tlinden/Config-General/

docarc's command line interface also uses the last one (Config-General), so you should install it on the client machines as well.

MySQL access is done with the following packages which are not contained in the docarc package:

Package	Homepage
DBI	http://search.cpan.org/~timb/DBI/
DBD-mysql	http://search.cpan.org/~rudy/DBD-mysql/
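
Whether these modules are installed globally can be checked by asking perl to load them. The sketch below uses the usual module names for the packages listed above (CGI, CGI::Application, Config::General, DBI, DBD::mysql). Modules that exist only as bundled copies in docarc's directory may be reported as missing, because that directory is normally not on Perl's search path.

```shell
# Print every Perl module from the arguments that perl cannot load.
missing_perl_modules() {
    for module in "$@"; do
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

missing_perl_modules CGI CGI::Application Config::General DBI DBD::mysql
```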

ht://Dig

Since Document Archive version 0.9.4, integration of the ht://Dig search engine provides easy access to full-text search. install.pl generates a configuration file for ht://Dig and creates an htdig user account, so index creation can be started simply by running

htdig -c configfile
htmerge -c configfile

This can be done manually or automatically, for example as a cron job.
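
For the cron variant, the two commands could be wrapped so that htmerge only runs when htdig has succeeded. This is a sketch; the function name, the script path and the schedule in the comment are hypothetical.

```shell
# Rebuild the ht://Dig index: run htdig, then htmerge only if htdig succeeded.
# $1 is the path to the configuration file generated by install.pl.
reindex() {
    htdig -c "$1" && htmerge -c "$1"
}

# Hypothetical crontab entry for a script that wraps reindex, nightly at 02:30:
# 30 2 * * * /usr/local/bin/docarc-reindex
```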

During index creation ht://Dig downloads all document files within your Document Archive. Since these are probably not plain text, they have to be converted using appropriate conversion utilities. The automatically generated configuration recognizes the most common file formats, namely .ps and .pdf. These documents are converted using pstotext and pdftotext from the xpdf package. For conversion rules for other document formats, see the ht://Dig configuration file manual, ht://Dig FAQ 4.8 and ht://Dig FAQ 4.9.
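
If indexed documents yield no search results, it can help to run the converters by hand on a sample file. The function below is a sketch for such a manual test, not the mechanism the generated configuration uses. It assumes that pdftotext and pstotext are on the PATH; the function name is hypothetical.

```shell
# Convert a .pdf or .ps file to plain text on standard output.
to_text() {
    case "$1" in
        *.pdf) pdftotext "$1" - ;;   # "-" sends the text to standard output
        *.ps)  pstotext "$1" ;;      # pstotext writes to standard output by default
        *)     echo "no converter known for $1" >&2; return 1 ;;
    esac
}
```

If the output is empty or unreadable for a given file, ht://Dig will not be able to index that document either.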