Document Archive

Document Archive (docarc) is a database written in Perl that helps you (and your workgroup) keep track of the large number of electronic documents and BibTeX entries you might have in use. The underlying database is MySQL; because it is accessed through Perl's DBI, it might work with other databases as well.

It's meant to be an alternative to .bib files and a complicated directory structure of document files. Although it's a web-based application, command line frontends to the most important features have been developed. Some work on integration into the great browser suites Mozilla and Firefox has also been done.

Live Demo (last updated 2005-02-02)

For those who want to look around, a live demo is available here. Username / password combinations are available for all three user groups (guests may not add documents):

    admin:admin
    user:user
    guest:guest

Please do NOT edit the fields / doctype structure. Password changes are disabled, so you also can't create users. Due to restrictions of my provider, ht://Dig-powered fulltext search is not available.

The public part of the live demo can be accessed here. Public installations run by Document Archive users are listed here.

Download

You may download the current version (0.9.4) from the SourceForge.net project page.

Known bugs and patches for v0.9.4 can be viewed here.

Because the Mozilla plugin was renamed, you have to delete an already installed plugin older than version 0.9.2 before updating it.

Documentation

FAQ [ps.gz]
Installation [ps.gz]
Configuration
1. Templates [ps.gz]
Internal Structure
1. Database [ps.gz]

This is a living document, and I appreciate any questions (by email, in a forum, or through any of the trackers Bugs, Support Requests, Patches or Feature Requests on the project page) regarding the installation process, configuration, or any other part that should be explained in detail. If you want to know or comment on what is being implemented right now, have a look at the Task Manager.

Changes (0.9.5)

the arXiv layout has changed. a set of all current input filters can be downloaded here. replace the file parse_bibtex.pm and the directory parse_bibtex in your cgi-bin/modules directory.
new input filters for Institute of Physics journals, Science Direct journals and PubMed
new document type problemset for publishing exercise sheets to your students
a new prx input filter has been written to compensate for changes in the prl/pra/... layout. download prx.pl here and replace the old one in the modules/parse_bibtex directory
almost everything has been completely rewritten. new features are:
- multiple files can be attached to a bibtex entry
- document type of existing entries can be altered
- speed improvement
- pagewise browsing through the category view or a search result (i.e. offset and limit are working)
- more comfortable editing of the doctypes/fields structure
user space metadata can be attached to any document (comments, private bibtexids or categories)
sped up search a little by optimizing sql queries
introduced the cli command upload to put a document file for a specified document into the database (e.g. for use in makefiles to update your document entry)
1x1 document description fields are now represented by two radio buttons rather than a checkbox
Content-Disposition header ensures correct filenames when downloading documents
some pages now send an HTTP Last-Modified header so ht://Dig does not need to parse unchanged documents
fixed database and style bugs and improved standards conformance
after submitting a new entry the user is redirected to the corresponding entry view (in case they want to add it to categories or file any metadata)
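
The two HTTP header changes listed above can be illustrated with a short sketch. docarc itself is written in Perl; the Python below is not taken from its code base, and the helper names download_headers and not_modified are hypothetical. It assumes the server knows the file name and modification time of a document, and shows how Content-Disposition carries the file name and how Last-Modified lets a crawler that sends If-Modified-Since skip unchanged pages.

```python
from email.utils import formatdate, parsedate_to_datetime


def download_headers(filename: str, mtime: float) -> dict:
    """Build response headers for a document download (hypothetical helper)."""
    # Quote the file name and escape embedded backslashes and quotes.
    safe = filename.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "Content-Disposition": f'attachment; filename="{safe}"',
        # usegmt=True produces the HTTP date format ending in "GMT".
        "Last-Modified": formatdate(mtime, usegmt=True),
    }


def not_modified(if_modified_since: str, mtime: float) -> bool:
    """Return True when the client's copy is still current."""
    try:
        since = parsedate_to_datetime(if_modified_since).timestamp()
    except (TypeError, ValueError):
        return False  # unparsable header: send the full document
    # HTTP dates have one-second resolution, so compare whole seconds.
    return int(mtime) <= int(since)
```

A request handler built on this would answer 304 Not Modified with an empty body when not_modified returns true, and otherwise send the document together with the headers from download_headers.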

Features

widely configurable
charset recode (bibtex, html, rdf)
tree structure for content classification
multiuser containers may be used to group project-oriented documents
public run mode gives anyone read-only access to permitted documents without authorization
fully themeable
traditional (standard) theme
- fully css-configurable via a central stylesheet for nearly every class of elements
- user customizable (some options that influence the handling)
- strictly xml conformant (uses doctype xhtml 1.0 transitional)
- optimized for mozilla, but also runs under recent versions of other browsers. supporting obsolete browser versions is not planned because i'd like to concentrate on standards like xml, css and dom.
simple layout that returns special pages for your internal search engine (e.g. htdig)
dynamic online help
fields/doctypes structures and categories configurable via web interface
command line interface (cli) for simple access during latex document compiling
internal database search lets you specify complex search expressions
fulltext search through document files available via integrated ht://dig support
modularly extensible. available extensions:
- frame and iframe based extensions of the traditional theme
- planned: scan2pdf interface (tiff/bmp/png/jpeg -> pdf)
- planned: content extraction (just upload a pdf file and the database extracts the necessary content)
- planned: printing and format conversion capabilities (ghostscript, xpdf, lpr, psutils)
- planned: email support (send registered users bibtex entries or document files)
- planned: documentation of template interface
- planned: basic integration into some popular editors (e.g. vi, emacs, kile)
- planned: more browser integration (e.g. internet explorer, opera, konqueror)
mozilla/firefox integration (plugin and search engine)
- direct access to documents via toolbar
- select html embedded bibtex and post it directly to docarc via context menu
- let docarc parse the currently viewed web page (many well-known e-journals are supported)
- query document archive from sidebar or url input field