Document Archive (docarc) is a database application written in Perl that helps you (and your workgroup) keep track of
the vast number of electronic documents and BibTeX entries you might have in
use. The underlying database is MySQL; because it is accessed through Perl's
DBI, other databases might work as well.
It's meant to be an alternative to .bib files and a complicated directory
structure of document files. Although it's a web-based application, command
line frontends to the most important features have been developed. Some work on
integration into the Mozilla and Firefox browsers
has also been done.
Live Demo (last updated 2005-02-02)
For those who want to look around, a live demo is available here.
Username / password combinations are available for all three user groups (no documents may be added as guest):
Please do NOT edit the fields / doctype structure. Password change is disabled, so you also can't create users. Due to restrictions of my provider,
ht://Dig powered fulltext search is not available.
Access to the public part of the live demo is possible here.
Public installations of Document Archive users are listed here.
You may download the current version (0.9.4) at
SourceForge.net Project Page.
Known bugs and patches for v0.9.4 can be viewed here.
Because the Mozilla plugin was renamed, you have to delete an already installed plugin (if its version is < 0.9.2)
before updating it.
- FAQ [ps.gz]
- Installation [ps.gz]
- Templates [ps.gz]
- Internal Structure
- Database [ps.gz]
This is a living document, and I appreciate any questions (by email, in a
forum, or through any of the trackers, such as
Feature Requests, on the
project page) regarding the installation process, configuration, or any other part that should be explained
in more detail. If you want to know or comment on what is being implemented at the moment, have a look at the
- arxiv layout has changed. a set of all current input filters can be downloaded
here. replace the file parse_bibtex.pm and the directory parse_bibtex in your cgi-bin/modules directory.
- new input filters for Institute of Physics journals, Science Direct journals
- new document type problemset for publishing exercise sheets to your students
- a new prx input filter has been written to compensate for changes in prl/pra/... layout. download prx.pl
and replace the old one in the modules/parse_bibtex directory
- rewrote almost everything. new features are:
- multiple files can be attached to a bibtex entry
- document type of existing entries can be altered
- speed improvement
- pagewise browsing through category view of search result (i.e. offset and limit are working)
- more comfortable editing of the doctypes/fields structure
- user space metadata can be attached to any document (comments, private bibtexids or categories)
- sped up search a little by optimizing sql queries
- introduced cli command upload to put a document file for a specified document into the database (eg. for use in makefiles to update your
- 1x1 document description fields are now represented by two radio buttons rather than a checkbox
- Content-Disposition header ensures correct filenames when downloading documents
- some pages now have HTTP Last-modified header so ht://Dig
does not need to parse unchanged documents
- fixed database and style bugs and improved standards conformance
- after submitting a new entry, the user is redirected to the corresponding entry view (in case they want to add it to categories or file any meta-data)
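The two HTTP-related items above (Content-Disposition for download filenames, Last-Modified so that an indexer such as ht://Dig can skip unchanged pages) can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from docarc's Perl sources; the helper names download_headers and is_unchanged are hypothetical, and the If-Modified-Since handling is an assumption about how a client or indexer would make use of the Last-Modified header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from urllib.parse import quote


def download_headers(filename, modified):
    """Build response headers for a document download (hypothetical helper).

    `modified` must be a timezone-aware datetime in UTC.
    """
    # Plain ASCII fallback for old clients; non-ASCII characters become '?'.
    fallback = filename.encode("ascii", "replace").decode("ascii").replace('"', "'")
    # filename* carries the exact name, percent-encoded as UTF-8.
    disposition = "attachment; filename=\"%s\"; filename*=UTF-8''%s" % (
        fallback,
        quote(filename),
    )
    return {
        "Content-Disposition": disposition,
        "Last-Modified": format_datetime(modified, usegmt=True),
    }


def is_unchanged(if_modified_since, modified):
    """Return True when a conditional request could be answered with 304."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable header: treat the page as changed and send it again.
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return modified.replace(microsecond=0) <= since
```

A server would call download_headers when sending a document file, and check is_unchanged against the request's If-Modified-Since value before regenerating a page.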
- widely configurable
- charset recode (bibtex, html, rdf)
- tree structure for content classification
- multiuser containers may be used to group project oriented documents
- public runmode allows readonly access to allowed documents for anyone without authorization
- fully themeable
- traditional (standard) theme
- fully css-configurable per central stylesheet for nearly every class of
- user customizable (some options that influence the handling)
- strictly xml conformant (uses doctype xhtml 1.0 transitional)
- optimized for mozilla, but also runs under recent versions of
other browsers. it's not planned to support obsolete browser versions because
i'd like to concentrate on standards like xml,
css and dom.
- simple layout that returns special pages for your internal search engine (eg. htdig)
- dynamic online help
- fields/doctypes structures and categories configurable via web interface
- command line interface (cli) for simple access during latex document compiling
- internal database search lets you specify complex search expressions
- fulltext search through document files available via integrated ht://dig support
- modularly extensible. available extensions:
- frame and iframe based extensions of the traditional theme
- planned: scan2pdf interface (tiff/bmp/png/jpeg -> pdf)
- planned: content extraction (just have to upload a pdf file, database
extracts the necessary content)
- planned: printing and format conversion capabilities (ghostscript, xpdf,
- planned: email support (send registered users bibtex entries or document
- planned: documentation of template interface
- planned: little integration into some famous editors (eg. vi,
- planned: more browser integration (eg. internet explorer,
- mozilla/firefox integration (plugin and search engine)
- direct access to documents via toolbar
- select html embedded bibtex and post it directly to docarc via context menu
- let docarc parse the currently viewed web-page (many famous e-journals are supported)
- query document archive from sidebar or url input field
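To make the idea of picking up HTML-embedded BibTeX more concrete, here is a minimal sketch in Python. It is not the plugin's or docarc's actual parser: the function name extract_bibtex is hypothetical, and the approach (strip tags, decode entities, then match braces) is an assumption that ignores corner cases such as a literal "<" inside an entry.

```python
import html
import re

# An entry starts with "@", a type name, and an opening brace.
ENTRY_START = re.compile(r"@([A-Za-z]+)\s*\{")


def extract_bibtex(page_text):
    """Return every brace-balanced @type{...} entry found in page_text."""
    # Remove markup and decode entities such as &amp; before scanning.
    text = html.unescape(re.sub(r"<[^>]+>", "", page_text))
    entries = []
    pos = 0
    while True:
        match = ENTRY_START.search(text, pos)
        if match is None:
            break
        depth = 0
        end = None
        # Walk forward from the opening brace until it is closed again.
        for i in range(match.end() - 1, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            break  # unbalanced braces: stop rather than guess
        entries.append(text[match.start():end])
        pos = end
    return entries
```

Each returned string is a candidate entry that could then be posted to the archive; validating the fields is left to the receiving side.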