Coordinamento Servizi Informatici Bibliotecari di Ateneo  
Università degli Studi di Lecce

IV SEMINAR
SISTEMA INFORMATIVO NAZIONALE PER LA MATEMATICA (National Information System for Mathematics)
SINM 2000: a model of a national information system for subject areas
Lecce, Monday, 2 October 2000, 4:10 pm

BERND WEGNER
The ERAM (Electronic Research Archive for Mathematics) project


  Abstract

Longevity is typical of research achievements in mathematics. Hence, to improve the availability of classical publications in this area and to provide quick information about them, electronic literature information services and digital archives of the complete texts will be needed as important tools for future mathematical research. This will bring the holdings of journal archives closer to the user and will prevent the loss of papers through the deterioration of paper with age.

The aim of this article is to give a more detailed and updated report on a project, the so-called ERAM project, which captures the "Jahrbuch", a classical bibliographic service in mathematics, in a database and uses this activity to select important publications from the Jahrbuch period for digitisation and storage in a digital archive. The database will not be just a copy of the printed bibliography. It will contain many enhancements, such as modern subject classifications where possible, keywords describing the content in modern terms, and comments relating classical results to modern areas of mathematical research. These features will remain open to additions within a living project. The digital archive built up in connection with the database will be linked to it and will provide all the facilities associated with current digitisation projects. The content will be distributed to mirrors and combined with similar archiving activities in mathematics.

  Introduction

To understand why the availability of classical mathematical publications matters for current research in mathematics, some remarks on mathematicians' interest in these publications are in order. While in science generally the main motivation for consulting a publication is interest in the author's personal view of a topic, mathematicians are looking for precise results such as statements and proofs of theorems. Any reliable source for these theorems will suffice for them. But many of these results are stated in only one place, the paper in which they were first published. Even if such results were never reproduced later in monographs or surveys, this does not mean that they have become obsolete. Mathematical knowledge does not age.

Beyond the results themselves, particular aspects of their proofs may later be of interest. This requires easy access to these documents, regardless of their age and condition. Searchability is a further requirement, enabling researchers to find their way through the huge knowledge base of mathematical achievements. Admittedly, no current search engine can locate a statement by its abstract meaning. Names given to some results will help, and classification codes for special subject areas will considerably restrict the set of documents in which to look for the desired information. Hence literature databases for the classical period of mathematics are desirable. They should offer the same facilities as the current literature information services in mathematics and, beyond that, should provide links forward to modern mathematics. This is the starting point of the ERAM project, which will also be called the Jahrbuch project for short.

The acronym ERAM stands for "Electronic Research Archive for Mathematics". The project is funded by the Deutsche Forschungsgemeinschaft (DFG). The institutions responsible for the project are the Staats- und Universitätsbibliothek Göttingen (SUB) and the Technische Universität Berlin (TUB). The two parts of the project are supervised by Prof. Dr. Elmar Mittler (SUB) and the author of this article (TUB). The official title of the project at the DFG already expresses its programme and may be translated as follows: Installation of a (digital) archive of articles relevant for mathematical research, fully searchable and accessible through a database captured from the "Jahrbuch über die Fortschritte der Mathematik" (1868-1943). This database needs special care and supervision by scientists with experience of literature databases in mathematics. Hence this part of the project is guided by Professor Dr. Keith Dennis (Cornell University, formerly editor-in-chief of Mathematical Reviews) and the author of this article, who is also editor-in-chief of Zentralblatt MATH, the most comprehensive current literature database in mathematics.

Since the database is the scientifically more challenging part of the project, this article concentrates on it. Nevertheless, considerable work has to be invested in setting up the digital archive. The unpredictable part of this business, however, is the negotiation with publishers for licences to offer the archive's content at a low rate only, a rate that may cover the costs of maintaining the archive but not those of setting it up.


  About the Jahrbuch

The Jahrbuch über die Fortschritte der Mathematik (JFM) was founded in 1868 by the mathematicians Carl Ohrtmann and Felix Müller. They stated the aim of the service in the preface to the first issue, which, freely translated, reads as follows: "Our aim is, on the one hand, to provide a tool for those who are not able to follow all publications in the comprehensive field of mathematics. This should allow them to obtain a general overview of the development of mathematical science. On the other hand, it should help the active scientist to find out known results in mathematics." The JFM was published by de Gruyter from beginning to end. De Gruyter kindly granted ERAM the licence to capture the entire content of the JFM in a database and to make it freely available on the web. Scientific supervision of the JFM was the task of the Prussian Academy of Sciences.

The JFM appeared in 68 issues between 1868 and 1943. As a rule, one issue contained the reviews of all mathematical papers that appeared in the year named on the issue, but some issues covered more than one year. More than 300,000 mathematical publications from this period were reviewed in the JFM. Publication of the JFM ceased during the Second World War and was never resumed. From 1931 on it had to compete with Zentralblatt für Mathematik, published by Springer-Verlag. The JFM's particular practice of appearing only when the reviews of all publications of the given year were available led to a big backlog. In some cases the user had to wait about four years for the information he was interested in. Zentralblatt, by contrast, published reviews as they became available. In a period of growing interest and increasing research activity in mathematics, Zentralblatt's approach matched the needs of users better. Hence, when it came to reviving a service after the Second World War, Zentralblatt had the better chances, and it recovered from the wartime damage rather soon.

Nevertheless, for the purposes of ERAM the JFM is a perfect source for building the electronic gateway to the digital archive. After such a long time the JFM's delay no longer matters at all, and the precision and completeness of its data for each year are at a level that can no longer be achieved by the current reviewing services in mathematics, which have to deal with far more papers than the JFM was confronted with. One disadvantage of the JFM is that, unlike Zentralblatt, all its reviews are in German. But the period in which both services overlap is comparatively short (1931 to 1943). Hence there is a long period in which the JFM was the only comprehensive documentation of mathematical research available.

There is an ongoing discussion as to whether the German reviews should be translated into English. So far I have rejected all such proposals, because the difficulties are twofold. Mathematics in the JFM period was presented quite differently from what is published nowadays, and so were the reviews; translation into English would add a further distortion of the content. Even translating current reviews from German or French into English may be misleading, because reviewers normally write in their native language instead of English only when they want to express something out of the ordinary. Hence the content of an article should preferably be described by English keywords, to give those who cannot read German an idea of the content of the work. Future generations may take a different view, but then it will be their task to take care of the translations.


  Structure and enhancements of the JFM-database

The first step of the ERAM project is the production of a bibliographic database, the JFM-database, capturing the content of the Jahrbuch über die Fortschritte der Mathematik (JFM). As a matter of principle, the JFM-database should be accessible with the same search software as the Zentralblatt MATH database. Hence it should have a field structure similar to that of Zentralblatt and should implement the same bibliographic standards. But it will not suffice just to bring the content of the JFM into electronic form. Modern literature databases provide several search options for which the information could not easily be extracted from the text of the JFM. Editorial enhancements will be needed, and, moreover, historical links to modern research should be provided as far as possible.

First, the fields whose content could be taken directly from the text of the JFM or derived from it should be described. They start with the accession number, which is important for identifying the review. It is the key for retrieving the review in the printed version and for internal and external links. Obviously there should be a field for the author(s) of a publication and another for its title. The syntax used to present a name is the same as in the Zentralblatt database. A special convention applies to collected works: the author's name recorded is that of the author of the collected works. Supplements and further forms of an author's name can be found in a separate field. An example of different forms of a name occurring in the JFM is V. Stekloff and W. Stekloff. The title field contains the original information about the title as given in the JFM. This may be the original title of the paper, a translation of the title, or both.

An important means of narrowing a search is the publication year field. It contains the year in which the publication reviewed by the JFM appeared. As mentioned above, there was some delay between the appearance of the original publication and that of its review in the JFM, normally two or three years, but up to 7 years after the First World War. With some exceptions, each volume of the JFM contained the reviews of all publications that appeared in the same year. Hence there is a strong relation between the publication year and the number of a JFM volume. The source of a publication is an important field for the hit list. Sources are usually abbreviated, and every volume of the JFM had a chapter expanding these abbreviations. Since the abbreviations changed from time to time, a new field called "source normalised" had to be introduced, which gives the full name of the journal and its possible modifications over time. In the field "review, abstract of the publication" the user will find the review as printed in the JFM. As a rule the reviews were not anonymous: every review was accompanied by the name, or an abbreviation of the name, of the reviewer. This information is also stored in a separate field.

The only formalised subject information in the JFM consists of the subject headings, which are stored in the database as a raw classification. They are organised in three levels and changed over time. Sometimes subject headings end with references to other relevant publications that are reviewed elsewhere in the JFM. These "standalone" links can be used as additional subject information for the publications they point to. But a more precise description of the subjects could only be obtained through additional intellectual work, as described below.
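The captured fields described above can be pictured as one record per review. The following Python sketch is a minimal illustration under stated assumptions: the class name `JFMRecord`, all field identifiers, the function `normalise_source`, the sample abbreviations and the placeholder values are hypothetical, since the article names the fields but not how they are stored. It shows how the "source normalised" field might be filled from a lookup table of historical abbreviations.

```python
from dataclasses import dataclass


# Hypothetical record layout; identifiers are illustrative, not taken from ERAM.
@dataclass
class JFMRecord:
    accession_number: str                   # key for printed JFM and for links
    authors: list[str]                      # names in Zentralblatt-style syntax
    name_variants: list[str]                # further forms of the author names
    title: str                              # title as given in the JFM
    publication_year: int                   # year the reviewed work appeared
    source: str                             # source abbreviation as printed
    review: str                             # review text as printed in the JFM
    reviewer: str                           # name or abbreviation of reviewer
    subject_headings: tuple[str, str, str]  # three-level JFM subject heading
    source_normalised: str = ""             # filled in by normalise_source()


def normalise_source(record: JFMRecord, table: dict[str, str]) -> JFMRecord:
    """Set source_normalised from a table mapping historical abbreviations
    to a standard journal name; unknown abbreviations are kept unchanged."""
    record.source_normalised = table.get(record.source, record.source)
    return record


# Hypothetical lookup table: two abbreviation variants for one journal.
SOURCE_TABLE = {
    "Math. Ann.": "Mathematische Annalen",
    "Math. Annalen": "Mathematische Annalen",
}

rec = normalise_source(
    JFMRecord(
        accession_number="hypothetical-0001",
        authors=["Stekloff, W."],
        name_variants=["Stekloff, V."],
        title="(placeholder title)",
        publication_year=1900,
        source="Math. Annalen",
        review="(placeholder review)",
        reviewer="(placeholder reviewer)",
        subject_headings=("(level 1)", "(level 2)", "(level 3)"),
    ),
    SOURCE_TABLE,
)
print(rec.source_normalised)  # Mathematische Annalen
```

Keeping unknown abbreviations unchanged, rather than discarding them, is one possible design choice: it lets librarians add table entries later without losing the original information.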

This work has to be invested in enhancing the data captured from the JFM. The task has two parts, one relying on the support of mathematical experts and the other involving additional editing of the data by librarians. More than 150 mathematicians from all over the world collaborate with the project as voluntary experts. The experts are to enhance the raw data available so far as follows: they should provide an English translation of the title of each document, add a subject classification according to MSC2000, assign some English keywords, and give some impression of the importance of the documents they handle for the development of mathematics.

The keywords should characterise the content of the article as far as it can be recognised from the title and the review. The keywords are uncontrolled, i.e. there is no standard vocabulary (thesaurus) listing the admissible keywords. The subject classification will be the only standardised information on the content of a paper. But it is comparatively coarse, and it divides current research activity in mathematics into subject areas which in many cases will not match the divisions of mathematics during the JFM period. Hence the assignment of MSC codes cannot achieve the same level of precision for the JFM-database as for current databases like Zentralblatt MATH.

In addition, the publications will be evaluated and ranked according to their importance. This ranking can only be superficial; its sole purpose is to decide how relevant the individual documents are for digitisation of the full text. We have three classes in mind: publications of significant importance for the further development of mathematics, important publications, and all other publications. The first two classes together should not contain more than 20% of the papers covered by the JFM, simply to remain within the budget of the project. The evaluation will be decisive for selecting the works to be stored in the digital archive.
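The three-class evaluation and the 20% limit on the first two classes can be expressed as a simple check. The Python sketch below is hypothetical: the names `Rank`, `digitisation_candidates` and `within_budget`, and the idea of keying ranks by accession number, are assumptions for illustration, not the project's actual procedure.

```python
from enum import Enum


class Rank(Enum):
    SIGNIFICANT = 1  # significant for the further development of mathematics
    IMPORTANT = 2    # important publications
    OTHER = 3        # all other publications


BUDGET_SHARE = 0.20  # first two classes together: at most 20% of all papers


def digitisation_candidates(ranks: dict[str, Rank]) -> list[str]:
    """Accession numbers in the first two classes, most important first."""
    chosen = [an for an, rank in ranks.items() if rank is not Rank.OTHER]
    return sorted(chosen, key=lambda an: ranks[an].value)


def within_budget(ranks: dict[str, Rank]) -> bool:
    """True if the first two classes cover at most BUDGET_SHARE of the papers."""
    if not ranks:
        return True
    return len(digitisation_candidates(ranks)) / len(ranks) <= BUDGET_SHARE


# Hypothetical example: 10 papers, two of them in the first two classes.
ranks = {f"paper-{i}": Rank.OTHER for i in range(10)}
ranks["paper-3"] = Rank.IMPORTANT
ranks["paper-7"] = Rank.SIGNIFICANT
print(digitisation_candidates(ranks))  # ['paper-7', 'paper-3']
print(within_budget(ranks))            # True (2 of 10 = 20%)
```

Adding one more paper to the first two classes in this example would push the share to 30% and make `within_budget` return False.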

More detailed and updated information on the papers will be found under "expert's remarks". The expert may use this field for any kind of annotation to the publication, for example a verbal description of its importance or references to other developments in mathematics. We hope that in particular those classical papers which initiated a whole series of research papers, or even a new subject area in mathematics, will get a special representation in the JFM-database. Following links back and forth may then enable the user to find relations between current publications that share the same historical root without belonging to the same modern mathematical subject. The content of this field will depend on the knowledge of the expert handling the corresponding document, and in general such remarks will initially be found for only a few papers in the database.

Another set of enhancements to the JFM-database will result from the editing of data by librarians. They will take care of standardising the available information and of providing links to digital versions of the corresponding article, or to a library from which the article can be ordered through a document delivery service. For example, the reviews in the JFM sometimes cite earlier publications which have received an identifier within the JFM-database; these references will be turned into clickable hyperlinks. URLs of digitised versions of the original publications are specified, and the shelfmarks of the sources of the original publications at the SUB are added. The SUB provides a document delivery service for these publications.
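Turning cited identifiers in a review into hyperlinks is essentially a pattern substitution. The Python sketch below is a minimal illustration under explicit assumptions: it assumes identifiers are written as "JFM" followed by a volume.page.item number (e.g. "JFM 12.0345.01"), and `BASE_URL`, `link_references` and the sample review text are hypothetical placeholders, not ERAM's actual format or address.

```python
import html
import re

# Assumption: identifiers look like "JFM 12.0345.01" (volume.page.item).
JFM_ID = re.compile(r"\bJFM (\d{2}\.\d{4}\.\d{2})\b")
# Hypothetical placeholder for the database's record URL.
BASE_URL = "https://example.org/jfm/"


def link_references(review: str, known_ids: set[str]) -> str:
    """Escape the review text for HTML and turn cited JFM identifiers that
    exist in the database into hyperlinks; other identifiers stay plain text."""

    def to_link(match: re.Match) -> str:
        identifier = match.group(1)
        if identifier not in known_ids:
            return match.group(0)
        return f'<a href="{BASE_URL}{identifier}">{match.group(0)}</a>'

    # Escaping first is safe: the identifiers contain no HTML special characters.
    return JFM_ID.sub(to_link, html.escape(review))


print(link_references("Vgl. JFM 12.0345.01 und JFM 99.9999.99.", {"12.0345.01"}))
```

Only identifiers already present in the database become links, so a citation of a record that has not yet been keyboarded stays as plain text instead of producing a dead link.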

Standardisation concerns, above all, the normalised sources. Here additional names are given, for example for journals whose titles varied, in order to provide standard forms of the journal titles and the corresponding abbreviations. Additional effort is spent on identifying authors' names. In contrast to journal titles, only improvements could be achieved here; perfect identification of authors would need immense effort and is beyond what the project's budget can pay for.


  The digital archive

In addition to its use as a high-quality source of information on classical mathematics, the JFM-database will provide access to the digital archive to be built up within the project. This archive should contain most of the relevant mathematical publications from the JFM period (1868-1943). A first rough estimate of the number of documents that could be covered by ERAM gave about 20% of all mathematical publications from the JFM period, or 1,200,000 pages in absolute numbers. The publications are scanned (as GIF images) and stored in a document management system. At the start of the project the images are not converted into text files. To allow text searches in the archive, text files will later be an important addition to the scanned images.

The content of the archive is stored at the SUB and made accessible there (direct access under the URL http://www-gdz.sub.uni-goettingen.de/agora/html/docs/). Co-operation with other digital archives, such as mirroring of content and exchange of documents, will be a matter for future discussion. A search in the JFM-database is the easiest way to access the archive. The hit list of a search gives the user hypertext links to the document if it is available in the digital archive, or to the order form of the document delivery service if the document is at least available in the SUB holdings.
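The link offered in the hit list follows a simple precedence: digitised copy first, document delivery second, otherwise nothing. The Python sketch below illustrates that rule under assumptions: `Availability`, `hit_list_link`, `ORDER_FORM` and the sample shelfmark are hypothetical, and the real order form of the SUB may take entirely different parameters.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder for the SUB document delivery order form.
ORDER_FORM = "https://example.org/sub-order"


@dataclass
class Availability:
    archive_url: Optional[str]    # URL in the digital archive, if scanned
    sub_shelfmark: Optional[str]  # SUB shelfmark, if the item is held there


def hit_list_link(item: Availability) -> Optional[str]:
    """Prefer the digitised copy; else offer the order form; else no link."""
    if item.archive_url:
        return item.archive_url
    if item.sub_shelfmark:
        return f"{ORDER_FORM}?{urlencode({'shelfmark': item.sub_shelfmark})}"
    return None


print(hit_list_link(Availability(None, "4 MATH 1")))
# https://example.org/sub-order?shelfmark=4+MATH+1
```

`urlencode` is used so that spaces and other characters in a shelfmark are encoded safely in the query string.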

In principle, the selection of documents is based on the recommendations of the JFM experts. But this turned out to be a bottleneck of the project. Moreover, in several cases scanning scattered pieces from journals and other sources proved less efficient than scanning a journal issue as a whole. This led to the conclusion that additional selection methods should be applied, without disregarding the experts' advice. In addition, restricting the selection to the JFM period is rather artificial, though it was an important range to start with. Convenient access through the JFM-database will remain an important feature.

Hence the supervisors of the project began adding recommendations of their own for documents to be scanned. These include collected works and classical handbooks, monographs that are difficult to find in library holdings, mathematical doctoral theses, publications printed on paper with poor preservation properties, whole series of journal volumes, etc. For example, we have the licence to put digital versions of the "Mathematische Annalen" into the archive, covering all back volumes up to those that appeared in the 1950s. Some of the journals that have installed electronic versions of their recent volumes in EMIS (European Mathematical Information Service) have agreed that all their print-only back volumes may be digitised and offered within ERAM. For example, the back volumes of "Beiträge zur Algebra und Geometrie" can be found there.

Obviously, this part of the project involves efforts going beyond the technical work of scanning and storing documents. Since access to the digital archive is meant to be free apart from a comparatively small administrative fee, many items offered by the archive need the agreement of the publisher or author to be handled in this way by ERAM. This involves quite troublesome negotiations with publishers and authors, where publishers in particular have their own ideas about the benefits they may get from digital versions of their publications. This inevitably reduces the set of documents available in the archive compared with what would have been desirable.


  The current offer of ERAM

The enhanced data captured from the JFM will form the final content of the web version of the JFM-database. Unlike the articles in the archive, the input for the database is not scanned but keyboarded according to the structure described above. It has been decided to offer the data on the web at two levels. The first consists of the raw data as they come from keyboarding, without any editing. At the second level the data have been edited and enhanced by the experts. Furthermore, the database will remain open for additional enhancements, because the comments section in particular may continually receive important input from mathematicians who do not belong to the current panel of experts.

Feedback from users of a first test version indicates that access even to the raw data is an important facility for the mathematical community. Hence everything that has been entered into the system is made freely accessible under the URL http://www.emis.de/projects/ by clicking on the box for the Jahrbuch. Users are warned that in many cases the content is only in a preliminary state. About 70% of the content of the JFM is now stored in the system, starting with the first issue and soon reaching the period in which the JFM and Zentralblatt MATH appeared in parallel. Most of the items have already undergone internal editing, but for many of them the experts' enhancements are still missing. Enhancement appears to be a lengthy process, because it is difficult to find many volunteers to take care of it, and the working capacity of the current experts is limited by their many other commitments.

Access to the first parts of the archive has been arranged, and an extension is in preparation. The extension was partly waiting for a critical mass of content in the JFM-database, but the structures for public access also had to be discussed and installed. Hence the archive has more documents available than can be seen from outside, and its content is growing continually. Discussions have also been initiated to link the JFM-database with other digital offers of documents from the JFM period and to exchange documents with them. The goal is to establish a distributed system of digital archives for mathematics, for which the JFM-database can serve as the most convenient gateway.


Professor Dr. Bernd Wegner
Scientific Coordinator of EMIS
Fachbereich Mathematik, Technische Universität Berlin
Straße des 17. Juni 135, D-10623 Berlin, Germany

