Report of Working Group
From Online Dictionary of Crystallography
Revision as of 11:44, 17 February 2006 by BrianMcMahon
Remit
The Dictionary Working Group of the Commission on Crystallographic Nomenclature (CCN) was formed during the 20th IUCr Congress in Florence to provide guidance on the establishment and conduct of a project undertaken under the aegis of the Commission, with the approval of the IUCr Executive Committee and the involvement of other Commissions and appropriate bodies of the IUCr, to provide online definitions of terms used in the practice of crystallography. The remit of the Working Group covered the following topics:
1. Is the project to be and to remain an online project (web URLs)?
Or should a book form also be envisaged in the future (question by Henk Schenk, Chairman of the IUCr/OUP book Committee)?
2. The scientific scope of the project
Broadly speaking, the project should be confined to the subject of crystallography, the area of science over which the IUCr has authority. However, crystallography is used by, and merges with, very many other areas of chemistry, physics, mathematics, materials science, biology, computational data processing, etc. What criteria should be applied for deciding which terms to include within the project, and which to exclude? (Consider, for example, the detailed descriptors for protein secondary structure included in the mmCIF dictionary. Should the online "crystallography dictionary" include definitions of protein folds, beta sheets etc.?)
- Should the names of compounds (minerals, materials, chemical or biological compounds) be included?
- Should physical concepts be included (entropy, energy, etc.)?
- Should mathematical terms be included? (group properties, tensor properties, etc.)
- Should there be translations of each entry in other languages (French, German, Spanish, Russian, Chinese, Japanese)? See old “red” International Tables as an example.
- Should names of people (Bravais, Bragg, Ewald, Laue, etc.) be included?
- Should reference to computer programs be included?
- How should double-word items be included: “neutron interferometry”, “X-ray interferometry”, or “interferometry (neutron)”, “interferometry (X-ray)”?
- Should specialized expressions such as “normalized structure factors” be itemized as such or appear within the definition of “structure factor”? (there are many such examples).
- Should equations be included?
3. The granularity of definitions
What is the appropriate amount of text for each entry in the compilation? (This may determine, among other things, the name of the project - glossary, dictionary, index, thesaurus, encyclopaedia?) From Longman's Dictionary of the English Language:
- glossary: a list of terms (e.g. those used in a particular text or in a specialized field), usually with their meanings
- dictionary: a reference book containing words, usually alphabetically arranged, together with information about them, especially their forms, pronunciations, parts of speech, meanings, origins, grammatical requirements, and idiomatic uses
- index: a guide or list to aid reference: e.g. an alphabetical list of items (e.g. topics or names) treated in a printed work that gives for each item the page number where it appears, or a list of items of a specified type
- thesaurus: 1. a book of words or of information about a particular field or set of concepts; especially a book of words grouped according to their meaning. 2. a list of subject headings or index terms, usually with a cross-reference system for use in the organization of a collection of documents for reference and retrieval
- encyclopaedia: a reference work that contains information on all branches of knowledge or treats comprehensively a specified branch of knowledge, usually in articles arranged in alphabetical order of subjects either in a single list or within each of several large subsections
- compendium: a full list or inventory (Webster), a book containing a list of useful hints (Collins); as an example see the IUPAC gold book (http://gold.zvon.org): a list of names, each one with hyperlinks.
How many illustrations should be given (drawings, diagrams, spectra, photographs, etc.)?
Should the work be organized in Categories and Subcategories?
4. The level of definitions
Should this be a reference work for authors and referees of IUCr Journals, research professionals, undergraduate students, high-school students, the general public? Can it be designed as a multi-level resource?
5. Delegation of authority and labour
What is an appropriate editorial structure to commission, review and implement definitions? This needs to take into account the involvement of Commissions, the possibility of different educational levels for the completed work, and perhaps some technical aspects of presentation and online editing.
6. Presentation
Some consideration should be given to broad aspects of how the project will be presented: as a single web site, as multiple sites (perhaps appropriate if different educational levels are supported), as a free resource or a potential source of revenue, as a companion to International Tables?
7. Financial implications
The project as currently envisaged in its project definition phase will rely heavily on volunteer scientist labour and existing hardware resources, but there may be a need for editorial honoraria, or other costs that the Working Group can specify, such as hardware, technical editing, secretarial help, etc. The Working Group will report its findings to the Finance Committee.
Medium
The project should be executed initially as solely an online project because of the flexibility of the online medium, the fact that there is no limit on the number of entries, and the possibility of hyperlinks to IUCr and other web resources. It is possible to consider at a later stage a physical book with a CD containing all the hyperlinks.
Scientific scope
Terms selected for inclusion should have a clear crystallographic application. Terms from connected disciplines (mathematics, physics, chemistry, mineralogy, biology) should be included insofar as they relate to crystallography, e.g. “crystallographic group”. Names of chemical or biological substances or minerals should not be included at the present stage, but terms such as “albite twin law” should.
The Working Group agrees that translations of terms into languages other than English should be given, but not translations of their definitions.
Reference to computer programs per se should not be included, but there may be instances where such a reference becomes essential, e.g. SHELX.
Names of people should only be included if they relate to crystallographic concepts, e.g. “Bragg’s law”, “Ewald sphere”.
Double-word items such as “X-ray interferometry” should be entered as such. A search on “interferometry” will automatically retrieve them.
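To illustrate why entering double-word items as such does not hinder retrieval, here is a minimal sketch of a title search. It assumes a simple case-insensitive whole-word match over entry titles; the actual wiki search engine may index and rank differently, and the title list and the function `search_titles` are hypothetical.

```python
import re

# Hypothetical entry titles; double-word terms are stored exactly as written.
ENTRY_TITLES = [
    "Bragg's law",
    "Ewald sphere",
    "neutron interferometry",
    "X-ray interferometry",
    "structure factor",
]


def search_titles(query: str, titles: list[str]) -> list[str]:
    """Return titles containing `query` as a whole word or phrase, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [title for title in titles if pattern.search(title)]


print(search_titles("interferometry", ENTRY_TITLES))
# ['neutron interferometry', 'X-ray interferometry']
```

A query for the full phrase “X-ray interferometry” returns only that entry, so there is no need for inverted forms such as “interferometry (X-ray)”.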
Equations should be included.
The number of entries is not predetermined. There are no technical limitations requiring this, and the project can grow with time. The Working Group will start with a small number of terms in order to get the pilot project operational. The project will then be opened to the whole Commission on Crystallographic Nomenclature, and 500 terms should easily be reached when the web pages are first opened to the public. If the idea is successful, it will probably grow to some thousands of distinct terms over the years.
Granularity of definitions
The Working Group recommends a reference product that is a blend between “dictionary” and “encyclopaedia”, what the French call a “dictionnaire encyclopédique”: a list of terms with short definitions and cross-links to other entries in the work, with at times longer developments. For instance, “reciprocal lattice” will have a short definition and a hyperlink to the corresponding pamphlet on the IUCr web site (open access) and to the appropriate chapter of IT Volume B; the entry “Bragg’s law” should give the law with a drawing and its derivation could be obtained via an appropriate hyperlink.
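As an indication of the kind of equation such an entry would display, Bragg's law in its usual form is given below, where n is the (integer) order of reflection, λ the wavelength, d the interplanar spacing and θ the glancing angle. This is the conventional notation, not one yet prescribed by the Working Group.

```latex
n\lambda = 2d\sin\theta
```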
Links to ComCIF dictionaries should be provided where appropriate.
The work may be structured in several ways to assist navigation. The terms will be entered alphabetically and can be retrieved alphabetically, but the WiKi software allows an ordering in categories (and eventually in subcategories). Each entry can be attached to a category (and a subcategory). These categories could correspond for instance to titles of IT Volumes, but with additional subjects (“mathematical crystallography” etc.); subcategories would correspond to Chapters. A click on a category entry will provide links to all the entries related to that category.
As an example, to get the entry “Grüneisen relations”, one may either click on that term, or click on the Category “Physical Properties”; that will provide links to subcategories, one of them being “Thermal expansion”; a click on that subcategory will provide a list of links to all the entries related to that topic, one of them being “Grüneisen relations”. The entry “Grüneisen relations” will give a definition and hyperlinks to Chapters 1.4 and 2.1 of International Tables Volume D.
There are several advantages to having categories and subcategories. One is to allow searches on areas of interest, for instance if you are looking for a particular type of twinning but don’t remember its exact name. Another is to make the work of preparing the dictionary easier by assigning editors and subeditors to categories and subcategories. Their duty would be to oversee the definitions and to check that there are no obvious omissions.
Audience
The primary goal is to be a reference for authors and referees of IUCr Journals and for research professionals in general: it will give the “official” IUCr acceptance of terms. As such it will also be useful to students and to the general public.
The work forms part of a multi-level resource in the sense that, besides the short definition, hyperlinks will be provided either to a longer definition or to appropriate existing IUCr resources. It will complement International Tables.
Organization of contributors
The Editorial Board should consist of the members of the CCN, with representatives from the other Commissions as consultants for the various fields of crystallography. It is clear that, as Editors of the various IUCr publications, the members of the CCN are the people whose duty is to say how crystallographic terms should be used. Efficiency requires that the work should be done under the supervision of a Main Editor and Editors (and subeditors) for the various categories (and subcategories), chosen among the CCN members and consultants.
Presentation
It is expected that the resource would appear as a single web site. However, it should also act as a companion to International Tables and to the Journals. As the Online Dictionary of Crystallography would be an important and useful service to researchers and authors, it is desirable that it should be open access, bearing in mind that most definitions will have links to IT Volumes, which are not open access. This last point may encourage people to subscribe to International Tables Online.
Financial implications
The project as currently envisaged will rely heavily on volunteer labour and existing hardware resources, but there may be a need for editorial honoraria, or other costs such as hardware, technical editing, secretarial help, etc. to be identified by the Working Group and reported to the Finance Committee.
Timescale
A Pilot Project with about 500 entries should be implemented and guidelines for further development as well as an estimation of its financial implications provided in time for the Finance Committee meeting in 2006 (usually around March).
APPENDIX: Technical considerations
A major goal of the initial pilot project was to identify a software platform capable of supporting collaborative work on an online dictionary by the distributed authorship that the project seems to require. The pilot is not specifically directed towards identifying a dissemination mechanism, but clearly it is helpful to consider tools that create a version of the dictionary already suitable for public access.
Content management systems
The conventional software platform for managing the collection, editing, revision and publication of a large number of separate items is known as a content management system. Commercial implementations of such systems have been available for decades, and have been distinguished by their high price and complexity of use. The IUCr editorial office has considered such systems in the past (e.g. Texcel Information Manager), but concluded that the high costs and steep learning curves associated with such systems, coupled with their lack of flexibility for the innovative procedures we have developed, have made them less attractive than home-grown systems. Traditional packages were also poorly suited to web use.
More recently, open-source web-based packages such as Bricolage have begun to appear. Bricolage in particular is a system that is under consideration as a basis for collaborative input into the next generation of IUCr public web services. However, it shares many of the drawbacks of older content management systems. It requires heavy investment of time to configure it for a particular organization; there is a significant learning curve for content contributors (authors) to master; it offers rather little in the way of flexibility if one wishes to integrate the managed content with existing material, or with contributions from other sources; and it is rather poorly suited for technical content (especially mathematics). While it remains under consideration for possible future editorial use, it seems too heavyweight for the dictionary project as currently envisaged.
WiKi software
An alternative approach that was put forward at the Florence Congress and enthusiastically received by the members of the Nomenclature Commission present at that meeting was the use of so-called 'WiKis'. A WiKi (from the Hawaiian for 'quick' or 'fast') is a web-centric content management system designed to be lightweight and to encourage rapid development of web sites by a more or less informal collaboration of authors and editors. The public Wikipedia project demonstrates the possibility of compiling a very large compendium of content (at the moment almost a million encyclopaedic entries in the English-language edition, written and edited by tens of thousands of users). Although Wikipedia encourages open authorship, it was felt that (at least for its initial implementation) the Online Dictionary of Crystallography should be seen as the work of expert authors, and that therefore controls should exist on the users able to contribute or edit content. (Indeed, Wikipedia also has administrative privileges that control access to articles, though by design these are not used routinely.) A requirement therefore was for a lightweight WiKi implementation that had appropriate access control/user management functionality.
MoinMoin
The first package investigated was MoinMoin ([1]), which is already used in the IUCr editorial office for maintaining internal documentation. A pilot MoinMoin implementation was set up in mid-November 2005, and used extensively by Andre Authier (with a small amount of input from John Helliwell). Its advantages were:
- ease of installation
- ease of maintenance (individual entries are stored as files on the hard disk, and can readily be backed up, restored, deleted or moved)
- simple markup and ways of creating internal hyperlinks and links to resources on the Web
- simple access control mechanisms
- relatively easy modification to style sheets (so that only authorised users see the tabs/buttons allowing a page to be edited)
- ability to track all recent changes (essential for the chief editor(s) and system administrator)
- support for categorizing entries and for managing and indexing categories
- page templating
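The maintenance advantage of storing each entry as a file can be made concrete with a short backup routine. This is a minimal sketch, assuming the pages live under a single data directory (the directory paths and the archive naming scheme are hypothetical); it relies only on the Python standard library and is not a MoinMoin tool.

```python
import shutil
import tarfile
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_pages(data_dir: Path, dest_dir: Path) -> Path:
    """Archive every file under data_dir into a timestamped .tar.gz in dest_dir."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base_name = dest_dir / f"dictionary-pages-{stamp}"
    # make_archive appends the ".tar.gz" suffix and returns the archive's path.
    archive = shutil.make_archive(str(base_name), "gztar", root_dir=str(data_dir))
    return Path(archive)


def list_backup(archive: Path) -> list[str]:
    """Return the member names stored in a backup archive, e.g. to check it before a restore."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Demonstration on a throwaway directory standing in for the wiki's page store.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        (Path(src) / "BraggsLaw").write_text("hypothetical page text\n")
        archive = backup_pages(Path(src), Path(dst))
        print(archive.name, list_backup(archive))
```

Because the archive is an ordinary copy of the files, restoring, moving or deleting individual entries needs no special software, which is the contrast drawn later with the database-backed mediawiki.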
Its disadvantages were considered to be:
- multilingual support (perversely, since a requirement of the Online Dictionary is that it provides access to terms in multiple languages): the main problem was that the software is too well suited to multilingual operation. It recognised that Andre was using a French browser, and displayed the standard system pages and facilities in French. This behaviour would entail translating all relevant help pages, the Introduction and internal labels into French, German and a host of other 'supported' languages.
- lack of support for mathematics
- lack of support for images and graphical illustrations
- limited control over layout of complex pages
The multilingual support was an unexpected problem, and more of a nuisance than a real obstacle. Nevertheless, it would probably involve a significant amount of time to make the resource truly multilingual (note that this does not refer to the articles themselves, which are intended to be in English, but to the descriptive and navigational terms needed for effective use of the site).
We explored the ability to mark up mathematical content. The native markup allowed the creation of italic, bold, subscript and superscript rendering, and the use of Unicode allowed access to many mathematical symbols, but complex maths (e.g. built-up fractions) could not be rendered. The largest single obstacle to progress was the inability to render overbar characters ($p\bar{1}$, for example), which significantly impedes progress in descriptions of crystallography!
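For comparison, both constructs are one-liners in TeX markup. Below are the overbar symbol and, as an example of a built-up fraction chosen here only for illustration, the interplanar spacing of an orthorhombic lattice with cell edges a, b, c and Miller indices h, k, l:

```latex
p\bar{1}
\qquad
\frac{1}{d_{hkl}^{2}} = \frac{h^{2}}{a^{2}} + \frac{k^{2}}{b^{2}} + \frac{l^{2}}{c^{2}}
```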
It was considered possible to write extensions to allow users to upload images for incorporation in the pages created on the MoinMoin site, and it is also possible that add-on processing could be written to extract markup in TeX and pipe it through an external process to render complex maths, but both would require a considerable investment of research and development time, and it was therefore decided to investigate another software platform.
mediawiki
mediawiki ([2]) is the software that is used by Wikipedia itself, and therefore has a proven track record for the management of large sites with graphical and mathematical content. A first mediawiki implementation was set up in December 2005, and a reimplementation with updated software and appropriate access control mechanisms in late January 2006. All the initial content in the MoinMoin WiKi was transferred with little difficulty to the mediawiki version, and additional entries have been added by Andre Authier and Howard Flack.
The advantages of the new implementation are:
- native support for uploading of images and other non-text files
- native support for TeX-based processing of suitable marked-up mathematics content
- support for a substantial amount of raw HTML markup, allowing for the construction of complex tables and relatively complex page layout
- support for simple markup (similar to that used by MoinMoin) which is easy for a new author to learn and is suitable for simple text-only entries
- layered and extensible access rights, allowing the establishment of different classes of user: we envisage 'reader', 'author', 'editor' and 'systems administrator'
- support for categories (as with MoinMoin)
- automated section numbering
- numerous admin functions (collection of statistics, autoindexing of categories and of the entire site, identification of broken internal links etc.)
- support for automated rights metadata (the current pilot is advertising Creative Commons rights to copy, distribute, display, and perform the work, and to make derivative works - although the proper form of licensing has yet to be discussed by the Working Group)
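The four envisaged classes of user form a natural hierarchy in which each class inherits the rights of the one below. The sketch below models that idea under the assumption of a strict ordering; the role names follow the list above, but the action names and the `can` helper are hypothetical. It is not mediawiki's own configuration mechanism, which grants rights to named user groups.

```python
from enum import IntEnum


class Role(IntEnum):
    """Envisaged user classes, ordered so that a higher class includes all lower rights."""
    READER = 1
    AUTHOR = 2
    EDITOR = 3
    SYSADMIN = 4


# Minimum class needed for each action (action names are illustrative assumptions).
REQUIRED_ROLE = {
    "read": Role.READER,
    "edit": Role.AUTHOR,
    "protect": Role.EDITOR,  # e.g. freeze an approved definition
    "manage_users": Role.SYSADMIN,
}


def can(role: Role, action: str) -> bool:
    """Return True if a user of this class may perform the action; unknown actions are denied."""
    required = REQUIRED_ROLE.get(action)
    return required is not None and role >= required


print(can(Role.READER, "edit"), can(Role.EDITOR, "edit"))
# False True
```

A check of this kind is also what decides whether a given user should see the "edit" tabs at all.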
The disadvantages are:
- much greater difficulty in set up
- greater administrative complexity (entries are stored in a database, requiring systematic dumping for purposes of backup, and with greater risk of corruption)
- less sophisticated handling of styles (it was necessary to write a new stylesheet to prevent unprivileged readers from seeing the "edit" tabs that they are unable to use anyway)
- limited ability (compared with MoinMoin) to track changes to the site overall (though it does provide an RSS feed of the autogenerated page that tracks recent changes, which is helpful)
- poor support for page templates in the style of MoinMoin (although templated data fields and transclusion may be useful features in the longer term)
- poor local documentation
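The RSS feed of recent changes partly compensates for the weaker change tracking, since editors can monitor it with any feed reader or a short script. The sketch below assumes a standard RSS 2.0 document and extracts only the item titles. In practice the feed would be fetched from the wiki (for instance with `urllib.request.urlopen`); the feed URL is not given here, so a hypothetical sample string stands in for it, and real items carry further fields such as links and dates.

```python
import xml.etree.ElementTree as ET


def recent_change_titles(rss_xml: str) -> list[str]:
    """Return the <title> of each <item> in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


# Hypothetical feed content standing in for the wiki's recent-changes feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Recent changes</title>
    <item><title>Bragg's law</title></item>
    <item><title>Ewald sphere</title></item>
  </channel>
</rss>"""

print(recent_change_titles(SAMPLE_FEED))
# ["Bragg's law", 'Ewald sphere']
```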
Platform of choice
Both MoinMoin and mediawiki offer many features that are suitable for the Online Dictionary project - ability to create and edit entries, store version histories, exercise editorial control to freeze definitions if necessary, internal hyperlinking, indexing and search engines, the ability to annotate and discuss articles. Both also seem suitable as dissemination platforms (as well as the authoring environment that is the main concern of this stage of the project). mediawiki has greater complexity from the systems administration viewpoint; but much of that has to do with the initial setup, which has now been achieved for both platforms. mediawiki offers much better support off the shelf for maths and images, both of which were identified by Andre as essential for an effective crystallography dictionary.
The next phase of development of the Online Dictionary of Crystallography will therefore be based on the mediawiki implementation that can currently be found at
http://reference.iucr.org/dictionary