Difference between revisions of "Report of Working Group"

From Online Dictionary of Crystallography

(Granularity of definitions)
(APPENDIX: Membership of the Working Group: cat)
 
(9 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= Remit =
+
The ''Dictionary Working Group'' of the Commission on Crystallographic Nomenclature (CCN) was formed during the 20th IUCr Congress in Florence to provide guidance on the establishment and conduct of a project undertaken under the aegis of the Commission, with the approval of the IUCr Executive Committee and the involvement of other Commissions and appropriate bodies of the IUCr, to provide online definitions of terms used in the practice of crystallography.
  
The Dictionary Working Group of the Commission on Crystallographic Nomenclature (CCN) was formed during the 20th IUCr Congress in Florence to provide guidance on the establishment and conduct of a project undertaken under the aegis of the Commission, with the approval of the IUCr Executive Committee and the involvement of other Commissions and appropriate bodies of the IUCr, to provide online definitions of terms used in the practice of crystallography. The remit of the Working Group covered the following topics, each of which is addressed separately in the main body of the Report:
+
The first stage of the action of the Working Group was two-fold: on the one hand, to define the nature and scope of the Dictionary, and, on the other hand, to develop an appropriate tool for its implementation.
  
'''1. Is the project to be and to remain an online project (web URLs)?'''
+
The purpose of this report is to present the state of the project after nearly one year's experience and to give the Working Group's proposals on these two points and on the financial implications.
  
Or should a book form be also envisaged in the future (question by Henk Schenk, Chairman of the IUCr/OUP book Committee)?
+
= Nature and scope =
  
'''2. The scientific scope of the project'''
+
== Motivation==
  
Broadly speaking, the project should be confined to the subject of crystallography, the area of science over which the IUCr has authority. However, crystallography is used by, and merges with, very many other areas of chemistry, physics, mathematics, materials science, biology, computational data processing, ''etc.'' What criteria should be applied for deciding which terms to include within the project, and which to exclude? (Consider, for example, the detailed descriptors for protein secondary structure included in the mmCIF dictionary. Should the online "crystallography dictionary" include definitions of protein folds, beta sheets ''etc.''?)
+
Many definitions of crystallographic terms are scattered in the International Tables but there is, at present, no place where they are systematically compiled, as is the case, for instance, for the chemical terms defined in the various compendia published by IUPAC (the '[http://gold.zvon.org/ gold]', 'red', 'blue', 'purple', 'silver' books). The many questions received by the Commission on Crystallographic Nomenclature related to matters of definitions and nomenclature show that there is a real
* Should the names of compounds (minerals, materials, chemical or biological compounds) be included?
+
need for such a compendium for crystallography. The idea was received enthusiastically by the Executive Committee in Florence and the Working Group was set up to implement a pilot project for a dictionary of crystallographic terms.
* Should physical concepts be included (entropy, energy, ''etc.'')?
 
* Should mathematical terms be included? (group properties, tensor properties, ''etc.'')
 
* Should there be translations of each entry in other languages (French, German, Spanish, Russian, Chinese, Japanese)? See old “red” ''International Tables'' as an example.
 
* Should names of people be included (Bravais, Bragg, Ewald, Laue, ''etc.'')?
 
* Should reference to computer programs be included?
 
* How should double-word items be included: “neutron interferometry”, “X-ray interferometry”, or “interferometry (neutron)”, “interferometry (X-ray)”?
 
* Should specialized expressions such as “normalized structure factors” be itemized as such or appear within the definition of “structure factor”? (there are many such examples).
 
* Should equations be included?
 
 
'''3. The granularity of definitions'''
 
  
What is the appropriate amount of text for each entry in the compilation? (This may determine, among other things, the name of the project - glossary, dictionary, index, thesaurus, encyclopaedia?) From ''Longman's Dictionary of the English Language'':
+
== Medium==
 
* '''glossary''': a list of terms (''e.g.'' those used in a particular text or in a specialized field), usually with their meanings
 
  
* '''dictionary''': a reference book containing words, usually alphabetically arranged, together with information about them, especially their forms, pronunciations, parts of speech, meanings, origins, grammatical requirements, and idiomatic uses
+
It is proposed that the project should be executed initially as a solely online project because of the flexibility of the online medium, the absence of any limit on the number of entries, and the possibility of hyperlinks to IUCr and other web resources. The present form of the project follows the ''Wikipedia'' pattern and makes use of the ''mediawiki'' software (see Technical Considerations). It was implemented by the Research and Development Officer, Brian McMahon.
  
* '''index''': a guide or list to aid reference: ''e.g.'' an alphabetical list of items (''e.g.'' topics or names) treated in a printed work that gives for each item the page number where it appears, or a list of items of a specified type
+
It will always be possible at a later stage to consider a physical book with a CD containing all the hyperlinks, if it appears that there is a need for such a product.
  
* '''thesaurus''': 1. a book of words or of information about a particular field or set of concepts; especially a book of words grouped according to their meaning. 2. a list of subject headings or index terms, usually with a cross-reference system for use in the organization of a collection of documents for reference and retrieval
+
== Scientific scope==
  
* '''encyclopaedia''': a reference work that contains information on all branches of knowledge or treats comprehensively a specified branch of knowledge, usually in articles arranged in alphabetical order of subjects either in a single list or within each of several large subsections
+
Broadly speaking, the project should be confined to the subject of crystallography, the area of science over which the IUCr has authority. Terms selected for inclusion should have a clear crystallographic implication and terms from connected disciplines (mathematics, physics, chemistry, mineralogy, biology, computational data processing, ''etc.'') should be included insofar as they relate to crystallography, ''e.g.'' ''crystallographic group''. Names of chemical or biological substances or minerals should not be included at the present stage, but terms such as ''albite twin law'' should. Reference to computer programs ''per se'' should not be included, but there might be instances when it becomes essential, ''e.g.'' ''SHELX''. Names of people should only be included if they relate to crystallographic concepts, ''e.g.'' ''[[Bragg's law]]'', ''[[Ewald sphere]]''. Double-word items such as “X-ray interferometry” should be entered as such. A search on “interferometry” will automatically retrieve them. Equations, tables and figures are included where necessary (see, for instance, the entries ''[[Bragg's law]]'' and ''[[arithmetic crystal classes]]'').
  
* '''compendium''': a full list or inventory (Webster), a book containing a list of useful hints (Collins); as an example see the IUPAC gold book (http://gold.zvon.org): a list of names, each one with hyperlinks.
+
The Working Group considers that translations of terms into languages other than English should be given, but that the definitions themselves should not be translated. The pilot demonstrates many translations into French, Spanish, Italian, German and Russian. Because it is impossible to collect a comprehensive set of translations at any one time, an advantage of the WiKi approach is that the list of translations can be extended at any time.
  
What is the quantity of illustrations to be given (drawings, diagrams, spectra, photographs, ''etc.'')
+
== The granularity of definitions==
  
Should the work be organized in Categories and Subcategories?
+
The Working Group recommends a reference product that is a blend of “dictionary” and “encyclopaedia”: a list of terms with short definitions and cross-links to other entries in the work, with, at times, longer developments. These longer developments are presented on a separate page that one accesses ''via'' a hyperlink (see for instance the page ''[[arithmetic crystal classes]]''). Hyperlinks are also provided to other web resources of the IUCr (Teaching Pamphlets, CIF dictionaries, ''International Tables'', Journals). For instance, in the entry ''[[reciprocal lattice]]'', hyperlinks are given to the corresponding pamphlet on the IUCr web site (open access) and to the appropriate chapters of ''IT Volumes A, B, C'' and ''D''; for the latter it is for the Executive Committee to decide (after recommendation from the Finance Committee) whether such links will be free access or not. As other examples, the entry ''[[CIF]]'' has links to Journal articles (available to subscribers or by purchase) and the entry ''[[Bragg's law]]'' to [http://www.iucr.org/iucr-top/publ/50YearsOfXrayDiffraction/ 50 Years of X-ray Diffraction] (free access). Hyperlinks to other web sites such as the IUPAC web sites or educational web sites can also be provided, if appropriate (see, for instance, the entry ''[[absolute structure]]'').
  
'''4. The level of definitions'''
+
The general pattern of a typical page is:
  
Should this be a reference work for authors and referees of IUCr Journals, research professionals, undergraduate students, high-school students, the general public? Can it be designed as a multi-level resource?
+
* translation of the term in other languages,
 +
* main definition
 +
* examples or applications or special cases
 +
* history
 +
* list of links to other entries or to IUCr or other web pages
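As a rough illustration of the page pattern listed above, the following Python sketch models one entry as a small record and renders it as wiki text. The class name, the field names and the section headings are assumptions made for this illustration only; they are not taken from the pilot or from the ''mediawiki'' software, and the definition text is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One dictionary page: translations, definition, examples, history, links."""
    headword: str
    definition: str
    translations: dict = field(default_factory=dict)  # language code -> translated term
    examples: list = field(default_factory=list)
    history: str = ""
    links: list = field(default_factory=list)          # titles of related entries

    def to_wikitext(self) -> str:
        """Render the entry in the order of the general page pattern."""
        parts = []
        if self.translations:
            terms = ". ".join(
                f"''{term}'' ({lang})" for lang, term in self.translations.items()
            )
            parts.append(terms + ".")
        parts.append("== Definition ==")  # hypothetical section heading
        parts.append(self.definition)
        if self.examples:
            parts.append("== Examples ==")
            parts.extend(f"* {example}" for example in self.examples)
        if self.history:
            parts.append("== History ==")
            parts.append(self.history)
        if self.links:
            parts.append("== See also ==")
            parts.extend(f"* [[{title}]]" for title in self.links)
        return "\n".join(parts)


# Illustrative entry; the definition is deliberately a placeholder.
entry = Entry(
    headword="Bragg's law",
    definition="(placeholder definition text)",
    translations={"Fr": "Loi de Bragg"},
    links=["reciprocal lattice", "Ewald sphere"],
)
print(entry.to_wikitext())
```

Optional parts (examples, history, links) are simply omitted when empty, which mirrors the idea that a short definition can stand alone and be extended later.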
  
'''5. Delegation of authority and labour'''
+
== Structure of the work==
  
What is an appropriate editorial structure to commission, review and implement definitions? This needs to take into account the involvement of Commissions, the possibility of different educational levels for the completed work, and perhaps some technical aspects of presentation and online editing.
+
The work will be structured in several ways to assist navigation. The terms are entered alphabetically and can be retrieved alphabetically, but the WiKi software allows an ordering by categories and subcategories. Each entry can be attached to one or more categories (and subcategories). At the time of writing, categories are being assigned to entries on an ''ad hoc'' basis in an attempt to determine suitable  structuring mechanisms. A click on a category provides links to all the entries related to that category. The present list of categories is given on the Main Page. As an example the subcategory [http://reference.iucr.org/dictionary/Category:Twinning ''Twinning''] has been introduced in the category
 +
[http://reference.iucr.org/dictionary/Category:Fundamental_crystallography# ''Fundamental crystallography''].
  
'''6. Presentation'''
+
There are several advantages to having categories and subcategories. One is to allow searches on areas of interest, for instance if you are looking for a particular type of [[twinning]], but don’t remember its exact name. Another one is to make the work of preparing the dictionary easier by assigning editors and subeditors to categories and subcategories. Their duty would be to oversee the definitions and to check that there are no obvious omissions.
  
Some consideration should be given to broad aspects of how the project will be presented: as a single web site, as multiple sites (perhaps appropriate if different educational levels are supported), as a free resource or a potential source of revenue, as a companion to ''International Tables''?
+
Note that the Wiki software allows searches on headwords, but also full-text searching of the entire corpus, so that the user has available a large number of query-based information retrieval strategies.
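The difference between a headword search and a full-text search can be shown with a minimal Python sketch. It assumes a simple case-insensitive substring match and placeholder entry texts; it does not describe how the wiki software actually implements searching, and the function names are invented for the illustration.

```python
# Headword -> body text. The bodies are placeholders, not real definitions.
entries = {
    "X-ray interferometry": "placeholder text about fringes",
    "neutron interferometry": "placeholder text about fringes",
    "Bragg's law": "placeholder text mentioning reflection",
}


def search_headwords(entries, query):
    """Return headwords that contain the query (case-insensitive)."""
    q = query.casefold()
    return sorted(head for head in entries if q in head.casefold())


def search_full_text(entries, query):
    """Return headwords whose title or body contains the query."""
    q = query.casefold()
    return sorted(
        head
        for head, body in entries.items()
        if q in head.casefold() or q in body.casefold()
    )


# Double-word items entered as such are still found via their second word.
print(search_headwords(entries, "interferometry"))
# A word that appears only in the body is found by the full-text search alone.
print(search_headwords(entries, "reflection"))
print(search_full_text(entries, "reflection"))
```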
 
 
'''7. Financial implications'''
 
 
 
The project as currently envisaged in its project definition phase will rely heavily on volunteer scientist labour and existing hardware resources, but there may be a need for editorial honoraria, or other costs that the Working Group can specify, such as hardware, technical editing, secretarial help, ''etc.'' The Working Group will report its findings to the Finance Committee.
 
 
 
= Medium =
 
 
 
The project should be executed initially as a solely online project because of the flexibility of the online medium, the absence of any limit on the number of entries, and the possibility of hyperlinks to IUCr and other web resources. It is possible to consider at a later stage a physical book with a CD containing all the hyperlinks.
 
 
 
= Scientific scope =
 
 
 
Terms selected for inclusion should have a clear crystallographic application. Terms from connected disciplines (mathematics, physics, chemistry, mineralogy, biology) should be included insofar as they relate to crystallography, ''e.g.''  “crystallographic group”. Names of chemical or biological substances or minerals should not be included at the present stage, but terms such as “albite twin law” should.
 
 
 
The Working Group agrees that translations of terms into languages other than English should be given, but not translations of their definitions.
 
 
 
Reference to computer programs ''per se'' should not be included, but there might be instances when it becomes essential, ''e.g.'' ''SHELX''.
 
 
 
Names of people should only be included if they relate to crystallographic concepts, ''e.g.'' “Bragg’s law”, “Ewald sphere”.
 
 
 
Double-word items such as “X-ray interferometry” should be entered as such. A search on “interferometry” will automatically retrieve them.
 
 
 
Equations should be included.
 
 
 
The number of entries is not predetermined: there is no technical limit, and the project can grow with time. The Working Group has contributed a small number of terms (~60) in order to get the pilot project operational. The project has recently been opened to the whole Commission on Crystallographic Nomenclature, and a target of around 500 terms should be set for when the web pages are first opened to the public. If the idea is successful, the dictionary will probably grow to some thousands of distinct terms over the years.
 
 
 
= Granularity of definitions =
 
 
 
The Working Group recommends a reference product that is a blend between “dictionary” and “encyclopaedia”, what the French call a ''dictionnaire encyclopédique'': a list of terms with short definitions and cross-links to other entries in the work, with at times longer developments. For instance, “reciprocal lattice” will have a short definition and a hyperlink to the corresponding pamphlet on the IUCr web site (open access) and to the appropriate chapter of ''IT Volume B''; the entry “Bragg’s law” should give the law with a drawing and its derivation could be obtained via an appropriate hyperlink.
 
 
 
Links to CIF dictionaries will be provided where appropriate.
 
 
 
The work may be structured in several ways to assist navigation. The terms will be entered alphabetically and can be retrieved alphabetically, but the WiKi software allows an ordering in categories (and eventually in subcategories). Each entry can be attached to one or more categories (and subcategories). These categories could correspond for instance to titles of IT Volumes, but with additional subjects (“mathematical crystallography” ''etc.''); subcategories could correspond to Chapters. At the time of writing, categories are being assigned to entries on an ''ad hoc'' basis in an attempt to determine suitable  structuring mechanisms. A click on a category provides links to all the entries related to that category.
 
 
 
As an example, to get the entry “Grüneisen relations”, one may either click on that term, or click on the Category “Physical Properties”; that will provide links to subcategories, one of them being “Thermal expansion”; a click on that subcategory will provide a list of links to all the entries related to that topic, one of them being “Grüneisen relations”. The entry “Grüneisen relations” will give a definition and hyperlinks to Chapters 1.4 and 2.1 of ''International Tables Volume D''.
 
 
 
There are several advantages to having categories and subcategories. One is to allow searches on areas of interest, for instance if you are looking for a particular type of twinning, but don’t remember its exact name. Another one is to make the work of preparing the dictionary easier by assigning editors and subeditors to categories and subcategories. Their duty would be to oversee the definitions and to check that there are no obvious omissions.
 
  
Note that the Wiki software allows searches on headwords, but also full-text searching of the entire corpus, so that the user has available a large number of query-based information retrieval strategies.
+
==Level of definitions and audience==
  
= Audience =
+
The primary goal of the dictionary is to be a reference for authors and referees of IUCr Journals and for research professionals in general: it will give the “official” IUCr acceptance of terms. As such it will also be useful to students and to the general public.
  
The primary goal is to be a reference for authors and referees of IUCr Journals and for research professionals in general: it will give the “official” IUCr acceptance of terms. As such it will also be useful to students and to the general public.
+
== Organization of contributors==
  
The work forms part of a multi-level resource in the sense that besides the short definition hyperlinks will be provided either to a longer definition or to appropriate existing IUCr resources. It will complement ''International Tables''.
+
The Editorial Board should consist of the members of the CCN, with representatives from the other Commissions as consultants for the various fields of crystallography. It is clear that, as Editors of the various IUCr publications, the members of the CCN are the people whose duty is to say how crystallographic terms should be used.  
  
= Organization of contributors =
+
Efficiency, however, requires that the work should be done under the supervision of a Main Editor or Editor-in-Chief and a small number of appointed Editors (and subeditors) for the various categories (and subcategories), chosen preferentially from among the CCN members and consultants.
  
The Editorial Board should consist of the members of the CCN, with representatives from the other Commissions as consultants for the various fields of crystallography. It is clear that, as Editors of the various IUCr publications, the members of the CCN are the people whose duty is to say how crystallographic terms should be used. Efficiency requires that the work should be done under the supervision of a Main Editor and Editors (and subeditors) for the various categories (and subcategories), chosen among the CCN members and consultants.
+
The initial experience of the Working Group has been, however, that even the greatest enthusiasts for the project are so busy that they find it difficult to spend the time necessary to make substantial contributions. The authoring privilege has been extended recently to the rest of the CCN. Early indications are that, again, the rate of accretion of new definitions is slower than we would like to see. It is likely that individuals will need to be recruited and charged with populating specific topic areas with content if one wants the project to proceed at a reasonable pace. This may involve some financial incentive.
  
= Presentation =
+
== Presentation==
  
It is expected that the resource would appear as a single web site. However, it should also act as a companion to ''International Tables'' and to the Journals. As the ''Online Dictionary of Crystallography'' would be an important and useful service to researchers and authors, it is desirable that it should be open access, bearing in mind that most definitions will have links to IT Volumes, which are not open access. This last point may incite people to subscribe to ''International Tables Online''.
+
It is expected that the resource would appear as a single web site. However, it should also act as a companion to ''International Tables'' and to the Journals, as well as to educational resources such as the Teaching Pamphlets and any new educational initiatives arising from the Teaching Commission. As the ''Online Dictionary of Crystallography'' would be an important and useful service to researchers, students and authors, it is desirable that it should be open access, bearing in mind that most definitions have links to IT Volumes, which are not open access. This last point  
 +
may encourage people to subscribe to ''International Tables Online''.
  
 
= Financial implications =
 
  
The project as initially envisaged will rely heavily on volunteer labour and existing hardware resources. The current pilot implementation shares the same hardware as the main IUCr web site (although it is managed as a separate virtual server, and so can easily be moved to its own server machine if required). Some additional software development will be required (''e.g.'' implementation of a reliable backup strategy, modifications to the style to conform with other IUCr web components); but so long as these are not time-critical, they can be absorbed within the existing workload of the R&D department. Significant software developments (such as creation of a hard-copy edition) would need to be assessed and costed separately. Note that hardware costs in the event of a migration to a separate server would be modest (''e.g.'' of the order of GBP 1000 would suffice for a powerful dedicated machine).
+
The project as initially envisaged will rely heavily on volunteer labour and existing hardware resources. The current pilot implementation shares the same hardware as the main IUCr web site (although it is managed as a separate virtual server, and so can easily be moved to its own server machine if required). Some additional software development will be required (''e.g.'' implementation of a reliable backup strategy, modifications to the style to conform with other IUCr web components); but so long as these are not time-critical, they can be absorbed within the existing workload of the R&D  
 +
department. Significant software developments (such as creation of a hard-copy edition) would need to be assessed and costed separately. Note that hardware costs in the event of a migration to a separate server would be modest (''e.g.'' of the order of GBP 1000 would suffice for a powerful dedicated machine).
  
Technical editing costs are ruled out at this stage (it is assumed that the invited contributors will have a high degree of literacy, and that there will be a measure of self-regulation as contributors edit each other's entries to correct minor spelling and typographic errors). Since each entry will be presented as a separate web page, minor inconsistencies of style and presentation will not be so important as they would be in a hard-copy publication. Conversely, however, the decision to produce a hard-copy publication would be likely to involve more rigorous technical editing, with subsequent added costs.
+
Technical editing costs are ruled out at this stage (it is assumed that the invited contributors will have a high degree of literacy, and that there will be a measure of self-regulation as contributors edit each other's entries to correct minor spelling and typographic errors). Since each entry will be presented as a separate web page, minor inconsistencies of style and presentation will not be so important as they would be in a hard-copy publication. Conversely, however, the decision to produce a hard-copy publication would be likely to involve more rigorous technical editing, with subsequent  
 +
added costs.
  
The Finance Committee should monitor the possible need for payment of editorial honoraria. It is expected that the project will require an Editor-in-Chief responsible for its overall shape and direction (at present this role is filled by the project initiator, Professor Authier). The roles of such an Editor-in-Chief will also cover the possible appointment of subsidiary editors to supervise the collection of definitions in topic areas where they have particular expertise, and the commissioning of definitions or sets of definitions to address topics not currently covered. The number and roles of secondary editors will depend in part on the readiness of the volunteer pool of contributors to identify deficiencies and provide needed definitions without prompting. The experience of the ''Wikipedia'' project suggests that this is possible in principle, but the early experience of the pilot suggests that significant effort will be needed, at least in the early stages, to build an initial critical mass of content that will inspire more active involvement by volunteer contributors. It is intended to return to this point when the project produces its next report. In the mean time, it is not unreasonable to provide conservatively for the appointment of a small number - six to a dozen - of specialist Editors responsible for commissioning content within their fields of expertise, and possibly paid a modest honorarium in recognition of their successes (by analogy in some way with journal editors' handling of manuscripts).
+
The Finance Committee should monitor the possible need for payment of editorial honoraria. It is expected that the project will require an Editor-in-Chief responsible for its overall shape and direction (at present this role is filled by the project initiator, Professor Authier). The roles of such an Editor-in-Chief will also cover the possible appointment of subsidiary editors to supervise the collection of definitions in topic areas where they have particular expertise, and the commissioning of definitions or sets of definitions to address topics not currently covered. The number and roles of secondary editors will depend in part on the readiness of the volunteer pool of contributors to identify deficiencies and provide needed definitions without prompting. The experience of the ''Wikipedia'' project suggests that this is possible in principle, but the early experience of the pilot suggests that significant effort will be needed in the early stages to build a critical mass of content that will inspire more active involvement by volunteer contributors.
  
= Timescale =
+
= Technical Considerations=
  
A Pilot Project with about 500 entries should be implemented and guidelines for further development as well as an estimation of its financial implications provided in time for the Finance Committee meeting in 2006 (usually around March).
+
A major goal of the pilot project was to identify a software platform capable of supporting collaborative work on an online dictionary by the distributed authorship that the project required. Ideally the software chosen would also act as a dissemination mechanism, ''i.e.'' the contributors would be working directly on the pages that readers would view.
 
 
 
 
= APPENDIX: Technical considerations =
 
 
 
A major goal of the initial pilot project was to identify a software platform capable of supporting collaborative work on an online dictionary by the distributed authorship that the project seems to require. The pilot is not specifically directed towards identifying a dissemination mechanism, but clearly it is helpful to consider tools that create a version of the dictionary already suitable for public access.
 
 
 
== Content management systems ==
 
 
 
The conventional software platform for managing the collection, editing, revision and publication of a large number of separate items is known as a ''content management system''. Commercial implementations of such systems have been available for decades, and have been distinguished by their high price and complexity of use. The IUCr editorial office has considered such systems in the past (''e.g.'' Texcel Information Manager), but concluded that the high costs and steep learning curves associated with such systems, coupled with their lack of flexibility for the innovative procedures we have developed, have made them less attractive than home-grown systems. Traditional packages were also poorly suited to web use.
 
 
 
More recently, open-source web-based packages such as ''Bricolage'' have begun to appear. ''Bricolage'' in particular is a system that is under consideration as a basis for collaborative input into the next generation of IUCr public web services. However, it shares many of the drawbacks of older content management systems. It requires heavy investment of time to configure it for a particular organization; there is a significant learning curve for content contributors (authors) to master; it offers rather little in the way of flexibility if one wishes to integrate the managed content with existing material, or with contributions from other sources; and it is rather poorly suited for technical content (especially mathematics). While it remains under consideration for possible future editorial use, it seems too heavyweight for the dictionary project as currently envisaged.
 
  
 
== WiKi software ==
 
  
An alternative approach that was put forward at the Florence Congress and enthusiastically received by the members of the Nomenclature Commission present at that meeting was the use of so-called 'WiKis'. A WiKi (from the Hawai'ian for 'quick' or 'fast') is a web-centric content management system designed to be lightweight and encourage rapid development of web sites by a more or less informal collaboration of authors and editors. The public ''Wikipedia'' project demonstrates that it is possible to compile a very large compendium of content (at the moment almost a million encyclopaedic entries in the English-language edition, written and edited by tens of thousands of users). Although ''Wikipedia'' encourages the process of authorship, it was felt that (at least for its initial implementation) the Online Dictionary of Crystallography should be seen as the work of expert authors, and that therefore controls should exist on the users able to contribute or edit content. (Indeed, ''Wikipedia'' also has administrative privileges that control access to articles, though by design these are not used routinely.) A requirement therefore was for a lightweight WiKi implementation that had appropriate access control/user management functionality.
+
The approach put forward at the Florence Congress and enthusiastically received by the Nomenclature Commission was the use of a 'WiKi'. A WiKi (from the Hawai'ian for 'quick' or 'fast') is a web-centric content management system designed to be lightweight and encourage rapid development of web sites by a collaboration of authors and editors. The public ''Wikipedia'' project is an example of a very large work of this sort (at the moment over one and a quarter million encyclopaedic entries in the English-language edition, written and edited by tens of thousands of users). Two WiKi software implementations were investigated: ''MoinMoin'', which is used in-house for technical documentation by the IUCr editorial staff, and ''mediawiki'', which is used in the Wikipedia project. Although ''MoinMoin'' had certain advantages in ease of set-up, maintenance and use, it proved to be too limited in its ability to handle images, mathematics, and complex page layouts. After a few months' development on the ''MoinMoin'' platform, the content was transferred successfully to a ''mediawiki'' implementation, which will form the basis for future developments.
 
 
== MoinMoin ==
 
 
 
The first package investigated was ''MoinMoin'' ([http://moinmoin.wikiwikiweb.de/]), which is already used in the IUCr editorial office for maintaining internal documentation. A pilot ''MoinMoin'' implementation was set up in mid-November 2005, and used extensively by Andre Authier (with a small amount of input from John Helliwell). Its advantages were:
 
* ease of installation
 
* ease of maintenance (individual entries are stored as files on the hard disk, and can readily be backed up, restored, deleted or moved)
 
* simple markup and ways of creating internal hyperlinks and links to resources on the Web
 
* simple access control mechanisms
 
* relatively easy modification to style sheets (so that only authorised users see the tabs/buttons allowing a page to be edited)
 
* ability to track all recent changes (essential for the chief editor(s) and system administrator)
 
* support for categorizing entries and for managing and indexing categories
 
* page templating
 
 
 
Its disadvantages were considered to be:
 
* multilingual support (perversely, since a requirement of the Online Dictionary is that it provides access to terms in multiple languages); the main problem was that the software is ''too'' well suited for multilingual operation: it recognised that Andre was using a French browser, and displayed the standard system pages and facilities in French. This behaviour would entail translation of all relevant help pages, Introduction and internal labels to French, German and a host of other 'supported' languages.
 
* lack of support for mathematics
 
* lack of support for images and graphical illustrations
 
* limited control over layout of complex pages
 
 
 
The multilingual support was an unexpected problem, and more of a nuisance than a real obstacle. Nevertheless, it would probably involve a significant amount of time to make the resource truly multilingual (note that this does not refer to the articles themselves, which are intended to be in English, but to the descriptive and navigational terms needed for effective use of the site).
 
 
 
We explored the ability to mark up mathematical content. The native markup allowed the creation of italic, bold, subscript and superscript rendering, and the use of Unicode allowed access to many mathematical symbols, but complex maths (''e.g.'' built-up fractions) could not be rendered. The largest single obstacle to progress was the inability to render overbar characters (<math>p\bar1</math>, for example), which significantly impedes progress in descriptions of crystallography!
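
As an illustration of the notation at issue, the following TeX fragment shows the kind of markup a crystallographic entry typically needs: overbarred (rotoinversion) axes in space-group symbols and a built-up fraction. It is a sketch only; the particular symbols and the rearrangement of Bragg's law are chosen here for illustration and are not taken from the pilot entries.

```latex
% Space-group symbols containing overbarred (rotoinversion) axes
\[ P\bar{1}, \qquad R\bar{3}m, \qquad F\bar{4}3m \]

% A built-up fraction: Bragg's law, n\lambda = 2d\sin\theta, solved for the spacing d
\[ d = \frac{n\lambda}{2\sin\theta} \]
```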
 
 
 
It was considered possible to write extensions to allow users to upload images for incorporation in the pages created on the ''MoinMoin'' site, and  it is also possible that add-on processing could be written to extract markup in TeX and pipe it through an external process to render complex maths, but both would require a considerable investment of research and development time, and it was therefore decided to investigate another software platform.
 
 
 
  
 
== mediawiki ==
 
  
''mediawiki'' ([http://www.mediawiki.org]) is the software that is used by ''Wikipaedia'' itself, and therefore has a proven track record for the management of large sites with graphical and mathematical content. A first ''mediawiki'' implementation was set up in December 2005, and a reimplementation with updated software and appropriate access control mechanisms in late January 2006. All the initial content in the ''MoinMoin'' WiKi was transferred with little difficulty to the ''mediawiki'' version, and additional entries have been added by Andre Authier and Howard Flack.
+
The first ''mediawiki'' implementation (http://www.mediawiki.org) was set up in December 2005, and a reimplementation with updated software and appropriate access control mechanisms was launched in late January 2006.  
  
The advantages of the new implementation are:
+
The main advantages of this implementation are:
 
* native support for uploading of images and other non-text files
 
 
* native support for TeX-based processing of suitable marked-up mathematics content
 
* support for a substantial amount of raw HTML markup, allowing for the construction of complex tables and relatively complex page layout
+
* support for raw HTML markup, allowing the construction of complex tables and relatively complex page layout
* support for simple markup (similar to that used by ''MoinMoin'') which is easy for a new author to learn, and is suitable for simple text-only entries)
+
* support for a simple markup that is easy for a new author to learn, and is suitable for simple text-only entries
* layered and extensible access rights, allowing the establishment of different classes of user: we envisage 'reader', 'author', 'editor' and 'systems administrator'
+
* layered and extensible access rights, allowing different classes of user:   'reader', 'author', 'editor' and 'systems administrator'
* support for categories (as with ''MoinMoin'')
+
* support for categories
 
* automated section numbering
 
 
* numerous admin functions (collection of statistics, autoindexing of categories and of the entire site, identification of broken internal links ''etc.'')
 
* support for automated rights metadata (the current pilot is advertising  Creative Commons rights to copy, distribute, display, and perform the work, and to make derivative works - although the proper form of licensing has yet to be discussed  by the Working Group)
+
* support for automated rights metadata (the current pilot is advertising  Creative Commons rights to copy, distribute, display, and perform the work, and to make derivative works)
  
The disadvantages are:
+
Its main disadvantages are:
* much greater difficulty in set up
+
* significantly greater administrative overhead than ''MoinMoin'' (although much of this is one-off setup or introduction of new features)
* greater administrative complexity (entries are stored in a database, requiring systematic dumping for purposes of backup, and with greater risk of corruption)
+
* poor support for page templates (although templated data fields and transclusion will be useful features in the longer term)
* less sophisticated handling of styles (it was necessary to write a new stylesheet to prevent unprivileged readers from seeing the "edit" tabs that they are unable to use anyway)
 
* limited ability (compared with ''MoinMoin'') to track changes to the site overall (though it does provide an RSS feed to the autogenerated page that tracks recent changes, which is helpful)
 
* poor support for page templates in the style of ''MoinMoin'' (although templated data fields and transclusion may be useful features in the longer term)
 
 
* poor local documentation
 
  
== Platform of choice ==
+
''mediawiki'' offers many features that are suitable for the Online Dictionary project: the ability to create and edit entries, to store version histories and to exercise editorial control to freeze definitions if necessary, together with internal hyperlinking, indexing and search engines, and the ability to annotate and discuss articles. It is also suitable as a dissemination platform. It offers good support for maths and images, both of which are considered essential for an effective crystallography dictionary. It is therefore proposed to base the public Online Dictionary service on this software platform.
 +
 
 +
= APPENDIX: Membership of the Working Group =
 +
 
 +
The initial membership of the Working Group established in Florence consisted of:
 +
 
 +
* Andre Authier (Chair)
 +
* John Helliwell
 +
* Bill Clegg
 +
* Paola Spadon
 +
* I. David Brown
 +
* Brian McMahon
  
Both ''MoinMoin'' and ''mediawiki'' offer many features that are suitable for the Online Dictionary project - ability to create and edit entries, store version histories, exercise editorial control to freeze definitions if necessary, internal hyperlinking, indexing and search engines, the ability to annotate and discuss articles. Both also seem suitable as dissemination platforms (as well as the authoring environment that is the main concern of this stage of the project). ''mediawiki'' has greater complexity from the systems administration viewpoint; but much of that has to do with the initial setup, which has now been achieved for both platforms. ''mediawiki'' offers much better support off the shelf for maths and images, both of which were identified by Andre as essential for an effective crystallography dictionary.
+
Giovanni Ferraris, as Chair of the ''Commission on Inorganic and Mineral Structures'', Massimo Nespolo, as Chair of the ''Commission on Mathematical and Theoretical Crystallography'' and Peter Strickland, Managing Editor of IUCr publications, as observer, subsequently joined the group. Howard Flack also provided sample entries and useful feedback.
  
The next phase of development of the Online Dictionary of Crystallography will therefore be based on the ''mediawiki'' implementation that can currently be found at
+
[[Category:English documentation]]
  http://reference.iucr.org/dictionary
 

Latest revision as of 05:44, 30 March 2015

The Dictionary Working Group of the Commission on Crystallographic Nomenclature (CCN) was formed during the 20th IUCr Congress in Florence to provide guidance on the establishment and conduct of a project undertaken under the aegis of the Commission, with the approval of the IUCr Executive Committee and the involvement of other Commissions and appropriate bodies of the IUCr, to provide online definitions of terms used in the practice of crystallography.

The first stage of the action of the Working Group was two-fold: on the one hand, to define the nature and scope of the Dictionary, and, on the other hand, to develop an appropriate tool for its implementation.

The purpose of this report is to present the state of the project after nearly one year's experience and to give the Working Group's proposals on these two points and on the financial implications.

Nature and scope

Motivation

Many definitions of crystallographic terms are scattered in the International Tables but there is, at present, no place where they are systematically compiled, as is the case, for instance, for the chemical terms defined in the various compendia published by IUPAC (the 'gold', 'red', 'blue', 'purple', 'silver' books). The many questions received by the Commission on Crystallographic Nomenclature related to matters of definitions and nomenclature show that there is a real need for such a compendium for crystallography. The idea was received enthusiastically by the Executive Committee in Florence and the Working Group was set up to implement a pilot project for a dictionary of crystallographic terms.

Medium

It is proposed that the project should be executed initially as solely an online project because of the flexibility of the online medium, the absence of any limit on the number of entries, and the possibility of hyperlinks to IUCr and other web resources. The present form of the project follows the Wikipedia pattern and makes use of the mediawiki software (see Technical Considerations). It was implemented by the Research and Development Officer, Brian McMahon.

It will always be possible at a later stage to consider a physical book with a CD containing all the hyperlinks, if it appears that there is a need for such a product.

Scientific scope

Broadly speaking, the project should be confined to the subject of crystallography, the area of science over which the IUCr has authority. Terms selected for inclusion should have a clear crystallographic implication and terms from connected disciplines (mathematics, physics, chemistry, mineralogy, biology, computational data processing, etc.) should be included insofar as they relate to crystallography, e.g. crystallographic group. Names of chemical or biological substances or minerals should not be included at the present stage, but terms such as albite twin law should. References to computer programs per se should not be included, but there might be instances when it becomes essential, e.g. SHELX. Names of people should only be included if they relate to crystallographic concepts, e.g. Bragg's law, Ewald sphere. Double-word items such as “X-ray interferometry” should be entered as such. A search on “interferometry” will automatically retrieve them. Equations, tables and figures are included where necessary (see, for instance, the entries Bragg's law and arithmetic crystal classes).

The Working Group considers that translations of terms into languages other than English should be given, but that the definitions themselves should not be translated. The pilot demonstrates many translations into French, Spanish, Italian, German and Russian. Because it is impossible to collect a comprehensive set of translations at any one time, an advantage of the WiKi approach is the ability to extend the list of translations at any time.

The granularity of definitions

The Working Group recommends a reference product that is a blend of “dictionary” and “encyclopaedia”: a list of terms with short definitions and cross-links to other entries in the work, with at times longer developments. These longer developments are presented on a separate page that one accesses via a hyperlink (see, for instance, the page arithmetic crystal classes). Hyperlinks are also provided to other web resources of the IUCr (Teaching Pamphlets, CIF dictionaries, International Tables, Journals). For instance, in the entry reciprocal lattice, hyperlinks are given to the corresponding pamphlet on the IUCr web site (open access) and to the appropriate chapters of IT Volumes A, B, C and D; for these it is for the Executive Committee to decide (after recommendation from the Finance Committee) whether such links will be free access or not. As other examples, the entry CIF has links to Journal articles (subscribers only or by buying the articles) and the entry Bragg's law to 50 Years of X-ray Diffraction (free access). Hyperlinks to other web sites such as the IUPAC web sites or educational web sites can also be provided, if appropriate (see, for instance, the entry absolute structure).

The general pattern of a typical page is:

  • translations of the term into other languages
  • main definition
  • examples or applications or special cases
  • history
  • list of links to other entries or to IUCr or other web pages

Structure of the work

The work will be structured in several ways to assist navigation. The terms are entered alphabetically and can be retrieved alphabetically, but the WiKi software allows an ordering by categories and subcategories. Each entry can be attached to one or more categories (and subcategories). At the time of writing, categories are being assigned to entries on an ad hoc basis in an attempt to determine suitable structuring mechanisms. A click on a category provides links to all the entries related to that category. The present list of categories is given on the Main Page. As an example the subcategory Twinning has been introduced in the category Fundamental crystallography.

There are several advantages to having categories and subcategories. One is to allow searches on areas of interest, for instance if you are looking for a particular type of twinning, but don’t remember its exact name. Another one is to make the work of preparing the dictionary easier by assigning editors and subeditors to categories and subcategories. Their duty would be to oversee the definitions and to check that there are no obvious omissions.

Note that the WiKi software allows searches on headwords as well as full-text searching of the entire corpus, so that the user has a large number of query-based information retrieval strategies available.
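
The navigation mechanisms described above (entries attached to one or more categories, retrieval by category, headword search and full-text search) can be pictured with a minimal Python sketch. It is not how the WiKi software is implemented: the data layout, the function names, the 'X-rays' category and the placeholder definition texts are all assumptions made for illustration. Only the headwords and the category names Fundamental crystallography and Twinning come from this report.

```python
from collections import defaultdict

# Hypothetical mini-corpus: headword -> (categories, placeholder definition text).
ENTRIES = {
    "Bragg's law": (["X-rays"], "Placeholder text about diffraction."),
    "albite twin law": (
        ["Fundamental crystallography", "Twinning"],
        "Placeholder text about one kind of twinning.",
    ),
    "X-ray interferometry": (["X-rays"], "Placeholder text about interference."),
}


def build_category_index(entries):
    """Map each category to the alphabetically sorted headwords attached to it."""
    index = defaultdict(list)
    for headword, (categories, _text) in entries.items():
        for category in categories:
            index[category].append(headword)
    return {cat: sorted(words, key=str.lower) for cat, words in index.items()}


def search_headwords(entries, query):
    """Return headwords containing the query (case-insensitive substring match)."""
    q = query.lower()
    return sorted((h for h in entries if q in h.lower()), key=str.lower)


def search_full_text(entries, query):
    """Return headwords whose headword or definition text contains the query."""
    q = query.lower()
    hits = [
        h for h, (_cats, text) in entries.items()
        if q in h.lower() or q in text.lower()
    ]
    return sorted(hits, key=str.lower)
```

In this sketch, a headword search on "interferometry" retrieves the two-word entry X-ray interferometry, and a lookup of the Twinning category lists albite twin law, mirroring the behaviour the report describes.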

Level of definitions and audience

The primary goal of the dictionary is to be a reference for authors and referees of IUCr Journals and for research professionals in general: it will give the “official” IUCr-accepted meaning of terms. As such it will also be useful to students and to the general public.

Organization of contributors

The Editorial Board should consist of the members of the CCN, with representatives from the other Commissions as consultants for the various fields of crystallography. It is clear that, as Editors of the various IUCr publications, the members of the CCN are the people whose duty is to say how crystallographic terms should be used.

Efficiency, however, requires that the work be done under the supervision of a Main Editor or Editor-in-Chief and a small number of appointed Editors (and subeditors) for the various categories (and subcategories), chosen preferentially from among the CCN members and consultants.

The initial experience of the Working Group has been, however, that even the greatest enthusiasts for the project are so busy that they find it difficult to spend the time necessary to make substantial contributions. The authoring privilege has been extended recently to the rest of the CCN. Early indications are that, again, the rate of accretion of new definitions is slower than we would like to see. It is likely that individuals will need to be recruited and charged with populating specific topic areas with content if one wants the project to proceed at a reasonable pace. This may involve some financial incentive.

Presentation

It is expected that the resource would appear as a single web site. However, it should also act as a companion to International Tables and to the Journals, as well as to educational resources such as the Teaching Pamphlets and any new educational initiatives arising from the Teaching Commission. As the Online Dictionary of Crystallography would be an important and useful service to researchers, students and authors, it is desirable that it should be open access, bearing in mind that most definitions have links to IT Volumes, which are not open access. This last point may encourage people to subscribe to International Tables Online.

Financial implications

The project as initially envisaged will rely heavily on volunteer labour and existing hardware resources. The current pilot implementation shares the same hardware as the main IUCr web site (although it is managed as a separate virtual server, and so can easily be moved to its own server machine if required). Some additional software development will be required (e.g. implementation of a reliable backup strategy, modifications to the style to conform with other IUCr web components); but so long as these are not time-critical, they can be absorbed within the existing workload of the R&D department. Significant software developments (such as creation of a hard-copy edition) would need to be assessed and costed separately. Note that hardware costs in the event of a migration to a separate server would be modest (e.g. of the order of GBP 1000 would suffice for a powerful dedicated machine).

Technical editing costs are ruled out at this stage (it is assumed that the invited contributors will have a high degree of literacy, and that there will be a measure of self-regulation as contributors edit each other's entries to correct minor spelling and typographic errors). Since each entry will be presented as a separate web page, minor inconsistencies of style and presentation will not be so important as they would be in a hard-copy publication. Conversely, however, the decision to produce a hard-copy publication would be likely to involve more rigorous technical editing, with subsequent added costs.

The Finance Committee should monitor the possible need for payment of editorial honoraria. It is expected that the project will require an Editor-in-Chief responsible for its overall shape and direction (at present this role is filled by the project initiator, Professor Authier). The roles of such an Editor-in-Chief will also cover the possible appointment of subsidiary editors to supervise the collection of definitions in topic areas where they have particular expertise, and the commissioning of definitions or sets of definitions to address topics not currently covered. The number and roles of secondary editors will depend in part on the readiness of the volunteer pool of contributors to identify deficiencies and provide needed definitions without prompting. The experience of the Wikipedia project suggests that this is possible in principle, but the early experience of the pilot suggests that significant effort will be needed in the early stages to build a critical mass of content that will inspire more active involvement by volunteer contributors.

Technical Considerations

A major goal of the pilot project was to identify a software platform capable of supporting collaborative work on an online dictionary by the distributed authorship that the project required. Ideally the software chosen would also act as a dissemination mechanism, i.e. the contributors would be working directly on the pages that readers would view.

WiKi software

The approach put forward at the Florence Congress and enthusiastically received by the Nomenclature Commission was the use of a 'WiKi'. A WiKi (from the Hawai'ian for 'quick' or 'fast') is a web-centric content management system designed to be lightweight and encourage rapid development of web sites by a collaboration of authors and editors. The public Wikipedia project is an example of a very large work of this sort (at the moment over one and a quarter million encyclopaedic entries in the English-language edition, written and edited by tens of thousands of users). Two software WiKi implementations were investigated: MoinMoin, which is used in-house for technical documentation by the IUCr editorial staff, and mediawiki, which is used in the Wikipedia project. Although MoinMoin had certain advantages in ease of set-up, maintenance and use, it proved to be too limited in its ability to handle images, mathematics, and complex page layouts. After a few months' development on the MoinMoin platform, the content was transferred successfully to a mediawiki implementation, which will form the basis for future developments.

mediawiki

The first mediawiki implementation (http://www.mediawiki.org) was set up in December 2005, and a reimplementation with updated software and appropriate access control mechanisms was launched in late January 2006.

The main advantages of this implementation are:

  • native support for uploading of images and other non-text files
  • native support for TeX-based processing of suitable marked-up mathematics content
  • support for raw HTML markup, allowing the construction of complex tables and relatively complex page layout
  • support for a simple markup that is easy for a new author to learn, and is suitable for simple text-only entries
  • layered and extensible access rights, allowing different classes of user: 'reader', 'author', 'editor' and 'systems administrator'
  • support for categories
  • automated section numbering
  • numerous admin functions (collection of statistics, autoindexing of categories and of the entire site, identification of broken internal links etc.)
  • support for automated rights metadata (the current pilot is advertising Creative Commons rights to copy, distribute, display, and perform the work, and to make derivative works)
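
The layered access rights listed above can be pictured as a ladder in which each class of user inherits everything permitted to the classes below it. The following Python sketch illustrates that idea only; it is not the mediawiki configuration mechanism, and the action names ("read", "create", "edit", "freeze", "manage_users") are hypothetical labels. Only the four class names are taken from this report.

```python
# Classes of user, ordered from least to most privileged (names from the report).
ROLE_ORDER = ["reader", "author", "editor", "systems administrator"]

# Hypothetical actions that each layer adds on top of the layers below it.
NEW_ACTIONS = {
    "reader": {"read"},
    "author": {"create", "edit"},
    "editor": {"freeze"},
    "systems administrator": {"manage_users"},
}


def permissions_for(role):
    """Return the cumulative set of actions for a role (its own plus all lower layers)."""
    if role not in ROLE_ORDER:
        raise ValueError(f"unknown role: {role!r}")
    level = ROLE_ORDER.index(role)
    allowed = set()
    for lower in ROLE_ORDER[: level + 1]:
        allowed |= NEW_ACTIONS[lower]
    return allowed


def can(role, action):
    """True if the given role is allowed to perform the given action."""
    return action in permissions_for(role)
```

Under these assumptions a reader can only read, an author can also create and edit entries, and an editor can additionally freeze a definition; extending the scheme means adding a new layer to both tables.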

Its main disadvantages are:

  • significantly greater administrative overhead than MoinMoin (although much of this is one-off setup or introduction of new features)
  • poor support for page templates (although templated data fields and transclusion will be useful features in the longer term)
  • poor local documentation

mediawiki offers many features that are suitable for the Online Dictionary project: the ability to create and edit entries, to store version histories and to exercise editorial control to freeze definitions if necessary, together with internal hyperlinking, indexing and search engines, and the ability to annotate and discuss articles. It is also suitable as a dissemination platform. It offers good support for maths and images, both of which are considered essential for an effective crystallography dictionary. It is therefore proposed to base the public Online Dictionary service on this software platform.

APPENDIX: Membership of the Working Group

The initial membership of the Working Group established in Florence consisted of:

  • Andre Authier (Chair)
  • John Helliwell
  • Bill Clegg
  • Paola Spadon
  • I. David Brown
  • Brian McMahon

Giovanni Ferraris, as Chair of the Commission on Inorganic and Mineral Structures, Massimo Nespolo, as Chair of the Commission on Mathematical and Theoretical Crystallography and Peter Strickland, Managing Editor of IUCr publications, as observer, subsequently joined the group. Howard Flack also provided sample entries and useful feedback.