The format for this database is based on that of an EndNote "Tagged Import" (ENTI) file, which is in turn based on the rules of the Refer/BibIX format. In this format, each field of data is preceded by an identifying tag (a percent sign followed by a single character, usually a capital letter) and a space. There are some key differences from ENTI, which are documented below. View Examples that can be used as a Template
The reason for choosing this format is that it is relatively simple to convert from existing lists of references to this format; and the format is simple to parse. Writing the database in XML would be a more laborious task to undertake although, once the data is parseable by our server-side PHP routines, we could (if we so desired) write a routine to output the data in XML format.
The reason the database is in plain text, rather than embedded within a MySQL database, is again that it is easier to generate it in this way from the existing references. We could, as a second step, output it to MySQL, or similar, but what would be the point? Plain text is readable by humans, which is always an advantage, and we do not need the power of a query-based database at the present time.
This is a draft specification (for date see foot of page). Please contact David Gibson if you have any queries. The format of the CREG database that preceded this specification is here
The data for each issue number of a publication is held in a single file, which must be called j<issue>.html. If the issue is a book then it will contain only one set of reference data. If the issue is a magazine or journal, it will contain reference data for the complete publication followed by reference data for each of the articles or papers that comprises the issue.
The data is in plain text, but must be encapsulated in valid HTML mark-up. The markup immediately surrounding the data should be a pair of "preformat" tags, <PRE> and </PRE>. The HTML outside these tags is not specified, but could be something like, ...
<HTML><HEAD><TITLE>BCRA Database</TITLE></HEAD>
<BODY>
<P>This file is not intended to be read by humans. Please see the
<A HREF="../index.html?j=97">formatted index to issue 97 - Volume 33(1)</A></P>
<PRE>
[database entries go between the preformat tags]
</PRE>
</BODY></HTML>
The parser reads a line at a time (lines terminated by the usual variety of line termination characters). It discards any line that does not begin with a % character, so it ignores all the HTML. Each valid line is expected to be in the format %<character><space><text> where <character> is a single ENTI-specified key, and <text> is the value corresponding to that key.
Any non-word characters, except the characters > and ) are removed from the end of lines so that the <text> portion always ends in a word character. One of the reasons for this is to allow for any inconsistency in whether entries end in a full-stop - the parser adds its own full-stops.
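The two rules above (discard any line that does not start with %, then trim the end of the text) can be sketched as follows. This is only an illustrative Python sketch, not the server-side PHP routine itself; the names LINE_RE, parse_line and parse_lines are hypothetical, and the exact regular expressions are an assumption based on the description above.

```python
import re

# Assumed shape of a valid line: %<character><space><text>.
# The key is taken to be any single non-space character.
LINE_RE = re.compile(r"^%(\S) (.*)$")


def parse_line(line):
    """Return (key, text) for a valid database line, or None if it is discarded."""
    line = line.rstrip("\r\n")
    match = LINE_RE.match(line)
    if match is None:
        # Lines that do not begin with %, such as the surrounding HTML, are ignored.
        return None
    key, text = match.groups()
    # Remove trailing non-word characters, but leave '>' and ')' in place.
    text = re.sub(r"[^\w>)]+$", "", text)
    return key, text


def parse_lines(raw):
    """Parse a whole file, keeping only the valid (key, text) pairs."""
    parsed = (parse_line(line) for line in raw.splitlines())
    return [pair for pair in parsed if pair is not None]
```

For example, parse_line("%T Cave Science. ") gives ("T", "Cave Science"), while a line such as <BODY> gives None.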
The tags and their meaning are as follows. Note that ...
This info is used to print a formatted 'Bibliograph'.
Tag | Name | Example | Description |
%0 | Type | Journal | Reference type as specified in the EndNote standard, e.g. book, journal, journal article. |
%1 | Publication | cavekarstscience | This is a flag used by the parser to select an appropriate output format, and must be one of cavestudies, cregj, speleology, cavekarstscience. |
%2 | Price | £3.50 plus postage | The price of the item, e.g. £3.50 including postage, or Out of print. |
%S | Sandbox | text | If this flag is present, only the HEADER section of the database is printed; followed by the text of the sandbox tag. To view the ARTICLES section, the URL must include the query string &sandbox=yes. |
%T | Title | | The title of the item. Used for book titles or article titles. For journal titles use %J. |
%J | Journal Title | Cave and Karst Science | The title of the item. Used for journal titles. For book and article titles use %T. |
%A | Author | | The authors of the item. The surname must appear after the forenames and initials, with multiple authors separated by commas. Do not use this tag for editors. The parser includes a routine that attempts to make sense of different formats for names, but this cannot always be relied on. See note below. |
%E | Editor | John Gunn, David Lowe | The editors of the item. Do not use this tag for authors. See %A |
%7 | Edition | | Edition, e.g. 2nd edition, or omitted. Use only for books. |
%D | Publication Date | 2007 | Year of publication |
%C | Publication City | Buxton | City of publication |
%I | Publisher | British Cave Research Association | This will usually be British Cave Research Association |
%P | Pages | iv + 48 | The number of pages in the publication, e.g. 36pp, or xii + 120 |
%Z | Notes | A4, with photos, maps and diagrams | A custom field, used here to describe the format of the publication |
%N | Number | 33(1) | Issue number, e.g. 33(3), Cave Studies Series 15. Two further fields are possible here, describing the cover date and the publication date. If either or both fields are present they must be separated by commas (white space allowed). Example: 32(2),for 2006,June 2007. The permitted forms are: Issue number; Issue number,Cover date; Issue number,Cover date,Publication Date; Issue number,,Publication Date. (For the present [2007] CREG journal data, this field is calculated automatically. The field can be present, but will be ignored.) |
%@ | ISSN/ISBN | ISSN 1356-191X | ISBN or ISSN, e.g. ISBN 0-900265-30-2 |
%3 | Summary | The transactions of the British Cave Research Association | A short ("one line") summary of the item, as might be used in the heading of review articles. This field can also be used to report brief comments such as Out of print or To be published in June 2006. |
%_ | Break | end | The %_ tag must take the value end. This tag is non-standard, and is used to separate sections of data. |
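As an illustration of the comma-separated %N field described in the table, the following Python sketch splits a value into its three possible parts. The function name split_number_field and the dictionary keys are hypothetical, and returning None for an absent field is an assumption; the real parser may represent missing fields differently.

```python
def split_number_field(value):
    """Split a %N value into issue number, cover date and publication date."""
    # Split on at most two commas; white space around each part is allowed.
    parts = [part.strip() for part in value.split(",", 2)]
    # Pad so that the forms with one or two parts also yield three entries.
    parts += [""] * (3 - len(parts))
    issue, cover_date, publication_date = parts
    return {
        "issue": issue,
        "cover_date": cover_date or None,
        "publication_date": publication_date or None,
    }
```

With the example from the table, split_number_field("32(2),for 2006,June 2007") yields issue "32(2)", cover date "for 2006" and publication date "June 2007".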
If the item is a book then the ARTICLES section can run on immediately after the header and can contain tags %4 and %X only (other tags will be ignored). In the case of journals each occurrence of this section must be preceded by a %_ end tag so the parser knows when each article description finishes.
Update 1 Feb 2014: Books can now use this section for chapter listings.
Tag | Name | Example | Description |
%P | Pages | 9-10 | For an article or paper, the page range, e.g. 12-16 without any "p" or "pp". The value may also be in roman numerals, e.g. i-iii. For books, this is interpreted as a chapter number. |
%9 | Article Type | Paper | For C&KS, one of <blank>, Paper, Report, Forum or "other" as required. Bibliographs are only printed if the article type is report or paper. (Note to me: text case is not forced; it should be converted to title case.) For Speleology this field can be set to feature to cause a bibliograph to be printed. |
%T | Title | | The title of the item |
%A | Author | | The authors of the item. See %A above. |
%X | Abstract | | Abstract or summary. There may be more than one %X tag, in which case they will be assembled in sequence, with HTML <BR> tags in between. The %X tags do not have to be consecutive, but they must be sequential. |
%4 | More | | Text in this field will be appended to the concatenated %X tags. This field is needed if the cumulative length of the %X tags exceeds PHP's limitation on string length. |
%K | Keywords | laminar, turbulent, breakthrough. | Keywords |
%8 | Date Received | Received 10 April 2006; Accepted 07 June 2006 or Published online 1 Feb 2014 | A full description of the date of the item |
%Z | Notes | openAccess or summary | Normally blank, this field can contain openAccess to display the Open Access logo. The field should contain summary if a layman's summary is available. |
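The way %_ end separates sections in a journal file, and the way the abstract is built from the %X and %4 tags, can be sketched in Python as below. This is a hedged illustration only: the function names are hypothetical, the input is assumed to be a list of (key, text) pairs already extracted from the file, and appending the %4 text directly (with no <BR>) is an assumption, since the specification does not say how it is joined.

```python
def split_sections(pairs):
    """Group (key, text) pairs into sections, closing one at each '%_ end'."""
    sections, current = [], []
    for key, text in pairs:
        if key == "_" and text == "end":
            sections.append(current)
            current = []
        else:
            current.append((key, text))
    if current:
        # Trailing data with no closing '%_ end' is kept as a final section.
        sections.append(current)
    return sections


def assemble_abstract(section):
    """Join the %X values in file order with <BR>, then append any %4 text."""
    abstract = "<BR>".join(text for key, text in section if key == "X")
    more = "".join(text for key, text in section if key == "4")
    return abstract + more
```

Because the %X values are collected in file order, they need not be consecutive, which matches the description of the %X tag in the table.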
The list of authors is separated by commas, with the names in 'first-name surname' order. There can be any number of first names or initials, and initials should be followed by a full stop. The parser removes white space and inserts its own punctuation. It will try to make sense of the possible variations in punctuation. Two points to be aware of are that initials must be followed by at least a full stop, and that hyphenated surnames must not contain spaces. Names like "Wookey" (no first name or initial) and "David St. Pierre" (space after St.) are not parsed correctly. The case of the surname is forced to upper case, but the first name and initials are unaltered. Note to me: these should be forced to title case; also allow for a missing first name.
Note to me: it appears that initials do not have to be followed by a full stop, but if one is present then all spaces between it and the next initial are removed. This needs checking, and a policy on full stops (leave alone, delete or create) needs to be decided upon.
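A much simplified Python sketch of the name handling described above is given here. It is not the parser's actual routine: the function name format_authors is hypothetical, it assumes the last space-separated word of each name is the surname, and it does not attempt the punctuation clean-up that the real parser performs.

```python
def format_authors(value):
    """Split a comma-separated author list into (given names, SURNAME) pairs."""
    authors = []
    for name in value.split(","):
        tokens = name.split()  # also collapses runs of white space
        if not tokens:
            continue
        # Assumption: the surname is the final word; everything before it is
        # treated as first names or initials and left unaltered.
        *given, surname = tokens
        authors.append((" ".join(given), surname.upper()))
    return authors
```

Under this assumption, format_authors("John Gunn, David Lowe") gives [("John", "GUNN"), ("David", "LOWE")]. A name such as "David St. Pierre" comes out as ("David St.", "PIERRE"), which shows why surnames containing a space cannot be handled by a rule this simple.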
The data fields can contain HTML mark-up, but this must be mark-up that is valid within <DT> and <DD> tags. By no means all HTML tags are valid in this context. For example, <DT>...</DT> tags cannot contain <P> tags or lists. <DD>...</DD> tags can contain <P> and <UL> tags, but a <UL> tag cannot go inside a <P> tag.
Browsers are highly tolerant of HTML mark-up errors (they have to be - what else could they do but make a 'guess' as to what you meant?) but you cannot guarantee that all browsers will render bad HTML in the same way. The problem is further complicated by the fact that contributors to the database will not know exactly how our database rendering routines will process the database data, e.g. which of it appears in <DT> or <DD> tags, how the lines are broken up, and so on.
It is therefore probably best to avoid using HTML in any of the data fields except within the %X tags, which are wrapped in <DD> tags, with successive paragraphs separated by <BR> (i.e. not using <P>). Use your HTML editor to check what is valid within <DD> tags and remember that the text in a %X field must not include line termination characters.
Within the <HEAD> section of the database file, you can include a style sheet element like this...
<STYLE TYPE="text/css">
FONT.sub { vertical-align: text-bottom; font-size: smaller; }
FONT.sup { vertical-align: text-top; font-size: smaller; }
</STYLE>
and then use <FONT CLASS="sup"> and <FONT CLASS="sub"> tags to obtain superscripts and subscripts. The HTML-rendering part of the database program "knows about this" but it does not, obviously, know about any other miscellaneous style-sheet entries you might 'invent' - so don't.
PDF files and front cover graphics must obey a file naming convention so that the HTML-renderer knows where to find them.
rough notes from email to JDW on 31/12/08
the filenames have to be something like cks097009.pdf or cks097009.f.pdf where the "097" is a three digit issue number (33(1) is 097), the "009" is the first page number and the ".f" is included if the file is "free-issue" - e.g. contents lists, notes for contributors, editorial.
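The naming pattern in the note above can be sketched as a small Python helper. The function name pdf_filename and its parameters are hypothetical; the "cks" prefix is taken from the example file names, and whether other publications use a different prefix is not stated here, so it is left as a parameter.

```python
def pdf_filename(issue, first_page, free_issue=False, prefix="cks"):
    """Build a PDF file name such as cks097009.pdf or cks097009.f.pdf."""
    # Both the issue number and the first page number are zero-padded to three digits.
    suffix = ".f" if free_issue else ""
    return f"{prefix}{issue:03d}{first_page:03d}{suffix}.pdf"
```

For issue 97 (Volume 33(1)) and first page 9, pdf_filename(97, 9) gives cks097009.pdf, and pdf_filename(97, 9, free_issue=True) gives cks097009.f.pdf.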
Added 15-Apr-11. If there is a layman's summary for an article, use the Z tag to indicate this (see above) and ensure that the folder summary contains a PDF whose file name is in the format j[issue-reference-code]s.pdf
British Cave Research Association (UK
registered charity 267828). Registered Office: Old Methodist Chapel,
Great Hucklow, BUXTON, SK17 8RG
This page, http://mail.bcra.org.uk/pub/database_spec.html was last modified on Fri, 31 Jan 2014 13:53:54 +0000