Database format

CREG Journal Scans

March 2017: Online access is currently available for all issues.

The more recent issues contain the digital text and graphics, but the earlier issues are bit-mapped scans of the printed journals, so they are of a lower quality. Some (or maybe all?) of the bit-maps have been OCR-ed. It is possible that - eventually - the bit-maps will be replaced by digital copies. The issues that are digital copies are 28, 31, 34-37, 41 and 48 onwards.

Issue          Availability
1-25           bit-map
(intervening)  digital and bit-map, in alternating runs
48-now         digital

Database Format (notes for staff)

For historical reasons, the lists of contents for the CREG journals are not stored in a relational database, but in plain text files (known in database terminology as flat-files), which are parsed and processed at run-time. The amount of data is so small that there is no particular need to 'upgrade' the storage to that of a relational database. A description of the database is as follows.

There is a separate database file for each journal. The file names must be of the form j<issue-number>.html.

The files are historically plain text files, despite the HTML extension. However, from 27-Nov-2017, when any such file is accessed at localhost it is automatically processed to 'upgrade' its format by adding some HTML pre-amble and post-amble. This ensures that it is a valid HTML file and can be better interpreted by a browser although, of course, these raw data files are not intended to be read by humans anyway. Details of the processing (given as a 'reminder' to staff) are as follows.

Does file contain '<PRE>'? If not, add the HTML pre- and post-amble
Does file contain a comment line beginning '# htmlentities()'? If it does then abandon processing
Replace a double-quote outside html tags by its char entity
We don't like gratuitous use of curly quotes that might have been inserted historically. Replace them.
Use htmlentities() to replace 8-bit chars. Set the option NOT to replace single or double quotes; and make special provision for not replacing tag delimiters < and >. This operation could replace some valid &s inside tags, so make provision for that too.
Add a comment to the data file to say that this processing has taken place
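The steps above might be sketched as follows. This is an illustrative Python sketch, not the site's own code (the page names PHP's htmlentities(), which is not reproduced here). The preamble, postamble and marker wording are hypothetical, as is the choice of what curly quotes are replaced with and where the marker line is placed. Tags are passed through untouched, which is one simple way of protecting <, > and any & inside them.

```python
import re
from html.entities import codepoint2name

# Hypothetical wording; the real preamble, postamble and marker text are not given.
MARKER = "# htmlentities() applied"
PREAMBLE = "<html><body><PRE>\n"
POSTAMBLE = "</PRE></body></html>\n"

# Splitting on this pattern puts tags at the odd indices of the result.
TAG_SPLIT = re.compile(r"(<[^<>]*>)")

# Assumption: curly quotes become a plain apostrophe or the &quot; entity.
CURLY = {"\u2018": "'", "\u2019": "'", "\u201c": "&quot;", "\u201d": "&quot;"}


def encode_text(segment):
    """Encode one run of text that lies outside any HTML tag."""
    out = []
    for ch in segment:
        if ch == '"':
            out.append("&quot;")
        elif ch in CURLY:
            out.append(CURLY[ch])
        elif ord(ch) > 127:
            name = codepoint2name.get(ord(ch))
            # Prefer a named entity; fall back to a numeric one if none exists.
            out.append(f"&{name};" if name else f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


def upgrade(text):
    """Return the upgraded file content, or the text unchanged if already processed."""
    if any(line.startswith("# htmlentities()") for line in text.splitlines()):
        return text  # already processed: abandon
    parts = TAG_SPLIT.split(text)
    encoded = "".join(p if i % 2 else encode_text(p) for i, p in enumerate(parts))
    data = MARKER + "\n" + encoded
    if "<PRE>" not in text:
        data = PREAMBLE + data + POSTAMBLE
    return data
```

Running upgrade() a second time on its own output returns it unchanged, because the marker line is found and processing is abandoned.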

The run-time processing begins as follows. The HTML pre- and post-amble are stripped out, along with any blank lines (i.e. with exactly zero characters between line-ending characters), and any lines beginning with the comment character #. The remaining text is assumed to be in the correct format for further processing, as follows...

Each record in the database comprises a pair of lines. The first line of the pair is the Title Record and it is followed by the Standfirst, or Abstract Record. Both these records MUST be present, but they MAY be separated by blank lines, or comment lines as noted above.
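As a reminder of how the stripping and pairing fit together, here is a minimal Python sketch. It assumes that the data lies between <PRE> and </PRE> (the exact pre- and post-amble are not described here), and the function name extract_pairs is hypothetical.

```python
def extract_pairs(text):
    """Return a list of (title_record, abstract_record) tuples."""
    # Assumption: everything outside <PRE>...</PRE> is pre- or post-amble.
    start = text.find("<PRE>")
    end = text.rfind("</PRE>")
    if start != -1 and end > start:
        text = text[start + len("<PRE>"):end]

    # Drop blank lines (exactly zero characters) and comment lines.
    lines = [ln for ln in text.splitlines()
             if ln != "" and not ln.startswith("#")]

    # Both records of each pair MUST be present.
    if len(lines) % 2:
        raise ValueError("odd number of data lines: a record is missing")
    return list(zip(lines[0::2], lines[1::2]))
```

A line containing only spaces is not blank in this sense, so it would be treated as a record; that matches the "exactly zero characters" wording above.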

The records MUST be a single line of text (i.e. no line breaks) and should not contain any characters that are not 'safe' for HTML. That is, all symbols and accents, including ampersand, en-dash and quotation marks, SHOULD be converted into HTML Character Entities. Wherever possible Character Entities SHOULD NOT use a character code (that is, the entity should not begin with &#).
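A record could be checked against these rules before it is added to a data file. The following Python sketch is only a suggestion: the function name and the wording of the messages are hypothetical, and it checks only three of the rules (line breaks, non-ASCII characters and numeric entities).

```python
import re

# Matches entities such as &eacute; or &#8211; and captures the part after the &.
ENTITY = re.compile(r"&(#?[A-Za-z0-9]+);")


def record_problems(record):
    """Return a list of rule violations found in one record (empty if none)."""
    problems = []
    if "\n" in record or "\r" in record:
        problems.append("line break")           # records MUST be a single line
    if any(ord(ch) > 127 for ch in record):
        problems.append("non-ASCII character")  # SHOULD be a character entity
    if any(m.group(1).startswith("#") for m in ENTITY.finditer(record)):
        problems.append("numeric entity")       # named entities are preferred
    return problems
```

Bare ampersands are deliberately not checked, because a Title Record may legitimately contain one inside a forum cross-reference.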

The pairs of records SHOULD be in page number order, with the Headline record first. This is not essential, though, except where multiple articles appear on the same page. In this situation, the articles MUST be listed consecutively, to avoid confusing the inbuilt downloads counter. (Subsequent same-page articles are flagged with 'For download see previous item' and a single download counter is used for them.)
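The reason for the 'consecutive' rule can be illustrated with a short Python sketch. It assumes that the counter logic compares each record's first page only with that of the record immediately before it; this is a guess at the behaviour, and the function name is hypothetical.

```python
def flag_same_page(first_pages):
    """For each record, return True if it shares a first page with the previous record."""
    flags = []
    previous = None
    for page in first_pages:
        # A record with no page number (None) is never flagged.
        flags.append(page is not None and page == previous)
        previous = page
    return flags
```

With the pages [0, 3, 3, 7, 3] the second article on page 3 is flagged, but the last one is not, because an article on page 7 comes between them. Under this assumption, that last article would be treated as a separate download.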

The Title Record contains the Title of the article, optionally followed by a cross-reference to the CREG Forum, optionally followed by page number range. Examples are...

CREG Journal 85 (0-24)
This is an example of a Headline record, which refers to an entire journal. This is indicated by the first number in the page range being zero, which causes the record to be treated specially.
RF Interference Caused by LED Lamps (21-24)
This is an example of a standard record. As with all records, the first page number (padded to three digits) is part of the name of the associated PDF. Everything after the (optional) hyphen is a comment, so page ranges like (21-24,6,9) are possible. If the page number range is missing this is not an error, but it means that there can be no reference to a PDF, because the naming of the PDF files depends on there being a known page number.
LED lighting: how the world has got brighter [cregf:viewtopic.php?f=27&t=1203]
Records like this are treated specially and cause a URL to be printed, referring the reader to an external resource. This could be an appendix to a CREG journal article (e.g. a software object file), or a reference to an author's web site. At the moment, only the identifier cregf is recognised. The construct causes a note to be printed along the lines of "Further reading on the CREG forum", together with a hyperlink composed of the CREG forum URL, with its local part as specified in the construct. The text from the : to just before the ] is stripped out before the Title Record is further processed.
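The three examples above could be taken apart along the following lines. This Python sketch is not the site's parser: the function name, the regular expressions and the returned field names are all hypothetical, and for simplicity it removes the whole bracketed cross-reference from the title.

```python
import re

# [identifier:local-part], e.g. [cregf:viewtopic.php?f=27&t=1203]
XREF = re.compile(r"\[(\w+):([^\]]*)\]")
# (first-page) or (first-page-comment) at the end of the record
PAGES = re.compile(r"\((\d+)(?:-[^)]*)?\)\s*$")


def parse_title(record):
    """Split a Title Record into title, cross-reference and first page number."""
    xref = None
    m = XREF.search(record)
    if m:
        xref = (m.group(1), m.group(2))
        record = record[:m.start()] + record[m.end():]

    first_page = None
    m = PAGES.search(record)
    if m:
        first_page = int(m.group(1))  # everything after the hyphen is a comment
        record = record[:m.start()]

    return {
        "title": record.strip(),
        "xref": xref,
        "first_page": first_page,      # None means no PDF can be referenced
        "headline": first_page == 0,   # page zero marks a whole-journal record
    }
```

A caller would still need to check that the identifier is cregf before building the forum link, since no other identifier is recognised at present.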

The Standfirst, or Abstract Record, is usually just plain text. Like the Title Record, it can contain valid HTML tags, but these SHOULD be avoided if possible as they cause problems during a search. If HTML tags are present they MUST be limited to tags that are valid within the construct <DT>Title</DT><DD>Abstract</DD>.

Naming of PDFs: jIIIPPP[.f].html where III is the three-digit issue number (padded if necessary), PPP is the three-digit page number of the first page of the article. It is padded if necessary and 000 is used to indicate a headline record that refers to an entire journal. The optional .f marks the file as free-issue. Note that because the PDF is referenced by the first page of the article it contains, articles that all begin on the same page WILL all reference the same PDF.
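The naming rule can be expressed in a couple of lines. This Python sketch follows the pattern exactly as written above, including the extension; the function name and its parameters are hypothetical.

```python
def pdf_name(issue, first_page, free=False):
    """Build a name of the form jIIIPPP[.f].html from an issue and first page number."""
    suffix = ".f" if free else ""
    # Both numbers are zero-padded to three digits.
    return f"j{issue:03d}{first_page:03d}{suffix}.html"
```

For instance, pdf_name(85, 21) gives j085021.html, and pdf_name(85, 0, free=True) gives j085000.f.html for a free-issue headline record.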

Sandbox: The data records can optionally be preceded by a single line beginning sandbox: (all lower-case, followed by a colon). This causes the text following the string to be displayed on screen, and processing to be terminated at that point, unless the URL's query string contains the parameter sandbox set to yes; for example ?sandbox=yes or ?j=99&sandbox=yes, depending on context. This is intended to allow the data records to be tested by staff before making them publicly available. Typically the sandbox line might be: sandbox: This journal is now in-press and will be published shortly. The line can be deleted after use, or edited so that it begins with the comment character #.
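The sandbox test might look something like this in outline. This is a Python sketch under the assumption that comment and blank lines have already been removed, so that the sandbox line (if any) is the first data line; the function name is hypothetical.

```python
from urllib.parse import parse_qs


def sandbox_message(lines, query_string):
    """Return the message to display if processing should stop, else None."""
    if not lines or not lines[0].startswith("sandbox:"):
        return None  # no sandbox line: process normally
    if parse_qs(query_string).get("sandbox") == ["yes"]:
        return None  # staff override: carry on and show the records
    return lines[0][len("sandbox:"):].strip()
```

With the query string j=99 a sandboxed file yields only its message, whereas j=99&sandbox=yes lets processing continue. In the latter case the caller would need to skip the sandbox line itself before pairing up the records.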

Database Links

The pages that list the CREG journal Bibliography are generated dynamically and are therefore not necessarily visible to all search engines, which sometimes do not process the query string. The list of links on this page was originally intended to ensure that search engines could find the contents of our database. It may no longer be necessary to do this.

Static links to raw data files...

Dynamic links to processed data...

This page, http://mail.bcra.org.uk/pub/cregj/database.html was last modified on Sat, 21 Jul 2018 16:11:05 +0100