Koha and the “magic” of XSLT : displaying new MARC fields on the OPAC.

A short tutorial on using XSLT to make MARC fields in your data visible on the OPAC.

Earlier today, Suresh Kumar Tejomurtula, a member of the venerable LIS-Forum mailing list being run out of IISc Bangalore, posted a question on that list:

Under language codes of 008/35-37 and also under 041$a I added language code. for eg: tel for Telugu language. But I do not see any difference to the view of the record, except that the marc tags contains that values. Adding these fields data will enable library team to understand the language of the material. My question is, How will the users of the OPAC, who do not know about marc language codes will understand the item language [from just the 3-letter code sequence].

Short answer : Using XSLT

Much of Koha’s superpowers on the OPAC (as also on the staff client) side comes from its judicious use of XSLT. When we search for documents in Koha, the result that is returned from the database by way of the various perl modules that perform much of Koha’s internal plumbing, comes in as an XML (eXtensible Markup Language) document. More precisely it is returned as a MARCXML record. Readers of this blog who are familiar with the MarcEdit software may have often encountered a MARCXML record. Those who are not so familiar may well like to read up a bit from here before proceeding with this post.

So what is XSLT?

Wikipedia gives the definition of XSLT as –

“XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subsequently be converted to other formats, such as PDF, PostScript and PNG.”

Where to start?

At the OPAC level, the XSLT magic is primarily driven by two files i.e. (a) MARC21slim2OPACResults.xsl and (b) MARC21slim2OPACDetail.xsl. Koha’s default settings of how we see the search results on the OPAC and the document specific details in the Normal view, are defined in these two files.

N.B. Directly editing these two files is strictly not advised unless you are a XSLT guru.

Lucky for us that Koha’s system preferences provide options to override the defaults by creating new XSLT files and telling Koha to use the new ones instead. The screenshot below shows the default setting.

2017-05-04_03-05-15

The two files are available inside a folder named xslt under the locale of the selected theme you are using e.g. /usr/share/koha/opac/htdocs/opac-tmpl/bootstrap/en/xslt/ on a package-based installation. By default most of us in India would be using the English locale i.e. “en”. The default Koha theme is “bootstrap”. Thus if you are using a custom theme and/or different locale look for the files under that.

Step #1 : Copying the XSLT files

We will be fetching the 3-letter language code from MARC 008/35-37 of each marc record. Since our Koha instance here is named as “demo”, we’ll make a copy of the file MARC21slim2OPACResults.xsl as MARC21slim2OPACResults-demo.xsl and the MARC21slim2OPACDetail.xsl as MARC21slim2OPACDetail-demo.xsl.

N.B. You can give any name to the copies you make, but it is suggested that the less adventurous you are with the names, easier it will be for you to trace your steps back if you make a mistake while editing them. And trust me you will make mistakes, at least initially.

Step #2 : Editing MARC21slim2OPACResults-demo.xsl

The MARC21slim2OPACResults.xsl is what drives the display of results of a search on the Koha OPAC. And it does not show the value of 008/35-37 i.e. MARC language code, while displaying the search result. To add this facility, we need to do the following to our copy i.e. MARC21slim2OPACResults-demo.xsl. Around line no. 70 or there about, there will be this line <xsl:variable name="controlField008-30-31" select="substring($controlField008,31,2)"/>. Add the following line after it – <xsl:variable name="controlField008_35-37" select="substring($controlField008,36,3)"/>.

Explanation: we’re defining a new variable named controlField008_35-37 and in it we are storing the 3-digit value found at 008/35-37 from another variable – controlField008.

The second step in editing this file is to add the code to check if (a) 008/35-37 actually exists (e.g. if you have a blank 008 field, 008/35-37 won’t exist) and (b) it is set as something other than “und(i.e. undefined). The following code does that first and proceeds to match the value found in 008/35-37 against 23 different languages. These languages are English and the 22 official languages of India as per the Eighth Schedule of the Indian Constitution. You can add or remove any language from this list as you require. However, when you do, please make sure that the codes you setup in this list here are coming from the Code Sequence document of the MARC Code list for languages as published by the US Library of Congress.

This means that you must also be using these same exact codes in your MARC records in Koha.

2017-05-04_05-40-55

Where we place this code will depend on where we want this language to listed in the results. In our case, we decided to add it between Material type and Format. And thus in our case, the code was added at line no 637. If you too want language to be displayed between the Material Type and Format, then look out for this line <xsl:if test="string-length(normalize-space($physicalDescription))"> and add this block before.

Saved the file and moved on to the next section.

Step #3 : Editing MARC21slim2OPACDetail-demo.xsl

The process is similar to what we just did above. Open the file MARC21slim2OPACDetail-demo.xsl for editing and go to line no. 51 (or there about) and look for this line <xsl:variable name="controlField008" select="marc:controlfield[@tag=008]"/>. Add the following line immediately after that: <xsl:variable name="controlField008_35-37" select="substring($controlField008,36,3)"/>.

2017-05-04_06-02-17

Since we wanted the “Language of document” to come after Material type, we added the code immediate after the node <xsl:if test="$DisplayOPACiconsXSLT!='0'"> is closed but before <!--Series: Alternate Graphic Representation (MARC 880) -->; in our case, this worked out to be around line no. 195. Finally, we saved and closed the file.

Getting Koha to use the new XSLT files

The Koha system preference OPACXSLTResultsDisplay was changed from its original setting (i.e. “default”) and set to the path of our new file MARC21slim2OPACResults-demo.xsl i.e. /usr/share/koha/opac/htdocs/opac-tmpl/bootstrap/en/xslt/MARC21slim2OPACResults-demo.xsl. Likewise the other system preference OPACXSLTDetailsDisplay was changed to /usr/share/koha/opac/htdocs/opac-tmpl/bootstrap/en/xslt/MARC21slim2OPACDetail-demo.xsl. It was time to test our XSLT modifications.

And it works!

Since pictures are said to be worth a 1000 words, we will let before and after screen grabs do the talking here. Also if anyone wants to see directly how it actually looks after applying the changed XSLT, visit the URL http://demo-opac.l2c2academy.co.in/cgi-bin/koha/opac-detail.pl?biblionumber=18

Before : Using the default MARC21slim2OPACResults.xsl

chrome_2017-05-04_07-05-24

After : Using the new MARC21slim2OPACResults-demo.xsl

chrome_2017-05-04_07-17-09

Before : Using the default MARC21slim2OPACDetail.xsl

chrome_2017-05-04_07-48-28

After : Using the new MARC21slim2OPACDetail-demo.xsl

chrome_2017-05-04_07-49-09

Adding autocomplete support to MARC21 260 / 264 (imprint) fields in Koha

Adding auto-complete feature to Koha’s MARC21 260 / 264 field (Imprint – place, name of publisher, distributor etc) as a cataloging aid.

As per LC’s AACR2 (as well as RDA) instructions, the imprint information as captured in MARC21 field 260 (AACR2) and 264 (RDA) should be *transcribed* and not *recorded* using the principle – “Take What You See and Accept What You Get” [1]. As a result, the 260$a (place) and 260$b (publisher, distributor etc) are usually not handled as fields whose values are controlled using authorized values.

Up until 2001, the 260 field was a NR field. It was made repeatable to accomodate frequent publisher name changes.

Since the 260$a and 260$b are not usually guided by authorized values, it is noted (at least in the Indian sub continent context) that catalogers often make typographical errors while transcribing the data, e.g. “Pearson Education” may be inadvertantly added as “Peerson Education” or as even as “Pearshon education”, while “Kolkata” may have been entered both as “Kolkata” as well as “Kolkatta” or even as “Kolkhata”. Without authority control of the field, this cataloging quality check is often overlooked. Errors like this often end up affecting the result of advanced searches or custom SQL reports.

Luckily for us, Koha ILS uses jquery extensively while (via jquery-ui) provides for nice autocomplete widgets, like the ones we see in action when we type in part of the borrower’s name in patron search (checkout) or in the authoritiy headings search, where entering 3 characters triggers the AJAX based lookup with the option to select one of the offered list *OR* to type in our own.

AJAX stands for “Asynchronous Javascript and XML”. In simple terms it encompasses a set of web development technique that allows us to fetch and load data from a remote server into our currently open page, without requiring us to refresh / reload the page. [2]

Recently a client requested that we offer them a way to look up publisher names and place names (for field 260) already entered into their Koha instance, without having to type it all in every time. For example in their database they already had the following publisher names entered – “Pearson Education, Prentice Hall India, PacktPub, Press Trust of India” etc. Now they if they encountered an item that was from these publishers they should be able to pull up a list just by entering “P” into the 260$b field and then be able to select the one applicable. And if they encountered a publisher name say “Penguin Books”, they should be able to type it in as well.

Koha 16.11 ships with 3 (three) different ysearch.pl scripts that show us how to achieve this. You can find out which ones these are by using the command `locate ysearch.pl`. NOTE: You may be required to run `sudo updatedb` once before locate finds the files. For our requirement we modeled our script which we’ll call 260search.pl on /usr/share/koha/intranet/cgi-bin/cataloguing/ysearch.pl. You can grab a copy of 260search.pl from L2C2’s github repo here [3]. The script returns a JSON based result set if results matching your input is found.

Remember that **every** script that Koha executes, needs it executable bit set, and so does this one. Therefore, do *not* forget to set the executable bit for the script with `sudo chmod a+x /usr/share/koha/intranet/cgi-bin/cataloguing/260search.pl` before you proceed to the next step.

Step #2 : Enabling the fields

With the script in place, we now need to turn to IntranetUserJS system preference and enter the following jquery snippet to enable autocomplete in 260$a and 260$b :

$(document).ready(function(){
  $( '[id^="tag_260_subfield_a"]' ).autocomplete({
    source: function(request, response) {
      $.ajax({
        url: "/cgi-bin/koha/cataloguing/260search.pl",
        dataType: "json",
        data: {
          term: request.term,
          table: "biblioitems",
          field: "place"
        },
        success: function(data) {
          response( $.map( data, function( item ) {
            return {
              label: item.fieldvalue,
              value: item.fieldvalue
            }
          }));
        }
      });
    },
    minLength: 1,
  });
  $( '[id^="tag_260_subfield_b"]' ).autocomplete({
    source: function(request, response) {
      $.ajax({
        url: "/cgi-bin/koha/cataloguing/260search.pl",
        dataType: "json",
        data: {
          term: request.term,
          table: "biblioitems",
          field: "publishercode"
        },
        success: function(data) {
          response( $.map( data, function( item ) {
            return {
              label: item.fieldvalue,
              value: item.fieldvalue
            }
          }));
        }
      });
    },
    minLength: 1,
  });
});

A video of autocomplete in action

Conclusion

By tweaking the 260search.pl script or even by completely re-writing it to use the various search functions shipped by Koha inside its /usr/share/koha/lib directory on a .deb package based installation, you can do so much more than possible with this simple hack. Happy hacking! ūüôā [4]

References:

[1] http://www.loc.gov/catworkshop/RDA%20training%20materials/LC%20RDA%20Training/Module1IntroManifestItemsSept12.doc

[2] https://developer.mozilla.org/en-US/docs/AJAX/Getting_Started

[3] https://gist.github.com/l2c2technologies/7d0449dcb80c90880381ef4571003d1d

[4] http://catb.org/jargon/html/H/hack.html

JQuery tips for Koha : Adding easy to use indicator picklists

Adding picklists for selecting indicators for MARC tags used in Koha’s cataloging worksheets.

During data audits of users’ MARC21 data, quite frequently we find that most, if not all, records are often without any form of use of indicators. Trained library professionals often give a sheepish grin when asked why they didn’t add them while cataloging the documents. ūüėČ But trained librarians are not the only ones who work with Library systems like Koha. There are many people who find themselves working in a library without a formal training or sufficient theoretical background on MARC21. Generally speaking reasons for not adding the indicators range from:

  • Lack of practise – thus unsure of the correct indicator to use.
  • Lack of awareness – i.e. untrained people with a very basic knowledge of cataloging
  • Lack of user-friendly mechanism to input indicators
  • And lastly – sheer laziness

Now, about the last one we can’t do anything about, however the rest of the reasons might use a bit of leg-up! So here goes the newest tutorial on how to add easy-to-use picklists to help us correctly populate the indicators.

According to the Design Principles of MARC21, indicators form a part of the family of content designators [1]. As defined, an indicator is :

A data element associated with a data field that supplies additional information about the field. An indicator may be any ASCII lowercase alphabetic, numeric, or blank.

For this tutorial we will focus on MARC21 bibliographic data fields 100 and 110 i.e. Main Entry Personal Name and Main Entry Corporate Name respectively. We will not touch the Koha template files at all, rather as per the global best practice for Koha ILS, we will utilize only JQuery (JavaScript) and HTML via the Koha system preference IntranetUserJS.

Step #1 – Finding out the DOM nodes

We will start by going to Home > Cataloging > Add MARC record in Koha and select the framework we want to work on. In this case we chose to work with the “Default framework” that is shipped with Koha. We used Google Chrome’s Developer Tools Inspect option [2] to find out what is the id of the selector (DOM node) we need for Main Entry Personal Name.

Since we need space to setup the picklist we chose to use the free space available on the div that displays information about the field that follows immediate after it. As you can see in the image below that div has an id identifying it, which is very good for us, since it makes selecting the DOM node absolutely painless.

blog_01

It should noted that when Koha renders the cataloging interface, it suffixes the HTML element IDs with a random number (one for each new tag). In this case, the id was div_indicator_tag_100_838390 where “838390” is the random suffix number. We needed to latch on to the first part i.e. div_indicator_tag_100.

Step #2 – Let the JQuery magic work

We have to add the select dropdown picklists right after the text on the div_indicator_tag_XXX DIVs. The value we will use for the indicators will come from here and here respectively.

$(document).ready(function(){
if ( $("#cat_addbiblio") ) {	// only while adding biblios
  $('div[id^="div_indicator_tag_100"]').append(' <label for="tag_100_indicators">Apply Ind1, Ind2</label> <select id="tag_100_indicators"><option>-Select-</option><option value="1">1 - Surname</option><option value="0">0 - Forename</option><option value="3">3 - Family name</option></select>');
  $('div[id^="div_indicator_tag_110"]').append(' <label for="tag_110_indicators">Apply Ind1, Ind2</label> <select id="tag_110_indicators"><option>-Select-</option><option value="2">2 - Name in direct order</option><option value="0">0 - Inverted name</option><option value="1">1 - Jurisdiction name</option></select>'); };   // end if
});
});

blog_02

While that added the picklists, we still have to add the actual logic that will allow the indicators to be populated on selecting from the list. Again we will turn to JQuery for the following snippet:

$(document).ready(function(){
  $('#tag_100_indicators').click(function(){
    var what_clicked_100 = $('#tag_100_indicators').val();
    if ( !isNaN(what_clicked_100) ) {
      $('input[name^="tag_100_indicator1"]').val(what_clicked_100);
      $('input[name^="tag_100_indicator2"]').val("#");
    } else {
      $('input[name^="tag_100_indicator1"]').val("");
      $('input[name^="tag_100_indicator2"]').val("");
    }
  });
  $('#tag_110_indicators').click(function(){
    var what_clicked_110 = $('#tag_110_indicators').val();
    if ( !isNaN(what_clicked_110) ) {
      $('input[name^="tag_110_indicator1"]').val(what_clicked_110);
      $('input[name^="tag_110_indicator2"]').val("#");
    } else {
      $('input[name^="tag_110_indicator1"]').val("");
      $('input[name^="tag_110_indicator2"]').val("");
    }
  });
});

The code above is listening to see when we click and select a value from the picklists i.e. when we trigger a click JavaScript event. Next it checks if we had selected a real value OR whether we had just “clicked” on the placeholder “-Select-” option that no value. And lastly based on what we had selected it sets the ind1 and ind2 values according.

blog_04

Conclusion

In this manner we can add easy-to-use picklists for indicators. Since it is now only a matter of selecting from the available values, it also reduces significantly the scope for typographical errors during data entry into the indicator boxes. Before we leave for today, do note that the second code listing may be better handled as a JavaScript function to which the references are passed to by a handler hook. Doing so would make for a cleaner and leaner implementation of this concept especially if you are planning to set it up for all the non control MARC21 fields you use. Also, you may wish to implement the selected dropdown value check using something other than IsNan [3].

References

[1] “MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media – RECORD STRUCTURE (2000)” https://www.loc.gov/marc/specifications/specrecstruc.html

[2] “Chrome DevTools” – https://developers.google.com/web/tools/chrome-devtools/

[3] isNaN() – https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/isNaN

Customizing Koha’s MARC21 frameworks? Know the rules or get help!

Either you know what you are doing or take time to learn or invest in quality support. Fail on all three counts and you are quite literally asking for an operational nightmare.

Recently a young colleague Sri Ashkar K. from Thiruvananthapuram, Kerala (India) ran into a problem. He works as a librarian with Mathrubhumi, a major media house from Kerala. Specifically he needed to have a LMS solution to efficiently manage their collection of entertainment (mostly movies) related CDs and DVDs. For him the LMS was hosted Koha. However when he tried to issue an item i.e. a movie CD, he was stumped by this error every time:

Software error
No branchcode argument passed to koha::Calender->new at /var/koha_all/mathrubhumi/lib/C4/Circulation.pm line 3558

Being on a hosted Koha platform, he approached his service provider for support. He shared with them all the relevant screenshots leading to error detailed above.

The provider’s tech support could not identify the issue and instead informed him that they could perform checkouts (issue) without any errors. As Ashkar persisted, the service provider’s support desk asked him to provide remote desktop sharing using Teamviewer so that they could see “his problem” in action. Installing Teamviewer needed clearance from his IT department which required time and thus Ashkar’s checkout problem continued to linger. Finally about 10 days back he posted about it on the official Facebook page of Koha Library System Project, asking for suggestions to resolve it.

The first flag was raised by fellow Koha dev Mark Tompsett when he asked:

“/var/koha_all/mathrubhumi/lib/C4/Circulation.pm” — That is not a standard installation path. How did you install this? And what version?

Ashkar replied that since the software was hosted, he did not know the installation details. This got my attention! If he was on hosted Koha, why was he turning to the community for support? What was his service provider doing in the first place? I decided to find out more. That’s when I discovered the details of his situation. Desperate for help, he provided me with superlibrarian access to his hosted Koha’s staff client interface. I logged in and found that the problem was very real. In fact, I found out a few rather *disturbing* things.

The hosted Mathrubhumi Koha instance wasn’t running on the stable version (which is 16.05.05 at the time of writing) of Koha ILS. In fact, it was running on an unstable development version (at the time of this writing it was using Koha 16.0600023). Development versions are not GA releases and are *never* meant for production use, they are meant for use by testers and developers. And secondly, I could not do a MARC21 export for his bibliographic data.

That set alarm bells ringing in my head and so with Ashkar’s approval, I created a backup of his Koha database and installed the backup¬†on L2C2’s test server running the latest stable 16.05.x version.

The first clear indication of what was wrong came soon after running sudo koha-rebuild-zebra -v -f mathrubhumi successfully without any error. A wildcard search from both OPAC as well as the staff client failed to return a single result, even though the Zebra indexer and output logs showed no error. However, it was possible to access a record by directly accessing it by biblionumber.

Running the “MARC Bibliographic framework test” to check the MARC structure provided the answer. Sure enough there were two major errors as shown below:

homebranch NOT mapped the items.homebranch field MUST :

  • be mapped to a MARC subfield,
  • the corresponding subfield MUST have “Authorized value” set to “branches”
holdingbranch NOT mapped the items.holdingbranch field MUST :

  • be mapped to a MARC subfield,
  • the corresponding subfield MUST have “Authorized value” set to “branches”

The question now was to identify *which* MARC21 framework since he had three (03) of them.

marcframe1
Ashkar’s MARC21 frameworks

Checking the “MOVIES” framework, it was found that both 952 $a (homebranch) and $b (holdingbranch) were set to ignore¬†in the Managed in tab¬†dropdown. This explained the error displayed by “MARC Bibliographic framework test”. To know more about the the 952 MARC21 field in Koha, please read Holdings data fields (9xx) from the Koha community wiki.

The Fix

It was a simple matter of setting both 952 $a and $b to “items(10)” for the option Managed in tab. This took care of the “MARC Bibliographic framework test” error.

However, that was only the first part of the solution. Except two, none of his other 23 bibliographic records had their homebranch and holdingbranch defined. It was time for a batch item modification from the Tools page (Home > Tools). This has been covered in details in an earlier blog post – “Koha‚Äôs MARC modification templates comes to the rescue“, so if the topic sounds unfamiliar, it is suggested that you read that post first.

In order to find out all the barcodes that needed to be used to update the records, the following SQL report was used:

SELECT items.barcode 
  FROM items 
  LEFT JOIN biblioitems ON (items.biblioitemnumber=biblioitems.biblioitemnumber) 
  LEFT JOIN biblio ON (biblioitems.biblionumber=biblio.biblionumber);

With the list of barcodes in hand, it was time for the final steps:

  1. Load up barcodes for the records to be bulk modified
    marcframe4
  2. Select the two fields that we wanted to update – homebranch and holdingbranch
    marcframe2
  3. Select the actual branch option for both and click on Save
    marcframe3
And we were done! ūüôā

The Explanation

Understanding the error is quite simple if you know how circulation works inside Koha. A checkout operation needs to know a few basic things – (a) who owns the item; (b) where is the item presently located; (c) what to set as the issue and due dates and (d) who is taking it. Since the items attached to bibliographic records created using the MOVIES MARC21 framework did not have their homebranch and holdingbranch defined, at the time of checkout, as Koha tried to set the issue date and calculate the due date, using the date functions from the Koha::Calender object, it failed to do so. That’s what gave Ashkar his error and prevented him from checking out an item.

This still left one question unanswered – why did Ashkar’s hosting provider keep insisting that everything was working OK at their end and wanted him to provide them with Teamviewer access instead. My best guess is they were checking out the system using only the MARC21 frameworks which *they* had shipped i.e. default and fast add (FA) frameworks. Since records generated using these two frameworks (quite correctly) had 952 $a and $b set, none of these triggered Ashkar’s error during checkout. They certainly did not need Teamviewer access, the error in Ashkar’s framework should have been easily detected and quickly fixed. In fact, it took less than 3 mins to take care of it. But they failed, which is why it is important to either invest in your own skill development (read RTFM) OR invest in quality support.

“If you pay peanuts, you get monkeys” – James Goldsmith

Moral of the story: If you work with service providers whose front line tech-support is staffed with inexperienced people, be prepared for the long haul and self support yourself. Caveat Emptor!

Koha spine label is not printing the “/” in your call numbers? Here is why.

If you have defined DDC as your classification source and have a “/” in your Koha item call number, it is not going to be displayed when generating spine labels. If you are in hurry or you are aware of the segmentation mark, you can jump straight to the section The Answer.

The “Problem”

Earlier in the day a fellow user Dyuti Samanta came up with a question :

“Sir, I’m trying print spine labels from Koha. However I see that Koha does not print the the front slash (“/”) in my itemcallnumber, even though the same is recorded in my MARC record and is otherwise displayed by Koha elsewhere. For example, the “CHA / L” in “025.4 CHA / L” is being printed as “CHAL”. So where is the problem, how can I fix it?”

The Background

Dyuti’s question made me smile. And instead of immediately telling him about the “why” I pointed him to a comment left by Anamika Das on Vimal Kumar Vazaphally‘s blog post – “Spine label creation” saying “You are not alone with that question! ;-)”.

A call number typically consists of Dewey class number + book number i.e. Cutter number (or some other means of alphabetic arrangement). The frontslash “/” is deemed as a segmentation mark (ala prime mark of in C-I-P records) in the universe of Dewey Decimal Classification[1]. Up until DDC 22 published in 2003 [2], the slash or the prime mark was used to mark the start of every standard subdivision (notation from Table 1) as well as the end of the Abridged number. However, this rule changed from DDC 22 onwards (September 1, 2005 to be exact) and remains extant for the current edition i.e. DDC 23 published in 2011. The new rule has been that only *one* single segmentation mark may be used and that too only for marking the¬†end of the abridged number [3].

A prior and post example straight from Library of Congress

Before DDC 22 – 551.21/09797/84

DDC 22 onward – 551.210979/84

Further, if you follow LC and OCLC norms, while Dewey class number in MARC21 field 082 can definitely have (since Sep 1, 2005) a *single* segmentation mark, the call number should never have one. With this background story in place let’s look at Koha to understand what is happening here.

The Answer

The particular Koha code that has taken out the the slash from both Dyuti and Anamika’s call numbers resides in C4::Labels::Label Perl module which is located at /usr/share/koha/lib/C4/Labels/Label.pm. Even more specifically, it is the _split_ddcn subroutine in Label.pm that is taking out the “/“. As we have already noted, under LC rules, call numbers (unlike Dewey class numbers in 082) can’t have segmentation marks. Thus it takes out any “/” embedded in your call number while processing the spine label. Very specifically, it is this line in the _split_ddcn subroutine:¬†s/\///g; # in theory we should be able to simply remove all segmentation markers and arrive at the correct call number that does it. And just why does _split_ddcn get invoked? Well, it is because of something you did during cataloging, remember¬†that¬†you had recorded DDC as the classification schema? It is that definition in your MARC record that calls in this sub ūüėÄ

Below you can see the _split_ddcn subroutine as on date of this post.

sub _split_ddcn {
    my ($ddcn) = @_;
    $_ = $ddcn;
    s/\///g;   # in theory we should be able to simply remove all segmentation markers and arrive at the correct call number...
    my (@parts) = m/
        ^([-a-zA-Z]*\s?(?:$possible_decimal)?) # R220.3  CD-ROM 787.87 # will require extra splitting
        \s+
        (.+)                               # H2793Z H32 c.2 EAS # everything else (except bracketing spaces)
        \s*
        /x;
    unless (scalar @parts)  {
        warn sprintf('regexp failed to match string: %s', $_);
        push @parts, $_;     # if no match, just push the whole string.
    }

    if ($parts[0] =~ /^([-a-zA-Z]+)\s?($possible_decimal)$/) {
          shift @parts;         # pull off the mathching first element, like example 1
        unshift @parts, $1, $2; # replace it with the two pieces
    }

    push @parts, split /\s+/, pop @parts;   # split the last piece into an arbitrary number of pieces at spaces
    $debug and print STDERR "split_ddcn array: ", join(" | ", @parts), "\n";
    return @parts;
}

Note: The _split_ddcn was first submitted to the Koha codebase as part of C4::Labels::Label module by Chris Nighswonger on Jul 20, 2009, by which time the LC’s single segmentation mark rule was already long in place.

So now what?

There are a few options available to you at this point.

(a) If you know what you are doing, you can modify the _split_ddcn sub routine so that it does not discard the “/” and handles the call number as you want it to. (Non trivial and not recommended)

dontsplit

(b) Go¬†to “Manage Layouts” and editing your specific layout by un-checking the option¬†“Split call number“. If you do this then your call number will be printed AS-IS as a single line of text. This means, if the call number is longer that the size of your labels, as they will be at several point in time, you have a *problem*

(c) Keep an eye out to this bug report filed by Katrin Fisher from earlier this year, where she has said:

Currently the call number splitting seems to be mostly implemented for DDC and LC classifications. Those are both not very common in Germany and possibly other countries. A lot of our libraries use their own custom classification schemes so the call number splitting is something that should be individually configurable.

The bad new is that so far no one has responded to this bug, simply because to Koha developers servicing clients using LC / DDC, this is not a priority. So either you can wait with the hope that someone soon will attend to this bug OR you write this functionality yourself OR you sponsor a developer to write it for you.

(d) Take the item call number listing out of Koha as a CSV file and use a 3rd party tool, e.g, gLabels to generate your spine labels.

References:

[1] https://www.loc.gov/aba/dewey/segmentation.html

[2] Dewey_Decimal_Classification – Administration_and_publication

[3] “Sweet segment solution” from 025.431: The Dewey blog

A custom subject-wise report of titles with author name, no. of copies, subject name in serialized listing

A custom SQL report for Koha that generates subject wise title lists with author name, no. of copies, subject name and biblionumber, written in response to a reader query over email.

Last week Mr. Gautam Mukhopadhyay, Librarian, Chandrapur College in Burdwan, West Bengal wrote in with a request:

Respected Sir,

I’m writing this seeking a solution for the problem relating to a report generation from Koha.¬†I want to get a list of titles under a particular broader subject¬†field-tag¬†(650). Quite a number of times I’ve checked from SQL Report. But all were in vain as those were not the same what I actually want to get. Following is the specimen of the opted report:
Sr. No.   Title     Author   Copy No.    Subject
¬† 1 ¬† ¬† ¬† ¬† ………. ¬† ¬† X ¬† ¬† ¬† ¬† ¬†3 ¬† ¬† ¬† ¬† ¬† ¬† Bengali
Under the subject Bengali or English or whatsoever, I want to get the titles those are belong to that particular subject. However, it won’t be a problem if there are different reports for different subjects. It’s Ok. But the SQL Query should be a general query structure that can be applicable for all such reports on the titles belong to a broader subject like Bengali, History, Geography etc.
Sir, please let me know the query structure, if possible.
Regards,
GM

¬†Here is a possible solution for his request, which pretty much does what Mr. Mukhopadhyay had specified in his request.¬†In this example we’ll use a sample MARC21 file which can be downloaded from here to try out this example. This dataset has a 14 unique bibliographic records with a total of 42 item (holdings) record, belonging to 03 specific broader subjects i.e. English, Economics and Political Science. As per Mr. Mukhopadhyay’s use-case, the MARC field 650 holds the broader subject classification. However, to match real world scenarios the 650 fields in some of the cases have other subject headings defined including narrower terms. Also additionally we are going to add an additional column to our report – the biblionumber, so that if required we can cross check a title in the report generated against the biblionumber in the database.

CAVEAT EMPTOR: If you are going to try out this example, we suggest that you define a new Koha library and import this MARC file into it. Mixing this sample data with your existing records is strongly advised against.

Step #1 – Create a new Koha instance and set it up
We are going to use the koha-create Debian command to create a new Koha instance and we shall call our instance as demo.
sudo koha-create --create-db demo

You may calls your instance by whatever name you like.¬†If you are not aware of the koha-create command, please read up “Commands provided by the Debian packages“.¬†Next we will do a default setup and proceed to define a Library that we’ll call “L2C2 Technologies Demo Library” identified by the code “MAIN”, using these instructions here.

NB.: To use the marc file used in this example you must set the library code for your demo branch as “MAIN”, the name (of the library branch) can be whatever you want it to be.

Step #2 – Define a new Authorize value category
Since our example marc file has biblios with only (a) English (b) Economics and (c) Political science, we will define a new authorized value category which we’ll call as SUBLOOKUP, under¬†Home ‚Äļ Administration ‚Äļ Authorized values. Once setup our new authorized value SUBLOOKUP will look like this:
gmreport_02
This authorized value list will provide the¬†subject selection list for¬†our¬†custom SQL report. So if you have more subjects you will need to add them here so that they look like this. The “%” in the Authorized value is *critical*, and if you want to be really strict about it, you can drop the preceding “%” and retain only the one at the end. However should you do that, your first 650 field *must* always be the broader subject heading that you wish to filter your report on.
Step #3 – Define our custom SQL report
We will go to Home ‚Äļ Reports ‚Äļ Guided reports wizard ‚Äļ Create from SQL and create a new SQL report. In this case, we’ll name the report as “List title with number of copies filtered by subject”, add a note that says – “A report written at the request of Mr. Gautam Mukhopadhyay, Chandrapur College, BWN”. The SQL will be as given below. The report once¬†saved, will allow us to run it.
SELECT 
 (@row:=@row+1) AS `S/N`, 
 gmData.Title, 
 gmData.Author, 
 gmData.Copies, 
 REPLACE (@TargetSubject:=<<Select the subject|SUBLOOKUP>>, '%', '') AS Subject, 
 gmData.biblioid AS `Biblionumber` 
FROM 
 (SELECT
 biblio.title AS Title, 
 biblio.biblionumber as biblioid, 
 ExtractValue(biblioitems.marcxml,'//datafield[@tag="245"]/subfield[@code>="c"]') AS Author, 
 count(items.itemnumber) AS Copies, 
 ExtractValue(biblioitems.marcxml,'//datafield[@tag="650"]/subfield[@code>="a"]') AS Subject 
 FROM 
 items 
 LEFT JOIN 
 biblioitems on (items.biblioitemnumber=biblioitems.biblioitemnumber) 
 LEFT JOIN 
 biblio on (biblioitems.biblionumber=biblio.biblionumber) 
 GROUP BY 
 biblio.biblionumber 
 ORDER BY 
 biblio.biblionumber) as gmData, 
 (SELECT @row := 0) r 
 WHERE Subject LIKE <<Re-select the subject tag|SUBLOOKUP>>
Let us take a moment to understand what this piece of SQL syntax really means.
(@row:=@row+1) AS `S/N`,

and

(SELECT @row := 0) r

The use of the @row variable and the counter (@row:=@row+1) gives us our “serial number” column in the report listing. ¬†We can also see the authorized value list “SUBLOOKUP” that we had defined earlier referenced here in the SQL.

NOTE: As you may note, we are asking the user to select the subject *twice*, (first time: ‘Select the subject’ and second time: ‘Re-select the subject tag’).¬†While theoretically we should not be required to do so, thank to the use of the runtime variable @TargetSubject, in reality we ran into a type casting error (see below), thus we used this less than pretty way of asking the user to select the subject twice, to get our job done.

gmreport_03

Step #4 – Running the report

After the report is saved, it is now time to run it, using the “Run report” option. What we’ll see now will be like this:

gmreport_04

We need to select the *same* subject from both the drop-down lists and click on “Run the report” button. Selecting “Economics” we shall in our case get the following report:

gmreport_05

Step #5 – Prettifying the custom report user interface

Having the user to select the subject twice is cumbersome as well prone to human error, so we decided it is time for some jquery magic to streamline this and leave the users with one only a single drop-down to choose from. For this we’ll turn turn to the IntranetUserJS system preference and add the following jquery snippet:

 $("label[for='sql_params_Reselectthesubjecttag']").hide()
 $('#sql_params_Reselectthesubjecttag').hide();
 $("#sql_params_Selectthesubject").change(function() {
   var subval = $('#sql_params_Selectthesubject').val();
   $("#sql_params_Reselectthesubjecttag").val(subval);
 });

If this is the first time you are hearing about the IntranetUserJS system preference, you should definitely read up this. Those of you who are indeed familiar with IntranetUserJS, all we are doing here is to (1) hide the second subject selection dropdown and its label and then (2) we are defining that whenever the user chooses a value from the *first* drop-down, the second (and now hidden) drop-down should also have the same value selected automatically. After saving the IntranetUserJS update, on running the report we shall see this:

gmreport_06

And bingo! We are done!
Extraa Innings: To see the actual report in action
  1. Go to the URL https://demo-staff.l2c2academy.co.in/
  2. Use User name / Password : demo / demo
  3. Go to the section¬†Home ‚Äļ Reports ‚Äļ Guided reports wizard ‚Äļ Saved reports
  4. Select “Run” from the “Actions” dropdown at the right.gmreport_07
  5. Play with the subject selection options to see the different outcome.

 

 

MarcEdit QuickTip #3 – Getting your 952 (items / holdings data) field in place for import into Koha

Shows how to de-duplicate a .mrc file, by merging duplicate bibliographic records spread all over the file and then gather up the holdings record into repeatable 952 field that Koha expects its for item records.

Last night Pawan Sharma, a fellow user on “Koha Users” reached out for some help with importing his items into Koha. Like many other, he too had moved his catalog data from Microsoft Excel spreadsheet to MarcEdit utilizing MarcEdit’s “Delimited Text Translator” feature which at the end of the process had given him a .mrc file. This he proceeded to upload into Koha by using the More > Tools > Catalog > Stage MARC records for import¬†option.

There were no surprises here, *except* that for every single books with multiple copies Koha imported each of the copies as a separate biblio record, instead of a single entry for the biblio with multiple item records attached to it via Marc21 952 repeatable field that Koha uses for managing holdings data. Simply put his data needed to be de-duplicated with the holding data merged back before import, typically using the ISBN number of the records (MARC field 020).

NOTE: If you wish to read more about Koha’s holding records schema see “Holdings data fields (9xx)” from the Koha Community wiki.

For someone who has not done this before, MarcEdit’s de-duplication and then merging it can seem like a daunting task. This post will hopefully demystify the process.

The discussions on Koha Users were based on a lot of assumptions, especially with no idea about Pawan’s data. So, I offered to take a look at it. He first sent me a .mrc file that had 12806 records, which I immediately converted into MarcEdit’s MarcBreaker mnemonic, human readable format.

marcedit_01

And proceeded to take a “Field count” report (see under “Tools” menu of MarcEditor) to check exactly how many records had ISBN (MARC21 field 020) out of the total number of records.

marcedit_01A

The result as can be seen above – NOT A SINGLE ONE of the 12806 biblio records had an ISBN number! Well, this file can be de-duplicated and merged, but *not* using MarcEdit. Only being told about this Pawan mentioned that he had other .mrc files that had ISBN and so he sent a second .mrc (LG-32016-32979.mrc) file over. Turns out of the total of 965 biblio records in this second file, 828 records had ISBN numbers defined.

marcedit_02A

The¬†next task was to extract the¬†records that *had* ISBN numbers. The remaining 137 can not be dealt with in this process and will have to be dealt with separately.¬†For now, we closed the file LG-32016-32979.mrk file with 965 records and went back to the MarcEdit main window in order to use the “Delete Selected Records” option available under Tools > Select MARC Records

marcedit_02B

The next few steps are¬†simple, if not immediately apparent to a new user of MarcEdit. We’ll use the numbered markers on the screenshot to explain it in steps. First, we selected the LG-32016-32979.mrk file with the 965 records in step #1; next we typed in 020 (since we want to match for ISBN) in the Display Field option (by default it shows 245$a); third step was to click on “Import File” button. After the file is imported (takes just a second or two depending on your file size) this the top-left data grid which was blank so far, will show up data similar to this. Finally in step #4, we will click the “Does Not Match” link. Records that do not have an ISBN number will be selected just like the big red arrow here shows.

marcedit_02CThe last step is to click on “Delete Selected”, this will open a File Save dialog with the title “Remaining Records”.¬†In the case, we provided the name LG-32016-32979_ISBN.mrk¬†and saved it and exited from this deletion utility.

This file LG-32016-32979_ISBN.mrk now has the 828 records with ISBN numbers and each of which has a holding records. This is what we will work with for the deduplication process.

marcedit_02D

Using the Tools > Record Deduplication option of MarcEditor, we will now remove the duplicate records into a separate file and save it with the name LG-32016-32979_ISBN_DEDUP.mrk. We will use ISBN as the field to use to identify duplicates. A popup showed us that 828 records processes, so we are done with deduplication. We will also need to save our original work file LG-32016-32979_ISBN.mrk. This file now contains biblio records with unique ISBN number. A quick check with the Fields Count tool showed us there were now 523 records (down from 828 records originally, the rest 305 records are the duplicates that are now saved in LG-32016-32979_ISBN_DEDUP.mrk).

marcedit_02E

¬†Now for the next step MARC Merge, which was the last step in this process. We have to go back to the main MarcEdit window and use the menu option Tools > Merge Records. The order of files we specify here is highly¬†*important*. The “Source File” in this case was¬†LG-32016-32979_ISBN.mrk¬†(the file with the 523 records with unique ISBN numbers), the “Merge File” is¬†LG-32016-32979_ISBN_DEDUP.mrk (the file where we had removed the duplicates to in the previous step) and finally, “Save File” is simply the name of the new merged file we are going to create (Hint: this is the final file that we will push to Koha). We named the final file as¬†LG-32016-32979_ISBN_MERGED.mrk. The Record Identifier is of course 020 (i.e. ISBN number) and we move on the next screen.

marcedit_02F

This is next step is basically *everything* we have been working for in this post so far, we select the field to merge in from “Merge File” into the “Source File” and click next.

marcedit_02GIn this¬†case everything went well and we were presented with the following screen that said “Merge Completed” and gave us the full path and filename to our merged file¬†LG-32016-32979_ISBN_MERGED.mrk.

marcedit_02H

Of course we opened up the LG-32016-32979_ISBN_MERGED.mrk file in MarcEditor. The first thing was to check the Field Count report, and this is what we saw 523 biblio records with a total of 828 holding records, which sounds right! Below is example of the merged holdings.

marcedit_02I

Of course there is still the task of exporting the MarcBreaker (.mrk) back to .mrc so that Koha can ingest it for its MARC21 staging workflow, but everyone knows that ūüôā

NOTE: For reference to this tutorial I’m attaching the zip file containing all the¬†LG-32016-32979 files used in this example.

Setting up a MARC21 ETD Framework in Koha

Recently during a discussion on a Whatsapp group of professional colleagues from the North East, a topic that came up Рwhat was better suited for setting up an ETD repository at their academic institutions? As expected, most sided with DSpace, while a few suggested Eprints. I decided to introduce Koha ILS into the picture. For most, this was a rather surprising suggestion.

The ground reality

95% of Indian ETD operators do very little than scanning up a batch of printed document (bound thesis volumes) or born-digital electronic copy of the theses, make it into a PDF file, throw in some metadata together about the item and plug that into (usually) DSpace. The benefits of DSpace being statistics, organisation into collections and community (user groups), embargo capability, faceted and full text searches across the metadata. There is of course the other point of persistent URLs to the hosted resources via HANDLE system. But given that very few Indian repos actually invest in a Handle.net account, it is quite a moot point.

The argument for Koha

Utilizing MARC21, Koha already offers a highly granular and extensive metadata regime. Its capabilities include collections, search facets, full-text search on the metadata. And since Koha’s version 3.22, the capability to directly host files linked to the bibliographic records. Basically it offers everything that 95% of Indian institutions look for when they are planning to setup a repository. Compared to DSpace, LIS professionals are already better acquainted with Koha (or as many in India like to call it – “KOHA” :-P) as it is¬†the de-facto open source LMS. Further, by using Koha for hosting the repository, the institutions and professionals who are already using Koha as their LMS, gain one key advantage; they no longer need to maintain two separate and very dissimilar systems that use completely separate software application stacks (LAMP vs Java; MySQL vs PostgreSQL etc).

Step 1 Рbuilding a MARC21 ETD Framework

I had promised my colleagues to setup a demo ETD using Koha so that they could try it out. After all, there’s no better proof of the pudding than¬†eating it. The first challenge to that is the while Koha does ship with quite a few MARC21 frameworks, an ETD framework is not one of them.

As my starting point, I turned to ETD-MS v1.1, which is considered to be something of a gold standard for ETDs. Taking the help of this page, I worked out a single paged worksheet for cataloging ETDs  on Koha. The result looked like this:

rawmarc

Step  2- Usability and maintainability

Of course this posed a slight problem, how will a cataloger accustomed to / trained on DSpace’s metadata namespace correctly do the crosswalk? To solve this in lines with Koha’s recommended Best Practices for UI mods, jQuery statements were included into Koha’s¬†“intranetuserjs” and “intranetusercss” system preferences. The final outcome was this:

converted_marc

This approach has¬†three (03) key benefits ¬†– (a) that you DO NOT need to touch Koha’s¬†Template::Toolkit based¬†templates (i.e. the .tt files), the changes made are stored in your database and applied during runtime; (b) these changes remain¬†persistent and works across Koha’s monthly version upgrade cycle (since we didn’t change the .tt files) and (c) our other MARC21 templates are left alone, only the ETD framework is thus modified.

The jquery snippet is available on Github as a gist, as is the CSS includes into intranetusercss system preference. There is also a third gist which lists the 04 authorised value categories and their constituent sample options that help us introduce a consistent controlled vocabulary in our metadata.

The demo ETD is available at http://etddemo-opac.l2c2.co.in/. If you wish to access the staff client back-end, contact me via the comments section of this post.

Best wishes!

Resources:

  1. http://blog.l2c2.co.in/wp-content/uploads/2016/06/export_ETD.ods
  2. https://gist.github.com/l2c2technologies/09e7e06f695304f33aada9b529167de6
  3. https://gist.github.com/l2c2technologies/9bc1f9b812e37850959c655fbc0f8802
  4. https://gist.github.com/l2c2technologies/6b735a5e1f4c4f9cc3042af2b8fa5b32
Licensing- CC 4.0 – BY-SA-NC – (c) L2C2 Technologies 2016