jQuery tips for Koha: Adding easy-to-use indicator picklists

Adding picklists for selecting indicators for MARC tags used in Koha’s cataloging worksheets.

During data audits of users’ MARC21 data, we quite frequently find that most, if not all, records lack any use of indicators. Trained library professionals often give a sheepish grin when asked why they didn’t add them while cataloging the documents. ;-) But trained librarians are not the only ones who work with library systems like Koha. Many people find themselves working in a library without formal training or a sufficient theoretical background on MARC21. Generally speaking, the reasons for not adding indicators range from:

  • Lack of practice – thus being unsure of the correct indicator to use
  • Lack of awareness – i.e. untrained people with a very basic knowledge of cataloging
  • Lack of a user-friendly mechanism to input indicators
  • And lastly – sheer laziness

Now, the last one we can’t do anything about, but the rest of the reasons could use a bit of a leg-up! So here goes the newest tutorial on how to add easy-to-use picklists to help us correctly populate the indicators.

According to the Design Principles of MARC21, indicators form a part of the family of content designators [1]. As defined, an indicator is:

A data element associated with a data field that supplies additional information about the field. An indicator may be any ASCII lowercase alphabetic, numeric, or blank.

For this tutorial we will focus on the MARC21 bibliographic data fields 100 and 110, i.e. Main Entry Personal Name and Main Entry Corporate Name respectively. We will not touch the Koha template files at all; rather, as per the global best practice for Koha ILS, we will utilize only jQuery (JavaScript) and HTML via the Koha system preference IntranetUserJS.

Step #1 – Finding out the DOM nodes

We will start by going to Home > Cataloging > Add MARC record in Koha and selecting the framework we want to work on. In this case we chose to work with the “Default framework” that ships with Koha. We used Google Chrome’s Developer Tools “Inspect” option [2] to find the id of the selector (DOM node) we need for Main Entry Personal Name.

Since we need space to set up the picklist, we chose to use the free space available on the div that displays information about the field, which follows immediately after it. As you can see in the image below, that div has an id identifying it, which is very good for us since it makes selecting the DOM node absolutely painless.

blog_01

It should be noted that when Koha renders the cataloging interface, it suffixes the HTML element IDs with a random number (one for each new tag). In this case the id was div_indicator_tag_100_838390, where “838390” is the random suffix number. We need to latch on to the stable first part, i.e. div_indicator_tag_100.
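
Since only that prefix is stable, jQuery’s attribute-starts-with selector is the right tool for the job. As a minimal sketch – something you can paste into the browser console on the cataloging page purely as a visual check, not part of the final solution:

// Select the div by its stable id prefix, ignoring the random
// suffix Koha appends, and highlight it to confirm the match.
$('div[id^="div_indicator_tag_100"]').css('border', '1px dotted red');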

Step #2 – Let the jQuery magic work

We have to add the select dropdown picklists right after the text on the div_indicator_tag_XXX DIVs. The values we will use for the indicators come from the MARC21 definitions of fields 100 and 110 respectively.

$(document).ready(function(){
  if ( $("#cat_addbiblio").length ) {  // only while adding biblios
    $('div[id^="div_indicator_tag_100"]').append(' <label for="tag_100_indicators">Apply Ind1, Ind2</label> <select id="tag_100_indicators"><option>-Select-</option><option value="1">1 - Surname</option><option value="0">0 - Forename</option><option value="3">3 - Family name</option></select>');
    $('div[id^="div_indicator_tag_110"]').append(' <label for="tag_110_indicators">Apply Ind1, Ind2</label> <select id="tag_110_indicators"><option>-Select-</option><option value="2">2 - Name in direct order</option><option value="0">0 - Inverted name</option><option value="1">1 - Jurisdiction name</option></select>');
  }  // end if
});

blog_02

While that added the picklists, we still have to add the actual logic that populates the indicators when a value is selected from the list. Again we will turn to jQuery for the following snippet:

$(document).ready(function(){
  $('#tag_100_indicators').change(function(){
    var what_selected_100 = $(this).val();
    if ( !isNaN(what_selected_100) ) {
      // a real numeric value was selected
      $('input[name^="tag_100_indicator1"]').val(what_selected_100);
      // "#" stands in for the undefined (blank) second indicator
      $('input[name^="tag_100_indicator2"]').val("#");
    } else {
      // the "-Select-" placeholder was selected; clear both boxes
      $('input[name^="tag_100_indicator1"]').val("");
      $('input[name^="tag_100_indicator2"]').val("");
    }
  });
  $('#tag_110_indicators').change(function(){
    var what_selected_110 = $(this).val();
    if ( !isNaN(what_selected_110) ) {
      $('input[name^="tag_110_indicator1"]').val(what_selected_110);
      $('input[name^="tag_110_indicator2"]').val("#");
    } else {
      $('input[name^="tag_110_indicator1"]').val("");
      $('input[name^="tag_110_indicator2"]').val("");
    }
  });
});

The code above listens for the change JavaScript event on the picklists, i.e. it fires whenever we select something from the list. Next it checks whether we selected a real value OR merely the placeholder “-Select-” option, which has no value. And lastly, based on what we selected, it sets the ind1 and ind2 values accordingly.

blog_04

Conclusion

In this manner we can add easy-to-use picklists for indicators. Since it is now only a matter of selecting from the available values, it also significantly reduces the scope for typographical errors during data entry into the indicator boxes. Before we leave for today, do note that the second code listing may be better handled as a JavaScript function to which the references are passed by a handler hook. Doing so would make for a cleaner and leaner implementation of this concept, especially if you are planning to set it up for all the non-control MARC21 fields you use. Also, you may wish to implement the selected dropdown value check using something other than isNaN [3].
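
For instance, a minimal sketch of such a refactor might look like the code below. The function name and the regular-expression test are our own illustration – not code shipped with Koha – and the regex doubles as that alternative check, accepting only a single valid indicator character instead of relying on isNaN:

// One generic handler serves every tag for which a picklist was added.
function applyIndicator(tag) {
  var chosen = $('#tag_' + tag + '_indicators').val();
  // Accept a single digit only; widen the character class to
  // [0-9a-z] if you also use lowercase alphabetic indicators.
  var valid = /^[0-9]$/.test(chosen);
  $('input[name^="tag_' + tag + '_indicator1"]').val(valid ? chosen : "");
  $('input[name^="tag_' + tag + '_indicator2"]').val(valid ? "#" : "");
}

$(document).ready(function(){
  // Hook the handler up to every tag that received a picklist.
  ['100', '110'].forEach(function(tag){
    $('#tag_' + tag + '_indicators').change(function(){ applyIndicator(tag); });
  });
});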

References

[1] “MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media – RECORD STRUCTURE (2000)” – https://www.loc.gov/marc/specifications/specrecstruc.html

[2] “Chrome DevTools” – https://developers.google.com/web/tools/chrome-devtools/

[3] isNaN() – https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/isNaN

Koha spine label is not printing the “/” in your call numbers? Here is why.

If you have defined DDC as your classification source and have a “/” in your Koha item call number, it is not going to be displayed when generating spine labels. If you are in a hurry, or you already know about the segmentation mark, you can jump straight to the section “The Answer”.

The “Problem”

Earlier in the day a fellow user, Dyuti Samanta, came up with a question:

“Sir, I’m trying to print spine labels from Koha. However, I see that Koha does not print the forward slash (“/”) in my itemcallnumber, even though it is recorded in my MARC record and is otherwise displayed by Koha elsewhere. For example, the “CHA / L” in “025.4 CHA / L” is being printed as “CHAL”. So where is the problem, and how can I fix it?”

The Background

Dyuti’s question made me smile. Instead of immediately telling him about the “why”, I pointed him to a comment left by Anamika Das on Vimal Kumar Vazhappally’s blog post – “Spine label creation” – saying “You are not alone with that question! ;-)”.

A call number typically consists of a Dewey class number plus a book number, i.e. a Cutter number (or some other means of alphabetic arrangement). The forward slash “/” is deemed a segmentation mark (à la the prime mark in CIP records) in the universe of the Dewey Decimal Classification [1]. Up until DDC 22, published in 2003 [2], the slash or prime mark was used to mark the start of every standard subdivision (notation from Table 1) as well as the end of the abridged number. However, this rule changed from DDC 22 onwards (September 1, 2005 to be exact) and remains in force for the current edition, i.e. DDC 23, published in 2011. The new rule is that only *one* segmentation mark may be used, and only to mark the end of the abridged number [3].

A before-and-after example straight from the Library of Congress:

Before DDC 22 – 551.21/09797/84

DDC 22 onward – 551.210979/84

Further, if you follow LC and OCLC norms, while the Dewey class number in MARC21 field 082 can definitely carry (since September 1, 2005) a *single* segmentation mark, the call number should never have one. With this background story in place, let’s look at Koha to understand what is happening here.

The Answer

The particular Koha code that took the slash out of both Dyuti’s and Anamika’s call numbers resides in the C4::Labels::Label Perl module, which is located at /usr/share/koha/lib/C4/Labels/Label.pm. More specifically, it is the _split_ddcn subroutine in Label.pm that removes the “/”. As we have already noted, under LC rules call numbers (unlike Dewey class numbers in 082) can’t have segmentation marks, so it takes out any “/” embedded in your call number while processing the spine label. Very specifically, it is this line in the _split_ddcn subroutine that does it: s/\///g; # in theory we should be able to simply remove all segmentation markers and arrive at the correct call number. And just why does _split_ddcn get invoked? Because of something you did during cataloging – remember that you recorded DDC as the classification schema? It is that definition in your MARC record that calls in this sub. :-D

Below you can see the _split_ddcn subroutine as on the date of this post.

sub _split_ddcn {
    my ($ddcn) = @_;
    $_ = $ddcn;
    s/\///g;   # in theory we should be able to simply remove all segmentation markers and arrive at the correct call number...
    my (@parts) = m/
        ^([-a-zA-Z]*\s?(?:$possible_decimal)?) # R220.3  CD-ROM 787.87 # will require extra splitting
        \s+
        (.+)                               # H2793Z H32 c.2 EAS # everything else (except bracketing spaces)
        \s*
        /x;
    unless (scalar @parts)  {
        warn sprintf('regexp failed to match string: %s', $_);
        push @parts, $_;     # if no match, just push the whole string.
    }

    if ($parts[0] =~ /^([-a-zA-Z]+)\s?($possible_decimal)$/) {
        shift @parts;           # pull off the matching first element, like example 1
        unshift @parts, $1, $2; # replace it with the two pieces
    }

    push @parts, split /\s+/, pop @parts;   # split the last piece into an arbitrary number of pieces at spaces
    $debug and print STDERR "split_ddcn array: ", join(" | ", @parts), "\n";
    return @parts;
}

Note: _split_ddcn was first submitted to the Koha codebase as part of the C4::Labels::Label module by Chris Nighswonger on Jul 20, 2009, by which time LC’s single segmentation mark rule was already long in place.

So now what?

There are a few options available to you at this point.

(a) If you know what you are doing, you can modify the _split_ddcn subroutine so that it does not discard the “/” and handles the call number as you want it to. (Non-trivial and not recommended.)

dontsplit

(b) Go to “Manage Layouts” and edit your specific layout, un-checking the option “Split call number”. If you do this, your call number will be printed AS-IS as a single line of text. This means that if the call number is longer than your label, as it will be at several points in time, you have a *problem*.

(c) Keep an eye on this bug report filed by Katrin Fischer earlier this year, where she has said:

Currently the call number splitting seems to be mostly implemented for DDC and LC classifications. Those are both not very common in Germany and possibly other countries. A lot of our libraries use their own custom classification schemes so the call number splitting is something that should be individually configurable.

The bad news is that so far no one has responded to this bug, simply because for Koha developers servicing clients who use LC / DDC this is not a priority. So you can either wait in the hope that someone will attend to this bug soon, OR write this functionality yourself, OR sponsor a developer to write it for you.

(d) Take the item call number listing out of Koha as a CSV file and use a third-party tool, e.g. gLabels, to generate your spine labels.

References:

[1] https://www.loc.gov/aba/dewey/segmentation.html

[2] “Dewey Decimal Classification – Administration and publication” – Wikipedia

[3] “Sweet segment solution” from 025.431: The Dewey blog

A custom subject-wise report of titles with author name, no. of copies and subject name in a serialized listing

A custom SQL report for Koha that generates subject-wise title lists with author name, no. of copies, subject name and biblionumber, written in response to a reader query over email.

Last week Mr. Gautam Mukhopadhyay, Librarian, Chandrapur College in Burdwan, West Bengal wrote in with a request:

Respected Sir,

I’m writing this seeking a solution for the problem relating to a report generation from Koha. I want to get a list of titles under a particular broader subject field-tag (650). Quite a number of times I’ve checked from SQL Report. But all were in vain as those were not the same what I actually want to get. Following is the specimen of the opted report:
Sr. No.   Title       Author   Copy No.   Subject
1         ……….        X        3          Bengali
Under the subject Bengali or English or whatsoever, I want to get the titles those are belong to that particular subject. However, it won’t be a problem if there are different reports for different subjects. It’s Ok. But the SQL Query should be a general query structure that can be applicable for all such reports on the titles belong to a broader subject like Bengali, History, Geography etc.
Sir, please let me know the query structure, if possible.
Regards,
GM

Here is a possible solution for his request, which does pretty much what Mr. Mukhopadhyay specified. In this example we’ll use a sample MARC21 file, which can be downloaded from here, to try it out. This dataset has 14 unique bibliographic records with a total of 42 item (holdings) records, belonging to 03 broader subjects, i.e. English, Economics and Political Science. As per Mr. Mukhopadhyay’s use-case, the MARC field 650 holds the broader subject classification. However, to match real-world scenarios, the 650 fields in some of the records also carry other subject headings, including narrower terms. Additionally, we are going to add one more column to our report – the biblionumber – so that, if required, we can cross-check a title in the generated report against the biblionumber in the database.

CAVEAT EMPTOR: If you are going to try out this example, we suggest that you define a new Koha library and import this MARC file into it. Mixing this sample data with your existing records is strongly advised against.

Step #1 – Create a new Koha instance and set it up

We are going to use the koha-create Debian command to create a new Koha instance, and we shall call our instance demo.

sudo koha-create --create-db demo

You may call your instance whatever you like. If you are not aware of the koha-create command, please read up on “Commands provided by the Debian packages”. Next we will do a default setup and proceed to define a library that we’ll call “L2C2 Technologies Demo Library”, identified by the code “MAIN”, using these instructions here.

NB: To use the MARC file from this example you must set the library code for your demo branch as “MAIN”; the name (of the library branch) can be whatever you want it to be.

Step #2 – Define a new Authorized value category

Since our example MARC file has biblios with only (a) English, (b) Economics and (c) Political Science, we will define a new authorized value category, which we’ll call SUBLOOKUP, under Home › Administration › Authorized values. Once set up, our new authorized value SUBLOOKUP will look like this:
gmreport_02
This authorized value list will provide the subject selection list for our custom SQL report. If you have more subjects, you will need to add them here following the same pattern. The “%” in the authorized value is *critical*; if you want to be really strict about it, you can drop the preceding “%” and retain only the one at the end. However, should you do that, your first 650 field *must* always be the broader subject heading that you wish to filter your report on.

Step #3 – Define our custom SQL report

We will go to Home › Reports › Guided reports wizard › Create from SQL and create a new SQL report. In this case we’ll name the report “List title with number of copies filtered by subject” and add a note that says – “A report written at the request of Mr. Gautam Mukhopadhyay, Chandrapur College, BWN”. The SQL is given below. Once saved, the report will be ready to run.
SELECT 
 (@row:=@row+1) AS `S/N`, 
 gmData.Title, 
 gmData.Author, 
 gmData.Copies, 
 REPLACE (@TargetSubject:=<<Select the subject|SUBLOOKUP>>, '%', '') AS Subject, 
 gmData.biblioid AS `Biblionumber` 
FROM 
 (SELECT
 biblio.title AS Title, 
 biblio.biblionumber as biblioid, 
 ExtractValue(biblioitems.marcxml,'//datafield[@tag="245"]/subfield[@code>="c"]') AS Author, 
 count(items.itemnumber) AS Copies, 
 ExtractValue(biblioitems.marcxml,'//datafield[@tag="650"]/subfield[@code>="a"]') AS Subject 
 FROM 
 items 
 LEFT JOIN 
 biblioitems on (items.biblioitemnumber=biblioitems.biblioitemnumber) 
 LEFT JOIN 
 biblio on (biblioitems.biblionumber=biblio.biblionumber) 
 GROUP BY 
 biblio.biblionumber 
 ORDER BY 
 biblio.biblionumber) as gmData, 
 (SELECT @row := 0) r 
 WHERE Subject LIKE <<Re-select the subject tag|SUBLOOKUP>>
Let us take a moment to understand what this piece of SQL syntax really means.
(@row:=@row+1) AS `S/N`,

and

(SELECT @row := 0) r

The use of the @row variable and the counter (@row:=@row+1) gives us our “serial number” column in the report listing. We can also see the authorized value list “SUBLOOKUP” that we defined earlier referenced here in the SQL.

NOTE: As you may notice, we are asking the user to select the subject *twice* (first: ‘Select the subject’, second: ‘Re-select the subject tag’). While in theory the runtime variable @TargetSubject should have spared us the second selection, in practice we ran into a type-casting error (see below). Thus we used this less-than-pretty way of asking the user to select the subject twice to get our job done.

gmreport_03

Step #4 – Running the report

After the report is saved, it is time to run it using the “Run report” option. What we’ll see now will be something like this:

gmreport_04

We need to select the *same* subject from both drop-down lists and click on the “Run the report” button. Selecting “Economics”, we shall in our case get the following report:

gmreport_05

Step #5 – Prettifying the custom report user interface

Having the user select the subject twice is cumbersome as well as prone to human error, so we decided it was time for some jQuery magic to streamline this and leave users with only a single drop-down to choose from. For this we’ll turn to the IntranetUserJS system preference and add the following jQuery snippet:

 $("label[for='sql_params_Reselectthesubjecttag']").hide()
 $('#sql_params_Reselectthesubjecttag').hide();
 $("#sql_params_Selectthesubject").change(function() {
   var subval = $('#sql_params_Selectthesubject').val();
   $("#sql_params_Reselectthesubjecttag").val(subval);
 });

If this is the first time you are hearing about the IntranetUserJS system preference, you should definitely read up on it. For those of you who are familiar with IntranetUserJS, all we are doing here is (1) hiding the second subject-selection drop-down and its label, and (2) specifying that whenever the user chooses a value from the *first* drop-down, the second (now hidden) drop-down automatically has the same value selected. After saving the IntranetUserJS update, on running the report we shall see this:

gmreport_06

And bingo! We are done!

Extraa Innings: To see the actual report in action
  1. Go to the URL https://demo-staff.l2c2academy.co.in/
  2. Use username / password: demo / demo
  3. Go to the section Home › Reports › Guided reports wizard › Saved reports
  4. Select “Run” from the “Actions” dropdown at the right (see gmreport_07).
  5. Play with the subject selection options to see the different outcomes.

 

 

MarcEdit QuickTip #3 – Getting your 952 (items / holdings data) field in place for import into Koha

Shows how to de-duplicate a .mrc file by merging duplicate bibliographic records spread all over the file, and then gathering the holdings into the repeatable 952 field that Koha expects for its item records.

Last night Pawan Sharma, a fellow user on “Koha Users”, reached out for some help with importing his items into Koha. Like many others, he too had moved his catalog data from a Microsoft Excel spreadsheet to MarcEdit, utilizing MarcEdit’s “Delimited Text Translator” feature, which at the end of the process had given him a .mrc file. This he proceeded to upload into Koha using the More > Tools > Catalog > Stage MARC records for import option.

There were no surprises here, *except* that for every single book with multiple copies, Koha imported each copy as a separate biblio record, instead of a single entry for the biblio with multiple item records attached to it via the repeatable MARC21 952 field that Koha uses for managing holdings data. Simply put, his data needed to be de-duplicated, with the holdings data merged back in before import, typically using the ISBN of the records (MARC field 020).

NOTE: If you wish to read more about Koha’s holdings records schema, see “Holdings data fields (9xx)” on the Koha Community wiki.

For someone who has not done this before, MarcEdit’s de-duplicate-and-merge workflow can seem like a daunting task. This post will hopefully demystify the process.

The discussions on Koha Users were based on a lot of assumptions, especially with no idea about Pawan’s data. So I offered to take a look at it. He first sent me a .mrc file that had 12806 records, which I immediately converted into MarcEdit’s human-readable MarcBreaker mnemonic format.

marcedit_01

I then proceeded to take a “Field count” report (see under the “Tools” menu of the MarcEditor) to check exactly how many records had an ISBN (MARC21 field 020) out of the total number of records.

marcedit_01A

The result, as can be seen above – NOT A SINGLE ONE of the 12806 biblio records had an ISBN number! Such a file can still be de-duplicated and merged, but *not* using MarcEdit. On being told this, Pawan mentioned that he had other .mrc files that did have ISBNs, and he sent a second file (LG-32016-32979.mrc) over. It turned out that of the 965 biblio records in this second file, 828 records had ISBN numbers defined.

marcedit_02A

The next task was to extract the records that *had* ISBN numbers. The remaining 137 cannot be dealt with in this process and will have to be handled separately. For now, we closed the LG-32016-32979.mrk file with its 965 records and went back to the MarcEdit main window in order to use the “Delete Selected Records” option available under Tools > Select MARC Records.

marcedit_02B

The next few steps are simple, if not immediately apparent to a new user of MarcEdit. We’ll use the numbered markers on the screenshot to explain them. First, we selected the LG-32016-32979.mrk file with the 965 records (step #1); next we typed 020 into the Display Field option (by default it shows 245$a), since we want to match on ISBN; the third step was to click the “Import File” button. After the file is imported (which takes just a second or two depending on file size), the top-left data grid, which was blank so far, will show data similar to this. Finally, in step #4, we clicked the “Does Not Match” link. Records that do not have an ISBN number get selected, just as the big red arrow here shows.

marcedit_02C

The last step is to click “Delete Selected”, which opens a File Save dialog titled “Remaining Records”. In this case we provided the name LG-32016-32979_ISBN.mrk, saved it and exited the deletion utility.

The file LG-32016-32979_ISBN.mrk now holds the 828 records with ISBN numbers, each of which has a holdings record. This is what we will work with for the de-duplication process.

marcedit_02D

Using the Tools > Record Deduplication option of the MarcEditor, we will now move the duplicate records into a separate file and save it as LG-32016-32979_ISBN_DEDUP.mrk, using ISBN as the field for identifying duplicates. A popup told us that 828 records were processed, so the de-duplication is done. We also saved our original work file LG-32016-32979_ISBN.mrk, which now contains only biblio records with unique ISBN numbers. A quick check with the Field Count tool showed 523 records (down from the original 828; the remaining 305 are the duplicates now saved in LG-32016-32979_ISBN_DEDUP.mrk).

marcedit_02E

Now for the final step in this process: the MARC merge. We go back to the main MarcEdit window and use the menu option Tools > Merge Records. The order of the files specified here is highly *important*. The “Source File” in this case was LG-32016-32979_ISBN.mrk (the file with the 523 records with unique ISBN numbers), the “Merge File” was LG-32016-32979_ISBN_DEDUP.mrk (the file into which we had moved the duplicates in the previous step), and “Save File” is simply the name of the new merged file we are going to create (hint: this is the final file that we will push into Koha). We named the final file LG-32016-32979_ISBN_MERGED.mrk. The Record Identifier is of course 020 (i.e. the ISBN number), and we move on to the next screen.

marcedit_02F

This next step is basically *everything* we have been working towards in this post: we select the fields to merge in from the “Merge File” into the “Source File” and click Next.

marcedit_02G

In this case everything went well, and we were presented with the following screen saying “Merge Completed”, giving us the full path and filename of our merged file LG-32016-32979_ISBN_MERGED.mrk.

marcedit_02H

Of course, we opened the LG-32016-32979_ISBN_MERGED.mrk file in the MarcEditor. The first thing was to check the Field Count report, and this is what we saw: 523 biblio records with a total of 828 holdings records, which sounds right! Below is an example of the merged holdings.

marcedit_02I

Of course, there is still the task of exporting the MarcBreaker (.mrk) file back to .mrc so that Koha can ingest it in its MARC21 staging workflow, but everyone knows that :-)

NOTE: For reference, I’m attaching the zip file containing all the LG-32016-32979 files used in this example.

Migration tips #1: Separating your 100, 110 and 111s in the spreadsheet

Imagine the following scenario: over the last several months you have been steadily entering your library’s bibliographic data into a spreadsheet, either from the accession register or by accessing actual physical copies at your library. You have managed to create some ~23,000 records and now wish to import these into your favourite ILS, i.e. Koha.

While working in the spreadsheet, you made a single column for author information, recording names in the <lastname>, <firstname> style for personal name entries (i.e. MARC21 tag 100). However, you also used the same column to capture corporate (110) and meeting (111) name entries.

110

So how do you now pick out the non-100 records – the 110s and 111s – from among the ~23,000 records, so that you do not leave any corporate body entries in the personal name field? Manual curation is possible, but it is simply too error-prone and hugely time-consuming.

Luckily, since you have used the <lastname> comma <firstname> style consistently (or almost consistently) for all personal name entries, you can use your LibreOffice Calc spreadsheet to do some magic. :-D

Step 1: A formula to identify possible corporate entries.

As we see in the example above, Calcutta University has been entered AS-IS and not as University, Calcutta, so let us look for records that *do not* have any commas in them. [Side note: we will hit some false positives, but the magnitude of the problem will be far less than searching all 23,000 records.]

And for that we enter the following formula in cell B2 (where A2 is our first record and A1 is the header row):

=IF((ISBLANK(A2)),"NO",IF((ISNUMBER(FIND(",",A2))),"NO", "YES"))

The first part is just a safety check to ensure that our cell is not blank. Of course, if it is blank then it obviously cannot be a corporate body entry, so the answer we want in B2 is “NO”. Once we know it is not blank, we move on to check whether we can find our comma in the cell. If the comma is present, the entry is assumed to be a personal name, hence also “NO”; if it is not found *and* the cell is not blank, then it is safe to assume the entry is most likely a corporate name (110) or a meeting name (111).

Let’s see what happens when we apply this formula to all our ~23,000 records.

110_1

N.B. The “false positive” happened because the person doing the data entry did not format the name of “Mahasweta Devi” as “Devi, Mahasweta”.

Step 2: Always count the eggs in your basket

We had started with exactly 22,959 records; let us see how many “YES” records our formula has found: 941. We have narrowed our search down to just about 4% of the total records.

110_4

Step 3: Filtering out the “YES” records

Luckily for us, LibreOffice Calc has a nice filtering tool, available under the Data -> Filter -> Standard Filter menu option. We had earlier named our column “boolean” in cell B1, so we’ll now filter out all 941 records marked “YES” using this tool.

110_2

And immediately our spreadsheet will show us only the “YES” records, like this:

110_5

Step 4: Removing the “YES” from false positives

The false positives have a story to tell: we need to do better quality control of our data entry. We also probably need to ensure that the persons entering the data understand how to handle Muslim or foreign names.

Example: let us take the name of my good friend and Koha expert Mr Vimal Kumar Vazhappally. The correct way to address him is Mr. Vimal Kumar and *not* Mr. Vazhappally; Vazhappally isn’t really his surname, rather it is the name of his village.

For now, the simplest way to correct the false positives is to visually check column A, and wherever the corresponding cell in column B wrongly has a “YES”, simply delete the formula from that cell.

Instead of looking through 22,959 records, we now only have to check fewer than a thousand, this time looking only for false positives.

110_3

Step 5: Two formulas to separate the 100s and 110s

After we remove the column-B cell values of the false positives – the ones marked “YES” that are not corporate entities – we add two more columns to our spreadsheet. These will hold our separated 100 and 110/111 values.

So, in order to copy over the values of corporate / meeting names (110/111) to column C, we will define the following formula in cell C2:

=IF((B2="YES"),A2,"")

And similarly, in the next column to the right, in cell D2, we will define this formula:

=IF(B2="YES","",A2)

Next we will simply copy the range C2:D2 down over the entire range C3:D22959, and we are DONE!

110_6

Columns C and D now have our separated records for all 22,959 rows. And it took less time to do the actual correction than it took to write this blog post, take the screenshots, crop, annotate, upload and proofread the final post.

Closing words

We found ~350+ corporate body / meeting name entries in the list, which could be separated out of the 22,959 records using this technique.

 

Exporting your LibraryWorld bibliographic data to MARC21

Steps to export your LW data to MARC21 for import into Koha.

LibraryWorld (LW) is a hosted (SaaS) proprietary integrated library system developed and promoted by San Jose, California based LibraryWorld Inc. According to their website, they charge USD 495 annually for a hosted instance of LW. So earlier in the day, when Mr. Sonam Rinzee from Bhutan requested some help in exporting his library’s catalog data, it presented the perfect opportunity for L2C2 Technologies to tinker with something new. These screenshots describe how it was done.

Step #1 : Access the LW website

lw_01

Step #2 : Login into the admin dashboard

lw_02

Step #3 : Go to Catalog module and click on “Advanced”

lw_03

Step #4 : Use the search filter like this

lw_04

Step #5 : Export the records

lw_05

Step #6 : Export as MARC using the export button

lw_06

Step #7 : Wait for your export to be ready. Click on “refresh” link to see if it is “done”

lw_07

Step #8 : Download the exported data by clicking on “FILE” link and change file extension to “.mrc”

lw_08

Step #9 : Open the file using MarcEdit and convert it to .mrk for edit work

lw_09

Step #10 : Now curate, remap and edit the data in a batch for making it import ready into Koha

lw_10

Published under CC 4.0 – BY – SA

Originally posted at https://www.facebook.com/pg/l2c2technologies/photos/?tab=album&album_id=993894330699800

MarcEdit QuickTip #2 – Unicode in your source file

Converting a batch of multi-lingual bibliographic records stored in an MS-Excel worksheet to .mrc using MarcEdit? Be sure to check that your charset is set to UTF-8 while saving the spreadsheet or its CSV export.

Naveed Bhatti, a fellow Koha ILS user from neighboring Pakistan, pinged me last week over a problem he was facing. He had multi-lingual bibliographic data stored in an MS Excel worksheet. He wanted to use MarcEdit 6’s Delimited Text Translator tool, available under the “Add-ins” menu, to convert this file into a Unicode (UTF-8) encoded MARC21 (.mrc) file so that he could import the records into Koha.

However, when he attempted to generate the .mrk file (MarcEdit’s intermediate MarcBreaker mnemonic format, used before export to .mrc), instead of the Arabic script he saw a bunch of “?????? ????? ???” wherever the spreadsheet contained text in Arabic script. Naveed thought he must be missing something small but crucial, perhaps a setting.

naveed_02

I had a hunch, but I wanted to check the data before commenting. So I asked for a few sample records, which he sent over the next day. I checked and found it was a simple case of a charset conversion glitch at the spreadsheet end of things. I could easily generate both the incorrect as well as the correct output (see above) with a simple change of the charset filter. I was using LibreOffice Calc on Windows 8.1, and the default export charset was *not* set to Unicode (UTF-8). As a result, the exported / saved file did not contain the correct Unicode codepoints for the non-Latin data, and at MarcEdit’s end it became a simple case of garbage-in-garbage-out instead of receiving the correct non-Latin data.

The screenshot below shows the correct filter to use in LibreOffice Calc. If you are using MS-Office, you should see something similar.

naveed_03

Koha’s MARC modification templates come to the rescue

A client had defined an item type they called “Hindi graphic books”, using the notation “H-GR” to denote such an item, e.g. an Amar Chitra Katha graphic novel. Their data was stored in an MS-Excel spreadsheet as simple tabular data, and we were tasked with moving this data into Koha. So, to start off, we defined a new item type in Koha with the code “H-GR” and the description set to “Hindi Graphics”. The new item type correctly showed up on the OPAC advanced search under the “Item type” search filter. We then converted the data in Excel into MARC21 records using MarcEdit’s “Delimited Text Translator” tool and imported the resulting .mrc file into Koha. The import went well. However, when we searched for the newly entered records, we found that on the OPAC results page the item type showed up in the search facet area as “H-GR” instead of as it should – “Hindi Graphics”. Uh oh! Something was certainly wrong!

itemtype_error
Figure 1: Incorrect item type displayed

Using MarcEdit we could see that this had happened because an additional space had accidentally been padded onto the item type data during data migration. To the system, “H-GR” is NOT EQUAL to “H-GR ”. *Every* single record in this particular batch of imported MARC records had its 942$c and 952$y fields affected.

itemtype_error_01
Figure 2: Extra whitespace padding highlighted.

Thankfully for us, Koha ships with the MARC modification template tool, which is basically a “find and replace” feature on steroids! So in the end we simply defined a rule that allowed us to replace all the affected 942$c values in a very simple, single batch operation.

itemtype_error_06A

With the template (we named it ‘correctH_GR’) in place (see above), all we now had to do was gather the ids (i.e. biblionumbers) of the affected records and then tell Koha to run the template against this batch of biblionumbers, using the Batch record modification tool as shown below.

itemtype_error_04

After this, the Batch item modification tool was used to update 952$y, which is actually stored in the items.itype field. The procedure is as above, with just one difference: instead of biblionumbers we now need the list of barcodes of all the H-GR items in order to modify them.

itemtype_error_05A

And finally our original problem is solved!

itemtype_error_06

PRO-TIP: The MARC modification template tool is not only useful for post-facto correction. It can be used even while staging MARC records in the first place, allowing us to make wide-ranging changes and updates to our incoming MARC records.

MarcEdit QuickTip #1

Introducing the MarcEdit quick tip series

If, like us, you too work with data conversion into MARC21, MarcEdit is probably a trusty tool in your kit. However, there are a few “gotchas” that can leave any MarcEdit newbie confused or wasting time figuring things out. This series of short tuts will focus on providing quick tips that may go a long way toward alleviating some of these.

Tip #01 – MarcEdit does not like accessing open files

You are trying to use the nifty Delimited Text Translator tool from the Add-ins menu, and you seem to be doing everything right, as shown in the screenshot below:

marcedit_error00

But instead of reading the file, MarcEdit throws up a largish window full of errors like this:

marcedit_error01

And if you click on “OK” above, you are shown this window, which is basically MarcEdit telling you “NO CAN DO” in no uncertain terms.

marcedit_error02

TIP: The file you tried to give to MarcEdit as the “Source File” is open in another application, and MarcEdit failed to open it for reading (since it is already open). Close it and try again. This time you should not have this problem.

 

Setting up a MARC21 ETD Framework in Koha

Recently, during a discussion on a WhatsApp group of professional colleagues from the North East, a topic came up: what is better suited for setting up an ETD repository at their academic institutions? As expected, most sided with DSpace, while a few suggested EPrints. I decided to introduce Koha ILS into the picture. For most, this was a rather surprising suggestion.

The ground reality

95% of Indian ETD operators do little more than scan a batch of printed documents (bound thesis volumes) or take born-digital electronic copies of theses, make them into PDF files, throw in some metadata about the item and plug that into (usually) DSpace. The benefits of DSpace are statistics, organisation into collections and communities (user groups), embargo capability, and faceted and full-text searches across the metadata. There is of course the other point of persistent URLs to the hosted resources via the HANDLE system, but given that very few Indian repos actually invest in a Handle.net account, it is quite a moot point.

The argument for Koha

Utilizing MARC21, Koha already offers a highly granular and extensive metadata regime. Its capabilities include collections, search facets and full-text search on the metadata, and, since Koha version 3.22, the ability to directly host files linked to the bibliographic records. Basically, it offers everything that 95% of Indian institutions look for when they are planning to set up a repository. Compared to DSpace, LIS professionals are already better acquainted with Koha (or, as many in India like to call it, “KOHA” :-P), as it is the de-facto open source LMS. Further, by using Koha for hosting the repository, institutions and professionals who are already using Koha as their LMS gain one key advantage: they no longer need to maintain two separate and very dissimilar systems that use completely different software application stacks (LAMP vs Java; MySQL vs PostgreSQL etc.).

Step 1 – Building a MARC21 ETD framework

I had promised my colleagues to set up a demo ETD using Koha so that they could try it out. After all, there’s no better proof of the pudding than eating it. The first challenge to that is that while Koha does ship with quite a few MARC21 frameworks, an ETD framework is not one of them.

As my starting point, I turned to ETD-MS v1.1, which is considered to be something of a gold standard for ETDs. Taking the help of this page, I worked out a single-page worksheet for cataloging ETDs on Koha. The result looked like this:

rawmarc

Step 2 – Usability and maintainability

Of course this posed a slight problem: how will a cataloger accustomed to / trained on DSpace’s metadata namespace correctly do the crosswalk? To solve this in line with Koha’s recommended best practices for UI mods, jQuery statements were included in Koha’s “intranetuserjs” and “intranetusercss” system preferences. The final outcome was this:

converted_marc

This approach has three (03) key benefits – (a) you DO NOT need to touch Koha’s Template::Toolkit based templates (i.e. the .tt files); the changes made are stored in your database and applied at runtime; (b) these changes remain persistent and work across Koha’s monthly version upgrade cycle (since we didn’t change the .tt files); and (c) our other MARC21 frameworks are left alone – only the ETD framework is thus modified.

The jQuery snippet is available on GitHub as a gist, as is the CSS included in the intranetusercss system preference. There is also a third gist which lists the 04 authorised value categories and their constituent sample options, which help us introduce a consistent controlled vocabulary into our metadata.
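
To give a flavour of what the gist does, here is a minimal sketch of the kind of runtime relabeling involved. The framework check and the label selector below are illustrative assumptions on our part – the actual selectors and labels used live in the gists listed under Resources:

$(document).ready(function(){
  // Act only in the cataloging editor, and only when the ETD
  // framework (assumed here to use the code "ETD") is loaded.
  if ( $('#cat_addbiblio').length &&
       $('input[name="frameworkcode"]').val() === 'ETD' ) {
    // Relabel a MARC field in ETD-MS / Dublin Core terms so that a
    // cataloger trained on DSpace can do the crosswalk at a glance.
    $('label[for^="tag_245_subfield_a"]').text('Title (dc.title)');
  }
});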

The demo ETD is available at http://etddemo-opac.l2c2.co.in/. If you wish to access the staff client back-end, contact me via the comments section of this post.

Best wishes!

Resources:

  1. http://blog.l2c2.co.in/wp-content/uploads/2016/06/export_ETD.ods
  2. https://gist.github.com/l2c2technologies/09e7e06f695304f33aada9b529167de6
  3. https://gist.github.com/l2c2technologies/9bc1f9b812e37850959c655fbc0f8802
  4. https://gist.github.com/l2c2technologies/6b735a5e1f4c4f9cc3042af2b8fa5b32

Licensing – CC 4.0 – BY-SA-NC – (c) L2C2 Technologies 2016