Ramakrishna Sarada Mission Vivekananda Vidyabhavan Library partners with L2C2 Technologies to take their catalogue online

We are happy to extend a warm welcome to our newest client-partner Ramakrishna Sarada Mission Vivekananda Vidyabhavan (RKSMVV). The college has been a long-time Koha user, having started in 2011. We are happy to note that their entire physical collection of books have been catalogued into Koha and soon we’ll be updating the OPAC with their up-to-date MARC21 data.

The OPAC is available at https://rksmvvlibrary.in/ and is running currently on Koha 3.22.10 which was released on August 25, 2016.

L2C2 Technologies acknowledges with gratitude the co-operation extended by the library team at RKSMVV. We would particularly like to thank Devatmaprana Mataji the library in-charge, Smt. Parama Sarkhel, Librarian and Smt. Mousumi Adak, Guest Librarian for their trust and unstinting support.

About RKSMVV

Ramakrishna Sarada Mission Vivekananda Vidyabhavan is a partly residential degree college for women affiliated with the West Bengal State University, India. Colloquially more commonly known as “Sarada Mission College” or simply as “Sarada College”, it was the first educational institution started by Ramakrishna Sarada Mission. It came into existence in 1961 as an effort to carry out Swami Vivekananda’s ideals of education among women. Housed initially in a one-roomed makeshift structure, with thirty one students and a handful of young, idealistic and enthusiastic teachers and monastic members, the college today stands among the premier women’s institutions of the state. (Source: Wikipedia)

Koha spine label is not printing the “/” in your call numbers? Here is why.

If you have defined DDC as your classification source and have a “/” in your Koha item call number, it is not going to be displayed when generating spine labels. If you are in hurry or you are aware of the segmentation mark, you can jump straight to the section The Answer.

The “Problem”

Earlier in the day a fellow user Dyuti Samanta came up with a question :

“Sir, I’m trying print spine labels from Koha. However I see that Koha does not print the the front slash (“/”) in my itemcallnumber, even though the same is recorded in my MARC record and is otherwise displayed by Koha elsewhere. For example, the “CHA / L” in “025.4 CHA / L” is being printed as “CHAL”. So where is the problem, how can I fix it?”

The Background

Dyuti’s question made me smile. And instead of immediately telling him about the “why” I pointed him to a comment left by Anamika Das on Vimal Kumar Vazaphally‘s blog post – “Spine label creation” saying “You are not alone with that question! ;-)”.

A call number typically consists of Dewey class number + book number i.e. Cutter number (or some other means of alphabetic arrangement). The frontslash “/” is deemed as a segmentation mark (ala prime mark of in C-I-P records) in the universe of Dewey Decimal Classification[1]. Up until DDC 22 published in 2003 [2], the slash or the prime mark was used to mark the start of every standard subdivision (notation from Table 1) as well as the end of the Abridged number. However, this rule changed from DDC 22 onwards (September 1, 2005 to be exact) and remains extant for the current edition i.e. DDC 23 published in 2011. The new rule has been that only *one* single segmentation mark may be used and that too only for marking the end of the abridged number [3].

A prior and post example straight from Library of Congress

Before DDC 22 – 551.21/09797/84

DDC 22 onward – 551.210979/84

Further, if you follow LC and OCLC norms, while Dewey class number in MARC21 field 082 can definitely have (since Sep 1, 2005) a *single* segmentation mark, the call number should never have one. With this background story in place let’s look at Koha to understand what is happening here.

The Answer

The particular Koha code that has taken out the the slash from both Dyuti and Anamika’s call numbers resides in C4::Labels::Label Perl module which is located at /usr/share/koha/lib/C4/Labels/Label.pm. Even more specifically, it is the _split_ddcn subroutine in Label.pm that is taking out the “/“. As we have already noted, under LC rules, call numbers (unlike Dewey class numbers in 082) can’t have segmentation marks. Thus it takes out any “/” embedded in your call number while processing the spine label. Very specifically, it is this line in the _split_ddcn subroutine: s/\///g; # in theory we should be able to simply remove all segmentation markers and arrive at the correct call number that does it. And just why does _split_ddcn get invoked? Well, it is because of something you did during cataloging, remember that you had recorded DDC as the classification schema? It is that definition in your MARC record that calls in this sub 😀

Below you can see the _split_ddcn subroutine as on date of this post.

sub _split_ddcn {
    my ($ddcn) = @_;
    $_ = $ddcn;
    s/\///g;   # in theory we should be able to simply remove all segmentation markers and arrive at the correct call number...
    my (@parts) = m/
        ^([-a-zA-Z]*\s?(?:$possible_decimal)?) # R220.3  CD-ROM 787.87 # will require extra splitting
        \s+
        (.+)                               # H2793Z H32 c.2 EAS # everything else (except bracketing spaces)
        \s*
        /x;
    unless (scalar @parts)  {
        warn sprintf('regexp failed to match string: %s', $_);
        push @parts, $_;     # if no match, just push the whole string.
    }

    if ($parts[0] =~ /^([-a-zA-Z]+)\s?($possible_decimal)$/) {
          shift @parts;         # pull off the mathching first element, like example 1
        unshift @parts, $1, $2; # replace it with the two pieces
    }

    push @parts, split /\s+/, pop @parts;   # split the last piece into an arbitrary number of pieces at spaces
    $debug and print STDERR "split_ddcn array: ", join(" | ", @parts), "\n";
    return @parts;
}

Note: The _split_ddcn was first submitted to the Koha codebase as part of C4::Labels::Label module by Chris Nighswonger on Jul 20, 2009, by which time the LC’s single segmentation mark rule was already long in place.

So now what?

There are a few options available to you at this point.

(a) If you know what you are doing, you can modify the _split_ddcn sub routine so that it does not discard the “/” and handles the call number as you want it to. (Non trivial and not recommended)

dontsplit

(b) Go to “Manage Layouts” and editing your specific layout by un-checking the option “Split call number“. If you do this then your call number will be printed AS-IS as a single line of text. This means, if the call number is longer that the size of your labels, as they will be at several point in time, you have a *problem*

(c) Keep an eye out to this bug report filed by Katrin Fisher from earlier this year, where she has said:

Currently the call number splitting seems to be mostly implemented for DDC and LC classifications. Those are both not very common in Germany and possibly other countries. A lot of our libraries use their own custom classification schemes so the call number splitting is something that should be individually configurable.

The bad new is that so far no one has responded to this bug, simply because to Koha developers servicing clients using LC / DDC, this is not a priority. So either you can wait with the hope that someone soon will attend to this bug OR you write this functionality yourself OR you sponsor a developer to write it for you.

(d) Take the item call number listing out of Koha as a CSV file and use a 3rd party tool, e.g, gLabels to generate your spine labels.

References:

[1] https://www.loc.gov/aba/dewey/segmentation.html

[2] Dewey_Decimal_Classification – Administration_and_publication

[3] “Sweet segment solution” from 025.431: The Dewey blog

Koha OPAC over SSL breaks GoogleIndicTransliteration

GoogleIndicTransliteration is a nifty Koha feature allowing easy typing and searching in several Indian language to Indian users. However, a bug prevents it from working if the OPAC is run over SSL (i.e. https). This post provides a clear description and a fix for the problem.

Many Indian Koha users use the GoogleIndicTransliteration option to offer their users the facility to search in Indian languages on the OPAC. This nifty feature allows users to phonetically type in their search queries in Indian languages in order to search catalogs that are (a) multi-lingual or (b) in a Indian language other than English.

mixed_03

However, if you are security minded (and you *should* be if your OPAC is on the Net and allows your users to log in) and you decide to serve your site over to SSL (i.e. https), then guess what? The GoogleIndicTransliteration feature stops immediately with the browser console showing MIXED CONTENT error. Every single Koha version from 3.18.0 (when this feature made its way back into Koha after a long hiatus) up to the latest 16.05.2 (released on August 1, 2016) are affected by this problem.

mixed_02

I do not have time, just show me how to fix this

If you are in a hurry, jump over to the section “Your options until the patch is officially released” at the end of this post. Remember to read the caveat and the assumption, you have been warned! 😉

Why is HTTPS so important?

Let’s take a moment to understand why HTTPS is so important. Let’s assume that your Koha server is on your institutional LAN / intranet or hosted online, either on the cloud or on your own server connected to the Internet via a leased line.

Without HTTPS, every time you login into Koha (staff and/or OPAC) and perform *any* ILS transactions (e.g. patron contact information change, holds, fines, circulation etc) all of that information is available in PLAIN TEXT to everyone on your network.

If you are only connected to your institutional network, then that is the direct extent up to which anyone can see what you are doing. If your server is accessible over the Internet, then basically the whole wide world can see what you are doing. For instance, when you login over HTTP, it is actually the equivalent of writing down your username and password on a postcard and mailing it across the globe. Anyone who handles it during transit, or wants to, can simply read it. That’s why the world is moving away from the plain vanilla HTTP.

postcard

In simple terms, HTTPS on the other hand creates an end-to-end encrypted “tunnel” between your server and the browser that is requesting access (e.g. to the OPAC). Think of it as a secure, sealed box with the contents inside and only you, the user, have the “key” to unlock it. The actual process is depicted in the graphics below:

Image source : https://www.identrustssl.com/

Briefly HTTPS has 3 main benefits:

(a) Authentication
(b) Data integrity
(c) Secrecy

None of these are provided by HTTP, thus if your Koha server is online, the SSL (HTTPS) is simply a must these day!

The Basics Explained

GoogleIndicTransliteration feature utilizes a Google API designed for phonetic input of several Indian languages by transliterating text written in English on the fly to its Indian language equivalent. For example, if you type “Rabindranath” and it is set to transliterate to Bengali, the software will automatically convert to “রবীন্দ্রনাথ” or say “Premchand” to “प्रेमचाँद” if set for Hindi.

As with every Google API (and there are many), the Transliteration API too needs to be loaded by a minified Javascript API loader program, known simply as the “Google API Loader“.

How it works

Once GoogleIndicTransliteration system preference is set to “Show” from the Koha staff client, the code inside the file opac-bottom.inc loads up the API loader code available at www.google.com/jsapi, which in turns provides the framework so that the actual transliteration code available in the file googleindictransliteration.js can work its magic and provide the users with the transliteration feature.

GoogleIndicTransliteration system preference
The GoogleIndicTransliteration system preference is set to “Show” on the OPAC.
Why does HTTPS break it but not HTTP?

Short answer: Mixed context!

Long answer: HTTPS is important to protect both your site and your users from attacks online. As of now, Koha code in opac-bottom.inc calls the jsapi code over HTTP, instead of letting the browser handle it correctly based on the security context (i.e. whether the page is being served over HTTP or HTTPS). So when OPAC is on HTTP, jsapi is fetched over HTTP, things are on the same page. However, when the OPAC is served over HTTPS and jsapi continues to be fetched over HTTP, all modern browsers will flag it as a security violation known as “MIXED CONTENT” and halts the loading of jsapi, as seen in the screenshot below:

mixed_04
Error shown in Chrome’s browser console

As a result, googleindictransliteration.js has nothing to work with. End result, the GoogleIndicTransliteration feature does not work anymore! Bingo! We’ve found ourselves with a Koha bug!

Present status of bug

There is a patch submitted to Koha Bugzilla against Bug 17103 – Google API Loader jsapi called over http, waiting for sign-off and QA. Once it clears Koha’s project governance processes, it is expected to get pushed to the master and then be released with a stable version of Koha. Once that happenes we won’t have this issue anymore.
NOTE: Expect this fix to get backported across the current supported older releases.

Your options until the patch is officially released

(a) Do without GoogleIndicTransliteration feature until the fix is officially released by the Koha project if you are using HTTPS

OR

(b) Edit your “opac-tmpl/bootstrap/en/includes/opac-bottom.inc” file. Find the following section:

[% IF ( GoogleIndicTransliteration ) %]
    <script type="text/javascript" src="http://www.google.com/jsapi"></script>	
    <script type="text/javascript" src="[% interface %]/[% theme %]/js/googleindictransliteration.js"></script>
[% END %]

Replace the protocol notifier “http:” from jsapi URI with “https:“and save the file. It should look like this after the change:

[% IF ( GoogleIndicTransliteration ) %]
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>	
    <script type="text/javascript" src="[% interface %]/[% theme %]/js/googleindictransliteration.js"></script>
[% END %]

CAVEAT: If you are doing this edit, it is assumed that you know what you are doing. If you make any mistake and break something during this, its all on you.

ASSUMPTION: This edit assumes you are on Koha 3.18.x and later and is using a .deb package based installation on Debian or Ubuntu.

New feature preview – Quick create patrons

Koha 16.11 which is set to be released sometime in Nov 2016 will include a new and useful feature [1] – to be able to quickly add a new patron. This request had been a long pending one. Afterall the request for this enhancement was posted on the Koha bugzilla almost 7 years back.

Those who are testing the Koha unstable release will be able to see it in action already, for the rest of you here are a couple of screenshots to whet your appetite. 🙂

Select the "Quick add new patron" button
Fig 1 :Select the “Quick add new patron” button
quickcreate_patron_02
Fig 2: The quick add form (with option to open up the full form)

Reference: [1] Bug 3534 – Patron quick add form

A custom subject-wise report of titles with author name, no. of copies, subject name in serialized listing

A custom SQL report for Koha that generates subject wise title lists with author name, no. of copies, subject name and biblionumber, written in response to a reader query over email.

Last week Mr. Gautam Mukhopadhyay, Librarian, Chandrapur College in Burdwan, West Bengal wrote in with a request:

Respected Sir,

I’m writing this seeking a solution for the problem relating to a report generation from Koha. I want to get a list of titles under a particular broader subject field-tag (650). Quite a number of times I’ve checked from SQL Report. But all were in vain as those were not the same what I actually want to get. Following is the specimen of the opted report:
Sr. No.   Title     Author   Copy No.    Subject
  1         ……….     X          3             Bengali
Under the subject Bengali or English or whatsoever, I want to get the titles those are belong to that particular subject. However, it won’t be a problem if there are different reports for different subjects. It’s Ok. But the SQL Query should be a general query structure that can be applicable for all such reports on the titles belong to a broader subject like Bengali, History, Geography etc.
Sir, please let me know the query structure, if possible.
Regards,
GM

 Here is a possible solution for his request, which pretty much does what Mr. Mukhopadhyay had specified in his request. In this example we’ll use a sample MARC21 file which can be downloaded from here to try out this example. This dataset has a 14 unique bibliographic records with a total of 42 item (holdings) record, belonging to 03 specific broader subjects i.e. English, Economics and Political Science. As per Mr. Mukhopadhyay’s use-case, the MARC field 650 holds the broader subject classification. However, to match real world scenarios the 650 fields in some of the cases have other subject headings defined including narrower terms. Also additionally we are going to add an additional column to our report – the biblionumber, so that if required we can cross check a title in the report generated against the biblionumber in the database.

CAVEAT EMPTOR: If you are going to try out this example, we suggest that you define a new Koha library and import this MARC file into it. Mixing this sample data with your existing records is strongly advised against.

Step #1 – Create a new Koha instance and set it up
We are going to use the koha-create Debian command to create a new Koha instance and we shall call our instance as demo.
sudo koha-create --create-db demo

You may calls your instance by whatever name you like. If you are not aware of the koha-create command, please read up “Commands provided by the Debian packages“. Next we will do a default setup and proceed to define a Library that we’ll call “L2C2 Technologies Demo Library” identified by the code “MAIN”, using these instructions here.

NB.: To use the marc file used in this example you must set the library code for your demo branch as “MAIN”, the name (of the library branch) can be whatever you want it to be.

Step #2 – Define a new Authorize value category
Since our example marc file has biblios with only (a) English (b) Economics and (c) Political science, we will define a new authorized value category which we’ll call as SUBLOOKUP, under Home › Administration › Authorized values. Once setup our new authorized value SUBLOOKUP will look like this:
gmreport_02
This authorized value list will provide the subject selection list for our custom SQL report. So if you have more subjects you will need to add them here so that they look like this. The “%” in the Authorized value is *critical*, and if you want to be really strict about it, you can drop the preceding “%” and retain only the one at the end. However should you do that, your first 650 field *must* always be the broader subject heading that you wish to filter your report on.
Step #3 – Define our custom SQL report
We will go to Home › Reports › Guided reports wizard › Create from SQL and create a new SQL report. In this case, we’ll name the report as “List title with number of copies filtered by subject”, add a note that says – “A report written at the request of Mr. Gautam Mukhopadhyay, Chandrapur College, BWN”. The SQL will be as given below. The report once saved, will allow us to run it.
SELECT 
 (@row:=@row+1) AS `S/N`, 
 gmData.Title, 
 gmData.Author, 
 gmData.Copies, 
 REPLACE (@TargetSubject:=<<Select the subject|SUBLOOKUP>>, '%', '') AS Subject, 
 gmData.biblioid AS `Biblionumber` 
FROM 
 (SELECT
 biblio.title AS Title, 
 biblio.biblionumber as biblioid, 
 ExtractValue(biblioitems.marcxml,'//datafield[@tag="245"]/subfield[@code>="c"]') AS Author, 
 count(items.itemnumber) AS Copies, 
 ExtractValue(biblioitems.marcxml,'//datafield[@tag="650"]/subfield[@code>="a"]') AS Subject 
 FROM 
 items 
 LEFT JOIN 
 biblioitems on (items.biblioitemnumber=biblioitems.biblioitemnumber) 
 LEFT JOIN 
 biblio on (biblioitems.biblionumber=biblio.biblionumber) 
 GROUP BY 
 biblio.biblionumber 
 ORDER BY 
 biblio.biblionumber) as gmData, 
 (SELECT @row := 0) r 
 WHERE Subject LIKE <<Re-select the subject tag|SUBLOOKUP>>
Let us take a moment to understand what this piece of SQL syntax really means.
(@row:=@row+1) AS `S/N`,

and

(SELECT @row := 0) r

The use of the @row variable and the counter (@row:=@row+1) gives us our “serial number” column in the report listing.  We can also see the authorized value list “SUBLOOKUP” that we had defined earlier referenced here in the SQL.

NOTE: As you may note, we are asking the user to select the subject *twice*, (first time: ‘Select the subject’ and second time: ‘Re-select the subject tag’). While theoretically we should not be required to do so, thank to the use of the runtime variable @TargetSubject, in reality we ran into a type casting error (see below), thus we used this less than pretty way of asking the user to select the subject twice, to get our job done.

gmreport_03

Step #4 – Running the report

After the report is saved, it is now time to run it, using the “Run report” option. What we’ll see now will be like this:

gmreport_04

We need to select the *same* subject from both the drop-down lists and click on “Run the report” button. Selecting “Economics” we shall in our case get the following report:

gmreport_05

Step #5 – Prettifying the custom report user interface

Having the user to select the subject twice is cumbersome as well prone to human error, so we decided it is time for some jquery magic to streamline this and leave the users with one only a single drop-down to choose from. For this we’ll turn turn to the IntranetUserJS system preference and add the following jquery snippet:

 $("label[for='sql_params_Reselectthesubjecttag']").hide()
 $('#sql_params_Reselectthesubjecttag').hide();
 $("#sql_params_Selectthesubject").change(function() {
   var subval = $('#sql_params_Selectthesubject').val();
   $("#sql_params_Reselectthesubjecttag").val(subval);
 });

If this is the first time you are hearing about the IntranetUserJS system preference, you should definitely read up this. Those of you who are indeed familiar with IntranetUserJS, all we are doing here is to (1) hide the second subject selection dropdown and its label and then (2) we are defining that whenever the user chooses a value from the *first* drop-down, the second (and now hidden) drop-down should also have the same value selected automatically. After saving the IntranetUserJS update, on running the report we shall see this:

gmreport_06

And bingo! We are done!
Extraa Innings: To see the actual report in action
  1. Go to the URL https://demo-staff.l2c2academy.co.in/
  2. Use User name / Password : demo / demo
  3. Go to the section Home › Reports › Guided reports wizard › Saved reports
  4. Select “Run” from the “Actions” dropdown at the right.gmreport_07
  5. Play with the subject selection options to see the different outcome.

 

 

This blog got featured in IASLIC’s June 2016 newsletter

It was nice to see IASLIC’s (Indian Association of Special Libraries and Information Centres) Newsletter for June 2016 feature one of L2C2 Technologies blogpost on Koha Integrated Library System‘s version numbering changes. The IASLIC newsletter can be access from here.  See page 5 of 8 under the section “Technology News”.

IASLIS_JUNE_2016

MarcEdit QuickTip #3 – Getting your 952 (items / holdings data) field in place for import into Koha

Shows how to de-duplicate a .mrc file, by merging duplicate bibliographic records spread all over the file and then gather up the holdings record into repeatable 952 field that Koha expects its for item records.

Last night Pawan Sharma, a fellow user on “Koha Users” reached out for some help with importing his items into Koha. Like many other, he too had moved his catalog data from Microsoft Excel spreadsheet to MarcEdit utilizing MarcEdit’s “Delimited Text Translator” feature which at the end of the process had given him a .mrc file. This he proceeded to upload into Koha by using the More > Tools > Catalog > Stage MARC records for import option.

There were no surprises here, *except* that for every single books with multiple copies Koha imported each of the copies as a separate biblio record, instead of a single entry for the biblio with multiple item records attached to it via Marc21 952 repeatable field that Koha uses for managing holdings data. Simply put his data needed to be de-duplicated with the holding data merged back before import, typically using the ISBN number of the records (MARC field 020).

NOTE: If you wish to read more about Koha’s holding records schema see “Holdings data fields (9xx)” from the Koha Community wiki.

For someone who has not done this before, MarcEdit’s de-duplication and then merging it can seem like a daunting task. This post will hopefully demystify the process.

The discussions on Koha Users were based on a lot of assumptions, especially with no idea about Pawan’s data. So, I offered to take a look at it. He first sent me a .mrc file that had 12806 records, which I immediately converted into MarcEdit’s MarcBreaker mnemonic, human readable format.

marcedit_01

And proceeded to take a “Field count” report (see under “Tools” menu of MarcEditor) to check exactly how many records had ISBN (MARC21 field 020) out of the total number of records.

marcedit_01A

The result as can be seen above – NOT A SINGLE ONE of the 12806 biblio records had an ISBN number! Well, this file can be de-duplicated and merged, but *not* using MarcEdit. Only being told about this Pawan mentioned that he had other .mrc files that had ISBN and so he sent a second .mrc (LG-32016-32979.mrc) file over. Turns out of the total of 965 biblio records in this second file, 828 records had ISBN numbers defined.

marcedit_02A

The next task was to extract the records that *had* ISBN numbers. The remaining 137 can not be dealt with in this process and will have to be dealt with separately. For now, we closed the file LG-32016-32979.mrk file with 965 records and went back to the MarcEdit main window in order to use the “Delete Selected Records” option available under Tools > Select MARC Records

marcedit_02B

The next few steps are simple, if not immediately apparent to a new user of MarcEdit. We’ll use the numbered markers on the screenshot to explain it in steps. First, we selected the LG-32016-32979.mrk file with the 965 records in step #1; next we typed in 020 (since we want to match for ISBN) in the Display Field option (by default it shows 245$a); third step was to click on “Import File” button. After the file is imported (takes just a second or two depending on your file size) this the top-left data grid which was blank so far, will show up data similar to this. Finally in step #4, we will click the “Does Not Match” link. Records that do not have an ISBN number will be selected just like the big red arrow here shows.

marcedit_02CThe last step is to click on “Delete Selected”, this will open a File Save dialog with the title “Remaining Records”. In the case, we provided the name LG-32016-32979_ISBN.mrk and saved it and exited from this deletion utility.

This file LG-32016-32979_ISBN.mrk now has the 828 records with ISBN numbers and each of which has a holding records. This is what we will work with for the deduplication process.

marcedit_02D

Using the Tools > Record Deduplication option of MarcEditor, we will now remove the duplicate records into a separate file and save it with the name LG-32016-32979_ISBN_DEDUP.mrk. We will use ISBN as the field to use to identify duplicates. A popup showed us that 828 records processes, so we are done with deduplication. We will also need to save our original work file LG-32016-32979_ISBN.mrk. This file now contains biblio records with unique ISBN number. A quick check with the Fields Count tool showed us there were now 523 records (down from 828 records originally, the rest 305 records are the duplicates that are now saved in LG-32016-32979_ISBN_DEDUP.mrk).

marcedit_02E

 Now for the next step MARC Merge, which was the last step in this process. We have to go back to the main MarcEdit window and use the menu option Tools > Merge Records. The order of files we specify here is highly *important*. The “Source File” in this case was LG-32016-32979_ISBN.mrk (the file with the 523 records with unique ISBN numbers), the “Merge File” is LG-32016-32979_ISBN_DEDUP.mrk (the file where we had removed the duplicates to in the previous step) and finally, “Save File” is simply the name of the new merged file we are going to create (Hint: this is the final file that we will push to Koha). We named the final file as LG-32016-32979_ISBN_MERGED.mrk. The Record Identifier is of course 020 (i.e. ISBN number) and we move on the next screen.

marcedit_02F

This is next step is basically *everything* we have been working for in this post so far, we select the field to merge in from “Merge File” into the “Source File” and click next.

marcedit_02GIn this case everything went well and we were presented with the following screen that said “Merge Completed” and gave us the full path and filename to our merged file LG-32016-32979_ISBN_MERGED.mrk.

marcedit_02H

Of course we opened up the LG-32016-32979_ISBN_MERGED.mrk file in MarcEditor. The first thing was to check the Field Count report, and this is what we saw 523 biblio records with a total of 828 holding records, which sounds right! Below is example of the merged holdings.

marcedit_02I

Of course there is still the task of exporting the MarcBreaker (.mrk) back to .mrc so that Koha can ingest it for its MARC21 staging workflow, but everyone knows that 🙂

NOTE: For reference to this tutorial I’m attaching the zip file containing all the LG-32016-32979 files used in this example.

Planning to bulk import your patrons? Make sure you do not have in-line line breaks in your data.

In-line line breaks in a CSV file can really send your Koha patron import script into a tailspin. Here is what you need to watch out for and the couple of other gotchas which will make you upgrade your Koha instance if the version you are using is less than 3.22.7.

Last week a friend working at a local college approached me for a spot of help. He was trying to import his patrons into Koha but was failing miserably. After he nearly got his head snapped off (Me: Do I look like I’m in the fortune telling profession???) he agreed to send over his data – an MS-Excel sheet for me to take a look at.

I pulled up a 3.22.6 instance I had laying around and tried to import his data. Quite expectedly, there were errors galore and the pretty much the same ones he was complaining about.

blog_patron_1

Hang on! categorycode, branchcode and surname fields were NOT missing in *any* single record. So what was going on here??? The most interesting to note here is that patron importer script said :

272 not imported because they are not in the expected format

272 records parsed

Now this was really something as the total number of student records in that patron uploader CSV file were only 144. So where does the number 272 come from?

The answer to this was easy to find. My friend’s data had several records in a rather bad shape – they had embedded line-breaks within the cell. I’ve highlighted the first few of the badly formatted cells with yellow in the screenshot below.

blog_patron_3A

So, I copied the first 28 records over to a new file, ran a hackish utility script to clean out the line breaks and saved these 28 records as a new file and proceeded to upload it. This time of course “the fat lady sang”[1] i.e. the records got imported nice and we were done!. 😀

blog_patron_3NOTE:  Of course while doing that we encountered a few Koha bugs as well – Bug id 15840 and Bug id 16426. The work-around mentioned in comment #16 of the latter bug, by Koha QA Manager Katrin Fischer holds good, in case you get stuck here and can’t immediately upgrade. Otherwise to avoid these to bugs, your real option is to upgrade your Koha instance, something that I’m going to recommend to my friend (aside from him fixing his data).

Reference: [1] Wikipedia “It ain’t over till the fat lady sings”