MarcEdit QuickTip #3 – Getting your 952 (items / holdings data) field in place for import into Koha

Shows how to de-duplicate a .mrc file, by merging duplicate bibliographic records spread all over the file and then gather up the holdings record into repeatable 952 field that Koha expects its for item records.

Last night Pawan Sharma, a fellow user on “Koha Users” reached out for some help with importing his items into Koha. Like many other, he too had moved his catalog data from Microsoft Excel spreadsheet to MarcEdit utilizing MarcEdit’s “Delimited Text Translator” feature which at the end of the process had given him a .mrc file. This he proceeded to upload into Koha by using the More > Tools > Catalog > Stage MARC records for import option.

There were no surprises here, *except* that for every single books with multiple copies Koha imported each of the copies as a separate biblio record, instead of a single entry for the biblio with multiple item records attached to it via Marc21 952 repeatable field that Koha uses for managing holdings data. Simply put his data needed to be de-duplicated with the holding data merged back before import, typically using the ISBN number of the records (MARC field 020).

NOTE: If you wish to read more about Koha’s holding records schema see “Holdings data fields (9xx)” from the Koha Community wiki.

For someone who has not done this before, MarcEdit’s de-duplication and then merging it can seem like a daunting task. This post will hopefully demystify the process.

The discussions on Koha Users were based on a lot of assumptions, especially with no idea about Pawan’s data. So, I offered to take a look at it. He first sent me a .mrc file that had 12806 records, which I immediately converted into MarcEdit’s MarcBreaker mnemonic, human readable format.

marcedit_01

And proceeded to take a “Field count” report (see under “Tools” menu of MarcEditor) to check exactly how many records had ISBN (MARC21 field 020) out of the total number of records.

marcedit_01A

The result as can be seen above – NOT A SINGLE ONE of the 12806 biblio records had an ISBN number! Well, this file can be de-duplicated and merged, but *not* using MarcEdit. Only being told about this Pawan mentioned that he had other .mrc files that had ISBN and so he sent a second .mrc (LG-32016-32979.mrc) file over. Turns out of the total of 965 biblio records in this second file, 828 records had ISBN numbers defined.

marcedit_02A

The next task was to extract the records that *had* ISBN numbers. The remaining 137 can not be dealt with in this process and will have to be dealt with separately. For now, we closed the file LG-32016-32979.mrk file with 965 records and went back to the MarcEdit main window in order to use the “Delete Selected Records” option available under Tools > Select MARC Records

marcedit_02B

The next few steps are simple, if not immediately apparent to a new user of MarcEdit. We’ll use the numbered markers on the screenshot to explain it in steps. First, we selected the LG-32016-32979.mrk file with the 965 records in step #1; next we typed in 020 (since we want to match for ISBN) in the Display Field option (by default it shows 245$a); third step was to click on “Import File” button. After the file is imported (takes just a second or two depending on your file size) this the top-left data grid which was blank so far, will show up data similar to this. Finally in step #4, we will click the “Does Not Match” link. Records that do not have an ISBN number will be selected just like the big red arrow here shows.

marcedit_02CThe last step is to click on “Delete Selected”, this will open a File Save dialog with the title “Remaining Records”. In the case, we provided the name LG-32016-32979_ISBN.mrk and saved it and exited from this deletion utility.

This file LG-32016-32979_ISBN.mrk now has the 828 records with ISBN numbers and each of which has a holding records. This is what we will work with for the deduplication process.

marcedit_02D

Using the Tools > Record Deduplication option of MarcEditor, we will now remove the duplicate records into a separate file and save it with the name LG-32016-32979_ISBN_DEDUP.mrk. We will use ISBN as the field to use to identify duplicates. A popup showed us that 828 records processes, so we are done with deduplication. We will also need to save our original work file LG-32016-32979_ISBN.mrk. This file now contains biblio records with unique ISBN number. A quick check with the Fields Count tool showed us there were now 523 records (down from 828 records originally, the rest 305 records are the duplicates that are now saved in LG-32016-32979_ISBN_DEDUP.mrk).

marcedit_02E

 Now for the next step MARC Merge, which was the last step in this process. We have to go back to the main MarcEdit window and use the menu option Tools > Merge Records. The order of files we specify here is highly *important*. The “Source File” in this case was LG-32016-32979_ISBN.mrk (the file with the 523 records with unique ISBN numbers), the “Merge File” is LG-32016-32979_ISBN_DEDUP.mrk (the file where we had removed the duplicates to in the previous step) and finally, “Save File” is simply the name of the new merged file we are going to create (Hint: this is the final file that we will push to Koha). We named the final file as LG-32016-32979_ISBN_MERGED.mrk. The Record Identifier is of course 020 (i.e. ISBN number) and we move on the next screen.

marcedit_02F

This is next step is basically *everything* we have been working for in this post so far, we select the field to merge in from “Merge File” into the “Source File” and click next.

marcedit_02GIn this case everything went well and we were presented with the following screen that said “Merge Completed” and gave us the full path and filename to our merged file LG-32016-32979_ISBN_MERGED.mrk.

marcedit_02H

Of course we opened up the LG-32016-32979_ISBN_MERGED.mrk file in MarcEditor. The first thing was to check the Field Count report, and this is what we saw 523 biblio records with a total of 828 holding records, which sounds right! Below is example of the merged holdings.

marcedit_02I

Of course there is still the task of exporting the MarcBreaker (.mrk) back to .mrc so that Koha can ingest it for its MARC21 staging workflow, but everyone knows that 🙂

NOTE: For reference to this tutorial I’m attaching the zip file containing all the LG-32016-32979 files used in this example.

Planning to bulk import your patrons? Make sure you do not have in-line line breaks in your data.

In-line line breaks in a CSV file can really send your Koha patron import script into a tailspin. Here is what you need to watch out for and the couple of other gotchas which will make you upgrade your Koha instance if the version you are using is less than 3.22.7.

Last week a friend working at a local college approached me for a spot of help. He was trying to import his patrons into Koha but was failing miserably. After he nearly got his head snapped off (Me: Do I look like I’m in the fortune telling profession???) he agreed to send over his data – an MS-Excel sheet for me to take a look at.

I pulled up a 3.22.6 instance I had laying around and tried to import his data. Quite expectedly, there were errors galore and the pretty much the same ones he was complaining about.

blog_patron_1

Hang on! categorycode, branchcode and surname fields were NOT missing in *any* single record. So what was going on here??? The most interesting to note here is that patron importer script said :

272 not imported because they are not in the expected format

272 records parsed

Now this was really something as the total number of student records in that patron uploader CSV file were only 144. So where does the number 272 come from?

The answer to this was easy to find. My friend’s data had several records in a rather bad shape – they had embedded line-breaks within the cell. I’ve highlighted the first few of the badly formatted cells with yellow in the screenshot below.

blog_patron_3A

So, I copied the first 28 records over to a new file, ran a hackish utility script to clean out the line breaks and saved these 28 records as a new file and proceeded to upload it. This time of course “the fat lady sang”[1] i.e. the records got imported nice and we were done!. 😀

blog_patron_3NOTE:  Of course while doing that we encountered a few Koha bugs as well – Bug id 15840 and Bug id 16426. The work-around mentioned in comment #16 of the latter bug, by Koha QA Manager Katrin Fischer holds good, in case you get stuck here and can’t immediately upgrade. Otherwise to avoid these to bugs, your real option is to upgrade your Koha instance, something that I’m going to recommend to my friend (aside from him fixing his data).

Reference: [1] Wikipedia “It ain’t over till the fat lady sings”

 

Supporting Kerala Library Assn’s new open access journal KJIST

I’m happy to announce that L2C2 Technologies will be working together with Kerala Library Association (KLA), India, for the next 5 years for online publishing, hosting and support of KLA’s new open access biannual journal – KLA Journal of Library & Information Technology (https://www.kjist.in/index.php/kjist). The first issue will be published in Jan 2017. The publishing platform is based on the open source OpenJournalSystem from Public Knowledge Project. ‪#‎OJS‬ ‪#‎KLA‬

I’m grateful to the support of KLA Executive Committee and to my good friend Mr. Venkitachalam Sriram who had worked closely with me at every step to make this possible. And also a big THANK YOU to all our well wishers.

If you represent a non-profit state-level library association wanting to publish an open-access online journal, do get in touch with us for some great offers.

Easy peasy way of automating remote backup on Google Drive for your Koha database

This post discusses how to automate your Koha ILS’s MySQL database backup on to Google drive and send an email when it is complete. It shows how you can take advantage of Google Drive’s 15GB space for free (Dropbox only gives you 2GB on the free access) and do it all from the command line and save the much needed RAM for your Koha server rather than waste it on the GUI, which is also a security risk. Further this attempts to introduce the novice readers into details of the commands they are supposed to follow, with further reading resources, should they be inclined to learn more.

Having your Koha ILS database to be regularly backed up on to remote, cloud storage is an excellent idea. By doing so you ensure a critical off-site, disaster recovery measure, which is very good. However, as with all things human, if we leave it on ourselves to do it, there will come to pass a time when we will (a) forget to do it or (b) be unable to do it for some
reason. As we all know good ol’ Captain Murphy’s Law[1] will strike us whenever we are least prepared; in this case typically that one time we forgot or were unable to take the backup, the darned thing will crash!

So backup automation is key. Not only it ensures regularity without fail. It also removed one more essential chore from our immediate plate, thus leaving us free to do other things without feeling guilty over this key housekeeping chore.

Cloud backup – Google vs Dropbox

Dropbox and Google Drive comes across as immediate choice of cloud based backup. However, their free editions differ [2]… only by about 13GB of space between them. So for long-term online backup Google Drive is the de-facto choice.

Our objective

So, here is what we set out to do:

  1. create a datetime stamped backup of the database; (so we can tell just by seeing the filename when the backup was taken)
  2. compress it with bzip2 utility; (so all those loooooong lines of SQL text do not take up so much space, a text file can compress up to within 10% of it original size)
  3. upload it to a specified folder on Google Drive; (so that all our backups remain in one place, date-wise)
  4. email the user that the remote backup process is complete. (so when we outside or on vacation and don’t have access to our workstation, we still get a notification when it was completed and if we don’t get one, then that something certainly went wrong and someone should do something about it)

And of course, since we are talking about making this happen everyday at the same time, we need to create a cron job that will deliver all of 1, 2, 3 and 4 to us in a single neat little command.

As you all know, no self respecting system administrator will ever be caught running the X11 windowing system on a production server. So we are going to do these the way real system admins do: from the command line.

NOTE: X11 is the geekspeak for the Graphical User Interface (GUI) environment we see e.g. when we log into an Ubuntu Desktop (which is typically the Unity desktop)

Command line in this day and age? Are you nuts???

No! And here is the reason. X11 is not only an inherently insecure protocol that puts your production system at risk, it is also (compared with a command line only system) a tremendous resource hog! We all know that more free memory (RAM) is usually-a-good-thing ™, so instead of wasting our precious RAM on running a GUI (and all the unnecessary software along with it making it slow *and* insecure) we are going to show you how to do this all from a command line. One other thing: if you ever need the assistance of an expert, you will find that command line setups are also easier to debug (for an expert), after all, aren’t they always asking you to check your “logs”? All those are after all command line output. So like the Chloromint ad below, please don’t ask us again why we love the command line! 😉

Preparations

We want a normal user account with no admin privileges; say in our case we will call it l2c2backup and we will do it from the terminal using the adduser l2c2backup command. See below:

blog_01

Next up, we need to switch over to the new user account and create a synchronization folder for Google drive.

blog_02

At this point, we’ll press “Ctrl+D” and exit from the l2c2backup user and come back to the root user or sudo user, for we now need to install a command line google drive client on our system. We are going to use the (almost) official Google Drive command line client for Linux known simply as “drive” and available from https://github.com/odeke-em/drive

Since we are using Debian, we have the advantage of using the pre-built binaries, which we shall install in the following manner by executing in turn each of the commands:
# apt-get install software-properties-common
# apt-add-repository 'deb http://shaggytwodope.github.io/repo ./'
# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 7086E9CC7EC3233B
# apt-key update
# apt-get update
# sudo apt-get install drive

NOTE:If you are using Ubuntu or other mainstream Linux distributions, you can use the instructions given here on the Platform Packages page.

Once we have completed installation of “drive“, we now need to go back to our /home/l2c2backup/gdrive folder as the user l2c2backup and initialize the sync folder (i.e. /home/l2c2backup/gdrive) using the command “drive init

blog_03

Copy the really long URL that the command tells you to visit and open it in your web browser. You will see an application authorization dialog screen come up, click on the “Allow” button.

blog_03A

NOTE: Before pasting the URL, you must make sure that at this point you are logged in into the actual Google user account where you want to send the backups to. Don’t make a mess here.

Assuming you did everything as I have mentioned so far, you will be automatically redirected to the page with the authorization key. It will look pretty much like the one below. Of course, every request will generate a separate access authorization key, so use the one generated specifically against your request.

blog_03B

Copy this key and paste it back at the prompt in your terminal window and press <ENTER>. DO NOT TRY TO TYPE IT OUT BY HAND, COPY-N-PASTE IS THE ONLY WAY HERE!

If you have done everything alright then you should be back at the command prompt without any error or any other message. Your sync folder should now be ready.

Putting our solution together

Now that we have the Google Drive sync ready, it is time to look at each piece of our basic requirement.

1. Creating a datetime stamped backup of our database

First we need to create the name of our output file for the MySQL backup. For this we shall use this: BACKFILE="<dbname>.$(date +"%Y%m%d_%H%M%S").sql;. The date format will give us a datetime string formatted as “20160723_000001” when the date & time is 12:00:01 (AM) on 23-July-2016. For this example, let us assume that the BACKFILE environment variable will hold the value: koha_ghci.20160723_000001.sql.

Note: replace <dbname> with the actual name of your Koha database, which in our case is koha_ghci. So, the syntax for us looked like: BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql;. If you want to learn more about the format specific to the date command, you can read up this.

Next we will create the actual db backup using the datetime stamped output filename we just created. For that mysqldump -u<mysql_db_username> -p<mysql_db_passwd> <dbname> > /home/l2c2backup/gdrive/$BACKFILE.

Note: replace the <mysql_db_username>, <mysql_db_passwd> and <dbname> placeholders with your actual values. In our example case, the actual backup command string looked like this: mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE.

2. Compressing our SQL export

The previous step had exported our koha_ghci database as koha_ghci.20160723_000001.sql. We shall now compress this with bzip2 /home/l2c2backup/gdrive/$BACKFILE, which will give us the compressed file koha_ghci.20160723_000001.sql.bz2

3. Upload the compressed SQL backup to Google drive

Before we proceed with the actual upload, we should create a dedicated directory *on* our actual Google drive to store our backups. Lets call this directory as DBBACKUPS and create it on our online Google Drive space. It should be mentioned here that the command for upload using this library we are using, takes the form of drive push --destination <remote_folder_name> <full_path_to_compressed_file>. This code will ask for confirmation and we need to pass “Y” for yes before it will proceed. So we need to take care out that by adding echo Y | before the drive push command.

So in our case it will be echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2

Note:If you wish to learn about the other various options you can additionally use with drive push, I suggest you read this for the details.

4. Sending an email when the upload is done.

We are not running a dedicated, full fledged mail server like say Postfix on this box. Rather we are using the lightweight mstmp-mta with our Gmail account as the mail relay. If you want to know how to configure it, I suggest that you read this tutorial, ignoring the “mutt” part which you do not require. It is very simple. We had email sending working in under a minute. That’s just how long it took use to configure it.

Note: Just remember you *must* have openssl installed otherwise you will never be able to talk to GMail. And also you will need to go to your Google account and enable support for that Google likes to call “less secure apps” (which means any app that does use Google’s OAuth2 protocol for authentication. You will be authenticating over TLS and it is a perfectly safe thing to do, so just ignore Google’s ominous tone and enable “less secure apps”.

Now that we have msmtp-mta up and running, we will send out that email using this: printf "To: <recipient_email_address>\nFrom: <your_gmail_address>\nSubject: <dbname> db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of <your_gmail_address>.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp <recipient_email_address>

In our case that happened to be printf "To: monitoring@l2c2.co.in\nFrom: indradg@gmail.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of indradg@gmail.com.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp indradg@l2c2.co.in.

5. Putting it all together

Now that we have all the parts of the puzzle in place, it is time to assemble it into a single piece. And the way, it worked for us was BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql; mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE && bzip2 /home/l2c2backup/gdrive/$BACKFILE && echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2 && printf "To: indradg@l2c2.co.in\nFrom: indradg@gmail.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of indradg@gmail.com.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp indradg@l2c2.co.in

Note: The reason we used the “&&” is that in BASH it stands for what is called as “Logical AND”. In simple English this merely means that unless the previous command is not not executed successfully, whatever comes next simply won’t execute.

A BASH script and a cron job

We placed this one-liner script that cobbled together into the following BASH script which we named as “backuptogoogle.sh” and placed it in the folder /usr/local/bin after setting its execution bit on with chmod a+x /usr/local/bin/backuptogoogle.sh

#!/bin/bash
BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql; mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE && bzip2 /home/l2c2backup/gdrive/$BACKFILE  && echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2 && printf "To: indradg@l2c2.co.in\nFrom: indradg@gmail.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of indradg@gmail.com.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp indradg@l2c2.co.in

We setup a root user cron job with crontab -e and adding the following line and saving it.

@daily /usr/local/bin/backuptogoogle.sh

Note: The @daily shortcut will execute our script exactly at mid-night everyday. If you want to know what are the other useful cronism shortcuts, I suggest you read this useful post by my Koha colleague and good friend D. Ruth Bavousett over here.

Backup automation from command line

If you have been able to follow the instructions by suitably modifying them to your specific settings, you have just achieved backup automation from the command line. Like I said… It’s Easy Peasy!!! 😀

References:

[1] https://en.wikipedia.org/wiki/Murphy%27s_law

[2] http://www.cloudwards.net/dropbox-vs-google-drive/#features

Adding the item type filter to OPAC masthead search

Earlier today my good friend Vimal Kumar Vazaphally posted a question here about how to add the the item type (mc-itype in koha search speak) filter as a dropdown to the default main masthead search in the Koha OPAC.

Fig. 1: The default OPAC masthead search bar.
Fig. 1: The default OPAC masthead search bar.

RTFM and RTFM often!!! It may save your life!

<rant>The discussion that pursued on the FB group made something very clear. People forget to read the fine manual and when they do, they do not read “between the lines”. Trust me on this one, the Koha user manual is truly a ginormous treasure trove, if you take the pains to read it.</rant>

Ok! here is why I said that people really need to RTFM. The solution to the problem which Vimal shared can easily be extrapolated from this section in the manual – Appendix P: Extending KohaNewest title pulldown (#KohaTrivia it is based on a July 2012 blog post by Nicole C. Engard (Koha’s indefatigable Documentation manager).

“Reading between the lines” of a given solution

Here was the catch: Nicole’s solution pulled out only the newest arrivals of each itemtype, whereas we need it to pull out everything tagged to an itemtype, whether new or old. The second catch was that in Nicole’s example, she added plain HTML to the OpacNav system preference. However we are going to add it via the jquery / javascript route using the opacuserjs system preference.

We will follow Nicole’s example and pull out all item types that we need to populate the drop-down using SQL. However, we will need to escape the additional “backslash” (i.e. \ ) since we need MySQL to execute the query and actually generate the HTML markup that will be rendered on the browser via Javascript. Javascript does not care for arbitrary line breaks 😉 [1] and MySQL does not care about unescaped backslashes!

Now, if this sounded confusing, ponder for a moment on this Ajeet joke from yesteryears:

Raabert: Boss? Is kaa kyaa kare boss?
Ajeet: Rawbert! Is pille ko liquid oxygen me daal do. Liquid ise jeene nahi dega, aur oxygen ise marne nahi dega.

Nicole’s SQL based option list generator:
SELECT CONCAT('<option value=\"mc-itype:', itemtype, '\">',description,'</option>') FROM itemtypes

Our modification to the that make the output Javascript friendly:
SELECT CONCAT('<option value=\"mc-itype:', itemtype, '\">',description,'</option> \\') FROM itemtypes

In this case, turns out we have 5 itemtypes defined and we get this following output and we save it as CSV for introduction into our jquery.

<option value=”mc-itype:BBK”>Bengali Books</option>\
<option value=”mc-itype:BOOK”>Books</option>\
<option value=”mc-itype:BVOL”>Bound Volumes</option>\
<option value=”mc-itype:REF”>Reference Books</option>\
<option value=”mc-itype:SER”>Serials</option>\

(Hint: the exact item types is most likely to differ in your case; DO NOT copy-paste this output AS-IS.)

Let’s build the actual jQuery snippet

$( '<select name="limit" id="limitfiler" style="margin-left: 4px;"> \
<option value="">-- filter by item type --</option> \
<option value="mc-itype:BBK">Bengali Books</option> \
<option value="mc-itype:BOOK">Book</option> \
<option value="mc-itype:BVOL">Bound Volumes</option> \
<option value="mc-itype:REF">Reference Books</option> \
<option value="mc-itype:SER">Serials</option> \
</select>' ).insertAfter('#masthead_search');

And plug it into your opacuserjs system preference.

Let test our new drop-down!

add_itype_04
Fig 2: Searching for “Subject” as “English language” and no item-type based filtering
Fig 3: Additionally "BOOK" item type filter on.
Fig 3: Additionally “BOOK” item type filter on.
Fig 4: With REF itemtype filtering on.
Fig 4: With REF itemtype filtering on.

References:

[1] “Multi-Line JavaScript Strings” by David Walsh

Migration tips #1 : Separating your 100, 110 and 111s in the spreadsheet

Imagine the following scenario:  Over the last several months you have been steadily entering your library’s bibliographic data, either from the accession register or by accessing actual physical copies at your library into a spreadsheet. You have managed to create some ~23,000 records and now wish to import these into your favourite ILS i.e. Koha.

While working, in the spreadsheet you had made a single column for author information and recorded the name in the <lastname>, <firstname> style for the personal name entries (i.e. MARC21 tag 100). However you have also used the same column to capture corporate (110) or meeting (111) name entries as well.

110

So now how do you pick out these non-100 records, the 110 and 111s from among the ~23,000 records so that you do not leave any corporate body entry in the personal name field? A manual curation is possible. But it is simply too error prone and hugely time consuming.

Luckily since you have used the <lastname> comma <firstname> style consistently (or almost consistently) for all personal name entries you can use your Libreoffice Calc spreadsheet to do some magic. 😀

Step 1: A formula to identify possible corporate entries.

As we see in the example above, Calcutta University has been entered AS-IS and not as University, Calcutta so let us look for records that *do not* have any commas in them. [Sidenote: we will hit some false positives, but the magnitude of the problem will be less that searching all 23,000 records.

And for that we enter the following formula in cell B2 (where A2 is our first record and A1 being the header row):

=IF((ISBLANK(A2)),"NO",IF((ISNUMBER(FIND(",",A2))),"NO", "YES"))

The first part is just a safety check to ensure that our cell is not blank. For course, if it is blank then obviously it can not be a corporate body entry so the answer we want in B2 is “NO”. Once we know it is not indeed blank, we move in to check if we can find our “comma”  in the cell. If the comma is present then it is assumed to be a personal name entry, hence also “NO”, and if not found *and* not blank then it is safe to assume it is most likely a corporate name (110) or a meeting name (111).

Lets see what happens when we apply this formula to all our ~23,000 records.

110_1

N.B. The “false positive” happened as the person doing the data entry did not format the name of “Mahasweta Devi” as “Devi, Mahasweta”

Step 2: Always count the eggs in your basket

We had started with exactly 22,959 records, let us see how many “YES” records our formula has found: 941. We have narrowed our search down to just about 4% of the total records.

110_4

Step 3: Filtering out the “YES” records

Luckily for us, LibreOffice Calc has a nice filtering tool. It is available under Data -> Filter -> Standard filter menu option. We had earlier named our column as “boolean” in the cell B1. So we’ll now filter out all 941 records that marked as “YES” using this tool.

110_2

And immediately our spreadsheet will show us only the “YES” records like this:

110_5

Step 4: Removing the “YES” from false positives

The false positives have a story to tell. They tell us that we need to do better quality control of our data entry. We also probably need to ensure that the persons entering the data understand how to handle Muslim or foreign names.

Example: Let us take the name of my good friend and Koha expert Mr Vimal Kumar Vazhappally. Now the correct way to address him is as Mr. Vimal Kumar and *not* as Mr. Vazhappally. Vazhappally isn’t really his surname, rather it is the name of his village.

For now, the simplest way to correct the false positive is simply to visual check the A column and if it is apparent that the corresponding cell in B column wrongly has a “YES”, simply to move to the cell in B and simply delete the formula from that cell.

Instead of looking through 22,959 records, we now are going to check less than a 1000 records, but this time looking only for false positives.

110_3

Step 5: Two formula to separate the 100s and 110s

After we remove the B column cell values of the false positives – ones with “YES” but not a corporate entity, we now need two more columns to our spreadsheet. These will hold our separated 100 and 110 / 111 values.

So in order to copy over the values of corporate / meeting names (110 / 111) to the C column, we will define the following formula in the cell C2:

=IF((B2="YES"),A2,"")

And similarly in the next column to the right, in the cell D2 we will define this fomula:

=IF(B2="YES","",A2)

Next we will simply copy the range C2:D2 over to the entire range C3:D22959 and we are DONE!

110_6

Columns C and D now have our separated records for all 22959 records. And it took me less time to do the actual correction, than it took me write this blog post, take screenshots, crop, annotate, upload and proof read the final post.

Closing words

We found ~350+ corporate body / meeting name entries in the list, which could be separated out of 22,959 records using this technique.

 

Exporting your LibraryWorld bibliographic data to MARC21

Steps to export your LW data to MARC21 for import into Koha.

LibraryWorld (LW) is a hosted (SaaS) proprietary integrated Library system developed and promoted by San Jose, California based Libraryworld Inc. According to their website they charge USD 495 annually for a hosted instance of LW. So earlier in the day when Mr. Sonam Rinzee from Bhutan requested for some help in exporting his library’s catalog data, it presented the perfect opportunity for L2C2 Technologies to tinker with something new. These screenshots describe how it was done.

Step #1 : Access the LW website

lw_01

Step #2 : Login into the admin dashboard

lw_02

Step #3 : Go to Catalog module and click on “Advanced”

lw_03

Step #4 : Use the search filter like this

lw_04

Step #5 : Export the records

lw_05

Step #6 : Export as MARC using the export button

lw_06

Step #7 : Wait for your export to be ready. Click on “refresh” link to see if it is “done”

lw_07

Step #8 : Download the exported data by clicking on “FILE” link and change file extension to “.mrc”

lw_08

Step #9 : Open the file using MarcEdit and convert it to .mrk for edit work

lw_09

Step #10 : Now curate, remap and edit the data in a batch for making it import ready into Koha

lw_10

Published under CC 4.0 – BY – SA

Originally posted at https://www.facebook.com/pg/l2c2technologies/photos/?tab=album&album_id=993894330699800

MarcEdit QuickTip #2 – Unicode in your source file

Converting a batch of multi-lingual bibliographic records stored in a MS-Excel worksheet to .mrc using MarcEdit? Be sure to check your charset is set to UTF-8 while saving the spreadsheet or its CSV export.

Naveed Bhatti, a fellow Koha ILS user from neighboring Pakistan pinged last week over a problem he was facing. He had multi-lingual bibliographic data stored in an MS Excel worksheet. He wanted to use MarcEdit 6’s Delimited Text Translator tool available under the “Add-ins” menu to convert this file into an Unicode (UTF-8) encoded MARC21 (.mrc) file so that he could import the records into Koha.

However, when he attempted to generate the .mrk file (MarcEdit’s intermediate MarcBreaker mnemonic format before export to .mrc) instead of seeing the Arabic script, he saw a bunch of “?????? ????? ???” wherever there was text in the Arabic script in the spreadsheet. Naveed thought he must be missing something small but crucial, perhaps a setting.

naveed_02

I had a hunch, but I wanted to check the data before commenting. So I asked for a few sample records, which he sent over the next day. I checked and found it was a simple case of charset conversion glitch at the spreadsheet end of things. I could easily generate both the incorrect as well as the correct output (see above) with a simple change of the charset filter. I was using LibreOffice Calc on Windows 8.1 and the default export charset was *not* set to Unicode (UTF-8). The case being, with the default export charset the exported / saved file did not contain the correct Unicode codepoints in the data for the non-Latin data. As a result, at MarcEdit’s end, it became a simple case of garbage-in-garbage-out instead of receiving the correct non-Latin data.

The screenshot below shows the correct filter to use if you are using LibreOffice Calc. If you are using MS-Office, you should see something similar.

naveed_03

Downgrading your Koha ILS installation

Earlier in the day Dyuti messaged me. He was facing problems that did not exist before after he upgraded to Koha 16.05 from 3.22 on his production system. What he reported sounded like a bug, so I asked him to report it on the Koha bugzilla. Thankfully, he had remembered to take a database backup of his Koha instance before the upgrade. He wanted to know if there was any way he could go back to 3.22 using the backup where things were working fine (as far as the functions he required).

This is a fairly common scenario. People upgrading to the new stable after a major version release, often wish that they can *downgrade* to their earlier major version release, especially after realizing that something is broken in the new major version that used to work for them in the earlier one. If they could do that, then they could wait until the bug fix became available, without any interruption in the library service.

Luckily with Koha this is quite easy, especially if you were on the immediate previous version.

(HINT! one more reason why you should try to keep your Koha instance updated if you do not have a dedicated IT team to support you).

Taking Dyuti’s case as an example use-case, here are the steps to go back  to 3.22 from 16.05:

  1. Using the terminal, take a backup of koha_library database, preferably using koha-run-backups (click to reference details) and keep the backup safe.
  2. Use sudo apt-get purge koha-common command to remove Koha ILS from the system.
  3. Remove the existing koha_library database from MySQL.
  4. Re-install earlier major version series (3.22 in this case) of Koha using the oldstable repo. The change has to be made in your koha.list file (see here for explanation).
  5. Run the sudo koha-create --create-db library command to recreate your Koha instance.
  6. Import the old 3.22 backup into the newly created koha_library database.
  7. Access the Koha staff client using the new database user id and password from the newly recreated koha-conf.xml, and let the web installer run its course.
  8. Switch to a terminal and run sudo koha-rebuild-zebra -v -f library to update your zebra indexes.

You should be good to go! Enjoy!

NOTE 1: Dyuti’s Koha database is named ‘koha_library‘.  However your’s may be named differently, so remember to substitute your own db name here.

NOTE 2: These instructions are meant for a package based Koha installation either on Debian or Ubuntu as outlined in the Koha Community Wiki. If you are running a tarball or git based installation you probably already know what to do. 😀

Koha’s MARC modification templates comes to the rescue

A client had defined an item type they called “Hindi graphic books“. They used the notation “H-GR” to denote such an item e.g. an Amar Chitra Katha graphic novel. Their data was stored in a MS-Excel spreadsheet as simple tabular data. We were tasked with moving this data into Koha. So, to start off, we defined a new itemtype in Koha with the code “H-GR” with the description set to “Hindi Graphics”. The new item type correctly showed up on the OPAC advanced search under the “Item type” search filter. We then converted the data in Excel into MARC21 records using MarcEdit’s “Delimited Text Translator” tool and imported the resulting .mrc file into Koha. Import went well. However, when we searched the newly entered entries, we found that in the OPAC results page, the Item type showed up in the search facet area as “H=GR” instead of as it should – “Hindi Graphics”. Uh oh! something was certainly wrong!

itemtype_error
Figure 1: Incorrect item type displayed

Using MarcEdit we could see that this had happened due to an additional space having gotten accidentally padded to the itemtype data during data migration. To the system “H-GR” is NOT EQUAL to “H-GR “. *Every* single record in this particular batch of imported MARC records had their 942$c and 952$y fields affeccted.

itemtype_error_01
Figure 2: Extra whitespace padding highlighted.

Thankfully for us, Koha ships with the MARC modification template tool, which is basically like a “find and replace” feature on steroids! So in the end we simply defined a rule that allowed us to replace all the affected 942$c in a very simple, single batch mode operation.

itemtype_error_06A

With the template (we named it ‘correctH_GR’) in place (see above), all we now had to do was to the gather together the ids (i.e. biblionumber) of the affected records and then tell Koha to run the template against this batch of biblionumbers, using the Batch record modification tool as shown below.itemtype_error_04

After this the Batch item modification tool was used to update 952$y which is actually stored in items.itype field. And like above, with just one difference that instead of biblionumber we now need the list of barcodes for all the H-GR item type, in order to modify it.

itemtype_error_05A

And finally our original problem is solved!

itemtype_error_06

PRO-TIP: MARC modification template tool is not only useful for post-facto correction. It can be used even while we are staging MARC records in the first place, offering us to make wide-ranging changes and updates to our incoming MARC record.