Migration tips #1: Separating your 100, 110 and 111s in the spreadsheet

Imagine the following scenario: Over the last several months you have been steadily entering your library’s bibliographic data into a spreadsheet, either from the accession register or from the actual physical copies at your library. You have managed to create some ~23,000 records and now wish to import these into your favourite ILS, i.e. Koha.

While working in the spreadsheet, you made a single column for author information and recorded the names in the <lastname>, <firstname> style for personal name entries (i.e. MARC21 tag 100). However, you also used the same column to capture corporate (110) and meeting (111) name entries.

110

So how do you now pick out these non-100 records, the 110s and 111s, from among the ~23,000 records so that you do not leave any corporate body entry in the personal name field? Manual curation is possible, but it is simply too error-prone and hugely time-consuming.

Luckily, since you have used the <lastname> comma <firstname> style consistently (or almost consistently) for all personal name entries, you can use your LibreOffice Calc spreadsheet to do some magic. 😀

Step 1: A formula to identify possible corporate entries.

As we see in the example above, Calcutta University has been entered AS-IS and not as University, Calcutta, so let us look for records that *do not* have any commas in them. [Sidenote: we will hit some false positives, but the magnitude of the problem will be far smaller than searching through all 23,000 records.]

And for that we enter the following formula in cell B2 (where A2 holds our first record, A1 being the header row):

=IF((ISBLANK(A2)),"NO",IF((ISNUMBER(FIND(",",A2))),"NO", "YES"))

The first part is just a safety check to ensure that the cell is not blank. Of course, if it is blank then it obviously cannot be a corporate body entry, so the answer we want in B2 is “NO”. Once we know it is not blank, we move on to check whether we can find a comma in the cell. If the comma is present, the value is assumed to be a personal name entry, hence also “NO”; if no comma is found *and* the cell is not blank, it is safe to assume it is most likely a corporate name (110) or a meeting name (111).
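Outside of Calc, the same test is easy to sketch on the command line. Here is a minimal shell (awk) equivalent of the formula's logic, assuming a plain-text file with one author value per line (the file names and sample data here are invented for the demo):

```shell
# authors.txt stands in for column A of the spreadsheet
cat > authors.txt <<'EOF'
Devi, Mahasweta
Calcutta University
Tagore, Rabindranath

Indian Library Association
EOF

# Same logic as the Calc formula: blank -> NO, has comma -> NO, else YES
awk '
NF == 0        { print "NO";  next }   # blank cell: cannot be a corporate entry
index($0, ",") { print "NO";  next }   # comma found: personal name (100)
               { print "YES" }         # no comma: possible 110/111
' authors.txt > flags.txt

cat flags.txt
```

The flags come out in the same order as the input rows, just as column B mirrors column A in the spreadsheet.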

Let’s see what happens when we apply this formula to all our ~23,000 records.

110_1

N.B. The “false positive” happened because the person doing the data entry did not format the name “Mahasweta Devi” as “Devi, Mahasweta”.

Step 2: Always count the eggs in your basket

We started with exactly 22,959 records; let us see how many “YES” records our formula has found (a quick COUNTIF on column B will tell you): 941. We have narrowed our search down to just about 4% of the total records.

110_4

Step 3: Filtering out the “YES” records

Luckily for us, LibreOffice Calc has a nice filtering tool, available under the Data -> Filter -> Standard Filter menu option. We had earlier named our column “boolean” in cell B1. So we’ll now filter out all 941 records marked “YES” using this tool.

110_2

And immediately our spreadsheet will show us only the “YES” records like this:

110_5

Step 4: Removing the “YES” from false positives

The false positives have a story to tell. They tell us that we need to do better quality control of our data entry. We also probably need to ensure that the persons entering the data understand how to handle Muslim or foreign names.

Example: Let us take the name of my good friend and Koha expert Mr. Vimal Kumar Vazhappally. The correct way to address him is as Mr. Vimal Kumar and *not* as Mr. Vazhappally. Vazhappally isn’t really his surname; rather, it is the name of his village.

For now, the simplest way to correct a false positive is to visually check column A, and wherever the corresponding cell in column B wrongly shows a “YES”, move to that cell in B and delete the formula from it.

Instead of looking through 22,959 records, we now have to check fewer than 1,000, and this time we are looking only for false positives.

110_3

Step 5: Two formulas to separate the 100s and 110s

After we remove the B column cell values of the false positives (the ones marked “YES” that are not corporate entities), we need to add two more columns to our spreadsheet. These will hold our separated 100 and 110 / 111 values.

So in order to copy over the values of corporate / meeting names (110 / 111) to the C column, we will define the following formula in the cell C2:

=IF((B2="YES"),A2,"")

And similarly, in the next column to the right, we will define this formula in cell D2:

=IF(B2="YES","",A2)

Next we will simply copy the range C2:D2 over to the entire range C3:D22959 and we are DONE!
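If you ever need to do the same separation outside the spreadsheet, the logic is a one-liner in awk. A minimal shell sketch, assuming a tab-separated file that pairs each author value with its YES/NO flag (the file names and sample data are invented for the demo):

```shell
# flagged.tsv: column 1 = author value, column 2 = YES/NO flag
printf '%s\t%s\n' \
  'Devi, Mahasweta'      NO  \
  'Calcutta University'  YES \
  'Tagore, Rabindranath' NO  > flagged.tsv

# YES rows go to the corporate/meeting (110/111) file, the rest to the 100 file
awk -F'\t' '$2 == "YES" { print $1 > "corporate.txt"; next }
                        { print $1 > "personal.txt" }' flagged.tsv
```

This mirrors what the two Calc formulas do with columns C and D, except that the two groups land in two separate files.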

110_6

Columns C and D now hold the separated values for all 22,959 records. And it took me less time to do the actual correction than it took to write this blog post, take the screenshots, crop, annotate, upload and proofread the final post.

Closing words

Using this technique, we found over 350 corporate body / meeting name entries that could be separated out of the 22,959 records.

 

MarcEdit QuickTip #2 – Unicode in your source file

Converting a batch of multi-lingual bibliographic records stored in an MS-Excel worksheet to .mrc using MarcEdit? Be sure to check that the charset is set to UTF-8 while saving the spreadsheet or its CSV export.

Naveed Bhatti, a fellow Koha ILS user from neighbouring Pakistan, pinged me last week about a problem he was facing. He had multi-lingual bibliographic data stored in an MS Excel worksheet. He wanted to use MarcEdit 6’s Delimited Text Translator tool, available under the “Add-ins” menu, to convert this file into a Unicode (UTF-8) encoded MARC21 (.mrc) file so that he could import the records into Koha.

However, when he attempted to generate the .mrk file (MarcEdit’s intermediate MarcBreaker mnemonic format, used before export to .mrc), instead of the Arabic script he saw a bunch of “?????? ????? ???” wherever the spreadsheet contained Arabic-script text. Naveed thought he must be missing something small but crucial, perhaps a setting.

naveed_02

I had a hunch, but I wanted to check the data before commenting. So I asked for a few sample records, which he sent over the next day. It turned out to be a simple charset conversion glitch at the spreadsheet end of things. I could easily generate both the incorrect and the correct output (see above) with a simple change of the charset filter. I was using LibreOffice Calc on Windows 8.1, where the default export charset was *not* set to Unicode (UTF-8). With that default, the exported / saved file did not contain the correct Unicode codepoints for the non-Latin data. As a result, at MarcEdit’s end it became a simple case of garbage-in-garbage-out instead of receiving the correct non-Latin data.

The screenshot below shows the correct filter to use if you are using LibreOffice Calc. If you are using MS-Office, you should see something similar.
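If you are unsure what charset your exported file actually ended up in, you can also check and fix it from the command line. A hedged sketch, using ISO-8859-1 as a stand-in for a wrong export charset (the file names are made up for the demo):

```shell
# export.csv saved with the wrong charset: "café" encoded as ISO-8859-1
# (\351 is the octal escape for the Latin-1 byte 0xE9)
printf 'title\ncaf\351\n' > export.csv
file -i export.csv            # reports charset=iso-8859-1

# Re-encode to UTF-8 before handing the file to MarcEdit
iconv -f ISO-8859-1 -t UTF-8 export.csv > export-utf8.csv
file -i export-utf8.csv       # reports charset=utf-8
```

With real data you would of course substitute the actual source charset for ISO-8859-1; `file -i` is only a heuristic, but it catches the common cases.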

naveed_03

Downgrading your Koha ILS installation

Earlier in the day Dyuti messaged me. He was facing problems on his production system, problems that did not exist before he upgraded from Koha 3.22 to 16.05. What he reported sounded like a bug, so I asked him to report it on the Koha Bugzilla. Thankfully, he had remembered to take a database backup of his Koha instance before the upgrade. He wanted to know if there was any way he could use the backup to go back to 3.22, where things were working fine (at least as far as the functions he required).

This is a fairly common scenario. People upgrading to the new stable after a major version release often wish that they could *downgrade* to their earlier major version, especially after realizing that something which used to work for them is broken in the new one. If they could do that, they could wait until the bug fix became available, without any interruption to the library service.

Luckily with Koha this is quite easy, especially if you were on the immediate previous version.

(HINT! One more reason why you should try to keep your Koha instance updated if you do not have a dedicated IT team to support you.)

Taking Dyuti’s case as an example use-case, here are the steps to go back  to 3.22 from 16.05:

  1. Using the terminal, take a backup of the koha_library database, preferably using koha-run-backups (click for reference details), and keep the backup safe.
  2. Use the sudo apt-get purge koha-common command to remove Koha ILS from the system.
  3. Remove the existing koha_library database from MySQL.
  4. Re-install the earlier major version series (3.22 in this case) of Koha using the oldstable repo. The change has to be made in your koha.list file (see here for an explanation).
  5. Run the sudo koha-create --create-db library command to recreate your Koha instance.
  6. Import the old 3.22 backup into the newly created koha_library database.
  7. Access the Koha staff client using the new database user id and password from the newly recreated koha-conf.xml, and let the web installer run its course.
  8. Switch to a terminal and run sudo koha-rebuild-zebra -v -f library to update your Zebra indexes.

You should be good to go! Enjoy!

NOTE 1: Dyuti’s Koha database is named ‘koha_library‘. However, yours may be named differently, so remember to substitute your own db name here.

NOTE 2: These instructions are meant for a package based Koha installation either on Debian or Ubuntu as outlined in the Koha Community Wiki. If you are running a tarball or git based installation you probably already know what to do. 😀
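As an illustration of step 4 above, the repository switch is a one-line change. A hedged sketch (on a package install the real file lives at /etc/apt/sources.list.d/koha.list; we edit a local copy here for the demo, and the exact suite names depend on the Koha community repository):

```shell
# A koha.list pointing at the current stable packages
cat > koha.list <<'EOF'
deb http://debian.koha-community.org/koha stable main
EOF

# Point apt at the oldstable (3.22) series instead
sed -i 's/ stable / oldstable /' koha.list
cat koha.list
```

After this change you would run apt-get update before re-installing koha-common, so that apt picks up the oldstable package index.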

Koha’s MARC modification templates comes to the rescue

A client had defined an item type they called “Hindi graphic books”, using the notation “H-GR” to denote such an item, e.g. an Amar Chitra Katha graphic novel. Their data was stored in an MS-Excel spreadsheet as simple tabular data, and we were tasked with moving this data into Koha. So, to start off, we defined a new item type in Koha with the code “H-GR” and the description set to “Hindi Graphics”. The new item type correctly showed up on the OPAC advanced search under the “Item type” search filter. We then converted the data in Excel into MARC21 records using MarcEdit’s “Delimited Text Translator” tool and imported the resulting .mrc file into Koha. The import went well. However, when we searched for the newly entered records, we found that on the OPAC results page the item type showed up in the search facet area as the code “H-GR” instead of, as it should be, the description “Hindi Graphics”. Uh oh! Something was certainly wrong!

itemtype_error
Figure 1: Incorrect item type displayed

Using MarcEdit we could see that this had happened because an additional space had accidentally been padded onto the item type data during data migration. To the system, “H-GR” is NOT EQUAL to “H-GR ”. *Every* single record in this particular batch of imported MARC records had its 942$c and 952$y fields affected.

itemtype_error_01
Figure 2: Extra whitespace padding highlighted.
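As an aside, had we caught this before import, the stray padding could have been stripped from the intermediate .mrk file directly. A hedged shell sketch with made-up sample lines (a real file would of course contain full records):

```shell
# Two mnemonic-format lines, each with a trailing space after H-GR
printf '%s\n' '=942  \\$cH-GR ' '=952  \\$yH-GR ' > records.mrk

# Strip trailing whitespace from every line before compiling to .mrc
sed -i 's/[[:space:]]*$//' records.mrk
```

Note that a blanket trailing-whitespace strip is safe for this particular problem, but always eyeball a diff first in case some field legitimately ends in a space.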

Thankfully for us, Koha ships with the MARC modification templates tool, which is basically a “find and replace” feature on steroids! So in the end we simply defined a rule that allowed us to fix all the affected 942$c fields in a very simple, single batch-mode operation.

itemtype_error_06A

With the template (we named it ‘correctH_GR’) in place (see above), all we now had to do was gather the ids (i.e. biblionumbers) of the affected records and then tell Koha to run the template against this batch of biblionumbers, using the Batch record modification tool as shown below.

itemtype_error_04

After this, the Batch item modification tool was used to update 952$y, which is actually stored in the items.itype field. This works just like the step above, with one difference: instead of biblionumbers, we now need the list of barcodes of all the H-GR items in order to modify them.

itemtype_error_05A

And finally our original problem is solved!

itemtype_error_06

PRO-TIP: The MARC modification templates tool is not only useful for post-facto correction. It can also be used while staging MARC records in the first place, allowing us to make wide-ranging changes and updates to our incoming MARC records.

Customizing Koha’s iconset to match page background

The world’s favorite open source ILS, aka Koha, ships with several iconsets included, e.g. carredart, crystal-clear, seshat, bridge, npl, vokal etc. The primary use of these icon sets is to provide visual cues for different item types, e.g. books, serials, DVDs, sound recordings, electronic materials etc. The icon set used by default is The Bridge Material Type Icon Set, licensed under the Creative Commons Attribution-ShareAlike 2.5 license.

This HOWTO will focus on how to customize the Bridge iconset so that our icons match the OPAC page background. The icons as shipped with Koha are in .gif format with a white background. This default works perfectly well when we stick with Koha’s default CSS, where the page background (the main div) is also white, as seen below.

def_iconset

There are times when we may wish to customize the OPAC CSS using the opacusercss system preference. In this case, we wanted to change the main div’s background-color to match our custom OPAC color theme:

/* change main div color */
div.main { background-color: #EAE9E7; }

But due to the white background of the Bridge iconset, the page now looked like this:

def_iconset_01

 

Obviously this needed to be fixed. We went to The Bridge Material Type Icon Set project page and downloaded the zip file containing the original .psd files of the icons.

While you can use Photoshop to edit them, we used GIMP (GNU Image Manipulation Program) to edit the .psd files. We will use the book material type icon, book.psd, for this example.

Instructions for editing the icons using GIMP

    1. Open book.psd in GIMP
    2. Check the layers – both the visible ones and the ones turned off gimp_layers
    3. Set the foreground color to #EAE9E7 gimp_forecolor
    4. Add a new layer using the foreground color set above gimp_newlayer
    5. Arrange the layers so that only the text layer, the graphic and the new #EAE9E7-colored layer are visible, in the proper order gimp_addedlayer
    6. Use GIMP’s Export As option in the File menu to create your new book.gif

Repeat the above steps until you have changed all the required icons. Save all the exported .gif files in a single directory / folder, which we will call “demolib” in this example.

Getting Koha to use the customized icons

N.B.: Koha stores the item type icons in two separate locations: opac-tmpl/bootstrap/itemtypeimg/ (for the OPAC) and intranet-tmpl/prog/img/itemtypeimg/ (for the staff client).

  1. Create an OPAC sub-folder for your customized icon set: mkdir -p /usr/share/koha/opac/htdocs/opac-tmpl/bootstrap/itemtypeimg/demolib
  2. Create a staff client sub-folder for your customized icon set: mkdir -p /usr/share/koha/intranet/htdocs/intranet-tmpl/prog/img/itemtypeimg/demolib
  3. Copy all the .gif files from your local demolib folder into both of the newly created sub-folders above. Note: you may need to use an FTP / SCP client to copy the files if your Koha server is located on a remote machine.
  4. Navigate to Home -> Administration -> Item types administration on your Koha system. itemtypes
  5. Click the Edit option against the item type Book (BK). You should now see a new icon set tab named “demolib” under the “Choose an icon” section. def_iconset_03
  6. Select the new icon for the Book item type from this demolib library and save the change.
  7. Repeat for each of the item types that you wish to update with your new customized iconset.

Now visit your OPAC page again and select the “Advanced search” option. This time Koha will be using your new icons and these will merge correctly with the page background.
def_iconset_02

PRO-TIP: Instead of editing the existing Bridge icon set, you can also create your own. The usual icon sizes are 32×32 or 40×40 pixels (Bridge uses the latter). Design and save the icons to a specific folder and then follow the instructions in the section “Getting Koha to use the customized icons”. “demolib” is just the name we are using here; you can change it to anything else, as long as it is not one of the names already used by Koha.

Setting up a MARC21 ETD Framework in Koha

Recently, during a discussion on a WhatsApp group of professional colleagues from the North East, a topic came up: what was better suited for setting up an ETD (Electronic Theses and Dissertations) repository at their academic institutions? As expected, most sided with DSpace, while a few suggested EPrints. I decided to introduce Koha ILS into the picture. For most, this was a rather surprising suggestion.

The ground reality

95% of Indian ETD operators do little more than scan a batch of printed documents (bound thesis volumes) or take born-digital electronic copies of theses, turn them into PDF files, throw in some metadata about the item, and plug all that into (usually) DSpace. The benefits of DSpace are statistics, organisation into collections and communities (user groups), embargo capability, and faceted and full-text search across the metadata. There is of course also the matter of persistent URLs to the hosted resources via the Handle system. But given that very few Indian repos actually invest in a Handle.net account, it is quite a moot point.

The argument for Koha

Utilizing MARC21, Koha already offers a highly granular and extensive metadata regime. Its capabilities include collections, search facets and full-text search on the metadata, and since version 3.22, Koha can directly host files linked to the bibliographic records. Basically, it offers everything that 95% of Indian institutions look for when they plan to set up a repository. Compared to DSpace, LIS professionals are already better acquainted with Koha (or as many in India like to call it – “KOHA” :-P), as it is the de-facto open source LMS. Further, by using Koha to host the repository, the institutions and professionals who already use Koha as their LMS gain one key advantage: they no longer need to maintain two separate and very dissimilar systems that use completely different software application stacks (LAMP vs Java; MySQL vs PostgreSQL etc.).

Step 1 – building a MARC21 ETD Framework

I had promised my colleagues that I would set up a demo ETD on Koha so that they could try it out. After all, the proof of the pudding is in the eating. The first challenge is that while Koha does ship with quite a few MARC21 frameworks, an ETD framework is not one of them.

As my starting point, I turned to ETD-MS v1.1, which is considered something of a gold standard for ETDs. Taking the help of this page, I worked out a single-page worksheet for cataloging ETDs on Koha. The result looked like this:

rawmarc

Step 2 – Usability and maintainability

Of course this posed a slight problem: how would a cataloger accustomed to / trained on DSpace’s metadata namespace correctly do the crosswalk? To solve this in line with Koha’s recommended best practices for UI mods, jQuery statements were included in Koha’s “intranetuserjs” and “intranetusercss” system preferences. The final outcome was this:

converted_marc

This approach has three (03) key benefits: (a) you DO NOT need to touch Koha’s Template::Toolkit based templates (i.e. the .tt files), as the changes are stored in your database and applied at runtime; (b) the changes remain persistent and work across Koha’s monthly version upgrade cycle (since we didn’t change the .tt files); and (c) our other MARC21 frameworks are left alone, as only the ETD framework is modified.

The jQuery snippet is available on GitHub as a gist, as is the CSS included in the intranetusercss system preference. There is also a third gist which lists the four (04) authorised value categories and their constituent sample options that help us introduce a consistent controlled vocabulary into our metadata.

The demo ETD is available at http://etddemo-opac.l2c2.co.in/. If you wish to access the staff client back-end, contact me via the comments section of this post.

Best wishes!

Resources:

  1. http://blog.l2c2.co.in/wp-content/uploads/2016/06/export_ETD.ods
  2. https://gist.github.com/l2c2technologies/09e7e06f695304f33aada9b529167de6
  3. https://gist.github.com/l2c2technologies/9bc1f9b812e37850959c655fbc0f8802
  4. https://gist.github.com/l2c2technologies/6b735a5e1f4c4f9cc3042af2b8fa5b32
Licensing: CC BY-NC-SA 4.0 – (c) L2C2 Technologies 2016

The Koha manual is your friend :-)

The Koha User Manual is an amazingly in-depth document that keeps getting better with each iteration, thanks to the Koha Library System Project’s indefatigable Documentation Manager, Nicole C. Engard. This post exemplifies a situation where it can come in handy.

Yesterday a librarian colleague reached out with a query. He is on the Koha 3.20.x series, and for the last 3 – 4 days he had been noticing that the “due date” for items his staff were issuing out was coming out rather odd! For instance, all the due dates were set to “18th June 2016”. He clarified that he did not have a hard due date set, so what was going on???

This morning he sent over a screenshot of the checkout operation. And boy! The date now read 25th June 2016. This was an immediate clue! On checking his calendar, the 18th and the 25th turned out to be the only two days in June 2016 marked as “library is open” (for issues made in these last two weeks, i.e. before the holidays begin) during their summer recess.

That holiday setting, working together with the “useDaysMode” system preference set to “the calendar to push the due date to the next open day” for circulation, was the reason why Koha was pushing the due date beyond the usual designated due dates for each patron category. Basically, it was just Koha doing its job (as it was asked to do) and doing it well.

If you are on 3.20, remember to read http://manual.koha-community.org/3.20/en/calholidays.html, where the relevant system preferences are all explained very nicely.

Sending email alerts via GMAIL : a marriage of convenience or is it???

Library professionals using Koha ILS quite naturally want to send email alerts of circulation details, due date reminders, overdue notices etc to the patrons / users / customers of their libraries. Google’s GMAIL free email service seems to provide an alluring option:

  • Gmail is *huge*! Everyone and their uncle is on Gmail, resulting in almost instantaneous delivery of emails.
  • Its anti-virus and anti-spam facilities are almost second to none.
  • And most importantly, it is FREE (as in the proverbial free lunch, supported by the ads that Google places on its webmail interface).

It is ridiculously easy to set up a Postfix email server to use Gmail to relay (i.e. send out to the actual recipients) the email alerts generated by Koha. Endless tutorials exist online. Sounds like a perfect match! A free software ILS and a free email service, what could be better!
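For reference, those tutorials usually boil down to a handful of Postfix settings. A hedged sketch of the relevant main.cf excerpt (typical example values, not a tested configuration; your paths and policy needs may differ):

```
# /etc/postfix/main.cf (excerpt): relay outgoing mail through Gmail
relayhost = [smtp.gmail.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
```

The Gmail account credentials go into /etc/postfix/sasl_passwd, which is compiled with postmap before reloading Postfix.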

However, Google is a business entity and they are here to make money for their shareholders, and over the years Google has been slowly cutting down on the number of emails you can send out in a day. If you hit the daily mail quota limit, Gmail will quietly disable email sending from your account for 24 hours. Do it too often, and they may (and do) suspend your email account.

“A marriage of convenience, is neither a marriage nor a convenience!”

No, it won’t hit you at first, while you are testing the setup or when you have low circulation figures. However, as you roll out and keep expanding services, the number of alerts you need to generate to provide an ever-improving user experience grows, and you are likely to end up hitting those quotas. I kid you not!

Koha ILS project changes version numbering

During the general IRC meeting on 9th March 2016, the developers voted in [1] a change to the version numbering scheme used in Koha. For years, Koha has been using 3.x version numbers. Every stable public release had an incrementally updated even number as “x” (i.e. 3.0, 3.2, 3.4 … 3.18, 3.20, 3.22 et al.). The internal development versions (used only by the developers) used the next odd number after the current stable release, e.g. 3.21 was used while 3.20.x was the latest stable series.

All that is set to change with the new major version of Koha, to be released in just a few days. The project is aligning its time-based public releases to a yy.mm format. So the upcoming stable release will be numbered 16.05, and the one scheduled for November 2016 will be 16.11.

References: [1] http://irc.koha-community.org/koha/2016-03-09#i_1797503

We are switching to Let’s Encrypt as it exits public beta

Announcement!

Back in February this year, we announced that we were switching on Let’s Encrypt SSL certificates on our servers and VMs on a trial basis. We are happy to share the news that, with Let’s Encrypt finally out of its beta stage, we are shifting to LE certs for all our SSL support.

Why Let’s Encrypt?

From the LE website, some numbers:

 

Since our beta began in September 2015 we’ve issued more than 1.7 million certificates for more than 3.8 million websites. [..] We set out to encrypt 100% of the Web. We’re excited to be off to a strong start, and with so much support across the industry.

Why is SSL important to you?

By default, Koha’s OPAC and intranet sites use HTTP and *not* HTTPS. This means that all data exchanged between your server and any visitor / patron / library staff member accessing either of the sites over the Internet goes over as plain, clear text.

To give a real world analogy, this is like writing down your credit / debit card number, CVV / CVV2, expiry date on a postcard and posting it. Anyone that comes across the postcard while it goes from the sender to the recipient can read it. You are essentially broadcasting your user credentials to everyone on the Internet.

On the contrary, HTTPS encrypts all the data exchanged between the communicating parties. The only ones who can read the information thus encrypted are you and your user / visitor, and no one else.

How does shifting to LE certs help our users?

Earlier we had to charge our users extra (INR 1200 to 2500 per year) for SSL support for their hosted OPAC and Koha ILS staff client, using the SSL services of providers like PositiveSSL. However, LE’s objective is to bring affordable encryption to 100% of the world wide web – their certificates are FREE! So as our client-partner YOU get to enjoy (a) a better value proposition and (b) a well-supported, quality SSL certificate with global recognition.

You may wonder why and how LE does this. This is what LE has to say about itself on its web site:

Let’s Encrypt is a free, automated, and open certificate authority brought to you by the Internet Security Research Group (ISRG). [..] ISRG’s mission is to reduce financial, technological, and educational barriers to secure communication over the Internet.

The sponsors who have committed their support to make LE a success read like a who’s who of the modern Internet as we know it. It is also supported by the American Library Association (ALA). So if you are a librarian, this endorsement should tell you just how seriously LE is being taken! 😀

Even if you are not a client-partner but happen to follow this blog, we suggest that you seriously consider giving Let’s Encrypt a shot, if for nothing else then for peace of mind.

Happy SSLing!