GoogleIndicTransliteration is a nifty Koha feature allowing easy typing and searching in several Indian language to Indian users. However, a bug prevents it from working if the OPAC is run over SSL (i.e. https). This post provides a clear description and a fix for the problem.
Many Indian Koha users use the GoogleIndicTransliteration option to offer their users the facility to search in Indian languages on the OPAC. This nifty feature allows users to phonetically type in their search queries in Indian languages in order to search catalogs that are (a) multi-lingual or (b) in a Indian language other than English.
However, if you are security minded (and you *should* be if your OPAC is on the Net and allows your users to log in) and you decide to serve your site over to SSL (i.e. https), then guess what? The GoogleIndicTransliteration feature stops immediately with the browser console showing MIXED CONTENT error. Every single Koha version from 3.18.0 (when this feature made its way back into Koha after a long hiatus) up to the latest 16.05.2 (released on August 1, 2016) are affected by this problem.
I do not have time, just show me how to fix this
If you are in a hurry, jump over to the section “Your options until the patch is officially released” at the end of this post. Remember to read the caveat and the assumption, you have been warned! 😉
Why is HTTPS so important?
Let’s take a moment to understand why HTTPS is so important. Let’s assume that your Koha server is on your institutional LAN / intranet or hosted online, either on the cloud or on your own server connected to the Internet via a leased line.
Without HTTPS, every time you login into Koha (staff and/or OPAC) and perform *any* ILS transactions (e.g. patron contact information change, holds, fines, circulation etc) all of that information is available in PLAIN TEXT to everyone on your network.
If you are only connected to your institutional network, then that is the direct extent up to which anyone can see what you are doing. If your server is accessible over the Internet, then basically the whole wide world can see what you are doing. For instance, when you login over HTTP, it is actually the equivalent of writing down your username and password on a postcard and mailing it across the globe. Anyone who handles it during transit, or wants to, can simply read it. That’s why the world is moving away from the plain vanilla HTTP.
In simple terms, HTTPS on the other hand creates an end-to-end encrypted“tunnel” between your server and the browser that is requesting access (e.g. to the OPAC). Think of it as a secure, sealed box with the contents inside and only you, the user, have the “key” to unlock it. The actual process is depicted in the graphics below:
Briefly HTTPS has 3 main benefits:
(b) Data integrity
None of these are provided by HTTP, thus if your Koha server is online, the SSL (HTTPS) is simply a must these day!
The Basics Explained
GoogleIndicTransliteration feature utilizes a Google API designed for phonetic input of several Indian languages by transliterating text written in English on the fly to its Indian language equivalent. For example, if you type “Rabindranath” and it is set to transliterate to Bengali, the software will automatically convert to “রবীন্দ্রনাথ” or say “Premchand” to “प्रेमचाँद” if set for Hindi.
Once GoogleIndicTransliteration system preference is set to “Show” from the Koha staff client, the code inside the file opac-bottom.inc loads up the API loader code available at www.google.com/jsapi, which in turns provides the framework so that the actual transliteration code available in the file googleindictransliteration.js can work its magic and provide the users with the transliteration feature.
Why does HTTPS break it but not HTTP?
Short answer: Mixed context!
Long answer: HTTPS is important to protect both your site and your users from attacks online. As of now, Koha code in opac-bottom.inc calls the jsapi code over HTTP, instead of letting the browser handle it correctly based on the security context (i.e. whether the page is being served over HTTP or HTTPS). So when OPAC is on HTTP, jsapi is fetched over HTTP, things are on the same page. However, when the OPAC is served over HTTPS and jsapi continues to be fetched over HTTP, all modern browsers will flag it as a security violation known as “MIXED CONTENT” and halts the loading of jsapi, as seen in the screenshot below:
As a result, googleindictransliteration.js has nothing to work with. End result, the GoogleIndicTransliteration feature does not work anymore! Bingo! We’ve found ourselves with a Koha bug!
Present status of bug
There is a patch submitted to Koha Bugzilla against Bug 17103 – Google API Loader jsapi called over http, waiting for sign-off and QA. Once it clears Koha’s project governance processes, it is expected to get pushed to the master and then be released with a stable version of Koha. Once that happenes we won’t have this issue anymore. NOTE: Expect this fix to get backported across the current supported older releases.
Your options until the patch is officially released
(a) Do without GoogleIndicTransliteration feature until the fix is officially released by the Koha project if you are using HTTPS
(b) Edit your “opac-tmpl/bootstrap/en/includes/opac-bottom.inc” file. Find the following section:
[% IF ( GoogleIndicTransliteration ) %]
[% END %]
Replace the protocol notifier “http:” from jsapi URI with “https:“and save the file. It should look like this after the change:
[% IF ( GoogleIndicTransliteration ) %]
[% END %]
CAVEAT: If you are doing this edit, it is assumed that you know what you are doing. If you make any mistake and break something during this, its all on you.
ASSUMPTION: This edit assumes you are on Koha 3.18.x and later and is using a .deb package based installation on Debian or Ubuntu.
I’m writing this seeking a solution for the problem relating to a report generation from Koha. I want to get a list of titles under a particular broader subject field-tag (650). Quite a number of times I’ve checked from SQL Report. But all were in vain as those were not the same what I actually want to get. Following is the specimen of the opted report:
Sr. No. Title Author Copy No. Subject
1 ………. X 3 Bengali
Under the subject Bengali or English or whatsoever, I want to get the titles those are belong to that particular subject. However, it won’t be a problem if there are different reports for different subjects. It’s Ok. But the SQL Query should be a general query structure that can be applicable for all such reports on the titles belong to a broader subject like Bengali, History, Geography etc.
Sir, please let me know the query structure, if possible.
Here is a possible solution for his request, which pretty much does what Mr. Mukhopadhyay had specified in his request. In this example we’ll use a sample MARC21 file which can be downloaded from here to try out this example. This dataset has a 14 unique bibliographic records with a total of 42 item (holdings) record, belonging to 03 specific broader subjects i.e. English, Economics and Political Science. As per Mr. Mukhopadhyay’s use-case, the MARC field 650 holds the broader subject classification. However, to match real world scenarios the 650 fields in some of the cases have other subject headings defined including narrower terms. Also additionally we are going to add an additional column to our report – the biblionumber, so that if required we can cross check a title in the report generated against the biblionumber in the database.
CAVEAT EMPTOR: If you are going to try out this example, we suggest that you define a new Koha library and import this MARC file into it. Mixing this sample data with your existing records is strongly advised against.
Step #1 – Create a new Koha instance and set it up
We are going to use the koha-create Debian command to create a new Koha instance and we shall call our instance as demo.
sudo koha-create --create-db demo
You may calls your instance by whatever name you like. If you are not aware of the koha-create command, please read up “Commands provided by the Debian packages“. Next we will do a default setup and proceed to define a Library that we’ll call “L2C2 Technologies Demo Library” identified by the code “MAIN”, using these instructions here.
NB.: To use the marc file used in this example you must set the library code for your demo branch as “MAIN”, the name (of the library branch) can be whatever you want it to be.
Step #2 – Define a new Authorize value category
Since our example marc file has biblios with only (a) English (b) Economics and (c) Political science, we will define a new authorized value category which we’ll call as SUBLOOKUP, under Home › Administration › Authorized values. Once setup our new authorized value SUBLOOKUP will look like this:
This authorized value list will provide the subject selection list for our custom SQL report. So if you have more subjects you will need to add them here so that they look like this. The “%” in the Authorized value is *critical*, and if you want to be really strict about it, you can drop the preceding “%” and retain only the one at the end. However should you do that, your first 650 field *must* always be the broader subject heading that you wish to filter your report on.
Step #3 – Define our custom SQL report
We will go to Home › Reports › Guided reports wizard › Create from SQL and create a new SQL report. In this case, we’ll name the report as “List title with number of copies filtered by subject”, add a note that says – “A report written at the request of Mr. Gautam Mukhopadhyay, Chandrapur College, BWN”. The SQL will be as given below. The report once saved, will allow us to run it.
(@row:=@row+1) AS `S/N`,
REPLACE (@TargetSubject:=<<Select the subject|SUBLOOKUP>>, '%', '') AS Subject,
gmData.biblioid AS `Biblionumber`
biblio.title AS Title,
biblio.biblionumber as biblioid,
ExtractValue(biblioitems.marcxml,'//datafield[@tag="245"]/subfield[@code>="c"]') AS Author,
count(items.itemnumber) AS Copies,
ExtractValue(biblioitems.marcxml,'//datafield[@tag="650"]/subfield[@code>="a"]') AS Subject
biblioitems on (items.biblioitemnumber=biblioitems.biblioitemnumber)
biblio on (biblioitems.biblionumber=biblio.biblionumber)
biblio.biblionumber) as gmData,
(SELECT @row := 0) r
WHERE Subject LIKE <<Re-select the subject tag|SUBLOOKUP>>
Let us take a moment to understand what this piece of SQL syntax really means.
(@row:=@row+1) AS `S/N`,
(SELECT @row := 0) r
The use of the @row variable and the counter (@row:=@row+1) gives us our “serial number” column in the report listing. We can also see the authorized value list “SUBLOOKUP” that we had defined earlier referenced here in the SQL.
NOTE: As you may note, we are asking the user to select the subject *twice*, (first time: ‘Select the subject’ and second time: ‘Re-select the subject tag’). While theoretically we should not be required to do so, thank to the use of the runtime variable@TargetSubject, in reality we ran into a type casting error (see below), thus we used this less than pretty way of asking the user to select the subject twice, to get our job done.
Step #4 – Running the report
After the report is saved, it is now time to run it, using the “Run report” option. What we’ll see now will be like this:
We need to select the *same* subject from both the drop-down lists and click on “Run the report” button. Selecting “Economics” we shall in our case get the following report:
Step #5 – Prettifying the custom report user interface
Having the user to select the subject twice is cumbersome as well prone to human error, so we decided it is time for some jquery magic to streamline this and leave the users with one only a single drop-down to choose from. For this we’ll turn turn to the IntranetUserJS system preference and add the following jquery snippet:
var subval = $('#sql_params_Selectthesubject').val();
If this is the first time you are hearing about the IntranetUserJS system preference, you should definitely read up this. Those of you who are indeed familiar with IntranetUserJS, all we are doing here is to (1) hide the second subject selection dropdown and its label and then (2) we are defining that whenever the user chooses a value from the *first* drop-down, the second (and now hidden) drop-down should also have the same value selected automatically. After saving the IntranetUserJS update, on running the report we shall see this:
And bingo! We are done!
Extraa Innings: To see the actual report in action
It was nice to see IASLIC’s (Indian Association of Special Libraries and Information Centres) Newsletter for June 2016 feature one of L2C2 Technologies blogpost on Koha Integrated Library System‘s version numbering changes. The IASLIC newsletter can be access from here. See page 5 of 8 under the section “Technology News”.
Shows how to de-duplicate a .mrc file, by merging duplicate bibliographic records spread all over the file and then gather up the holdings record into repeatable 952 field that Koha expects its for item records.
Last night Pawan Sharma, a fellow user on “Koha Users” reached out for some help with importing his items into Koha. Like many other, he too had moved his catalog data from Microsoft Excel spreadsheet to MarcEdit utilizing MarcEdit’s “Delimited Text Translator” feature which at the end of the process had given him a .mrc file. This he proceeded to upload into Koha by using the More > Tools > Catalog > Stage MARC records for importoption.
There were no surprises here, *except* that for every single books with multiple copies Koha imported each of the copies as a separate biblio record, instead of a single entry for the biblio with multiple item records attached to it via Marc21 952 repeatable field that Koha uses for managing holdings data. Simply put his data needed to be de-duplicated with the holding data merged back before import, typically using the ISBN number of the records (MARC field 020).
For someone who has not done this before, MarcEdit’s de-duplication and then merging it can seem like a daunting task. This post will hopefully demystify the process.
The discussions on Koha Users were based on a lot of assumptions, especially with no idea about Pawan’s data. So, I offered to take a look at it. He first sent me a .mrc file that had 12806 records, which I immediately converted into MarcEdit’s MarcBreaker mnemonic, human readable format.
And proceeded to take a “Field count” report (see under “Tools” menu of MarcEditor) to check exactly how many records had ISBN (MARC21 field 020) out of the total number of records.
The result as can be seen above – NOT A SINGLE ONE of the 12806 biblio records had an ISBN number! Well, this file can be de-duplicated and merged, but *not* using MarcEdit. Only being told about this Pawan mentioned that he had other .mrc files that had ISBN and so he sent a second .mrc (LG-32016-32979.mrc) file over. Turns out of the total of 965 biblio records in this second file, 828 records had ISBN numbers defined.
The next task was to extract the records that *had* ISBN numbers. The remaining 137 can not be dealt with in this process and will have to be dealt with separately. For now, we closed the file LG-32016-32979.mrk file with 965 records and went back to the MarcEdit main window in order to use the “Delete Selected Records” option available under Tools > Select MARC Records
The next few steps are simple, if not immediately apparent to a new user of MarcEdit. We’ll use the numbered markers on the screenshot to explain it in steps. First, we selected the LG-32016-32979.mrk file with the 965 records in step #1; next we typed in 020 (since we want to match for ISBN) in the Display Field option (by default it shows 245$a); third step was to click on “Import File” button. After the file is imported (takes just a second or two depending on your file size) this the top-left data grid which was blank so far, will show up data similar to this. Finally in step #4, we will click the “Does Not Match” link. Records that do not have an ISBN number will be selected just like the big red arrow here shows.
The last step is to click on “Delete Selected”, this will open a File Save dialog with the title “Remaining Records”. In the case, we provided the name LG-32016-32979_ISBN.mrk and saved it and exited from this deletion utility.
This file LG-32016-32979_ISBN.mrk now has the 828 records with ISBN numbers and each of which has a holding records. This is what we will work with for the deduplication process.
Using the Tools > Record Deduplication option of MarcEditor, we will now remove the duplicate records into a separate file and save it with the name LG-32016-32979_ISBN_DEDUP.mrk. We will use ISBN as the field to use to identify duplicates. A popup showed us that 828 records processes, so we are done with deduplication. We will also need to save our original work file LG-32016-32979_ISBN.mrk. This file now contains biblio records with unique ISBN number. A quick check with the Fields Count tool showed us there were now 523 records (down from 828 records originally, the rest 305 records are the duplicates that are now saved in LG-32016-32979_ISBN_DEDUP.mrk).
Now for the next step MARC Merge, which was the last step in this process. We have to go back to the main MarcEdit window and use the menu option Tools > Merge Records. The order of files we specify here is highly *important*. The “Source File” in this case was LG-32016-32979_ISBN.mrk(the file with the 523 records with unique ISBN numbers), the “Merge File” is LG-32016-32979_ISBN_DEDUP.mrk(the file where we had removed the duplicates to in the previous step) and finally, “Save File” is simply the name of the new merged file we are going to create (Hint: this is the final file that we will push to Koha). We named the final file as LG-32016-32979_ISBN_MERGED.mrk. The Record Identifier is of course 020 (i.e. ISBN number) and we move on the next screen.
This is next step is basically *everything* we have been working for in this post so far, we select the field to merge in from “Merge File” into the “Source File” and click next.
In this case everything went well and we were presented with the following screen that said “Merge Completed” and gave us the full path and filename to our merged file LG-32016-32979_ISBN_MERGED.mrk.
Of course we opened up the LG-32016-32979_ISBN_MERGED.mrk file in MarcEditor. The first thing was to check the Field Count report, and this is what we saw 523 biblio records with a total of 828 holding records, which sounds right! Below is example of the merged holdings.
Of course there is still the task of exporting the MarcBreaker (.mrk) back to .mrc so that Koha can ingest it for its MARC21 staging workflow, but everyone knows that 🙂
NOTE: For reference to this tutorial I’m attaching the zip file containing all the LG-32016-32979 files used in this example.
In-line line breaks in a CSV file can really send your Koha patron import script into a tailspin. Here is what you need to watch out for and the couple of other gotchas which will make you upgrade your Koha instance if the version you are using is less than 3.22.7.
Last week a friend working at a local college approached me for a spot of help. He was trying to import his patrons into Koha but was failing miserably. After he nearly got his head snapped off (Me: Do I look like I’m in the fortune telling profession???) he agreed to send over his data – an MS-Excel sheet for me to take a look at.
I pulled up a 3.22.6 instance I had laying around and tried to import his data. Quite expectedly, there were errors galore and the pretty much the same ones he was complaining about.
Hang on! categorycode, branchcode and surname fields were NOT missing in *any* single record. So what was going on here??? The most interesting to note here is that patron importer script said :
272 not imported because they are not in the expected format
272 records parsed
Now this was really something as the total number of student records in that patron uploader CSV file were only 144. So where does the number 272 come from?
The answer to this was easy to find. My friend’s data had several records in a rather bad shape – they had embedded line-breaks within the cell. I’ve highlighted the first few of the badly formatted cells with yellow in the screenshot below.
So, I copied the first 28 records over to a new file, ran a hackish utility script to clean out the line breaks and saved these 28 records as a new file and proceeded to upload it. This time of course “the fat lady sang” i.e. the records got imported nice and we were done!. 😀
NOTE:Of course while doing that we encountered a few Koha bugs as well – Bug id 15840 and Bug id 16426. The work-around mentioned in comment #16 of the latter bug, by Koha QA Manager Katrin Fischer holds good, in case you get stuck here and can’t immediately upgrade. Otherwise to avoid these to bugs, your real option is to upgrade your Koha instance, something that I’m going to recommend to my friend (aside from him fixing his data).
Reference:  Wikipedia “It ain’t over till the fat lady sings”
I’m grateful to the support of KLA Executive Committee and to my good friend Mr. Venkitachalam Sriram who had worked closely with me at every step to make this possible. And also a big THANK YOU to all our well wishers.
If you represent a non-profit state-level library association wanting to publish an open-access online journal, do get in touch with us for some great offers.
This post discusses how to automate your Koha ILS’s MySQL database backup on to Google drive and send an email when it is complete. It shows how you can take advantage of Google Drive’s 15GB space for free (Dropbox only gives you 2GB on the free access) and do it all from the command line and save the much needed RAM for your Koha server rather than waste it on the GUI, which is also a security risk. Further this attempts to introduce the novice readers into details of the commands they are supposed to follow, with further reading resources, should they be inclined to learn more.
Having your Koha ILS database to be regularly backed up on to remote, cloud storage is an excellent idea. By doing so you ensure a critical off-site, disaster recovery measure, which is very good. However, as with all things human, if we leave it on ourselves to do it, there will come to pass a time when we will (a) forget to do it or (b) be unable to do it for some
reason. As we all know good ol’ Captain Murphy’s Law will strike us whenever we are least prepared; in this case typically that one time we forgot or were unable to take the backup, the darned thing will crash!
So backup automation is key. Not only it ensures regularity without fail. It also removed one more essential chore from our immediate plate, thus leaving us free to do other things without feeling guilty over this key housekeeping chore.
Cloud backup – Google vs Dropbox
Dropbox and Google Drive comes across as immediate choice of cloud based backup. However, their free editions differ … only by about 13GB of space between them. So for long-term online backup Google Drive is the de-facto choice.
So, here is what we set out to do:
create a datetime stamped backup of the database; (so we can tell just by seeing the filename when the backup was taken)
compress it with bzip2 utility; (so all those loooooong lines of SQL text do not take up so much space, a text file can compress up to within 10% of it original size)
upload it to a specified folder on Google Drive; (so that all our backups remain in one place, date-wise)
email the user that the remote backup process is complete. (so when we outside or on vacation and don’t have access to our workstation, we still get a notification when it was completed and if we don’t get one, then that something certainly went wrong and someone should do something about it)
And of course, since we are talking about making this happen everyday at the same time, we need to create a cron job that will deliver all of 1, 2, 3 and 4 to us in a single neat little command.
As you all know, no self respecting system administrator will ever be caught running the X11 windowing system on a production server. So we are going to do these the way real system admins do: from the command line.
NOTE: X11 is the geekspeak for the Graphical User Interface (GUI) environment we see e.g. when we log into an Ubuntu Desktop (which is typically the Unity desktop)
Command line in this day and age? Are you nuts???
No! And here is the reason. X11 is not only an inherently insecure protocol that puts your production system at risk, it is also (compared with a command line only system) a tremendous resource hog! We all know that more free memory (RAM) is usually-a-good-thing ™, so instead of wasting our precious RAM on running a GUI (and all the unnecessary software along with it making it slow *and* insecure) we are going to show you how to do this all from a command line. One other thing: if you ever need the assistance of an expert, you will find that command line setups are also easier to debug (for an expert), after all, aren’t they always asking you to check your “logs”? All those are after all command line output. So like the Chloromint ad below, please don’t ask us again why we love the command line! 😉
We want a normal user account with no admin privileges; say in our case we will call it l2c2backup and we will do it from the terminal using the adduser l2c2backup command. See below:
Next up, we need to switch over to the new user account and create a synchronization folder for Google drive.
At this point, we’ll press “Ctrl+D” and exit from the l2c2backup user and come back to the root user or sudo user, for we now need to install a command line google drive client on our system. We are going to use the (almost) official Google Drive command line client for Linux known simply as “drive” and available from https://github.com/odeke-em/drive
Since we are using Debian, we have the advantage of using the pre-built binaries, which we shall install in the following manner by executing in turn each of the commands: # apt-get install software-properties-common
# apt-add-repository 'deb http://shaggytwodope.github.io/repo ./'
# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 7086E9CC7EC3233B
# apt-key update
# apt-get update
# sudo apt-get install drive
NOTE:If you are using Ubuntu or other mainstream Linux distributions, you can use the instructions given here on the Platform Packages page.
Once we have completed installation of “drive“, we now need to go back to our /home/l2c2backup/gdrive folder as the user l2c2backup and initialize the sync folder (i.e. /home/l2c2backup/gdrive) using the command “drive init”
Copy the really long URL that the command tells you to visit and open it in your web browser. You will see an application authorization dialog screen come up, click on the “Allow” button.
NOTE: Before pasting the URL, you must make sure that at this point you are logged in into the actual Google user account where you want to send the backups to. Don’t make a mess here.
Assuming you did everything as I have mentioned so far, you will be automatically redirected to the page with the authorization key. It will look pretty much like the one below. Of course, every request will generate a separate access authorization key, so use the one generated specifically against your request.
Copy this key and paste it back at the prompt in your terminal window and press <ENTER>. DO NOT TRY TO TYPE IT OUT BY HAND, COPY-N-PASTE IS THE ONLY WAY HERE!
If you have done everything alright then you should be back at the command prompt without any error or any other message. Your sync folder should now be ready.
Putting our solution together
Now that we have the Google Drive sync ready, it is time to look at each piece of our basic requirement.
1. Creating a datetime stamped backup of our database
First we need to create the name of our output file for the MySQL backup. For this we shall use this: BACKFILE="<dbname>.$(date +"%Y%m%d_%H%M%S").sql;. The date format will give us a datetime string formatted as “20160723_000001” when the date & time is 12:00:01 (AM) on 23-July-2016. For this example, let us assume that the BACKFILE environment variable will hold the value: koha_ghci.20160723_000001.sql.
Note: replace <dbname> with the actual name of your Koha database, which in our case is koha_ghci. So, the syntax for us looked like: BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql;. If you want to learn more about the format specific to the date command, you can read up this.
Next we will create the actual db backup using the datetime stamped output filename we just created. For that mysqldump -u<mysql_db_username> -p<mysql_db_passwd> <dbname> > /home/l2c2backup/gdrive/$BACKFILE.
Note: replace the <mysql_db_username>, <mysql_db_passwd> and <dbname> placeholders with your actual values. In our example case, the actual backup command string looked like this: mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE.
2. Compressing our SQL export
The previous step had exported our koha_ghci database as koha_ghci.20160723_000001.sql. We shall now compress this with bzip2 /home/l2c2backup/gdrive/$BACKFILE, which will give us the compressed file koha_ghci.20160723_000001.sql.bz2
3. Upload the compressed SQL backup to Google drive
Before we proceed with the actual upload, we should create a dedicated directory *on* our actual Google drive to store our backups. Lets call this directory as DBBACKUPS and create it on our online Google Drive space. It should be mentioned here that the command for upload using this library we are using, takes the form of drive push --destination <remote_folder_name> <full_path_to_compressed_file>. This code will ask for confirmation and we need to pass “Y” for yes before it will proceed. So we need to take care out that by adding echo Y | before the drive push command.
So in our case it will be echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2
Note:If you wish to learn about the other various options you can additionally use with drive push, I suggest you read this for the details.
4. Sending an email when the upload is done.
We are not running a dedicated, full fledged mail server like say Postfix on this box. Rather we are using the lightweight mstmp-mta with our Gmail account as the mail relay. If you want to know how to configure it, I suggest that you read this tutorial, ignoring the “mutt” part which you do not require. It is very simple. We had email sending working in under a minute. That’s just how long it took use to configure it.
Note: Just remember you *must* have openssl installed otherwise you will never be able to talk to GMail. And also you will need to go to your Google account and enable support for that Google likes to call “less secure apps” (which means any app that does use Google’s OAuth2 protocol for authentication. You will be authenticating over TLS and it is a perfectly safe thing to do, so just ignore Google’s ominous tone and enable “less secure apps”.
Now that we have msmtp-mta up and running, we will send out that email using this: printf "To: <recipient_email_address>\nFrom: <your_gmail_address>\nSubject: <dbname> db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of <your_gmail_address>.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp <recipient_email_address>
In our case that happened to be printf "To: firstname.lastname@example.org\nFrom: email@example.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of firstname.lastname@example.org.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp email@example.com.
5. Putting it all together
Now that we have all the parts of the puzzle in place, it is time to assemble it into a single piece. And the way, it worked for us was BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql; mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE && bzip2 /home/l2c2backup/gdrive/$BACKFILE && echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2 && printf "To: firstname.lastname@example.org\nFrom: email@example.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of firstname.lastname@example.org.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp email@example.com
Note: The reason we used the “&&” is that in BASH it stands for what is called as “Logical AND”. In simple English this merely means that unless the previous command is not not executed successfully, whatever comes next simply won’t execute.
A BASH script and a cron job
We placed this one-liner script that cobbled together into the following BASH script which we named as “backuptogoogle.sh” and placed it in the folder /usr/local/bin after setting its execution bit on with chmod a+x /usr/local/bin/backuptogoogle.sh
BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql; mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE && bzip2 /home/l2c2backup/gdrive/$BACKFILE && echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2 && printf "To: firstname.lastname@example.org\nFrom: email@example.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of firstname.lastname@example.org.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp email@example.com
We setup a root user cron job with crontab -e and adding the following line and saving it.
Note: The @daily shortcut will execute our script exactly at mid-night everyday. If you want to know what are the other useful cronism shortcuts, I suggest you read this useful post by my Koha colleague and good friend D. Ruth Bavousett over here.
Backup automation from command line
If you have been able to follow the instructions by suitably modifying them to your specific settings, you have just achieved backup automation from the command line. Like I said… It’s Easy Peasy!!! 😀
Earlier today my good friend Vimal Kumar Vazaphally posted a question here about how to add the the item type (mc-itype in koha search speak) filter as a dropdown to the default main masthead search in the Koha OPAC.
RTFM and RTFM often!!! It may save your life!
<rant>The discussion that pursued on the FB group made something very clear. People forget to read the fine manual and when they do, they do not read “between the lines”. Trust me on this one, the Koha user manual is truly a ginormous treasure trove, if you take the pains to read it.</rant>
Ok! here is why I said that people really need to RTFM. The solution to the problem which Vimal shared can easily be extrapolated from this section in the manual – Appendix P: Extending Koha – Newest title pulldown (#KohaTrivia it is based on a July 2012 blog post by Nicole C. Engard (Koha’s indefatigable Documentation manager).
“Reading between the lines” of a given solution
Now, if this sounded confusing, ponder for a moment on this Ajeet joke from yesteryears:
Raabert: Boss? Is kaa kyaa kare boss? Ajeet: Rawbert! Is pille ko liquid oxygen me daal do. Liquid ise jeene nahi dega, aur oxygen ise marne nahi dega.
Nicole’s SQL based option list generator: SELECT CONCAT('<option value=\"mc-itype:', itemtype, '\">',description,'</option>') FROM itemtypes
In this case, turns out we have 5 itemtypes defined and we get this following output and we save it as CSV for introduction into our jquery.
Steps to export your LW data to MARC21 for import into Koha.
LibraryWorld (LW) is a hosted (SaaS) proprietary integrated Library system developed and promoted by San Jose, California based Libraryworld Inc. According to their website they charge USD 495 annually for a hosted instance of LW. So earlier in the day when Mr. Sonam Rinzee from Bhutan requested for some help in exporting his library’s catalog data, it presented the perfect opportunity for L2C2 Technologies to tinker with something new. These screenshots describe how it was done.
Step #1 : Access the LW website
Step #2 : Login into the admin dashboard
Step #3 : Go to Catalog module and click on “Advanced”
Step #4 : Use the search filter like this
Step #5 : Export the records
Step #6 : Export as MARC using the export button
Step #7 : Wait for your export to be ready. Click on “refresh” link to see if it is “done”
Step #8 : Download the exported data by clicking on “FILE” link and change file extension to “.mrc”
Step #9 : Open the file using MarcEdit and convert it to .mrk for edit work
Step #10 : Now curate, remap and edit the data in a batch for making it import ready into Koha
Recently during a discussion on a Whatsapp group of professional colleagues from the North East, a topic that came up – what was better suited for setting up an ETD repository at their academic institutions? As expected, most sided with DSpace, while a few suggested Eprints. I decided to introduce Koha ILS into the picture. For most, this was a rather surprising suggestion.
The ground reality
95% of Indian ETD operators do very little than scanning up a batch of printed document (bound thesis volumes) or born-digital electronic copy of the theses, make it into a PDF file, throw in some metadata together about the item and plug that into (usually) DSpace. The benefits of DSpace being statistics, organisation into collections and community (user groups), embargo capability, faceted and full text searches across the metadata. There is of course the other point of persistent URLs to the hosted resources via HANDLE system. But given that very few Indian repos actually invest in a Handle.net account, it is quite a moot point.
The argument for Koha
Utilizing MARC21, Koha already offers a highly granular and extensive metadata regime. Its capabilities include collections, search facets, full-text search on the metadata. And since Koha’s version 3.22, the capability to directly host files linked to the bibliographic records. Basically it offers everything that 95% of Indian institutions look for when they are planning to setup a repository. Compared to DSpace, LIS professionals are already better acquainted with Koha (or as many in India like to call it – “KOHA” :-P) as it is the de-facto open source LMS. Further, by using Koha for hosting the repository, the institutions and professionals who are already using Koha as their LMS, gain one key advantage; they no longer need to maintain two separate and very dissimilar systems that use completely separate software application stacks (LAMP vs Java; MySQL vs PostgreSQL etc).
Step 1 – building a MARC21 ETD Framework
I had promised my colleagues to setup a demo ETD using Koha so that they could try it out. After all, there’s no better proof of the pudding than eating it. The first challenge to that is the while Koha does ship with quite a few MARC21 frameworks, an ETD framework is not one of them.
As my starting point, I turned to ETD-MS v1.1, which is considered to be something of a gold standard for ETDs. Taking the help of this page, I worked out a single paged worksheet for cataloging ETDs on Koha. The result looked like this:
Step 2- Usability and maintainability
Of course this posed a slight problem, how will a cataloger accustomed to / trained on DSpace’s metadata namespace correctly do the crosswalk? To solve this in lines with Koha’s recommended Best Practices for UI mods, jQuery statements were included into Koha’s “intranetuserjs” and “intranetusercss” system preferences. The final outcome was this:
This approach has three (03) key benefits – (a) that you DO NOT need to touch Koha’s Template::Toolkit based templates (i.e. the .tt files), the changes made are stored in your database and applied during runtime; (b) these changes remain persistent and works across Koha’s monthly version upgrade cycle (since we didn’t change the .tt files) and (c) our other MARC21 templates are left alone, only the ETD framework is thus modified.
The jquery snippet is available on Github as a gist, as is the CSS includes into intranetusercss system preference. There is also a third gist which lists the 04 authorised value categories and their constituent sample options that help us introduce a consistent controlled vocabulary in our metadata.
The demo ETD is available at http://etddemo-opac.l2c2.co.in/. If you wish to access the staff client back-end, contact me via the comments section of this post.