Our client partner Bangabasi College is putting up a collection of their college questions papers from previous years as PDF files. You can have a glimpse of it by clicking here. The page that is being presented to the visitor to the OPAC is generated using a facility called “Koha as a CMS“. Now here is the thing, while the HTML required to display the scanned question paper PDF files is handled well by the “Koha as a CMS” functionality, it does not handle the part where we need to actually upload the PDF files into Bangabasi College’s Koha instance’s Apache2 DocumentRoot path.
So here is what we did. A normal SCP user account was created on the server hosting Bangabasi college account, into which the PDF files were uploaded into by the library staff users. However, after this, it required manual intervention from us, in order to move these files into the correct DocumentRoot path. We had created a folder QB for the question back under the DocumentRoot as /usr/share/koha/opac/htdocs/bangabasi/qb. And into this QB folder the uploaded PDF files were moved into by us.
But this created one problem, a big one. Our client was dependent on us at all times to move / sync the uploaded files into their final destination the QB folder. Also, if they needed to correct and re-upload a PDF file, they would again need us to help them move the corrected file into the DocumentRoot location. So, basically if we were not available for any reason, we would be holding them up from updating / uploading their own files into their hosted Koha. While our client was happy with how things were happening, to us, this was clearly not at all a desirable situation.
Our client was already using Google Drive and that’s when we figured than instead of simply using the Google Drive for backup, we could also use it to allow our client to do direct, independent uploads – their data in their own hands at all times. And thus this experiment.
Setting it all up
1) We created a folder named “qb” on Bangabasi College’s Google drive.
2) Next on the server within the folder /usr/share/koha/opac/htdocs/bangabasi we ran the command drive init. This asked us to authorize the google drive command line client and fetch the API key, which is what we did after logging in into Bangabasi college library’s gmail account. The API key was copied from the browser and pasted back into the command line. Basically what this did was to create a hidden directory named .gd under /usr/share/koha/opac/htdocs/bangabasi and create a file there called credentials.json. This completed the authentication setup with bangabasi’s google drive account.
3) Lastly, we set up the following cron job as the root user:
Now whenever a library staff user uploads a PDF file into their Google drive’s qb folder, every 5 minutes the cron job on our server will check if there is a new file on the remote Google drive. If there is, then the new file(s) are pulled down automatically onto the /usr/share/koha/opac/htdocs/bangabasi/qb. In this case for instance 125 .pdf files totaling in at about 19 MB were pulled down in ~18 seconds.
Similarly in case of a modification or removal of a file from Bangabasi’s Google drive “qb” it would be similarly synced or removed from the /usr/share/koha/opac/htdocs/custinc/bangabasi/qb folder on our server.
How to reference the files
Since the PDF files are stored under /bangabasi/qb folder under the Koha OPAC’s DocumentRoot i.e. /usr/share/koha/opac/htdocs/, we simply need to refer to the files with the following href attribute value set to /bangabasi/qb/<filename> in our HTML code.
Pros and cons of this approach
First the cons:
1) Google’s AI algorithms gets to read all your PDF files. But since our client is already using Google’s services, this is apparently not a major concern to them. And anyway our client is allowing public dissemination of these files, so Google is going to read it one way or another.
2) If the library staff user accidentally or maliciously deletes the Google drive folder or files in it, then the very next run of the pull command will remove the same off our server. But same would have been the case if the staff users had root / sudo access to the Koha DocumentRoot (i.e. /usr/share/koha/opac/htdocs). In fact, in the latter case, they can even rm -rf the entire server, removing *everything* from it.
1) You can now allow your staff users to freely upload the processed files without having to give everyone the access to the actual Koha server’s filesystem. The chances of accidental or malicious deletion of files off the Koha server is largely minimized.
2) The speed! Simply put uploading files to Google drive is usually faster then directly uploading to the Koha server hosted on the Internet. The transfers between Google servers and the hosted Koha server also happens at a high rate of transfer.
3) You basically have *two* online copies of your PDF files – (a) on the Google Drive folder and (b) on the Koha server, which is good in terms of redundancy.
This post discusses how to automate your Koha ILS’s MySQL database backup on to Google drive and send an email when it is complete. It shows how you can take advantage of Google Drive’s 15GB space for free (Dropbox only gives you 2GB on the free access) and do it all from the command line and save the much needed RAM for your Koha server rather than waste it on the GUI, which is also a security risk. Further this attempts to introduce the novice readers into details of the commands they are supposed to follow, with further reading resources, should they be inclined to learn more.
Having your Koha ILS database to be regularly backed up on to remote, cloud storage is an excellent idea. By doing so you ensure a critical off-site, disaster recovery measure, which is very good. However, as with all things human, if we leave it on ourselves to do it, there will come to pass a time when we will (a) forget to do it or (b) be unable to do it for some
reason. As we all know good ol’ Captain Murphy’s Law will strike us whenever we are least prepared; in this case typically that one time we forgot or were unable to take the backup, the darned thing will crash!
So backup automation is key. Not only it ensures regularity without fail. It also removed one more essential chore from our immediate plate, thus leaving us free to do other things without feeling guilty over this key housekeeping chore.
Cloud backup – Google vs Dropbox
Dropbox and Google Drive comes across as immediate choice of cloud based backup. However, their free editions differ … only by about 13GB of space between them. So for long-term online backup Google Drive is the de-facto choice.
So, here is what we set out to do:
create a datetime stamped backup of the database; (so we can tell just by seeing the filename when the backup was taken)
compress it with bzip2 utility; (so all those loooooong lines of SQL text do not take up so much space, a text file can compress up to within 10% of it original size)
upload it to a specified folder on Google Drive; (so that all our backups remain in one place, date-wise)
email the user that the remote backup process is complete. (so when we outside or on vacation and don’t have access to our workstation, we still get a notification when it was completed and if we don’t get one, then that something certainly went wrong and someone should do something about it)
And of course, since we are talking about making this happen everyday at the same time, we need to create a cron job that will deliver all of 1, 2, 3 and 4 to us in a single neat little command.
As you all know, no self respecting system administrator will ever be caught running the X11 windowing system on a production server. So we are going to do these the way real system admins do: from the command line.
NOTE: X11 is the geekspeak for the Graphical User Interface (GUI) environment we see e.g. when we log into an Ubuntu Desktop (which is typically the Unity desktop)
Command line in this day and age? Are you nuts???
No! And here is the reason. X11 is not only an inherently insecure protocol that puts your production system at risk, it is also (compared with a command line only system) a tremendous resource hog! We all know that more free memory (RAM) is usually-a-good-thing ™, so instead of wasting our precious RAM on running a GUI (and all the unnecessary software along with it making it slow *and* insecure) we are going to show you how to do this all from a command line. One other thing: if you ever need the assistance of an expert, you will find that command line setups are also easier to debug (for an expert), after all, aren’t they always asking you to check your “logs”? All those are after all command line output. So like the Chloromint ad below, please don’t ask us again why we love the command line! 😉
We want a normal user account with no admin privileges; say in our case we will call it l2c2backup and we will do it from the terminal using the adduser l2c2backup command. See below:
Next up, we need to switch over to the new user account and create a synchronization folder for Google drive.
At this point, we’ll press “Ctrl+D” and exit from the l2c2backup user and come back to the root user or sudo user, for we now need to install a command line google drive client on our system. We are going to use the (almost) official Google Drive command line client for Linux known simply as “drive” and available from https://github.com/odeke-em/drive
Since we are using Debian, we have the advantage of using the pre-built binaries, which we shall install in the following manner by executing in turn each of the commands: # apt-get install software-properties-common
# apt-add-repository 'deb http://shaggytwodope.github.io/repo ./'
# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 7086E9CC7EC3233B
# apt-key update
# apt-get update
# sudo apt-get install drive
NOTE:If you are using Ubuntu or other mainstream Linux distributions, you can use the instructions given here on the Platform Packages page.
Once we have completed installation of “drive“, we now need to go back to our /home/l2c2backup/gdrive folder as the user l2c2backup and initialize the sync folder (i.e. /home/l2c2backup/gdrive) using the command “drive init”
Copy the really long URL that the command tells you to visit and open it in your web browser. You will see an application authorization dialog screen come up, click on the “Allow” button.
NOTE: Before pasting the URL, you must make sure that at this point you are logged in into the actual Google user account where you want to send the backups to. Don’t make a mess here.
Assuming you did everything as I have mentioned so far, you will be automatically redirected to the page with the authorization key. It will look pretty much like the one below. Of course, every request will generate a separate access authorization key, so use the one generated specifically against your request.
Copy this key and paste it back at the prompt in your terminal window and press <ENTER>. DO NOT TRY TO TYPE IT OUT BY HAND, COPY-N-PASTE IS THE ONLY WAY HERE!
If you have done everything alright then you should be back at the command prompt without any error or any other message. Your sync folder should now be ready.
Putting our solution together
Now that we have the Google Drive sync ready, it is time to look at each piece of our basic requirement.
1. Creating a datetime stamped backup of our database
First we need to create the name of our output file for the MySQL backup. For this we shall use this: BACKFILE="<dbname>.$(date +"%Y%m%d_%H%M%S").sql;. The date format will give us a datetime string formatted as “20160723_000001” when the date & time is 12:00:01 (AM) on 23-July-2016. For this example, let us assume that the BACKFILE environment variable will hold the value: koha_ghci.20160723_000001.sql.
Note: replace <dbname> with the actual name of your Koha database, which in our case is koha_ghci. So, the syntax for us looked like: BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql;. If you want to learn more about the format specific to the date command, you can read up this.
Next we will create the actual db backup using the datetime stamped output filename we just created. For that mysqldump -u<mysql_db_username> -p<mysql_db_passwd> <dbname> > /home/l2c2backup/gdrive/$BACKFILE.
Note: replace the <mysql_db_username>, <mysql_db_passwd> and <dbname> placeholders with your actual values. In our example case, the actual backup command string looked like this: mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE.
2. Compressing our SQL export
The previous step had exported our koha_ghci database as koha_ghci.20160723_000001.sql. We shall now compress this with bzip2 /home/l2c2backup/gdrive/$BACKFILE, which will give us the compressed file koha_ghci.20160723_000001.sql.bz2
3. Upload the compressed SQL backup to Google drive
Before we proceed with the actual upload, we should create a dedicated directory *on* our actual Google drive to store our backups. Lets call this directory as DBBACKUPS and create it on our online Google Drive space. It should be mentioned here that the command for upload using this library we are using, takes the form of drive push --destination <remote_folder_name> <full_path_to_compressed_file>. This code will ask for confirmation and we need to pass “Y” for yes before it will proceed. So we need to take care out that by adding echo Y | before the drive push command.
So in our case it will be echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2
Note:If you wish to learn about the other various options you can additionally use with drive push, I suggest you read this for the details.
4. Sending an email when the upload is done.
We are not running a dedicated, full fledged mail server like say Postfix on this box. Rather we are using the lightweight mstmp-mta with our Gmail account as the mail relay. If you want to know how to configure it, I suggest that you read this tutorial, ignoring the “mutt” part which you do not require. It is very simple. We had email sending working in under a minute. That’s just how long it took use to configure it.
Note: Just remember you *must* have openssl installed otherwise you will never be able to talk to GMail. And also you will need to go to your Google account and enable support for that Google likes to call “less secure apps” (which means any app that does use Google’s OAuth2 protocol for authentication. You will be authenticating over TLS and it is a perfectly safe thing to do, so just ignore Google’s ominous tone and enable “less secure apps”.
Now that we have msmtp-mta up and running, we will send out that email using this: printf "To: <recipient_email_address>\nFrom: <your_gmail_address>\nSubject: <dbname> db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of <your_gmail_address>.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp <recipient_email_address>
In our case that happened to be printf "To: firstname.lastname@example.org\nFrom: email@example.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of firstname.lastname@example.org.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp email@example.com.
5. Putting it all together
Now that we have all the parts of the puzzle in place, it is time to assemble it into a single piece. And the way, it worked for us was BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql; mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE && bzip2 /home/l2c2backup/gdrive/$BACKFILE && echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2 && printf "To: firstname.lastname@example.org\nFrom: email@example.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of firstname.lastname@example.org.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp email@example.com
Note: The reason we used the “&&” is that in BASH it stands for what is called as “Logical AND”. In simple English this merely means that unless the previous command is not not executed successfully, whatever comes next simply won’t execute.
A BASH script and a cron job
We placed this one-liner script that cobbled together into the following BASH script which we named as “backuptogoogle.sh” and placed it in the folder /usr/local/bin after setting its execution bit on with chmod a+x /usr/local/bin/backuptogoogle.sh
BACKFILE=koha_ghci.$(date +"%Y%m%d_%H%M%S").sql; mysqldump -ukoha_ghci -pASx2xvercbHXzs2dP koha_ghci > /home/l2c2backup/gdrive/$BACKFILE && bzip2 /home/l2c2backup/gdrive/$BACKFILE && echo Y | drive push --destination DBBACKUPS /home/l2c2backup/gdrive/$BACKFILE.bz2 && printf "To: firstname.lastname@example.org\nFrom: email@example.com\nSubject: KOHA_GHCI db backed up on GDrive\n\nSee filename $BACKFILE.bz2 on DBBACKUPS folder on Google Drive of firstname.lastname@example.org.\n\nBackup synced at $(date +"%Y-%m-%d %H:%M:%S")" | msmtp email@example.com
We setup a root user cron job with crontab -e and adding the following line and saving it.
Note: The @daily shortcut will execute our script exactly at mid-night everyday. If you want to know what are the other useful cronism shortcuts, I suggest you read this useful post by my Koha colleague and good friend D. Ruth Bavousett over here.
Backup automation from command line
If you have been able to follow the instructions by suitably modifying them to your specific settings, you have just achieved backup automation from the command line. Like I said… It’s Easy Peasy!!! 😀