Almost a year back, we had shared a post about how to use google drive as a remote backup storage. If you are unfamiliar with it and wish to understand the concepts presented here, we suggest that you first read it and then proceed with this.
Why did we do this?
Our client partner Bangabasi College is putting up a collection of their college questions papers from previous years as PDF files. You can have a glimpse of it by clicking here. The page that is being presented to the visitor to the OPAC is generated using a facility called “Koha as a CMS“. Now here is the thing, while the HTML required to display the scanned question paper PDF files is handled well by the “Koha as a CMS” functionality, it does not handle the part where we need to actually upload the PDF files into Bangabasi College’s Koha instance’s Apache2 DocumentRoot path.
So here is what we did. A normal SCP user account was created on the server hosting Bangabasi college account, into which the PDF files were uploaded into by the library staff users. However, after this, it required manual intervention from us, in order to move these files into the correct DocumentRoot path. We had created a folder QB for the question back under the DocumentRoot as
/usr/share/koha/opac/htdocs/bangabasi/qb. And into this QB folder the uploaded PDF files were moved into by us.
But this created one problem, a big one. Our client was dependent on us at all times to move / sync the uploaded files into their final destination the QB folder. Also, if they needed to correct and re-upload a PDF file, they would again need us to help them move the corrected file into the DocumentRoot location. So, basically if we were not available for any reason, we would be holding them up from updating / uploading their own files into their hosted Koha. While our client was happy with how things were happening, to us, this was clearly not at all a desirable situation.
Our client was already using Google Drive and that’s when we figured than instead of simply using the Google Drive for backup, we could also use it to allow our client to do direct, independent uploads – their data in their own hands at all times. And thus this experiment.
Setting it all up
1) We created a folder named “qb” on Bangabasi College’s Google drive.
2) Next on the server within the folder
/usr/share/koha/opac/htdocs/bangabasi we ran the command
drive init. This asked us to authorize the google drive command line client and fetch the API key, which is what we did after logging in into Bangabasi college library’s gmail account. The API key was copied from the browser and pasted back into the command line. Basically what this did was to create a hidden directory named
/usr/share/koha/opac/htdocs/bangabasi and create a file there called
credentials.json. This completed the authentication setup with bangabasi’s google drive account.
3) Lastly, we set up the following cron job as the root user:
*/5 * * * * cd /usr/share/koha/opac/htdocs/bangabasi && /usr/bin/drive pull -quiet qb
to execute on our server once every 5 minutes.
How it works
Now whenever a library staff user uploads a PDF file into their Google drive’s qb folder, every 5 minutes the cron job on our server will check if there is a new file on the remote Google drive. If there is, then the new file(s) are pulled down automatically onto the
/usr/share/koha/opac/htdocs/bangabasi/qb. In this case for instance 125 .pdf files totaling in at about 19 MB were pulled down in ~18 seconds.
Similarly in case of a modification or removal of a file from Bangabasi’s Google drive “qb” it would be similarly synced or removed from the
/usr/share/koha/opac/htdocs/custinc/bangabasi/qb folder on our server.
How to reference the files
Since the PDF files are stored under
/bangabasi/qb folder under the Koha OPAC’s DocumentRoot i.e.
/usr/share/koha/opac/htdocs/, we simply need to refer to the files with the following href attribute value set to
/bangabasi/qb/<filename> in our HTML code.
Pros and cons of this approach
First the cons:
1) Google’s AI algorithms gets to read all your PDF files. But since our client is already using Google’s services, this is apparently not a major concern to them. And anyway our client is allowing public dissemination of these files, so Google is going to read it one way or another.
2) If the library staff user accidentally or maliciously deletes the Google drive folder or files in it, then the very next run of the
pull command will remove the same off our server. But same would have been the case if the staff users had root / sudo access to the Koha DocumentRoot (i.e. /usr/share/koha/opac/htdocs). In fact, in the latter case, they can even
rm -rf the entire server, removing *everything* from it.
1) You can now allow your staff users to freely upload the processed files without having to give everyone the access to the actual Koha server’s filesystem. The chances of accidental or malicious deletion of files off the Koha server is largely minimized.
2) The speed! Simply put uploading files to Google drive is usually faster then directly uploading to the Koha server hosted on the Internet. The transfers between Google servers and the hosted Koha server also happens at a high rate of transfer.
3) You basically have *two* online copies of your PDF files – (a) on the Google Drive folder and (b) on the Koha server, which is good in terms of redundancy.