Extracting patron images from Koha DB

This is part of the series of blog posts on how to extract your data from a Koha system where you unable to access the DB directly, but where you have “superlibrarian” access. In this part, we talk about extracting the patron images stored inside the patronimage table in your Koha database.

Prequisites

  • You should be comfortable using the GNU/Linux command line
  • Familiarity with SQL syntax
  • The Koha DB schema reference for your version of Koha
  • A Linux system with cURL and LibreOffice installed
  • Superlibrarian userid and password of your Koha instance

Modus operandi

  1. Use the Koha Reports module to download the list of the borrowernumbers of patrons having their image in the db
  2. Remove the header line and save it as a text file, with one borrowernumber per line
  3. Use cURL to get the CGISESSID cookie using the authentication SVC API call
  4. Execute the BASH script, passing the authenticated session id and the file holding the list of borrowernumbers as command line parameters
  5. Use the Koha Reports module to generate the IDLINK.txt batch patron image uploader file

The background

Patron images in Koha are stored in the patronimage table of the database. The table has following schema:

Column Type Size Parents Comments
borrowernumber int 10 borrowers the borrowernumber of the patron this image is attached to (borrowers.borrowernumber)
mimetype varchar 15   the format of the image (png, jpg, etc)
imagefile mediumblob 16777215   the image

A blob is a “binary large object” which means it is not a text field. The contents won’t make any sense if we used the report module to SELECT imagefile FROM patronimage. This brings cURL into the picture.

The patron image is served on the OPAC / staff client using the patronimage.pl Perl script using an URL like this: http://<your_instance>/cgi-bin/koha/members/patronimage.pl?borrowernumber=<borrowernumber>.

But before we can run cURL on this URL, we must do one more thing. We need our BASH script to login using our superlibrarian userid and password using Koha authentication web service API call.

We need to do this for two reasons: (a) login into our Koha instance with superlibrarian access and (b) get hold of the CGISESSID cookie.

Putting it all together

  1. A simple SQL report as given below with fetch us the borrowernumber of patrons with their image stored in the DB.

    SELECT borrowernumber FROM patronimage

    We will download the result as Open Document spreadsheet.

    Next we copy and save the list of patron borrowernumbers into a text file (let’s call it list-of-pix.txt), with one borrowernumber per line.

  2. Lets put together our small BASH script. We shall call it get_pix.sh.
    #!/bin/bash
    
    mkdir patronimages
    
    while read LINE
        do  curl -b CGISESSID=$1  -o ./patronimages/$LINE.png http://<your_instance>/cgi-bin/koha/members/patronimage.pl?borrowernumber=$LINE
    done < $2

    NOTE: <your_instance> need to be changed with **your** actual server URL.
    Make the script executable by running chmod a+x get_pix.sh.

  3. From the command-line run the following command:
    $ curl -i http://<your_instance>/cgi-bin/koha/svc/authentication -d 'userid=<yourid>&password=<yourpass>'

    If we had put in the correct information, we shall see something like this:

    HTTP/1.1 200 OK
    Date: Thu, 20 Dec 2018 09:53:02 GMT
    Server: Apache/2.4.18 (Ubuntu)
    Set-Cookie: CGISESSID=4a732ae75d8991a994d7ad0df584f84c; path=/; HttpOnly
    Vary: Accept-Encoding
    Transfer-Encoding: chunked
    Content-Type: text/xml; charset=ISO-8859-1
    
    <?xml version='1.0' standalone='yes'?>
    <response>
      <status>ok</status>
    </response>

    This gives us the two things we need for the next step: (a) the CGISESSID and (b) the status as OK. OK signifies that we were successfully logged in.

  4. Now we run the get_pix.sh script
    ./get_pix.sh f03a894f1885e891e30f6d40e6e9838c ./list-of-pix.txt

    And we shall see the patron images getting downloaded.

  5. Once the patron images have all been downloaded to patronimages folder created by the get_pix.sh script, its time to generate the IDLINK.txt file, which we will place in the aforementioned patronimages folder. Again, we shall turn to Koha's Reports module with this small SQL script:
    SELECT CONCAT(borrowers.cardnumber,", ", patronimage.borrowernumber, ".png") AS 'IDLINK'
      FROM patronimage 
      LEFT JOIN borrowers ON patronimage.borrowernumber=borrowers.borrowernumber

    This will generate our listing for the IDLINK.txt. If you are not sure about what IDLINK.txt does, please read up from here.

  6. Download the report as Open Document Spreadsheet, remove the header line and copy the rest into a text file and save it as IDLINK.txt in the patronimages folder.
  7. Zip up the patronimages folder as mentioned here and we have the patron images ready for a batch upload.