Extracting patron images from Koha DB

This is part of the series of blog posts on how to extract your data from a Koha system where you unable to access the DB directly, but where you have “superlibrarian” access. In this part, we talk about extracting the patron images stored inside the patronimage table in your Koha database.

Prequisites

  • You should be comfortable using the GNU/Linux command line
  • Familiarity with SQL syntax
  • The Koha DB schema reference for your version of Koha
  • A Linux system with cURL and LibreOffice installed
  • Superlibrarian userid and password of your Koha instance

Modus operandi

  1. Use the Koha Reports module to download the list of the borrowernumbers of patrons having their image in the db
  2. Remove the header line and save it as a text file, with one borrowernumber per line
  3. Use cURL to get the CGISESSID cookie using the authentication SVC API call
  4. Execute the BASH script, passing the authenticated session id and the file holding the list of borrowernumbers as command line parameters
  5. Use the Koha Reports module to generate the IDLINK.txt batch patron image uploader file

The background

Patron images in Koha are stored in the patronimage table of the database. The table has following schema:

Column Type Size Parents Comments
borrowernumber int 10 borrowers the borrowernumber of the patron this image is attached to (borrowers.borrowernumber)
mimetype varchar 15   the format of the image (png, jpg, etc)
imagefile mediumblob 16777215   the image

A blob is a “binary large object” which means it is not a text field. The contents won’t make any sense if we used the report module to SELECT imagefile FROM patronimage. This brings cURL into the picture.

The patron image is served on the OPAC / staff client using the patronimage.pl Perl script using an URL like this: http://<your_instance>/cgi-bin/koha/members/patronimage.pl?borrowernumber=<borrowernumber>.

But before we can run cURL on this URL, we must do one more thing. We need our BASH script to login using our superlibrarian userid and password using Koha authentication web service API call.

We need to do this for two reasons: (a) login into our Koha instance with superlibrarian access and (b) get hold of the CGISESSID cookie.

Putting it all together

  1. A simple SQL report as given below with fetch us the borrowernumber of patrons with their image stored in the DB.

    SELECT borrowernumber FROM patronimage

    We will download the result as Open Document spreadsheet.

    Next we copy and save the list of patron borrowernumbers into a text file (let’s call it list-of-pix.txt), with one borrowernumber per line.

  2. Lets put together our small BASH script. We shall call it get_pix.sh.
    #!/bin/bash
    
    mkdir patronimages
    
    while read LINE
        do  curl -b CGISESSID=$1  -o ./patronimages/$LINE.png http://<your_instance>/cgi-bin/koha/members/patronimage.pl?borrowernumber=$LINE
    done < $2

    NOTE: <your_instance> need to be changed with **your** actual server URL.
    Make the script executable by running chmod a+x get_pix.sh.

  3. From the command-line run the following command:
    $ curl -i http://<your_instance>/cgi-bin/koha/svc/authentication -d 'userid=<yourid>&password=<yourpass>'

    If we had put in the correct information, we shall see something like this:

    HTTP/1.1 200 OK
    Date: Thu, 20 Dec 2018 09:53:02 GMT
    Server: Apache/2.4.18 (Ubuntu)
    Set-Cookie: CGISESSID=4a732ae75d8991a994d7ad0df584f84c; path=/; HttpOnly
    Vary: Accept-Encoding
    Transfer-Encoding: chunked
    Content-Type: text/xml; charset=ISO-8859-1
    
    <?xml version='1.0' standalone='yes'?>
    <response>
      <status>ok</status>
    </response>

    This gives us the two things we need for the next step: (a) the CGISESSID and (b) the status as OK. OK signifies that we were successfully logged in.

  4. Now we run the get_pix.sh script
    ./get_pix.sh f03a894f1885e891e30f6d40e6e9838c ./list-of-pix.txt

    And we shall see the patron images getting downloaded.

  5. Once the patron images have all been downloaded to patronimages folder created by the get_pix.sh script, its time to generate the IDLINK.txt file, which we will place in the aforementioned patronimages folder. Again, we shall turn to Koha's Reports module with this small SQL script:
    SELECT CONCAT(borrowers.cardnumber,", ", patronimage.borrowernumber, ".png") AS 'IDLINK'
      FROM patronimage 
      LEFT JOIN borrowers ON patronimage.borrowernumber=borrowers.borrowernumber

    This will generate our listing for the IDLINK.txt. If you are not sure about what IDLINK.txt does, please read up from here.

  6. Download the report as Open Document Spreadsheet, remove the header line and copy the rest into a text file and save it as IDLINK.txt in the patronimages folder.
  7. Zip up the patronimages folder as mentioned here and we have the patron images ready for a batch upload.

The Mysterious Mr. Z in Z39.50 ;-)

An explainer about what “z” in z39.50 stands for all the confused souls.

27 years back, in 1991, Subhash Ghai released a movie named – Saudagar. The signature tune from its original soundtrack was a song – “ILU ILU.. YEH ILU ILU KYA HAIN? ILU ILU? (‘ILU ILU… What is this ILU ILU?’)“.

In case you are wondering about the context of this song, ever since NECOPAC z39.50 service went live last month, the most repeated common question coming from LIS students and professionals on FB has been – “What does ‘Z’ stand for in z39.50?”.

For some, it is a question that popped up in their head when encountering z39.50 search in Koha. For others, especially from West Bengal, India, apparently this is a question that is being asked at the currently on-going interview for WBHRB.

Some wondered that it perhaps stood for the company that started z39.50, while others had no idea.

So what is z39.50?

In very simple terms Z39.50 is a communications protocol for searching and retrieving information from a bibliographic information database over a TCP/IP computer network. It is covered by ANSI/NISO standard Z39.50, and ISO standard 23950:1998. The standard is maintained by US Library of Congress.

Cataloguers mostly encounter z39.50 when they attempt to do copy cataloging. Copy cataloging is a process of fetching and editing a pre-existing bibliographic record from a z39.50 server instead of creating a completely new record from scratch. Thus helping to save time, effort and therefore money, while bringing in a certain standard in cataloging quality.

Ok! Just tell me what “Z” stand for!

Asking what “Z” represents is actually asking the wrong question. The correct question to ask is What does Z39 stand for?.

The short answer

On its own, Z39 simply refers to the American National Standards Committee Z39. By itself “Z” has no special meaning. In the present context, Z39 refers to NISO standards related to publishing, bibliographic and library applications in the United States of America, all of which start with “ANSI/NISO Z39.”.

Towards the end of this post a few example NISO standards have been listed.

The long answer

To understand we have to look back at the history of standardization process as it happened in the United States of America during the last century.

Exactly 100 years back in 1918, 5 engineering organisations and 3 federal organisations came together to form the American Engineering Standards Committee (AESC). In 1928, AESC re-organised to form the American Standards Association (ASA). In 1966, ASA became the USA Standards Institute, followed by a further transformation in 1969 to become the American National Standards Institute or ANSI as we know it today.

The centenary video from ANSI describes the journey of standardization in United States and its global impact.

It was during the ASA years that formal standardization of librarianship started to take shape.

Image source: The Legacy of a Librarian: Carolyn Ulrich’s Little Magazines

In 1935 Carolyn F. Ulrich of New York Public Library led the initiative to create a standard for arrangement of periodicals that became known as Z29.1-1935.

In 1937, prompted by various library associations, ASA appointed Ulrich to represent ASA on International Standards Association’s (ISA) Committee 46 – an international committee on documentation.

This further led to the organisation of a national committee on library standards in June of 1939. The committee was simply named as “Committee Z39” and was tasked with setting up

“Standards for [library] concepts, definitions, terminology, letters and signs, practices, methods, supplies and equipment.”

Over time it came to be known as the “American National Standards Committee Z39“. In 1984, it changed its name and structure to become the National Information Standards Organization (NISO). NISO today continues to develop, maintain and publish technical standards related to publishing, bibliographic and library applications in the United States of America as an ANSI accredited SDO (standards designator organization). All NISO standards all start with “ANSI/NISO Z39.” (read as zee or zed thirty nine dot).

To cut a long story short, z39.50 is simply the 50th NISO approved standard

Example of NISO standards

If you wish to explore further into the world of NISO standards, please do visit the NISO standards tracker for active standards.

Featured image is from the document “Task Force on American National Standards Committee Z39: Activities and Future Directions” published in 1976. The full-text of this historical document is available here.