ML in Clojure via Python

Notes from ongoing experiments in doing ML from Clojure, using Python libraries via libpython-clj

Introduction

The primary focus of these notes is on exploring the use of Chris Nuernberger's libpython-clj for Clojurists unwilling to leave the comfort of their favourite language, while still being able to leverage the power of the vast body of work available to Pythonistas.

For more depth and breadth, follow Carin Meier - here, here, and here.

The Python++ Setup

The Shell

  • Straightforward shell commands to set up a virtualenv which we will call ml

  • macOS notes (I haven't checked Linux yet)

    • The Python dylib must be symlinked into the virtualenv's lib directory for libpython-clj to work

    • libpython-clj version 1.36

    • Python version 3.7, installed via Homebrew (the virtualenv is created from it)

    • Clojure initialization of the Python subsystem also needs some careful consideration, which is shown in the Clojure code sample later.

  # Choose your own path
  virtualenv -p python3 ~/.venv/ml
  source ~/.venv/ml/bin/activate

  # Use the latest pip
  pip install --upgrade pip

  # Install tensorflow
  pip install tensorflow

  cd ~/.venv/ml/lib
  # *IMPORTANT* - Based on the paths of brew-owned python version 3.7
  ln -s  /usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/libpython3.7m.dylib

From here onwards, in any new shell where you wish to use the ml virtualenv, run the following:

  source ~/.venv/ml/bin/activate
  python --version
  python -c 'import tensorflow as tf; print(tf.__version__)'

Python 3.7.6
2.1.0

Session setup   emacs

The steps in this section relate to my setup with Spacemacs.

  • My Spacemacs-based configuration can be found here.

  • This document is written in org-mode in a literate style; hence this section, to keep a record of the steps required to get going.

  • Here we use pyvenv to switch to the relevant virtualenv first.

Python

Evaluate the following elisp to switch the venv subsystem to ml.

  (pyvenv-activate "~/.venv/ml")

A quick system check to ensure we are in the right place, and also to display the version(s):

  import tensorflow as tf
  print(tf.__version__)
Python 3.7.6 (default, Dec 30 2019, 19:38:26) 
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2.1.0
>>> python.el: native completion setup loaded

Clojure

The Leiningen build file project.clj looks like this. The key part to focus on is :dependencies; the other parts are mostly irrelevant to this note.

  (defproject machine-learning-notes "0.1.0-SNAPSHOT"

    :dependencies [[org.clojure/clojure "1.10.1"]
                   [clj-python/libpython-clj "1.36"]]

    :min-lein-version "2.9.1"

    :source-paths ["src"]

    :repl-options {:port 25092})

With the Leiningen project set up, you should simply be able to cider-jack-in away. Or, of course, use any other editor/IDE combination you are comfortable with.

Hello TensorFlow   python

This example is from the official TensorFlow quickstart tutorial; running it successfully will confirm that everything is set up correctly.

When accessed for the first time, the MNIST dataset will be automatically downloaded.

  from __future__ import absolute_import, division, print_function, unicode_literals
  import tensorflow as tf

  # Let's load the MNIST dataset
  mnist = tf.keras.datasets.mnist
  (x_train, y_train), (x_test, y_test) = mnist.load_data()
  x_train, x_test = x_train / 255.0, x_test / 255.0

y_train is a vector of labels. Let's see what it looks like:

  y_train
array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

Let's create the network

  model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.Dense(10, activation='softmax')
  ])

  model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

Train

This next statement runs five epochs of model fitting.

  model.fit(x_train, y_train, epochs=5)

Evaluate

Like model fitting, evaluation is a single straightforward call.

  model.evaluate(x_test, y_test, verbose=2)
10000/10000 - 0s - loss: 0.0729 - accuracy: 0.9782
[0.07289657505885698, 0.9782]

Hello TensorFlow - Clojure Edition   clojure

Here's the equivalent version in Clojure leveraging the libpython-clj library.

Note that the code below assumes the previously mentioned manual steps (or equivalent) have been executed; otherwise it will most likely not work.

We first require some namespaces that we will use to configure libpython-clj to point to the right Python version/installation.

  (ns machine-learning-notes.hello-ml
    (:require [libpython-clj.python :as py]
              [libpython-clj.jna.base]))

  ;; Depending on your Python version and virtualenv setup, change accordingly
  (alter-var-root #'libpython-clj.jna.base/*python-library* (constantly "python3.7m"))
  (py/initialize! :python-executable (str (System/getenv "HOME") "/.venv/ml/bin/python"))

These steps are required on my machine, and YMMV. But you get the idea, and should be able to tweak accordingly.
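
As a quick check that the embedded interpreter has picked up the intended Python, we can execute a line of Python directly. Here's a minimal sketch using run-simple-string from libpython-clj.python, assuming the py/initialize! call above succeeded:

  ;; Should print the virtualenv's Python - e.g. 3.7.6 under ~/.venv/ml
  (py/run-simple-string "import sys; print(sys.version); print(sys.executable)")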

Next, we require require-python and then use it liberally to pull in various Python modules. The following should be executed only after the above initialize! sequence.

  ;; Note that the next require expression needs to come *after* the py/initialize! above
  (require '[libpython-clj.require :refer [require-python]])
  (require-python '[tensorflow :as tf]
                  '[tensorflow.keras.models :as models]
                  '[tensorflow.keras.layers :as layers]
                  '[tensorflow.keras.datasets.mnist :as mnist]
                  '[numpy :as numpy]
                  '[builtins :as python])

We're now set, and can move on to the implementation. What follows has pretty much a one-to-one correspondence with the Python version.

  (defonce mnist-data (mnist/load_data))

  (let [[[x-train y-train] [x-test y-test]] mnist-data] ;; => 1
    (def x-train (numpy/divide x-train 255)) ;; => 2
    (def y-train y-train)
    (def x-test (numpy/divide x-test 255))
    (def y-test y-test))

  (defonce model (models/Sequential [(layers/Flatten :input_shape [28 28]) ;; => 3
                                     (layers/Dense 128 :activation "relu")
                                     (layers/Dropout 0.2)
                                     (layers/Dense 10 :activation "softmax")
                                     ]))

  (py/py. model compile ;; 4, 5
          :optimizer "adam"
          :loss "sparse_categorical_crossentropy"
          :metrics (python/list ["accuracy"])) ;; 6
  (py/py. model fit x-train y-train :epochs 5)

A few notes

  1. Destructuring over Python data structures feels absolutely native.

  2. While not as straightforward as in Python, numpy comes to the rescue for the division. Note that numpy is already installed along with tensorflow.

  3. The use of named arguments in Python has a clean kwarg equivalent in Clojure.

  4. Notice the use of the py. macro. Furthermore, there are py.. and py.- macros too. Do they remind you of the JavaScript interop forms of ClojureScript? Clever naming! (See the sketch after these notes.)

  5. Calling a method on model is done via the py/py. macro.

  6. A Clojure vector doesn't cleanly translate into a Python list for the metrics named argument, and needs to be wrapped in python/list.
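
To make notes 2 and 4-6 concrete, here is a small sketch exercising the interop macros against objects we already have in hand. It is a hypothetical illustration, assuming the forms above evaluated cleanly; the attribute and method names used (shape, summary, optimizer, get_config) are standard numpy/Keras API, not anything libpython-clj adds:

  ;; note 2: numpy stands in for Python's operator overloading
  (numpy/divide (numpy/array [0 128 255]) 255.0)

  ;; py.-  reads an attribute        - like y_train.shape in Python
  (py/py.- y-train shape)

  ;; py.   calls a method            - like model.summary()
  (py/py. model summary)

  ;; py..  chains, like Clojure's .. - model.optimizer.get_config()
  (py/py.. model -optimizer (get_config))

  ;; note 6: building an explicit Python list from a Clojure vector
  (python/list ["accuracy"])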

All of the above is pretty neat - save for a few quirks, which are easily forgiven for the resultant Joy of Clojure!

Let's evaluate the model against the test dataset.

  (py/py. model evaluate x-test y-test :verbose 2)
10000/10000 - 0s - loss: 0.0683 - accuracy: 0.9775

The numbers are in line with those of the Python version above (training is stochastic, so an exact match isn't expected) - which suggests that the Clojure-Python bridge and the operations on the data have worked as expected.

We can also visualize the model, using built-in support in TensorFlow. For the rendering, we'll need to install two more Python packages - pydot and graphviz.

  # Make sure it's in the same virtualenv
  pip install pydot graphviz

We call into Python again, in the straightforward manner that libpython-clj offers.

  (require-python 'tensorflow.keras.utils)
  (tensorflow.keras.utils/plot_model
   model
   :to_file "model.png"
   :show_shapes true
   :show_layer_names true
   :rankdir "TB"
   :expand_nested false
   :dpi 96)

model.png

Another very satisfying - nay, exciting - aspect is the autocomplete from the Python "namespaces"!

numpy-autocomplete.png

Or even seeing the documentation of Python functions:

list-doc.png

This brings us to a logical checkpoint - our setup looks good and we should now be able to move on to the next parts.

Huggingface Tokenizers

Note: this section is WIP.

Install some prerequisites

  pip install tokenizers

  # If the installed version is not the most recent (0.4.2 as of writing this)
  pip install --upgrade tokenizers

Moving on to the Clojure bits. It's not too different from what you'd expect.

  (require-python '[tokenizers
                    :refer [BertWordPieceTokenizer
                            SentencePieceBPETokenizer
                            CharBPETokenizer
                            ByteLevelBPETokenizer]])

  ;; Files downloaded from
  ;; https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
  ;; https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt

  (def tokenizer (ByteLevelBPETokenizer "gpt2-vocab.json" "gpt2-merges.txt"))

  (def encoded (py/py. tokenizer encode "I can feel the magic, can you?"))

  (py/py.- encoded #_type_ids #_tokens offsets)
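
The returned Encoding object carries several parallel sequences. Here's a quick, hypothetical peek at a couple of them using the same attribute macro - tokens and ids are attribute names from the tokenizers API:

  (py/py.- encoded tokens) ;; the BPE token strings
  (py/py.- encoded ids)    ;; the corresponding vocabulary ids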