MLX
(pyvenv-activate "~/.venv/machine-learning")
import mlx.core as mx
a = mx.array([0, 1, 2, 3, 4, 5])
print(a.shape)
print(a.dtype)
b = mx.array([10, 11.0, 12.0, 13.0, 14.0, 15.0])
print(b.shape)
print(b.dtype)
(6,)
mlx.core.int32
(6,)
mlx.core.float32
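If we want a specific dtype rather than the inferred one, we can pass it explicitly (a small sketch; `half` is just an illustrative name):

half = mx.array([0, 1, 2, 3], dtype=mx.float16)
print(half.dtype)  # mlx.core.float16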
c = a + b  # lazy: this only records the operation, nothing is computed yet
print(c)   # printing forces evaluation implicitly
mx.eval(c) # evaluation can also be forced explicitly
print(c)
array([10, 12, 14, 16, 18, 20], dtype=float32)
array([10, 12, 14, 16, 18, 20], dtype=float32)
print(mx.arange(10))
print(mx.random.normal((1, 10)))
array([0, 1, 2, ..., 7, 8, 9], dtype=int32)
array([[0.492185, -0.329351, -1.04443, ..., 0.454991, -0.884072, -0.39394]], dtype=float32)
Utility Functions
import time
def timeit(f):
    # Run f once and return its result along with the elapsed wall-clock time
    start = time.perf_counter()
    ret = f()
    end = time.perf_counter()
    return ret, end - start
Testing this…
ret, elapsed = timeit(lambda: 1 + 1)
print(ret)
print(elapsed)
2
4.000000000004e-06
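One MLX-specific caveat: computation is lazy, so to time actual work the timed function should force evaluation, for example with `mx.eval` (a sketch, not from the source):

def timed_matmul():
    x = mx.random.normal((1024, 1024)) @ mx.random.normal((1024, 1024))
    mx.eval(x)  # force the lazy computation to actually run
    return x

ret, elapsed = timeit(timed_matmul)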
Unified Memory
The CPU and GPU share the same RAM, so we don't need to copy data back and forth when scheduling operations on both devices.
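For example, the same arrays can be passed to operations scheduled on either device, with no explicit transfers; a minimal sketch along the lines of the MLX docs:

a = mx.random.normal((100,))
b = mx.random.normal((100,))
mx.add(a, b, stream=mx.cpu)  # runs on the CPU
mx.add(a, b, stream=mx.gpu)  # runs on the GPU; same a and b, no copies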
Before choosing how to distribute our computation, it's essential to understand what the CPU and the GPU are each good at. Here's an example from the MLX :: Unified Memory documentation.
The function below operates on its arguments and takes, as parameters, the devices on which to run its constituent operations:
def fun(a, b, d1, d2):  # d1, d2 -> devices on which to run the operations
    x = mx.matmul(a, b, stream=d1)  # one large, compute-bound matmul
    for _ in range(500):
        # many small sequential ops; repeated exp overflows to inf here
        b = mx.exp(b, stream=d2)
    return x, b
We create some sample inputs `a` and `b`:
a = mx.random.uniform(shape=(4096, 512))
b = mx.random.uniform(shape=(512, 4))
ret, elapsed = timeit(lambda: fun(a, b, mx.gpu, mx.gpu))
print(ret)
print(elapsed)
ret, elapsed = timeit(lambda: fun(a, b, mx.gpu, mx.cpu))
print(ret)
print(elapsed)
(array([[134.262, 129.237, 130.571, 130.649],
[130.853, 129.764, 131.11, 131.915],
[125.345, 126.757, 126.702, 126.284],
...,
[129.413, 126.892, 128.287, 128.288],
[132.635, 127.151, 128.288, 132.74],
[131.024, 122.929, 128.609, 130.531]], dtype=float32), array([[inf, inf, inf, inf],
[inf, inf, inf, inf],
[inf, inf, inf, inf],
...,
[inf, inf, inf, inf],
[inf, inf, inf, inf],
[inf, inf, inf, inf]], dtype=float32))
0.00019970801076851785
(array([[134.262, 129.237, 130.571, 130.649],
[130.853, 129.764, 131.11, 131.915],
[125.345, 126.757, 126.702, 126.284],
...,
[129.413, 126.892, 128.287, 128.288],
[132.635, 127.151, 128.288, 132.74],
[131.024, 122.929, 128.609, 130.531]], dtype=float32), array([[inf, inf, inf, inf],
[inf, inf, inf, inf],
[inf, inf, inf, inf],
...,
[inf, inf, inf, inf],
[inf, inf, inf, inf],
[inf, inf, inf, inf]], dtype=float32))
0.00015633300063200295
Saving and Loading Arrays
MLX can save and load arrays in multiple formats, making it easy to interoperate with other libraries, tools, and applications. The supported formats and the corresponding functions in the `mlx.core` package are:
| Format | Save function | Load function |
|---|---|---|
| NumPy | `save` | `load` |
| NumPy Archive | `savez` and `savez_compressed` | `load` |
| Safetensors | `save_safetensors` | `load` |
| GGUF | `save_gguf` | `load` |
As you can see, the `load` function works for all formats: it infers the format from the file's suffix, and can also take an optional argument that indicates the format explicitly.
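A quick round trip might look like this (file names here are arbitrary):

a = mx.arange(4)
mx.save("array", a)                # writes array.npy
mx.savez("archive", a=a, b=a * 2)  # writes archive.npz
print(mx.load("array.npy"))
print(mx.load("archive.npz")["b"])  # .npz files load as a dict of arrays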
GPT
This is from Andrej Karpathy’s Let’s build GPT: from scratch, in code, spelled out video.
We'll be using the dataset indicated in the video, the Tiny Shakespeare text. Download it:
if [ ! -f "input.txt" ]; then
    wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
fi
We read the text into memory, check some quick details, and create the encoder and decoder that map characters to and from integers.
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()
print("length of dataset in characters: ", len(text))
length of dataset in characters: 1115394
chars = sorted(list(set(text)))
vocab_size = len(chars)
stoi = { ch : i for i, ch in enumerate(chars) }
itos = { i : ch for i, ch in enumerate(chars) }
encode = lambda s: [stoi[c] for c in s] # string -> list[integer]
decode = lambda l: ''.join([itos[i] for i in l]) # list[integer] -> string
Let’s quickly check our code
print(''.join(chars))
print(vocab_size)
print(encode("Hello, World!"))
print(decode(encode("How are you doing today?")))
!$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
65
[20, 43, 50, 50, 53, 6, 1, 35, 53, 56, 50, 42, 2]
How are you doing today?
We have a custom encoder/decoder pair here. For real-world use cases, libraries like sentencepiece and tiktoken are more appropriate.
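For comparison, a minimal sketch with tiktoken (assuming the `tiktoken` package is installed; it isn't used in the rest of this walkthrough):

import tiktoken

enc = tiktoken.get_encoding("gpt2")
print(enc.encode("Hello, World!"))  # subword token ids rather than character ids
print(enc.decode(enc.encode("How are you doing today?")))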
We’ll next encode the entire text into an `mlx.core.array` object
import mlx.core as mx
data = mx.array(encode(text))
Quick check
print(data.shape, data.dtype)
print(data[:1000])
(1115394,) mlx.core.int32
array([18, 47, 56, ..., 8, 0, 0], dtype=int32)
Splitting into train and validation sets
n = int(0.9 * len(data))
train_data = data[:n]
val_data = data[n:]
block_size = 8 # The maximum context length for predictions
A quick check, showing what our input training data and the corresponding target values look like:
print(train_data[:block_size+1])
x = train_data[:block_size]
y = train_data[1:block_size+1]
for t in range(block_size):
    context = x[:t+1]
    target = y[t]
    print(f"When input is {context}, the target is {target}")
array([18, 47, 56, ..., 15, 47, 58], dtype=int32)
When input is array([18], dtype=int32), the target is array(47, dtype=int32)
When input is array([18, 47], dtype=int32), the target is array(56, dtype=int32)
When input is array([18, 47, 56], dtype=int32), the target is array(57, dtype=int32)
When input is array([18, 47, 56, 57], dtype=int32), the target is array(58, dtype=int32)
When input is array([18, 47, 56, 57, 58], dtype=int32), the target is array(1, dtype=int32)
When input is array([18, 47, 56, 57, 58, 1], dtype=int32), the target is array(15, dtype=int32)
When input is array([18, 47, 56, ..., 58, 1, 15], dtype=int32), the target is array(47, dtype=int32)
When input is array([18, 47, 56, ..., 1, 15, 47], dtype=int32), the target is array(58, dtype=int32)
Now for the batch size. We pass data in batches to make efficient use of CPU/GPU resources, which can run many computations of the same kind in parallel.
batch_size = 4 # Number of independent sequences we will process in parallel
mx.random.seed(1337)
def get_batch(data):
    # Generate a small batch of data of inputs x and targets y
    ix = mx.random.randint(0, len(data) - block_size, [batch_size])
    x = mx.stack([data[i:i+block_size] for i in ix.tolist()])
    y = mx.stack([data[i+1:i+block_size+1] for i in ix.tolist()])
    return x, y
Quick check
x, y = get_batch(train_data)
print("inputs")
print(x.shape)
print(x)
print("targets")
print(y.shape)
print(y)
for b in range(batch_size):      # batch dimension
    for t in range(block_size):  # time dimension
        context = x[b, :t+1]
        target = y[b, t]
        print(f"when input is {context.tolist()}, target is {target}")
inputs
(4, 8)
array([[53, 1, 51, ..., 43, 50, 44],
[32, 53, 1, ..., 39, 58, 1],
[53, 59, 1, ..., 50, 50, 1],
[23, 17, 10, ..., 39, 52, 1]], dtype=int32)
targets
(4, 8)
array([[1, 51, 63, ..., 50, 44, 0],
[53, 1, 61, ..., 58, 1, 61],
[59, 1, 58, ..., 50, 1, 51],
[17, 10, 0, ..., 52, 1, 52]], dtype=int32)
when input is [53], target is array(1, dtype=int32)
when input is [53, 1], target is array(51, dtype=int32)
when input is [53, 1, 51], target is array(63, dtype=int32)
when input is [53, 1, 51, 63], target is array(57, dtype=int32)
when input is [53, 1, 51, 63, 57], target is array(43, dtype=int32)
when input is [53, 1, 51, 63, 57, 43], target is array(50, dtype=int32)
when input is [53, 1, 51, 63, 57, 43, 50], target is array(44, dtype=int32)
when input is [53, 1, 51, 63, 57, 43, 50, 44], target is array(0, dtype=int32)
when input is [32], target is array(53, dtype=int32)
when input is [32, 53], target is array(1, dtype=int32)
when input is [32, 53, 1], target is array(61, dtype=int32)
when input is [32, 53, 1, 61], target is array(46, dtype=int32)
when input is [32, 53, 1, 61, 46], target is array(39, dtype=int32)
when input is [32, 53, 1, 61, 46, 39], target is array(58, dtype=int32)
when input is [32, 53, 1, 61, 46, 39, 58], target is array(1, dtype=int32)
when input is [32, 53, 1, 61, 46, 39, 58, 1], target is array(61, dtype=int32)
when input is [53], target is array(59, dtype=int32)
when input is [53, 59], target is array(1, dtype=int32)
when input is [53, 59, 1], target is array(58, dtype=int32)
when input is [53, 59, 1, 58], target is array(43, dtype=int32)
when input is [53, 59, 1, 58, 43], target is array(50, dtype=int32)
when input is [53, 59, 1, 58, 43, 50], target is array(50, dtype=int32)
when input is [53, 59, 1, 58, 43, 50, 50], target is array(1, dtype=int32)
when input is [53, 59, 1, 58, 43, 50, 50, 1], target is array(51, dtype=int32)
when input is [23], target is array(17, dtype=int32)
when input is [23, 17], target is array(10, dtype=int32)
when input is [23, 17, 10], target is array(0, dtype=int32)
when input is [23, 17, 10, 0], target is array(15, dtype=int32)
when input is [23, 17, 10, 0, 15], target is array(39, dtype=int32)
when input is [23, 17, 10, 0, 15, 39], target is array(52, dtype=int32)
when input is [23, 17, 10, 0, 15, 39, 52], target is array(1, dtype=int32)
when input is [23, 17, 10, 0, 15, 39, 52, 1], target is array(52, dtype=int32)
Let’s next implement the `BigramLanguageModel`
import mlx.nn as nn
class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # Each token directly reads off the logits for the next token from a lookup table
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def __call__(self, idx, targets=None):
        # idx and targets are both (B, T) arrays of integers
        logits = self.token_embedding_table(idx)  # (B, T, C)
        if targets is None:
            loss = None
        else:
            loss = nn.losses.cross_entropy(logits, targets)  # per-token loss, (B, T)
        return logits, loss

m = BigramLanguageModel(vocab_size)
logits, loss = m(x, y)
print(loss)
print(logits.shape)
array([[4.24595, 4.1331, 4.40312, ..., 4.31248, 4.05547, 4.20361],
[4.10041, 4.24595, 4.21379, ..., 4.32175, 4.28335, 4.21379],
[4.11496, 4.26998, 4.01405, ..., 4.19023, 4.3109, 4.1331],
[4.0343, 4.34374, 4.10972, ..., 4.19232, 4.04217, 4.41227]], dtype=float32)
(4, 8, 65)
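To see what the (still untrained) model produces, here's a minimal sampling sketch; it is not from the source, and assumes `mx.random.categorical` for drawing the next token from the logits:

def generate(model, idx, max_new_tokens):
    # idx is (B, T); append one sampled token at a time
    for _ in range(max_new_tokens):
        logits, _ = model(idx)
        last = logits[:, -1, :]  # (B, C): a bigram model only needs the last step
        next_token = mx.random.categorical(last)  # (B,) sampled token ids
        next_token = mx.expand_dims(next_token, 1).astype(idx.dtype)  # (B, 1)
        idx = mx.concatenate([idx, next_token], axis=1)
    return idx

print(decode(generate(m, mx.zeros((1, 1), dtype=mx.int32), 100)[0].tolist()))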