matmul.py

Matrix multiplication is the fundamental operation to benchmark, since deep learning is mostly matrix multiplication.

NumPy does not automatically use the GPU for it: stackoverflow.com/questions/49605231/does-numpy-automatically-detect-and-use-gpu. PyTorch is one of the most notable NumPy-compatible alternatives that can: its CPU tensors use the same memory layout as NumPy arrays, so the two can share data without copying.
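The shared memory layout can be observed directly. The following is a minimal sketch, assuming both NumPy and PyTorch are installed; the array shape and the values written are arbitrary choices for illustration.

```python
import numpy as np
import torch

# A NumPy array and a CPU tensor created from it share the same buffer.
a = np.ones((2, 3), dtype=np.float32)
t = torch.from_numpy(a)

# Writing through the tensor is visible from the array, and vice versa.
t[0, 0] = 5.0
a[1, 2] = 7.0
print(a[0, 0], t[1, 2].item())

# Going the other way, Tensor.numpy() also returns a view, not a copy.
b = t.numpy()
print(np.shares_memory(a, b))
```

This only applies to CPU tensors: moving a tensor to the GPU with .to('cuda') copies the data to device memory.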

Sample runs on P51 to observe the GPU speedup. The first argument is g for GPU or c for CPU, followed by n, m, o and the repeat count:

$ time ./matmul.py g 10000 1000 10000 100
real    0m22.980s
user    0m22.679s
sys     0m1.129s
$ time ./matmul.py c 10000 1000 10000 100
real    1m9.924s
user    4m16.213s
sys     0m17.293s
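To put numbers on those two runs, the arithmetic can be sketched as below. The helper parse_time is hypothetical and only handles the XmY.YYYs format shown above. The throughput figures are rough lower bounds, since time also counts interpreter startup, tensor allocation, transfers and the final print.

```python
def parse_time(s):
    """Convert a `time` field such as '1m9.924s' to seconds."""
    minutes, seconds = s.rstrip('s').split('m')
    return int(minutes) * 60 + float(seconds)

# Values copied from the two runs above.
gpu_real = parse_time('0m22.980s')
cpu_real = parse_time('1m9.924s')
cpu_user = parse_time('4m16.213s')

# Wall-clock speedup of the GPU run over the CPU run.
speedup = cpu_real / gpu_real

# user / real above 1 means that the CPU run used several cores in parallel.
cpu_parallelism = cpu_user / cpu_real

# Each (n x m) @ (m x o) product takes about 2 * n * m * o floating point operations.
n, m, o, repeat = 10000, 1000, 10000, 100
flops = 2 * n * m * o * repeat

print(f'speedup: {speedup:.2f}x')
print(f'CPU cores kept busy: {cpu_parallelism:.2f}')
print(f'GPU: {flops / gpu_real / 1e12:.2f} TFLOP/s')
print(f'CPU: {flops / cpu_real / 1e12:.2f} TFLOP/s')
```

So the GPU run is about 3 times faster in wall-clock time, even though the CPU run already keeps between 3 and 4 cores busy on average.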

python/pytorch/matmul.py

#!/usr/bin/env python3

# https://cirosantilli.com/_file/python/pytorch/matmul.py

import sys

import torch

print(torch.cuda.is_available())

# Arguments: g to run on the GPU (anything else runs on the CPU), then n, m, o and repeat.
if len(sys.argv) > 1:
    gpu = sys.argv[1] == 'g'
else:
    gpu = False
if len(sys.argv) > 2:
    n = int(sys.argv[2])
else:
    n = 5
if len(sys.argv) > 3:
    m = int(sys.argv[3])
else:
    m = 5
if len(sys.argv) > 4:
    o = int(sys.argv[4])
else:
    o = 10
if len(sys.argv) > 5:
    repeat = int(sys.argv[5])
else:
    repeat = 10
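# Operands: t1 is n x m and t2 is m x o, so t1 @ t2 is n x o.
t1 = torch.ones((n, m))
t2 = torch.ones((m, o))
t3 = torch.zeros((n, o))
if gpu:
    # Move everything to the GPU so that the multiplication runs there.
    t1 = t1.to('cuda')
    t2 = t2.to('cuda')
    t3 = t3.to('cuda')
# Accumulate the products so that every iteration contributes to the output.
for i in range(repeat):
    t3 += t1 @ t2
# Every entry should equal m * repeat.
print(t3)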
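The script above is timed from the outside with time, which also counts startup and printing. A possible way to time only the multiplication loop from inside Python is sketched below. The function time_matmul and the matrix sizes are hypothetical and not part of the original script. CUDA kernels are launched asynchronously, so the sketch calls torch.cuda.synchronize() before reading the clock.

```python
import time

import torch

def time_matmul(n, m, o, repeat, device):
    """Return (seconds, result) for `repeat` accumulated n x m by m x o products."""
    t1 = torch.ones((n, m), device=device)
    t2 = torch.ones((m, o), device=device)
    t3 = torch.zeros((n, o), device=device)
    if device == 'cuda':
        # Make sure that allocation has finished before starting the clock.
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeat):
        t3 += t1 @ t2
    if device == 'cuda':
        # CUDA kernels are launched asynchronously: wait for them to finish.
        torch.cuda.synchronize()
    return time.perf_counter() - start, t3

# Always time the CPU, and the GPU as well when one is available.
devices = ['cpu']
if torch.cuda.is_available():
    devices.append('cuda')
for device in devices:
    seconds, t3 = time_matmul(500, 100, 500, 10, device)
    print(device, f'{seconds:.4f}s', t3[0, 0].item())
```

With small matrices like these, the GPU may well come out slower than the CPU because kernel launch overhead dominates. The speedup only shows up at sizes closer to those of the sample runs.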
