Matrix multiplication example.
Fundamental since deep learning is mostly matrix multiplication.
NumPy does not automatically use the GPU for it: stackoverflow.com/questions/49605231/does-numpy-automatically-detect-and-use-gpu. PyTorch is one of the most notable compatible alternatives: a CPU tensor can share the same underlying memory as a NumPy array, and tensors can additionally be moved to the GPU.
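The memory compatibility can be illustrated with a minimal sketch, assuming NumPy and PyTorch are both installed. It uses torch.from_numpy and Tensor.numpy, which return views rather than copies for CPU data; the variable names are illustrative only.

```python
import numpy as np
import torch

# A NumPy array and a CPU tensor created from it share one buffer.
a = np.ones((2, 3), dtype=np.float32)
t = torch.from_numpy(a)

# A write through the tensor is visible in the array.
t[0, 0] = 42.0
print(a[0, 0])

# Converting back also returns a view, not a copy.
b = t.numpy()
a[1, 1] = 7.0
print(t[1, 1].item())
print(np.shares_memory(a, b))
```

Sharing only applies while the tensor lives on the CPU: moving it with .to('cuda') copies the data into GPU memory.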
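Sample runs on P51 to observe the GPU speedup. The arguments are the device (g for GPU, anything else for CPU), the matrix dimensions n, m and o, and the number of repetitions: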
$ time ./matmul.py g 10000 1000 10000 100
real    0m22.980s
user    0m22.679s
sys     0m1.129s
$ time ./matmul.py c 10000 1000 10000 100
real    1m9.924s
user    4m16.213s
sys     0m17.293s
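As a rough reading of the two runs above, the following sketch converts the wall-clock times into a speedup and an approximate throughput. It assumes the usual count of 2 * n * m * o floating point operations per n x m by m x o product. Since the real time also covers interpreter startup, CUDA initialization, allocation and printing, the throughput figures are lower bounds rather than measurements of the multiplication alone.

```python
# Sizes and timings copied from the sample runs.
n, m, o, repeat = 10000, 1000, 10000, 100
gpu_real = 22.980               # seconds, 0m22.980s
cpu_real = 60 + 9.924           # seconds, 1m9.924s
cpu_user = 4 * 60 + 16.213      # seconds, 4m16.213s

# Assumption: one multiply and one add per inner-product term.
total_flop = 2 * n * m * o * repeat

speedup = cpu_real / gpu_real
gpu_tflops = total_flop / gpu_real / 1e12
cpu_tflops = total_flop / cpu_real / 1e12
# user / real above 1 indicates that the CPU run used several cores.
cpu_parallelism = cpu_user / cpu_real

print(f"speedup: {speedup:.2f}x")
print(f"GPU: {gpu_tflops:.2f} TFLOP/s, CPU: {cpu_tflops:.2f} TFLOP/s")
print(f"CPU user/real: {cpu_parallelism:.2f}")
```

So the GPU run is about 3 times faster in wall-clock time, even though the CPU run was already spread over several cores.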
python/pytorch/matmul.py
#!/usr/bin/env python3

# https://cirosantilli.com/_file/python/pytorch/matmul.py

import sys

import torch

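# Report whether PyTorch can see a CUDA-capable GPU.
print(torch.cuda.is_available())

# Positional arguments, all optional: device ('g' for GPU), n, m, o, repeat.
if len(sys.argv) > 1: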
    gpu = sys.argv[1] == 'g'
else:
    gpu = False
if len(sys.argv) > 2:
    n = int(sys.argv[2])
else:
    n = 5
if len(sys.argv) > 3:
    m = int(sys.argv[3])
else:
    m = 5
if len(sys.argv) > 4:
    o = int(sys.argv[4])
else:
    o = 10
if len(sys.argv) > 5:
    repeat = int(sys.argv[5])
else:
    repeat = 10
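# Operands: t1 is n x m and t2 is m x o, so t1 @ t2 is n x o.
t1 = torch.ones((n, m))
t2 = torch.ones((m, o))
t3 = torch.zeros((n, o))
if gpu:
    # Move all tensors to the GPU so that the multiplication runs there.
    t1 = t1.to('cuda')
    t2 = t2.to('cuda')
    t3 = t3.to('cuda')
# Accumulate the product so that every iteration does real work.
for i in range(repeat):
    t3 += t1 @ t2
# Printing copies the result back to the host, which waits for the GPU to finish.
print(t3)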
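The time command measures the whole process, including startup and printing. To time only the multiplication loop, one option is sketched below. It is not part of matmul.py: the bench_matmul helper and its parameters are hypothetical names introduced for illustration. Because CUDA operations are queued asynchronously, the sketch calls torch.cuda.synchronize before reading the clock on the GPU path.

```python
import time

import torch


def bench_matmul(n, m, o, repeat, device):
    """Hypothetical helper: time `repeat` accumulated (n x m) @ (m x o) products."""
    t1 = torch.ones((n, m), device=device)
    t2 = torch.ones((m, o), device=device)
    t3 = torch.zeros((n, o), device=device)
    if device == 'cuda':
        # Make sure allocation and initialization are done before timing.
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeat):
        t3 += t1 @ t2
    if device == 'cuda':
        # Wait for the queued GPU work to actually finish.
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return t3, elapsed


# Fall back to the CPU when no CUDA GPU is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
result, seconds = bench_matmul(200, 100, 200, 10, device)
print(device, seconds)
```

With all-ones inputs every entry of the result equals m * repeat, which gives a quick sanity check that both devices computed the same thing.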
