How to use Theano with OpenCL

I have a MacBook Pro with an AMD graphics card, so I want Theano to use OpenCL. First, create a virtualenv and activate it:

python3 -m venv venv
source venv/bin/activate

Install some dependencies for Theano and OpenCL:

pip install cython nose

These dependencies are not listed in the setup.py of Theano (or pygpu).

Install Theano (the latest development version; at the time of writing, 0.8.2 does not support OpenCL):

pip install git+https://github.com/Theano/Theano.git

If GitHub is temporarily unavailable, try my personal CSDN mirror (CSDN CODE can sync GitHub repositories conveniently):

pip install git+https://code.csdn.net/u010096836/theano.git

The next step is to install gpuarray for OpenCL support:

git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray

Build the native part:

cmake . -DCMAKE_INSTALL_PREFIX=../venv/ -DCMAKE_BUILD_TYPE=Release
make
make install
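Before moving on, it can help to confirm that `make install` actually placed the library under the virtualenv prefix. The following is a minimal sketch, not part of the original steps: the helper name `find_libgpuarray` is made up for illustration, and it assumes the installed files start with `libgpuarray` (for example `libgpuarray.dylib` on macOS).

```python
from pathlib import Path


def find_libgpuarray(prefix):
    """Return the names of libgpuarray files found under <prefix>/lib, sorted."""
    lib_dir = Path(prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    # Assumption: the installed files are named libgpuarray* (shared and static variants).
    return sorted(p.name for p in lib_dir.glob("libgpuarray*"))


if __name__ == "__main__":
    # Run from inside the libgpuarray checkout, so the venv is one level up.
    found = find_libgpuarray("../venv")
    print(found if found else "libgpuarray not found under ../venv/lib")
```

An empty result usually means the `-DCMAKE_INSTALL_PREFIX` passed to cmake pointed somewhere else.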

Export some environment variables for Theano’s dynamic compilation:

export LIBRARY_PATH=$LIBRARY_PATH:$PWD/../venv/lib
export CPATH=$CPATH:$PWD/../venv/

Install pygpu/gpuarray:

python setup.py build
python setup.py install
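A quick way to see whether both packages ended up in the virtualenv is to ask Python whether it can locate them. This is only a sketch using the standard library; `is_importable` is a hypothetical helper name, and locating a module does not prove that it loads or that OpenCL works.

```python
import importlib.util


def is_importable(module_name):
    """True if Python can locate the module, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("theano", "pygpu"):
        print(name, "found" if is_importable(name) else "MISSING")
```

It is probably best to run this from outside the libgpuarray checkout, since the source tree contains its own `pygpu` directory that could be picked up instead of the installed package.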

Now, Theano can use OpenCL to accelerate computing.


We can use Theano’s check1.py to check that OpenCL is available:

from theano import function, config, shared, tensor, sandbox
import numpy
import time

vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
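The script picks its device from the `THEANO_FLAGS` environment variable, as in the runs shown in the Performance section. A small sketch of composing that value from Python follows; the helper names `theano_flags` and `env_for` are made up for illustration, and the flag values are simply the ones used in this post.

```python
import os


def theano_flags(device, mode="FAST_RUN", floatX="float32"):
    """Compose a THEANO_FLAGS value in the same form as the runs in this post."""
    return "mode=%s,device=%s,floatX=%s" % (mode, device, floatX)


def env_for(device):
    """Copy the current environment and set THEANO_FLAGS for one device."""
    env = dict(os.environ)
    env["THEANO_FLAGS"] = theano_flags(device)
    return env


if __name__ == "__main__":
    # Device names as they appear in this post; numbering may differ on other machines.
    for device in ("cpu", "opencl0:0", "opencl0:1", "opencl0:2"):
        print(theano_flags(device))
```

The returned environment can then be passed to something like `subprocess.call([sys.executable, "check1.py"], env=env_for("opencl0:2"))` to run the check on one device.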

Performance:

  • CPU only (1.662902 s)
$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python check1.py 
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 1.662902 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284]

Using the cpu

  • OpenCL over CPU (1.057008 s)
$ THEANO_FLAGS=mode=FAST_RUN,device=opencl0:0,floatX=float32 python check1.py 
Mapped name None to device opencl0:0: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.057008 seconds
Result is [ 1.23178029 1.61879325 1.52278078 ..., 2.20771813 2.29967737
1.62323272]

Using the cpu

  • OpenCL over Intel GPU (0.554572 s)
$ THEANO_FLAGS=mode=FAST_RUN,device=opencl0:1,floatX=float32 python check1.py 
Mapped name None to device opencl0:1: Iris Pro
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.554572 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284]

Using the Intel gpu

  • OpenCL over AMD GPU (0.470640 s)
$ THEANO_FLAGS=mode=FAST_RUN,device=opencl0:2,floatX=float32 python check1.py 
Mapped name None to device opencl0:2: AMD Radeon R9 M370X Compute Engine
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.470640 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284]

Using the AMD gpu
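To put the four timings side by side, the relative speedup over the plain CPU run can be computed directly. The sketch below only does arithmetic on the numbers reported above; the dictionary keys and the `speedups` helper are illustrative names, and a single run per device is not a rigorous benchmark.

```python
# Timings copied from the runs above (seconds for 1000 iterations).
timings = {
    "CPU": 1.662902,
    "OpenCL over CPU": 1.057008,
    "OpenCL over Intel GPU": 0.554572,
    "OpenCL over AMD GPU": 0.470640,
}


def speedups(times, baseline="CPU"):
    """Return how many times faster each run is than the baseline run."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}


if __name__ == "__main__":
    for name, factor in speedups(timings).items():
        print("%-22s %.2fx" % (name, factor))
```

On these numbers, OpenCL on the CPU is roughly 1.6 times faster than the plain CPU run, the Intel GPU roughly 3 times, and the AMD GPU roughly 3.5 times.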


If you are using the stable release (0.8.2) of Theano, you will hit this error when you try to use OpenCL:

RuntimeError: ('Wrong major API version for gpuarray:', -9998, 'Make sure Theano and libgpuarray/pygpu are in sync.')

So you should install the development version of Theano.
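If the check is wrapped in a larger script, this particular failure can be turned into a friendlier message. The following is a sketch only: `explain_gpuarray_error` is a hypothetical helper, and it simply matches the message text quoted above.

```python
def explain_gpuarray_error(exc):
    """Return a short hint for the version-mismatch error, or None for anything else."""
    # Matches the message text of the RuntimeError quoted in this post.
    if isinstance(exc, RuntimeError) and "Wrong major API version for gpuarray" in str(exc):
        return ("Theano and libgpuarray/pygpu are out of sync; "
                "install the development version of Theano.")
    return None


if __name__ == "__main__":
    sample = RuntimeError("Wrong major API version for gpuarray:", -9998,
                          "Make sure Theano and libgpuarray/pygpu are in sync.")
    print(explain_gpuarray_error(sample))
```

Assuming the error surfaces while Theano is being imported with an OpenCL device selected, the helper could be called from an `except RuntimeError as exc:` block around `import theano`.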

Author: Robert Lu

Published: 2016-05-01
