Orthogonal Matching Pursuit
Speed benchmarks for the JAX implementation of OMP.
Each row of the following table describes:
problem type and configuration (M x N is dictionary size, K is sparsity level)
Average time taken in CPU/GPU configurations
Speed improvement ratios
System used
All benchmarks have been generated on Google Colab
CPU and GPU configurations Google Colab have been used
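The timings were collected in the usual way for JAX code: call the function once to trigger compilation, then average many runs while blocking on the result so that JAX's asynchronous dispatch does not skew the measurement. Below is a minimal sketch of such a harness; `solver` and `bench` are illustrative stand-ins, not the project's actual benchmark code.

```python
import timeit

import jax
import jax.numpy as jnp


def solver(Phi, y):
    # Stand-in for the OMP solver under test (illustrative only);
    # a real benchmark would invoke the library's OMP routine here.
    return Phi.T @ y


def bench(fn, *args, repeat=100):
    fn(*args)  # warm-up call; for a jitted fn this triggers compilation
    # block_until_ready() forces JAX's asynchronous dispatch to finish,
    # so the timer measures compute time rather than dispatch time.
    t = timeit.timeit(lambda: fn(*args).block_until_ready(), number=repeat)
    return 1e3 * t / repeat  # average milliseconds per call


key = jax.random.PRNGKey(0)
M, N, K = 256, 1024, 16
Phi = jax.random.normal(key, (M, N)) / jnp.sqrt(M)  # random dictionary
y = Phi @ jnp.zeros(N).at[:K].set(1.0)              # K-sparse measurements

print(f"eager: {bench(solver, Phi, y):.3f} ms")
print(f"jit  : {bench(jax.jit(solver), Phi, y):.3f} ms")
```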
| M | N | K | CPU | CPU + JIT | CPU / CPU + JIT | GPU | GPU + JIT | GPU / GPU + JIT | CPU + JIT / GPU + JIT |
|-----|------|----|--------|-----------|-----------------|--------|-----------|-----------------|------------------------|
| 256 | 1024 | 16 | 148 ms | 8.27 ms | 17.9x | 139 ms | 1.28 ms | 108x | 6.46x |
Observations
JIT (Just-In-Time) compilation gives significant performance improvements on both CPU and GPU (17.9x and 108x respectively).
Without JIT, the current implementation is barely faster on GPU than on CPU (139 ms vs 148 ms).
The GPU speed gain over CPU (with JIT enabled) is relatively meager at 6.46x. For comparison, people regularly report ~30x CPU-to-GPU speedups for neural networks implemented in TensorFlow with Keras. (A sketch of how the two backends can be compared on one machine follows this list.)
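Continuing the harness sketched above, on a host that exposes both backends the CPU and GPU runs can be separated by committing the inputs to a specific device; jitted computations then run where their inputs live. This is an assumed setup, not the script used to produce the table.

```python
# Continuation of the harness above; assumes a host with both backends.
cpu = jax.devices("cpu")[0]
gpu = jax.devices("gpu")[0]  # raises RuntimeError if no GPU is attached

# Committing the inputs to a device makes jitted computations run there.
Phi_cpu, y_cpu = jax.device_put(Phi, cpu), jax.device_put(y, cpu)
Phi_gpu, y_gpu = jax.device_put(Phi, gpu), jax.device_put(y, gpu)

jit_solver = jax.jit(solver)
print(f"CPU + JIT: {bench(jit_solver, Phi_cpu, y_cpu):.3f} ms")
print(f"GPU + JIT: {bench(jit_solver, Phi_gpu, y_gpu):.3f} ms")
```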
Possible deficiencies
There is an opportunity to improve parallelization in the OMP implementation.
The Cholesky-update-based implementation depends heavily on solving triangular systems.
GPUs may not be well suited to triangular solves, since forward and backward substitution proceed one row at a time and offer little parallelism (see the sketch after this list).
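To make the last two points concrete, here is a minimal sketch of the incremental Cholesky step at the heart of Cholesky-update OMP; `cholesky_append` and `ls_coefficients` are illustrative names, not the project's API. Each iteration grows the Cholesky factor of the Gram matrix with one triangular solve, and recovering the least-squares coefficients on the current support costs two more.

```python
import jax.numpy as jnp
from jax.scipy.linalg import solve_triangular


def cholesky_append(L, Phi_S, atom):
    """Grow the Cholesky factor L of Gram(Phi_S) after adding `atom`."""
    # Triangular solve: computes the new off-diagonal row of the factor.
    w = solve_triangular(L, Phi_S.T @ atom, lower=True)
    d = jnp.sqrt(atom @ atom - w @ w)  # new diagonal entry
    k = L.shape[0]
    return (jnp.zeros((k + 1, k + 1))
            .at[:k, :k].set(L)
            .at[k, :k].set(w)
            .at[k, k].set(d))


def ls_coefficients(L, Phi_S, y):
    """Least-squares coefficients on the support: two triangular solves."""
    # Forward then backward substitution; each resolves one unknown at a
    # time, which leaves little parallel work for a GPU to exploit.
    z = solve_triangular(L, Phi_S.T @ y, lower=True)
    return solve_triangular(L.T, z, lower=False)
```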