Moore's Law is over :(
Cray-1: the first supercomputer to implement the vector-processor design
Can a purely functional language support vectorisation?
The question is incomplete...
Can a purely functional language support vectorisation, preserving its original syntax and semantics while being as efficient as low-level systems programming languages?
Parsing the previous statement:
We answer that in 2 steps.
LLVM backend for NESL
NESL: an ML dialect by Guy Blelloch (CMU) that supports Nested Data Parallelism
State of the art optimising compiler
Where have we made code changes?
Fundamental Code changes:
data Format = ... | VecFormat !Length !ScalarFormat !Width

type Length = Int

data Width = ... | W128 | W256 | W512
data Instr = VBROADCAST Format AddrMode Reg
           | VEXTRACT  Format Operand Reg Operand
           | ...
           -- a number of other instructions like vector move,
           -- shuffle, arithmetic, logical x86 instructions
data MachOp = MO_VF_Broadcast Length Width
            | MO_VF_Insert ...
~70% of the code changes were in compiler/nativeGen/X86/CodeGen.hs
Lots of segfaults and code reviews later...
The user-level API
-- constructors
packFloatX4#      :: (# Float#, Float#, Float#, Float# #) -> FloatX4#
broadcastFloatX4# :: Float# -> FloatX4#

-- destructors
unpackFloatX4#    :: FloatX4# -> (# Float#, Float#, Float#, Float# #)

-- operations
plusFloatX4#   :: FloatX4# -> FloatX4# -> FloatX4#
minusFloatX4#  :: FloatX4# -> FloatX4# -> FloatX4#
timesFloatX4#  :: FloatX4# -> FloatX4# -> FloatX4#
divideFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
SSE as well as AVX support
Previously we had
data Int8  = I8#  Int#
data Int16 = I16# Int#
data Int32 = I32# Int#
data Int64 = I64# Int#
Now GHC has:
Example: Vectorizing dot product
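The slide's code did not survive extraction; here is a minimal sketch of what a primop-level dot product looks like, assuming GHC's SIMD primops from GHC.Exts and a SIMD-capable backend (e.g. -fllvm). The chunking into 4-tuples and the helper structure are illustrative, not the talk's exact code:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts

-- Dot product over inputs pre-chunked into 4-wide tuples:
-- multiply chunkwise with timesFloatX4#, accumulate with
-- plusFloatX4#, and do a single horizontal sum at the end.
dotp :: [(Float, Float, Float, Float)]
     -> [(Float, Float, Float, Float)]
     -> Float
dotp xs0 ys0 = go (broadcastFloatX4# 0.0#) xs0 ys0
  where
    go acc ((F# a, F# b, F# c, F# d) : xs)
           ((F# p, F# q, F# r, F# s) : ys) =
      let vx = packFloatX4# (# a, b, c, d #)
          vy = packFloatX4# (# p, q, r, s #)
      in  go (plusFloatX4# acc (timesFloatX4# vx vy)) xs ys
    go acc _ _ =
      case unpackFloatX4# acc of
        (# a, b, c, d #) -> F# a + F# b + F# c + F# d
```

Every lane is spelled out by hand, which is exactly what the next point complains about.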
This spoils the declarative nature of Haskell!
Imperative languages have powerful vector programming libraries
Polymorphic SIMD functions for vector programming
Lift-vector provides two interfaces
class (Num v, Real (Elem v)) => SIMDVector v where
  type Elem v
  type ElemTuple v
  nullVector      :: v
  vectorSize      :: v -> Int
  elementSize     :: v -> Int
  broadcastVector :: Elem v -> v
  mapVector       :: (Elem v -> Elem v) -> v -> v
  zipVector       :: (Elem v -> Elem v -> Elem v) -> v -> v -> v
  foldVector      :: (Elem v -> Elem v -> Elem v) -> v -> Elem v
  sumVector       :: v -> Elem v
  packVector      :: ElemTuple v -> v
  unpackVector    :: v -> ElemTuple v
What does dot product look like with this?
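A sketch against a trimmed restatement of the class (only the methods the example needs, so it stands alone; it assumes the inputs are already packed into equal numbers of chunks):

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Trimmed restatement of SIMDVector so the sketch is self-contained.
class Num v => SIMDVector v where
  type Elem v
  nullVector :: v            -- all-zero vector
  sumVector  :: v -> Elem v  -- horizontal sum of the lanes

-- Dot product over pre-packed chunks: chunkwise multiply via the
-- vector Num instance, vector-accumulate, one horizontal sum at
-- the end.
dotVec :: SIMDVector v => [v] -> [v] -> Elem v
dotVec xs ys = sumVector (foldl (+) nullVector (zipWith (*) xs ys))
```

Compare with the primop version: the lane bookkeeping has disappeared into the Num instance of the vector type.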
class (Num a, Num b) => ArithVector t a b where
  -- | The folding function should be commutative.
  fold   :: (a -> a -> a) -> (b -> b -> b) -> b -> t b -> b
  zipVec :: (a -> a -> a) -> (b -> b -> b) -> t b -> t b -> t b
  fmap   :: (a -> a) -> (b -> b) -> t b -> t b
  -- | Work-efficient Blelloch scan
  scanb  :: (a -> a -> a) -> (b -> b -> b) -> b -> t b -> t b
newtype VecList a = VecList [a]
A wrapper over plain lists
Wrappers around unboxed vectors.
import qualified Data.Vector.Unboxed as U

data VecArray sh a = VecArray !sh (U.Vector a)
data Z = Z
data tail :. head = !tail :. !head

type DIM0 = Z
type DIM1 = DIM0 :. Int
type DIM2 = DIM1 :. Int
type DIM3 = DIM2 :. Int

class Eq sh => Shape sh where
  toIndex :: sh -> sh -> Int
  ...

instance Shape Z where
  toIndex _ _ = 0

instance Shape sh => Shape (sh :. Int) where
  toIndex (sh1 :. sh2) (sh1' :. sh2') = toIndex sh1 sh1' * sh2 + sh2'
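As a sanity check on toIndex, here is a runnable restatement (dropping the elided Eq superclass and extra methods) with its row-major behaviour spelled out:

```haskell
{-# LANGUAGE TypeOperators #-}

data Z = Z
data tail :. head = !tail :. !head
infixl 3 :.

-- Trimmed restatement of Shape: toIndex maps an index within a
-- given extent to a flat row-major offset.
class Shape sh where
  toIndex :: sh -> sh -> Int

instance Shape Z where
  toIndex _ _ = 0

instance Shape sh => Shape (sh :. Int) where
  toIndex (sh1 :. sh2) (sh1' :. sh2') = toIndex sh1 sh1' * sh2 + sh2'
```

For a 3x4 array (extent Z :. 3 :. 4), the index Z :. 1 :. 2 lands at flat offset 1 * 4 + 2 = 6, which is exactly the layout an unboxed U.Vector stores.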
What does dot product look like with the parallel operators?
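One possible shape, with the class simplified to a single parameter (the real class on the earlier slide also threads a separate scalar-level function for the ragged tail); the VecList instance here is the plain reference implementation, not the SIMD one:

```haskell
newtype VecList b = VecList [b]

-- Single-parameter simplification of ArithVector: enough to
-- express dot product as a zip followed by a fold.
class ArithVector t where
  fold   :: (b -> b -> b) -> b -> t b -> b
  zipVec :: (b -> b -> b) -> t b -> t b -> t b

-- Reference instance over plain lists (same semantics, no SIMD).
instance ArithVector VecList where
  fold g z (VecList xs)              = foldl g z xs
  zipVec g (VecList xs) (VecList ys) = VecList (zipWith g xs ys)

-- Dot product: elementwise multiply, then reduce with (+).
dotp :: (ArithVector t, Num b) => t b -> t b -> b
dotp u v = fold (+) 0 (zipVec (*) u v)
```

The same dotp works unchanged over the unboxed VecArray backend, which is where the SIMD speedup comes from.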
How are the Data.Operations operators implemented?
scan or prefix sum looks like an inherently sequential operation
> scanl (+) 0 [1, 2, 3, 4, 5, 6, 7, 8]
[0,1,3,6,10,15,21,28,36]
The naive parallel scan runs in O(log₂ n) steps, but it does O(n log₂ n) work.
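A list-level sketch of that naive scheme (Hillis-Steele): log₂ n rounds, each adding a copy of the array shifted by a doubling offset. The list encoding is illustrative; on real hardware each round is one data-parallel step.

```haskell
-- Naive parallel inclusive scan: each round adds a copy of the
-- array shifted right by d zeros (zipWith truncates back to the
-- original length), then doubles d.  ceil(log2 n) rounds of O(n)
-- additions each => O(n log n) work, O(log n) span.
naiveScan :: Num a => [a] -> [a]
naiveScan xs = go 1 xs
  where
    n = length xs
    go d ys
      | d >= n    = ys
      | otherwise = go (2 * d) (zipWith (+) ys (replicate d 0 ++ ys))
```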
Sequential scan does O(n) operations.
How long it would take if it were fully sequential (work)
How long it would take if it were as parallel as possible (span)
"Programming parallel algorithms" - Guy Blelloch, 1996
Can we have a work efficient parallel scan?
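Yes: the up-sweep/down-sweep scan from Blelloch's paper. A recursive list sketch of the exclusive variant (assumes a non-empty power-of-two length; helper names are mine, not the talk's):

```haskell
-- Work-efficient exclusive scan: pair up, recurse on the n/2
-- pairwise sums (up-sweep), then expand each prefix back over its
-- pair (down-sweep).  T(n) = T(n/2) + O(n) => O(n) work; each
-- level is one parallel step => O(log n) span.
scanE :: Num a => [a] -> [a]
scanE [_] = [0]
scanE xs  = concat (zipWith expand prefixes pairs)
  where
    pairs    = pairUp xs
    prefixes = scanE (map (uncurry (+)) pairs)  -- up-sweep + recurse
    expand p (a, _) = [p, p + a]                -- down-sweep
    pairUp (a : b : rest) = (a, b) : pairUp rest
    pairUp _              = []
```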
So are scans really that useful?
"Prefix sums and their applications." - Guy Blelloch, 1990
Q: Are scans a sufficiently expressive set of combinators?
Read-heavy, write-light workloads
Pearson Correlation Coefficient
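The Pearson coefficient reduces to five independent sums over the two inputs, i.e. exactly the read-heavy map/zip/fold pipeline the operators above vectorise. A plain-Haskell reference version (a library version would swap sum/map/zipWith for the SIMD fold/fmap/zipVec):

```haskell
-- r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
pearson :: [Double] -> [Double] -> Double
pearson xs ys =
    (n * sxy - sx * sy)
      / sqrt ((n * sxx - sx * sx) * (n * syy - sy * sy))
  where
    n   = fromIntegral (length xs)
    sx  = sum xs                   -- five independent reductions,
    sy  = sum ys                   -- each a map-then-fold that a
    sxx = sum (map (^ 2) xs)       -- SIMD backend can accumulate
    syy = sum (map (^ 2) ys)       -- several lanes at a time
    sxy = sum (zipWith (*) xs ys)
```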
Writes now perform better via freezing inside the ST monad.
Almost any paper by Guy Blelloch
From the Haskell side:
"Automatic SIMD vectorization for Haskell" - Petersen, Orchard, Glew
"Exploiting vector instructions with generalized stream fusion" - Mainland, Leshchinskiy, Peyton Jones
If you are lazy, read my 5k-word summary:
Haskell is already a great parallel language!
Haskell could be the state of the art in vector programming!
Special Thanks to
Looking for PhD Advisors in the field of
High performance functional programming
(^ Not an oxymoron!)
THANK YOU FOR LISTENING
Contact me if this interests you:
Email : email@example.com
Twitter : @catamorphic