Introduction
Codes exhibiting suitable data-flow parallelism can often
profit from using Intel SSE, a SIMD extension to the CISC-style Intel 64 instruction set architecture. Because SSE instructions are, on average,
longer than scalar instructions, they place a heavier load on the instruction pre-decoding, decoding, and caching hardware. For long straight-line
SSE codes, instruction length becomes an obstacle to high performance
that is not adequately handled by available optimizing compilers.