Split the SIMD implementations into separate files, one per backend, so they are easier to maintain. There are now base implementations for AVX, plain C, and NEON, and new routines can be added to each of them.
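
As a rough illustration of the per-backend layout, here is a minimal single-file sketch. The kernel (`sum_f32`) and the `_c` / `_avx` / `_neon` suffixes are hypothetical names chosen for this example, not names taken from this change. In the split layout each variant would sit in its own source file; they are shown together here only so the sketch compiles on its own. Backend selection is assumed to happen at compile time via the `__AVX__` and `__ARM_NEON` predefined macros.

```c
#include <stddef.h>

/* Hypothetical kernel: sum of n floats. Names are illustrative only. */

/* Portable C fallback (would live in the C file); always available. */
float sum_f32_c(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) acc += x[i];
    return acc;
}

#if defined(__AVX__)
#include <immintrin.h>
/* AVX variant (would live in the AVX file): 8 floats per iteration. */
float sum_f32_avx(const float *x, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; k++) total += lanes[k];
    for (; i < n; i++) total += x[i];   /* scalar tail */
    return total;
}
#endif

#if defined(__ARM_NEON)
#include <arm_neon.h>
/* NEON variant (would live in the NEON file): 4 floats per iteration. */
float sum_f32_neon(const float *x, size_t n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = vaddq_f32(acc, vld1q_f32(x + i));
    float lanes[4];
    vst1q_f32(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) total += x[i];   /* scalar tail */
    return total;
}
#endif

/* Public entry point: picks whichever backend was compiled in. */
float sum_f32(const float *x, size_t n) {
#if defined(__AVX__)
    return sum_f32_avx(x, n);
#elif defined(__ARM_NEON)
    return sum_f32_neon(x, n);
#else
    return sum_f32_c(x, n);
#endif
}
```

With this shape, adding a routine means adding one function to each backend file plus one dispatch entry, and the C version doubles as a reference for checking the vectorized ones.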