Implementing an efficient WaveNet in C++

By zironycho ❤️ Neosapience, Inc, and OSEU 2018.09.28

Estimating computing power

Can the job run on a single CPU?

CPU core (Intel Skylake, with AVX-512)
- Lower bound: 28 cores * 1.7 GHz * 32 FLOPs/cycle = 1523.2 GFLOPS
- 1523.2 GFLOPS / 28 cores = 54.4 GFLOPS per core = 0.054 TFLOPS
GPU: Titan X (Maxwell)
- 6 TFLOPS
R=64, S=256, A=256
- 0.042 TFLOPS

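Write Python test code for each layer of the PyTorch code
Pull out the part of each layer used in Python and implement it for data extraction
- For debugging, set each input, output, weight, and bias to values that are easy for a person to recognize
  - 0, 1, 2, ...
  - 1, 1, 1, ...
  - 0, 0, 0, ...
- Save each input, output, weight, and bias to a text file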
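
The slides do this dump on the Python side. The sketch below shows the same idea from the C++ side: generating the easy-to-recognize patterns and round-tripping them through a text file. The file layout (one value per line) and the names `make_ramp`, `make_constant`, `save_text`, and `load_text` are assumptions for illustration, not the format the original code uses.

```cpp
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <limits>
#include <numeric>
#include <string>
#include <vector>

// 0, 1, 2, ... : each element shows its own index, so misplaced data is obvious.
std::vector<float> make_ramp(std::size_t n) {
    std::vector<float> v(n);
    std::iota(v.begin(), v.end(), 0.0f);
    return v;
}

// 1, 1, 1, ... or 0, 0, 0, ... : isolates the effect of a single tensor.
std::vector<float> make_constant(std::size_t n, float value) {
    return std::vector<float>(n, value);
}

// Write one value per line with enough digits to round-trip a float.
bool save_text(const std::string& path, const std::vector<float>& values) {
    std::ofstream out(path);
    if (!out) return false;
    out << std::setprecision(std::numeric_limits<float>::max_digits10);
    for (float v : values) out << v << '\n';
    return static_cast<bool>(out);
}

// Read whitespace-separated floats; returns an empty vector if the file is missing.
std::vector<float> load_text(const std::string& path) {
    std::vector<float> values;
    std::ifstream in(path);
    float v;
    while (in >> v) values.push_back(v);
    return values;
}
```

With a ramp as the weight and ones as the input, every output element is a small integer that can be checked by hand.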
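Write each layer in C/C++ together with its unit-test code, using GoogleTest
- Each layer has forward and set_weight functions
- Implement the part that loads the weights and the part that actually runs forward
- Implement matrix operations and math functions naively
- Allocate memory only once wherever possible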
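
A layer following the points above might look like the sketch below. `DenseLayer` is a hypothetical example (a plain matrix-vector product with bias, weights stored row-major), not one of the actual WaveNet layers; it only illustrates the `set_weight` / `forward` split, the naive loops, and allocating every buffer once in the constructor.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical layer computing y = W x + b, with W stored row-major (out x in).
class DenseLayer {
public:
    DenseLayer(std::size_t in_dim, std::size_t out_dim)
        : in_dim_(in_dim), out_dim_(out_dim),
          weight_(in_dim * out_dim, 0.0f),
          bias_(out_dim, 0.0f),
          output_(out_dim, 0.0f) {}  // every buffer is allocated once, here

    // Copy weights into the pre-allocated buffers (no reallocation).
    void set_weight(const std::vector<float>& weight,
                    const std::vector<float>& bias) {
        if (weight.size() != weight_.size() || bias.size() != bias_.size())
            throw std::invalid_argument("unexpected weight or bias size");
        std::copy(weight.begin(), weight.end(), weight_.begin());
        std::copy(bias.begin(), bias.end(), bias_.begin());
    }

    // Naive matrix-vector product; returns a reference to the reused output buffer.
    const std::vector<float>& forward(const std::vector<float>& input) {
        if (input.size() != in_dim_)
            throw std::invalid_argument("unexpected input size");
        for (std::size_t o = 0; o < out_dim_; ++o) {
            float acc = bias_[o];
            for (std::size_t i = 0; i < in_dim_; ++i)
                acc += weight_[o * in_dim_ + i] * input[i];
            output_[o] = acc;
        }
        return output_;
    }

private:
    std::size_t in_dim_, out_dim_;
    std::vector<float> weight_, bias_, output_;
};
```

Because `forward` returns a reference to an internal buffer, the result is overwritten by the next call; a caller that needs to keep it has to copy it. In a GoogleTest case, the checks on `forward` would go inside a `TEST` body.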
In Python, feed in random weights and inputs, then capture and save the output
Check that the C/C++ tests pass
- Floating-point results may differ slightly, so compare them with a function such as isclose
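
A tolerance-based comparison of this kind could be sketched as follows. The formula and the default tolerances follow `numpy.isclose` (`|a - b| <= atol + rtol * |b|`); the names `is_close` and `all_close` are hypothetical, and unlike NumPy this sketch gives NaN and infinity no special treatment.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// True when `a` is within an absolute plus relative tolerance of the reference `b`.
bool is_close(float a, float b, float rtol = 1e-5f, float atol = 1e-8f) {
    return std::fabs(a - b) <= atol + rtol * std::fabs(b);
}

// Element-wise check of a C++ result against the reference dumped from Python.
bool all_close(const std::vector<float>& actual,
               const std::vector<float>& expected,
               float rtol = 1e-5f, float atol = 1e-8f) {
    if (actual.size() != expected.size()) return false;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (!is_close(actual[i], expected[i], rtol, atol)) return false;
    }
    return true;
}
```

The default `atol` is very small, so values near zero are compared almost exactly; looser tolerances may be needed for layers that accumulate many terms.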