inference_report.log 6.6 KB

| Pre_alloc OCM | linebuffer(may SWAP later) | ringbuffer | parameter |
|:-------------------------|:-----------------------------|:-------------|:-------------------|
| size(ratio of whole OCM) | 0(0.0)% | 0(0.0)% | 112128(5.3)% |
| range | (None, None) | (None, None) | (1985024, 2097152) |
| Pre_alloc DDR | ringbuffer | feature_swap |
|:----------------|-------------:|---------------:|
| size(M) | 0.00 | 0.00 |
| profile conv | work_cyc | linebuf | warmup_tail | core_idle | io_idle | stride2_idle | standalone_fetch | MAC |
|:---------------|:-----------------|:--------------|:--------------|:------------|:--------------|:----------------|:-------------------|:----------------|
| ratio in conv | 7776900 (100.0%) | 195139 (2.5%) | 273265 (3.5%) | 0 (0.0%) | 225640 (2.9%) | 1489405 (19.2%) | 285 (0.0%) | 5303168 (68.2%) |
| profile ideal DDR_IO | min_io_sum | min_params_read | min_inputs_read | min_outputs_write |
|:-----------------------|:------------------|:------------------|:------------------|:--------------------|
| DDR IO size (Byte) | 10853504 (100.0%) | 3907904 (36.0%) | 409600 (3.8%) | 6536000 (60.2%) |
| profile extra DDR_IO | extra_ddr_io | extra_params_read | extra_inputs_read | extra_outputs_wrt | extra_swap_read | extra_swap_wrt | ddr_rb_read | ddr_rb_wrt |
|:-----------------------|:-----------------|:--------------------|:--------------------|:--------------------|:------------------|:-----------------|:--------------|:-------------|
| DDR IO size (Byte) | 7475200 (100.0%) | 0 (0.0%) | 3788800 (50.7%) | 3686400 (49.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
profile stream EU: ld/st_ratio might include ringbuf/linebuf/feature_swap parts; mv_ratio might have ringbuf part.
| profile stream EU | work_cyc | ld_ratio | ld_param_ratio | mv_ratio | mv_linebuffer_ratio | st_ratio |
|:--------------------|-----------:|:---------------|:-----------------|:---------------|:----------------------|:---------------|
| teng | 15534772 | 2125313(13.7%) | 2090049(13.5%) | 6013711(38.7%) | 143718(0.9%) | 5161980(33.2%) |
| breakdown of mv_ratio in profile stream EU | teng | all_eus with mv-cmds |
|:---------------------------------------------|:----------------|:-----------------------|
| total_cyc_num | 6013711(100.0%) | 6013711(100.0%) |
| mv,affine,unpack_lsb | 2325432(38.7%) | 2325432(38.7%) |
| weight0_mode,convNxM,mode23,nopad | 729952(12.1%) | 729952(12.1%) |
| teng_binary_mul | 634426(10.5%) | 634426(10.5%) |
| mv,subtensor | 427299(7.1%) | 427299(7.1%) |
| teng_binary_add | 317213(5.3%) | 317213(5.3%) |
| dequant | 317188(5.3%) | 317188(5.3%) |
| mv,concat_c | 296713(4.9%) | 296713(4.9%) |
| mv_patch | 262377(4.4%) | 262377(4.4%) |
| weight0_mode,convNxM,mode20,nopad | 214922(3.6%) | 214922(3.6%) |
| mv,depth2space | 211328(3.5%) | 211328(3.5%) |
| mv,upsample | 121582(2.0%) | 121582(2.0%) |
| mv,padding_ch | 105049(1.7%) | 105049(1.7%) |
| const | 28323(0.5%) | 28323(0.5%) |
| mv,padding | 21678(0.4%) | 21678(0.4%) |
| revert_split | 229(0.0%) | 229(0.0%) |
| EU | work_cyc | tot_cyc | ratio | fps | fps_bnd |
|:-----------|-----------:|----------:|:--------|-------:|----------:|
| conv-1core | 7776901 | 16969628 | 45.0% | 47.140 | 102.870 |
| teng | 15534772 | 16969628 | 91.0% | 47.140 | 51.500 |
inference: 21.2 ms, 47.14 fps
qps = fps * batch_size = 47.14
simulated fps is based on DDR_BW: 1.59 GB/s
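The relation between the latency, fps, and qps figures above can be checked with a short sketch. It assumes `batch_size` is 1, which is inferred only from qps equalling fps (the report does not print the batch size), and the helper names `qps` and `latency_ms` are hypothetical, not part of the tool that produced this log.

```python
def qps(fps: float, batch_size: int) -> float:
    """Queries per second, following the report's formula qps = fps * batch_size."""
    return fps * batch_size


def latency_ms(fps: float) -> float:
    """Time for one inference in milliseconds, i.e. the reciprocal of fps."""
    return 1000.0 / fps


fps = 47.14       # copied from the report
batch_size = 1    # assumption: not printed in the report, inferred from qps == fps

report_qps = qps(fps, batch_size)
report_latency = latency_ms(fps)

print(f"qps = {report_qps:.2f}")
print(f"latency = {report_latency:.1f} ms")
```

The reported 21.2 ms is a rounded value, so recomputing fps from it will not reproduce 47.14 exactly; going from fps to latency is the more reliable direction.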
DDR IO stats:
ideal_input_data_size: 4317504 Byte,
ideal_output_data_size: 6536000 Byte,
extra_mid_io_data_size: 7475200 Byte,
total_io_data_size: 18328704 Byte
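The DDR IO totals can be cross-checked against the ideal and extra DDR_IO tables earlier in the report. The sketch below only re-adds numbers copied from this log. The variable names are illustrative, and the transfer-time estimate assumes that "GB/s" means 10^9 bytes per second, which the report does not state.

```python
# Byte counts copied from the "profile ideal DDR_IO" and "profile extra DDR_IO" tables.
min_params_read = 3907904
min_inputs_read = 409600
min_outputs_write = 6536000
extra_ddr_io = 7475200

# ideal_input_data_size appears to be parameters plus input features.
ideal_input = min_params_read + min_inputs_read

# total_io_data_size is the sum of ideal input, ideal output, and extra mid IO.
total_io = ideal_input + min_outputs_write + extra_ddr_io

# Assumption: 1 GB/s = 1e9 bytes/s. This gives the time needed just to move
# the data at the reported bandwidth, ignoring any overlap with compute.
ddr_bw_bytes_per_s = 1.59e9
transfer_ms = total_io / ddr_bw_bytes_per_s * 1000.0

print(f"ideal_input_data_size = {ideal_input} Byte")
print(f"total_io_data_size    = {total_io} Byte")
print(f"pure transfer time    = {transfer_ms:.2f} ms")
```

Both sums match the values printed in the report (4317504 and 18328704 Byte), which suggests the three stats lines are derived from the two DDR_IO tables rather than measured separately.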
MAC per inference: 6109250560 MAC@int8
MAC utils: 31.25 %
commit_id:
| breakdown of cmds_num for each op | cmds_num | percentage |
|:----------------------------------------------|-----------:|:-------------|
| mv,affine,unpack_lsb | 1676 | 30.53% |
| weight0_mode,mode23 | 1310 | 23.86% |
| weight0_mode,mode23,nopad | 318 | 5.79% |
| weight0_mode,convNxM,mode23,nopad | 296 | 5.39% |
| pool,weight0_mode,mode20,conv_align,subtensor | 288 | 5.25% |
| weight0_mode,mode20 | 204 | 3.72% |
| mv,subtensor | 185 | 3.37% |
| revert_split | 170 | 3.10% |
| weight0_mode,convNxM,mode26,nopad | 154 | 2.81% |
| weight0_mode,convNxM,mode20,nopad | 144 | 2.62% |
| const | 135 | 2.46% |
| mv,concat_c | 133 | 2.42% |
| weight0_mode,mode20,nopad | 120 | 2.19% |
| pool,weight0_mode,mode20,subtensor | 96 | 1.75% |
| teng_binary_mul | 77 | 1.40% |
| dequant | 34 | 0.62% |
| teng_binary_add | 34 | 0.62% |
| mv,depth2space | 32 | 0.58% |
| mv_patch | 30 | 0.55% |
| mv,upsample | 27 | 0.49% |
| yuv42xto444,yuv2bgr0 | 11 | 0.20% |
| mv,padding_ch | 10 | 0.18% |
| mv,padding | 6 | 0.11% |
| total_num | 5490 | 100% |
subgraph num: 6