- | Pre_alloc OCM | linebuffer(may SWAP later) | ringbuffer | parameter |
- |:-------------------------|:-----------------------------|:-------------|:-------------------|
- | size(ratio of whole OCM) | 0(0.0)% | 0(0.0)% | 112128(5.3)% |
- | range | (None, None) | (None, None) | (1985024, 2097152) |
- | Pre_alloc DDR | ringbuffer | feature_swap |
- |:----------------|-------------:|---------------:|
- | size(M) | 0.00 | 0.00 |
- | profile conv | work_cyc | linebuf | warmup_tail | core_idle | io_idle | stride2_idle | standalone_fetch | MAC |
- |:---------------|:-----------------|:--------------|:--------------|:------------|:--------------|:----------------|:-------------------|:----------------|
- | ratio in conv | 7776900 (100.0%) | 195139 (2.5%) | 273265 (3.5%) | 0 (0.0%) | 225640 (2.9%) | 1489405 (19.2%) | 285 (0.0%) | 5303168 (68.2%) |
- | profile ideal DDR_IO | min_io_sum | min_params_read | min_inputs_read | min_outputs_write |
- |:-----------------------|:------------------|:------------------|:------------------|:--------------------|
- | DDR IO size (Byte) | 10853504 (100.0%) | 3907904 (36.0%) | 409600 (3.8%) | 6536000 (60.2%) |
- | profile extra DDR_IO | extra_ddr_io | extra_params_read | extra_inputs_read | extra_outputs_wrt | extra_swap_read | extra_swap_wrt | ddr_rb_read | ddr_rb_wrt |
- |:-----------------------|:-----------------|:--------------------|:--------------------|:--------------------|:------------------|:-----------------|:--------------|:-------------|
- | DDR IO size (Byte) | 7475200 (100.0%) | 0 (0.0%) | 3788800 (50.7%) | 3686400 (49.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
- profile stream EU: ld/st_ratio might include ringbuf/linebuf/feature_swap parts; mv_ratio might have ringbuf part.
- | profile stream EU | work_cyc | ld_ratio | ld_param_ratio | mv_ratio | mv_linebuffer_ratio | st_ratio |
- |:--------------------|-----------:|:---------------|:-----------------|:---------------|:----------------------|:---------------|
- | teng | 15534772 | 2125313(13.7%) | 2090049(13.5%) | 6013711(38.7%) | 143718(0.9%) | 5161980(33.2%) |
- | breakdown of mv_ratio in profile stream EU | teng | all_eus with mv-cmds |
- |:---------------------------------------------|:----------------|:-----------------------|
- | total_cyc_num | 6013711(100.0%) | 6013711(100.0%) |
- | mv,affine,unpack_lsb | 2325432(38.7%) | 2325432(38.7%) |
- | weight0_mode,convNxM,mode23,nopad | 729952(12.1%) | 729952(12.1%) |
- | teng_binary_mul | 634426(10.5%) | 634426(10.5%) |
- | mv,subtensor | 427299(7.1%) | 427299(7.1%) |
- | teng_binary_add | 317213(5.3%) | 317213(5.3%) |
- | dequant | 317188(5.3%) | 317188(5.3%) |
- | mv,concat_c | 296713(4.9%) | 296713(4.9%) |
- | mv_patch | 262377(4.4%) | 262377(4.4%) |
- | weight0_mode,convNxM,mode20,nopad | 214922(3.6%) | 214922(3.6%) |
- | mv,depth2space | 211328(3.5%) | 211328(3.5%) |
- | mv,upsample | 121582(2.0%) | 121582(2.0%) |
- | mv,padding_ch | 105049(1.7%) | 105049(1.7%) |
- | const | 28323(0.5%) | 28323(0.5%) |
- | mv,padding | 21678(0.4%) | 21678(0.4%) |
- | revert_split | 229(0.0%) | 229(0.0%) |
- | EU | work_cyc | tot_cyc | ratio | fps | fps_bnd |
- |:-----------|-----------:|----------:|:--------|-------:|----------:|
- | conv-1core | 7776901 | 16969628 | 45.0% | 47.140 | 102.870 |
- | teng | 15534772 | 16969628 | 91.0% | 47.140 | 51.500 |
- inference: 21.2 ms, 47.14 fps
- qps = fps * batch_size = 47.14
- simulated fps is based on DDR_BW: 1.59 GB/s
- DDR IO stats:
- ideal_input_data_size: 4317504 Byte,
- ideal_output_data_size: 6536000 Byte,
- extra_mid_io_data_size: 7475200 Byte,
- total_io_data_size: 18328704 Byte
- MAC per inference: 6109250560 MAC@int8
- MAC utils: 31.25 %
- commit_id:
- | breakdown of cmds_num for each op | cmds_num | percentage |
- |:----------------------------------------------|-----------:|:-------------|
- | mv,affine,unpack_lsb | 1676 | 30.53% |
- | weight0_mode,mode23 | 1310 | 23.86% |
- | weight0_mode,mode23,nopad | 318 | 5.79% |
- | weight0_mode,convNxM,mode23,nopad | 296 | 5.39% |
- | pool,weight0_mode,mode20,conv_align,subtensor | 288 | 5.25% |
- | weight0_mode,mode20 | 204 | 3.72% |
- | mv,subtensor | 185 | 3.37% |
- | revert_split | 170 | 3.10% |
- | weight0_mode,convNxM,mode26,nopad | 154 | 2.81% |
- | weight0_mode,convNxM,mode20,nopad | 144 | 2.62% |
- | const | 135 | 2.46% |
- | mv,concat_c | 133 | 2.42% |
- | weight0_mode,mode20,nopad | 120 | 2.19% |
- | pool,weight0_mode,mode20,subtensor | 96 | 1.75% |
- | teng_binary_mul | 77 | 1.40% |
- | dequant | 34 | 0.62% |
- | teng_binary_add | 34 | 0.62% |
- | mv,depth2space | 32 | 0.58% |
- | mv_patch | 30 | 0.55% |
- | mv,upsample | 27 | 0.49% |
- | yuv42xto444,yuv2bgr0 | 11 | 0.20% |
- | mv,padding_ch | 10 | 0.18% |
- | mv,padding | 6 | 0.11% |
- | total_num | 5490 | 100% |
- subgraph num: 6
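
The summary figures in the report above can be cross-checked against each other. The sketch below is not part of the profiler; it only re-derives a few numbers from values copied out of the tables. It assumes that `total_io_data_size` is the plain sum of the three DDR IO components, that `ideal_input_data_size` is `min_params_read` plus `min_inputs_read`, and that fps is the clock frequency divided by `tot_cyc`. The report does not state the clock frequency, so the value computed here (about 800 MHz) is an inference, and the helper names are hypothetical.

```python
# Values copied from the report above (sizes in bytes, cycle counts unitless).
IDEAL_INPUT = 4_317_504       # ideal_input_data_size
IDEAL_OUTPUT = 6_536_000      # ideal_output_data_size
EXTRA_MID_IO = 7_475_200      # extra_mid_io_data_size
TOTAL_IO = 18_328_704         # total_io_data_size
MIN_PARAMS_READ = 3_907_904   # min_params_read
MIN_INPUTS_READ = 409_600     # min_inputs_read

TOT_CYC = 16_969_628          # tot_cyc, shared by both EU rows
WORK_CYC = {"conv-1core": 7_776_901, "teng": 15_534_772}
REPORTED_FPS = 47.14


def total_ddr_io(ideal_in: int, ideal_out: int, extra: int) -> int:
    """Assumed relation: total DDR traffic is the sum of its three parts."""
    return ideal_in + ideal_out + extra


def implied_clock_hz(tot_cyc: int, fps: float) -> float:
    """Assumed relation: fps = clock / tot_cyc, so clock = tot_cyc * fps."""
    return tot_cyc * fps


def fps_bound(work_cyc: int, clock_hz: float) -> float:
    """Upper bound on fps if an execution unit ran only its own work cycles."""
    return clock_hz / work_cyc


if __name__ == "__main__":
    clock = implied_clock_hz(TOT_CYC, REPORTED_FPS)
    print(f"total DDR IO : {total_ddr_io(IDEAL_INPUT, IDEAL_OUTPUT, EXTRA_MID_IO)} Byte")
    print(f"implied clock: {clock / 1e6:.1f} MHz")
    print(f"latency      : {1000.0 / REPORTED_FPS:.1f} ms")
    for eu, cyc in WORK_CYC.items():
        print(f"{eu:>10} fps_bnd ~ {fps_bound(cyc, clock):.2f}")
```

Under these assumptions the derived bounds land within rounding of the reported `fps_bnd` column, which suggests `teng` (about 51.5 fps) rather than the conv core is the limiting unit in this run.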