Skip to content
Snippets Groups Projects
  • Joel Sing's avatar
    c6f56985
    math/big: implement addVW in riscv64 assembly · c6f56985
    Joel Sing authored
    This provides an assembly implementation of addVW for riscv64,
    processing up to four words per loop, resulting in a significant
    performance gain.
    
    On a StarFive VisionFive 2:
    
                      │   addvw.1    │               addvw.2               │
                      │    sec/op    │   sec/op     vs base                │
    AddVW/1-4            57.43n ± 0%   41.45n ± 0%  -27.83% (p=0.000 n=10)
    AddVW/2-4            69.31n ± 0%   48.15n ± 0%  -30.53% (p=0.000 n=10)
    AddVW/3-4            76.12n ± 0%   54.97n ± 0%  -27.79% (p=0.000 n=10)
    AddVW/4-4            85.47n ± 0%   56.14n ± 0%  -34.32% (p=0.000 n=10)
    AddVW/5-4            96.16n ± 0%   62.82n ± 0%  -34.67% (p=0.000 n=10)
    AddVW/10-4          149.60n ± 0%   89.55n ± 0%  -40.14% (p=0.000 n=10)
    AddVW/100-4         1115.0n ± 0%   549.3n ± 0%  -50.74% (p=0.000 n=10)
    AddVW/1000-4        10.732µ ± 0%   5.060µ ± 0%  -52.85% (p=0.000 n=10)
    AddVW/10000-4        151.7µ ± 0%   103.7µ ± 0%  -31.63% (p=0.000 n=10)
    AddVW/100000-4       1.523m ± 0%   1.050m ± 0%  -31.03% (p=0.000 n=10)
    AddVWext/1-4         57.42n ± 0%   41.45n ± 0%  -27.81% (p=0.000 n=10)
    AddVWext/2-4         69.32n ± 0%   48.15n ± 0%  -30.54% (p=0.000 n=10)
    AddVWext/3-4         76.12n ± 0%   54.87n ± 0%  -27.92% (p=0.000 n=10)
    AddVWext/4-4         85.47n ± 0%   56.14n ± 0%  -34.32% (p=0.000 n=10)
    AddVWext/5-4         96.15n ± 0%   62.82n ± 0%  -34.66% (p=0.000 n=10)
    AddVWext/10-4       149.60n ± 0%   89.55n ± 0%  -40.14% (p=0.000 n=10)
    AddVWext/100-4      1115.0n ± 0%   549.3n ± 0%  -50.74% (p=0.000 n=10)
    AddVWext/1000-4     10.732µ ± 0%   5.060µ ± 0%  -52.85% (p=0.000 n=10)
    AddVWext/10000-4     150.5µ ± 0%   103.7µ ± 0%  -31.10% (p=0.000 n=10)
    AddVWext/100000-4    1.530m ± 0%   1.049m ± 0%  -31.41% (p=0.000 n=10)
    geomean              1.003µ        633.9n       -36.79%
    
                      │   addvw.1    │                addvw.2                 │
                      │     B/s      │      B/s       vs base                 │
    AddVW/1-4           132.8Mi ± 0%    184.1Mi ± 0%   +38.55% (p=0.000 n=10)
    AddVW/2-4           220.1Mi ± 0%    316.9Mi ± 0%   +43.96% (p=0.000 n=10)
    AddVW/3-4           300.7Mi ± 0%    416.4Mi ± 0%   +38.48% (p=0.000 n=10)
    AddVW/4-4           357.1Mi ± 0%    543.6Mi ± 0%   +52.25% (p=0.000 n=10)
    AddVW/5-4           396.7Mi ± 0%    607.2Mi ± 0%   +53.06% (p=0.000 n=10)
    AddVW/10-4          510.1Mi ± 0%    852.0Mi ± 0%   +67.02% (p=0.000 n=10)
    AddVW/100-4         684.1Mi ± 0%   1389.0Mi ± 0%  +103.03% (p=0.000 n=10)
    AddVW/1000-4        710.9Mi ± 0%   1507.8Mi ± 0%  +112.08% (p=0.000 n=10)
    AddVW/10000-4       503.1Mi ± 0%    735.8Mi ± 0%   +46.26% (p=0.000 n=10)
    AddVW/100000-4      501.0Mi ± 0%    726.5Mi ± 0%   +45.00% (p=0.000 n=10)
    AddVWext/1-4        132.9Mi ± 0%    184.1Mi ± 0%   +38.55% (p=0.000 n=10)
    AddVWext/2-4        220.1Mi ± 0%    316.9Mi ± 0%   +43.98% (p=0.000 n=10)
    AddVWext/3-4        300.7Mi ± 0%    417.1Mi ± 0%   +38.73% (p=0.000 n=10)
    AddVWext/4-4        357.1Mi ± 0%    543.6Mi ± 0%   +52.25% (p=0.000 n=10)
    AddVWext/5-4        396.7Mi ± 0%    607.2Mi ± 0%   +53.05% (p=0.000 n=10)
    AddVWext/10-4       510.1Mi ± 0%    852.0Mi ± 0%   +67.02% (p=0.000 n=10)
    AddVWext/100-4      684.2Mi ± 0%   1389.0Mi ± 0%  +103.02% (p=0.000 n=10)
    AddVWext/1000-4     710.9Mi ± 0%   1507.7Mi ± 0%  +112.08% (p=0.000 n=10)
    AddVWext/10000-4    506.9Mi ± 0%    735.8Mi ± 0%   +45.15% (p=0.000 n=10)
    AddVWext/100000-4   498.6Mi ± 0%    727.0Mi ± 0%   +45.79% (p=0.000 n=10)
    geomean             388.3Mi         614.3Mi        +58.19%
    
    Change-Id: Ib14a4b8c1d81e710753bbf6dd5546bbca44fe3f1
    Reviewed-on: https://go-review.googlesource.com/c/go/+/595397
    
    
    Reviewed-by: default avatarMeng Zhuo <mengzhuo1203@gmail.com>
    Reviewed-by: default avatarCherry Mui <cherryyz@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: default avatarMark Ryan <markdryan@rivosinc.com>
    Reviewed-by: default avatarDmitri Shuralyov <dmitshur@google.com>
    c6f56985
    History
    math/big: implement addVW in riscv64 assembly
    Joel Sing authored
    This provides an assembly implementation of addVW for riscv64,
    processing up to four words per loop, resulting in a significant
    performance gain.
    
    On a StarFive VisionFive 2:
    
                      │   addvw.1    │               addvw.2               │
                      │    sec/op    │   sec/op     vs base                │
    AddVW/1-4            57.43n ± 0%   41.45n ± 0%  -27.83% (p=0.000 n=10)
    AddVW/2-4            69.31n ± 0%   48.15n ± 0%  -30.53% (p=0.000 n=10)
    AddVW/3-4            76.12n ± 0%   54.97n ± 0%  -27.79% (p=0.000 n=10)
    AddVW/4-4            85.47n ± 0%   56.14n ± 0%  -34.32% (p=0.000 n=10)
    AddVW/5-4            96.16n ± 0%   62.82n ± 0%  -34.67% (p=0.000 n=10)
    AddVW/10-4          149.60n ± 0%   89.55n ± 0%  -40.14% (p=0.000 n=10)
    AddVW/100-4         1115.0n ± 0%   549.3n ± 0%  -50.74% (p=0.000 n=10)
    AddVW/1000-4        10.732µ ± 0%   5.060µ ± 0%  -52.85% (p=0.000 n=10)
    AddVW/10000-4        151.7µ ± 0%   103.7µ ± 0%  -31.63% (p=0.000 n=10)
    AddVW/100000-4       1.523m ± 0%   1.050m ± 0%  -31.03% (p=0.000 n=10)
    AddVWext/1-4         57.42n ± 0%   41.45n ± 0%  -27.81% (p=0.000 n=10)
    AddVWext/2-4         69.32n ± 0%   48.15n ± 0%  -30.54% (p=0.000 n=10)
    AddVWext/3-4         76.12n ± 0%   54.87n ± 0%  -27.92% (p=0.000 n=10)
    AddVWext/4-4         85.47n ± 0%   56.14n ± 0%  -34.32% (p=0.000 n=10)
    AddVWext/5-4         96.15n ± 0%   62.82n ± 0%  -34.66% (p=0.000 n=10)
    AddVWext/10-4       149.60n ± 0%   89.55n ± 0%  -40.14% (p=0.000 n=10)
    AddVWext/100-4      1115.0n ± 0%   549.3n ± 0%  -50.74% (p=0.000 n=10)
    AddVWext/1000-4     10.732µ ± 0%   5.060µ ± 0%  -52.85% (p=0.000 n=10)
    AddVWext/10000-4     150.5µ ± 0%   103.7µ ± 0%  -31.10% (p=0.000 n=10)
    AddVWext/100000-4    1.530m ± 0%   1.049m ± 0%  -31.41% (p=0.000 n=10)
    geomean              1.003µ        633.9n       -36.79%
    
                      │   addvw.1    │                addvw.2                 │
                      │     B/s      │      B/s       vs base                 │
    AddVW/1-4           132.8Mi ± 0%    184.1Mi ± 0%   +38.55% (p=0.000 n=10)
    AddVW/2-4           220.1Mi ± 0%    316.9Mi ± 0%   +43.96% (p=0.000 n=10)
    AddVW/3-4           300.7Mi ± 0%    416.4Mi ± 0%   +38.48% (p=0.000 n=10)
    AddVW/4-4           357.1Mi ± 0%    543.6Mi ± 0%   +52.25% (p=0.000 n=10)
    AddVW/5-4           396.7Mi ± 0%    607.2Mi ± 0%   +53.06% (p=0.000 n=10)
    AddVW/10-4          510.1Mi ± 0%    852.0Mi ± 0%   +67.02% (p=0.000 n=10)
    AddVW/100-4         684.1Mi ± 0%   1389.0Mi ± 0%  +103.03% (p=0.000 n=10)
    AddVW/1000-4        710.9Mi ± 0%   1507.8Mi ± 0%  +112.08% (p=0.000 n=10)
    AddVW/10000-4       503.1Mi ± 0%    735.8Mi ± 0%   +46.26% (p=0.000 n=10)
    AddVW/100000-4      501.0Mi ± 0%    726.5Mi ± 0%   +45.00% (p=0.000 n=10)
    AddVWext/1-4        132.9Mi ± 0%    184.1Mi ± 0%   +38.55% (p=0.000 n=10)
    AddVWext/2-4        220.1Mi ± 0%    316.9Mi ± 0%   +43.98% (p=0.000 n=10)
    AddVWext/3-4        300.7Mi ± 0%    417.1Mi ± 0%   +38.73% (p=0.000 n=10)
    AddVWext/4-4        357.1Mi ± 0%    543.6Mi ± 0%   +52.25% (p=0.000 n=10)
    AddVWext/5-4        396.7Mi ± 0%    607.2Mi ± 0%   +53.05% (p=0.000 n=10)
    AddVWext/10-4       510.1Mi ± 0%    852.0Mi ± 0%   +67.02% (p=0.000 n=10)
    AddVWext/100-4      684.2Mi ± 0%   1389.0Mi ± 0%  +103.02% (p=0.000 n=10)
    AddVWext/1000-4     710.9Mi ± 0%   1507.7Mi ± 0%  +112.08% (p=0.000 n=10)
    AddVWext/10000-4    506.9Mi ± 0%    735.8Mi ± 0%   +45.15% (p=0.000 n=10)
    AddVWext/100000-4   498.6Mi ± 0%    727.0Mi ± 0%   +45.79% (p=0.000 n=10)
    geomean             388.3Mi         614.3Mi        +58.19%
    
    Change-Id: Ib14a4b8c1d81e710753bbf6dd5546bbca44fe3f1
    Reviewed-on: https://go-review.googlesource.com/c/go/+/595397
    
    
    Reviewed-by: default avatarMeng Zhuo <mengzhuo1203@gmail.com>
    Reviewed-by: default avatarCherry Mui <cherryyz@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: default avatarMark Ryan <markdryan@rivosinc.com>
    Reviewed-by: default avatarDmitri Shuralyov <dmitshur@google.com>
Code owners
Assign users and groups as approvers for specific file changes. Learn more.