    runtime: optimize the function memequal using SIMD on loong64 · ecdd429a
    limeidan authored
    goos: linux
    goarch: loong64
    pkg: bytes
    cpu: Loongson-3A6000-HV @ 2500.00MHz
                                  │      old      │                 new                  │
                                  │    sec/op     │    sec/op     vs base                │
    Equal/0                          0.4012n ± 0%   0.4003n ± 0%   -0.21% (p=0.000 n=10)
    Equal/same/1                      2.555n ± 1%    2.419n ± 0%   -5.32% (p=0.000 n=10)
    Equal/same/6                      2.574n ± 1%    2.425n ± 1%   -5.79% (p=0.000 n=10)
    Equal/same/9                      2.578n ± 0%    2.419n ± 1%   -6.19% (p=0.000 n=10)
    Equal/same/15                     2.565n ± 1%    2.417n ± 0%   -5.73% (p=0.000 n=10)
    Equal/same/16                     2.576n ± 1%    2.414n ± 0%   -6.31% (p=0.000 n=10)
    Equal/same/20                     2.573n ± 1%    2.416n ± 0%   -6.10% (p=0.000 n=10)
    Equal/same/32                     2.559n ± 0%    2.411n ± 0%   -5.80% (p=0.000 n=10)
    Equal/same/4K                     2.579n ± 1%    2.410n ± 0%   -6.53% (p=0.000 n=10)
    Equal/same/4M                     2.571n ± 0%    2.411n ± 0%   -6.22% (p=0.000 n=10)
    Equal/same/64M                    2.568n ± 1%    2.413n ± 0%   -6.05% (p=0.000 n=10)
    Equal/1                           5.215n ± 0%    6.404n ± 0%  +22.80% (p=0.000 n=10)
    Equal/6                          11.630n ± 0%    6.404n ± 0%  -44.94% (p=0.000 n=10)
    Equal/9                          15.240n ± 0%    6.404n ± 0%  -57.98% (p=0.000 n=10)
    Equal/15                         22.925n ± 0%    6.404n ± 0%  -72.07% (p=0.000 n=10)
    Equal/16                         24.070n ± 0%    5.203n ± 0%  -78.38% (p=0.000 n=10)
    Equal/20                         28.880n ± 0%    6.404n ± 0%  -77.83% (p=0.000 n=10)
    Equal/32                         43.320n ± 0%    6.404n ± 0%  -85.22% (p=0.000 n=10)
    Equal/4K                        4938.50n ± 0%    55.43n ± 0%  -98.88% (p=0.000 n=10)
    Equal/4M                         5048.8µ ± 0%    202.0µ ± 0%  -96.00% (p=0.000 n=10)
    Equal/64M                        80.819m ± 0%    4.539m ± 0%  -94.38% (p=0.000 n=10)
    EqualBothUnaligned/64_0          79.830n ± 0%    4.803n ± 0%  -93.98% (p=0.000 n=10)
    EqualBothUnaligned/64_1          79.830n ± 0%    4.803n ± 0%  -93.98% (p=0.000 n=10)
    EqualBothUnaligned/64_4          79.830n ± 0%    4.803n ± 0%  -93.98% (p=0.000 n=10)
    EqualBothUnaligned/64_7          79.830n ± 0%    4.803n ± 0%  -93.98% (p=0.000 n=10)
    EqualBothUnaligned/4096_0       4937.00n ± 0%    65.64n ± 0%  -98.67% (p=0.000 n=10)
    EqualBothUnaligned/4096_1       4937.00n ± 0%    78.85n ± 0%  -98.40% (p=0.000 n=10)
    EqualBothUnaligned/4096_4       4937.00n ± 0%    78.87n ± 0%  -98.40% (p=0.000 n=10)
    EqualBothUnaligned/4096_7       4937.00n ± 0%    78.87n ± 0%  -98.40% (p=0.000 n=10)
    EqualBothUnaligned/4194304_0     5049.2µ ± 0%    204.2µ ± 0%  -95.96% (p=0.000 n=10)
    EqualBothUnaligned/4194304_1     5049.2µ ± 0%    205.1µ ± 0%  -95.94% (p=0.000 n=10)
    EqualBothUnaligned/4194304_4     5049.4µ ± 0%    205.1µ ± 0%  -95.94% (p=0.000 n=10)
    EqualBothUnaligned/4194304_7     5049.2µ ± 0%    205.1µ ± 0%  -95.94% (p=0.000 n=10)
    EqualBothUnaligned/67108864_0    80.796m ± 0%    3.863m ± 0%  -95.22% (p=0.000 n=10)
    EqualBothUnaligned/67108864_1    80.801m ± 0%    3.706m ± 0%  -95.41% (p=0.000 n=10)
    EqualBothUnaligned/67108864_4    80.799m ± 0%    3.706m ± 0%  -95.41% (p=0.000 n=10)
    EqualBothUnaligned/67108864_7    80.781m ± 0%    3.706m ± 0%  -95.41% (p=0.000 n=10)
    geomean                           1.040µ         149.6n       -85.63%
    
    Change-Id: Id4c2bc0ca758337dd9759df83750c761814be488
    Reviewed-on: https://go-review.googlesource.com/c/go/+/667255
    
    
    Reviewed-by: abner chenc <chenguoqi@loongson.cn>
    Reviewed-by: Michael Pratt <mpratt@google.com>
    Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Junyang Shao <shaojunyang@google.com>