    ddc7d2a8
    cmd/compile: add late lower pass for last rules to run · ddc7d2a8
    eric fang authored
    Optimization rules usually have priorities: some need to run first,
    some next, and some last, and running them in that order produces the
    best code. Our optimization rules currently have no priority, so this CL
    adds a late lower pass for the rules that must run last, such as
    splitting unreasonable constant folding. This pass can be seen as a
    second round of the lower pass.
    
    For example:
    func foo(a, b uint64) uint64 {
            d := a+0x1234568
            d1 := b+0x1234568
            return d&d1
    }
    The code generated by the master branch:
    	0x0004 00004        ADD     $19088744, R0, R2 // movz+movk+add
    	0x0010 00016        ADD     $19088744, R1, R1 // movz+movk+add
    	0x001c 00028        AND     R1, R2, R0
    
    This is because the current constant folding rules do not take the
    range of the constant into account, so the constant is loaded
    repeatedly. This CL splits such unreasonable constant folding in the
    late lower pass. With this CL, the generated code is:
    	0x0004 00004        MOVD    $19088744, R2 // movz+movk
    	0x000c 00012        ADD     R0, R2, R3
    	0x0010 00016        ADD     R1, R2, R1
    	0x0014 00020        AND     R1, R3, R0
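
The extra movz+movk pair arises because an ARM64 ADD (immediate) instruction only encodes a 12-bit unsigned value, optionally shifted left by 12 bits. The following is a minimal sketch of that range check, not the compiler's actual rule: the helper name fitsAddImm is hypothetical, and foo is copied from the example above.

```go
package main

import "fmt"

// fitsAddImm reports whether c can be encoded directly in one ARM64
// ADD (immediate) instruction: a 12-bit unsigned value, optionally
// shifted left by 12 bits. Hypothetical helper, for illustration only;
// it is not the check used inside cmd/compile.
func fitsAddImm(c uint64) bool {
	if c <= 0xfff {
		return true // plain imm12
	}
	// imm12 shifted left by 12: the low 12 bits must be zero.
	return c&0xfff == 0 && c>>12 <= 0xfff
}

// foo is the example function from the commit message.
func foo(a, b uint64) uint64 {
	d := a + 0x1234568
	d1 := b + 0x1234568
	return d & d1
}

func main() {
	for _, c := range []uint64{0x123, 0x123000, 0x1234568} {
		fmt.Printf("%#x fits ADD immediate: %v\n", c, fitsAddImm(c))
	}
	fmt.Println(foo(1, 2))
}
```

Because 0x1234568 fails this check, each folded ADD has to materialize the constant first; keeping it in one register lets both additions share a single load.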
    
    This CL also adds a constant folding optimization for the ADDS instruction.
    
    In addition, to avoid introducing a codegen regression, an optimization
    rule is added that turns the addition of a negative number into the
    subtraction of a positive number.
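
The equivalence that this last rule relies on can be checked with a short Go sketch. The function names addConst and subConst are hypothetical and only model the arithmetic; they are not taken from the rewrite rules themselves.

```go
package main

import "fmt"

// addConst models "add a (possibly negative) constant": the constant is
// reinterpreted as its 64-bit two's complement bit pattern and added.
func addConst(a uint64, c int64) uint64 {
	return a + uint64(c)
}

// subConst models the rewritten form: subtract the negated constant.
// For a negative c, -c is the positive magnitude.
func subConst(a uint64, c int64) uint64 {
	return a - uint64(-c)
}

func main() {
	a := uint64(1000)
	c := int64(-16)
	// Both forms wrap modulo 2^64 and therefore agree.
	fmt.Println(addConst(a, c), subConst(a, c))
}
```

The two forms always agree because 64-bit arithmetic wraps modulo 2^64; the subtraction form is preferable when the positive magnitude is a small value while the two's complement pattern of the negative constant is not.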
    
    go1 benchmarks:
    name                     old time/op    new time/op    delta
    BinaryTree17-8              1.22s ± 1%     1.24s ± 0%  +1.56%  (p=0.008 n=5+5)
    Fannkuch11-8                1.54s ± 0%     1.53s ± 0%  -0.69%  (p=0.016 n=4+5)
    FmtFprintfEmpty-8          14.1ns ± 0%    14.1ns ± 0%    ~     (p=0.079 n=4+5)
    FmtFprintfString-8         26.0ns ± 0%    26.1ns ± 0%  +0.23%  (p=0.008 n=5+5)
    FmtFprintfInt-8            32.3ns ± 0%    32.9ns ± 1%  +1.72%  (p=0.008 n=5+5)
    FmtFprintfIntInt-8         54.5ns ± 0%    55.5ns ± 0%  +1.83%  (p=0.008 n=5+5)
    FmtFprintfPrefixedInt-8    61.5ns ± 0%    62.0ns ± 0%  +0.93%  (p=0.008 n=5+5)
    FmtFprintfFloat-8          72.0ns ± 0%    73.6ns ± 0%  +2.24%  (p=0.008 n=5+5)
    FmtManyArgs-8               221ns ± 0%     224ns ± 0%  +1.22%  (p=0.008 n=5+5)
    GobDecode-8                1.91ms ± 0%    1.93ms ± 0%  +0.98%  (p=0.008 n=5+5)
    GobEncode-8                1.40ms ± 1%    1.39ms ± 0%  -0.79%  (p=0.032 n=5+5)
    Gzip-8                      115ms ± 0%     117ms ± 1%  +1.17%  (p=0.008 n=5+5)
    Gunzip-8                   19.4ms ± 1%    19.3ms ± 0%  -0.71%  (p=0.016 n=5+4)
    HTTPClientServer-8         27.0µs ± 0%    27.3µs ± 0%  +0.80%  (p=0.008 n=5+5)
    JSONEncode-8               3.36ms ± 1%    3.33ms ± 0%    ~     (p=0.056 n=5+5)
    JSONDecode-8               17.5ms ± 2%    17.8ms ± 0%  +1.71%  (p=0.016 n=5+4)
    Mandelbrot200-8            2.29ms ± 0%    2.29ms ± 0%    ~     (p=0.151 n=5+5)
    GoParse-8                  1.35ms ± 1%    1.36ms ± 1%    ~     (p=0.056 n=5+5)
    RegexpMatchEasy0_32-8      24.5ns ± 0%    24.5ns ± 0%    ~     (p=0.444 n=4+5)
    RegexpMatchEasy0_1K-8       131ns ±11%     118ns ± 6%    ~     (p=0.056 n=5+5)
    RegexpMatchEasy1_32-8      22.9ns ± 0%    22.9ns ± 0%    ~     (p=0.905 n=4+5)
    RegexpMatchEasy1_1K-8       126ns ± 0%     127ns ± 0%    ~     (p=0.063 n=4+5)
    RegexpMatchMedium_32-8      486ns ± 5%     483ns ± 0%    ~     (p=0.381 n=5+4)
    RegexpMatchMedium_1K-8     15.4µs ± 1%    15.5µs ± 0%    ~     (p=0.151 n=5+5)
    RegexpMatchHard_32-8        687ns ± 0%     686ns ± 0%    ~     (p=0.103 n=5+5)
    RegexpMatchHard_1K-8       20.7µs ± 0%    20.7µs ± 1%    ~     (p=0.151 n=5+5)
    Revcomp-8                   175ms ± 2%     176ms ± 3%    ~     (p=1.000 n=5+5)
    Template-8                 20.4ms ± 6%    20.1ms ± 2%    ~     (p=0.151 n=5+5)
    TimeParse-8                 112ns ± 0%     113ns ± 0%  +0.97%  (p=0.016 n=5+4)
    TimeFormat-8                156ns ± 0%     145ns ± 0%  -7.14%  (p=0.029 n=4+4)
    
    Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da
    Reviewed-on: https://go-review.googlesource.com/c/go/+/425134
    
    
    Reviewed-by: Keith Randall <khr@google.com>
    Reviewed-by: Cherry Mui <cherryyz@google.com>
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Reviewed-by: Keith Randall <khr@golang.org>
    Run-TryBot: Eric Fang <eric.fang@arm.com>