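pprof Performance Profiling

Faced with an unfamiliar program, how do you analyze its performance and find its bottlenecks?

pprof is built to answer that question. It consists of two parts:

  • runtime/pprof, which is compiled into the program
  • go tool pprof, the profile analysis tool

Performance Profiling

Recording profile data itself affects the program's performance, so it is best to record only one kind of data at a time.

CPU Profiling

Generating a Profile

Go's runtime profiling interfaces all live in the runtime/pprof package. Calling into runtime/pprof is all it takes to collect the data we want.

Suppose we have written the following program, which generates 5 sets of random data and sorts each one with bubble sort.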

package main

import (
	"fmt"
	"math/rand"
	"time"
)

// generate returns a slice of n random integers.
func generate(n int) []int {
	rand.Seed(time.Now().UnixNano())
	nums := make([]int, 0)
	for i := 0; i < n; i++ {
		nums = append(nums, rand.Int())
	}
	return nums
}

// bubbleSort sorts nums in place in ascending order.
func bubbleSort(nums []int) {
	for i := 0; i < len(nums); i++ {
		// After pass i, the last i elements are already in their final positions.
		for j := 1; j < len(nums)-i; j++ {
			if nums[j] < nums[j-1] {
				nums[j], nums[j-1] = nums[j-1], nums[j]
			}
		}
	}
}

func main() {
	n := 10
	for i := 0; i < 5; i++ {
		nums := generate(n)
		bubbleSort(nums)
		n *= 10
	}

	fmt.Println("Done")
}

To measure this program's CPU profile, we only need to add 2 lines to the main function:

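import (
	"fmt"
	"math/rand"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	// Write the CPU profile to standard output; profiling stops when main returns.
	pprof.StartCPUProfile(os.Stdout)
	defer pprof.StopCPUProfile()
	n := 10
	for i := 0; i < 5; i++ {
		nums := generate(n)
		bubbleSort(nums)
		n *= 10
	}

	// Print to stderr so the message does not end up inside the profile data on stdout.
	fmt.Fprintln(os.Stderr, "Done")
}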

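For simplicity, the profile is written straight to standard output, os.Stdout. Run the program and redirect its output to the file cpu.pprof (for example, go run main.go > cpu.pprof).

In general, writing the profile to standard output is not recommended: if the program prints anything of its own, the two streams interfere with each other. Recording directly to a file is the better approach.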

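// startCPUProfile starts CPU profiling into ./cpu.pprof and returns a function
// that stops profiling and closes the file. The stop function must be deferred
// in main: deferring it inside this helper would end profiling as soon as the
// helper returns. Requires the "log", "os", and "runtime/pprof" imports.
func startCPUProfile() func() {
	f, err := os.Create("./cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	return func() {
		pprof.StopCPUProfile()
		f.Close()
	}
}

func main() {
	defer startCPUProfile()()
	n := 10
	for i := 0; i < 5; i++ {
		nums := generate(n)
		bubbleSort(nums)
		n *= 10
	}

	fmt.Println("Done")
}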

With this change, simply running go run main.go is enough.

Analyzing the Data

Next, use go tool pprof to analyze the data:

go tool pprof -http=:9999 cpu.pprof

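Open localhost:9999 in a browser to view the results in the web UI.

Besides the web page, the data can also be inspected in interactive mode on the command line. The example session below inspects a memory profile (mem.pprof); the same commands work for cpu.pprof.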

go tool pprof /var/folders/dd/11ddhj_s2dbdnj8mx91b97800000gn/T/profile2456682500/mem.pprof

Type: inuse_space
Time: Oct 26, 2022 at 10:35am (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 548.73kB, 98.95% of 554.55kB total
Dropped 56 nodes (cum <= 2.77kB)
      flat  flat%   sum%        cum   cum%
  524.61kB 94.60% 94.60%   546.48kB 98.55%  main.concat
   21.88kB  3.94% 98.55%    21.88kB  3.94%  main.randomString
    2.25kB  0.41% 98.95%     3.31kB   0.6%  runtime.allocm
         0     0% 98.95%   548.61kB 98.93%  main.main
         0     0% 98.95%   548.61kB 98.93%  runtime.main
         0     0% 98.95%     3.31kB   0.6%  runtime.newm
         0     0% 98.95%     3.31kB   0.6%  runtime.resetspinning
         0     0% 98.95%     3.31kB   0.6%  runtime.schedule
         0     0% 98.95%     3.31kB   0.6%  runtime.startm
         0     0% 98.95%     3.31kB   0.6%  runtime.wakep

The output can also be sorted by cum (cumulative cost):

(pprof) top -cum
Showing nodes accounting for 548.73kB, 98.95% of 554.55kB total
Dropped 56 nodes (cum <= 2.77kB)
      flat  flat%   sum%        cum   cum%
         0     0%     0%   548.61kB 98.93%  main.main
         0     0%     0%   548.61kB 98.93%  runtime.main
  524.61kB 94.60% 94.60%   546.48kB 98.55%  main.concat
   21.88kB  3.94% 98.55%    21.88kB  3.94%  main.randomString
    2.25kB  0.41% 98.95%     3.31kB   0.6%  runtime.allocm
         0     0% 98.95%     3.31kB   0.6%  runtime.newm
         0     0% 98.95%     3.31kB   0.6%  runtime.resetspinning
         0     0% 98.95%     3.31kB   0.6%  runtime.schedule
         0     0% 98.95%     3.31kB   0.6%  runtime.startm
         0     0% 98.95%     3.31kB   0.6%  runtime.wakep
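
The flat and cum columns can be read as follows: flat is the value attributed to a function itself, while cum also includes everything the function calls. The sketch below illustrates that relationship with a deliberately simplified model. The sample type, the flatAndCum helper, and the sample values are all hypothetical and are not part of pprof, which handles many more details.

```go
package main

import "fmt"

// sample is a simplified profile sample: a call stack (leaf function first)
// and the value attributed to it, such as bytes in use or CPU time.
type sample struct {
	stack []string
	value int64
}

// flatAndCum computes per-function totals in a simplified way:
// flat counts a sample only for its leaf function, while cum counts it
// once for every distinct function that appears anywhere on the stack.
func flatAndCum(samples []sample) (flat, cum map[string]int64) {
	flat = make(map[string]int64)
	cum = make(map[string]int64)
	for _, s := range samples {
		if len(s.stack) == 0 {
			continue
		}
		flat[s.stack[0]] += s.value
		seen := make(map[string]bool)
		for _, fn := range s.stack {
			if !seen[fn] {
				seen[fn] = true
				cum[fn] += s.value
			}
		}
	}
	return flat, cum
}

func main() {
	// Made-up values, chosen only to mirror the shape of the output above.
	samples := []sample{
		{stack: []string{"main.concat", "main.main"}, value: 500},
		{stack: []string{"main.randomString", "main.concat", "main.main"}, value: 20},
	}
	flat, cum := flatAndCum(samples)
	for _, fn := range []string{"main.main", "main.concat", "main.randomString"} {
		fmt.Printf("%-18s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

In this model main.main has a flat value of 0 but the largest cum value, which is the same pattern the top -cum output shows for main.main and main.concat.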

The help command lists all supported commands and options:

(pprof) help
Commands:
callgrind Outputs a graph in callgrind format
comments Output all profile comments
disasm Output assembly listings annotated with samples
dot Outputs a graph in DOT format
eog Visualize graph through eog
evince Visualize graph through evince
gif Outputs a graph image in GIF format
gv Visualize graph through gv
......

Memory Profiling

Generating a Profile

Suppose we have written the following program, which generates random strings of length N and concatenates them.

package main

import (
	"math/rand"

	"github.com/pkg/profile"
)

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// randomString returns a random string of n letters.
func randomString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letterBytes[rand.Intn(len(letterBytes))]
	}
	return string(b)
}

// concat joins n random strings of length n using repeated string concatenation.
func concat(n int) string {
	s := ""
	for i := 0; i < n; i++ {
		s += randomString(n)
	}
	return s
}

func main() {
	// Memory profiling via pkg/profile; a rate of 1 records every allocation.
	defer profile.Start(profile.MemProfile, profile.MemProfileRate(1)).Stop()
	concat(100)
}

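Here we collect the data with pkg/profile, a more convenient library that wraps the runtime/pprof interfaces and is simpler to use.

For example, to measure the CPU profile of concat(), a single line of code is enough to generate the profile file.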

import (
	"github.com/pkg/profile"
)

func main() {
	// With no options, profile.Start enables CPU profiling.
	defer profile.Start().Stop()
	concat(100)
}

Run go run main.go:

go run main.go 
2022/10/26 10:33:08 profile: cpu profiling enabled, /var/folders/dd/11ddhj_s2dbdnj8mx91b97800000gn/T/profile3671108945/cpu.pprof
2022/10/26 10:33:08 profile: cpu profiling disabled, /var/folders/dd/11ddhj_s2dbdnj8mx91b97800000gn/T/profile3671108945/cpu.pprof

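The CPU profile file has been generated in a temporary directory. With the profile file in hand, we can analyze it with go tool pprof as before, either in the browser or on the command line.

Memory data is collected in a similar way. Again, only a small change to the main function is needed.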

func main() {
	defer profile.Start(profile.MemProfile, profile.MemProfileRate(1)).Stop()
	concat(100)
}

Run go run main.go:

go run main.go                                                             
2022/10/26 10:35:50 profile: memory profiling enabled (rate 1), /var/folders/dd/11ddhj_s2dbdnj8mx91b97800000gn/T/profile2456682500/mem.pprof
2022/10/26 10:35:50 profile: memory profiling disabled, /var/folders/dd/11ddhj_s2dbdnj8mx91b97800000gn/T/profile2456682500/mem.pprof

Analyzing the Data

Next, we can analyze the memory profile in the browser:

go tool pprof -http=:9999  /var/folders/dd/11ddhj_s2dbdnj8mx91b97800000gn/T/profile2456682500/mem.pprof

Serving web UI on http://localhost:9999
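
The top output shown earlier attributes most of the in-use memory to main.concat. One plausible explanation is that s += ... builds a new string on every iteration. The sketch below shows one possible alternative using strings.Builder. The concatBuilder function is a hypothetical variant that is not part of the original program, and any improvement should be confirmed by profiling it again rather than assumed.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// randomString returns a random string of n letters (same as in the article).
func randomString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letterBytes[rand.Intn(len(letterBytes))]
	}
	return string(b)
}

// concatBuilder (hypothetical) joins n random strings of length n,
// appending into a single buffer instead of using s += ... in a loop.
func concatBuilder(n int) string {
	var sb strings.Builder
	sb.Grow(n * n) // the final length is known in advance
	for i := 0; i < n; i++ {
		sb.WriteString(randomString(n))
	}
	return sb.String()
}

func main() {
	s := concatBuilder(100)
	fmt.Println(len(s)) // prints 10000
}
```

Swapping concat for concatBuilder in main and collecting the memory profile again would show whether main.concat still dominates.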

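Generating Profiles from Benchmarks

Besides printing benchmark results on the command line, benchmarks can also generate profile files for analysis with go tool pprof.

The testing package supports generating CPU, memory, and block profile files:

  • -cpuprofile=$FILE
  • -memprofile=$FILE, with -memprofilerate=N to set runtime.MemProfileRate to N (on average, one allocation is sampled per N bytes allocated)
  • -blockprofile=$FILE

Adding the -cpuprofile flag to go test is all it takes to generate the CPU profile file for BenchmarkFib: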

go test -bench="Fib$" -cpuprofile=cpu.pprof .
goos: darwin
goarch: amd64
pkg: Go/pprof/benchmark
cpu: Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz
BenchmarkFib-4 265 4732255 ns/op
PASS
ok Go/pprof/benchmark 1.827s

After the run finishes, a new cpu.pprof file appears in the current directory, ready to be analyzed with go tool pprof.

go tool pprof -http=:9999 cpu.pprof
Serving web UI on http://localhost:9999

The -text option prints the results directly as text instead.

> go tool pprof -text cpu.pprof
Type: cpu
Time: Oct 26, 2022 at 10:53am (CST)
Duration: 1.75s, Total samples = 540ms (30.80%)
Showing nodes accounting for 540ms, 100% of 540ms total
      flat  flat%   sum%        cum   cum%
     540ms   100%   100%      540ms   100%  Go/pprof/benchmark.fib
         0     0%   100%      540ms   100%  Go/pprof/benchmark.BenchmarkFib
         0     0%   100%      540ms   100%  testing.(*B).launch
         0     0%   100%      540ms   100%  testing.(*B).runN

pprof supports many output formats (images, text, web, and more). Running go tool pprof on its own in the command line lists all supported options:

> go tool pprof
Details:
Output formats (select at most one):
-callgrind Outputs a graph in callgrind format
-comments Output all profile comments
-disasm Output assembly listings annotated with samples
-dot Outputs a graph in DOT format
-eog Visualize graph through eog
-evince Visualize graph through evince
-gif Outputs a graph image in GIF format
-gv Visualize graph through gv
-kcachegrind Visualize report in KCachegrind
-list Output annotated source for functions matching regexp
-pdf Outputs a graph in PDF format
-peek Output callers/callees of functions matching regexp
-png Outputs a graph image in PNG format
-proto Outputs the profile in compressed protobuf format
-ps Outputs a graph in PS format
-raw Outputs a text representation of the raw profile
-svg Outputs a graph in SVG format
-tags Outputs all tags in the profile
-text Outputs top entries in text form
-top Outputs top entries in text form
-topproto Outputs top entries in compressed protobuf format
-traces Outputs all profile samples in text form
-tree Outputs a text rendering of call graph
-web Visualize graph through web browser
-weblist Display annotated source in a web browser


