Want to know how to get rough profiling of any tool written in any language, whether you control the source or not? Keep reading.
For example, here's the output you get when you compile Go from source code. Each of these lines prints out a few seconds apart. How would you go about getting timings for each individual step, or how would you notice if one step was suddenly executing much more slowly than it did three days ago?
$ GOROOT_BOOTSTRAP=~/go1.10 ./make.bash Building Go cmd/dist using /Users/kevin/go1.10. Building Go toolchain1 using /Users/kevin/go1.10. Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1. Building Go toolchain2 using go_bootstrap and Go toolchain1. Building Go toolchain3 using go_bootstrap and Go toolchain2. Building packages and commands for darwin/amd64. --- Installed Go for darwin/amd64 in /Users/kevin/go Installed commands in /Users/kevin/go/bin
There are a few different methods you can use to get a sense for this, but I wanted to share my favorite one. Don't reach for a profiler, don't try to set a global variable, or do start/stop timing in code. Don't even start figuring out how to configure a logger to print timestamps, and use a log output format.
Then when you've done all this, rerun the command and pass it through a command
that attaches timestamps to every output line. For example, I have a tool
tss (based on
ts in the moreutils package) that
does this. Pipe your command to
tss and you'll get output like this:
$ GOROOT_BOOTSTRAP=~/go1.10 ./make.bash | tss 11ms Building Go cmd/dist using /Users/kevin/go1.10. 690ms 679ms Building Go toolchain1 using /Users/kevin/go1.10. 11.435s 10.745s Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1. 18.483s 7.048s Building Go toolchain2 using go_bootstrap and Go toolchain1. 40.788s 22.305s Building Go toolchain3 using go_bootstrap and Go toolchain2. 1m0.112s 19.324s Building packages and commands for darwin/amd64. 1m19.053s 18.942s --- 1m19.053s 0s Installed Go for darwin/amd64 in /Users/kevin/go 1m19.053s 0s Installed commands in /Users/kevin/go/bin
The first column is the amount of time that has elapsed since the tool was invoked. The second column is the amount of time that has elapsed since the previous line was printed.
The good thing about this technique is you can use it in pretty much any situation - you don't have to know anything about the code besides:
- how to invoke it
- a rough idea of which code paths get hit
- how to print things to stdout
Just start adding print lines around interesting bits of code.
For example, you might have this in your test suite in Ruby to empty the database. How much time does this add to every test in your suite? Put print lines before and after to find out.
config.after(:each) do puts "database clean start" DatabaseCleaner.clean puts "database clean end" end
Similarly, if you wanted to find out why it takes so long for your test suite to
start running or your web framework to boot, you can annotate your
bootstrap file, the top of the test file, before and after
and the start and end of the first test in the suite. You can even drill down
into third party libraries, just find the source file on disk and start adding
You can also use it with strace! strace will show you a list of the syscalls
being opened by your program. Passing the output through
tss can show you
which syscall activity is taking a lot of time, or when your program is doing
a lot of stuff without going to the disk.
If you pipe program a to program b, and program a exits with a non-zero
return code, Bash will by default report a return code of zero for the entire
operation. This means if you are piping output to
ts you may
accidentally change a failing program to a passing one. Use
set -o pipefail in
Bash scripts to ensure that Bash will return a non-zero return code if any part
of a pipe operation fails. Or add this to a Makefile:
SHELL = /bin/bash -o pipefail
Second, the program being profiled may not flush output to stdout, that is, it may print something but it may not appear on the screen at the same time it was printed. If the timings on screen don't seem to match up with your intuition, check that the program is flushing print statements to the stdout file descriptor after printing them.
Liked what you read? I am available for hire.
If you’re having trouble with stdout buffering, running the program under
stdbuf -o L
might help you out.
Another tool inspired by ts is tj:
Its output is a little more colorful and configurable, and it can output timings to a file very easily.