Retrieving Debug Information
Each Dgraph data node exposes profiles over the /debug/pprof endpoint and metrics over the /debug/vars endpoint. Each Dgraph data node has its own profiling and metrics information. Below is a list of the debugging information exposed by Dgraph and the corresponding commands to retrieve it.
Metrics Information
If you are collecting these metrics from outside the Dgraph instance, you need to pass the --expose_trace=true flag. Otherwise, the metrics can be collected by connecting to the instance over localhost.
curl http://<IP>:<HTTP_PORT>/debug/vars
Metrics can also be retrieved in the Prometheus format at /debug/prometheus_metrics. See the Metrics section for the full list of metrics.
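As a rough illustration of working with the /debug/vars output, the sketch below fetches the endpoint and keeps only the entries whose names start with a given prefix. It assumes the endpoint returns a single JSON object, as Go's expvar package does. The function names and the "dgraph_" prefix in the usage comment are illustrative assumptions, not part of Dgraph.

```python
import json
from urllib.request import urlopen


def fetch_vars(base_url, timeout=10.0):
    """Fetch <base_url>/debug/vars and decode the body as JSON.

    Assumes the endpoint returns one JSON object (expvar-style output).
    """
    with urlopen(f"{base_url}/debug/vars", timeout=timeout) as resp:
        return json.load(resp)


def select_metrics(vars_doc, prefix):
    """Return only the entries whose key starts with the given prefix."""
    return {k: v for k, v in sorted(vars_doc.items()) if k.startswith(prefix)}


# Hypothetical usage against a local node:
#   doc = fetch_vars("http://localhost:8080")
#   for name, value in select_metrics(doc, "dgraph_").items():
#       print(name, value)
```

Keeping the filtering separate from the HTTP call makes it possible to run the same filter on a vars file saved earlier.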
Profiling Information
Profiling information is available via go tool pprof, the profiling tool built into Go. The “Profiling Go programs” Go blog post will help you get started with using pprof. Each Dgraph Zero and Dgraph Alpha exposes a debug endpoint at /debug/pprof/<profile> via the HTTP port.
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap
Fetching profile from ...
Saved Profile in ...
The command output shows the location where the profile is saved.
In the interactive pprof shell, you can use commands like top to get a listing of the top functions in the profile, web to get a visual graph of the profile opened in a web browser, or list to display a code listing with profiling information overlaid.
CPU Profile
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/profile
Memory Profile
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap
Block Profile
By default, Dgraph doesn’t collect the block profile. To collect it, Dgraph must be started with --profile_mode=block and --block_rate=<N> with N > 1.
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/block
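The CPU, memory, and block commands above all follow the same /debug/pprof/<profile> URL pattern. The following sketch builds such URLs so they can be handed to go tool pprof from a script. The helper name pprof_url is hypothetical, and the optional seconds query parameter is an assumption based on its use for time-based profiles such as profile and trace.

```python
from typing import Optional
from urllib.parse import urlencode


def pprof_url(host: str, port: int, profile: str,
              seconds: Optional[int] = None) -> str:
    """Build the pprof endpoint URL for one profile on a Dgraph node."""
    url = f"http://{host}:{port}/debug/pprof/{profile}"
    if seconds is not None:
        # Only time-based profiles (for example "profile") use a duration.
        url += "?" + urlencode({"seconds": seconds})
    return url


# Hypothetical usage:
#   subprocess.run(["go", "tool", "pprof", pprof_url("localhost", 8080, "heap")])
```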
Goroutine stack
The HTTP page /debug/pprof/ is available at the HTTP port of a Dgraph Zero or Dgraph Alpha. From this page, a link to the “full goroutine stack dump” is available (e.g., on a Dgraph Alpha this page would be at http://localhost:8080/debug/pprof/goroutine?debug=2). Looking at the full goroutine stack can be useful for understanding goroutine usage at that moment.
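A full stack dump can be long, so a quick summary by goroutine state is often a useful first step. The sketch below counts goroutines per state in a saved dump. It relies on the header format the Go runtime prints for each goroutine, such as "goroutine 18 [chan receive, 5 minutes]:". The function name is an illustrative assumption.

```python
import re
from collections import Counter

# Matches the header line of each goroutine and captures the state,
# stopping before any extra detail such as ", 5 minutes".
_HEADER = re.compile(r"^goroutine \d+ \[([^\],]+)")


def count_goroutine_states(dump: str) -> Counter:
    """Count goroutines by state in the text of a debug=2 stack dump."""
    counts = Counter()
    for line in dump.splitlines():
        match = _HEADER.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage with a dump saved to disk:
#   with open("goroutine.txt") as f:
#       for state, n in count_goroutine_states(f.read()).most_common():
#           print(n, state)
```

A large number of goroutines in the same waiting state can point to where to start reading the dump.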
Profiling Information with debuginfo
Instead of sending a request to the server for each CPU, memory, and goroutine profile, you can use the debuginfo command to collect all of these profiles, along with several metrics.
You can run the command like this:
dgraph debuginfo -a <alpha_address:port> -z <zero_address:port> -d <path_to_dir_to_store_profiles>
Your output should look like:
I0311 14:13:53.243667 32654 run.go:118] using directory /tmp/dgraph-debuginfo037351492 for debug info dump.
I0311 14:13:53.243864 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/heap
I0311 14:13:53.243872 32654 debugging.go:70] please wait... (30s)
I0311 14:13:53.245338 32654 debugging.go:58] saving heap metric in /tmp/dgraph-debuginfo037351492/alpha_heap.gz
I0311 14:13:53.245349 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/profile?seconds=30
I0311 14:13:53.245357 32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.250079 32654 debugging.go:58] saving cpu metric in /tmp/dgraph-debuginfo037351492/alpha_cpu.gz
I0311 14:14:23.250148 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/state
I0311 14:14:23.250173 32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.255467 32654 debugging.go:58] saving state metric in /tmp/dgraph-debuginfo037351492/alpha_state.gz
I0311 14:14:23.255507 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/health
I0311 14:14:23.255528 32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.257453 32654 debugging.go:58] saving health metric in /tmp/dgraph-debuginfo037351492/alpha_health.gz
I0311 14:14:23.257507 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/jemalloc
I0311 14:14:23.257548 32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.259009 32654 debugging.go:58] saving jemalloc metric in /tmp/dgraph-debuginfo037351492/alpha_jemalloc.gz
I0311 14:14:23.259055 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/trace?seconds=30
I0311 14:14:23.259091 32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.266092 32654 debugging.go:58] saving trace metric in /tmp/dgraph-debuginfo037351492/alpha_trace.gz
I0311 14:14:53.266152 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/metrics
I0311 14:14:53.266181 32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.276357 32654 debugging.go:58] saving metrics metric in /tmp/dgraph-debuginfo037351492/alpha_metrics.gz
I0311 14:14:53.276414 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/vars
I0311 14:14:53.276439 32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.278295 32654 debugging.go:58] saving vars metric in /tmp/dgraph-debuginfo037351492/alpha_vars.gz
I0311 14:14:53.278340 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/trace?seconds=30
I0311 14:14:53.278366 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.286770 32654 debugging.go:58] saving trace metric in /tmp/dgraph-debuginfo037351492/alpha_trace.gz
I0311 14:15:23.286830 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/goroutine?debug=2
I0311 14:15:23.286886 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.291120 32654 debugging.go:58] saving goroutine metric in /tmp/dgraph-debuginfo037351492/alpha_goroutine.gz
I0311 14:15:23.291164 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/block
I0311 14:15:23.291192 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.304562 32654 debugging.go:58] saving block metric in /tmp/dgraph-debuginfo037351492/alpha_block.gz
I0311 14:15:23.304664 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/mutex
I0311 14:15:23.304706 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.309171 32654 debugging.go:58] saving mutex metric in /tmp/dgraph-debuginfo037351492/alpha_mutex.gz
I0311 14:15:23.309228 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/threadcreate
I0311 14:15:23.309256 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.313026 32654 debugging.go:58] saving threadcreate metric in /tmp/dgraph-debuginfo037351492/alpha_threadcreate.gz
I0311 14:15:23.385359 32654 run.go:150] Debuginfo archive successful: dgraph-debuginfo037351492.tar.gz
When the command finishes, debuginfo returns the tarball’s file name. If no destination has been specified, the file is created in the directory from which you ran the debuginfo command.
The following files contain the metrics collected by the debuginfo command:
dgraph-debuginfo639541060
├── alpha_block.gz
├── alpha_goroutine.gz
├── alpha_health.gz
├── alpha_heap.gz
├── alpha_jemalloc.gz
├── alpha_mutex.gz
├── alpha_profile.gz
├── alpha_state.gz
├── alpha_threadcreate.gz
├── alpha_trace.gz
├── zero_block.gz
├── zero_goroutine.gz
├── zero_health.gz
├── zero_heap.gz
├── zero_jemalloc.gz
├── zero_mutex.gz
├── zero_profile.gz
├── zero_state.gz
├── zero_threadcreate.gz
└── zero_trace.gz
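To check what a given archive actually contains, the files can be grouped by node type. The sketch below assumes the archive is a tar file (optionally compressed) whose members follow the <node>_<metric>.gz naming shown in the listing above. The function name is hypothetical.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_node(archive_path: str) -> dict:
    """Map each node prefix (e.g. "alpha", "zero") to its sorted metric names."""
    groups = defaultdict(list)
    # "r:*" lets tarfile detect the compression, if any.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip the enclosing directory entry
            name = PurePosixPath(member.name).name  # e.g. "alpha_heap.gz"
            node, _, rest = name.partition("_")
            metric = rest[:-3] if rest.endswith(".gz") else rest
            groups[node].append(metric)
    return {node: sorted(metrics) for node, metrics in groups.items()}


# Hypothetical usage:
#   print(group_by_node("dgraph-debuginfo037351492.tar.gz"))
```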
Command parameters
-a, --alpha string Address of running dgraph alpha. (default "localhost:8080")
-x, --archive Whether to archive the generated report (default true)
-d, --directory string Directory to write the debug info into.
-h, --help help for debuginfo
-m, --metrics strings List of metrics & profiles to dump in the report. (default [heap,cpu,state,health,jemalloc,trace,metrics,vars,trace,goroutine,block,mutex,threadcreate])
-s, --seconds uint32 Duration for time-based metric collection. (default 30)
-z, --zero string Address of running dgraph zero.
The metrics flag (-m)
By default, debuginfo collects:
heap
cpu
state
health
jemalloc
trace
metrics
vars
trace
goroutine
block
mutex
threadcreate
You can also collect only a subset of them. For example, this command collects only the jemalloc and health profiles:
dgraph debuginfo -m jemalloc,health
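When debuginfo is run from a script, the argument list can be assembled programmatically. The sketch below uses only the flags listed under "Command parameters". It treats the default metric names as the full set of valid names, which is an assumption made for validation here, and the function name is hypothetical.

```python
# Names from the documented default of the -m flag (duplicates removed).
KNOWN_METRICS = (
    "heap", "cpu", "state", "health", "jemalloc", "trace", "metrics",
    "vars", "goroutine", "block", "mutex", "threadcreate",
)


def debuginfo_args(metrics, alpha=None, zero=None, directory=None, seconds=None):
    """Build the argv list for a `dgraph debuginfo` invocation."""
    unknown = [m for m in metrics if m not in KNOWN_METRICS]
    if unknown:
        raise ValueError(f"unknown metrics: {', '.join(unknown)}")
    args = ["dgraph", "debuginfo"]
    if alpha:
        args += ["-a", alpha]
    if zero:
        args += ["-z", zero]
    if directory:
        args += ["-d", directory]
    if seconds is not None:
        args += ["-s", str(seconds)]
    # The -m flag takes a comma-separated list.
    args += ["-m", ",".join(metrics)]
    return args


# Hypothetical usage:
#   subprocess.run(debuginfo_args(["jemalloc", "health"]), check=True)
```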
Profile details
- cpu profile: The CPU profile determines where a program spends its time while actively consuming CPU cycles (as opposed to while sleeping or waiting for I/O).
- heap: The heap profile reports memory allocation samples; it is used to monitor current and historical memory usage, and to check for memory leaks.
- threadcreate: The thread creation profile reports the sections of the program that lead to the creation of new OS threads.
- goroutine: The goroutine profile reports the stack traces of all current goroutines.
- block: The block profile shows where goroutines block waiting on synchronization primitives (including timer channels).
- mutex: The mutex profile reports lock contention. Use this profile when you think your CPU is not fully utilized due to mutex contention.
- trace: The execution trace captures a wide range of runtime events. The execution tracer is a tool to detect latency and utilization problems. You can examine how well the CPU is utilized, and when networking or syscalls are a cause of preemption for the goroutines. The tracer is useful for identifying poorly parallelized execution and for understanding some of the core runtime events and how your goroutines execute.