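GDB (GNU Debugger) is the debugger of the GNU Project. It lets you see what is going on inside another program while it executes, or what another program was doing at the moment it crashed. This article walks through GDB debugging using one scenario: a multithreaded Python program that intermittently blocks in waiter.acquire().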
Languages supported by GDB
- Ada
- Assembly
- C
- C++
- D
- Fortran
- Go
- Objective-C
- OpenCL
- Python (through the py-* commands of CPython's gdb extension, not as a native GDB language)
- Modula-2
- Pascal
- Rust
Installation
# Debian/Ubuntu
sudo apt install gdb python3.8-dbg -y
# RHEL/CentOS
sudo yum install gdb python-debuginfo
or
sudo yum install yum-utils
sudo debuginfo-install glibc
sudo yum install gdb python-debuginfo
Usage
gdb --args
gdb --args <test-bin> arg1 arg2 ...
# Run the program; type help to see the built-in help
r
gdb --args <test-bin>
set args arg1 arg2 ...
show args
r
gdb <test-bin>
r arg1 arg2 ...
Running a Python program under gdb
Starting the program with gdb
$ gdb python
...
(gdb) run <programname>.py <arguments>
or
$ gdb -ex r --args python <programname>.py <arguments>
Attaching to a running process
$ gdb python <pid of running process>
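When a hang has to be captured without an interactive session, the attach step can be scripted. The following is a minimal sketch, assuming gdb and the Python debug symbols are installed as described above. The names build_gdb_command and dump_python_stacks are hypothetical helpers, not part of gdb or Python. The sketch relies on gdb's -p, -batch and -ex options and on the py-bt command from the Python gdb extension. Attaching normally requires running as the same user as the target process, or as root.

```python
import subprocess


def build_gdb_command(pid, gdb_commands=("thread apply all py-bt",)):
    """Return the argv for a non-interactive gdb session attached to pid."""
    argv = ["gdb", "-p", str(pid), "-batch"]
    for command in gdb_commands:
        # Each -ex option runs one gdb command after attaching.
        argv += ["-ex", command]
    return argv


def dump_python_stacks(pid, timeout=60):
    """Attach to pid, collect Python backtraces of all threads, then exit gdb.

    In batch mode gdb exits after the last command, which detaches from
    the process and lets it continue running.
    """
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

For example, print(dump_python_stacks(6325)) would print the Python stacks of the process used in the rest of this article, provided that process exists and gdb is allowed to attach to it.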
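Debugging the process
- If the process is hung, debug it directly.
- If the program is running normally, press Ctrl+C to interrupt it.
Suppose the hung Python process has PID 6325. Attach to it as follows: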
$ gdb python 6325
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
Reading symbols from /usr/lib/debug/.build-id/b8/25858594a4d78f020c75d54a744ac644ed19f5.debug...
Attaching to program: /usr/bin/python3, process 6325
[New LWP 154764]
[New LWP 154766]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x2568670)
at ../sysdeps/nptl/futex-internal.h:320
320 ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb)
C-level debugging
(gdb) bt
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x2568670)
at ../sysdeps/nptl/futex-internal.h:320
#1 do_futex_wait (sem=sem@entry=0x2568670, abstime=0x0, clockid=0) at sem_waitcommon.c:112
#2 0x00007f9de962c4e8 in __new_sem_wait_slow (sem=sem@entry=0x2568670, abstime=0x0, clockid=0)
at sem_waitcommon.c:184
...
--Type <RET> for more, q to quit, c to continue without paging--
- list: show the C source around the current position
- bt: show the current C call stack
- print: print the value of a C variable
Python-level debugging
(gdb) py-bt
Traceback (most recent call first):
File "/usr/lib/python3.8/threading.py", line 302, in wait
waiter.acquire()
File "/usr/lib/python3.8/site-packages/paramiko/buffered_pipe.py", line 160, in read
self._cv.wait(timeout)
...
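Debugging commands:
- py-list: show the Python source around the current position
- py-bt: show the current Python call stack
- py-bt-full: show the current Python call stack with details of every frame
- py-print: print the value of a Python variable
- py-locals: show the variables in the current scope
- py-up: move up to the previous (calling) frame
- py-down: move down to the next (called) frame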
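Example: troubleshooting a hung Python process
If a process appears hung, it is either waiting for something (a lock, I/O, and so on) or spinning in a busy loop somewhere. In both cases it helps to attach to the process and get a backtrace.
If the process is in a busy loop, you may want to let it run for a while (with the cont command), then interrupt it again (Ctrl+C) and take another stack trace.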
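The distinction between waiting and busy looping can be made a little more concrete by comparing several backtraces taken a few seconds apart. The following is a rough heuristic sketch, not a feature of gdb. The function classify_samples is a hypothetical helper, and it assumes you have already collected the backtraces as plain strings (for example, copied py-bt output). A tight loop that stays inside one line of code would still look like a wait to this check, so treat the result as a hint only.

```python
def classify_samples(samples):
    """Guess the kind of hang from backtraces taken a few seconds apart.

    samples: list of backtrace strings, one per interruption.
    """
    if len(samples) < 2:
        return "need more samples"
    distinct = {sample.strip() for sample in samples}
    if len(distinct) == 1:
        # Identical stack every time: the thread is probably blocked
        # on a lock or on I/O.
        return "probably waiting"
    # The stack changes between samples: code is still executing.
    return "probably busy"
```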
If the hang is in one particular thread, the following command can be handy:
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f9de9470740 (LWP 6325) "python3" futex_abstimed_wait_cancelable (private=0, abstime=0x0,
clockid=0, expected=0, futex_word=0x2568670) at ../sysdeps/nptl/futex-internal.h:320
2 Thread 0x7f9de450c700 (LWP 154764) "python3" 0x00007f9de9751aff in __GI___poll (fds=fds@entry=0x7f9de450b358,
nfds=nfds@entry=1, timeout=timeout@entry=100) at ../sysdeps/unix/sysv/linux/poll.c:29
3 Thread 0x7f9de350a700 (LWP 154766) "python3" 0x00007f9de9751aff in __GI___poll (fds=fds@entry=0x7f9de3509358,
nfds=nfds@entry=1, timeout=timeout@entry=100) at ../sysdeps/unix/sysv/linux/poll.c:29
(gdb)
The current thread is marked with *. To see where it is in the Python code, use py-list:
(gdb) py-list
297 self._waiters.append(waiter)
298 saved_state = self._release_save()
299 gotit = False
300 try: # restore state no matter what (e.g., KeyboardInterrupt)
301 if timeout is None:
>302 waiter.acquire()
303 gotit = True
304 else:
305 if timeout > 0:
306 gotit = waiter.acquire(True, timeout)
307 else:
(gdb)
To see the Python code position of every thread, use:
(gdb) thread apply all py-list
Thread 3 (Thread 0x7f9de350a700 (LWP 154766)):
296 while n > 0:
297 got_timeout = False
298 if self.handshake_timed_out():
299 raise EOFError()
300 try:
>301 x = self.__socket.recv(n)
302 if len(x) == 0:
303 raise EOFError()
304 out += x
305 n -= len(x)
306 except socket.timeout:
Thread 2 (Thread 0x7f9de450c700 (LWP 154764)):
296 while n > 0:
297 got_timeout = False
298 if self.handshake_timed_out():
299 raise EOFError()
300 try:
>301 x = self.__socket.recv(n)
302 if len(x) == 0:
303 raise EOFError()
304 out += x
305 n -= len(x)
306 except socket.timeout:
--Type <RET> for more, q to quit, c to continue without paging--
Thread 1 (Thread 0x7f9de9470740 (LWP 6325)):
297 self._waiters.append(waiter)
298 saved_state = self._release_save()
299 gotit = False
300 try: # restore state no matter what (e.g., KeyboardInterrupt)
301 if timeout is None:
>302 waiter.acquire()
303 gotit = True
304 else:
305 if timeout > 0:
306 gotit = waiter.acquire(True, timeout)
307 else:
The hang is caused by a multithreaded Python program intermittently blocking in waiter.acquire() inside the threading module. See: https://github.com/paramiko/paramiko/issues/515
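The threading.py listing above shows why the thread can block forever: when timeout is None, Condition.wait() calls waiter.acquire() with no bound. The following is a minimal sketch of that mechanism. It is not paramiko's code and not a fix for the linked issue. The names cond and wait_for_data are hypothetical, and no other thread ever calls notify(), which stands in for the missed wake-up.

```python
import threading

cond = threading.Condition()


def wait_for_data(timeout=None):
    """Wait on the condition; return True if notified, False on timeout."""
    with cond:
        # With timeout=None this blocks in waiter.acquire() until another
        # thread calls cond.notify(); if nobody does, it never returns.
        return cond.wait(timeout)


# Nobody notifies here, so a bounded wait returns False instead of hanging.
notified = wait_for_data(timeout=0.2)
print("notified:", notified)
```

Calling wait_for_data() with no timeout in this script would reproduce the state seen in gdb: the main thread parked in waiter.acquire() at the Python level and in a futex wait at the C level.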
Further reading