Debugging with GDB on Linux


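GDB (the GNU Debugger) is the GNU Project's debugger. It lets you see what is going on inside another program while it executes, or what that program was doing at the moment it crashed. This article uses a multithreaded Python program that intermittently blocks in waiter.acquire() as a worked example of debugging with GDB.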

Languages supported by GDB

  • Ada
  • Assembly
  • C
  • C++
  • D
  • Fortran
  • Go
  • Objective-C
  • OpenCL
  • Python (through the py-* extension commands provided by CPython's gdb helpers, not built into GDB itself)
  • Modula-2
  • Pascal
  • Rust

Installation

  • Ubuntu
sudo apt install gdb python3.8-dbg -y
  • CentOS
sudo yum install gdb python-debuginfo

or

sudo yum install yum-utils
sudo debuginfo-install glibc
sudo yum install gdb python-debuginfo

Usage

Passing arguments with gdb --args

  • Option 1
gdb --args <test-bin> arg1 arg2 ...
# start the program; type help inside gdb for the command reference
r
  • Option 2
gdb --args <test-bin>
set args  arg1 arg2 ...
show args
r
  • Option 3
gdb <test-bin>
r arg1 arg2 ...

Running a Python program under gdb

Starting the program from gdb

$ gdb python
...
(gdb) run <programname>.py <arguments>

or

$ gdb -ex r --args python <programname>.py <arguments>

Attaching to a running process

$ gdb python <pid of running process>

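Debugging a process

  • If the process is hung, attach to it and debug it directly
  • If the program is running normally, press Ctrl+C in gdb to interrupt it

Suppose the hung Python process has PID 6325. Attach to it as follows: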

$ gdb python 6325
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
Reading symbols from /usr/lib/debug/.build-id/b8/25858594a4d78f020c75d54a744ac644ed19f5.debug...
Attaching to program: /usr/bin/python3, process 6325
[New LWP 154764]
[New LWP 154766]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x2568670)
    at ../sysdeps/nptl/futex-internal.h:320
320	../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb)

C-level debugging

(gdb) bt
#0  futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x2568670)
    at ../sysdeps/nptl/futex-internal.h:320
#1  do_futex_wait (sem=sem@entry=0x2568670, abstime=0x0, clockid=0) at sem_waitcommon.c:112
#2  0x00007f9de962c4e8 in __new_sem_wait_slow (sem=sem@entry=0x2568670, abstime=0x0, clockid=0)
    at sem_waitcommon.c:184
...
--Type <RET> for more, q to quit, c to continue without paging--

Debugging commands:

  • list shows the C source around the current position
  • bt shows the C call stack
  • print shows the value of a C variable or expression

Python-level debugging

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
  File "/usr/lib/python3.8/site-packages/paramiko/buffered_pipe.py", line 160, in read
    self._cv.wait(timeout)
...

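Debugging commands:

  • py-list shows the Python source around the current position
  • py-bt shows the Python call stack
  • py-bt-full shows the Python call stack with details for every frame
  • py-print looks up a Python variable by name and prints it
  • py-locals shows the local variables of the current Python frame
  • py-up selects the next older Python frame (the caller)
  • py-down selects the next newer Python frame (the callee)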
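
To make the notion of a "frame" concrete, here is a minimal sketch in plain Python that walks the same chain of frames that py-bt prints and that py-up and py-down step through. The function names frame_names, inner and outer are made up for illustration, and sys._getframe() is a CPython-specific helper.

```python
import sys


def frame_names():
    """Return the names of the active Python frames, most recent call first."""
    names = []
    frame = sys._getframe()  # the frame of frame_names() itself (CPython only)
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # the caller's frame, i.e. the frame py-up would select
    return names


def inner():
    return frame_names()


def outer():
    return inner()


if __name__ == "__main__":
    # The first three entries are frame_names, inner, outer.
    print(outer()[:3])
```

Following f_back moves toward the caller, which is the direction py-up takes. The listing order matches the "most recent call first" order of py-bt.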

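Example: diagnosing a hung Python process

If a process appears hung, it is either waiting on something (a lock, I/O, and so on) or spinning in a busy loop somewhere. In either case, attaching to the process and getting a backtrace helps.

If the process is in a busy loop, you may want to let it continue for a while (with the cont command), then interrupt it again (Ctrl+C) and take another stack trace.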
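
When you can modify the program, a comparable Python-level dump can also be produced from inside the process with the standard faulthandler module. The sketch below is only a complement to attaching with gdb, not a replacement for it. The helper name dump_all_threads is hypothetical; it writes the dump to a temporary file because faulthandler needs a real file descriptor.

```python
import faulthandler
import tempfile


def dump_all_threads():
    """Return the Python tracebacks of all threads as one string."""
    with tempfile.TemporaryFile(mode="w+") as dump_file:
        # faulthandler writes straight to the file descriptor
        faulthandler.dump_traceback(file=dump_file, all_threads=True)
        dump_file.seek(0)
        return dump_file.read()


if __name__ == "__main__":
    print(dump_all_threads())
```

Like py-bt, the dump lists each thread's frames with the most recent call first.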

If the hang is in one particular thread, the following commands may come in handy:

(gdb) info threads
  Id   Target Id                                    Frame
* 1    Thread 0x7f9de9470740 (LWP 6325) "python3"   futex_abstimed_wait_cancelable (private=0, abstime=0x0,
    clockid=0, expected=0, futex_word=0x2568670) at ../sysdeps/nptl/futex-internal.h:320
  2    Thread 0x7f9de450c700 (LWP 154764) "python3" 0x00007f9de9751aff in __GI___poll (fds=fds@entry=0x7f9de450b358,
    nfds=nfds@entry=1, timeout=timeout@entry=100) at ../sysdeps/unix/sysv/linux/poll.c:29
  3    Thread 0x7f9de350a700 (LWP 154766) "python3" 0x00007f9de9751aff in __GI___poll (fds=fds@entry=0x7f9de3509358,
    nfds=nfds@entry=1, timeout=timeout@entry=100) at ../sysdeps/unix/sysv/linux/poll.c:29
(gdb)

The current thread is marked with *. To see where it is in the Python code, use py-list:

(gdb) py-list
 297            self._waiters.append(waiter)
 298            saved_state = self._release_save()
 299            gotit = False
 300            try:    # restore state no matter what (e.g., KeyboardInterrupt)
 301                if timeout is None:
>302                    waiter.acquire()
 303                    gotit = True
 304                else:
 305                    if timeout > 0:
 306                        gotit = waiter.acquire(True, timeout)
 307                    else:
(gdb)
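
The LWP numbers that info threads prints are kernel thread IDs. Assuming Linux and Python 3.8 or later, the program can log the same numbers itself through Thread.native_id, which makes it easier to tell which Python thread a gdb thread corresponds to. The helper name native_ids below is made up for illustration.

```python
import threading


def native_ids():
    """Map each live thread's name to its kernel thread ID (the LWP shown by gdb on Linux)."""
    return {thread.name: thread.native_id for thread in threading.enumerate()}


if __name__ == "__main__":
    for name, lwp in native_ids().items():
        print(f"{name}: LWP {lwp}")
```

Giving threads descriptive names (the name argument of threading.Thread) makes such a log much easier to match against gdb's thread list.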

To see where every thread is in the Python code, use:

(gdb) thread apply all py-list

Thread 3 (Thread 0x7f9de350a700 (LWP 154766)):
 296            while n > 0:
 297                got_timeout = False
 298                if self.handshake_timed_out():
 299                    raise EOFError()
 300                try:
>301                    x = self.__socket.recv(n)
 302                    if len(x) == 0:
 303                        raise EOFError()
 304                    out += x
 305                    n -= len(x)
 306                except socket.timeout:

Thread 2 (Thread 0x7f9de450c700 (LWP 154764)):
 296            while n > 0:
 297                got_timeout = False
 298                if self.handshake_timed_out():
 299                    raise EOFError()
 300                try:
>301                    x = self.__socket.recv(n)
 302                    if len(x) == 0:
 303                        raise EOFError()
 304                    out += x
 305                    n -= len(x)
 306                except socket.timeout:
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 (Thread 0x7f9de9470740 (LWP 6325)):
 297            self._waiters.append(waiter)
 298            saved_state = self._release_save()
 299            gotit = False
 300            try:    # restore state no matter what (e.g., KeyboardInterrupt)
 301                if timeout is None:
>302                    waiter.acquire()
 303                    gotit = True
 304                else:
 305                    if timeout > 0:
 306                        gotit = waiter.acquire(True, timeout)
 307                    else:
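
For hangs that occur only occasionally, it can help to capture the same information without an interactive session. The sketch below builds a gdb command line from the standard -p, -batch and -ex options; the helper name gdb_batch_command is hypothetical, and it assumes the py-* commands load automatically, as they do in the session above.

```python
import shlex


def gdb_batch_command(pid, gdb_commands=("thread apply all py-bt",)):
    """Build an argv list that attaches to pid, runs the given gdb commands, then exits."""
    argv = ["gdb", "-p", str(pid), "-batch"]
    for command in gdb_commands:
        argv += ["-ex", command]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable shell command for the process from this article
    print(shlex.join(gdb_batch_command(6325)))
```

The resulting list can be passed to subprocess.run() and its output saved to a log file, so that several dumps taken a few seconds apart can be compared.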

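The problem is that this multithreaded Python program intermittently blocks in waiter.acquire() inside threading.Condition.wait(), which paramiko's BufferedPipe.read() calls here with no timeout. See https://github.com/paramiko/paramiko/issues/515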
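
The blocking pattern itself is easy to reproduce in isolation. The following sketch is not paramiko's code. It uses a hypothetical Mailbox class to show why a condition wait without a timeout can block forever when no notification arrives, and how a timeout turns the hang into an error that can be handled.

```python
import threading


class Mailbox:
    """Hypothetical buffer guarded by a Condition, loosely modelled on a pipe."""

    def __init__(self):
        self._cv = threading.Condition()
        self._items = []

    def put(self, item):
        with self._cv:
            self._items.append(item)
            self._cv.notify()

    def get(self, timeout=None):
        with self._cv:
            # With timeout=None this blocks until put() is called, possibly forever
            if not self._cv.wait_for(lambda: self._items, timeout):
                raise TimeoutError("no data arrived before the timeout")
            return self._items.pop(0)


if __name__ == "__main__":
    box = Mailbox()
    try:
        box.get(timeout=0.1)
    except TimeoutError as exc:
        print("timed out:", exc)
```

A get() call with timeout=None on an empty Mailbox would sit in waiter.acquire(), which is the frame that py-list shows for thread 1 above.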

Further tools

  • gdbgui: a browser-based frontend for gdb (the GNU debugger)

References

  1. https://www.gnu.org/software/gdb/
  2. https://wiki.python.org/moin/DebuggingWithGdb