Remote Debugging EMAC OE SDK Projects with gdbserver

From wiki.emacinc.com
Revision as of 17:39, 10 May 2013 by Mcoleman
Table 1: Conventions
target_program The name of the application being debugged. This is the result of the Makefile build process.
target_machine Connection information for the target machine. This can be either a serial port (e.g. /dev/ttyS2) or a TCP connection in the form HOST:PORT.
/path/to/sdk/ Represents the development system path to the EMAC OE SDK.

Sometimes a program has no technical errors that cause the compile to fail, but fails to meet the developer's expectations when run. This is typically due to algorithm or data structure design errors which can be difficult to find with just visual inspection of the code. Because of this, it can be beneficial to run a debugger targeting the binary resulting from the compile process. Debugging is the process of watching what is going on inside of another program while it is running. When a program is compiled with debug symbols included in the binary, it is possible to observe the source code and corresponding assembly while running the debugger.

When working with embedded systems the binary is usually compiled on a development machine with a different CPU architecture than what is on the target machine. This can be a problem when, as is typically the case, the target machine lacks the system resources to run a debugger. In these cases, it is possible to use the GNU debugger, or GDB, on the development machine to remotely debug the target machine provided it has a program called gdbserver. All EMAC OE builds are packaged with gdbserver to simplify the setup process for developers.

This guide is intended to build a basic understanding of how to use gdbserver with EMAC products. It is not intended as a general guide to debugging computer programs. For help with that, see the GDB man pages on the development system or read [this manual] on debugging with GDB.

Setup

Using gdbserver involves setting up both the target machine and the development machine, and the application binary must be present on both. The development machine's copy must be compiled with debug symbols (for example, with GCC's -g option); this is not strictly necessary for the copy on the target machine. See the [Optional global.properties Modifications Section] of the New EMAC OE SDK Project Guide for more information. See the [EMAC OE Getting Started Guide] for information on how to connect to the target EMAC product using a serial port or Ethernet connection.

Target Machine

Because EMAC OE builds are distributed with gdbserver, installation is not a concern. The only setup necessary is to run gdbserver with target_program:

  1. If the target application is already running, use the --attach option to connect gdbserver to the application as shown below. The PID argument can be determined using pidof.
root@emac-oe:~# pidof target_program
root@emac-oe:~# gdbserver target_machine --attach PID
  2. If the target application is not already running, the name of the binary may be included as an argument to the gdbserver program call.
root@emac-oe:~# gdbserver target_machine target_program [ARGS]

This establishes a gdbserver port on the target machine that listens for incoming connections from GDB on the development machine. In debug terminology, gdbserver is “attached” to the process ID of the program being debugged. In reality, though, GDB is attached to the process ID of a proxy which passes the messages to and from the remote device under test.

The next step is to run GDB on the development machine using target_program.

Development Machine

  1. First, cd to the directory where the target executable is stored.
  2. Run the EMAC OE SDK GDB:
developer@ldc:~$ /path/to/sdk/EMAC-OE-arm-linux-gnueabi-SDK_4.0/gcc-4.2.4-arm-linux-gnueabi/bin/arm-linux-gnueabi-gdb target_program
  3. Run the following command in GDB to prepare for the debug session:
(gdb) target remote target_machine

Sample GDB Session

This example GDB session uses the EMAC OE SDK example project named pthread_demo. It consists of the single source file pthread_demo.c. The program is called with a single integer argument indicating how many reader threads the user wishes to create. The following describes the tasks of the main thread:

  1. The main thread performs user input validation. It prints a usage message if the argument passed to it on the command line is not acceptable. The program expects the user to pass a number indicating how many threads should be spawned.
  2. The main thread initiates a new thread which uses the generator() function to perform the following tasks:
    1. Checks to see if the number of reader threads matches the number of times a reader thread has acquired the mutex lock and performed its task. If the two values do match, then the generator thread unlocks the mutex, breaks out of the while loop and moves on to line 167 to gracefully exit. If the two values do not match, then the generator thread continues through the rest of the while loop described in steps 2.2 and 2.3.
    2. Generates random data to be stored in the data struct shared by all the threads. To do this, it protects the data struct with the use of a mutex variable.
    3. Sleeps after giving up its lock on the mutex so that another thread might have a chance to acquire the lock.
  3. After creating the generator thread the main thread iteratively creates as many reader threads as indicated by the single integer argument. Each reader thread performs the following tasks:
    1. Waits for a chance to acquire the mutex lock. Once the mutex lock is acquired, it prints the value of the random number generated by the generator thread in its last run.
    2. Increments an integer in the data struct to indicate that it has completed its task.
    3. Gives up its lock on the mutex and exits.
  4. After creating the prescribed number of reader threads, the main thread then waits for each thread created to exit gracefully.
  5. The main thread exits.
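
The thread structure described above can be sketched in a few lines of C. This is a minimal illustration written for this guide, not the SDK's pthread_demo.c: the names shared_data, readers_done, value, and run_demo are hypothetical, the 1 ms pause is an arbitrary choice, and argument validation in main() is left out so that the threading logic stays in view.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the data struct shared by all threads. */
struct shared_data {
    pthread_mutex_t lock;
    int num_threads;   /* number of reader threads requested */
    int readers_done;  /* incremented by each reader when it finishes */
    int value;         /* last random number produced by the generator */
};

/* Step 2: produce data until every reader has reported completion. */
static void *generator(void *arg)
{
    struct shared_data *d = arg;
    const struct timespec nap = { 0, 1000000L };  /* 1 ms, arbitrary */

    for (;;) {
        pthread_mutex_lock(&d->lock);
        if (d->readers_done == d->num_threads) {
            pthread_mutex_unlock(&d->lock);
            break;                      /* all readers have run */
        }
        d->value = rand();              /* new data for the readers */
        pthread_mutex_unlock(&d->lock);
        nanosleep(&nap, NULL);          /* let another thread take the lock */
    }
    return NULL;
}

/* Step 3: print the current value once, record completion, and exit. */
static void *reader(void *arg)
{
    struct shared_data *d = arg;

    pthread_mutex_lock(&d->lock);
    printf("reader saw %d\n", d->value);
    d->readers_done++;
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Steps 2 to 4 for n readers. Returns the number of readers that
 * completed, or -1 if nothing could be started. */
int run_demo(int n)
{
    struct shared_data d = { .num_threads = n };
    pthread_t gen;
    pthread_t *readers;
    int created = 0;
    int done;

    if (n < 1)
        return -1;
    readers = malloc((size_t)n * sizeof *readers);
    if (readers == NULL)
        return -1;
    pthread_mutex_init(&d.lock, NULL);

    if (pthread_create(&gen, NULL, generator, &d) != 0) {
        pthread_mutex_destroy(&d.lock);
        free(readers);
        return -1;
    }
    while (created < n &&
           pthread_create(&readers[created], NULL, reader, &d) == 0)
        created++;
    if (created < n) {
        /* Some readers could not be created: lower the target so the
         * generator still terminates. */
        pthread_mutex_lock(&d.lock);
        d.num_threads = created;
        pthread_mutex_unlock(&d.lock);
    }

    for (int i = 0; i < created; i++)
        pthread_join(readers[i], NULL);
    pthread_join(gen, NULL);

    done = d.readers_done;
    pthread_mutex_destroy(&d.lock);
    free(readers);
    return done;
}
```

A caller such as main() would convert argv[1] to an integer, check it against the allowed range, and then call run_demo() with the result. On toolchains where the pthread functions live in a separate library, the sketch needs to be built with the compiler's -pthread option.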

The SDK version of pthread_demo.c works according to the description above with a MAX_THREAD value of 100. However, for the purpose of this example debug session it is instructive to use a faulty version of the same program. Replace lines 75-80 in pthread_demo.c with the code snippet shown in Listing 1 below.

if ((data.num_threads < 1) || (data.num_threads < MAX_THREAD)) {
        fprintf(stderr,
                "The number of thread should between 1 and %d\n",
                MAX_THREAD);
        exit(EXIT_FAILURE);
}

Listing 1. Replacement for lines 75-80 of pthread_demo.c, containing a deliberate bug.

Useful GDB Commands

The following is a brief description of some essential GDB commands. Each description is followed by a link to the official GDB documentation page with more specific information about what the command does and how to use it. Note that the official GDB documentation targets the latest GDB release, which at the time of writing is 7.4, whereas the version of GDB that EMAC distributes with the OE products is 6.8. The linked documentation may therefore differ slightly from the behavior of the distributed debugger. The biggest difference between the two versions is in the support for debugging programs with multiple threads, and the documentation reflects this. For this reason, EMAC has set up FTP access to the GDB 6.8 documentation on its web server. Referring to the GDB 6.8 documentation is highly recommended whenever the program does not seem to support commands or options described in the current official documentation.

Command Description
start/run These commands are used to start the debugged program with the only difference being that start automatically pauses execution at the beginning of the program's main function whereas run must be told explicitly where to pause using the breakpoint command listed below.

See also [Debugging with GDB, Section 4.2: Starting your Program]

kill Used to kill the currently-running instance of target_program.

See also [Debugging with GDB, Section 4.9: Killing the Child Process]

print Used to print the value of an expression.

See also [Debugging with GDB, Section 10: Examining Data]

list List contents of function or specified line.

See also [Debugging with GDB, Section 9: Examining Source Files]

layout This is a TUI (Text User Interface) command that enables the programmer to view multiple debug views at once including source code, assembly, and registers.

See also [Debugging with GDB, Section 25.4: TUI Commands]

disassemble This command allows the programmer to see assembler instructions.

See also [Debugging with GDB, Section 9.6: Source and Machine Code]

break This command specifies a function name, line number, or instruction at which GDB is to pause execution.

See also [Debugging with GDB, Section 5.1: Breakpoints]

next/nexti, step/stepi Allow the programmer to step through a program without specifying breakpoints. The next/nexti commands step over function calls, stopping on the next line of the same stack frame; the step/stepi commands step into function calls, stopping on the first line in the next stack frame. The i suffix in stepi/nexti indicates instruction-by-instruction stepping at the assembly language level rather than line-by-line stepping through the source.

See also [Debugging with GDB, Section 5.2: Continuing and Stepping]

continue Used to continue program execution from the address where it was last stopped.

See the Debugging with GDB link for next/step for more information about the continue command.

bt Short for "backtrace," which displays to the programmer a brief summary of execution up to the current point in the program. This is useful because it shows a nested list of stack frames starting with the current one.

See also [Debugging with GDB, Section 8.2: Backtrace]

quit This will quit the debugging session, and return you to the shell. The Control-D key combination is another way to accomplish this.

Session Walk-through

This debug session walk-through assumes that the program has been compiled using the modified source code above and that both the target machine and the development machine have been set up according to the above Setup section. The walk-through is divided into multiple “lessons” with the intent of first introducing the use of the commands described above and then actually running GDB to debug a known programming problem. Each lesson may be run independently of the others, but it is recommended that each be run in order starting from Lesson 1 for the first time through.

Lesson 1: Navigation and Code Display

This lesson assumes that gdbserver has been run as in the [Target Machine Setup] section above with an ARG value of 3. Other values are fine so long as they fall within the range of 1 to 100. The number '3' was arbitrarily chosen to avoid having to use a symbolic variable in the explanations below.

  1. Type b main to set a breakpoint at the main function in the source code.
  2. Type continue. This will cause the program to continue from the breakpoint set by GDB at startup. The program was passed an argument of 3, indicating that three threads should be created.
  3. Type b 73 to set a breakpoint at line 73 in the source code, which should be the line containing data.num_threads = atoi(argv[1]);
  4. Type continue. The program will continue execution up until line 73 in the source code. At this point, type layout split to view a split screen containing both the source code and the assembly-level machine instructions. Both screens show the program's current location in execution. The assembly-level display shows what the target's processor is actually executing at that point in the source code as shown in the source-level display. To view either of these without the other type layout asm for just assembly-level and layout src for just source-level.
  5. Type nexti. This will cause the program to execute the next instruction in the current stack frame which is a mov instruction beginning to prepare the current stack for a call to the library function atoi(). The details of this process are beyond the scope of this tutorial; essentially, the program needs to store information about the current execution location in the stack for when the atoi() function finishes. Type ni (alias for nexti) three more times. You should end up on a bl instruction in the assembly view as shown in Listing 2 below. The source layout should still show the program on line 73.
B+ |0x887c <main+112>       ldr    r3, [r11, #-84]                   |
   |0x8880 <main+116>       add    r3, r3, #4      ; 0x4             |
   |0x8884 <main+120>       ldr    r3, [r3]                          |
   |0x8888 <main+124>       mov    r0, r3                            |
  >|0x888c <main+128>       bl     0x86e0 <atoi>                     |

Listing 2. GDB Assembly Layout - Note that the assembly may look different depending on the target architecture.

  6. Type stepi. This will cause the program to move into the next stack frame and GDB to show the assembly-level instructions of the atoi() call. Since the library containing atoi() was likely not compiled with debug symbols, the source-level layout will show the message [ No Source Available ].
  7. Type bt. This will cause GDB to display a human-readable version of the current stack. Each stack “frame” is represented by the name of the function call it represents along with that function's location in memory. Type bt full to get a list of the variables local to each stack frame.
  8. Type finish. This will cause the current stack frame to return and execution to pause on the next instruction of the previous stack frame.
  9. Type kill. This will cause the current process to be killed by gdbserver at the target machine. gdbserver will also terminate at this point. In order to start a new remote debug session, start gdbserver as described in the Target Machine Setup section and re-run step 3 of the [Development Machine Setup] section.

Lesson 2: Finding the Bug

Though this sample is contrived, it is still useful for demonstrating how to find a design mistake in a program that compiles without errors or warnings. These types of mistakes typically involve array boundary miscalculations, logic and comparison operator mistakes, or other simple slips. For the sake of demonstration, assume that the actual mistake is unknown. This lesson assumes that gdbserver has just been started as in the [Target Machine] Setup above with an ARG value of 5.

  1. Before starting the program in the debugger again, run it by itself on the target machine to see what the actual program output is:
root@emac-oe:~# /tmp/pthread_demo 5
The number of thread should between 1 and 100

The program was given an input of '5', yet the output message indicates that this value is out of range, which is clearly not true.

  2. Start the debugger again and connect to the target machine as described in the Setup section.
  3. Type b main to set a breakpoint at the main function in the source code.
  4. Type continue. This will cause the program to continue from the breakpoint set by GDB at startup.
  5. Type n. This will cause the program to step over the next line of source code. The reason for using n rather than s or one of the instruction-stepping commands is that the erroneous output indicates that the coding mistake is in the programmer's source code rather than in the C library functions atoi() or fprintf(). Stepping over these functions saves the time required to step through every detail of what the library functions are doing. Later passes through the code can step into functions called from within that stack frame if the first pass proves unsuccessful.
  6. Continue to type n until one of the program's exit() calls is reached, but do not actually step into that exit() call. Judging by the program's output above, this should bring you to the conditional block that checks the value of data.num_threads, which stores the output of atoi(), as shown in Listing 3. Note that once execution reaches line 79 of the source code, GDB will display the output of the fprintf() function from line 76. This may cause display problems within the text-based UI library that GDB uses; if so, use the refresh command to redraw the screen.
B+ |75              if ((data.num_threads < 1) || (data.num_threads < MAX_THREAD)) {       |
   |76                      fprintf(stderr,                                                |
   |77                              "The number of thread should between 1 and %d\n",      |
   |78                              MAX_THREAD);                                           |
  >|79                      exit(EXIT_FAILURE);                                            |
   |80              }                                                                      |
  7. Type p/d data->num_threads. p is an alias for print, /d tells GDB to format the value of the expression as a signed decimal integer, and data->num_threads is the element num_threads within struct thread_data. This should provide the following output:
(gdb) p/d data->num_threads
$6 = 5

Note that the integer part of $6 will increment with each call to the GDB command print. The above output confirms that the argument '5' was successfully passed to the program and read into a variable to be tested, which indicates that one of the logical tests in the current conditional block contains a mistake. This merits a closer look at line 75:

B+ |75              if ((data.num_threads < 1) || (data.num_threads < MAX_THREAD)) {       |

Line 75 consists of a conditional test which is the logical OR of two arithmetic tests involving the values of data.num_threads, '1', and MAX_THREAD. The first tests whether the input integer is less than 1: (data.num_threads < 1). The second tests whether the input integer is less than the symbolic constant MAX_THREAD: (data.num_threads < MAX_THREAD). Since data.num_threads is 5, the first test is false, so the block can only have been entered because the second test is true. Judging by the name of the constant, the second test is meant to reject values above the maximum, so its comparison operator is the culprit: it should be '>' rather than '<'.
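
The effect of this one-character change can be checked in isolation. The following is a minimal C sketch written for this guide, not code from the SDK: the function names rejects_faulty and rejects_fixed are hypothetical, and MAX_THREAD is assumed to be 100, the value this guide gives for the SDK version of the program.

```c
#include <stdbool.h>

#define MAX_THREAD 100  /* assumed: the value this guide states for the SDK example */

/* Condition from the faulty line 75: true means "reject the input".
 * Every value below MAX_THREAD is rejected, including valid ones. */
bool rejects_faulty(int num_threads)
{
    return (num_threads < 1) || (num_threads < MAX_THREAD);
}

/* Corrected condition: reject only values outside 1..MAX_THREAD. */
bool rejects_fixed(int num_threads)
{
    return (num_threads < 1) || (num_threads > MAX_THREAD);
}
```

With an input of 5, rejects_faulty() returns true, which matches the behavior seen in the debugger, while rejects_fixed() returns false. The faulty condition also has a second, less visible consequence: it lets values above MAX_THREAD through, since neither of its tests is true for them.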

  8. Type kill.

This was a simple problem to solve, but the method used above applies in any situation where source code compiles and runs without errors yet produces inconsistent or unexpected output.
