Tag Archives: panic

Test, 1, 2, 3, test, test

Everybody know these words. You say it when you have to check a microphone which were plugged into some sound system. It’s just test if the microphone is working well and the spoken words are reflected by the amplifier. It’s a simple test to check the functionality of the microphone or the whole sound system. Why should I write about sound systems? No, I don’t try to learn playing a guitar or any other instrument, I like to share some thoughts about software testing. After writing a post about Valgrind, I had the feeling that I should try to show some ways of testing software. In the following post I will analyze what can be tested, what is useful and which tools are available. I don’t aim to make a full overview of software testing itself, I just like to highlight some tools (or thoughts) which are useful and every developer should know of.

Why should I test my software

This is easy to answer. Humans write software and humans tend to make mistakes. Even if something is working perfect, a bug fix or new feature can break it. Testing software will not prevent you from bugs, but it can boost your user satisfaction a lot, by removing errors which are common and avoidable.

What can be tested

First of all, you should think about what should be tested at all. The following list tries to categorizes different tests from a technical perspective:

  1. Static tests
  2. Compile time tests
  3. Runtime test

Also we have to differ between automatic tests and manual tests done by an operator. In the following I will prioritize the automatic tests, simply because once they are setup, they are easier to handle. Manuel tests means, setting up a test plan, making someone responsible for checking the test plan and forcing the test to be done before a release. On the other side automatic tests could be done by a machine at day and night with every software level your application is currently in.

Static tests

Static tests analyze the code base and try to predict error paths. There are several applications on the market, but most of them are not free. First you need a product which speaks your language. If you develop in C, C++, Java, C# or whatever makes a huge difference here. There are many static analyzer out there which can analyze C, but not many can successfully parse any of the widely used higher programming languages. One of them, which can analyze C++, is Fortify. Fortify uses rules to find different errors a developer can do. There are of course default rules, but a developer can define rules himself. This product is specialized in detecting errors which may end in security holes. One such error in terms of security considerations is file input handling. When one is reading a string directly out of a file, some assumptions are done. Strings in the C/C++ world are zero terminated. All string handling/manipulation function, which are available e.g. from the glibc, are working over the single characters as long as they didn’t find the termination byte. Now, if a developer is reading a string out of a file and doesn’t check if the terminating zero is there, he open a security hole. If an attacker is using this knowledge he is able to provoke a buffer overflow. Fortify can detect such blindly faith to input data. Of course it can detect much more. On the down side, as most software projects are really complex, static code analyzer tend to produce many false/positive. Working through this false/positive and defining filters for the own software project needs much time. On the other side these tools help in finding error paths, which even runtime tests doesn’t show, cause they are so unusual. But exactly these errors are used by attackers to abuse your software for bad things.

Compile time tests

Compile tests make sure your application compiles on different architectures, bit depth, or with other resources like external dependencies (e.g. depending third-party libraries). They are important when you develop for and/or if your team is using different platforms. They make sure a software is at least compiling on every important system. This doesn’t mean the current status works well, it just makes sure every developer involved in the progress of the product is able to build an actual version, even if he is working on a different part of the software itself. Such a situation happens often in open source products, like Mozilla, which invented tinderbox for making these tests. It allows you to set up different build environments for your software and watch the result of every check-in you made in terms of compatibility to this environment. The following shows a screenshot of the Firefox-Ports tinderbox with one broken build box. As everybody know, red is bad:

Red means a burning tree. It shows you that your last check-in doesn’t compile on a specific system. The cool thing is that you can look at the log and spot the error with ease. You know why it doesn’t compile and can fix it. This will prevent you from being blamed by other team members because they can’t work with the current trunk.

Runtime tests

Runtime tests are of course the biggest part of testing. They try to make sure the application does the job it was written for. This also means these tests are the hardest one. It may help to try to separate them as well. I use the following separation:

  1. Unit tests
  2. Dynamic analyzer
  3. GUI tests

Unit tests

Unit tests check that the specialized functionality a unit provide is working correctly. A well-known example is string handling. String handling is necessary in almost all software projects out there. May it printing the help section of a command line tool or the information presentation in an GUI application. A unit test tries to test anything a user of this unit may do with it. This could be adding a string to another, replacing parts of a string, converting a string to other formats (like to another encoding or to an integer) or much more complex operations like searching within a string with a regular expression. All these operations could be relatively easy tested with unit tests. There is no general usable software out there which could be used for unit testing, but most software frameworks offer some help to write unit tests. Qt has a unit test framework and VirtualBox has a integrated system, as well. Here is a screenshot from a just announced graphical tool by Nokia, which makes spotting errors in Qt unit tests much more easier. The second screenshot shows the output of a VirtualBox unit test.

Please also note that unit testing not just means testing functionality. It could also mean testing performance regression. If some function needed an amount of time in an existing version, it should be at least equal or faster in means of execution time in a new version. Unit tests are evolutionary like the software itself. It is illusionary to believe a unit test is finished sometime. The test will/should grow as long as users will report errors. Often a unit test is called by the bugtracker id, which showed some specific problem the first time.

Dynamic analyzer

This field of testing is a little bit different from the other one, cause the developer needs to interact here. Although it would be possible to automate some of these tests, they are more designed to be done by the developer from time to time (usually after a feature is completed) to check for some basic errors. The most common dynamic analyzer in the open source world is the Valgrind instrumentation framework. It has a tool for checking memory management errors, like I showed in this post. As already written there, memory leak errors can happen to everyone, so having a tool for finding and eliminating them, is very useful. On the other side there is a tool called Callgrind, which profiles a given software in detail. After profiling some software, a developer is able to see execution times relatively to each other. This makes it possible to find the places an algorithm spend the most time. With this information a developer has the chance to find bottlenecks, which maybe could be eliminated. Beside that, is the knowledge of the call graph sometimes from great information, especially if you develop graphical applications, like in Qt. Having events, slots or asynchrony window systems, like in X11, make it sometimes hard to predict how the code will flow. It may also be helpful in deciding if some subroutine is a candidate for parallel execution. The following screenshot shows the result of analyzing some parts of VirtualBox. To visualize the output of Callgrind (and to interactively work with this data), KCachegrind is the number one tool. The second screenshot shows some similar profiling with Instruments from Xcode.

GUI tests

GUI’s have the advantage to do several tasks in an independent order and users tend to do tasks in a way a developer don’t expect. Developer tends to do tasks in the way they have programmed it. On the other side GUI testing tools do only things they are programmed for. To be honest, we don’t do any automatic GUI testing in VirtualBox. It’s such a wide field and doing it right is hard. Anyway, there are products out there which can be helpful. Qt itself have possibilities for GUI testing build in. Another product is from froglogic. Squish is specialized in GUI testing, especially for Qt applications. But of course there are other tools as well. Just have a look at your preferred build environment.

Conclusion

Testing produces work. It is also something developers don’t like. On the other side it helps to prevent further work by bypassing errors which are preventable. Everybody should ask himself how much time he will sacrifice in setting up test machines or unit tests to make a product much more stable. Errors in freshly released products which don’t happen in an older release are always bad marketing. If there is a chance to prevent them by just adding some lines of code, everyone should use this chance. This post doesn’t show all available tools for testing. For example, the big players like Apple and Microsoft have also testing and analyzing software build into there IDE’s. I just want to give a brief overview what is possible, so that everybody could check the own environment for possibilities of doing testing.

As a last word which touches my personal work environment: Please be assured that we are trying hard to make VirtualBox as stable and save as much as possible ;).

Getting the backtrace from a kernel panic

You may know the following situation. You arrive in the morning in the office, do what you always do and check out the latest changes of the software you are working on. After a little bit of compile time and the first coffee you start the just build application. Bumm, kernel panic. After rebooting and locking through the changes you may have an idea what the reason for this could be. A colleague of you is working on a fancy new feature which needed changes to a kernel module. As you almost know nothing about this code you seek for help and, as it of course not happen on his computer, he is asking for a backtrace of this panic. You have two problems now. First you need to see the panic yourself and second it would be nice to get a copy of the backtrace for sharing this info within a bugtracker. In the following post I will show how both aims could be easily archived.

Let the kernel manage the graphical modes

As most people are working under X11 they don’t see the output of an kernel panic. When a kernel panic happens the kernel prints the reason for the panic and a kernel backtrace to the console window and stops immediately its own execution. It is not written into a log file or somewhere else. In consequence you don’t have the ability to look into the panic text, cause the graphical mode is still on. Historically the mode settings are done by the graphic driver of the X11 system. So the kernel has no idea that or which graphic mode is currently in use. Fortunately the kernel hackers invented a new infrastructure which let the kernel do the mode switch. This subsystem is called Kernel-Mode-Settings (KMS). As the kernel do the mode settings, he can switch back to the console on a panic, regardless which graphical mode is currently configured. Beside this, KMS has other improvements like Fast User Switching or a flicker free switch between text and graphic mode. On the other side is this highly hardware dependent and even if it was introduced with version 2.6.28, not all today available hardware can make use of it. If you are an owner of an Intel graphic card you are in good shape. Radeon and NVidia cards have limited support through the in kernel drivers radeonhd and nouveau. For an Intel i915 card you need to enable the following kernel options:

CONFIG_DRM_I915=y
Location:
-> Device Drivers
-> Graphics support
-> Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) (DRM [=y])
-> Intel 830M, 845G, 852GM, 855GM, 865G ( [=y])

CONFIG_DRM_I915_KMS=y
Location:
-> Device Drivers
-> Graphics support
-> Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) (DRM [=y])
-> Intel 830M, 845G, 852GM, 855GM, 865G ( [=y])
-> i915 driver (DRM_I915 [=y])

The kernel line in your favorite boot loader needs the following additional parameter:

i915.modeset=1

X11 should have this minimal configuration for the device section:

Section "Device"
 Identifier    "i915"
 Driver        "intel"
 Option        "DRI"   "true"
EndSection

Please note that you need of course some recent kernel, X11 version and Intel X11 driver to make this work. After a compile, install and boot of the new kernel, KMS should be in use. You will notice it, cause the boot messages will be printed in a much higher graphical resolution, than the usual text mode provide. The next time a kernel panic occurs, the kernel will switch back to the console before the panic is printed. This allows you to see the info printed and maybe you get a useful hint for the reason of the panic.

Post the panic

If you can’t use KMS or don’t want transcribe the panic text by hand into the bugtracker, it would be nice if the text could be made available on another computer. Kernel hackers usual use the serial port for that. Unfortunately most modern computers doesn’t have such a serial port anymore. Also you need two hosts with a serial port and the setup is complex (you have to know about baud-rates, parity and stuff like this). But there is a simpler solution: netconsole. Netconsole is a kernel module, which sends kernel messages anywhere to the net using UDP. The setup is really simple. In the kernel configuration you need the following setting:

CONFIG_NETCONSOLE=m
Location:
-> Device Drivers
-> Network device support (NETDEVICES [=y])

I prefer to compile it as module, which allows me to turn it on only when I need it. Load it with the following command:

modprobe netconsole netconsole=@/,@192.168.220.10/

The ip has to be replaced by the one of your target computer. You can of course tune it much more, like setting source and target ports or even let netconsole send the text to more than one host. On your client you need a network tool which can read from a socket and print the read text to stdout. Netcat or nc are two tools which are able to do just that. The call for nc looks like the following:

nc -l -u 6666

Now if a kernel panic will happen you will see an output like this:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] rb_erase+0x15c/0x320
PGD 6942f067 PUD a1e4067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/md1/dev
CPU 3
Modules linked in: vboxnetadp vboxnetflt vboxdrv netconsole ...

Pid: 18887, comm: VirtualBox Tainted: G        W   2.6.36-gentoo #4 DG33TL/
RIP: 0010:[]  [] rb_erase+0x15c/0x320
RSP: 0018:ffff8800b430db58  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880069557a68 RCX: 0000000000000001
RDX: ffff880069557a68 RSI: ffff880001d8ed58 RDI: 0000000000000000
RBP: ffff8800b430db68 R08: 0000000000000001 R09: 000000008edcb5d6
R10: 0000000000000000 R11: 0000000000000202 R12: ffff880001d8ed58
R13: 0000000000000000 R14: 000000000000ed00 R15: 0000000000000002
FS:  00007fffde457710(0000) GS:ffff880001d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000064f9000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process VirtualBox (pid: 18887, threadinfo ffff8800b430c000, task ffff880091e227f0)
Stack:
 ffff88000a03ba18 ffff880001d8ed48 ffff8800b430dba8 ffffffff8105bf06
<0> ffff8800b430dba8 ffffffff8105c97c ffff8800b430dbc8 ffff88000a03ba18
<0> 00004ff8a86ba455 ffff880001d8ed48 ffff8800b430dc48 ffffffff8105ce77
Call Trace:
 [] __remove_hrtimer+0x36/0xb0
 [] ? lock_hrtimer_base+0x2c/0x60
 [] __hrtimer_start_range_ns+0x2b7/0x3c0
 [] ? rtR0SemEventMultiLnxWait+0x250/0x3d0 [vboxdrv]
 [] ? RTLogLoggerExV+0x12f/0x180 [vboxdrv]
 [] hrtimer_start+0x13/0x20
 [] rtTimerLnxStartSubTimer+0x60/0x120 [vboxdrv]
 [] rtTimerLnxStartOnSpecificCpu+0x21/0x30 [vboxdrv]
 [] rtmpLinuxWrapper+0x23/0x30 [vboxdrv]
 [] RTMpOnSpecific+0x99/0xa0 [vboxdrv]
 [] ? rtTimerLnxStartOnSpecificCpu+0x0/0x30 [vboxdrv]
 [] RTTimerStart+0x2a6/0x2e0 [vboxdrv]
 [] ? g_abExecMemory+0x33665/0x180000 [vboxdrv]
 [] g_abExecMemory+0xc678/0x180000 [vboxdrv]
 [] g_abExecMemory+0x328d7/0x180000 [vboxdrv]
 [] supdrvIOCtlFast+0x6a/0x70 [vboxdrv]
 [] VBoxDrvLinuxIOCtl+0x47/0x1e0 [vboxdrv]
 [] ? pick_next_task_fair+0xde/0x150
 [] do_vfs_ioctl+0xa1/0x590
 [] ? sys_futex+0x76/0x170
 [] sys_ioctl+0x4a/0x80
 [] system_call_fastpath+0x16/0x1b
Code: 07 a8 01 75 9d eb 81 0f 1f 84 00 00 00 00 00 48 3b 78 10 0f 84 ...
RIP  [] rb_erase+0x15c/0x320
 RSP
CR2: 0000000000000000
---[ end trace 4eaa2a86a8e2da24 ]---

Normally only kernel panics are sent to the console. You can increase the verbosity level by executing dmesg -n 8 as root.

Conclusion

To continue with the story from the beginning: With the shown methods you can hope your colleague get enough information to find the reason for the kernel panic. To be more helpful, the next step would be to try to debug the problem yourself. Even if the KGDB was merged into the kernel in version 2.6.35, it is not really usable for me. The reason is that it seems kernel hackers usually have really old hardware which either has a serial port, a PS/2 keyboard or both. Otherwise I can’t find a reason why USB keyboards don’t work. I asked on the mailing list of KGDB about the status of USB keyboard support and I can only hope support will be integrated soon.