April 16, 2008

Tools and the Debug Cycle

Filed under: Engineering Tools — admin @ 1:44 pm

The efficiency of a development or debug tool needs to be judged by how it affects the debug cycle.  Put another way, the real goal of a compiler is to make the user more productive in building a new product, not just to run more compiles.

To analyze this real efficiency you must look at the overall debug cycle.  The cycle has three components: 1) the debug trial, 2) analyzing the results, and 3) modifying the build for the next trial.


The debug trial 

This is the time that measurements are being taken.  The trial needs to be run until a problem or issue that needs to be fixed occurs.  Sometimes productive execution can continue beyond the finding of the first issue.  Other times the error has an effect on the future execution and renders any continued execution useless.  As a project progresses the length of trials will on average increase as the equipment under development can run longer and longer without a failure.

Analyzing the results

This is the time to analyze the measurements that were taken during the trial.  This portion of the debug cycle is dominated by the think time as the engineer attempts to decipher the results to determine what caused the behavior.  The format and level of the information is critical to making it easy to pinpoint the issues.  In networking development, protocol analyzers translate the millions of bits that have flown by into symbolically decoded packets to greatly ease the analysis. 

Modifying the build for the next trial 

This involves running the build tool, loading the new build into the target design, and setting up the rest of the test configuration for the trial.  The build tool will vary depending upon the kind of engineer and the portion of the design that is being debugged.  Build tools include assemblers, compilers, hardware synthesis, and FPGA synthesis, among others.  Loading the build into the target design may just involve downloading to the target over a cable or network connection.  It may also involve burning the image into an EPROM or other memory device, or making multiple copies for multiple devices.  Lastly there is the setting up of the other configuration variables of the target system.  This can involve physical positioning, resetting of mechanical components, and re-initializing equipment to a known state.


The time to accomplish the build can vary widely from a few minutes to many hours depending on the complexity.  Typically the times are between 15 minutes and one hour.  Also this time tends to increase as a project proceeds, because the size of the design increases, and the test configuration increases in complexity.
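The three components can be put into a rough back-of-the-envelope model.  The sketch below is only an illustration: the function names, the eight-hour working day, and the sample durations are assumptions chosen for the example, not measurements from any project.

```python
# Rough model of the debug cycle: trial + analysis + build.
# All names and numbers here are hypothetical, for illustration only.

def trials_per_day(trial_min, analysis_min, build_min, workday_min=8 * 60):
    """Return how many full debug cycles fit into one working day."""
    cycle_min = trial_min + analysis_min + build_min
    if cycle_min <= 0:
        raise ValueError("cycle time must be positive")
    return workday_min / cycle_min


def build_share(trial_min, analysis_min, build_min):
    """Return the fraction of one cycle that is spent on the build step."""
    cycle_min = trial_min + analysis_min + build_min
    if cycle_min <= 0:
        raise ValueError("cycle time must be positive")
    return build_min / cycle_min


if __name__ == "__main__":
    # Assumed example: 30 min trial, 60 min analysis, 30 min build.
    print(trials_per_day(30, 60, 30))
    print(build_share(30, 60, 30))
```

A build share close to 1 suggests the engineer is tool bound.  A small share suggests that a faster build tool will not add many trials per day.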

Changing Bottlenecks 

Early in my career as a programmer there was controversy caused by how expensive computer time was.  The computer had limited access and compiles were typically batched together for maximum efficiency.  Some thought that programmers did not need more compute power; they just needed to do more analysis before they submitted another compile.  Today, with the low cost of compute power and the expense of engineering time, it seems silly to focus on computer efficiency instead of engineering productivity.


I once worked on a project to improve the performance of a software linker.  Each link was taking over one hour of CPU time.  With multiple engineers time sharing the minicomputer, this was limiting the engineers to one or two debug trials per day.  With the project we were able to reduce the CPU time required for linking by a factor of 30.  Suddenly the CPU was no longer the bottleneck.  The engineers could then make between 6 and 8 debug trials per day.  The time for the engineers to analyze the results and access to shared resources now dominated.


The early hardware acceleration products in the EDA industry could run a design simulation a thousand times faster than the same simulation run on a workstation.  However, the generation of the netlist to load into the accelerator became the bottleneck to productivity.  It could take hours to generate the netlist for a simulation run that would take only minutes.  This severely limited the usefulness of the accelerator.  Understandably, as acceleration products evolved, the generation of the netlist was also accelerated.
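The accelerator story is easy to check with a little arithmetic.  The sketch below assumes a simple model in which the accelerated turnaround is the netlist generation time plus the workstation simulation time divided by the raw speedup.  The function names and the sample numbers are hypothetical and chosen only to show the shape of the effect.

```python
# Effective speedup when a fixed preparation step (netlist generation)
# precedes an accelerated run. Model and numbers are illustrative assumptions.

def accelerated_turnaround(workstation_sim_min, raw_speedup, netlist_min):
    """Minutes from starting netlist generation to finishing the accelerated run."""
    if raw_speedup <= 0:
        raise ValueError("raw_speedup must be positive")
    return netlist_min + workstation_sim_min / raw_speedup


def effective_speedup(workstation_sim_min, raw_speedup, netlist_min):
    """Speedup the engineer actually sees, including netlist generation."""
    turnaround = accelerated_turnaround(workstation_sim_min, raw_speedup, netlist_min)
    return workstation_sim_min / turnaround


if __name__ == "__main__":
    # Assumed example: a 3000-minute workstation simulation, a 1000x
    # accelerator, and 117 minutes of netlist generation.
    print(accelerated_turnaround(3000, 1000, 117))
    print(effective_speedup(3000, 1000, 117))
```

With these assumed numbers the run itself shrinks to 3 minutes, but the engineer waits 120 minutes in total, so the thousand-fold raw speedup is experienced as a speedup of only 25.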

Shared Resources 

Once the tool bottleneck has been reduced or eliminated, there are often shared resources that can end up becoming a bottleneck.  These shared resources can include the lab equipment configuration, a high-performance piece of test equipment, access to a Faraday cage, and the target design hardware.  The allocation of the shared resources may need to be scheduled, and many engineers will work off hours to increase access.

Bust Trials 

Between 10% and 25% of trials are busts.  By this I mean that no valuable feedback to the design is achieved.  The causes for this can include a simple error in the design logic, improperly configured equipment, or outside interference.  Often the reason for the failure is discovered before any data is transmitted.  The equipment may not even be able to initialize.  This causes a quick desperate survey of all the potential culprits.  Sometimes the issue causes a complete bust and a re-build of the design is required.  Other times a partial bust is caused and the trial can be re-run from the beginning without a re-build being required.


An additional problem is that these bust trials often still take up the time of the shared resource.  Sometimes there is another engineer that has been waiting in the wings.  However, if there is not another engineer ready, the resource sits idle for some period of time.
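To get a feel for what bust trials cost, here is a minimal sketch that applies the 10% to 25% bust range mentioned above to a day of trials.  The function names and the figure of 8 trials per day are assumptions for the example, and the model treats every bust as a fully lost trial, which overstates the cost of partial busts.

```python
# Cost of bust trials, assuming each bust yields no useful feedback.
# Function names and the 8-trials-per-day figure are illustrative assumptions.

def useful_trials(trials_per_day, bust_rate):
    """Expected number of trials per day that return valuable feedback."""
    if not 0 <= bust_rate < 1:
        raise ValueError("bust_rate must be in [0, 1)")
    return trials_per_day * (1 - bust_rate)


def attempts_per_useful_trial(bust_rate):
    """Average number of attempts needed to get one useful trial."""
    if not 0 <= bust_rate < 1:
        raise ValueError("bust_rate must be in [0, 1)")
    return 1 / (1 - bust_rate)


if __name__ == "__main__":
    for rate in (0.10, 0.25):
        print(rate, useful_trials(8, rate), attempts_per_useful_trial(rate))
```

Each bust also holds the shared resource for part of a slot, so the real cost to the team can be higher than this per-engineer estimate suggests.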


Given the above, here are a few recommendations:


One, monitor the number of debug trials per day that engineers are able to make, and make sure that they are not being tool bound. 


Two, maximize the information gathered per trial after it has gotten past the bust issues.  This can be accomplished with debug tools that support interactive queries during the debug trial (see post “Basic Debug Tools and beyond”, http://www.chipdesignmag.com/denker/?p=28).


Three, look for features that help avoid additional trials.  In-place editing can avoid having to make another build, and supports exploration that goes beyond what was anticipated before the start of the trial.  Also it is important that the information is presented at the right level (see post “Being on the Level”, http://www.chipdesignmag.com/denker/?p=29).  If the engineer always has to stop to analyze, they will be bumped off the shared resource to make way for the next engineer.


Rick Denker

Packet Plus