Zen and the Art of Electronics: May 2013

This post is specific to Verilog - I am certain everybody has seen something like the following code snippet:

`define DATA_WIDTH 16

module foobar (data);

input [`DATA_WIDTH-1:0] data;

...

The question that will be explored in this missive is whether or not this coding style is a Good Thing.

At the outset, allow me to tell you my position on the matter - in my opinion, using pre-processor directives (`define) for signal widths is an extremely bad idea and should never be done, with no exceptions.

I am certain this will appear counter-intuitive to pretty much everybody who writes modules in Verilog. In fact, I get plenty of rather angry reactions when I put forward this view. So let's explore the reasons why this coding style is a Bad Thing, shall we?

Reason 1: Code readability suffers. Usually people put all the `defines in one file, which is then `included in other RTL files. When you do a code review, unless you want to switch back and forth between the module file and the included file or you have exemplary memory, you'll find it annoying to read code snippets that assign some slice to some other slice such as:

fooreg[`FOO:`BAR] <= barreg[`XYZ:`PQR];

What could have been as simple as

fooreg[7:0] <= barreg[15:8];

is now something that requires quite some back-and-forth to review. You, the designer, may have no problem, but the reviewer definitely will. Imagine trying to review thousands of such lines - the fatigue alone could cause errors to slip through.

Reason 2: The wrong directive is used when multiple directives have the same value. If used incorrectly, inadvertent mistakes will arise when one directive is given a new value but the other isn't.

`define ADATA_WIDTH 16
`define BDATA_WIDTH 16

input [`BDATA_WIDTH-1:0] adata; // `ADATA_WIDTH should have been
used instead

will work fine until BDATA_WIDTH is re-defined for a new version of the chip, but ADATA_WIDTH stays the same. This is when things will fail subtly in simulations, and it will be monstrously hard to track down the error to an incorrectly used directive. This is simply because `define is a pre-processor directive - which means the simulator doesn't know anything about it - the Verilog pre-processor has already performed a string replacement.

Reason 3: The directive is defined but not used everywhere. Suppose DATA_WIDTH is defined and does exist, but somehow someone missed using it:

`define DATA_WIDTH 16

data[15:0] <= some_value[31:16];

Should DATA_WIDTH be redefined in the future, it will have absolutely no effect on the code - but everyone will expect it to. This will lead to subtle, difficult to track down bugs.

Reason 4: IP integration may have issues with multiply defined names. If you are integrating a third party IP block into your ASIC, you may well find that some directive in the IP conflicts with yours. Sadly, everybody likes using names like "DATA_WIDTH".

The nature of Verilog pre-processor directives is that the last defined value is what is held. Thus if you had:

`define DATA_WIDTH 16
`define DATA_WIDTH 32

the final value of DATA_WIDTH is going to be 32.

Suppose you had the first directive in a file called "a.v" and the second in "b.v"; then the file reading order will determine what the value of DATA_WIDTH is for that simulation run.

Thus, "verilog a.v b.v" will give different results from "verilog b.v a.v". Naturally, this situation is only compounded when an ASIC integrates multiple IPs from multiple vendors with multiple files containing directives.

You may argue that names could always be made distinct; but if you are integrating code from independent IP vendors this is not always going to be possible a priori. Thus you will have an additional time consuming step in your process, without which there is the possibility of chaos.

Further, making names unique will require you to add prefixes that identify which IP or module the name came from, making names even longer and less readable - such as `FOOCHIP_XIP_PCIE_WR_DATA_WIDTH. See Reason 1 - either the names are short but cryptic, or descriptive but long. Either way, code readability will suffer.

Reason 5: Multi-chip simulations may have issues with multiply defined names. I know, people always tell me - how often does this situation happen? I say you should be doing multi-chip sims as often as possible.

I can't recall how many times I've been told in the lab: "But this problem wasn't there in the previous version of the chip!" and I've setup a multi-chip sim with the old and the new versions of the chip and figured out the issue.

You can see how hard it would have been to do this if I had to worry about unique `defines - it would have slowed me down tremendously, either causing a schedule slip or untold stress, but most likely both.

Reason 6: `include file path portability suffers. Usually all `defines are collected together in a file which is then `included in other files.

This causes portability issues with the RTL database: you'd have to create locally modified copies of the RTL if you wanted to change the paths which goes against the grain of proper revision control.

To explain: if you have

`include "/fsname/topdirname/chipname/somedir/foo.vh"

and you wanted to move over the design to a different filesystem, this path might not be valid. If you edited your local RTL to point to the correct directory, then it differs from what's in the revision control system - meaning it is an uncontrolled copy and might not represent the actual silicon.

Alternately, if you did:

`include "../../somedir/foo.vh"

then you must implicitly assume that the directory where you run tests has a fixed and unchanging relationship with the directory in which the `included file resides. This may not always be true.

Once again, if you wanted to port your RTL to a different system where the directory structure is different, you'd have to modify RTL files, creating local file copies distinct from the revision control repository files.

Besides these reasons, there are other minor annoyances - such as when you have missing `defines, or when new `defines are created as expressions of other `defines, making it difficult to spot errors.

My take on this is that it is better to not use `defines in RTL, period; because they create more problems than they solve - and they create the worst type of problems - the ones that cause loss of productivity and momentum.

I do know that it will make RTL designers very uncomfortable to not use constant expressions and use hard coded numbers instead. There is always some loss of flexibility this way.

Far be it from me to take that flexibility away, however. What I mean to say is, there is a better way of doing things - (no, not parameters, which are definitely better) - but as usual, I'll leave it for a future missive.

Also, there is a real use for `defines - something that actually will enhance the design process and productivity, and we'll tackle that in a future post as well.

The point of this post is far too important to clutter up with other things - Don't use `defines or `includes in your RTL.

Your humble author sometimes goes out looking for jobs - after all one has to have an income - such are the vagaries of life. In order to get a job, one has to go through a "process", endearingly called an interview. In reality, an interview is like a mating dance. The employer wants employees, and the employee wants an employer. Nothing productive can happen without either. It's much like mating is a necessary condition of existence - so is hiring.

So one has to go through the ritual, and dutifully, I went into one of those dances called a "telephonic screen" - something that sounds oddly like something you'd do on a 1-900 line. I suppose one has to develop a thick skin to go through with it and try to take it seriously.

So here is this author, trying to suppress either the urge to laugh or to scream, when he gets asked the question eponymous to the title of this post: "What if you discovered a timing violation today, and tapeout were tomorrow?"

Rather sadly but predictably enough, this didn't go well. I told the interviewer that in my entire career of eighteen years in electronics, this has happened not once. Not once. I do not expect it to happen ever. If it does, there are holes so big in my ASIC that you could fly A380s through them.

The point is simple but merits a detailed explanation, so great is its significance. You are trying to build an ASIC. It doesn't get built in three days, unless you think that writing RTL for a synchronous FIFO to be implemented in an FPGA constitutes ASIC design. It takes months.

In those months, you strategically plan every aspect of how the building process is going to unfold. You do trial netlists - you synthesize and do post-synthesis timing with plenty of margin every night. With a cron job. You track the RTL development and the timing results meticulously, reviewing violations carefully. You redo blocks where timing seems tight. If you have to meet a 1 ns clock, you synthesize for 600 ps. You try to meet 600 ps everywhere, not by playing synthesis tricks, but by having light and fast architectures.

The months roll on and you monitor the RTL carefully - and make it harder and harder for your people to change things as you come close to your RTL freeze deadlines. That way, your synthesis and simulations converge towards something that can go to tapeout.

Needless to say, your efforts of tracking tight paths in your design are to ensure that you won't be "discovering" new failing paths on the day before tapeout. Unless something is fundamentally broken somewhere.

Let's now work backwards from the day of the tapeout and try this analysis again.

So tapeout is tomorrow, and today you discovered a timing violation. What changed from yesterday? Why wasn't this timing violation discovered yesterday? Did RTL change yesterday? Did somebody mess up timing constraints and were those fixed yesterday and now this shows up failing paths? Were those constraints not reviewed thoroughly early on?

It is self evident that for you to not have known about this yesterday, you did not do your job properly. This is your error, even if it one of omission - something was lacking in your process - you weren't running a tight enough ship for something this gross to occur on the day before tapeout.

This may seem like a long and convoluted analysis, but it unflinchingly points to the culprit - an ill conceived process. Regrettably such is the state at a rather well known networking company. So without sugar coating it, I let the interviewer have it.

But even though it implies a whole lot, the question is still valid, and deserves an answer. What would I do if this happened. Hmm, lets see.

First, I'd take the rest of the day off, go home and sleep. If my process is so broken, then I don't have a choice but to delay tapeout for a month or three while I get a real good handle on things this time. Tapeout ain't gonna happen tomorrow, baby.

Second, I'd do a thorough failure analysis and do whatever it takes so it never happens again.

Yes, that's right. I don't make umpteen-point plans. Fix what's broken and fix it well so it stays fixed.

I guess the interviewer was trying to find out the steps one would do if a timing violation were discovered after RTL freeze, like ECOs and the like. I could have gone that way, but the question was specific with the timeline. Tapeout tomorrow. Violation found today. This demanded a different answer, and perhaps the interviewer got more than he bargained for.

Needless to say, I didn't get the job. Makes one wonder - how does Moore's law continue to apply?

Zen and the Art of Electronics

Wednesday, May 22, 2013

Pre-processor directives in RTL. A Good Thing?

Saturday, May 4, 2013

What if you discovered a timing violation today, and tapeout were tomorrow?