Saturday, May 4, 2013

What if you discovered a timing violation today, and tapeout were tomorrow?

Your humble author sometimes goes out looking for jobs - after all, one has to have an income - such are the vagaries of life. In order to get a job, one has to go through a "process", endearingly called an interview. In reality, an interview is like a mating dance. The employer wants employees, and the employee wants an employer. Nothing productive can happen unless each finds the other. Just as mating is a necessary condition of existence, so is hiring.

So one has to go through the ritual, and dutifully, I went into one of those dances called a "telephonic screen" - something that sounds oddly like something you'd do on a 1-900 line. I suppose one has to develop a thick skin to go through with it and try to take it seriously.

So here is this author, trying to suppress either the urge to laugh or to scream, when he gets asked the question that gives this post its title: "What if you discovered a timing violation today, and tapeout were tomorrow?"

Rather sadly but predictably enough, this didn't go well. I told the interviewer that in my entire career of eighteen years in electronics, this has happened not once. Not once. I do not expect it to happen ever. If it does, there are holes so big in my ASIC that you could fly A380s through them.

The point is simple but merits a detailed explanation, so great is its significance. You are trying to build an ASIC. It doesn't get built in three days, unless you think that writing RTL for a synchronous FIFO to be implemented in an FPGA constitutes ASIC design. It takes months.

In those months, you strategically plan every aspect of how the building process is going to unfold. You build trial netlists - you synthesize and run post-synthesis timing with plenty of margin every night. With a cron job. You track the RTL development and the timing results meticulously, reviewing violations carefully. You redo blocks where timing seems tight. If you have to meet a 1 ns clock, you synthesize for 600 ps. You try to meet 600 ps everywhere, not by playing synthesis tricks, but by having light and fast architectures.
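To make the margin arithmetic concrete, here is a minimal sketch of the nightly review, not the flow of any particular tool. It assumes each block's worst-case arrival time has already been pulled out of the night's timing report by some other script. The names `BlockTiming` and `nightly_review` are made up for illustration, and setup time and clock uncertainty are ignored to keep the arithmetic obvious.

```python
from dataclasses import dataclass


@dataclass
class BlockTiming:
    """Worst path of one block, as extracted from a nightly report (hypothetical format)."""
    name: str
    arrival_ps: float  # data arrival time of the block's worst path, in picoseconds


def slack_ps(arrival_ps: float, period_ps: float) -> float:
    """Simplified setup slack: clock period minus arrival time."""
    return period_ps - arrival_ps


def nightly_review(blocks, signoff_period_ps=1000.0, synth_period_ps=600.0):
    """List blocks that miss the tightened synthesis target, worst first.

    Each entry is (name, slack against the 600 ps target, slack against the
    real 1 ns clock). A block can miss the tight target and still have
    plenty of real margin; that is exactly the early warning you want.
    """
    tight = []
    for b in blocks:
        s_synth = slack_ps(b.arrival_ps, synth_period_ps)
        if s_synth < 0:
            s_real = slack_ps(b.arrival_ps, signoff_period_ps)
            tight.append((b.name, s_synth, s_real))
    return sorted(tight, key=lambda entry: entry[1])


if __name__ == "__main__":
    # Illustrative numbers only, not from any real design.
    tonight = [BlockTiming("fifo", 550.0), BlockTiming("alu", 720.0)]
    for name, s_synth, s_real in nightly_review(tonight):
        print(f"{name}: {s_synth:+.0f} ps vs target, {s_real:+.0f} ps vs real clock")
```

A block that shows up in this list with positive slack against the real clock is a candidate for rework now, months ahead, rather than a surprise later.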

The months roll on and you monitor the RTL carefully - and make it harder and harder for your people to change things as you come close to your RTL freeze deadlines. That way, your synthesis and simulations converge towards something that can go to tapeout.

Needless to say, your efforts of tracking tight paths in your design are to ensure that you won't be "discovering" new failing paths on the day before tapeout. Unless something is fundamentally broken somewhere.
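The tracking itself can be as simple as comparing tonight's slack numbers against last night's. The sketch below assumes each report has been reduced to a dictionary mapping an endpoint name to its slack in picoseconds; that format, and the name `new_violations`, are assumptions for illustration rather than any tool's output.

```python
def new_violations(yesterday, today):
    """Return endpoints that fail today but did not fail yesterday.

    Both arguments map endpoint name -> slack in ps (negative means failing).
    The result maps endpoint name -> (yesterday's slack or None, today's slack).
    None means the endpoint was absent from yesterday's report, which is
    itself worth a look: new logic or new constraints appeared overnight.
    """
    fresh = {}
    for endpoint, slack in today.items():
        previous = yesterday.get(endpoint)
        if slack < 0 and (previous is None or previous >= 0):
            fresh[endpoint] = (previous, slack)
    return fresh


if __name__ == "__main__":
    # Illustrative numbers only.
    last_night = {"u_alu/q_reg[3]": 40.0, "u_dec/state_reg": -15.0}
    tonight = {"u_alu/q_reg[3]": -5.0, "u_dec/state_reg": -20.0}
    for endpoint, (before, after) in new_violations(last_night, tonight).items():
        print(endpoint, before, after)
```

Paths that were already failing yesterday are deliberately left out: those are known problems being worked on, while anything in this list is a change that somebody has to explain the same morning.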

Let's now work backwards from the day of the tapeout and try this analysis again.

So tapeout is tomorrow, and today you discovered a timing violation. What changed from yesterday? Why wasn't this timing violation discovered yesterday? Did the RTL change yesterday? Did somebody mess up the timing constraints, get them fixed yesterday, and thereby expose paths that had been failing all along? Were those constraints not reviewed thoroughly early on?

It is self-evident that for you to not have known about this yesterday, you did not do your job properly. This is your error, even if it is one of omission - something was lacking in your process - you weren't running a tight enough ship to keep something this gross from occurring on the day before tapeout.

This may seem like a long and convoluted analysis, but it unflinchingly points to the culprit - an ill-conceived process. Regrettably, such is the state of affairs at a rather well known networking company. So without sugarcoating it, I let the interviewer have it.

But even though it implies a whole lot, the question is still valid, and deserves an answer. What would I do if this happened? Hmm, let's see.

First, I'd take the rest of the day off, go home and sleep. If my process is so broken, then I don't have a choice but to delay tapeout for a month or three while I get a real good handle on things this time. Tapeout ain't gonna happen tomorrow, baby.

Second, I'd do a thorough failure analysis and do whatever it takes so it never happens again. 

Yes, that's right. I don't make umpteen-point plans. Fix what's broken and fix it well so it stays fixed.

I guess the interviewer was trying to find out the steps one would take if a timing violation were discovered after RTL freeze, like ECOs and the like. I could have gone that way, but the question was specific about the timeline. Tapeout tomorrow. Violation found today. This demanded a different answer, and perhaps the interviewer got more than he bargained for.

Needless to say, I didn't get the job. Makes one wonder - how does Moore's law continue to apply?
