
paul.mcdean's blog

High Performance Synthesis -(5)  

2007-05-06 18:51:37 | Category: Digital


Post-compile Analysis and Re-optimization

After the baseline compile of a design, many opportunities for improvement remain.  These efforts focus on the critical paths in the design.  The following sections discuss how to analyze the results of a compile and describe a variety of methods for making further improvements.  If many violations exist, try large-scale techniques such as modifying the RTL code or re-compiling sub-blocks.  If only a few violations exist, they might be fixed using tactical approaches such as cell sizing or re-buffering.

How bad is it? – Gathering metrics

Several metrics should be gathered to analyze the quality of the synthesized design.  These metrics are used to determine the critical paths of the design, bottlenecks, and the distribution of the path slack.  The following script offers an example of how to collect design metrics.

Gathering Post-Compile Quality Metrics

include start_time.scr -quiet
<compile_the_design>
include end_time.scr -quiet
report_constraint
report_area
report_timing -nets -trans -sig 3
report_timing -path end -max_paths 50
report_timing -nets -trans -sig 3 -max_paths 100 > somefile
include find_dw_implementations.scr -quiet > someotherfile

The find_dw_implementations.scr script reports the type of DesignWare parts used throughout the hierarchy.  It can be found in Solvit article Synthesis-312.

To investigate a timing violation, run a timing report using ‘-nets -trans -sig 3’ and examine it closely.  Sometimes a cell has a much larger delay than the other cells along the path.  Check whether the transition time (provided by -trans) of either its input or output net is large.  If so, the net could be overloaded.  A heavily loaded net can slow the driver cell, the load cell (through slow transition times), or both.  Look at the fanout on each net (provided by -nets) to confirm.  When a case like this is found, it may be beneficial to manually upsize a cell using change_link or to buffer the net using balance_buffer.  Both are described in Section 5.5.  If the slow transition time is at the output of a flip-flop, check the transition time at the flip-flop’s clock input.  If it is very high, then a clock buffer probably has not been instantiated or the clock net drive strength has been understated.  The -sig switch increases the number of significant digits in the displayed values.  This is useful for reading values in picoseconds.
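The check described above (look for a stage whose delay stands out from the rest of the path, then inspect its transition time and fanout) can be sketched in Python. This is only an illustration: the stage values are hand-entered and hypothetical rather than parsed tool output, and the function name `flag_slow_cells` and the threshold `delay_factor` are assumptions made for this example.

```python
from statistics import median

def flag_slow_cells(path, delay_factor=3.0):
    """Return the stages whose delay exceeds delay_factor times the path median."""
    typical = median(stage["delay"] for stage in path)
    return [stage for stage in path if stage["delay"] > delay_factor * typical]

# Hypothetical stage data in ns, as a designer might copy it from a timing report.
path = [
    {"cell": "U12", "delay": 0.110, "trans": 0.090, "fanout": 2},
    {"cell": "U47", "delay": 0.620, "trans": 0.850, "fanout": 38},
    {"cell": "U51", "delay": 0.130, "trans": 0.100, "fanout": 3},
    {"cell": "U60", "delay": 0.120, "trans": 0.095, "fanout": 1},
]

slow = flag_slow_cells(path)
for stage in slow:
    # A large transition time together with a high fanout hints at an overloaded net.
    print(stage["cell"], "delay", stage["delay"], "trans", stage["trans"], "fanout", stage["fanout"])
```

A stage flagged this way is only a candidate; whether upsizing or buffering helps still depends on the library and the load.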

‘report_timing -path end’ displays a one-line summary for the critical path.  Combined with ‘-max_paths’, report_timing produces a textual slack distribution of the design.  The distribution can be viewed graphically using PrimeTime.  Examining the slack distribution gives an indication of how good or bad the overall timing is.  The designer may find that almost all of the failing paths are clustered around the worst negative slack.  This could indicate that only one or a few bottlenecks are causing most of the violations.  Alternatively, many paths may be failing with a rather uniform distribution.  This could indicate that many areas of the design require improvement.
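As a rough illustration of reading a slack distribution, the Python sketch below bins endpoint slacks into a text histogram. The slack values are hypothetical and given in integer picoseconds; the function name `slack_histogram` and the 100 ps bin width are assumptions made for this example, not part of any tool.

```python
from collections import Counter

def slack_histogram(slacks_ps, bin_width_ps=100):
    """Count endpoints per slack bin; each bin is keyed by its lower edge in ps."""
    bins = Counter()
    for slack in slacks_ps:
        # Integer floor division keeps negative slacks in the correct bin.
        bins[(slack // bin_width_ps) * bin_width_ps] += 1
    return dict(sorted(bins.items()))

# Hypothetical endpoint slacks in ps (negative values are violations).
slacks_ps = [-310, -305, -300, -290, -280, -120, -40, 15, 60, 220]

hist = slack_histogram(slacks_ps)
for lower_edge, count in hist.items():
    print(f"{lower_edge:>6} ps  {'#' * count}")
```

In this made-up data set, five of the seven failing endpoints fall in the two worst bins, which is the clustered pattern that suggests a small number of bottlenecks.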

Bottleneck analysis

Slow sections of a design can create bottlenecks.  These are sub-blocks, functional operations, or individual cells that appear in many of the failing paths.  Once identified, bottlenecks represent opportunities to improve many paths at once.  Examine the ‘-max_paths 100’ timing report for sub-blocks, functional operations, or leaf cells that appear in many paths.  PrimeTime can be used to perform this analysis graphically.
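The counting described above can also be sketched in a few lines of Python. The example assumes the failing paths have already been extracted into lists of hierarchical cell names; the names, the function `bottleneck_candidates`, and the 50% threshold are all hypothetical choices for illustration.

```python
from collections import Counter

def bottleneck_candidates(failing_paths, key=lambda cell: cell, min_share=0.5):
    """List items that appear in at least min_share of the failing paths.

    key maps a cell name to the thing being counted (the cell itself by
    default, or its enclosing block). Each path counts an item at most once.
    """
    counts = Counter()
    for path in failing_paths:
        counts.update({key(cell) for cell in path})
    total = len(failing_paths)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, n) for name, n in ranked if n / total >= min_share]

# Hypothetical failing paths, each a list of hierarchical cell names.
failing_paths = [
    ["u_dec/U3", "u_alu/add_1/U9", "u_wb/U2"],
    ["u_fetch/U7", "u_alu/add_1/U9", "u_wb/U5"],
    ["u_dec/U4", "u_alu/add_1/U9", "u_alu/U21"],
    ["u_dec/U3", "u_ctl/U1"],
]

by_cell = bottleneck_candidates(failing_paths)
by_block = bottleneck_candidates(failing_paths, key=lambda cell: cell.split("/")[0])
print(by_cell)
print(by_block)
```

Counting by cell points at individual leaf cells, while counting by the top-level instance name points at sub-blocks that may deserve a re-compile.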

Bottleneck sub-blocks or DesignWare parts are candidates for the characterize/compile strategy as discussed in Section 5.3.  Sections of bottleneck random logic are candidates for SOP flattening as discussed in Section 4.7, or RTL recoding as discussed in Section 1.  Individual bottleneck cells may be caused by implicit dont_touch attributes as discussed in Section 5.6, or may benefit from buffering as discussed in Section 5.5.

Re-optimization of sub-blocks

The timing of a sub-block can often be improved by performing a compile of it using updated constraints.  The compile may be a high effort incremental compile, or if necessary a full compile from RTL followed by an incremental compile.  

Design Budgeting or characterization can create the updated constraints.  Design Budgeting should be used when multiple sub-blocks are to be re-optimized in parallel.  Design Budgeting will find constraint budgets that give each block the best chance of meeting its constraints.  Characterization should be used when the designer wants to re-optimize one or more blocks in series (e.g. characterize block A, re-optimize block A, characterize block B, re-optimize block B, etc.).  Don’t compile blocks in parallel using characterized constraints.

When creating constraints for sub-blocks, first set the current design to the highest level possible.  Design Compiler uses the current design’s constraints as its basis.  Thus it is important to create the sub-block constraints from a level where the constraints are trusted; many times this is the top level of the ASIC.  If accurate design budgeting has been done on major sub-blocks, constraints can be generated from there.  For example, if sub-block instance u100 is several levels down in the hierarchy, the characterize command might look like this: ‘characterize -constraints -connections {instA/instB/instC/u100}’.

When compiling a sub-block, interface effects exist which must be addressed.  During a compile, Design Compiler might increase the load on an input port.  Design Compiler is able to compute the effect of the increased load on the block being compiled; however, it has no way to determine its effect on other blocks that share the net driving that input.  Increasing the load on an input port may therefore cause a timing violation in another block.  Another issue occurs at the sub-block outputs.  It is possible that during a sub-block compile Design Compiler will make an output transition time worse.  Once again, Design Compiler will ensure that this does not cause a violation within the block being compiled.  However, the delay of the next cell downstream (outside the block) is made worse.  The block being compiled may therefore meet timing while the overall design fails once top-level analysis is performed.  These effects are very difficult to model.

To address the issues described above, add margin to the sub-block I/O constraints by increasing each set_input_delay and set_output_delay value by a small amount.  A good value for the adjustment is a typical cell delay (e.g. 250 ps).  If Design Budgeting is being used to create the sub-block constraints, use the set_context_margin command.  If characterize is being used, a script named tighten_constraints.scr exists which does this automatically.  It can be found in Solvit article Methodology-48.
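The arithmetic behind the margin can be illustrated with a small Python sketch. All numbers, port names, and function names here are hypothetical, and the budget calculation is deliberately simplified: it ignores setup time and clock uncertainty and only shows how a larger I/O delay leaves less time for logic inside the block.

```python
def add_io_margin(delays_ps, margin_ps=250):
    """Return a copy of the I/O delays with margin_ps added to each port."""
    return {port: delay + margin_ps for port, delay in delays_ps.items()}

def internal_budget(period_ps, io_delay_ps):
    """Time left for the block's own logic on a port (simplified model)."""
    return period_ps - io_delay_ps

# Hypothetical values in ps: one input delay and one output delay.
period_ps = 4000
io_delays_ps = {"din": 1500, "dout": 1200}

tightened = add_io_margin(io_delays_ps)
for port, delay in tightened.items():
    print(port, "delay", delay, "ps, internal budget", internal_budget(period_ps, delay), "ps")
```

With these numbers the logic behind `din` must now fit in 2250 ps instead of 2500 ps, which leaves slack for the interface effects that the block-level compile cannot see.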

Optimization with updated design budgets

After a bottom-up compile, timing violations might be seen within a budgeted sub-block, across budgeted sub-blocks, or both.  If none of the violations cross budgeted sub-blocks, they can be addressed via further optimization within the sub-block, with no need to update the budgets.  If violations do occur across budgeted sub-blocks, the budgets should be updated.
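The decision above can be written as a small check in Python. This sketch assumes each violating path is reduced to its startpoint and endpoint names, and that the budgeted blocks are not nested inside one another; the instance names and helper functions are hypothetical.

```python
def owning_block(pin, budgeted_blocks):
    """Return the budgeted block that contains pin, or None if it is outside all of them."""
    for block in budgeted_blocks:
        if pin.startswith(block + "/"):
            return block
    return None

def needs_budget_update(violations, budgeted_blocks):
    """True if any violating path starts and ends in different budgeted regions."""
    return any(
        owning_block(start, budgeted_blocks) != owning_block(end, budgeted_blocks)
        for start, end in violations
    )

# Hypothetical budgeted blocks and violating paths as (startpoint, endpoint) pairs.
budgeted_blocks = ["u_core", "u_mem"]
internal_only = [("u_core/r1/Q", "u_core/r2/D")]
with_crossing = internal_only + [("u_core/r3/Q", "u_mem/r9/D")]

print(needs_budget_update(internal_only, budgeted_blocks))
print(needs_budget_update(with_crossing, budgeted_blocks))
```

When the check returns False, further optimization within the affected block is enough; when it returns True, the budgets are stale and should be regenerated before re-compiling.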

The budgets are updated by running the allocate_hierarchical_budgets command on the existing gate-level design (again use ‘-levels 0’ if budgets are only desired for top-level blocks).  After updating the budgets, the blocks that have violations can be incrementally compiled in parallel.  If the violations across the budgeted sub-blocks are too severe for an incremental compile to fix, consider a full compile from RTL.  This will allow restructuring of the logic using the updated constraints.

Optimization with characterized constraints

Characterization should be used when the designer wants to re-optimize one or more blocks in series.  Do not compile blocks in parallel using characterized constraints; I/O paths will be over- and under-constrained, resulting in inferior timing.  When using characterize, most designers will want to use the -constraints switch.  It specifies that Design Compiler is to add design rule constraints (e.g. set_max_fanout) that were set on the top block.  In addition, it is often useful to include the -connections switch.  This adds connection information to the sub-block constraints.  For example, if an input to the sub-block is tied low, this switch will add a set_logic_zero to that port.  Design Compiler can use this information to remove unneeded logic.  In this way -connections can modify the functionality of a sub-block just as boundary optimization does.  The top-level functionality remains the same.
