
paul.mcdean's blog

 
 
 

High Performance Synthesis -(4)  

2007-05-06 18:48:37 | Category: Digital


Using SOP Flattening

Sum-of-Products (SOP) flattening is an optimization technique applied early in a full compile (i.e. it is not performed during an incremental compile). SOP flattening converts the equations of a logic cone into a sum-of-products expression (e.g. y=abc+def+ae). An SOP form usually consists of only two or three levels of logic, depending on inversions and on the fanin of the available gates. If, after compiling, a critical path has 10 levels of logic where three would suffice, try SOP flattening; no other technique provides such a reduction in levels of logic. SOP flattening is turned off by default in Design Compiler because certain types of structures experience gate-count explosion when flattened.
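To make the idea concrete, here is a small Python sketch (not Design Compiler's algorithm) showing that any logic cone can be rewritten as a two-level AND-OR expression by enumerating its truth table. The function deep is a hypothetical multi-level cone chosen for illustration; the result is the canonical sum of minterms, before any minimization.

```python
from itertools import product

def to_sop(func, names):
    """Return the canonical sum-of-minterms for a Boolean function.

    func takes len(names) bools; in each returned product term a trailing '
    marks an inverted input.
    """
    terms = []
    for bits in product([False, True], repeat=len(names)):
        if func(*bits):
            lits = [n if b else n + "'" for n, b in zip(names, bits)]
            terms.append("".join(lits))
    return " + ".join(terms)

def deep(a, b, c):
    # Hypothetical multi-level cone: ((a AND b) OR c) AND NOT (a AND c)
    return ((a and b) or c) and not (a and c)

print(to_sop(deep, "abc"))  # prints: a'b'c + a'bc + abc'
```

However many levels the original cone had, the flattened form is one level of AND gates feeding one OR gate (plus input inversions), which is where the reduction in logic levels comes from.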

Table 4: Which blocks respond well to SOP flattening?

| Good candidates for SOP flattening | Bad candidates for SOP flattening |
|------------------------------------|-----------------------------------|
| State machines                     | Arithmetic components             |
| Random logic                       | XOR structures (parity trees)     |
| Blocks that use don't-cares        | Blocks with many muxes            |
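The parity-tree entry in Table 4 can be checked with a quick count. In an n-input parity function, any two minterms in the ON-set differ in an even number of bits (at least two), so no two of them can be merged and the minimal SOP needs 2^(n-1) product terms, whereas a tree of 2-input XORs needs only n-1 gates. This Python sketch simply counts those terms:

```python
from itertools import product

def sop_term_count(func, n):
    """Number of minterms in the canonical SOP of an n-input function."""
    return sum(1 for bits in product([0, 1], repeat=n) if func(bits))

def parity(bits):
    # Odd parity: true when an odd number of inputs are 1
    return sum(bits) % 2 == 1

for n in (2, 4, 8, 16):
    print(n, "inputs:", sop_term_count(parity, n), "product terms")
```

For parity the canonical count is also the minimal count, so the numbers printed (2, 8, 128, 32768) are the gate-count explosion that makes XOR structures bad candidates.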

A standard approach for using SOP flattening is to first compile the design without it. Then search the RTL of the blocks along the critical paths for logic that is a good candidate, try flattening on those blocks, and compare the results. For some blocks flattening will actually increase the delay. This is because flattening increases the fanout of the block inputs: the loading on those nets increases and can offset any gain from fewer levels of logic. In general, flattening may not work well for blocks with many outputs in which each input affects many of those outputs.
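The fanout effect can be estimated by counting how many product terms (AND gates) each input drives after flattening. The sketch below assumes single-letter, un-inverted inputs written as strings; y is the example from the text and z is a hypothetical second output that shares inputs with it.

```python
from collections import Counter

def input_fanout(outputs):
    """Count the AND gates each input drives across all flattened outputs.

    outputs maps an output name to its list of product terms, e.g. "abc".
    """
    fanout = Counter()
    for terms in outputs.values():
        for term in terms:
            for var in set(term):
                fanout[var] += 1
    return dict(sorted(fanout.items()))

single = {"y": ["abc", "def", "ae"]}               # y = abc + def + ae
multi = {"y": ["abc", "def", "ae"], "z": ["ab", "ef"]}
print(input_fanout(single))  # a and e each drive 2 AND gates
print(input_fanout(multi))   # sharing inputs across outputs raises a and e to 3
```

Each extra product term an input appears in adds a load on that input net, which is why blocks whose inputs feed many outputs tend to lose some of the benefit of flattening.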

The set_flatten command enables SOP flattening. One argument to set_flatten is -phase. By default, when Design Compiler flattens logic it creates K-maps and tries to cover the ‘1s’ in the K-map. For K-maps with many 1s this produces more logic than necessary. Using -phase instructs Design Compiler to also consider covering the ‘0s’ of the K-map and inverting the output. Because both options are considered when -phase is used, -phase should always be used.
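The effect of -phase can be illustrated with a textbook two-level minimizer. This Python sketch is not Design Compiler's algorithm: it uses Quine-McCluskey to find prime implicants and a greedy (not always minimal) cover, then covers both the 1s and the 0s and keeps whichever needs fewer product terms. A 4-input NAND is used as the example because its K-map has fifteen 1s and a single 0.

```python
from itertools import combinations

def combine(a, b):
    """Merge two cubes that differ in exactly one fixed bit, else None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(minterms, n):
    """Quine-McCluskey: repeatedly merge cubes; cubes never merged are prime."""
    cubes = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while cubes:
        merged, used = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= cubes - used
        cubes = merged
    return primes

def covers(cube, m, n):
    return all(c in ("-", b) for c, b in zip(cube, format(m, f"0{n}b")))

def greedy_cover(minterms, n):
    """Pick primes greedily until every minterm is covered."""
    primes = sorted(prime_implicants(minterms, n))
    uncovered, chosen = set(minterms), []
    while uncovered:
        best = max(primes, key=lambda p: sum(covers(p, m, n) for m in uncovered))
        chosen.append(best)
        uncovered = {m for m in uncovered if not covers(best, m, n)}
    return chosen

def cover_with_phase(on_set, n):
    """Cover the 1s and the 0s; keep the phase needing fewer product terms."""
    on = set(on_set)
    off_set = [m for m in range(2 ** n) if m not in on]
    true_cover = greedy_cover(on_set, n)
    inv_cover = greedy_cover(off_set, n)
    if len(true_cover) <= len(inv_cover):
        return "true", true_cover
    return "inverted", inv_cover

nand4 = list(range(15))  # every minterm except 1111
print(len(greedy_cover(nand4, 4)))  # prints: 4 (covering the 1s)
print(cover_with_phase(nand4, 4))   # prints: ('inverted', ['1111'])
```

Covering the 1s needs four terms, while covering the single 0 needs one AND term followed by an inversion, which is the kind of saving -phase looks for.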

There are three effort levels for the set_flatten command. Flattening increases the gate count of certain structures, and the effort level determines how much area increase is allowed for each endpoint. Low effort flattening will probably produce results similar to a default compile, so medium or high effort should be used. If flattening is attempted on a block with both good and bad candidates, Design Compiler gives up on the bad candidate endpoints and keeps their original logic structure. Only use SOP flattening on blocks that show a timing improvement; this helps keep the area impact down.
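The per-endpoint give-up behavior can be modeled as an area-growth check. In this Python sketch the growth limits in MAX_GROWTH and the endpoint areas are hypothetical placeholders, not values used by Design Compiler; the point is only that each endpoint is judged separately, so one exploding endpoint does not stop the others from being flattened.

```python
# Hypothetical per-endpoint area-growth limits for each effort level.
MAX_GROWTH = {"low": 1.0, "medium": 1.5, "high": 3.0}

def flatten_endpoints(endpoints, effort):
    """Return {endpoint: 'flattened' | 'kept'} under an area-growth limit.

    endpoints maps a name to (original_area, flattened_area).
    """
    limit = MAX_GROWTH[effort]
    result = {}
    for name, (orig_area, flat_area) in endpoints.items():
        # Keep the original structure when flattening grows area too much
        result[name] = "flattened" if flat_area <= orig_area * limit else "kept"
    return result

# Hypothetical areas: a state-machine decode flattens cheaply, a parity output explodes.
blocks = {"fsm_next": (40.0, 52.0), "parity_out": (30.0, 900.0)}
print(flatten_endpoints(blocks, "medium"))
```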

Another option to set_flatten is -minimize <single_output | multiple_output>. This tells Design Compiler whether to flatten each endpoint independently or to share logic across endpoints. With single_output, Design Compiler minimizes the equations for each output individually. This results in the smallest implementation for each output, but the design as a whole might suffer because product (AND) terms are not shared well between outputs. With multiple_output, Design Compiler shares as many product (AND) terms between outputs as possible. This strategy is beneficial for ROM-like structures in which most outputs are a function of all inputs. In the final analysis it is difficult to predict which method will work best, so experiment with both methods for a given block.
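The sharing trade-off can be counted directly. This Python sketch compares AND-gate counts when every output builds its own product terms (single_output style) against building each distinct term once and fanning it out (multiple_output style). The outputs y1 to y3 are hypothetical, and terms are assumed to be written with a consistent literal order so identical terms compare equal; real multiple-output minimization may also choose different terms, which this does not model.

```python
def and_gate_count(outputs, share):
    """Count AND gates for a set of flattened outputs.

    outputs maps an output name to its list of product terms. With share=False
    each output gets private AND gates; with share=True an identical product
    term is built once and drives every output that uses it.
    """
    if share:
        return len({term for terms in outputs.values() for term in terms})
    return sum(len(terms) for terms in outputs.values())

outs = {"y1": ["ab", "cd"], "y2": ["ab", "ef"], "y3": ["ab", "cd", "gh"]}
print(and_gate_count(outs, share=False))  # prints: 7
print(and_gate_count(outs, share=True))   # prints: 4
```

Sharing saves gates when many outputs reuse the same terms (as in ROM-like blocks), but a shared term also picks up extra fanout, which is one reason neither setting wins everywhere.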

Example of using SOP flattening

set_flatten true -phase true -minimize single_output \
                 -design {blockA blockB} -effort medium
set_flatten true -phase true -minimize multiple_output \
                 -design {blockC} -effort medium

compile

Use the fast DesignWare parts

Many DesignWare parts are included with a Design Compiler license; these reside in the standard.sldb library. The Foundation Library provides higher-performance implementations of the standard parts as well as many additional functions, including a fast CLA adder (CLF), a Wallace-tree multiplier, CLA comparators and a vector adder. Many of the most popular Foundation Library parts are inferred directly from RTL operators such as ‘+’ and ‘*’.

Even if a Foundation Library license exists, the library must be activated in order for Design Compiler to use its faster parts.  This is done by adding the library files to the synthetic_library and link_library variables:

synthetic_library = {dw01.sldb, dw02.sldb}

link_library = link_library + synthetic_library

Without these commands, Design Compiler will always pick the slower implementations found in the standard library.  Note the link_library addition is only needed if DesignWare parts are instantiated in the design.  A convenient location for these variables is in the .synopsys_dc.setup file.

The DesignWare Foundation library is updated four times a year. Therefore, even a project using the latest version of Design Compiler may be a release or two behind on DesignWare. This matters because new releases sometimes contain faster implementations of popular parts. Details about the latest release and download instructions can be obtained by sending a blank email to dw_est@synopsys.com.

Fast Top-Level Incremental Compiles

As discussed in Section 4.1.1, bottom-up compiles often suffer from bad sub-block constraints, so interface timing problems are often found once the sub-blocks are integrated at the top. A medium or high effort incremental compile is the best initial approach. However, if the design is very large, the run-time of a normal incremental compile may be prohibitive. There are several strategies for improving run-time in these situations.

If a bottom-up compile strategy was used, a top-level incremental compile should only be used to fix violations between budgeted blocks.  If the only violations exist on register-to-register paths within a budgeted block, tackle the problem there.  Design Budgeting will minimize the number of timing violations between budgeted blocks.

Recommended flow to reduce run-time of incremental compiles

The standard two-pass synthesis strategy calls for a medium effort compile followed by a high effort incremental compile.  Run-time for the high effort incremental compile can be reduced in several ways.

First, recall that any nonzero critical range increases run-time, so pick a small value or zero (see Section 4.4). Next, turn off area optimization by removing any max_area attribute that was previously set. Another technique is to disable automatic wire-load selection. Throughout synthesis, Design Compiler re-evaluates all of the wire-load models and changes them whenever a design's area has changed enough to warrant a new model. If the designer is satisfied that the current wire-load models are correct, automatic wire-load selection can be disabled, which reduces run-time. The default_wire_load attribute must also be removed from all libraries to prevent Design Compiler from falling back on the default model. This technique works best for designs with many levels of hierarchy. Later, automatic wire-load selection can be re-enabled and static timing analysis rerun.
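The work that automatic wire-load selection repeats can be pictured as an area-range lookup per design, loosely like the area ranges in a Liberty wire_load_selection group. In this Python sketch the model names, area thresholds and block areas are all hypothetical; with auto=False the existing assignment is kept and no re-evaluation happens.

```python
# Hypothetical area-based selection table: (min_area, max_area, model_name).
SELECTION = [
    (0.0, 5000.0, "wl_small"),
    (5000.0, 20000.0, "wl_medium"),
    (20000.0, float("inf"), "wl_large"),
]

def select_wire_load(area):
    """Pick the wire-load model whose area range contains this design's area."""
    for lo, hi, model in SELECTION:
        if lo <= area < hi:
            return model
    raise ValueError(f"no wire-load model covers area {area}")

def reselect(design_areas, current, auto=True):
    """Return the wire-load model for each design after an optimization step.

    auto=True re-evaluates every design; auto=False keeps the current models.
    """
    if not auto:
        return dict(current)
    return {d: select_wire_load(a) for d, a in design_areas.items()}

areas = {"blockA": 4800.0, "blockB": 5100.0}      # blockB has grown past 5000
current = {"blockA": "wl_small", "blockB": "wl_small"}
print(reselect(areas, current, auto=True))   # blockB moves to wl_medium
print(reselect(areas, current, auto=False))  # assignments left unchanged
```

When selection is frozen, a block that has drifted across a threshold keeps its old model, which is why the text recommends re-enabling selection and rerunning static timing analysis afterwards.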

Finally, incremental implementation selection (IIS) of DesignWare parts can be turned off.  IIS visits each DesignWare part and evaluates changing its implementation (e.g. RPL vs. CLA).  If the designer is satisfied that Design Compiler is using the best DesignWare parts it can, IIS can be turned off to improve run-time.
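A toy model of the decision that implementation selection revisits for each DesignWare part: among the available implementations, pick the smallest one that still meets the required time, or the fastest if none does. The delay and area numbers below are hypothetical, and Design Compiler's real costing is far more involved; only the RPL, CLA and CLF implementation names come from the text.

```python
from dataclasses import dataclass

@dataclass
class Impl:
    name: str
    delay_ns: float  # hypothetical path delay through the part
    area: float      # hypothetical area

def select_implementation(impls, required_ns):
    """Smallest-area implementation that meets timing; fastest if none does."""
    meeting = [i for i in impls if i.delay_ns <= required_ns]
    if meeting:
        return min(meeting, key=lambda i: i.area).name
    return min(impls, key=lambda i: i.delay_ns).name

# Hypothetical numbers for a single adder instance.
adder = [Impl("rpl", 3.2, 100.0), Impl("cla", 1.4, 180.0), Impl("clf", 1.0, 260.0)]
print(select_implementation(adder, required_ns=4.0))  # prints: rpl
print(select_implementation(adder, required_ns=1.5))  # prints: cla
```

Turning IIS off amounts to skipping this re-evaluation and keeping whichever implementation each part already has.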

Improving run-time of top-level incremental compiles

set_critical_range 0.0 current_design
remove_attribute current_design max_area
auto_wire_load_selection = false
remove_attribute find(library) default_wire_load
compile_implementation_selection = false
compile -incremental -map_effort medium

Dont_touch on Sub-blocks

A simple approach for improving run-time of top-level compiles is to place a dont_touch on the top-level sub-block instances.

The downside to this approach is that only interconnect logic is optimized (cells are sized and buffers added/removed).  Further optimizations could very likely be done if access to the sub-blocks were allowed.  Although this strategy may be successful in some cases, it is generally recommended that dont_touch’s NOT be placed on sub-blocks.

A dont_touch can be placed on each top-level sub-block instance using the following dc_shell script:

filter( find(cell),"is_hierarchical == true" )
foreach(tmp, dc_shell_status) {
  set_dont_touch tmp
}

This approach improves run-time, but keep in mind that the sub-block logic is still there. Timing analysis is still performed as changes are made to the netlist, so the compile will not be instantaneous.
