I’m a chip designer working on the digital side. I’ve got experience with
- CPU/SoC architecture and design, especially RISC-V open ISA
- IC design/verification with Verilog/SystemVerilog/SystemC
- Low power design and optimization
- ASIC design flow, including front-end, back-end and power sign-off
- Semi-custom design flow, including transistor timing analysis and SPICE simulation
Currently my interests are
- Hardware and software co-design
- SoC generator
- Machine learning accelerator
If you share the same interests and want to discuss them, please send me a message on LinkedIn.
Work experience
Question: how do you keep the clock skew within a group of clocks to a minimum, say less than 30ps, instead of letting the tool apply useful skew? This case comes up with our hard macros.
A: in Innovus, use a skew group
    set min_skew_group { path/to/clock/NLVB_CKB path/to/clock/NLVA_CKB path/to/clock/NLVP_CKB }
    create_ccopt_skew_group \
        -name min_skew_group \
        -sources path/to/clock/source/CKB \
        -sinks $min_skew_group \
        -target_insertion_delay 0.500 \
        -rank 1 -target_skew 0.000
    set_ccopt_property constraints -skew_group min_skew_group ccopt
The following are my notes on the INNOVUS training course from Cadence’s training modules
Module 02: overview
- “gift” directory contains lots of useful scripts to help productivity
- Independent “viewlog” utility or “Tools->Log Viewer” will start a GUI to help understand log files better
- Batch mode: innovus -no_gui -init batch.tcl
- win / win off to show/hide GUI
Module 03: import design
Input
- Netlist in Verilog
- Floorplan in DEF
- Clock tree spec auto gen from SDC
- Scan info in Tcl or DEF
- I/O info (pads or pins)
- GDS layer map (if want to dump GDS)
- Timing constraint in SDC
- Timing library in .
AI and ML
- Artificial intelligence vs human intelligence
- The imitation game: Eugene Goostman passed the Turing Test, 2014
- AlphaGo, DeepMind, 2015
Introduction to deep learning
- Improve on task T with respect to performance metric P based on experience E
- Perceptron learning (one-layer NN): a(i) = a(i-1) + n * (target - output) * A(i)
- 1974: multi-layer perceptron with backpropagation training
- Deep learning is old tricks; more computing power and more data make it possible and powerful
- Binary classification: Touch ID, speaker verification, face verification, email spam, motion detection, credit card fraud
- N-ary classification: MNIST (handwriting), speaker identification, word prediction (typing on iPhone)
- Deep learning for speech (Deng, 2010)
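The perceptron update rule in these notes can be tried out in a few lines of Python. This is only a sketch under my own reading of the notation: I take a as the weight vector, n as the learning rate and A(i) as the current input. The names `predict`, `train_perceptron`, `lr` and `AND_DATA`, and the choice of the AND function as toy data, are illustrative and not from the talk.

```python
# Minimal perceptron sketch (one-layer NN with a step activation).
# All names and the toy data set are illustrative assumptions.

def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else 0

def train_perceptron(samples, lr=0.5, epochs=20):
    """Apply w <- w + lr * (target - output) * x for every sample, every epoch."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error  # bias acts as a weight on a constant input of 1
    return weights, bias

# Logical AND is linearly separable, so the rule converges on it.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_DATA)
print([predict(weights, bias, x) for x, _ in AND_DATA])
```

The same loop fails on XOR, which is not linearly separable; as far as I understand, that limitation is what the multi-layer perceptron with backpropagation addresses.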
Start-up: Python -> enterprise: C/Java/Scala, more engineers, faster
Research: quick results and prototyping
GPU? Data movement between GPU and CPU is important
[ ] fast.ai: class (high school math)
Infrastructure: Spark/Flink, scheduler problem, distributed file system
Problems to think about when running jobs on GPU clusters
- Memory is relatively small
- Throughput: jobs are more than matrix math
- Resource provisioning: how many resources do we need? GPU/CPU/RAM
- GPU allocation per job
- Python <-> Java overhead defeats the point of GPUs
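Today I went to an AI meetup and met a serial entrepreneur. He described what he is working on: "Change the way input works today. People should not be feeding instructions to machines; machines should predict what people need and act on it." That, he said, is the future of machines. He also mentioned that the output interface should be something like AR (augmented reality) rather than a display screen.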
By Jon Shlens and George Toderici from Google Research @ 2017-01-20 Fri
History
Convolutional NN: old tech, so why does it suddenly work?
Scale: 60M parameters
At least 60M + 1 data points to fit these parameters
SIMD hardware (GPU)
Domain transfer
Use a CNN (convolutional neural network) trained on a large data set for other applications that have only a limited data set
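A smart electricity-use revolution involving ordinary consumers is quietly under way. The California government has been pushing this energy-saving revolution in recent years. It has made rapid progress over the past three years and has won broad support from energy companies and appliance manufacturers.
The prosumer concept
Ordinary households used to appear only as consumers of electricity. In recent years, however, solar generation equipment has been promoted more and more heavily and has been welcomed by many consumers, which gave rise to the concept of the prosumer, i.e. producer + consumer. With loans strongly backed by the government, a household installs solar panels on its own roof, sells the generated power to the utility over the grid, and at the same time gets the lowest unit price for the electricity it uses. Quite a few of my colleagues have installed such equipment or are considering it, which shows that such a project is profitable even for an ordinary family of three or four.
Smart electricity use
Grid consumption has peaks and valleys. These fluctuations are closely tied to the consumption habits of each small area, to weather changes and so on, and they often change quickly, swinging back and forth on a scale of minutes. Load fluctuation is harmful to grid equipment, so utilities have a very strong incentive to smooth it out by technical means. This led to the concept of "smart electricity use": networking ordinary appliances and controlling them remotely and automatically to regulate consumption fluctuations within a small area. This not only lowers the load on grid equipment but also uses energy more efficiently, so it benefits both utilities and policymakers. For ordinary consumers, the benefit of actively joining a "smart electricity use" program comes from utility and government subsidies.
For example, appliances such as water heaters and dryers are big power consumers, but they are usually not very time-critical. If a dedicated chip is added so that their power can be controlled over the network, replacing constant high-power output, they can help balance grid load fluctuations. The cost may be nothing more than a somewhat longer run time. Then there are electric vehicles, which are becoming more and more common. For a typical owner, the time available for charging is far longer than the time the car is in use. If charging is automatically scheduled into the periods of lowest grid load within that ample charging window, the owner gets the lowest electricity price.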
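The EV charging idea, charging only in the lowest-load hours of the time the car is plugged in, can be sketched as a tiny scheduler. This is a toy illustration and not how any utility program actually works: the function name `pick_charging_hours`, the hourly granularity and the forecast numbers are all made up for the example.

```python
# Toy load-shifting sketch: choose the lowest-load hours inside the window
# in which the car is plugged in. All names and numbers are hypothetical.

def pick_charging_hours(load_by_hour, hours_needed):
    """Return the indices of the lowest-load hours, in chronological order."""
    if hours_needed > len(load_by_hour):
        raise ValueError("charging window is shorter than the charge time needed")
    # Rank hours by forecast load; ties go to the earlier hour.
    ranked = sorted(range(len(load_by_hour)), key=lambda h: (load_by_hour[h], h))
    return sorted(ranked[:hours_needed])

# Hypothetical overnight load forecast in arbitrary units; hour 0 = plug-in time.
forecast = [90, 80, 60, 40, 35, 30, 45, 70]
schedule = pick_charging_hours(forecast, 3)
print(schedule)  # hours 3, 4 and 5 carry the lowest forecast load
```

A real controller would also have to deal with forecast updates and charger power limits; the sketch only shows the selection step.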
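Other small things
Commercial districts and large shopping centers in some California cities have started installing EV charging stations on their own. These stations are free during off-peak periods (weekends or holidays), which attracts customers who drive electric cars to come, charge, and shop.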
Parameterized example RTL code for a register-based SRAM read circuit, using the “generate” feature
    parameter d = 32;            // FIFO depth
    parameter w = 64;            // FIFO data bit-width
    logic [w-1:0] mem [d-1:0];   // FIFO memory array
    logic [d-1:0] rwl;           // 1-hot read word line

    // read circuit using "generate"
    wire [w-1:0] word_or;
    genvar width, depth;
    generate
      for (width = 0; width < w; width++) begin: rbit
        wire [d-1:0] bit_or;
        for (depth = 0; depth < d; depth++) begin: rmux
          assign bit_or[depth] = mem[depth][width] & rwl[depth];
        end
        assign word_or[width] = |bit_or;
      end
    endgenerate

    reg [w-1:0] idout;
    always @ (negedge CKB) begin
      idout <= word_or;
    end
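For readers who want to check the AND-OR read structure without a simulator, here is a small behavioral model in Python. It is only a sketch of the combinational part (the `word_or` logic) and ignores the negedge CKB output register. The function name `and_or_read` and the 4-entry, 8-bit example memory are my own illustrative choices, not part of the RTL.

```python
# Behavioral sketch of the AND-OR read mux: every entry is ANDed with its
# read word line, and the results are ORed together bit by bit.
# The function name and the example data are illustrative assumptions.

def and_or_read(mem, rwl, width):
    """mem: list of integer words; rwl: list of 0/1 read word lines."""
    full_mask = (1 << width) - 1
    word_or = 0
    for entry, select in zip(mem, rwl):
        gate = full_mask if select else 0  # replicate the select bit across the word
        word_or |= entry & gate
    return word_or

mem = [0x11, 0x22, 0x44, 0x88]
selected = and_or_read(mem, [0, 0, 1, 0], width=8)
print(hex(selected))
```

With a proper one-hot `rwl` the result is exactly the selected entry. If two word lines are high at once, the model returns the bitwise OR of both entries, which is why the read word line has to stay one-hot.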
These are my reading notes on the book “SystemVerilog for Design (2nd edition)”. As a non-full-time RTL designer, I found it eye-opening. But still, I’m sad about the ancient tools that we are using to design hardware.
Chapter 2: SystemVerilog Declaration Spaces
- Package
  - Verilog shortage: no global declaration
  - package ... endpackage
  - share user-defined type definitions across multiple modules
  - independent of modules
  - parameters cannot be redefined in a package; a parameter there is similar to a localparam, because a localparam in a module cannot be directly redefined at instantiation
  - referencing
    - :: the scope resolution operator: package_name::package_member
    - use import to import a package into the current space
      - import package_name::package_member
        - TIPS: importing an enumerated type definition will not import the labels automatically
      - import package_name::* (only what is used will be imported)
      - $unit declaration space
  - TIPS: synthesis guide
    - tasks and functions must be automatic
    - storage for an automatic task/function is allocated each time it’s called
    - cannot use static variables, which are supposed to be shared by all instances
- $unit: compilation-unit declarations
  - declaration space outside of package/module/interface/program
  - BUT it’s not global
    - if variables and nets are put in $unit, source code order can affect the usage of a declaration external to the module
    - each compilation has one $unit
    - single-file compilation
    - multiple-file compilation: source order is tricky
  - TIPS: coding guide
    - DO NOT make any declarations in $unit space, only import packages into $unit
    - ILLEGAL to import the same package more than once into the same $unit
    - NOTE: does not work for global variables, static task/function
// filename: def.