# Design and analysis of flip flop for low power VLSI applications - A Review

<sup>1</sup>Yash Vardhan, <sup>2</sup>Dhruv Nair, <sup>3</sup>M. Vinoth Kumar

<sup>1,2</sup>B.Tech Student, <sup>3</sup>Assistant Professor, Department of ECE, SRM University, NCR Campus

Abstract- Technology and speed have always progressed hand in hand, from small-scale integration to large and very large scale integration, and from megahertz (MHz) to gigahertz (GHz). System requirements have risen along with this continuous advancement in process technology and operating speed. Low-power flip-flops are crucial to the design of low-power digital systems. In this paper we analyse various flip-flop designs and the optimization of their parameters. We review low-power flip-flops and compare their performance in areas such as cost, power consumption and delay.

Keywords: Low Power Flip-Flop, Power Consumption, Area, Cost

### 1. INTRODUCTION

As the feature size of CMOS process technology shrinks in accordance with Moore's Law, designers are able to integrate more transistors onto the same integrated circuit. As the number of transistors increases, switching activity increases and hence so does power dissipation. Heat is one of the major packaging challenges of this period and one of the main reasons for the emphasis on low-power design methodologies and practices. Another reason for pursuing low-power research is the reliability of the integrated circuit. In addition, the clock system, which consists of the clock distribution network and timing elements (flip-flops and latches), is one of the most power-consuming components in a VLSI system [1]. It accounts for 30% to 60% of the total power dissipation in a system. As a result, reducing the power consumption of flip-flops has a major impact on the total power consumption of the circuit. In this paper we discuss different techniques that have been proposed to reduce power consumption, clock distribution load, area and cost.

Section 1 discusses the technique "A New Low Power High Performance Flip-Flop" [1], proposed by Ahmed Sayed and Hussain Al-Asaad. Section 2 discusses the technique "Conditional Data Mapping Flip-Flops for Low-Power and High-Performance Systems" [2], proposed by Chen Kong Teh et al. Section 3 discusses the technique "Design of Reversible Synchronous Sequential Circuits Using Pseudo Reed-Muller Expressions" [3], proposed by Mozammel H. A. Khan. Section 4 discusses the technique "Low-Power Clock Branch Sharing Double-Edge Triggered Flip-Flop" [4], proposed by Peiyi Zhao et al. Section 5 discusses the technique "Design of a Fully-Static Differential Low-Power CMOS Flip-Flop" [5], proposed by T. Yalcin and N. Ismailoglu. Section 6 discusses the technique "Ultra Low Power Clocking Scheme Using Energy Recovery and Clock Gating" [6], proposed by Hamid Mahmoodi et al.

#### **SECTION 1**

In this section, we discuss the technique proposed by Ahmed Sayed and Hussain Al-Asaad, called "A New Low Power High Performance Flip-Flop" [1]. The authors identify the move towards a mobile environment as an important reason behind the shift towards low-power research. The trend towards smaller, technologically advanced devices has run into the problem of shorter battery life, which means that low-power issues have to be addressed. The authors have therefore designed a flip-flop that aims at reducing the dynamic power component of the average power. For CMOS digital circuits, the average power is given by the equation:

$$P_{avg} = p_t\left(C_L V V_{dd} f_{clk}\right) + I_{sc}V_{dd} + I_{leakage}V_{dd} \tag{1}$$

Thus the low-power design goal becomes the task of minimizing  $p_t(C_LVV_{dd}f_{clk})$ , while retaining the required functionality and identifying the cost of such minimizations in terms of area and/or performance.
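To make the relative weight of the three terms in Eq. (1) concrete, the following minimal sketch evaluates them for one operating point. The symbol meanings used here are the customary ones and are an assumption, since the text does not define them: $p_t$ is the switching probability, $C_L$ the load capacitance, $V$ the voltage swing, $V_{dd}$ the supply voltage, $f_{clk}$ the clock frequency, and $I_{sc}$ and $I_{leakage}$ the short-circuit and leakage currents. All numeric values are hypothetical and are not taken from [1].

```python
def average_power(p_t, c_load, v_swing, v_dd, f_clk, i_sc, i_leak):
    """Evaluate Eq. (1); returns (switching, short_circuit, leakage, total) in watts."""
    switching = p_t * c_load * v_swing * v_dd * f_clk   # dynamic (switching) term
    short_circuit = i_sc * v_dd                         # short-circuit term
    leakage = i_leak * v_dd                             # leakage term
    return switching, short_circuit, leakage, switching + short_circuit + leakage


# Hypothetical operating point (illustrative values only).
sw, sc, lk, total = average_power(
    p_t=0.25, c_load=20e-15, v_swing=1.2, v_dd=1.2, f_clk=500e6,
    i_sc=0.2e-6, i_leak=0.05e-6,
)
print(f"switching: {sw * 1e6:.2f} uW, short-circuit: {sc * 1e6:.2f} uW, "
      f"leakage: {lk * 1e6:.2f} uW, switching share: {sw / total:.1%}")
```

With numbers of this kind the switching term dominates, which is why the design goal stated above concentrates on that term.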



Fig. 1 Power PC 603 MS latch

The authors present each of the flip-flop circuits considered in their paper, accompanied by a short description of each circuit. Fig. 1 shows the PowerPC master-slave latch, which is one of the fastest classical structures. Its main advantages are the short direct path and the low-power feedback. However, the heavy load on the clock strongly affects the total power consumed by the flip-flop. It is also sensitive to clock signal slopes and data feedthrough, which is a further concern when using it.



Fig. 2 DSTC Flip Flop

Fig. 2 shows the dynamic single-transistor-clocked (DSTC) flip-flop, which suffers from a substantial voltage drop at the output due to the capacitive coupling between the common node of the slave latch and the floating output driving node of the master latch.



Fig. 3 New ETL flip-flop

The new edge-triggered latch (ETL) shown in Fig. 3 is a modification of the K6 ETL in which the jam latches are replaced and pull-down transistors are added to create cross-coupled inverters. Without the pull-down transistors of the back-to-back inverters the flip-flop is still functional, but the internal zero nodes suffer from cross coupling with the clock signal, which increases the dynamic power consumption and reduces the noise margins. The output inverters are not needed for correct circuit operation but are included for general loading situations.

According to the authors, the third flip-flop, the new ETL flip-flop, is the best of the three because it has fewer transistors and hence a smaller area. Although it does not have the best overall power consumption, it has the best power consumption at its optimal setup delay, with moderate area.

#### **SECTION 2**

In this section, we discuss the technique "Conditional Data Mapping Flip-Flops for Low-Power and High-Performance Systems" [2], proposed by Chen Kong Teh et al. The paper introduces a family of low-power, high-performance flip-flops called Conditional Data Mapping Flip-Flops (CDMFFs), which reduce their dynamic power by mapping their inputs to a configuration that eliminates redundant internal transitions. It presents two CDMFFs, with differential and single-ended structures respectively, and compares them to state-of-the-art flip-flops. The authors observe that flip-flops and latches consume a large portion of system power because of redundant transitions of their internal nodes, which occur when a clock edge triggers the flip-flop but the logic state of its output does not change. To reduce this redundancy, various techniques and corresponding flip-flops have been proposed, such as the data look-ahead flip-flop (DLFF), clock-on-demand flip-flop (CODFF), conditional precharge flip-flop (CPFF), conditional capture flip-flop (CCFF), and conditional discharge flip-flop (CDFF). The common idea behind DLFF and CODFF is to insert conditional circuitry into the clock path to cut off the clock signal in the case of a redundant event.

The paper illustrates the power breakdown at 50% data activity. The components of the breakdown are: 1) external clock power, i.e., power dissipation at the clock terminal; 2) external data power, i.e., power dissipation at the data terminal; 3) internal clock power, i.e., internal power dissipation at 0% data activity; and 4) internal non-clock power, i.e., the difference between internal power and internal clock power. The results indicate that s-CDMFF consumes the least power in the single-ended group at any data activity. In the differential group, d-CDMFF consumes the least power at data activities below 50%. The strong power reduction of the CDMFFs comes mainly from the use of conditional circuitry in the data path, which saves more power as the data activity decreases.
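The four-way breakdown described above can be written as a small bookkeeping sketch. The function name and the measurement values are hypothetical and are not data from [2]; the only relation taken from the text is that internal clock power is the internal power at 0% data activity, and internal non-clock power is the remainder of the internal power.

```python
def power_breakdown(ext_clock, ext_data, internal_total, internal_at_zero_activity):
    """Split flip-flop power (any consistent unit) into the four components."""
    internal_clock = internal_at_zero_activity             # component 3
    internal_non_clock = internal_total - internal_clock   # component 4
    return {
        "external clock": ext_clock,
        "external data": ext_data,
        "internal clock": internal_clock,
        "internal non-clock": internal_non_clock,
    }


# Hypothetical measurements in microwatts at 50% data activity.
parts = power_breakdown(ext_clock=2.0, ext_data=1.0,
                        internal_total=12.0, internal_at_zero_activity=5.0)
total = sum(parts.values())
for name, value in parts.items():
    print(f"{name}: {value:.1f} uW ({value / total:.0%})")
```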

At 25% activity, d-CDMFF consumes 31% less power than SAFF and 26% less power than d-CCFF. Among the single-ended flip-flops at 25% activity, s-CDMFF consumes 41% less power than TGPL, 31% less power than s-CCFF, and 6% less power than TGFF. At 0% activity, the CDMFFs consume the least power, indicating that the power consumed in their clock paths is very low.

The authors therefore propose the conditional data mapping methodology as a better way of eliminating redundant internal transitions of a flip-flop, by mapping its inputs to a certain configuration. Using this methodology, they created two CDMFFs, which have the best power-delay product among the state-of-the-art flip-flops. These have not only small delays, comparable to those of high-performance flip-flops, but also outstanding power reduction at various data activities. The results indicate that the single-ended CDMFF has 34% less data-to-output delay and 28% less power at 25% data activity, in spite of a 34% increase in size.
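As a rough worked example of what these figures imply for the power-delay product (PDP), the sketch below combines the two reported reductions. It assumes, without confirmation from the text, that both percentages are measured against the same baseline flip-flop and that the PDP is the product of power and data-to-output delay; the function name is hypothetical.

```python
def relative_pdp(delay_reduction, power_reduction):
    """PDP of the new design as a fraction of the baseline PDP."""
    return (1.0 - delay_reduction) * (1.0 - power_reduction)


# 34% less data-to-output delay and 28% less power at 25% data activity.
ratio = relative_pdp(delay_reduction=0.34, power_reduction=0.28)
print(f"relative PDP: {ratio:.3f}")
```

Under these assumptions the single-ended CDMFF would have a PDP of a little under half that of the baseline, at the price of the 34% larger size.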

#### **SECTION 3**

In this section, the technique "Design of Reversible Synchronous Sequential Circuits Using Pseudo Reed-Muller Expressions" [3], proposed by Mozammel H. A. Khan, is discussed. The author explains the concept behind reversible logic and describes the earlier approach of building reversible sequential circuits by replacing the latches, flip-flops and associated combinational gates of traditional irreversible designs with their reversible counterparts. He proposes instead to design synchronous sequential circuits directly from reversible gates, using pseudo Reed-Muller expressions to represent the state transition and output functions of the circuit. He also presents designs of arbitrary synchronous sequential circuits as well as of counters and registers. A reversible circuit is constructed as a network of reversible gates. Fig. 4 shows the commonly used reversible gates: the $1 \times 1$ NOT gate, $2 \times 2$ Feynman gate, $3 \times 3$ Toffoli gate, and $3 \times 3$ Fredkin gate. Design examples show that the proposed direct designs save around 1.54%–49.09% of the quantum cost and approximately 51.43%–81.82% of the garbage outputs compared with the replacement design approach suggested earlier.



Fig. 4 (a) NOT gate (b) Feynman gate (c) Toffoli gate (d) Fredkin gate

Thus, the proposed direct design method outperforms the previously reported replacement design approach.
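The four gates of Fig. 4 can be modelled as bit-level functions to show what "reversible" means in practice: each gate has as many outputs as inputs and maps distinct inputs to distinct outputs. The sketch below uses the standard textbook definitions of these gates; the function names and the exhaustive check are illustrative and are not part of the design method of [3].

```python
from itertools import product


def not_gate(a):
    return (1 - a,)


def feynman(a, b):
    # Controlled NOT: b is inverted when a is 1.
    return (a, a ^ b)


def toffoli(a, b, c):
    # Controlled-controlled NOT: c is inverted when a and b are both 1.
    return (a, b, c ^ (a & b))


def fredkin(a, b, c):
    # Controlled swap: b and c are exchanged when a is 1.
    return (a, c, b) if a else (a, b, c)


def is_reversible(gate, n):
    """True if the n-input gate maps the 2**n input vectors to 2**n distinct outputs."""
    inputs = list(product((0, 1), repeat=n))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)


GATES = {"NOT": (not_gate, 1), "Feynman": (feynman, 2),
         "Toffoli": (toffoli, 3), "Fredkin": (fredkin, 3)}

for name, (gate, n) in GATES.items():
    print(f"{name}: reversible = {is_reversible(gate, n)}")
```

Each of these gates is also its own inverse, so applying it twice returns the original inputs.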

#### **SECTION 4**

This section discusses the technique "Low-Power Clock Branch Sharing Double-Edge Triggered Flip-Flop" [4], proposed by Peiyi Zhao et al. The technique uses a clock branch-sharing scheme to reduce the number of clocked transistors in the design. The proposed design also employs conditional discharge and split-path techniques to reduce switching activity as well as short-circuit currents.



Fig. 5 Conventional dual edge flip-flop

One effective way to decrease power consumption is voltage scaling. Another method is double-edge clocking, which can be used to save half of the power in the clock distribution network. The power of the clocking system is $P_{\text{clocking system}} = P_{\text{clock distribution network}} + P_{\text{flip-flop}}$. Halving the clock frequency reduces the power consumed by the clock distribution network by almost fifty percent. The authors state that pulse-triggered flip-flops are better than the contemporary flip-flops because they reduce the two stages to one and have smaller delays. Moreover, they are characterized by the soft-edge property. The authors discuss a few conventional techniques for implementing double-edge triggered flip-flops (DEFFs) and then compare these flip-flops with the proposed flip-flop.
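The effect of double-edge clocking on the clock distribution network can be sketched with the usual dynamic power relation $P = C V_{dd}^2 f$ for a full-swing clock net. This relation, the function name and all numeric values are assumptions made for illustration; in particular, the flip-flop power is held constant here, whereas a real double-edge flip-flop generally consumes a different amount of power than its single-edge counterpart.

```python
def clocking_power(c_network, v_dd, f_clk, p_flip_flops):
    """Return (total clocking power, clock network power) in watts."""
    p_network = c_network * v_dd ** 2 * f_clk   # full-swing clock net, one cycle per period
    return p_network + p_flip_flops, p_network


# Hypothetical chip: 100 pF clock network, 1.2 V supply, 50 mW in flip-flops.
single_total, single_net = clocking_power(100e-12, 1.2, 1.0e9, 0.05)
# Double-edge triggering: same data rate at half the clock frequency.
double_total, double_net = clocking_power(100e-12, 1.2, 0.5e9, 0.05)

print(f"network power saved: {1 - double_net / single_net:.0%}")
print(f"clocking system power saved: {1 - double_total / single_total:.0%}")
```

The saving for the whole clocking system is smaller than the saving in the network, because the flip-flop term does not scale with the halved frequency in this simple model.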

The first technique they discuss is the "Conventional Master-Slave Double-Edge Triggered Flip-Flop". The general scheme is shown in Fig. 5. The conventional way of designing DEFFs is to duplicate the latch part of the single-edge flip-flop so that input data is sampled at both clock edges. This duplication nearly doubles the area and also increases the load on the data and clock inputs, which ultimately hampers performance.



Fig. 6 Dual-edge static hybrid flip-flop (ep-DSFF)

The second technique they discuss is "Flip-Flops with Explicit Pulse Generator Schemes". The master-slave flip-flop has the hard-edge property, whereas pulsed flip-flops allow cycle stealing and are skew tolerant. Explicit DEFFs use a pulse generator outside the latching part, so the data latch part does not need to be duplicated, unlike in conventional dual-edge triggered flip-flops. The schematic diagram of the explicit-pulsed dual-edge triggered static hybrid flip-flop (ep-DSFF) is shown in Fig. 6.

This design achieves a transparency window through an explicitly generated pulse. The pulse generator's design is based on TG-based XOR logic. The design has a simple structure on the critical path, so it may have less capacitive load on the critical path.

However, it suffers from certain drawbacks: it has an exposed diffusion input that is subject to noise, and the ep-DSFF has a ratio issue along with poor robustness and driveability.



Fig. 7 Symmetric pulse generator flip-flop (SPGFF)

The third conventional technique discussed in their paper is "Flip-Flops with Implicit Pulse-Generator Schemes". Implicit pulsed double-edge flip-flops use two series devices embedded in the logic branch, receiving a clock and a delayed clock respectively. The symmetric pulse generator flip-flop (SPGFF), shown in Fig. 7, achieves dual-edge triggering with two symmetric stages. Each stage responds to one particular transition of the clock, hence the name. Finally, the authors compare these conventional flip-flops with their newly proposed double-edge clock branch sharing implicit pulsed flip-flop (shown in Fig. 8).



Fig. 8 Proposed DE Clock Branch Sharing Implicit Pulsed Flip-Flop

Conventional DEFFs duplicate the area and the load on the inputs. Explicit pulsed DEFFs use external clock pulse generators, which increase the power. Moreover, explicit pulsed DEFFs cannot work with dynamic logic. SPGFF uses implicit pulsing; however, it has four internal redundant switching nodes. Unlike SPGFF, DECPFF eliminates the redundant switching activity; however, the number of clocked transistors reaches 21, and the clock branch duplicating structure is complex.

A comparison between the SPGFF, ep-DSFF, and the newly proposed DECPFF has been presented. Different designs are analysed in terms of various parameters such as PDP, DQ delay, power, low swing driving ability, total transistor width, area, CQ delay, setup time, and leakage power. It is seen that SPGFF suffers from large power consumption because of the large number of the nodes switching with the clock. The ep-DSFF has only two gates in the critical path with a simple structure and is thus susceptible to noise.

Thus the new flip-flop proves to be better than the rest, especially in terms of low power consumption. This is mainly due to four reasons. First, it has a clock branch sharing topology, in which fewer transistors are clocked, which efficiently reduces the load on the clock. Second, the conditional discharge technique employed in the latch eliminates the redundant switching activity. Third, the split-path technique reduces the short-circuit current. Fourth, an implicit pulse generator scheme with one inverter delay is used, which further reduces power consumption. The authors state that the DECP flip-flop has the least number of clocked transistors and the lowest power; hence it is suitable for use in high-performance and low-power environments.

#### **SECTION 5**

This Section talks about "Design of Fully-Static Differential Low-Power CMOS Flip-Flop" [5].



Fig. 9 Yuan and Svensson's CVSL Flip-Flop

This technique is proposed by T. Yalcin and N. Ismailoglu. They propose a fully static structure and compare it to a conventional CMOS flip-flop as well as to the Cascode Voltage Switch Logic (CVSL) flip-flop proposed by Yuan and Svensson, in terms of speed, power consumption and area. The conventional switch-and-inverter based flip-flops, while providing a very stable and easy-to-implement structure, suffer from a lack of flexibility: extra logic and functionality cannot easily be embedded into such structures. They also consume more power than the other flip-flops. In their study, a new fully-static flip-flop structure is proposed and compared to the conventional static CMOS flip-flop and the fully-static CVSL flip-flop proposed by Yuan and Svensson (Fig. 9). An add-and-delay circuit, which is a basic building block for most digital signal processing applications, is designed using this new flip-flop with Differential Cascode Voltage Switch with Pass-Gate (DCVSPG) techniques and compared to a similar circuit implemented with Yuan's CVSL flip-flop.



Fig. 10 New CVSL Flip-flop

The circuit schematic of the proposed static CVSL flip-flop is given in Fig. 10. This flip-flop structure is obtained by modifying the first (p) stage of Yuan and Svensson's flip-flop as follows. The clocked transistors at the input stage are modified into clock-controlled n-type pass transistors. This way, no floating nodes are left to be pulled up by the P1-P2 transistors when the clock is in its zero state. Transistor NC1 is added to the first stage, while the inverter and the N3-N4 transistors, used to keep the internal nodes of the flip-flop at zero potential, are eliminated; the N1-N2 transistors remain. Simulation results for both flip-flops are given in Table 3 of [5]. As seen in that table, the delay characteristics of both flip-flops are comparable, while the new CVSL flip-flop shows uniform and lower power consumption with respect to load and frequency. The proposed CVSL flip-flop is also better in terms of silicon area and number of transistors, since it uses only 11 transistors, whereas Yuan's flip-flop uses 14 and a conventional structure uses 18. Thus, the authors present the new CVSL flip-flop as a better alternative to the conventional flip-flop.

#### **SECTION 6**

In this section, we discuss the technique "Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating" [6], proposed by Hamid Mahmoodi et al. The authors propose four novel energy recovery clocked flip-flops that enable energy recovery from the clock network, resulting in notable energy savings. The proposed flip-flops operate with a single-phase sinusoidal clock. Because the use of a sinusoidal clock signal for energy recovery prevents the application of existing clock gating solutions, the authors also propose clock gating solutions for energy recovery clocking. Energy recovery is a technique developed for low-power digital circuits. Energy recovery circuits achieve low energy dissipation by restricting current to flow across devices with a low voltage drop and by recycling the energy stored on their capacitors using an ac-type (oscillating) supply voltage. In their paper, the authors apply energy recovery techniques to the clock network, since the clock signal is typically the most capacitive signal in a chip. The proposed energy recovery clocking scheme recycles the energy from this capacitance in each cycle of the clock.



Fig. 11 SAER flip-flop

Another known technique for reducing clock power is clock gating. Even though energy recovery clocking results in a substantial reduction in clock power, some energy is still lost in the resistances of the clock network and in the oscillator due to uncontrolled switching. Therefore, there is still scope for reducing the clock power by gating the clock during idle periods. The first proposed energy recovery clocked flip-flop, the sense amplifier energy recovery (SAER) flip-flop, is shown in Fig. 11. This flip-flop, which is based on the sense amplifier flip-flop, is a dynamic flip-flop with precharge and evaluate phases of operation.

This flip-flop is designed to operate with an energy recovery clock. Evaluation occurs when the clock voltage exceeds the threshold voltage of the clock transistor (MN1). At the onset of evaluation, the difference between the differential data inputs (D and DB) results in a small initial voltage difference between the SET and RESET nodes. This small voltage difference is then amplified by the cross-coupled inverters, causing either SET or RESET to switch low. This state transition is captured by the set/reset latch (cross-coupled NAND gates) and retained for the rest of the cycle until the next evaluation occurs. The SET and RESET nodes are precharged high when the clock voltage falls below $V_{dd} - V_{tp}$, where $V_{tp}$ is the threshold voltage of the precharging transistors (MP1 and MP2). Even though the SAER flip-flop is fast and consumes low power at high data switching activities, its main drawback is that either the SET or the RESET node is charged and discharged in every cycle, regardless of the data activity. This leads to considerable power consumption at low data switching activities, where the data does not change frequently.



Fig. 12 SDER flip-flop

Fig. 12 shows the static differential energy recovery (SDER) flip-flop. This flip-flop is a static pulsed flip-flop similar to the dual-rail static edge-triggered latch (DSETL). The energy recovery clock is applied to a minimum-sized inverter skewed for fast high-to-low transition. This flip-flop is static because SET and RESET nodes statically retain the state of the flip-flop without being precharged. The static nature of the flip-flop ensures that there is no internal redundant switching on SET and RESET nodes if input data remains idle. This results in power saving for low data switching activities. In this flip-flop, when the state of the input data is the same as its state in the previous conduction phase, there are no internal transitions. Thus, power consumption is minimized for low data switching activities.
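The difference between the dynamic SAER and the static SDER behaviour described above can be illustrated with a purely behavioural sketch that counts the clock cycles in which the SET/RESET nodes switch. This is not a circuit-level model: the function names and the data sequences are hypothetical, and the only behaviour taken from the text is that the dynamic flip-flop discharges and precharges one of its nodes in every cycle, while the static one switches only when the sampled data differs from the stored state.

```python
def switching_cycles_dynamic(data):
    """SAER-like: SET or RESET is discharged and precharged in every clock cycle."""
    return len(data)


def switching_cycles_static(data, initial_q=0):
    """SDER-like: internal nodes switch only when the sampled data changes the state."""
    q = initial_q
    cycles = 0
    for d in data:
        if d != q:
            cycles += 1
            q = d
    return cycles


idle_data = [0, 0, 0, 0, 0, 0, 0, 0]   # 0% data activity
busy_data = [0, 1, 0, 1, 0, 1, 0, 1]   # data toggles almost every cycle

for label, data in (("idle", idle_data), ("busy", busy_data)):
    print(f"{label}: dynamic = {switching_cycles_dynamic(data)}, "
          f"static = {switching_cycles_static(data)}")
```

For idle data the static model records no internal switching at all, while the dynamic model still switches in every cycle, which matches the drawback of SAER at low data activity noted in the text.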



Fig. 13 DCCER flip-flop

Fig. 13 shows the differential conditional-capturing energy recovery (DCCER) flip-flop. Similar to a dynamic flip-flop, the DCCER flip-flop operates in a precharge and evaluate manner. However, instead of using the clock for precharging, small pull-up PMOS transistors (MP1 and MP2) are used to charge the precharge nodes (SET and RESET). The DCCER flip-flop uses a NAND-based set/reset latch as its storage mechanism. Although MP1 and MP2 are statically ON, they do not cause static power dissipation: as soon as data sampling finishes and Q takes the value of D, the pull-down paths are turned off and the SET and RESET nodes are pulled back high without any loss of power. Another property of the circuit is that the clock transistor (MN1) is placed at the bottom of the stack. The diffusion capacitance of the source terminal of MN1 is therefore grounded and does not contribute to charge sharing, which helps reduce charge sharing.



Fig. 14 shows a single-ended conditional capturing energy recovery (SCCER) flip-flop, which is a single-ended version of the DCCER flip-flop. The authors' analysis indicates that the proposed flip-flops exhibit a delay reduction of 70-80% or more, a power reduction of up to 46%, and an area reduction of up to 77%, compared with the FPTG flip-flop. Results are also reported for the power consumed during the sleep (clock-gated) mode at 50% data switching activity. Notable power savings are seen when clock gating is applied to the flip-flop during the idle state: the power is more than 1000 times lower than without clock gating. The power savings increase as the data switching activity increases.

| FLIP-FLOP | TOTAL POWER |
|-----------|-------------|
| CDMFF     | 28% less    |
| DECP      | 20% less    |
| SCCER     | 56.6 µW     |
| DCCER     | 62 µW       |
| SDER      | 82.5 µW     |

#### **Table-1 Power Consumption of Flip-Flops**

Based on power consumption, the single-ended conditional capturing energy recovery (SCCER) flip-flop consumes the least power.

| FLIP-FLOP | COST              |
|-----------|-------------------|
| CDMFF     | save 1.54%–49.09% |
| PSDRM     | 10.81% less       |

#### **Table-2 Cost Factor of Flip-Flops**

Based on the cost factor, CDMFF is the cheapest, because it uses fewer transistors and incorporates the conditional data mapping technique.

| FLIP-FLOP | AREA                    |
|-----------|-------------------------|
| CDMFF     | 34% increase in size    |
| ETL       | Less no. of transistors |
| DCCER     | 77% less in size        |

#### **Table-3 Area of Flip-Flops**

Based on the size of the flip-flop, the differential conditional capturing energy recovery (DCCER) flip-flop proves to be the optimum.

Fig. 15 Power consumed by flip-flop circuits (chart title: Power Consumption, μW)

#### **CONCLUSION**

In this paper, different techniques for designing low-power flip-flops, including low-power pulse-triggered flip-flops, have been studied and compared. Of these, the best method was found to be the "Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating" [6], because it consumed the least power.

#### REFERENCES

- [1] Ahmed Sayed and Hussain Al-Asaad, "A New Low Power High Performance Flip-Flop", IEEE International Midwest Symposium on Circuits and Systems, 2006.
- [2] Chen Kong Teh, Mototsugu Hamada, Tetsuya Fujita, Hiroyuki Hara, Nobuyuki Ikumi, and Yukihito Oowaki, "Conditional Data Mapping Flip-Flops for Low-Power and High-Performance Systems", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 12, December 2006.
- [3] Mozammel H. A. Khan, "Design of Reversible Synchronous Sequential Circuits Using Pseudo Reed-Muller Expressions", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 11, November 2014.
- [4] Peiyi Zhao, Jason McNeely, Pradeep Golconda, Magdy A. Bayoumi, Robert A. Barcenas, and Weidong Kuang, "Low-Power Clock Branch Sharing Double-Edge Triggered Flip-Flop", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 3, March 2007.
- [5] T. Yalcin and N. Ismailoglu, "Design of a Fully-Static Differential Low-Power CMOS Flip-Flop", Proceedings of the 1999 IEEE International Symposium on Circuits and Systems (ISCAS '99), vol. 1, pp. 331-333, 1999.
- [6] H. Mahmoodi, V. Tirumalashetty, M. Cooke, and K. Roy, "Ultra Low Power Clocking Scheme Using Energy Recovery and Clock Gating", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 1, pp. 33-44, January 2009.

