

# Performance Evaluation of an Efficient ALU

Mary Sajin Sanju.I<sup>1</sup>, M. Vadivel<sup>2</sup>

<sup>1</sup> Research Scholar, Sathyabama Institute of Science and Technology, Chennai, India

<sup>2</sup> Vidya Jyothi Institute of Technology, Hyderabad, India

The arithmetic logic unit (ALU) is an important building block in many applications such as microprocessors, digital signal processors (DSPs) and image processing. Power efficiency is a general concern in VLSI design. This paper presents delay-time optimization of a 4-bit ALU designed using the full-swing gate diffusion input (GDI) technique. An efficient ALU was designed and simulated with the HSPICE tool in 130 nm technology. The modified ALU gives better performance in terms of power, delay and energy and is suitable for high-speed, low-power applications.

**Keywords:** Arithmetic Logic Unit (ALU); Gate Diffusion Input (GDI); Full-Swing GDI (FS-GDI).

## 1. Introduction

Power consumption and area are among the most pressing issues in the semiconductor industry, and they have motivated great research effort to reduce the power consumption and, concomitantly, the area of VLSI circuits. Portable electronic devices, used daily worldwide, have severely limited power budgets, yet they must be both low-power and high-speed.

Compared to CMOS and pass transistor logic (PTL) techniques, the gate diffusion input (GDI) technique improves the power consumption, propagation delay and area of VLSI digital circuits [1].

In 2019, Mahmoud Aymen Ahmed designed an ALU using the full-swing GDI technique [2]. Simulations carried out in Cadence Virtuoso using a 65 nm TSMC process with a 1.2 V supply and a 125 MHz clock showed improved delay time and overall energy for the optimized ALU design. However, the original GDI technique was proposed for fabrication in twin-well CMOS or silicon-on-insulator (SOI) processes and, as with PTL, GDI gates suffer from reduced output voltage swing due to threshold drops [3]. This increases static power dissipation and degrades performance.
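The GDI behaviour referred to above can be illustrated with a small Boolean model. As described in [3], the basic GDI cell has a common gate input G, a PMOS diffusion input P and an NMOS diffusion input N, so that Out = G'·P + G·N; different assignments of (G, P, N) yield AND, OR, NOT and multiplexer functions. The Python sketch below is a minimal, idealized model under that assumption: it captures only the logic function and ignores the threshold drops that cause the reduced swing. The function names are hypothetical.

```python
from itertools import product


def gdi_cell(g: int, p: int, n: int) -> int:
    """Ideal Boolean model of the basic GDI cell: Out = G'.P + G.N.

    When G = 0 the PMOS conducts and Out follows P; when G = 1 the NMOS
    conducts and Out follows N. Threshold-voltage drops are ignored.
    """
    return n if g else p


# Common (G, P, N) assignments for functions of inputs A and B.
def gdi_and(a: int, b: int) -> int:
    return gdi_cell(a, 0, b)  # G=A, P=0, N=B -> A.B


def gdi_or(a: int, b: int) -> int:
    return gdi_cell(a, b, 1)  # G=A, P=B, N=1 -> A + B


def gdi_not(a: int) -> int:
    return gdi_cell(a, 1, 0)  # G=A, P=1, N=0 -> A'


def gdi_mux(sel: int, d0: int, d1: int) -> int:
    return gdi_cell(sel, d0, d1)  # sel=0 -> d0, sel=1 -> d1


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, gdi_and(a, b), gdi_or(a, b), gdi_not(a))
```

In a real GDI gate, whenever the PMOS has to pass a '0' or the NMOS has to pass a '1' (for example, the OR configuration with A = 1, where the NMOS passes N = 1), the output level is degraded by roughly one threshold voltage; this is the swing loss that the full-swing variant is intended to remove.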

In 2019, Alex F. Kirichenko designed and tested a parallel 8-bit ERSFQ ALU comprising 6840 Josephson junctions [4]. The ALU design employed wave-pipelined instruction execution and featured a modular bit-slice architecture that is easily extendable to any number of bits and adaptable to current recycling. A carrier signal synchronized with asynchronous instruction propagation provided the wave-pipeline operation of the ALU [5]. Its instruction set consists of 14 arithmetic and logical instructions. It was designed and simulated for operation at clock rates up to 10 GHz and embedded in a shift-register-based high-frequency test bed with an on-chip clock generator, allowing comprehensive high-frequency testing for all possible operands. It was fabricated with MIT Lincoln Laboratory's 10 kA/cm² SFQ5ee process, featuring eight Nb wiring layers and the high-kinetic-inductance layer needed for ERSFQ technology.

In 2018, Parth Khatter designed an ALU using basic reversible gates [6]. Two designs were proposed for each of the arithmetic and logical units, together with one design for the control unit; integrating them gave four complete ALU designs. Each design was implemented in Verilog HDL using the Xilinx ISE Suite 14.1 software to verify its functionality [7]. The proposed designs were compared with each other and with existing counterparts in terms of quantum cost, garbage outputs and ancillary inputs [8]. Their simplicity and reduced quantum cost make them ideal candidates for use as modules in quantum computers, and their applications extend to cryptography, machine learning and nanotechnology [9]. Technological advances have made it possible to integrate billions of transistors on a single die, giving designers the flexibility and freedom to put ever more functionality on the same die. This has, however, increased power consumption, which has motivated a plethora of techniques for dealing with it.
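To make the terms "garbage output" and "ancillary input" concrete, the Python sketch below uses the standard Peres gate, (A, B, C) → (A, A⊕B, AB⊕C), as an example of a basic reversible gate. This is a generic illustration, not one of the designs of [6]: fixing C = 0 as an ancillary input turns the gate into a half adder, and the unused output P = A becomes a garbage output. The helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple


def peres(a: int, b: int, c: int) -> Tuple[int, int, int]:
    """Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C)."""
    return a, a ^ b, (a & b) ^ c


def is_reversible(gate: Callable[..., tuple], width: int) -> bool:
    """A gate is reversible if its 2**width input patterns map to distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=width)}
    return len(outputs) == 2 ** width


def half_adder_via_peres(a: int, b: int) -> Tuple[int, int]:
    """Half adder: C = 0 is the ancillary input; output P (= A) is garbage."""
    _garbage, s, carry = peres(a, b, 0)
    return s, carry


if __name__ == "__main__":
    print("Peres gate reversible:", is_reversible(peres, 3))
    for a, b in product((0, 1), repeat=2):
        print(a, b, half_adder_via_peres(a, b))
```

By contrast, `is_reversible(lambda a, b: (a & b,), 2)` returns `False`, since an ordinary AND gate maps four input patterns onto only two output values.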

In the same year (2018), K. Pandiammal designed digital circuits at the nanoscale [10]. An 8-bit reconfigurable ALU based on quantum-dot cellular automata (QCA) was proposed, built around a 1-bit ALU that uses clock zone-based crossover (CZBCO). The ALU performs four arithmetic and logical operations: binary addition, logical AND, OR and EXOR. The majority gates (MGs) used in the EXOR operation [9] are reused for logical AND and OR, so their number in the 1-bit ALU design reduces to two. The proposed ALU reduced energy dissipation by 54.5% and QCA cell usage by 43.5% compared to existing designs.
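In QCA the basic logic primitive is the three-input majority gate, M(A, B, C) = AB + BC + CA, which acts as an AND gate when one input is fixed to 0 and as an OR gate when it is fixed to 1. The Python sketch below shows one way the gates built for EXOR can also supply AND and OR, using XOR = (A + B)·(A·B)'. It is an illustrative assumption about this kind of gate sharing, not the actual cell layout of [10], and the function names are hypothetical.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate: M(A, B, C) = AB + BC + CA."""
    return (a & b) | (b & c) | (c & a)


def inv(a: int) -> int:
    """QCA inverter on 0/1 values."""
    return 1 - a


def qca_logic_slice(a: int, b: int) -> dict:
    """AND, OR and XOR from three majority gates and one inverter.

    The AND and OR gates built for XOR = (A + B).(A.B)' also drive the
    AND and OR outputs directly, so no extra gates are needed for them.
    """
    and_ab = maj(a, b, 0)                # MG with one input fixed to 0 -> AND
    or_ab = maj(a, b, 1)                 # MG with one input fixed to 1 -> OR
    xor_ab = maj(or_ab, inv(and_ab), 0)  # third MG used as AND -> XOR
    return {"and": and_ab, "or": or_ab, "xor": xor_ab}


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, qca_logic_slice(a, b))
```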

Also in 2018, Alexis Ramos designed an ALU intended for microprocessors used in space missions, which must be protected against the effects of cosmic radiation. This objective has commonly been achieved by applying modular redundancy, which provides good reliability but significantly increases the resources used [11]. As a result, new protection techniques have appeared that seek a trade-off between reliability and resource utilization.

An application-based methodology was proposed by A. Ramos in 2019 to protect a soft processor implemented in an SRAM-based FPGA against the effects of soft errors. This is done by creating a library of adaptive protection configurations based on profiling of the application. This hardware configuration library, combined with the reprogramming capabilities of the FPGA, provides adaptive protection for each application. Two partial triple modular redundancy (TMR) configurations for the ALU are presented as examples of this methodology [13]: the components leading to failure are identified and protected with TMR, increasing reliability. The scheme was tested on a RISC-V soft processor, and a fault injection campaign was carried out to evaluate its reliability.
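Full TMR triplicates a unit and passes the three outputs through a majority voter, so a soft error in any single replica is masked; partial TMR applies this only to selected components. The Python sketch below illustrates just the voting principle, assuming a hypothetical 4-bit adder as the protected ALU function and an XOR bit mask as a stand-in for an injected soft error. It is not Ramos's partial-TMR configuration.

```python
from typing import Sequence

WIDTH = 4
MASK = (1 << WIDTH) - 1


def tmr_vote(r0: int, r1: int, r2: int) -> int:
    """Bitwise 2-out-of-3 majority vote over the three replica outputs."""
    return (r0 & r1) | (r1 & r2) | (r0 & r2)


def alu_add(a: int, b: int) -> int:
    """Hypothetical protected functional unit: a 4-bit adder (carry-out dropped)."""
    return (a + b) & MASK


def tmr_add(a: int, b: int, fault_masks: Sequence[int] = (0, 0, 0)) -> int:
    """Run three replicas and vote.

    fault_masks holds one XOR mask per replica; a non-zero mask flips those
    output bits in that replica to emulate a soft error.
    """
    outputs = [alu_add(a, b) ^ m for m in fault_masks]
    return tmr_vote(*outputs)


if __name__ == "__main__":
    print(tmr_add(12, 5))                             # fault-free run
    print(tmr_add(12, 5, fault_masks=(0b0110, 0, 0)))  # single faulty replica
```

A fault confined to one replica is out-voted, whereas the same bit flipped in two replicas (for example `fault_masks=(0b0100, 0b0100, 0)`) corrupts the voted result; this is why the choice of which components to triplicate matters.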

To reduce overall design cost, a digital system is implemented by considering the power, area, speed and cost constraints of its internal logic blocks [14]. The proposal aimed to implement different functionalities over the same resources using time multiplexing: depending on the requirement, a suitable adder or multiplier could be switched in without using extra resources [15]. Run-time reconfiguration was explored for a 4-bit ALU design, leading to efficient utilization of the resources available on the device, and resource utilization and power consumption were compared with and without partial reconfiguration. The design was implemented in Xilinx Vivado, which Jumuna also used in 2019 to analyse the design parameters of an ALU implemented on FPGAs, with the main objective of developing algorithms that make efficient use of the available hardware [16]. The measures of an algorithm's efficiency are speed improvement, lower power consumption and better utilization of the ALU [17].

Amrit Kumar Panigrahi (2019) simulated and synthesized ALUs in Verilog on Xilinx tools and demonstrated reconfigurable built-in self-test (BIST) logic that detects faults across the ALU block mapped on the FPGA. The work aims to detect stuck-at faults in the internal blocks of the ALU; the stuck-at fault is a fault model used in automatic test pattern generation. Test patterns are generated by a linear feedback shift register (LFSR), and the outputs are analysed by a multiple input signature register (MISR), a register used in design for testability that compresses several input data streams into a single signature. The design was also verified with the Questasim simulator and implemented in the Xilinx Vivado IDE. Of 98 faults, 80 were detected, with a code coverage of 94.2%, and the total power consumed by the design is 1.06 W.

Kp Shashikala (2019) designed an ALU using the dual mode logic (DML) technique. The ALU is one of the most significant components of any computing system, be it a microprocessor, an embedded system or any other computational device. In this design, the ALU consists of a 4×1 multiplexer, 2-input AND, OR and XOR units, and a full adder, which implement the logic operations AND, OR and XOR and the arithmetic operation of addition. The multiplexer, full adder and AND, OR and XOR units were designed in DML and then combined to realize the DML-based ALU. The power of a static CMOS ALU and the delay of a domino ALU were calculated and compared with the static and dynamic modes of the DML-based ALU, respectively. The design was simulated using the Mentor Graphics Pyxis Schematic tool with a 1.8 V supply voltage in 180 nm technology.
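The LFSR/MISR test flow described above can be sketched in a few lines of Python. This is a minimal behavioural sketch, not the BIST logic of Panigrahi's design: it assumes a 4-bit maximal-length LFSR (taps on state bits 3 and 2, period 15) for pattern generation, a 4-bit MISR using the same feedback, and a hypothetical 2-bit adder as the circuit under test, into which a stuck-at-0 fault can be injected on one output bit. A fault counts as detected when the final signature differs from the fault-free one; in general, signature aliasing can hide a fault.

```python
from typing import Iterable, Iterator, Optional


def lfsr_patterns(seed: int = 0b0001, count: int = 15) -> Iterator[int]:
    """4-bit Fibonacci LFSR; a non-zero seed visits all 15 non-zero states."""
    state = seed
    for _ in range(count):
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1  # taps on bits 3 and 2
        state = ((state << 1) | feedback) & 0xF


def misr_signature(words: Iterable[int]) -> int:
    """4-bit MISR: shift with the same feedback, then XOR in each response word."""
    state = 0
    for word in words:
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = (((state << 1) | feedback) & 0xF) ^ (word & 0xF)
    return state


def cut_2bit_adder(pattern: int, stuck_at0_bit: Optional[int] = None) -> int:
    """Hypothetical circuit under test: adds the two 2-bit halves of a pattern.

    If stuck_at0_bit is given, that output bit is forced to 0 (stuck-at-0 fault).
    """
    a, b = pattern >> 2, pattern & 0b11
    out = a + b
    if stuck_at0_bit is not None:
        out &= ~(1 << stuck_at0_bit)
    return out


if __name__ == "__main__":
    golden = misr_signature(cut_2bit_adder(p) for p in lfsr_patterns())
    faulty = misr_signature(cut_2bit_adder(p, stuck_at0_bit=0) for p in lfsr_patterns())
    print("fault detected" if faulty != golden else "fault missed")
```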

## 2. Proposed system

In the proposed full adder, the sum output is constructed from two cascaded XNOR modules. Signal B is applied to a weak inverter, and the input and output of this weak inverter (B and B', respectively) are used to construct a controlled inverter with transistors Q3 and Q4. The controlled inverter produces the XNOR of A and B, but its output suffers from voltage-swing degradation. To overcome this problem, pass transistors Q5 and Q6 are used: the PMOS pass transistor Q5 and the NMOS pass transistor Q6 restore strong output levels. The output of the first XNOR module is applied as an input to the second XNOR module to complete the SUM function.
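Two cascaded XNOR modules yield the sum because the two inversions cancel: XNOR(XNOR(A, B), C) = ¬(¬(A ⊕ B) ⊕ C) = A ⊕ B ⊕ C. The short Python check below confirms this identity exhaustively at the Boolean level only; it says nothing about the transistor-level behaviour, and the function names are hypothetical.

```python
def xnor(x: int, y: int) -> int:
    """Two-input XNOR on 0/1 values."""
    return 1 - (x ^ y)


def fa_sum(a: int, b: int, cin: int) -> int:
    """Full-adder sum from two cascaded XNOR modules."""
    first = xnor(a, b)        # first XNOR module: A xnor B
    return xnor(first, cin)   # second XNOR module completes the SUM


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                assert fa_sum(a, b, cin) == a ^ b ^ cin
    print("two cascaded XNORs match A xor B xor Cin for all 8 inputs")
```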

The proposed 4-bit ALU was designed in a 65 nm TSMC CMOS process. Simulations were performed with the Spectre-based Cadence Virtuoso simulator at a 1.2 V supply and a 125 MHz clock frequency. For best power and delay performance, the PMOS transistors are twice as wide as the NMOS transistors: (W/L)p = 240 nm/60 nm and (W/L)n = 120 nm/60 nm. A = 1100 and B = 0101 were used as test inputs. The proposed design is compared with the previous design in [5] in terms of power consumption, delay, energy and transistor count. Simulation results for the proposed 4-bit ALU are presented in Section 3.
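When reading the waveforms for the test inputs A = 1100 and B = 0101, it helps to know the expected results. The operation set of the proposed ALU is not listed here; the Python reference model below assumes the four operations suggested by the sub-blocks reported in Section 3 (AND, OR, XOR and addition with carry-out). It is a hypothetical golden model, not the authors' testbench.

```python
from typing import Tuple


def alu4(op: str, a: int, b: int) -> Tuple[int, int]:
    """Behavioural 4-bit ALU reference model; returns (result, carry_out)."""
    if op == "and":
        return a & b, 0
    if op == "or":
        return a | b, 0
    if op == "xor":
        return a ^ b, 0
    if op == "add":
        total = a + b
        return total & 0xF, total >> 4  # 4-bit sum and carry-out
    raise ValueError(f"unsupported operation: {op}")


if __name__ == "__main__":
    A, B = 0b1100, 0b0101  # test inputs used in the paper
    for op in ("and", "or", "xor", "add"):
        result, carry = alu4(op, A, B)
        print(f"{op:>3}: {result:04b} carry={carry}")
```

For A = 1100 and B = 0101 the model gives AND = 0100, OR = 1101, XOR = 1001 and ADD = 0001 with carry-out 1 (12 + 5 = 17).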



Fig. 1. Proposed system

An adder is a digital circuit that performs addition of numbers. A half adder adds two binary digits, the augend and the addend, and produces two outputs, sum and carry: an XOR gate applied to both inputs produces the sum, and an AND gate applied to both inputs produces the carry. A full adder adds three 1-bit numbers, two operands and a carry-in bit, and produces a 2-bit output consisting of the sum and the carry-out. Multiplexers are switching circuits that route one of several input signals to their output; being combinational circuits with no signal feedback path, they are memoryless. The multiplexer is useful in many applications, such as signal routing, data communications and data bus control. Its advantage is that only one serial data line is required instead of multiple parallel data lines, which is why multiplexers are sometimes called "data selectors": they select which data reaches the line.
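The adder and multiplexer descriptions above translate directly into gate-level Boolean expressions. The Python sketch below builds a full adder from two half adders plus an OR gate (a common construction, assumed here rather than taken from the proposed circuit), a 4×1 multiplexer driven by two select bits, and a 4-bit ripple-carry adder; all names are hypothetical.

```python
def half_adder(a: int, b: int) -> tuple:
    """Sum = A xor B, carry = A and B."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple:
    """Full adder from two half adders and an OR gate; returns (sum, carry_out)."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2


def mux4(s1: int, s0: int, d0: int, d1: int, d2: int, d3: int) -> int:
    """4x1 multiplexer: select bits (s1, s0) route one data input to the output."""
    n1, n0 = 1 - s1, 1 - s0
    return (n1 & n0 & d0) | (n1 & s0 & d1) | (s1 & n0 & d2) | (s1 & s0 & d3)


def ripple_add(a: int, b: int, width: int = 4) -> tuple:
    """Chain full adders from bit 0 upwards; returns (sum, carry_out)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_add(0b1100, 0b0101))
```

An exhaustive comparison of `ripple_add` against Python integer addition over all 256 pairs of 4-bit operands is a quick way to confirm the carry chain.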

## 3. Simulation results

The existing and proposed architectures were simulated using ModelSim. Simulation results show improved delay time and overall energy for the optimized 4-bit ALU design: comparing the outputs of the existing and proposed ALUs, the proposed architecture reduces the delay relative to the existing system. Synthesis was performed using HSPICE.

### 3.1 Simulation waveform for the existing system



Fig. 2. Simulation waveform for the existing ALU



Fig. 3. Simulation waveform for the proposed ALU

Fig. 2 shows the output waveform of the existing ALU; each function is presented for one nanosecond, in the same order, and the output of each function is given by the respective signals. Fig. 3 shows the simulated waveform of the proposed ALU, again with each function presented for one nanosecond in the same order. The outputs show an improvement over the existing ALU.



Fig. 4. Simulation waveform for the 4×1 multiplexer



Fig. 5. Simulation waveform for an adder

Figs. 4 and 5 show the simulated waveforms of the 4×1 multiplexer and the adder; each function is presented for one nanosecond, in the same order, and the output of the adder circuit is given by the respective signals.

Table 1. Comparative analysis of the existing and proposed systems.

| Design          | Average power/nW | Peak power/nW | No. of transistors |
|-----------------|------------------|---------------|-------------------|
| Existing system | 0.4091           | 0.5641        | 12                |
| Proposed system | 0.1541           | 4.398         | 20                |

Table 2. Average and peak power of the ALU sub-blocks.

| Design          | Average power | Peak power |
|-----------------|---------------|------------|
| 4×1 multiplexer | 1.193e-05     | 4.732e-04  |
| OR gate         | 1.715e-05     | 7.639e-03  |
| Adder circuit   | 1.059e-04     | 1.222e-02  |
| XOR gate        | 3.476e-05     | 3.469e-03  |
| AND gate        | 1.660e-05     | 3.447e-03  |

Table 1 compares the existing and proposed designs. The analysis and simulation show that the proposed ALU design decreases the delay time by 22.3% (15.6 ps) and improves energy by 21.2%, at the cost of a slight increase in transistor count (8 transistors) and in power consumption (390 nW). The average power decreased by 2.55e-04 compared with the previous design, whereas the peak power decreased by 1.243e-03.
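The relative changes implied by Table 1 can be reproduced with a few lines of arithmetic. The Python sketch below only operates on the reported numbers and adds no measurement; it assumes that "22.3% (15.6 ps)" means an absolute delay reduction of 15.6 ps, which is one reading of that phrase.

```python
# Figures transcribed from Table 1 (power in nW).
EXISTING = {"average power": 0.4091, "peak power": 0.5641, "transistors": 12}
PROPOSED = {"average power": 0.1541, "peak power": 4.398, "transistors": 20}


def pct_change(old: float, new: float) -> float:
    """Relative change in percent; a negative value is a reduction."""
    return 100.0 * (new - old) / old


# Assumption: 15.6 ps is the absolute reduction corresponding to 22.3 %,
# which implies a baseline delay of 15.6 / 0.223, i.e. about 70 ps.
BASELINE_DELAY_PS = 15.6 / 0.223
PROPOSED_DELAY_PS = BASELINE_DELAY_PS - 15.6

if __name__ == "__main__":
    for key in EXISTING:
        print(f"{key}: {pct_change(EXISTING[key], PROPOSED[key]):+.1f} %")
    print(f"implied delay: {BASELINE_DELAY_PS:.1f} ps -> {PROPOSED_DELAY_PS:.1f} ps")
```

With the Table 1 values this gives roughly 62.3% lower average power, a peak power about 6.8 times higher and 66.7% more transistors.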

## 4. Conclusion

The delay time of a 4-bit ALU designed using the full-swing GDI technique was optimized and reduced by 22.3% compared to the previous design, while maintaining full-swing operation; as a result, the energy of the 4-bit ALU was reduced by 21.2%. The proposed design consists of 294 transistors and operates at a 1.2 V supply voltage and a frequency of 125 MHz. Based on these results, the proposed 4-bit ALU in the full-swing GDI technique is suitable for low-power, high-speed very-large-scale integration (VLSI) applications. Future work will use the 4-bit ALU as a building block to implement 8-bit and 16-bit ALUs. An arithmetic logic unit (ALU) is a digital electronic circuit that performs arithmetic and bitwise logical operations on integer binary numbers. It is a fundamental building block of many types of computing circuits, including the central processing units (CPUs) of computers, floating-point units (FPUs) and graphics processing units, as well as microprocessors, DSPs and image processing systems, where power efficiency is a general concern in VLSI design. The design and simulation of an efficient ALU were accomplished with the HSPICE tool in 130 nm technology. The findings show that the modified ALU gives better performance in terms of power, delay and energy and can be used for high-speed, low-power applications.

*Nanotechnology Perceptions* Vol. 19 No. 2 (2023)

## References

- [1] V. Yuzhaninov, I. Levi and A. Fish, Design flow and characterization methodology for dual mode logic. IEEE Access 3 (2016) 3089–3101.
- [2] Mahmoud Aymen Ahmed and M. A. Mohamed El-Bendary, "Delay Optimization of 4-Bit ALU Designed in FS-GDI Technique," 2019 International Conference on Innovative Trends in Computer Engineering (ITCE'2019), Aswan, Egypt, 2–4 February 2019.
- [3] A. Morgenshtein, A. Fish, and I. A. Wagner, Gate-diffusion input (GDI): A power-efficient method for digital combinatorial circuits. IEEE Trans Very Large Scale Integration (VLSI) Systems 10 (2002) 566–581.
- [4] A. F. Kirichenko, I. V. Vernik, M. Y. Kamkar, J. Walter and M. Miller, "ERSFQ 8-bit Parallel Arithmetic Logic Unit," IEEE Trans. Appl. Superconductivity 29(5) (Aug. 2019) 13022407, doi: 10.1109/TASC.2019.2904484.
- [5] A. Kaizerman, S. Fisher and A. Fish, Sub threshold dual mode logic. IEEE Trans Very Large Scale Integration (VLSI) Systems 21 (2018) 979–983.
- [6] P. Khatter, N. Pandey and K. Gupta, "An Arithmetic and Logical Unit using Reversible Gates," 2018 International Conference on Computing, Power and Communication Technologies (GUCON), 2018, pp. 476–480, doi: 10.1109/GUCON.2018.8675034.
- [7] Shiksha and K.K. Kashyap, High speed domino logic circuit for improved performance. 2014 Students Conference on Engineering and Systems (SCES), 28–30 May, pp. 1–5.
- [8] K. Vinay Kumar, F. Noorbasha, B. Shiva Kumar and N.V. Siva Rama Krishna, Design of an efficient ALU using low-power dual-mode logic. Intl J. Engng Res. Applications (IJERA) 4 (May 2014) 81–84.
- [9] M.M. Mano and C.R. Kime, Logic and Computer Design Fundamentals. Harlow: Pearson Education (2015).
- [10] K. Pandiammal and D. Meganathan, "Design of 8 bit Reconfigurable ALU Using Quantum Dot Cellular Automata," 2018 IEEE 13th Nanotechnology Materials and Devices Conference (NMDC), 2018, pp. 1-4, doi: 10.1109/NMDC.2018.8605892.
- [11] G. Tang, K. Takata, M. Tanaka, A. Fujimaki, K. Takagi and N. Takagi, 4-bit bit slice arithmetic logic unit for 32-bit RSFQ microprocessors. IEEE Trans. Appl. Superconductivity 26 (2016) 1300106.
- [12] A. Ramos, R. G. Toral, P. Reviriego and J. A. Maestro, "An ALU Protection Methodology for Soft Processors on SRAM-Based FPGAs," in IEEE Transactions on Computers, vol. 68, no. 9, pp. 1404-1410,1 Sept. 2019, doi: 10.1109/TC.2019.2907238.
- [13] Y. Ando, R. Sato, M. Tanaka, K. Takagi and N. Takagi, 80-GHz operation of an 8-bit RSFQ arithmetic logic unit. Proc. 15th Intl Superconductive Electronics Conf. (ISEC), Nagoya (2015), pp. 1–3.
- [14] M. A. Ahmed and M. A. Abdelghany, Low-power 4-bit arithmetic logic unit using full-swing GDI technique. Proc. Intl Conf. on Innovative Trends in Computer Engineering (ITCE 2018), pp. 193–196.
- [15] V. Dubey and R. Sairam, An arithmetic and logic unit optimized for area and power. 4th Intl Conf. Advanced Computing & Communication Technologies (ACCT'14), pp. 330–334 (2014)
- [16] T. Filippov, M. Dorojevets, A. Sahu, A. Kirichenko, C. Ayala and O. Mukhanov, 8-bit asynchronous wave-pipelined RSFQ arithmetic logic unit. IEEE Trans. Appl. Superconductivity 21 (2011)
- [17] P. Khatter, N. Pandey and K. Gupta, "An Arithmetic and Logical Unit using Reversible Gates," 2018 International Conference on Computing, Power and Communication Technologies (GUCON), 2018, pp. 476–480, doi: 10.1109/GUCON.2018.8675034.