Search published articles


Showing 17 results for Subject: VLSI

M. Masoumi, H. Mahdizadeh,
Volume 8, Issue 4 (12-2012)
Abstract

A new and highly efficient architecture for elliptic curve scalar point multiplication is presented. To achieve the maximum architectural and timing improvements we have reorganized and reordered the critical path of the Lopez-Dahab scalar point multiplication architecture such that logic structures are implemented in parallel and operations in the critical path are diverted to noncritical paths. The results we obtained show that with G = 55 our proposed design is able to compute GF(2163) elliptic curve scalar multiplication in 9.6 μs with the maximum achievable frequency of 250 MHz on Xilinx Virtex-4 (XC4VLX200), where G is the digit size of the underlying digit-serial finite field multiplier. Another implementation variant for less resource consumption is also proposed. With G=33, the design performs the same operation in 11.6 μs at 263 MHz on the same platform. The results of synthesis show that in the first implementation 17929 slices or 20% of the chip area is occupied which makes it suitable for speed critical cryptographic applications while in the second implementation 14203 slices or 16% of the chip area is utilized which makes it suitable for applications that may require speed-area trade-off. The new design shows superior performance compared to the previously reported designs.
S. M. Razavi, S. M. Razavi,
Volume 15, Issue 3 (9-2019)
Abstract

Probabilistic-based methods have been used for designing noise tolerant circuits recently. In these methods, however, there is not any reliability mechanism that is essential for nanometer digital VLSI circuits. In this paper, we propose a novel method for designing reliable probabilistic-based logic gates. The advantage of the proposed method in comparison with previous probabilistic-based methods is its ultra-high reliability. The proposed method benefits from Markov random field (MRF) as a probabilistic framework and triple modular redundancy (TMR) as a reliability mechanism. A NAND gate is used to show the design methodology. The simulation results verify the noise immunity of the proposed MRF-based gate in the presence of noise. In addition, the values from reliability estimation program show the reliability of 0.99999999 and 0.99941316 for transistor failure rates of 0.0001 and 0.001, respectively, which are much better as compared with previous reported MRF-based designs.

S. Ejdehakosh, M. A. Karami,
Volume 15, Issue 4 (12-2019)
Abstract

This work presents a dual-junction, single-photon avalanche diode (SPAD) with electrical μ-lens designed and simulated in 90 nm standard complementary metal oxide semiconductor (CMOS) technology. The evaluated structure can collect the photons impinging beneath the pixel guard ring, as well as the pixel active area. The fill factor of the SPAD increases from 12.5% to 42% in comparison with similar works on the same technology, according to new charge collections. Although the designed SPAD suffers from high dark count rate (DCR of 300kHz at 0.17V excess bias at room temperature) due to high amount of tunneling which was predicted in previous similar works, it still can be used in different applications such as random number generators and charged particle positioning pixels.​


S. Abolmaali,
Volume 15, Issue 4 (12-2019)
Abstract

Accurate delay calculation of circuit gates is very important in timing analysis of digital circuits. Waveform shapes on the input ports of logic gates should be considered, in the characterization phase of delay calculation, to obtain accurate gate delay values. Glitches and their temporal effect on circuit gate delays should be taken into account for this purpose. However, the explosive number of combinations of waveform shapes, which can be applied to the input ports of logic gates, causes existing lookup-based methods to have huge space requirements. In this article, instead of considering all possible combinations of waveform shapes in the characterization phase of delay calculation process, the least number of combinations, which are dominant in determining the waveform shape of gate output, is presented. Multivariate Polynomial Regression (MPR) method is used to further reduce the required memory space. Exploration of the possible MPR analyses is performed to find the best regression case with proper memory space reduction and precision. Attained results show a 1.013E6 times reduction in storage space required for storing parameters utilized in extraction of output waveform characteristics in comparison to a state of the artwork, accompanied by acceptable precision.

S. M. Razavi, S. M. Razavi,
Volume 16, Issue 4 (12-2020)
Abstract

The Markov random field (MRF) theory has been accepted as a highly effective framework for designing noise-tolerant nanometer digital VLSI circuits. In MRF-based design, proper feedback lines are used to control noise and keep the circuits in their valid states. However, this methodology has encountered two major problems that have limited the application of highly noise immune MRF-based circuits. First, excessive hardware overhead that imposes a great cost, power consumption and propagation delay on the circuits and second, separate implementation of feedback lines that adds further delay to the circuits. In this paper, we propose a novel approach for minimal-cost inherent-feedback implementation of low-power MRF-based logic gates. The simulation results, which are based on 32nm BSIM4 models, demonstrate that besides excellent noise immunity of the proposed method, it has the least propagation delay in comparison with all of the previously reported MRF-based gates due to its inherent feedbacks. In addition, the proposed method outperforms competing ones, which have comparable noise immunity, in other circuit metrics like cost and power consumption. Specifically, the proposed method achieves at least 18%, 29%, and 39% reductions in cost, delay and power consumption with considerable noise immunity improvement compared with competing methods.

R. Pinto,
Volume 16, Issue 4 (12-2020)
Abstract

Multiplication is a basic operation in any signal processing application. Multiplication is the most important one among the four arithmetic operations like addition, subtraction, and division. Multipliers are usually hardware intensive, and the main parameters of concern are high speed, low cost, and less VLSI area. The propagation time and power consumption in the multiplier are always high. The multiplier speed usually determines the speed of the processor. Hence in this work, a design of a 32-bit multiplier is proposed by modifying the conventional shift-add multiplier. The proposed structure reduces the power consumed by the technique of minimizing the switching activities in the design. A 32-bit parallel prefix adder based on the modified Ling equation is also proposed to speed up the addition of the partial products in the multiplier. The design is modeled in VHDL and implementation is carried out in CADENCE software with 90 nm and 180 nm CMOS technology.

P. Kulkarni, B. Hogade, V. Kulkarni,
Volume 17, Issue 1 (3-2021)
Abstract

Fast Fourier Transform (FFT) processors employed with pipeline architecture consist of series of Processing Elements (PE) or Butterfly Units (BU). BU or PE of FFT performs multiplication and addition on complex numbers. This paper proposes a single BU to compute radix-2, 8 point FFT in the time domain as well as frequency domain by replacing a series of PEs. This BU comprises of fused floating point (FP) addition-subtraction (FFAS) and modified booth algorithm based floating point multiplier (FMULT). BU performs all arithmetic operations in floating pointform to overcome the nonlinearities available in fixed word length (FWL). FP arithmetic is slower as compared with FWL. To improve the speed of operation, symmetrical property of twiddle constant is used and they are embedded in the BU. BU outputs two halves of computation simultaneously with a single FFAS and two FMULT. BU design is synthesized, placed and routed for 45nm technology of nangate open cell library. Synthesized results show that proposed BU consumes 23910µm2 area with latency of 3.44ns which are 5.05% smaller in area, 7.02% faster and replaces a set of two five operand adder and two multipliers by a single FFAS as compared with previously reported smallest work.

S. Abolmaali,
Volume 17, Issue 3 (9-2021)
Abstract

Area reduction of a circuit is a promising solution for decreasing the power consumption and the chip cost. Timing constraints should be preserved after a delay increase of resized circuit gates to guarantee proper circuit operation. Sensitization of paths should also be considered in timing analysis of circuit to prevent pessimistic resizing of circuit gates. In this work, a greedy area reduction algorithm is proposed which is path-based and benefits well from viability analysis as the sensitization method. A proper metric based on viability conditions is presented to guide the algorithm towards selecting useful circuit nodes to be resized with acceptable performance and area reduction results. Instead of using gate slacks in resizing the candidate gates, all circuit gates are down-sized first and then the sizes of circuit gates that violate the circuit timing constraint are increased. This approach leads to considerable improvement in the complexity and performance of the proposed method. Results show that area improvement of about 88% is achievable. Comparison to a pessimistic method also reveals that on average 14.2% growth in area improvement is obtained by the presented method.

G. Morankar,
Volume 17, Issue 3 (9-2021)
Abstract

Tremendous developments in integrated circuit technology, wireless communication systems, and personal assistant devices have fuelled growth of Internet of Things (IoT) applications and smart cards. The security of these devices completely depends upon the generation of random and unpredictable digital data streams through random number generator. Low quality, low throughput, and high processing time are observed in software-based pseudo-random number generator due to interrelated data or programs and serial execution of codes respectively. In this paper, FPGA implementation of low power true random number generator through ring oscillator for IoT applications and smart cards is presented. Ring oscillators based on higher jitter and sampling techniques were exploited to present true random number generator. Further statistical parameters of the generated data streams are enhanced through feedback mechanism and post-processing technique. The presented true random number generator technique does not depend on the characteristics of a particular FPGA. The presented technique consumes low power, requires low hardware footprints and passes the entire National Institute of Standards & Technology (NIST) 800-22 statistical test suite. The presented low power and area true random number generator with enhanced security through post-processing unit may be applied for encryption/decryption of data in IoT and smart cards.

A. Pathan, T. Memon,
Volume 17, Issue 4 (12-2021)
Abstract

FPGA’s block memory may be programmed as a single or dual-port RAM/ROM module that leads to an area-efficient implementation of memory-based systems. In this contest, various works of carrying out an optimized implementation of simple to complex DSP systems on embedded building blocks may be seen. The multiplier is a core element of the DSP systems, and in implementing a memory-based multiplier, it is observed that one of the operands is kept constant, hence leading the design to a constant-coefficient multiplication. This paper shows Virtex-7 FPGA’s dual-port ROM-based implementation of an 8x8 variable-coefficient multiplier that may be used in several simple to complex DSP applications. The novelty of the proposed design is to configure the block ROM in dual-port mode and, hence, get four partial products in two clock cycles and introduce two unconventional adder approaches for partial product addition. This approach leads to fully resource utilization and the provision of a variable-coefficient multiplier. The work also shows the comparison of proposed architecture with already existing memory-based implementations and concludes the work as a novel step towards the efficient memory-based implementation of multiplier core.

T. Mendez, S. G. Nayak,
Volume 18, Issue 1 (3-2022)
Abstract

The need for low-power VLSI chips is ignited by the enhanced market requirement for battery-powered end-user electronics, high-performance computing systems, and environmental concerns. The continuous advancement of the computational units found in applications such as digital signal processing, image processing, and high-performance CPUs has led to an indispensable demand for power-efficient, high-speed and compact multipliers. To address those low-power computational aspects with improved performance, an approach to design the multiplier using the algorithms of Vedic math is developed in this research. In the proposed work, the pre-computation technique is incorporated that aided in estimation of the carries during the partial product calculation stage; that enhanced the speed of the multiplier. This design was carried out using Cadence NCSIM 90 nm technology. The comparative analysis between the proposed multiplier design and the multipliers from the literature resulted in a substantial improvement in power dissipation as well as delay. The research was extended to assess the designed architectures’ performance statistically, applying the independent sample t-test hypothesis.

F. Asghariyehlou, J. Javidan,
Volume 18, Issue 2 (6-2022)
Abstract

This paper deals with the optimization of the CORDIC-based modified Gram-Schmidt (MGS) algorithm for QR decomposition (QRD) and presents a scalable algorithm with maximum throughput, the least possible latency, and hardware resources. The optimized algorithm is implemented on Xilinx Virtex 6 FPGA using ISE software as a fixed point with selected accuracy based on the results of MATLAB simulation. Using the loop unrolling technique with different coefficients, an attempt is made to reduce the latency and increase the throughput. In contrast, increasing the unrolling factor leads to a decrease in the frequency of the CORDIC unit as well as a decrease in the number of resources. As a result, there is a trade-off between the unrolling factor and the frequency of the CORDIC unit. By investigating the different unrolling factors, it is shown that the loop unrolling technique with a factor of 4 has the highest throughput with the value of 5.777 MQRD/s and the lowest latency with the value of 173 ns. Moreover, it is shown that throughput and latency are improved by 42.52% and 73.74% respectively compared to the not optimized case. The proposed method is also scalable for different sizes of m×m complex channel matrices, where log2 mN.

S. Abolmaali,
Volume 18, Issue 2 (6-2022)
Abstract

In this article, a critical path identification method is proposed for ternary logic circuits. The considered structure for the ternary circuits is based on 2:1 multiplexers. Sensitization conditions for the employed ternary multiplexers are introduced. Moreover, static timing analysis and dynamic programming are utilized in the identification of true and false paths of the circuit for obtaining more realistic results in a reasonable time. An event-driven simulation engine is also developed for confirming the sensitization state of the identified paths. Some ternary arithmetic logic circuits are designed to depict the effectiveness of the proposed identification method. Simulation results show the correctness and efficiency of the proposed method.

R. Samanth, S. G. Nayak, P. B. Nempu,
Volume 19, Issue 1 (3-2023)
Abstract

In the CMOS circuit power dissipation is a major concern for VLSI functional units. With shrinking feature size, increased frequency and power dissipation on the data bus have become the most important factor compared to other parts of the functional units. One of the most important functional units in any processor is the Multiply-Accumulator unit (MAC). The current work focuses on the development of MAC unit bus encoders as well as the identification of an improved architecture for image processing applications. To reduce the power consumption in these functional units, two bus encoding architectures were developed by encoding data before it was sent on the data buses. One is MSB reference encoding, and another is Fourth and Fifth bit ANDing (FFA) without the need for an extra bus line with fewer transitions by using gray codes. The comparison of the proposed encoding architectures with the existing encoding architectures from the literature revealed an 8% to 36% significant improvement in power dissipation. The simulation was done with Xilinx ISE, and the Cadence RTL Compiler tool was utilized for the synthesis, which was done with the 180nm technology library. And also, the image filtering is analyzed using MATLAB.

A. Hamidi, S. Karimi, A. Ahmadi,
Volume 19, Issue 2 (6-2023)
Abstract

One of the problems in digital control of power converters is calculation time in each sampling instant which effect on cost and complexity of digital controller. In this paper, a formula is introduced for calculating the number of clock cycles in each sample then interaction between sampling frequency and implementation cost (number of functional units and word length) of FPGA-based digital controller of DC-AC converter (three-phase four-legs inverter) is verified. The digital architecture is built on finite set model predictive control, and implemented on the FPGA board based on fixed-point calculations. We consider two digital architectures for design the controller in this study. One with four functional units and another with six functional units. This study aims to develop a mathematical equation for the number of clock cycles in each time instant to select the best switching state in the control algorithm, which affects the sampling frequency and clock frequency. Based on the obtained results, the number of functional units, word-length, and the number of switches determine the maximum clock cycles. By knowing maximum clock cycles the maximum sampling frequency is determined. In structure with four functional units, the maximum sampling frequency is 71 kHz for WL=8 bits and 17.7 kHz for WL=32 bits, and in structure, with six functional units, the maximum sampling frequencies are 97.6 and 24.4 kHz for WL=8 and WL=32 bits, respectively. In architecture with more functional units, we have greater sampling frequency with more accuracy and cost. The results obtained from this paper can be a reference for digital controller design. 

Amirhossein Salimi, Behzad Ebrahimi, Massoud Dousti,
Volume 20, Issue 1 (3-2024)
Abstract

The scaling limitations of Complementary Metal-Oxide-Semiconductor (CMOS) transistors to achieve better performance have led to the attention of other structures to improve circuit performance. One of these structures is multi-valued circuits. In this paper, we will first study Carbon Nanotube Transistors (CNT). CNT transistors offer a viable means to implement multi-valued logic due to their variable and controllable threshold voltage. Subsequently, we delve into the realm of three-valued flip-flop circuits, which find extensive utility in digital electronics. Leveraging the insights gained from our analysis, we propose a novel D-type flip-flop structure. The presented structure boasts a remarkably low power consumption, showcasing a reduction exceeding 61% compared to other existing structures. Furthermore, the proposed circuit incorporates a reduced number of transistors, resulting in a reduced footprint. Importantly, this circuit exhibits negligible static power consumption in generating intermediate values, rendering it robust against process variations.  Overall, the proposed circuits demonstrate a 29.7% increase in delay compared to the compared structures. However, they showcase a 96.1% reduction in power-delay product (PDP) compared to the other structures. The number of transistors is also 8.3% less than other structures. Additionally, their figure of merits (FOM) are 19.7% better than the best-compared circuit, underscoring its advantages in power efficiency, chip area, and performance.
Vahid Jamshidi, Mohammad Mehdi Bordbar,
Volume 20, Issue 2 (6-2024)
Abstract

Nonvolatile computing have been shown to be effective in the face of the sudden power outage for wireless sensor networks, Internet-of-Things applications, data converters, and emerging energy-harvesting circuits. It also plays a significant role in power-gating to minimize the leakage power for improving the energy efficiency. However, using on-chip backup module for D flip–flop has a bottleneck, and result in an increase in total power consumption, occupied area, and reduced calculation speed. Furthermore, the backup module needs external control signals, which increases the complexity of the circuit. This paper proposes a novel nonvolatile flip-flop with simultaneous data backup capability, which uses NCFET ferroelectric transistor to fundamentally advance the non-volatile computing paradigm. Proposed NVFF exhibits 0.8% faster and 5.0% smaller energy than previous works, while uniquely providing Radiation-Hardened feature.

Page 1 from 1     

Creative Commons License
© 2022 by the authors. Licensee IUST, Tehran, Iran. This is an open access journal distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.