BULK IEEE VLSI TITLES 2015-2016
www.technosincorp.com www.technosinc.blogspot.com , www.technosinc.page.tl
VILLUPURAM BRANCH
Project Cost Starting Range Rs 2,000 for Clients with Full Documentation with Complete 24*7 Online Support
Sno. |
Code |
Topic |
Abstract |
Year |
1. |
VLSI2015_01 |
A Low-Cost Low-Power All-Digital Spread-Spectrum Clock Generator |
In this brief, a low-cost low-power all-digital spread spectrum clock generator (ADSSCG) is presented. The proposed ADSSCG can provide an accurate programmable spreading ratio with process, voltage, and temperature variations. To maintain the frequency stability while performing triangular modulation, the fast-relocked mechanism is proposed. The proposed fast-relocked ADSSCG is implemented in a standard performance 90-nm CMOS process, and the active area is 200 µm × 200 µm. The experimental results show that the electromagnetic interference reduction is 14.61 dB with a 0.5% spreading ratio and 19.69 dB with a 2% spreading ratio at 270 MHz. The power consumption is 443 µW at 270 MHz with a 1.0 V power supply.
|
2015 |
2. |
VLSI2015_02 |
A Combined SDC-SDF Architecture for Normal I/O Pipelined Radix-2 FFT |
We present an efficient combined single-path delay commutator-feedback (SDC-SDF) radix-2 pipelined fast Fourier transform architecture, which includes log2 N − 1 SDC stages and 1 SDF stage. The SDC processing engine is proposed to achieve 100% hardware resource utilization by sharing the common arithmetic resources, both adders and multipliers, in a time-multiplexed approach. Thus, the required number of complex multipliers is reduced to log4 N − 0.5, compared with log2 N − 1 for the other radix-2 SDC/SDF architectures. In addition, the proposed architecture requires a roughly minimum number of complex adders, log2 N + 1, and a complex delay memory of 2N + 1.5 log2 N − 1.5.
|
2015 |
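The resource formulas quoted in the VLSI2015_02 abstract can be tabulated for a given transform size. The following is only a small calculator sketch: the function name and dictionary keys are hypothetical, and it assumes N is a power of two. It evaluates the formulas exactly as stated in the abstract and does not model the architecture itself.

```python
def combined_sdc_sdf_resources(n):
    """Evaluate the resource formulas quoted in the abstract for an N-point FFT.

    Assumption: n is a power of two and at least 4 (names are illustrative only).
    """
    if n < 4 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two and at least 4")
    log2_n = n.bit_length() - 1          # exact log2 for powers of two
    return {
        "sdc_stages": log2_n - 1,                        # log2 N - 1
        "sdf_stages": 1,
        "complex_multipliers": log2_n / 2 - 0.5,         # log4 N - 0.5
        "complex_multipliers_other_radix2": log2_n - 1,  # log2 N - 1
        "complex_adders": log2_n + 1,                    # log2 N + 1
        "complex_delay_memory": 2 * n + 1.5 * log2_n - 1.5,
    }


if __name__ == "__main__":
    for size in (64, 256, 1024):
        print(size, combined_sdc_sdf_resources(size))
```

For N = 1024 the formulas give 4.5 complex multipliers against 9 for the other radix-2 SDC/SDF architectures, which is the halving the abstract describes.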
3. |
VLSI2015_03 |
A Class of SEC-DED-DAEC Codes Derived From Orthogonal Latin Square Codes |
Radiation-induced soft errors are a major reliability concern for memories. To ensure that memory contents are not corrupted, single error correction double error detection (SEC-DED) codes are commonly used. However, in advanced technology nodes, soft errors frequently affect more than one memory bit. Since SEC-DED codes cannot correct multiple errors, they are often combined with interleaving. Interleaving, however, impacts memory design and performance and cannot always be used in small memories. This limitation has spurred interest in codes that can correct adjacent bit errors. In particular, several SEC-DED double adjacent error correction (SEC-DED-DAEC) codes have recently been proposed. Implementing DAEC has a cost as it impacts the decoder complexity and delay. Another issue is that most of the new SEC-DED-DAEC codes miscorrect some double nonadjacent bit errors. In this brief, a new class of SEC-DED-DAEC codes is derived from orthogonal Latin square codes. The new codes significantly reduce the decoding complexity and delay. In addition, the codes do not miscorrect any double nonadjacent bit errors. The main disadvantage of the new codes is that they require a larger number of parity check bits. Therefore, they can be useful when decoding delay or complexity is critical or when miscorrection of double nonadjacent bit errors is not acceptable. The proposed codes have been implemented in a hardware description language and compared with some of the existing SEC-DED-DAEC codes. |
2015 |
4. |
VLSI2015_04 |
Design of Efficient Content Addressable Memories in |
Content addressable memories (CAMs) enable high-speed parallel search operations in table lookup-based applications, such as Internet routers and processor caches. Traditional CAM design has always suffered from the high dynamic power consumption associated with its large and active parallel hardware. However, deeply scaled technology nodes, with multigate devices replacing planar MOSFETs, are expected to bring new tradeoffs to CAM design. FinFET, a vertical-channel gate-wraparound double-gate device, has emerged as the best alternative to planar MOSFET. In this brief, for the first time, we explore the design space of symmetric and asymmetric gate-workfunction FinFET CAMs. We propose several design alternatives and evaluate them in terms of their dc and transient metrics for
|
2015 |
5. |
VLSI2015_05 |
A New Efficiency-Improvement Low-Ripple |
The new efficiency-improvement low-ripple charge pump boost converter using adaptive slope generator with hysteresis voltage comparison techniques is proposed in this paper. This proposed converter can reduce output voltage ripple, because its inductor is connected to the output. This proposed converter adopts a new controlled architecture, self-adaptive slope generator with hysteresis comparison technology, to shorten the transient response. The proposed boost converter has been fabricated with TSMC 0.35-µm CMOS 2P4M processes, and a total chip area of 1.49 mm × 1.49 mm. Its maximum output current is 260 mA when the output voltage is 3.6 V. When the supply voltage is 3.3 V, the output voltage can be 3.6–5.1 V. The maximum efficiency is 90.99% and the minimum output ripple is 10.8 mV. Finally, the theoretical analysis is verified to be correct by the experimental results.
|
2015 |
6. |
VLSI2015_06 |
A 0.25-V 28-nW 58-dB Dynamic Range |
In this paper, we present a single-bit clock-less
|
2015 |
7. |
VLSI2015_07 |
Range Unlimited Delay-Interleaving and -Recycling |
A clock skew-compensation and duty-cycle correction circuit (CSADC) is used as the second-level clock distributing
|
2015 |
8. |
VLSI2015_08 |
Obfuscating DSP Circuits via High-Level |
This paper presents a novel approach to design
|
2015 |
9. |
VLSI2015_09 |
Accelerating Scalar Conversion for Koblitz Curve |
Koblitz curves are a class of computationally efficient elliptic curves where scalar multiplications can be accelerated using τ NAF representations of scalars. However, conversion from an integer scalar to a short τ NAF is a costly operation. In this paper, we improve the recently proposed scalar conversion scheme based on division by τ 2. We apply two levels of optimizations in the scalar conversion architecture. First, we reduce the number of long integer subtractions during the scalar conversion. This optimization reduces the computation cost and also simplifies the critical paths present in the conversion architecture. Then we implement pipelines in the architecture. The pipeline splitting increases the operating frequency without increasing the number of cycles. We have provided detailed experimental results to support our claims made in this paper.
|
2015 |
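To make the τNAF conversion mentioned in the VLSI2015_09 abstract concrete, the sketch below implements the classical digit-by-digit conversion based on repeated division by τ, where τ satisfies τ² = μτ − 2 with μ = ±1. This is the textbook baseline, not the division-by-τ² scheme or the pipelined architecture of the paper; the function names are hypothetical.

```python
def tnaf(n, mu=1):
    """Return the tau-adic NAF digits of integer n, least significant first.

    Classical conversion by repeated division by tau (tau^2 = mu*tau - 2).
    Assumption: mu is +1 or -1, selecting the Koblitz curve.
    """
    if mu not in (1, -1):
        raise ValueError("mu must be +1 or -1")
    r0, r1 = n, 0                      # current value is r0 + r1*tau
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2 == 1:
            # Pick u in {+1, -1} so that the remainder is divisible by tau^2.
            u = 2 - ((r0 - 2 * r1) % 4)
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division of (r0 + r1*tau) by tau, valid because r0 is even.
        r0, r1 = r1 + mu * (r0 // 2), -(r0 // 2)
    return digits


def evaluate_tau_adic(digits, mu=1):
    """Evaluate sum(digits[i] * tau^i) and return (a, b) meaning a + b*tau."""
    a, b = 0, 0
    for u in reversed(digits):
        a, b = -2 * b, a + mu * b      # multiply the accumulator by tau
        a += u
    return a, b


if __name__ == "__main__":
    d = tnaf(7)
    print(d, evaluate_tau_adic(d))
```

Each loop iteration involves long-integer additions and subtractions on r0 and r1, which is the cost the abstract says the proposed architecture reduces.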
10. |
VLSI2015_10 |
Design of Self-Timed Reconfigurable Controllers |
Synchronization is an important issue in modern system design as systems-on-chips integrate more diverse technologies, operating voltages, and clock frequencies on a single substrate. This paper presents a methodology for the design and implementation of a self-timed reconfigurable control device suitable for a parallel cascaded flip-flop synchronizer based on a principle known as wagging, through the application of distributed feedback graphs. By modifying the endpoint adjacency of a common behavior graph via one-hot codes, several configurable modes can be implemented in a single design specification, thereby facilitating direct control over the synchronization time and the mean-time between failures of the parallel master-slave latches in the synchronizer. Therefore, the resulting implementation is resistant to process non-idealities, which are present in physical design layouts. This paper includes a discussion of the reconfiguration protocol, and implementations of both a sequential token ring control device, and an interrupt subsystem necessary for reconfiguration, all simulated in UMC 90-nm technology. The interrupt subsystem demonstrates operating frequencies between 505 and 818 MHz per module, with
|
2015 |
11. |
VLSI2015_11 |
Level-Converting Retention Flip-Flop for Reducing |
In this paper, we propose a level-converting retention flip-flop (RFF) for ZigBee systems-on-chips (SoCs). The proposed RFF allows the voltage regulator that generates the core supply voltage (VDD,core) to be turned off in the standby mode, and it thus reduces the standby power of the ZigBee SoCs. The logic states are retained in a slave latch composed of thick-oxide transistors using an I/O supply voltage (VDD,IO) that is always turned on. Level-up conversion from VDD,core to VDD,IO is achieved by an embedded nMOS pass-transistor level-conversion scheme that uses a low-only signal-transmitting technique. By embedding a retention latch and level-up converter into the data-to-output path of the proposed RFF, the RFF resolves the problems of the static RAM-based RFF, such as large dc current and low readability caused by threshold drop. The proposed RFF also does not require additional control signals for power mode transitioning. Using 0.13-µm process technology, we implemented an RFF with VDD,core and VDD,IO of 1.2 and 2.5 V, respectively. The maximum operating frequency is 300 MHz. The active energy of the RFF is 191.70 fJ, and its standby power is
|
2015 |
12. |
VLSI2015_12 |
All Digital Energy Sensing for |
Minimizing energy consumption is of utmost importance in
|
2015 |
13. |
VLSI2015_13 |
Recursive Approach to the Design of a |
This brief presents a parallel single-rail self-timed adder.
|
2015 |
14. |
VLSI2015_14 |
Novel Reconfigurable Hardware Architecture for |
In this paper, we introduce a novel reconfigurable hardware architecture for computing the polynomial matrix
|
2015 |
15. |
VLSI2015_15 |
Implementation of Subthreshold Adiabatic |
Behavior of adiabatic logic circuits in weak inversion or subthreshold regime is analyzed in depth for the first time in the literature to make great improvement in ultralow-power circuit design. This novel approach is efficacious in low-speed operations where power consumption and longevity are the pivotal concerns instead of performance. The schematic and layout of a 4-bit carry look ahead adder (CLA) has been implemented to show the workability of the proposed logic. The effect of temperature and process parameter variations on subthreshold adiabatic logic-based 4-bit CLA has also |
2015 |
16. |
VLSI2015_16 |
FPGA-Based Bit Error Rate Performance |
This paper presents the bit error rate (BER) performance validation of digital baseband communication systems on a field-programmable gate array (FPGA). The proposed BER tester (BERT) integrates fundamental baseband signal processing modules of a typical wireless communication system along with a realistic fading channel simulator and an accurate Gaussian noise generator onto a single FPGA to provide an accelerated and repeatable test environment in a laboratory setting. Using a developed graphical user interface, the error rate performance of single- and multiple-antenna systems over a wide range of parameters can be rapidly evaluated. The FPGA-based BERT should reduce the need for time-consuming software based simulations, hence increasing the productivity. This FPGA-based solution is significantly more cost effective than conventional performance measurements made using expensive commercially available test equipment and channel simulators.
|
2015 |
17. |
VLSI2015_17 |
Algorithm and Architecture Design of the |
Improved video coding techniques introduced in the
|
2015 |
18. |
VLSI2015_18 |
Pre-Encoded Multipliers |
In this paper, we introduce architecture of pre-encoded multipliers for Digital Signal Processing applications based
|
2015 |
19. |
VLSI2015_19 |
A High-Performance FIR Filter Architecture for |
Transpose form finite-impulse response (FIR) filters are inherently pipelined and support multiple constant multiplications (MCM) technique that results in significant saving of computation. However, transpose form configuration does not directly support the block processing unlike direct form configuration. In this paper, we explore the possibility of realization of block FIR filter in transpose form configuration for area-delay efficient realization of large order FIR filters for both fixed and reconfigurable applications. Based on a detailed computational analysis of transpose form configuration of FIR filter, we have derived a flow graph for transpose form block
|
2015 |
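The difference between the direct form and the transpose form mentioned in the VLSI2015_19 abstract can be illustrated with a sample-by-sample software model. The sketch below is not the block-processing flow graph derived in the paper; it only shows, with hypothetical function names, that in the transpose form every coefficient multiplies the same input sample (which is what makes the MCM technique applicable) and that both forms produce the same output.

```python
def fir_direct(h, x):
    """Direct form: each output is a dot product over delayed input samples."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def fir_transpose(h, x):
    """Transpose form: registers sit between the adders, not on the input."""
    state = [0] * (len(h) - 1)        # one register after each adder but the last
    y = []
    for sample in x:
        # All coefficients multiply the same sample (multiple constant multiplication).
        products = [hk * sample for hk in h]
        y.append(products[0] + (state[0] if state else 0))
        # Shift partial sums toward the output; ascending order reads old values.
        for k in range(len(state)):
            nxt = state[k + 1] if k + 1 < len(state) else 0
            state[k] = products[k + 1] + nxt
    return y


if __name__ == "__main__":
    h = [1, 2, 3]
    x = [1, 0, 0, 4]
    print(fir_direct(h, x), fir_transpose(h, x))
```

Because the transpose form consumes one input sample per step, processing a block of inputs at once needs a reformulated flow graph, which is the problem the abstract addresses.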
20. |
VLSI2015_20 |
A Novel Photosensitive Tunneling Transistor |
In this paper, a novel device structure, operating
|
2015 |
21. |
VLSI2015_21 |
High-Throughput LDPC-Decoder Architecture |
This paper presents architecture of block-level-parallel layered decoder for irregular LDPC code. It can be reconfigured to support various block lengths and code rates of IEEE 802.11n (WiFi) wireless-communication standard. We have proposed efficient comparison techniques for both column and row layered schedule and rejection-based high-speed circuits to compute the two minimum values from multiple inputs required for row layered processing of hardware-friendly min-sum decoding algorithm. The results show good speed with lower area as compared to state-of-the-art circuits. Additionally, this work proposes dynamic multi-frame processing schedule which efficiently utilizes the layered-LDPC decoding with minimum pipeline stages. The
|
2015 |
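The VLSI2015_21 abstract refers to computing the two minimum values from multiple inputs for row-layered min-sum decoding. A minimal software sketch of that step is given below, assuming the standard min-sum check-node update; it is a behavioral model with hypothetical function names, not the comparison circuits proposed in the paper.

```python
def two_minimums(values):
    """Return (smallest, second smallest, index of smallest) in one pass."""
    if len(values) < 2:
        raise ValueError("need at least two inputs")
    min1 = min2 = float("inf")
    idx = -1
    for i, v in enumerate(values):
        if v < min1:
            min2, min1, idx = min1, v, i
        elif v < min2:
            min2 = v
    return min1, min2, idx


def min_sum_check_node(llrs):
    """Min-sum check-node update: each output excludes its own input."""
    min1, min2, idx = two_minimums([abs(v) for v in llrs])
    sign_product = 1
    for v in llrs:
        if v < 0:
            sign_product = -sign_product
    out = []
    for i, v in enumerate(llrs):
        magnitude = min2 if i == idx else min1   # own input excluded
        own_sign = -1 if v < 0 else 1
        out.append(sign_product * own_sign * magnitude)
    return out


if __name__ == "__main__":
    print(min_sum_check_node([2.0, -0.5, 1.5, -3.0]))
```

Only the two minima and the position of the smallest one are needed to produce every outgoing message, which is why this search sits on the critical path of a row-layered decoder.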
22. |
VLSI2015_22 |
A New Parallel VLSI Architecture for Real-time Electrical Capacitance Tomography |
This paper presents a fixed-point reconfigurable parallel
|
2015 |
23. |
VLSI2015_23 |
Graph-Based Transistor Network Generation |
Transistor network optimization represents an effective way of improving VLSI circuits. This paper proposes a novel method to automatically generate networks with minimal transistor count, starting from an irredundant sum-of-products expression as the input. The method is able to deliver both series–parallel (SP) and non-SP switch arrangements, improving speed, power dissipation, and area of CMOS gates. Experimental results demonstrate expected gains in comparison with related approaches.
|
2015 |
24. |
VLSI2015_24 |
A Relative Imaging CMOS Image Sensor for High |
This paper proposes an unconventional image acquisition scheme for machine vision applications, based on detecting ratios of illumination (pixel) intensities. Detecting relative ratios enables capturing the scene features and patterns almost independently from the local scene illumination resulting in potentially extremely high dynamic range. Moreover, detecting signal ratios using a fully differential circuit optimally suits the intrinsic nature of VLSI design. A scalable and compact hardware implementation is proposed as a proof-of-concept towards relative image acquisition. The proposed photo-current ratio-detecting pixels completely bypass the need of conventional photo-current integration which enables high frame-rate operation of up to 24000 frames-per-second (fps). The pulse-width modulated output of the proposed pixel is captured by compact column-parallel readout circuits based on digital counters. The developed 32×32 pixel array prototype CMOS image sensor consumes 4 mW of power operating at a nominal 9765 fps frame rate, and 6.8 mW of power operating at a maximum 24000 fps. The presented prototype design is fully scalable towards newer CMOS fabrication nodes and higher sensor resolution.
|
2015 |
25. |
VLSI2015_25 |
Low-Cost High-Performance VLSI Architecture for |
This paper proposes a simple and efficient Montgomery multiplication algorithm such that the low-cost and high-performance Montgomery modular multiplier can be implemented accordingly. The proposed multiplier receives and outputs the data with binary representation and uses only one-level carry-save adder (CSA) to avoid the carry propagation at each addition operation. This CSA is also used to perform operand pre-computation and format conversion from the carry save format to the binary representation, leading to a low hardware cost and short critical path delay at the expense of extra clock cycles for completing one modular multiplication. To overcome the weakness, a configurable CSA (CCSA), which could be one full-adder or two serial half-adders, is proposed to reduce the extra clock cycles for operand pre-computation and format conversion by half. In addition, a mechanism that can detect and skip the unnecessary carry-save addition operations in the one-level CCSA architecture while maintaining the short critical path delay is developed. As a result, the extra clock cycles for operand pre-computation and format conversion can be hidden and high throughput can be obtained. Experimental results show that the proposed Montgomery modular multiplier can achieve higher performance and significant area–time product improvement when compared with previous designs.
|
2015 |
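As background for the VLSI2015_25 abstract, the bit-serial (radix-2) Montgomery multiplication that such multipliers are built around can be modeled in a few lines. This sketch uses ordinary integer additions, so it does not reflect the carry-save adder, the configurable CSA, or the skipping mechanism proposed in the paper; the function name and parameter names are hypothetical.

```python
def montgomery_multiply(a, b, n, k):
    """Return a * b * 2^(-k) mod n using the radix-2 Montgomery algorithm.

    Assumptions: n is odd, 0 <= a < 2^k, and 0 <= b < n.
    """
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    if not (0 <= a < (1 << k)) or not (0 <= b < n):
        raise ValueError("operands out of range")
    s = 0
    for i in range(k):
        if (a >> i) & 1:       # add b when bit i of a is set
            s += b
        if s & 1:              # make s even by adding the odd modulus
            s += n
        s >>= 1                # exact division by 2
    if s >= n:                 # s stays below 2n, so one subtraction suffices
        s -= n
    return s


if __name__ == "__main__":
    n, k = 97, 7
    print(montgomery_multiply(45, 76, n, k))
```

In hardware, the additions inside the loop are where carry propagation would limit the clock rate, which is why the abstract keeps intermediate results in carry-save format.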
26. |
VLSI2015_26 |
Fully Pipelined Low-Cost and High-Quality Color |
This paper presents a fully pipelined color demosaicking design. To improve the quality of reconstructed images, a linear deviation compensation scheme was created to increase the correlation between the interpolated and neighboring pixels. Furthermore, immediately interpolated green color pixels are first to be used in hardware-oriented color demosaicking algorithms, which efficiently promoted the quality of the reconstructed image. A boundary detector and boundary mirror machine were added to improve the quality of pixels located in boundaries. In addition, a hardware sharing technique was used to reduce the hardware costs of three interpolators. The VLSI architecture in this work contains only 4.97 K gate counts and the core area is 60,229 µm² when synthesized using a 0.18-µm CMOS process. The design operates at 200 MHz while consuming 4.76 mW. Compared with the previous low complexity designs, this work has benefits in terms of low cost, low power consumption, and high performance.
|
2015 |
27. |
VLSI2015_27 |
A Novel Area-Efficient VLSI Architecture for |
Long term evolution (LTE) is aimed to achieve the
|
2015 |
28. |
VLSI2015_28 |
Comparative Performance Analysis of |
In this paper, a short-gate tunneling-field-effect-transistor (SG-TFET) structure has been investigated for dielectrically modulated biosensing applications in comparison with a full-gate tunneling-field-effect-transistor structure of similar dimensions. This paper explores the underlying physics of these architectures and estimates their comparative sensing performance. The sensing performance has been evaluated for both the charged and charge-neutral biomolecules using extensive device-level simulation, and the effects of the biomolecule dielectric constant and charge density are also studied. In SG-TFET architecture, the reduction of the gate length enhances its drain control over the band-to-band tunneling process and this has been exploited for the detection, resulting in superior drain current sensitivity for biomolecule conjugation. The gate and drain biasing conditions show a dominant impact on the sensitivity enhancement in the short-gate biosensors. Therefore, the gate and drain bias are identified as the effective design parameters for the efficiency optimization.
|
2015 |
29. |
VLSI2015_29 |
An Efficient Constant Multiplier Architecture |
This paper proposes efficient constant multiplier architecture based on vertical-horizontal binary common sub-expression elimination (VHBCSE) algorithm for designing a reconfigurable finite impulse response (FIR) filter whose coefficients can dynamically change in real time. To design an efficient reconfigurable FIR filter, according to the proposed VHBCSE algorithm, 2-bit binary common sub-expression elimination (BCSE) algorithm has been applied vertically across adjacent coefficients on the 2-D space of the coefficient matrix initially, followed by applying variable-bit BCSE algorithm horizontally within each coefficient. This technique is capable of reducing the average probability of use or the switching activity of the multiplier block adders by 6.2% and 19.6% as compared to that of two existing 2-bit and 3-bit BCSE algorithms respectively. ASIC implementation results of FIR filters using this multiplier show that the proposed VHBCSE algorithm is also successful in reducing the average power consumption by 32% and 52% along with an improvement in the area power product (APP) by 25%
|
2015 |
30. |
VLSI2015_30 |
VLSI-Assisted Nonrigid Registration Using |
Increasing demand for high-speed portable modules for multimedia applications has motivated the development of hardware-based solutions for image processing applications. Most of the nonrigid image registration algorithms are found to be unsuitable for hardware implementation because of their nonlinearity and computationally intensive nature. In this paper, an algorithm for nonrigid image registration based on Demons approximation is proposed. The algorithm has been simulated in MATLAB and results show a 15% improvement in peak signal-to-noise ratio with a 17% reduction in registration time for a 256 × 256 image over the original Demons algorithm. The proposed algorithm is synthesized in Virtex6-xc6vlx760-2-ff1760 and the maximum synthesized frequency is found to be 174 MHz. The proposed architecture provides a low-cost, high-speed solution for the registration process, which is also helpful for making a portable system.
|
2015 |
31. |
VLSI2015_31 |
Fine-Grained Access Management in |
Modern VLSI designs incorporate a high amount of
|
2015 |
32. |
VLSI2015_32 |
A High-Throughput VLSI Architecture for Hard and |
This paper introduces a novel low-complexity multiple-input multiple-output (MIMO) detector tailored for single-carrier frequency division-multiple access (SC-FDMA) systems, suitable for efficient hardware implementations. The proposed detector starts with an initial estimate of the transmitted signal based on a minimum mean square error (MMSE) detector. Subsequently, it recognizes less reliable symbols for which more candidates in the constellation are browsed to improve the initial estimate. Efficient high-throughput VLSI architecture is also introduced achieving a superior performance compared to the conventional MMSE detectors with less than 28% added complexity. The performance of the proposed design is close to the existing maximum likelihood post-detection processing (ML-PDP) scheme, while resulting in a significantly lower complexity, i.e., and times fewer Euclidean distance (ED) calculations in the 16-QAM and 64-QAM schemes, respectively. The proposed design for the 16-QAM scheme is fabricated in a 0.13-µm CMOS technology and fully tested, achieving a 1.332 Gbps throughput, reporting the first fabricated design for SC-FDMA MIMO detectors to date. A soft version of the proposed architecture is also introduced, which is customized for coded systems.
|
2015 |
33. |
VLSI2015_33 |
Partially Parallel Encoder Architecture |
Due to the channel achieving property, the polar code has become one of the most favorable error-correcting codes. As
|
2015 |
34. |
VLSI2015_34 |
Novel Block-Formulation and Area-Delay-Efficient |
A poly-phase based interpolation filter computation involves an input-matrix and coefficient-matrix of size each, where is the up-sampling factor and , is the filter length. The input-matrix and the coefficient-matrix resizes when changes. An analysis of interpolation filter computation for different up-sampling factors is made in this paper to identify redundant computations and removed those by reusing partial
|
2015 |
35. |
VLSI2015_35 |
One Minimum Only Trellis Decoder for Non-Binary |
A one minimum only decoder for Trellis-EMS (OMO
|
2015 |
36. |
VLSI2015_36 |
A Low-Cost Hardware Architecture for Illumination |
For real-time surveillance and safety applications in intelligent transportation systems, high-speed processing for image enhancement is necessary and must be considered. In this paper, we propose a fast and efficient illumination adjustment algorithm that is suitable for low-cost very large scale integration implementation. Experimental results show that the proposed method requires the least number of operations and achieves comparable visual quality as compared with previous techniques. To further meet the requirement of real-time image/video applications, the 16-stage pipelined hardware architecture of our method is implemented as an intellectual property core. Our design yields a processing rate of about 200 MHz by using TSMC 0.13-μm technology. Since it can process one pixel per clock cycle, for an image with a resolution of QSXGA (2560 × 2048), it requires about 27 ms to process one frame, which is suitable for real-time applications. In some low-cost intelligent imaging systems, the processing rate can be slowed down, and our hardware core can run at very low power consumption.
|
2015 |
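The frame-time figure in the VLSI2015_36 abstract follows from the pixel count and the clock rate. The short calculation below reproduces it under the abstract's stated assumption of one pixel per clock cycle; the function name is hypothetical and pipeline fill latency is ignored.

```python
def frame_time_ms(width, height, clock_hz, pixels_per_cycle=1):
    """Time to stream one frame through the core, in milliseconds."""
    cycles = width * height / pixels_per_cycle
    return cycles / clock_hz * 1e3


if __name__ == "__main__":
    # QSXGA frame at the quoted processing rate of about 200 MHz.
    print(round(frame_time_ms(2560, 2048, 200e6), 2), "ms")
```

At exactly 200 MHz this gives roughly 26.2 ms for the 5,242,880 pixels of a QSXGA frame, consistent with the "about 27 ms" quoted for a clock of "about 200 MHz".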
37. |
VLSI2015_37 |
A 2.5-Gb/s DLL-Based Burst-Mode Clock and Data Recovery |
In this brief, a delay-locked loop (DLL)-based burst-mode
|
2015 |
38. |
VLSI2015_38 |
Aging-Aware Reliable Multiplier Design With |
Digital multipliers are among the most critical
|
2015 |
39. |
VLSI2015_39 |
Reverse Converter Design via Parallel-Prefix Adders: Novel Components, |
In this brief, the implementation of residue number system
|
2015 |
40. |
VLSI2015_40 |
Fully Reused VLSI Architecture of |
The dedicated short-range communication (DSRC)
|
2015 |
VLSI PROJECTS 2014
SN |
PROJECT CODE |
PROJECT TOPIC |
YEAR |
1 |
NVLSI1449 |
Topic: Argo: A Time-Elastic Time-Division-Multiplexed NOC using Asynchronous Routers
Abstract: In this paper we explore the use of asynchronous routers in a time-division-multiplexed (TDM) network-on-chip (NOC), Argo, that is being developed for a multi-processor platform for hard real-time systems. TDM inherently requires a common time reference, and existing TDM-based NOC designs are either synchronous or mesochronous. We use asynchronous routers to achieve a simpler, smaller, and more robust self-timed design. Our design exploits the fact that pipelined asynchronous circuits also behave as ripple FIFOs. Thus, it avoids the need for explicit synchronization FIFOs between the routers. Argo has interesting elastic timing properties that allow it to tolerate skew between the network interfaces (NIs). The paper presents the Argo NOC architecture and provides a quantitative analysis of its ability to absorb skew between the NIs. Using a signal transition graph model and realistic component delays derived from a 65 nm CMOS implementation, a worst-case analysis shows that a typical design can tolerate a skew of 1-5 cycles (depending on FIFO depths and NI clock frequency). Simulation results of a 2×2 NOC confirm this.
|
2014 |
2 |
NVLSI1448 |
Topic: High Performance BIST PLL Approach for VCO Testing
Abstract: RF and mixed-signal IC testing is becoming an important issue that affects both the time-to-market and product cost of many modern electronic systems. This paper focuses on one mixed-signal IC, the phase-locked loop (PLL). A novel BIST (built-in self-test) approach is developed for RF PLLs; it is applied in particular to testing the VCO block. The proposed BIST scheme does not break the loop to include the test circuit at the PLL design stage, so it causes minimal degradation of the PLL characteristics. The key advantage of this technique is that it uses an internal test signal for the test procedure. The presented architecture uses the existing elements for measuring and testing in order to reduce the area overhead of the BIST scheme, solves the analog node loading problem, and improves test accessibility. The test output generated is a purely digital signal. The BIST method enables the detection of catastrophic and many parametric faults affecting the VCO by measuring its oscillation frequency response. Fault simulation results indicate that the proposed BIST structure achieves a high fault coverage of 100%.
|
2014 |
3 |
NVLSI1447 |
Topic: Performance Evaluation of Column-Scaled LDPC Codes Under Fading Channel Conditions
Abstract: Column scaling of LDPC codes is generally done to reduce the decoding complexity without degradation in bit error rate (BER). It has been deduced that the column-scaled LDPC (CS-LDPC) codes so constructed perform better than the existing regular and deterministic LDPC codes in terms of BER. In this paper, the ability of CS-LDPC codes to reach the best performance is evaluated for different fading channel conditions such as additive white Gaussian noise (AWGN), Rayleigh, and Rician channels. Diagonal elements (that are not equal to zero) are distributed randomly in order to generate CS-LDPC codes. In CS-LDPC codes, a non-binary parity check matrix H is derived using a Galois field (finite field) polynomial, and the non-binary H matrix is then converted to a binary H matrix. CS-LDPC codes have parity-check matrices composed of binary and diagonal matrices, which eases implementation. Analysis results show that the CS-LDPC scheme improves the bit error rate and frame error rate for AWGN and Rician channels, with no significant improvement for Rayleigh channels.
|
2014 |
4 |
NVLSI1446 |
Topic: A Low-Cost Platform for Voice Monitoring
Abstract: A low-cost platform is proposed in this paper that has been conceived to monitor the vocal activity of people who use the voice as a professional tool. The platform includes a wearable data-logger and a processing program that allows the vocal parameters to be extracted from the recorded signal. The data-logger is equipped with a contact microphone that is attached to the jugular notch of the person under monitoring, thus sensing the skin acceleration level due to the vibration of the vocal folds. The microphone output is conditioned through custom circuitry and then sent to a cheap micro-controller based board, which stores the raw samples onto a micro SD card. The off-line processing provides an estimation of Sound Pressure Level (SPL), fundamental frequency (F0), and Time Dose (Dt), which are the parameters that seem most suitable for the identification of vocal disorders and the prevention of an improper use of the voice. For the estimated parameters, suitable calibration procedures are implemented and their effectiveness is shown through specifically conceived experimental tests. Experimental results are shown that refer to the calibration of the device and its normal use during monitoring intervals of several hours. A comparison with a commercial device is also reported.
|
2014 |
5 |
NVLSI1445 |
Topic: Implementation of High Speed Low Power Combinational and Sequential Circuits using Reversible logic
Abstract: Reversible logic has presented itself as a prominent technology which plays an imperative role in quantum computing. Quantum computing devices theoretically operate at ultra-high speed and consume very little power. The research in this paper aims to use the idea of reversible logic to break the conventional speed-power trade-off, thereby getting a step closer to realising quantum computing devices. To authenticate this research, various combinational and sequential circuits are implemented using reversible gates, such as a 4-bit ripple-carry adder, an (8-bit × 8-bit) Wallace tree multiplier, and the control unit of an 8-bit GCD processor. The power and speed parameters for the circuits are reported and compared with their conventional non-reversible counterparts. The comparative statistical study shows that circuits employing reversible logic are faster and more power efficient. The designs presented in this paper were simulated using Xilinx 9.2 software.
|
2014 |
6 |
NVLSI1444 |
Topic: A LOW POWER BIST SCHEME BASED ON BLOCK ENCODING
Abstract: With the development of integrated circuit manufacturing technology, low-power test has become a focus of concern in the testing field. This paper proposes a new low-power BIST (built-in self-test) scheme based on block encoding, which first uses a block re-encoding method to optimize the test cubes and then applies a low-power test based on LFSR (linear feedback shift register) reseeding. According to the compatibility of flags, the scheme proposes a flag-based grouping algorithm to divide and reorder the test cubes in the test cube set. Experimental results show that the scheme not only obtains a better test compression ratio and test data storage, but also reduces the test power consumption effectively. Key words: LFSR reseeding; test data compression; low power test; test cube block.
|
2014 |
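The LFSR reseeding mentioned in the NVLSI1444 abstract above can be illustrated with a minimal sketch. It only shows how a Fibonacci LFSR expands a seed into a sequence of test patterns, and how loading a new seed ("reseeding") starts a different sequence. The 4-bit width, the tap positions and the function name `lfsr_patterns` are assumptions chosen for illustration; the paper's block encoding and flag-based grouping are not modelled.

```python
def lfsr_patterns(seed, taps, width, count):
    """Yield `count` successive states of a left-shifting Fibonacci LFSR.

    `taps` are 1-indexed bit positions XORed together to form the
    feedback bit (hypothetical choice, not taken from the paper).
    """
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        # Shift left and insert the feedback bit at the LSB.
        state = ((state << 1) | feedback) & mask


# One seed expands into a full pattern sequence.
first_run = list(lfsr_patterns(0b0001, (4, 3), 4, 15))

# "Reseeding": loading a different seed starts the sequence elsewhere.
second_run = list(lfsr_patterns(0b1010, (4, 3), 4, 5))
```

With these assumed taps the 4-bit register cycles through all 15 non-zero states before repeating, so a short seed stands in for a much longer stored pattern set.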
7 |
NVLSI1443 |
Topic: Designing of FPGA Based High Performance 32 Bit FFT Processor With BIST
Abstract: The design and implementation of a 32-bit, 64-point pipelined FFT processor is presented in this paper. This FFT processor is to be implemented on a Field Programmable Gate Array (FPGA). The aim is to reduce the number of cycles required for computation. The FFT architecture has two pipelines: one in the execution of the complex multiplication of the butterfly unit and the other in the RAM unit. In this architecture a novel simple address mapping scheme is proposed. The twiddle factors in this architecture are not stored in ROM; they are generated and accessed directly. The Built-In Self-Test (BIST) provided allows the design to test itself. |
2014 |
8 |
NVLSI1442 |
Topic: Low Power and High Performance Achievement Using Constant Delay Logic Style
Abstract: High performance with energy efficiency is one of the most important goals in the design of VLSI circuits. To achieve this, a new CMOS logic family, constant delay (CD) logic, is used. The CD logic has contention, C-Q delay and D-Q delay modes. In CD logic, the D-Q delay mode has a distinct characteristic where the output is pre-calculated before getting the inputs from the previous stage. This logic provides performance improvement over static and dynamic logic styles in multistage circuit blocks. In accordance with the logic type, the CD logic style is suitable for implementing difficult logic expressions such as addition. The three modes of CD logic are designed, simulated and synthesized. A full adder is also designed, simulated and synthesized at transistor level using static, dynamic and CD logic styles in Tanner EDA. The synthesized results of the Full Adder demonstrate that the Full Adder using CD logic style has lower delay, which enhances the performance, but consumes more power than the other two logic styles. Low power is likely to be a key objective in VLSI circuit design. To achieve this, the low power techniques Clock Gating and Supply Voltage Scaling are also used in the Full Adder and 4-bit Ripple Carry Adder using CD logic style.
|
2014 |
9 |
NVLSI1441 |
Topic: Reconfigurable Edge Detection Processor Using Xilinx Platform Studio
Abstract: In this paper we propose a technique for software implementation of Edge detection which serves as a preprocessing step for many image processing algorithms such as image enhancement, image segmentation, tracking and image and video coding. Edge Detection is one of the key stages in image processing and object recognition. Edge detection is a basic operation in image processing which refers to the process of identifying and locating sharp discontinuities in an image. The discontinuities are abrupt changes in pixel intensity which characterize boundaries of objects in a scene. It plays a major role in many algorithms used for segmentation and tracking. This paper presents an edge detection algorithm that results in significantly reduced memory requirements, decreased latency and increased throughput with no loss in edge detection performance using the MicroBlaze Processor. This edge detection algorithm is based on MATLAB simulation and FPGA implementation through serial communication using Xilinx Platform Studio.
|
2014 |
10 |
NVLSI1440 |
Topic: Reconfigurable System-On-Chip Design Using FPGA
Abstract: System-on-Chip (SoC) design integrates processors, memory, and a variety of IPs in a single design. Due to the FPGA capabilities and high time-to-market pressures, complex SoC designs are increasingly targeted to FPGA. Traditionally cores in FPGAs are connected using AXI and PLB bus-based architectures. FPGA devices provide Embedded Systems development with new alternatives for creating new hardware accelerated applications. The availability of embedded processor subsystems in FPGAs opens the door to a myriad of applications. The reconfigurable System-on-Chip architecture includes a MicroBlaze soft-core processor that integrates peripherals over PLB and OPB buses and provides access to memory, PS/2 and VGA IP cores. A new peripheral-based arithmetic application is designed; the keyboard module is a custom hardware module that accepts input from a PS/2 serial keyboard and outputs character data to the VGA input memory. VHDL is used in ISE for custom logic design. The SystemC and VHDL co-synthesis scenario provides a way of checking the interoperability of a single hardware module designed with different functionality. Both designs are synthesizable, implemented in a single bitstream, and configured to the FPGA. Two-level functionality is observed for the configured bitstream on the FPGA hardware; design modeling was done using SystemC and VHDL co-synthesis. This paper presents an evaluation of design methods and concepts of reconfigurable architecture, which provides many options for system designers. Co-synthesis was done using either Top-Down or Bottom-Up design methodologies. Implementation was targeted at the Spartan-3E FPGA board.
|
2014 |
11 |
NVLSI1439 |
Topic: Performance Analysis of a Space-Frequency Block Coded OFDM Wireless Communication System with MSK and GMSK Modulation
Abstract: Space frequency block codes (SFBC) are very efficient in overcoming the effect of a frequency selective fading channel in a wireless communication system. In this paper, bit error rate performance analysis is carried out for an SFBC-OFDM system with MSK and GMSK modulation schemes. Results are evaluated numerically for SISO and MIMO communication links. It is shown that, for a fixed bit error rate, the improvement in SNR for both SFBC coded MSK and GMSK modulation is noticeable. The receiver sensitivity is also evaluated for system BER. It is shown that sensitivity improves with the change in code rate but remains nearly the same for the same combination of transmit and receive antennas, for both SFBC coded MSK and GMSK modulation schemes.
|
2014 |
12 |
NVLSI1438 |
Topic: Modified Wallace Tree Multiplier using Efficient Square Root Carry Select Adder
Abstract: A multiplier is one of the key hardware blocks in most digital and high performance systems such as FIR filters, microprocessors and digital signal processors. A system's performance is generally determined by the performance of the multiplier because the multiplier is generally the slowest element in the whole system and also occupies the most area. The Carry Select Adder (CSLA) provides a good compromise between cost and performance in carry propagation adder design. A Square Root Carry Select Adder using RCA is introduced but it incurs some speed penalty. However, the conventional CSLA is still area-consuming due to the dual ripple carry adder structure. In the proposed work, the partial products in the Wallace multiplier are reduced as soon as possible and a carry select adder is used in the final carry propagation path. In this paper, modification is done at gate level to reduce area and power consumption. The Modified Square Root Carry Select Adder (MCSLA) is designed using Common Boolean Logic and then compared with the respective regular CSLA architectures, and this MCSLA is implemented in the Wallace Tree Multiplier. This work gives reduced area compared to the normal Wallace tree multiplier. Finally an area efficient Wallace tree multiplier is designed using the common Boolean logic based square root carry select adder.
|
2014 |
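The dual ripple carry adder structure that the NVLSI1438 abstract above calls area-consuming can be sketched at bit level. This is a minimal model of a conventional carry select adder only: each block computes its sum for carry-in 0 and carry-in 1 and a multiplexer picks one. It is not the paper's Common Boolean Logic modification, and the 16-bit width and the increasing block sizes (2, 2, 3, 4, 5) are assumptions made for illustration.

```python
def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry add of two LSB-first bit lists; returns (sum_bits, carry_out)."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def carry_select_add(a, b, width=16, blocks=(2, 2, 3, 4, 5)):
    """Conventional carry select adder with variable ("square root") block sizes."""
    assert sum(blocks) == width
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry, pos = [], 0, 0
    for size in blocks:
        xa, xb = a_bits[pos:pos + size], b_bits[pos:pos + size]
        # Two ripple adders per block: one assumes carry-in 0, one carry-in 1.
        s0, c0 = ripple_add(xa, xb, 0)
        s1, c1 = ripple_add(xa, xb, 1)
        # The real carry-in selects which precomputed result is used.
        result += s1 if carry else s0
        carry = c1 if carry else c0
        pos += size
    total = sum(bit << i for i, bit in enumerate(result))
    return total, carry


total, carry_out = carry_select_add(12345, 54321)
```

The duplicated `ripple_add` call per block is the hardware cost that gate-level modifications of the kind described in the abstract aim to reduce.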
13 |
NVLSI1437 |
Topic: Design of an Energy Efficient, High Speed, Low Power Full Subtractor Using GDI Technique
Abstract: This paper proposes the design of an energy efficient, high speed and low power full subtractor using the Gate Diffusion Input (GDI) technique. The entire design has been performed in 150 nm technology and, on comparison with full subtractors employing conventional CMOS transistors, transmission gates and Complementary Pass-Transistor Logic (CPL), respectively, it has been found that there is a considerable reduction in Average Power consumption (Pavg), delay time and Power Delay Product (PDP). Pavg is as low as 13.96 nW while the delay time is found to be 18.02 ps, thereby giving a PDP as low as 2.51 × 10^-19 J for a 1 V power supply. In addition, there is a significant reduction in transistor count compared to traditional full subtractors employing CMOS transistors, transmission gates and CPL, implying minimization of area. The simulation of the proposed design has been carried out in Tanner SPICE and the layout has been designed in Microwind.
|
2014 |
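As a quick arithmetic check on the NVLSI1437 figures above, the power delay product is simply average power multiplied by delay. The short calculation below uses only the two values quoted in the abstract; the variable names are illustrative.

```python
# Values quoted in the abstract.
p_avg_watts = 13.96e-9      # average power, 13.96 nW
delay_seconds = 18.02e-12   # delay, 18.02 ps

# Power delay product (energy per operation) in joules.
pdp_joules = p_avg_watts * delay_seconds

print(f"PDP = {pdp_joules:.3e} J")
```

The product comes to roughly 2.516 × 10^-19 J, which agrees with the quoted 2.51 × 10^-19 J up to rounding.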
14 |
NVLSI1436 |
Topic: Design and Implementation of Area Efficient, Low Power AMBA-APB Bridge for SoC
Abstract: In this paper, we present the design of an Advanced Peripheral Bus (APB) controller (or APB Bridge). A UART has been used as the APB slave in the design. A Linear Feedback Shift Register (LFSR) module has been included in the UART design for data security. We have also compared an APB Bridge design compatible with the AMBA Specification (Rev 2.0) and an APB Bridge design compatible with the AMBA 3 APB Specification (v1.0) in terms of power and area. The APB Bridge designed with AMBA3 APB saves 6% power and 10% area over the one designed with AMBA2 APB.
|
2014 |
15 |
NVLSI1435 |
Topic: Designing a Learning Platform for the Implementation of Serial Standards using ARM Microcontroller LPC2148
Abstract: In embedded system design, managing communication among various bus interfaces and attaching multiple systems with different interfacing protocols to a main processor is one of the challenging tasks. Popular serial interfacing protocols include USB, I2C, SPI/SSP, CAN and UART for communication between integrated circuits at low/medium data transfer speeds with on-board peripherals. This paper presents a platform which deals with the implementation of some of the above serial protocols provided by a low power 32-bit ARM RISC processor, the LPC2148, with suitable examples, including hardware and software details. This platform is also useful for students of different disciplines to work with different serial protocols, which helps them in interfacing sensors, memory ICs, analog subsystems and so on. It also aims to provide students with hands-on experience and practice in embedded systems while minimizing the prerequisite knowledge. |
2014 |
16 |
NVLSI1434 |
Topic: RGB Based KMB Image Compression Technique
Abstract: With the increased requirement of bandwidth in digital media, the compression of an image is an important issue. The various image compression technologies which are still in use, such as JPEG/PNG/DCT, offer an efficient way for the compression/extraction of an image and provide ease of data transmission. The technique used here is much more helpful in reducing the bandwidth of an image and in improving its availability, reliability, and transmission rates. In this technique, an image compression domain algorithm aims at high performance in terms of image effectiveness.
|
2014 |
17 |
NVLSI1433 |
Topic: Built-In Self-Test for Analog-to-Digital Converters in SoC Applications
Abstract: This paper presents a built-in self-test (BIST) architecture for testing high speed analog-to-digital converters (ADCs) with sampling rates in excess of 1 GHz. A methodology for performing mixed-mode BIST simulations in SoC applications is proposed along with hardware for performing on-chip BIST. The architecture presented utilizes an on-chip ROM and allows for the generation of single-frequency as well as multiple-frequency test signals. The issues associated with BIST signal generation for low voltage ADCs are also discussed. Simulations revealed that the SFDR of the sinusoidal signal generated from the BIST hardware was 25.28 dB with a frequency of 312.5 MHz and 19.88 dB with a frequency of 416.67 MHz. |
2014 |
18 |
NVLSI1432 |
Topic: A SoC Design and Implementation of H.264 Video Encoding System Based on FPGA
Abstract: A SoC design of an H.264 Video Encoding system is implemented based on FPGA in this paper. The intra prediction algorithm and baseline profile are selected, and the H.264 encoder algorithm is designed as an IP core and embedded in the SoC through the AMBA AXI interconnect bus. The SoC is implemented on a Xilinx Zynq-7000 FPGA and each functional module is simulated by Modelsim and tested within the SoC platform. Compared to existing H.264 Video Encoding systems based on ARM or DSP, results indicate that this special SoC can fully show its advantage in high speed and flexibility. The implemented system can also meet the required rate for processing HD-1080 format video sequences. |
2014 |
19 |
NVLSI1431 |
Topic: Design and Analysis of a Simple D Flip-Flop Based Sequential Logic Circuits for QCA Implementation
Abstract: Quantum-dot Cellular Automata (QCA) is one of the emerging computing paradigms. Its advantages such as smaller size, lower power consumption and faster speed are very attractive. QCA performs highly dense computing that could be realized in a variety of material systems. It is presently being investigated as an alternative to CMOS VLSI. In conventional digital systems the information is transferred from one place to another by means of electrical current, whereas QCA cells transfer information by propagating a polarization state. This paper proposes a detailed design and simulation of simple D flip-flop based sequential logic circuits, such as shift register, ring counter and modulo-n counter circuits, for quantum-dot cellular automata. The proposed designs are based on the D-type flip-flop (DFF) device. A QCA binary wire with four clocking zones can be used to implement a DFF. The aim is to maximize the circuit density and focus on a layout that is minimal in its use of cells. |
2014 |
20 |
NVLSI1430 |
Topic: Multiple-Clock Multiple-Edge-Triggered Multiple-Bit Flip-flops for Two-Phase Handshaking Asynchronous Circuits
Abstract: This paper proposes multiple-clock multiple-edge triggered multiple-bit flip-flops for designing simple and straightforward asynchronous control circuits of the two-phase handshaking protocol. The proposed flip-flops have multiple clocks and multiple data inputs, and each data input can be stored in the flip-flop at both the rising edge and the falling edge of the corresponding clock. They can be applied in the asynchronous design of the two-phase handshaking protocol not only for synthesizing simple control circuits, but also for obtaining robust circuits. The performance of the proposed flip-flops has been evaluated using the PTM 22nm HP device parameters.
|
2014 |
21 |
NVLSI1429 |
Topic: Efficient Design of Sparse FIR Filters with Optimized Filter Length
Abstract: A large number of experiments have demonstrated that for an FIR filter the sparsity of filter coefficients is highly related to its filter order. However, traditional sparse FIR filter design methods focus on how to increase the number of zero valued coefficients, but overlook the impact of filter orders on design performance. As an attempt to jointly optimize filter length and sparsity of an FIR filter, a novel method is proposed in this paper to design sparse linear-phase FIR filters. With peak error constraints, the objective function of the design problem is formulated as a combination of the sparsity of filter coefficients and a measure of the effective filter order. The design problem is then recast as a weighted l0-norm optimization problem, which is solved by an efficient numerical method based on iterative-reweighted-least-squares (IRLS) algorithms. Experimental results illustrate that the proposed method can efficiently reduce the effective filter order while enhancing the sparsity of an FIR filter.
|
2014 |
22 |
NVLSI1428 |
Topic: A novel approach to realize Built-in-self-test(BIST) enabled UART using VHDL
Abstract: Testing of VLSI chips is becoming more complex day by day due to the exponential advancement of nanotechnology. So both front-end and back-end engineers are trying to evolve systems with full testability, keeping in mind the possibility of reduced product failures and missed market opportunities. BIST is a design technique that allows a system to test itself automatically at the cost of a slightly larger system size. In this paper, the simulated performance achieved by a BIST enabled UART architecture through VHDL programming is enough to compensate for the extra hardware needed in the BIST architecture. This technique generates random test patterns automatically, so it can provide less test time compared to an externally applied test pattern and helps to achieve much more productivity in the end.
|
2014 |
23 |
NVLSI1427 |
Topic: Architecture for Monitoring SET Propagation in 16-bit Sklansky Adder
Abstract: We propose a measurement architecture that allows tracing the generation and propagation of single event transients in a combinational target circuit that will be subjected to radiation in an experimental study. We choose the Sklansky adder as a target circuit, since it exhibits both properties we are interested in, namely different amounts of fanout and a carry propagation chain. The problem of devising a suitable on-chip measurement infrastructure lies in the partly contradictory requirements, like constrained area, radiation tolerance and good resolution of the location and propagation path of particle hits. Our proposed architecture is based on linear feedback shift registers that can be used as lean and robust counter implementations. These counters are attached at selected locations within the target adder circuit, and we show by means of a simulation study as well as a fault dictionary that this architecture indeed meets our expectations.
|
2014 |
24 |
NVLSI1426 |
Topic: High Performance Low Swing Clock Tree Synthesis with Custom D Flip-Flop Design
Abstract: Low swing clocking is a low power design methodology that scales the clock voltage to decrease power consumption of the clock distribution networks, with an expected degradation in the performance. In this work, a novel low swing clock tree synthesis methodology is combined with a custom low swing clock-aware D flip-flop (DFF) design. The low swing clocking serves to reduce the power dissipation whereas the custom low swing-aware DFF serves to preserve the performance of the IC. The experimental results performed on the three largest circuits of ISCAS’89 benchmarks operating at 1GHz in the 32nm technology show that the proposed methodology can achieve an average of 16% power savings in the clock tree compared to its full swing counterpart, while satisfying the same clock skew (50ps) and slew (150ps) constraints at the worst case corner of operation. Moreover, the clock-to-output delay of the low swing DFF does not increase compared to traditional full swing DFF, while consuming only 1% more power.
|
2014 |
25 |
NVLSI1425 |
Topic: Securing RObust Header Compression (ROHC)
Abstract: The desire for the cellular and wireless industry to converge on an all-IP infrastructure, fueled by the increased usage of mobile applications on smart phones and VoIP applications, has pushed research in maximizing bandwidth efficiency amidst a shrinking allocation of RF spectrum. One method of providing increased bandwidth efficiency (especially with the desire to move to IPv6) is the use of RObust Header Compression (ROHC-RFC5225) to compress headers from the network layer and above into small identifiers before sending packets to the link layer. ROHCv1 and ROHCv2 have been adopted and are in the roadmaps for usage on High Speed Packet Access (HSPA), Long Term Evolution (LTE) and Evolution Data Optimized (EVDO) mobile phone networks. Although significant bandwidth savings can be achieved using ROHC, the stateful nature of the protocol leads to potential compromises. In this paper, we examine three attacks on the ROHC protocol that result in denial of service and packet interception, and their effect on networks that use ROHC to compress and decompress IP headers. Additionally, we propose three simple methods to mitigate the attacks.
|
2013 |
26 |
NVLSI1424 |
Topic: Shift Register Design Using Two Bit Flip-Flop
Abstract: The novel concept of multi-bit flip-flops has proved to be an effective way of processing multiple bits simultaneously. In this paper we propose a way of using the multi-bit flip-flop technique in designing various digital circuits. By sharing the inverters in the flip-flops, the total number of inverters can be reduced in a multi-bit flip-flop. Here, we have designed a shift register, which is an important memory element in digital systems, using 2-bit flip-flops. Experimental results reveal that our approach is very efficient and can be effortlessly incorporated in modern VLSI circuit designs.
|
2014 |
27 |
NVLSI1423 |
Topic: Design and Estimation of delay, power and area for Parallel prefix adders
Abstract: In Very Large Scale Integration (VLSI) designs, Parallel Prefix Adders (PPAs) have better delay performance. This paper investigates four types of PPAs: Kogge Stone Adder (KSA), Spanning Tree Adder (STA), Brent Kung Adder (BKA) and Sparse Kogge Stone Adder (SKA). Additionally, Ripple Carry Adder (RCA), Carry Look-ahead Adder (CLA) and Carry Skip Adder (CSA) are also investigated. These adders are implemented in Verilog Hardware Description Language (HDL) using Xilinx Integrated Software Environment (ISE) 13.2 Design Suite. The designs are implemented in Xilinx Virtex 5 Field Programmable Gate Arrays (FPGA), delays are measured using an Agilent 1692A logic analyzer, and the delay, power and area of all these adders are investigated and compared.
|
2014 |
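To make the parallel prefix idea in the NVLSI1423 abstract above concrete, the following is a minimal bit-level sketch of a Kogge Stone adder. It models the generate/propagate prefix network with integer bit operations and is meant only as an illustration of the technique; the 8-bit default width and the function name `kogge_stone_add` are assumptions, and the sketch says nothing about the delay, power or area figures the paper measures.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    generate = a & b & mask        # bit i generates a carry
    propagate = (a ^ b) & mask     # bit i propagates an incoming carry
    half_sum = propagate           # kept for the final sum stage

    # log2(width) prefix stages; stage k combines spans 2**k bits apart.
    dist = 1
    while dist < width:
        generate = generate | (propagate & ((generate << dist) & mask))
        propagate = propagate & ((propagate << dist) & mask)
        dist <<= 1

    # After the prefix stages, bit i of `generate` is the carry out of bits 0..i.
    carries_in = (generate << 1) & mask
    carry_out = (generate >> (width - 1)) & 1
    return (half_sum ^ carries_in) & mask, carry_out


result, cout = kogge_stone_add(200, 100)
```

The loop runs log2(width) times regardless of the operands, which is the source of the short carry path that distinguishes prefix adders from a ripple carry adder.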
28 |
NVLSI1422 |
Topic: Design of a 4-bit Adder using Reversible Logic in Quantum-Dot Cellular Automata (QCA)
Abstract: Both quantum-dot cellular automata (QCA) and reversible logic are emerging technologies that are promising alternatives for overcoming the scaling and heat dissipation issues, respectively, in current CMOS designs. Here, the fundamentals of QCA and reversible logic are studied; the feasibility of incorporating reversible logic in QCA designs is also demonstrated. Based on two existing designs, improved versions of the reversible gates, namely the Feynman Gate and the Toffoli Gate, were implemented in QCA technology using QCADesigner. The proposed design of the QCA-based Feynman Gate is faster by ½ cycle as compared to the existing design, while the proposed Toffoli Gate has the same latency as the existing design but can readily be cascaded into a more complex design. A 4-bit ripple carry adder in QCA is then designed using the proposed Feynman and Toffoli gates to realize a reversible QCA full adder. This 4-bit QCA adder with reversible logic consists of 2030 QCA cells, has a latency of 7 clock cycles and 8 garbage outputs.
|
2014 |
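The way Feynman and Toffoli gates combine into a reversible full adder, as described in the NVLSI1422 abstract above, can be sketched at the logic level. The gate ordering below is one common textbook construction using two Toffoli and two Feynman gates with a single constant-0 input; it is an assumption for illustration and is not claimed to be the paper's QCA layout. In this particular sketch each stage leaves two unused (garbage) outputs.

```python
def feynman(a, b):
    """Feynman (CNOT) gate: (a, b) -> (a, a XOR b)."""
    return a, a ^ b


def toffoli(a, b, c):
    """Toffoli (CCNOT) gate: (a, b, c) -> (a, b, c XOR (a AND b))."""
    return a, b, c ^ (a & b)


def reversible_full_adder(a, b, cin):
    """Full adder from 2 Toffoli + 2 Feynman gates and one constant-0 line.

    Returns (sum, carry_out, garbage_outputs).
    """
    anc = 0                                # constant input (ancilla)
    a, b, anc = toffoli(a, b, anc)         # anc = a AND b
    a, b = feynman(a, b)                   # b   = a XOR b
    b, cin, anc = toffoli(b, cin, anc)     # anc = carry out
    b, s = feynman(b, cin)                 # s   = a XOR b XOR cin
    return s, anc, (a, b)


def ripple_add4(x, y):
    """4-bit ripple carry adder built from the reversible full adder."""
    carry, total, garbage = 0, 0, []
    for i in range(4):
        s, carry, g = reversible_full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
        garbage.extend(g)
    return total, carry, garbage


total, carry, garbage = ripple_add4(9, 7)
```

Both gates are their own inverses, so applying the same gate sequence in reverse order recovers the inputs, which is the defining property of reversible logic.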
29 |
NVLSI1421 |
Topic: Background Subtraction Algorithm for Moving Object Detection in FPGA
Abstract: Currently, both the market and the academic communities require applications based on image and video processing with several real-time constraints. On the other hand, detection of moving objects is a very important task in mobile robotics and surveillance applications. In order to achieve an alternative design that allows for rapid development of real time motion detection systems, this paper proposes a hardware architecture for motion detection based on the background subtraction algorithm, which is implemented on FPGAs (Field Programmable Gate Arrays). For achieving this, the following steps are executed: (a) a background image (in gray-level format) is stored in an external SRAM memory, (b) a low-pass filter is applied to both the stored and current images, (c) a subtraction operation between both images is performed, and (d) a morphological filter is applied over the resulting image. Afterward, the gravity center of the object is calculated and sent to a PC (via RS-232 interface). Both the practical results of the motion detection system and the synthesis results have demonstrated the feasibility of implementing the proposed algorithms on an FPGA based hardware platform. The implemented system provides one processed pixel per FPGA clock cycle (after the latency time) and speeds up the software implementation (using the real-time xPC Target OS from MathWorks) by a factor of 32.
|
2014 |
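Steps (a) to (d) and the gravity center calculation in the NVLSI1421 abstract above can be sketched in software. The abstract does not name the specific filters, so this sketch assumes a 3x3 mean filter as the low-pass stage, a fixed threshold on the absolute difference, and a 3x3 erosion as the morphological stage; all function names and the threshold value are hypothetical.

```python
def neighbours(img, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def low_pass(img):
    """(b) 3x3 mean filter (assumed choice of low-pass filter)."""
    return [[sum(n) // len(n) for n in (neighbours(img, y, x),)][0]
            for y in range(len(img))] if False else [
        [sum(neighbours(img, y, x)) // len(neighbours(img, y, x))
         for x in range(len(img[0]))]
        for y in range(len(img))]


def subtract(background, current, threshold):
    """(c) Thresholded absolute difference, giving a binary motion mask."""
    return [[1 if abs(c - b) > threshold else 0 for b, c in zip(rb, rc)]
            for rb, rc in zip(background, current)]


def erode(mask):
    """(d) 3x3 erosion (assumed choice of morphological filter)."""
    return [[int(all(neighbours(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def gravity_center(mask):
    """Centroid (x, y) of the set pixels, or None if the mask is empty."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def detect(background, current, threshold=30):
    """(a) `background` plays the role of the stored reference image."""
    mask = subtract(low_pass(background), low_pass(current), threshold)
    return gravity_center(erode(mask))


# Usage: an 8x8 dark background and a frame with a bright 4x4 object.
background = [[0] * 8 for _ in range(8)]
frame = [[200 if 2 <= x <= 5 and 2 <= y <= 5 else 0 for x in range(8)]
         for y in range(8)]
center = detect(background, frame)
```

Every stage here is a per-pixel window operation, which is what lets a hardware pipeline of the kind described in the abstract emit one processed pixel per clock cycle.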
30 |
NVLSI1420 |
Topic: An Area- and Energy-Efficient FIFO Design Using Error-Reduced Data Compression and Near-Threshold Operation for Image/Video Applications
Abstract: Many image/video processing algorithms require FIFO for filtering. The FIFO size is proportional to the length of the filters and input data width, causing large area and power consumption. We have proposed an energy- and area-efficient FIFO design for image/video applications through FIFO with error-reduced data compression (FERDC) and near-threshold operation. On the architecture level, the FERDC technique is proposed to reduce the size and power consumption of the FIFO by utilizing the spatial correlation between neighboring pixels and performing error-reduced data compression together with quantization to minimize the mean square error (MSE). On the circuit level, near-threshold operation is adopted to achieve further power reduction while maintaining the required performance. To demonstrate the proposed FIFO, it has been implemented using a 0.18-µm CMOS process technology. The implementation covers different FIFO lengths, including 128, 256, 512, and 1024. The experimental results show that the proposed FIFO operating at 0.5 V and 28.57 MHz achieves up to 99%, 65%, and 34.91% reduction in dynamic power, leakage power, and area, respectively, with a small MSE of 2.76, compared with the conventional FIFO design. The proposed FIFO can be applied to a wide range of image/video signal processing applications to achieve high area and energy efficiency.
|
2014 |
31 |
NVLSI1419 |
Topic: Design and Implementation of High Throughput and Area Efficient Hard Decision Viterbi Decoder in 65nm Technology
Abstract: This paper presents a high throughput (1 Gbps) and moderate area hard decision state parallel Viterbi decoder for constraint length K=3, code rate R=1/2 and four states (N=4). The Add Compare Select (ACS) unit in the path metric unit is designed to reduce the latency of the ACS loop by using a Modified Carry Look Ahead Adder and Digital Comparator. We also consider the design of a Survivor Memory Unit (SMU) which combines the advantages of both the Register Exchange method and the Trace Back method, to reduce the decoding latency and total area of the Viterbi decoder. The proposed Viterbi decoder design is described using Verilog HDL and implemented in a standard cell ASIC flow using Synopsys EDA tools. The design operation is verified by decoding one million bits. The behavior of the decoder is verified using the Synopsys simulator and synthesized using Synopsys Design Compiler with a 65 nm CMOS technology library. The proposed decoder operates at 250 MHz, with a supply voltage of 1.32 V and an operating temperature range of -40°C to 125°C. The ACS architecture achieves a 67.07% reduction in latency compared to the conventional ACS architecture and achieves 1.235 Gbps throughput. The results show that the Viterbi decoder architecture achieves 73.03% to 92.46% improvement in area as compared to the other architectures. This reduction in latency and area finds application in high data rate communication.
|
2014 |
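The decoder parameters in the NVLSI1419 abstract above (K=3, R=1/2, four states, hard decisions) are small enough to sketch in software. The abstract does not give the generator polynomials, so the common (7, 5) octal pair is assumed here. The add-compare-select step uses Hamming distance as the branch metric, and survivors are kept as full bit lists per state, which loosely resembles register exchange; the paper's hybrid survivor memory unit and its adder and comparator circuits are not modelled.

```python
# Assumed generator polynomials (7, 5 in octal); not stated in the abstract.
GENERATORS = (0b111, 0b101)


def parity(value):
    """XOR of all bits in `value`."""
    return bin(value).count("1") & 1


def step(state, bit):
    """One encoder transition: returns (next_state, two output bits)."""
    reg = (bit << 2) | state               # newest bit followed by 2 state bits
    outputs = [parity(reg & g) for g in GENERATORS]
    return reg >> 1, outputs


def encode(bits):
    """Rate-1/2 convolutional encoder starting from state 0."""
    state, coded = 0, []
    for bit in bits:
        state, outputs = step(state, bit)
        coded += outputs
    return coded


def viterbi_decode(received):
    """Hard-decision Viterbi decoding over the 4-state trellis."""
    inf = float("inf")
    metrics = [0, inf, inf, inf]           # encoder is known to start in state 0
    paths = [[], [], [], []]
    for i in range(0, len(received), 2):
        symbol = received[i:i + 2]
        new_metrics = [inf] * 4
        new_paths = [None] * 4
        for state in range(4):
            if metrics[state] == inf:
                continue
            for bit in (0, 1):
                nxt, expected = step(state, bit)
                # Add: path metric plus Hamming-distance branch metric.
                candidate = metrics[state] + sum(
                    e != r for e, r in zip(expected, symbol))
                # Compare and select the better survivor for `nxt`.
                if candidate < new_metrics[nxt]:
                    new_metrics[nxt] = candidate
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best]


message = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = encode(message)
corrupted = codeword[:]
corrupted[3] ^= 1                          # inject a single channel bit error
decoded = viterbi_decode(corrupted)
```

Because all four states are updated in the same pass of the inner loop, the sketch mirrors the state parallel organisation named in the abstract, although hardware would evaluate the four ACS units concurrently.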
32 |
NVLSI1418 |
Topic: Exact BER Performance Analysis of Link Adaptive Relaying with Non-coherent BFSK Modulation
Abstract: Link adaptive relaying (LAR) is one of the most popular techniques developed for mitigating error propagation in decode and forward (DF) based cooperative wireless networks, and employs soft power scaling approaches at the relay nodes. On the other hand, frequency shift keying (FSK) is a prominent technique for eliminating the need for channel estimation by training sequences, which increases the complexity of the system and reduces the transmission rate in proportion to the number of users involved in the network. In this paper, the performance of an LAR scheme with non-coherent binary FSK (BFSK) signaling is investigated by deriving exact closed form bit error rate expressions in Rayleigh fading channels.
|
2014 |
33 |
NVLSI1417 |
Topic: Efficient Integer DCT Architectures for HEVC
Abstract: In this paper, we present area- and power-efficient architectures for the implementation of integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video Coding (HEVC). We show that an efficient constant matrix multiplication scheme can be used to derive parallel architectures for 1-D integer DCT of different lengths. We also show that the proposed structure could be reusable for DCT of lengths 4, 8, 16, and 32 with a throughput of 32 DCT coefficients per cycle irrespective of the transform size. Moreover, the proposed architecture could be pruned to reduce the complexity of implementation substantially with only a marginal effect on the coding performance. We propose power-efficient structures for folded and full-parallel implementations of 2-D DCT. From the synthesis result, it is found that the proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy per sample (EPS) compared to the direct implementation of the reference algorithm, on average, for integer DCT of lengths 4, 8, 16, and 32. Also, an additional 19% saving in ADP and 20% saving in EPS can be achieved by the proposed pruning algorithm with nearly the same throughput rate. The proposed architecture is found to support ultrahigh definition 7680×4320 at 60 frames/s video, which is one of the applications of HEVC.
|
2014 |
34 |
NVLSI1416 |
Topic: Critical-Path Analysis and Low-Complexity Implementation of the LMS Adaptive Algorithm
Abstract: This paper presents a precise analysis of the critical path of the least-mean-square (LMS) adaptive filter for deriving its architectures for high-speed and low-complexity implementation. It is shown that the direct-form LMS adaptive filter has nearly the same critical path as its transpose-form counterpart, but provides much faster convergence and lower register complexity. From the critical-path evaluation, it is further shown that no pipelining is required for implementing a direct-form LMS adaptive filter for most practical cases, and that it can be realized with a very small adaptation delay in cases where a very high sampling rate is required. Based on these findings, this paper proposes three structures of the LMS adaptive filter: (i) Design 1 having no adaptation delays, (ii) Design 2 with only one adaptation delay, and (iii) Design 3 with two adaptation delays. Design 1 involves the minimum area and the minimum energy per sample (EPS). The best of the existing direct-form structures requires 80.4% more area and 41.9% more EPS compared to Design 1. Designs 2 and 3 involve slightly more EPS than Design 1 but offer nearly twice and thrice the maximum usable frequency (MUF) at a cost of 55.0% and 60.6% more area, respectively.
|
2014 |
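For readers unfamiliar with the algorithm analysed in the NVLSI1416 abstract above, the following is a minimal floating-point sketch of a direct-form LMS adaptive filter with no adaptation delay, which corresponds in spirit to the zero-delay case the abstract calls Design 1. It shows only the filtering and weight update equations; the fixed-point arithmetic and hardware structure of the paper are not modelled, and the tap count, step size and test system below are assumptions.

```python
import random


def lms_filter(inputs, desired, num_taps, step_size):
    """Direct-form LMS: filter, compute error, update weights every sample.

    Returns (final_weights, error_sequence).
    """
    weights = [0.0] * num_taps
    delay_line = [0.0] * num_taps
    errors = []
    for x, d in zip(inputs, desired):
        # Shift the new sample into the tapped delay line.
        delay_line = [x] + delay_line[:-1]
        # FIR output with the current weights.
        y = sum(w * v for w, v in zip(weights, delay_line))
        error = d - y
        # Weight update uses the error of the same sample (zero adaptation delay).
        weights = [w + step_size * error * v
                   for w, v in zip(weights, delay_line)]
        errors.append(error)
    return weights, errors


# Usage: identify a hypothetical 2-tap system h = [0.5, -0.3] from noise-free data.
rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
target = [0.5 * signal[n] - 0.3 * (signal[n - 1] if n > 0 else 0.0)
          for n in range(len(signal))]
weights, errors = lms_filter(signal, target, num_taps=2, step_size=0.05)
```

In a pipelined variant the update would use an error from one or more samples earlier, which is the adaptation delay the abstract trades against clock frequency.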
35 |
NVLSI1415 |
Topic: An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply Operator
Abstract: Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications. In this work, we focus on optimizing the design of the fused Add-Multiply (FAM) operator for increasing performance. We investigate techniques to implement the direct recoding of the sum of two numbers in its Modified Booth (MB) form. We introduce a structured and efficient recoding technique and explore three different schemes by incorporating them in FAM designs. Compared with FAM designs which use existing recoding schemes, the proposed technique yields considerable reductions in the critical delay, hardware complexity and power consumption of the FAM unit.
|
2014 |
36 |
NVLSI1414 |
Topic: Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions
Abstract: The multimedia market's need for low-energy video processing systems such as HEVC has led to extensive development of fast algorithms for the efficient approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT transforms have been proposed that offer superior compression performance at very low circuit complexity. Such approximations can be realized in digital VLSI hardware using additions and subtractions only, leading to significant reductions in chip area and power consumption compared to conventional DCTs and integer transforms. In this paper, we introduce a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications. The proposed transform possesses low computational complexity and is compared to state-of-the-art DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio. The proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC. The proposed transform and several other DCT approximations are mapped to systolic-array digital architectures, physically realized as digital prototype circuits using FPGA technology, and mapped to 45 nm CMOS technology.
|
2014 |
37 |
NVLSI1413 |
Topic: A Bit-Serial Pipelined Architecture for High-Performance DHT Computation in Quantum-Dot Cellular Automata
Abstract: In this brief, we consider quantum-dot cellular automata (QCA) realization of the discrete Hadamard transform (DHT). An analysis of a full-parallel solution based on efficient multibit addition in QCA is first presented. We show that this leads to large area as well as delay. We then propose a bit-serial pipelined architecture for QCA-based DHT. The proposed architecture is based on a new one-bit adder–subtractor requiring only six majority gates and a feedback latch that requires only one majority gate and limited wiring. The approach leads to a reduction in area-delay-cycle product of 74% and 91% (over a full-parallel solution) for wordlengths of 4 and 8, respectively. Results of simulations in QCADesigner are also presented.
|
2014 |
38 |
NVLSI1412 |
Topic: An Efficient Non-Linear Cost Compression Algorithm for Multi Level Cell Memory
Abstract: This paper defines a non-linear cost compression problem, proposes an efficient algorithm, and applies it to a real application of multi level cell memory to minimize energy consumption and latency. The non-linear cost compression problem extends the traditional cost compression problem to allow a non-linear cost function of symbol frequencies, whereas the cost in the traditional problem is a weighted linear combination of symbol frequencies. In order to solve the non-linear cost compression problem efficiently, we propose an approach based on encoding symbol frequencies. We first compute the frequencies of encoding symbols that minimize a cost function. To achieve the computed frequencies in a cost-compressed message, we deploy existing size-decompression algorithms. The proposed algorithm is optimal and as fast as the existing size compression algorithms. Our experimental results show that it reduces the energy consumption and latency by 70 percent for a text file in multi level cell memory. Furthermore, it increases the lifetime of endurance-limited memory.
|
2014 |
39 |
NVLSI1411 |
Topic: Lossless Image Compression using Fast Arithmetic Operation
Abstract: In this paper we present a lossless image compression coder and decoder based on fast arithmetic operations. The proposed method uses only a simple adder and subtractor to reduce the value of each pixel, so that it needs very little run-time memory and very little time to encode and decode the given image. The decompressed image is exactly equal to the original image, hence the method is purely lossless. The performance of this method is also compared with arithmetic-operation-based predictive lossless image compression, using compression and decompression time and compression ratio as quantitative parameters. Since it takes less time to encode and decode, the method is well suited for real-time implementation of an image codec.
|
2014 |
40 |
NVLSI1410 |
Topic: Design of Efficient Binary Comparators in Quantum-Dot Cellular Automata
Abstract: Quantum-dot cellular automata (QCA) are an attractive emerging technology suitable for the development of ultradense low-power high-performance digital circuits. Efficient solutions have recently been proposed for several arithmetic circuits, such as adders, multipliers, and comparators. Nevertheless, since the design of digital circuits in QCA still poses several challenges, novel implementation strategies and methodologies are highly desirable. This paper proposes a new design approach oriented to the implementation of binary comparators in QCA. New formulations of basic logic equations required to perform the comparison function are proposed. The new strategy has been exploited in the design of two different comparator architectures and for several operand word lengths. With respect to existing counterparts, the comparators proposed here exhibit significantly higher speed and reduced overall area.
|
2014 |
41 |
NVLSI1409 |
Topic: A Low-Power and Portable Spread Spectrum Clock Generator for SoC Applications
Abstract: In this paper, a novel portable and all-digital spread spectrum clock generator (ADSSCG) suitable for system-on-chip (SoC) applications with low power consumption is presented. The proposed ADSSCG can provide flexible spreading ratios by the proposed rescheduling division triangular modulation (RDTM). Thus it can provide different EMI attenuation performance for various system applications. Furthermore, the proposed ADSSCG employs a low-power digitally controlled oscillator (DCO) to significantly reduce overall power consumption. Measurement results show that the power consumption of the proposed ADSSCG is 1.2 mW (@54 MHz), and it provides a 9.5 dB EMI reduction with a 1% spreading ratio. Besides, the proposed ADSSCG has a very small chip area compared with conventional SSCGs, which often require large on-chip loop filter capacitors. In addition, the proposed ADSSCG is implemented only with standard cells, making it easily portable to different processes and very suitable for SoC applications.
|
2014 |
42 |
NVLSI1408 |
Topic: Design Flow for Flip-Flop Grouping in Data-Driven Clock Gating
Abstract: Clock gating is a predominant technique used for power saving. It is observed that the commonly used synthesis-based gating still leaves a large amount of redundant clock pulses. Data-driven gating aims to disable these. To reduce the hardware overhead involved, flip-flops (FFs) are grouped so that they share a common clock enabling signal. The question of which group size maximizes the power savings was answered in a previous paper. Here we answer the question of which FFs should be placed in a group to maximize the power reduction. We propose a practical solution based on the toggling activity correlations of FFs and their physical position proximity constraints in the layout. Our data-driven clock gating is integrated into an Electronic Design Automation (EDA) commercial backend design flow, achieving total power reduction of 15%–20% for various types of large-scale state-of-the-art industrial and academic designs in 40 and 65 nanometer process technologies. These savings are achieved on top of the savings obtained by clock gating synthesis performed by commercial EDA tools, and gating manually inserted into the register transfer level design.
|
2014 |
43 |
NVLSI1407 |
Topic: Input Vector Monitoring Concurrent BIST Architecture Using SRAM Cells
Abstract: Input vector monitoring concurrent built-in self test (BIST) schemes perform testing during the normal operation of the circuit without imposing a need to set the circuit offline to perform the test. These schemes are evaluated based on the hardware overhead and the concurrent test latency (CTL), i.e., the time required for the test to complete while the circuit operates normally. In this brief, we present a novel input vector monitoring concurrent BIST scheme, which is based on the idea of monitoring a set (called window) of vectors reaching the circuit inputs during normal operation, and the use of a static-RAM-like structure to store the relative locations of the vectors that reach the circuit inputs in the examined window. The proposed scheme is shown to perform significantly better than previously proposed schemes with respect to the hardware overhead and CTL tradeoff.
|
2014 |
44 |
NVLSI1406 |
Topic: Jitter of Delay-Locked Loops Due to PFD
Abstract: In this paper, the jitter of delay-locked loops (DLLs) due to uncertainties in the phase frequency detector (PFD) is calculated. First, time-domain equations of the DLL are introduced. These equations are the key to obtaining a closed-form equation for the jitter of the DLL in the presence of a noisy PFD. Jitter equations at the output of all stages are calculated theoretically. A DLL is designed in 0.18-µm CMOS technology to validate the obtained equations.
|
2014 |
45 |
NVLSI1405 |
Topic: Area-Delay Efficient Binary Adders in QCA
Abstract: As transistors decrease in size, more and more of them can be accommodated in a single die, thus increasing chip computational capabilities. However, transistors cannot get much smaller than their current size. The quantum-dot cellular automata (QCA) approach represents one of the possible solutions in overcoming this physical limit, even though the design of logic modules in QCA is not always straightforward.
|
2014 |
46 |
NVLSI1404 |
Topic: Reconfigurable CORDIC-Based Low-Power DCT Architecture Based on Data Priority
Abstract: This paper presents a low-power coordinate rotation digital computer (CORDIC)-based reconfigurable discrete cosine transform (DCT) architecture. The main idea of this paper is based on the fact that not all computations in the DCT are equally important in generating the frequency-domain outputs. Considering the difference in importance among the DCT coefficients, the number of CORDIC iterations can be dynamically changed to efficiently trade off image quality for power consumption. Thus, the computational energy can be significantly reduced without seriously compromising the image quality. The proposed CORDIC-based 2-D DCT architecture is implemented using a 0.13-µm CMOS process, and the experimental results show that our reconfigurable DCT achieves power savings ranging from 22.9% to 52.2% over the CORDIC-based Loeffler DCT at the cost of minor image quality degradations.
|
2014 |
47 |
NVLSI1403 |
Topic: FPGA-Based Bit Error Rate Performance Measurement of Wireless Systems
Abstract: This paper presents the bit error rate (BER) performance validation of digital baseband communication systems on a field-programmable gate array (FPGA). The proposed BER tester (BERT) integrates fundamental baseband signal processing modules of a typical wireless communication system along with a realistic fading channel simulator and an accurate Gaussian noise generator onto a single FPGA to provide an accelerated and repeatable test environment in a laboratory setting. Using a developed graphical user interface, the error rate performance of single- and multiple-antenna systems over a wide range of parameters can be rapidly evaluated. The FPGA-based BERT should reduce the need for time-consuming software-based simulations, thereby increasing productivity. This FPGA-based solution is significantly more cost-effective than conventional performance measurements made using expensive commercially available test equipment and channel simulators.
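As a point of reference for the kind of software-based simulation the abstract says an FPGA tester is meant to replace, the Python sketch below estimates BER by Monte Carlo simulation. It assumes uncoded BPSK over an additive white Gaussian noise channel with no fading, which is far simpler than the system described. The function names and parameters are hypothetical.

```python
import math
import random


def simulate_bpsk_ber(ebn0_db, num_bits, seed=0):
    """Estimate the BER of uncoded BPSK over AWGN by Monte Carlo simulation.
    Assumption: unit-energy symbols and no fading channel."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)            # Eb/N0 as a linear ratio
    sigma = math.sqrt(1 / (2 * ebn0))      # noise standard deviation
    errors = 0
    for _ in range(num_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0      # BPSK mapping
        received = symbol + rng.gauss(0.0, sigma)
        decided = 1 if received > 0 else 0 # hard decision at zero
        errors += decided != bit
    return errors / num_bits


def theoretical_bpsk_ber(ebn0_db):
    """Closed-form BER of BPSK over AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))
```

Because every bit is processed sequentially, low error rates need very long runs in software, which is the cost a hardware tester avoids.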
|
2014 |
48 |
NVLSI1402 |
Topic: A Combined SDC-SDF Architecture for Normal I/O Pipelined Radix-2 FFT
Abstract: We present an efficient combined single-path delay commutator-feedback (SDC-SDF) radix-2 pipelined fast Fourier transform architecture, which includes log2 N − 1 SDC stages, and 1 SDF stage. The SDC processing engine is proposed to achieve 100% hardware resource utilization by sharing the common arithmetic resource in the time-multiplexed approach, including both adders and multipliers. Thus, the required number of complex multipliers is reduced to log4 N − 0.5, compared with log2 N − 1 for the other radix-2 SDC/SDF architectures. In addition, the proposed architecture requires roughly minimum number of complex adders log2 N + 1 and complex delay memory 2N + 1.5 log2 N − 1.5.
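To make the resource counts concrete, the Python sketch below evaluates the expressions quoted in the abstract for a given transform length N, reading them as log2 N − 1, log4 N − 0.5, log2 N + 1 and 2N + 1.5 log2 N − 1.5. The function name and dictionary keys are hypothetical, and the sketch only does the arithmetic; it does not model the architecture.

```python
def sdc_sdf_resources(n):
    """Evaluate the resource formulas quoted in the abstract for an
    N-point radix-2 pipeline (N must be a power of two, N >= 4)."""
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 4")
    log2n = n.bit_length() - 1             # exact log2 for powers of two
    return {
        "sdc_stages": log2n - 1,
        "sdf_stages": 1,
        # log4 N - 0.5, using log4 N = (log2 N) / 2
        "complex_multipliers": log2n / 2 - 0.5,
        # count quoted for the other radix-2 SDC/SDF architectures
        "complex_multipliers_other": log2n - 1,
        "complex_adders": log2n + 1,
        "complex_delay_memory": 2 * n + 1.5 * log2n - 1.5,
    }
```

For N = 1024 this gives 4.5 complex multipliers against 9 for the other radix-2 SDC/SDF architectures, with 11 complex adders and a delay memory of 2061.5 complex words.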
|
2014 |
49 |
NVLSI1401 |
Topic: Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation
Abstract: The multiple constant multiplications (MCM) scheme is widely used for implementing transposed direct-form FIR filters. While the research focus of MCM has been on more effective common subexpression elimination, the optimization of adder-trees, which sum up the computed sub-expressions for each coefficient, is largely omitted. In this paper, we have identified the resource minimization problem in the scheduling of adder-tree operations for the MCM block, and presented a mixed integer programming (MIP) based algorithm for more efficient MCM-based implementation of FIR filters. Experimental results show that up to 15% reduction of area and 11.6% reduction of power (with an average of 8.46% and 5.96%, respectively) can be achieved on top of the already optimized adder/subtractor network of the MCM block.
|
2014 |
50 |
NVLSI1400 |
Topic: A Look-Ahead Clock Gating Based on Auto-Gated Flip-Flops
Abstract: Clock gating is very useful for reducing the power consumed by digital systems. Three gating methods are known. The most popular is synthesis-based, deriving clock enabling signals based on the logic of the underlying system. It unfortunately leaves the majority of the clock pulses driving the flip-flops (FFs) redundant. A data-driven method stops most of those and yields higher power savings, but its implementation is complex and application dependent. A third method called auto-gated FFs (AGFF) is simple but yields relatively small power savings. This paper presents a novel method called Look-Ahead Clock Gating (LACG), which combines all three. LACG computes the clock enabling signals of each FF one cycle ahead of time, based on the present cycle data of those FFs on which it depends. It avoids the tight timing constraints of AGFF and data-driven gating by allotting a full clock cycle for the computation of the enabling signals and their propagation. A closed-form model characterizing the power saving per FF is presented. It is based on data-to-clock toggling probabilities, capacitance parameters and the FFs’ fan-in. The model implies a breakeven curve, dividing the FF space into two regions of positive and negative gating return on investment. While the majority of the FFs fall in the positive region and hence should be gated, those falling in the negative region should not. Experimentation on industry-scale data showed a 22.6% reduction of the clock power, translated to a 12.5% power reduction of the entire system.
|
2014 |