**AUTHORS:** Liu De, Wang Mingjiang


**ABSTRACT:**
This paper presents the architecture of a triple-mode floating-point adder that supports high-precision addition as well as parallel lower-precision additions. The proposed design operates in three modes: four parallel single-precision, two parallel double-precision, or one quadruple-precision addition/subtraction. The triple-mode adder's parallel lower-precision computation suits SIMD applications such as 3D graphics, video conferencing, and multimedia, while its high-precision mode serves scientific applications such as supernova simulation and climate modeling. To improve the performance of the triple-mode floating-point adder, the design is implemented with an improved two-path algorithm in both combinational and pipelined forms. For comparison of area, power, and worst-case latency, single-mode single-, double-, and quadruple-precision adders and a dual-mode quadruple-precision adder are also implemented with similar techniques. These adders and the triple-mode adder are tested and verified through extensive simulation and then synthesized in a 65 nm manufacturing process. The synthesis results show that the proposed triple-mode floating-point adder incurs 10-16% more delay than a single-mode quadruple-precision adder and saves 47-52% area compared with the combination of four single-, two double-, and one quadruple-precision adders.
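The improved two-path algorithm itself is not detailed on this page; as background, the classic two-path scheme splits addition into a "near" path (effective subtraction with exponent difference at most 1, where massive cancellation can occur and a wide leading-zero-driven left shift is needed) and a "far" path (all other cases, where alignment may shift many bits but normalization moves at most one bit). The following is a minimal Python sketch of that classic scheme, not the authors' hardware design; the helper names are hypothetical, and truncation stands in for IEEE rounding to keep it short.

```python
# Simplified software model of the classic two-path floating-point
# addition algorithm (illustration only, not the paper's RTL design).
# A number is an (m, e) pair meaning m * 2**e, with m a normalized
# P-bit integer significand: 2**(P-1) <= m < 2**P.

P = 24  # significand width of IEEE 754 single precision (assumption)

def normalize(m, e):
    """Renormalize a positive integer significand to P bits."""
    while m >= 1 << P:       # far-path overflow: at most a 1-bit right shift
        m >>= 1
        e += 1
    while m < 1 << (P - 1):  # near-path cancellation: wide left shift,
        m <<= 1              # driven by a leading-zero count in hardware
        e -= 1
    return m, e

def two_path_add(a, b, subtract=False):
    """Add or subtract the magnitudes of two normalized (m, e) operands."""
    (ma, ea), (mb, eb) = a, b
    if (ea, ma) < (eb, mb):                      # ensure |a| >= |b|; result
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)  # sign bookkeeping omitted
    d = ea - eb                                  # exponent difference
    if subtract and d <= 1:
        # NEAR path: effective subtraction with d <= 1 can cancel many
        # leading bits, so compute it exactly and then renormalize.
        m = (ma << d) - mb
        if m == 0:
            return (0, 0)                        # exact zero
        return normalize(m, eb)
    # FAR path: large alignment shift, but at most a 1-bit normalization.
    aligned = mb >> d                            # truncating alignment
    m = ma - aligned if subtract else ma + aligned
    return normalize(m, ea)

def value(x):
    """Interpret an (m, e) pair as a Python float."""
    m, e = x
    return m * 2.0 ** e
```

For example, with 1.5 = (12582912, -23) and 1.25 = (10485760, -23), subtraction takes the near path and yields 0.25 after a two-bit normalization shift, while addition takes the far path and yields 2.75 with a single-bit right shift.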

**KEYWORDS:**
floating-point adder, floating-point arithmetic, triple-mode adder

**REFERENCES:**

[1] Intel, "Avoiding AVX-SSE transition penalties," Jan. 2012. [Online]. Available: http://software.intel.com/en-us/articles

[2] Intel, "Intel 64 and IA-32 Architectures Optimization Reference Manual," Sep. 2014. [Online]. Available: http://www.intel.com/content/www/us/en/

[3] C. L. Yang and B. Sano, "Exploiting parallelism in geometry processing with general purpose processors and floating-point SIMD instructions," IEEE Transactions on Computers, vol. 49, no. 9, 2000, pp. 934-946.

[4] N. Nishikawa and K. Iwai, "Throughput and Power Efficiency Evaluation of Block Ciphers on Kepler and GCN GPUs Using Micro-Benchmark Analysis," IEICE Transactions on Information and Systems, vol. E97-D, no. 6, 2014, pp. 1506-1515.

[5] R. Li and Y. Dou, "Efficient parallel implementation of three-point Viterbi decoding algorithm on CPU, GPU and FPGA," Concurrency and Computation: Practice & Experience, vol. 26, no. 3, 2014, pp. 821-840.

[6] M. Ferreira, N. Roma and L. M. S. Russo, "Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER," BMC Bioinformatics, vol. 15, art. 165, 2014.

[7] J. M. Cebrian, L. Natvig and J. C. Meyer, "Performance and energy impact of parallelization and vectorization techniques in modern microprocessors," Computing, vol. 96, no. 12, 2014, pp. 1179-1193.

[8] A. Akkas, “Instruction Set Enhancements for Reliable Computations,” Ph.D. dissertation, Lehigh University, 2001.

[9] D. H. Bailey, R. Barrio and J. M. Borwein, "High-precision computation: Mathematical physics and dynamics," Applied Mathematics and Computation, vol. 218, no. 20, 2012, pp. 10106-10121.

[10] G. Howell and G. A. Geist, "Necessity of high precision arithmetic for large-scale computations," in Proc. NPSC, 1995, pp. 219-222.

[11] IEEE Standard for Floating-Point Arithmetic, ANSI/IEEE Standard 754-2008, Aug. 29, 2008.

[12] E. Schwarz, R. Smith, C. Krygowski, “The S/390 G5 floating point unit supporting hex and binary architecture,” 14th IEEE Symposium on Computer Arithmetic, 1999, pp. 258-265.

[13] S. Oberman, "Design Issues in High-Performance Floating-Point Arithmetic Units," Ph.D. dissertation, Dept. Elect. Eng., Stanford University, Stanford, 1996.

[14] A. Beaumont-Smith, N. Burgess, “Reduced latency IEEE floating-point standard adder architectures,” 14th IEEE Symposium on Computer Arithmetic, 1999, pp. 35–43.

[15] P. M. Seidel and G. Even, "Delay-optimized implementation of IEEE floating-point addition," IEEE Transactions on Computers, vol. 53, no. 2, 2004, pp. 97-113.

[16] M. Farmwald, "On the Design of High-Performance Digital Arithmetic Units," Ph.D. dissertation, Stanford University, Stanford, 1981.

[17] S. Oberman, H. Al-Twaijry and M. Flynn, "The SNAP project: design of floating-point arithmetic units," 13th IEEE Symposium on Computer Arithmetic, 1997, pp. 156-165.

[18] J. Bruguera and T. Lang, "Rounding in floating-point addition using a compound adder," Technical report, University of Santiago de Compostela, 2000.

[19] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Transactions on Computers, vol. C-22, no. 8, 1973, pp. 786-793.

[20] N. Burgess, "The flagged prefix adder for dual addition," Proceedings of SPIE - The International Society for Optical Engineering, 1998, pp. 567-575.

[21] V. G. Oklobdzija, "An Algorithmic and Novel Design of a Leading Zero Detector Circuit: Comparison with Logic Synthesis," IEEE Transactions on VLSI Systems, vol. 2, no. 1, 1994, pp. 124-128.

[22] V. Oklobdzija, "Comment on 'Leading-zero anticipatory logic for high-speed floating point addition'," IEEE Journal of Solid-State Circuits, vol. 32, no. 2, 1997, pp. 292-293.

[23] H. Suzuki and H. Morinaka, "Leading-Zero Anticipatory Logic for High-Speed Floating Point Addition," IEEE Journal of Solid-State Circuits, vol. 31, no. 8, 1996, pp. 1157-1164.

[24] G. Dimitrakopoulos, K. Galanopoulos, C. Mavrokefalidis and D. Nikolos, "Low-Power Leading-Zero Counting and Anticipation Logic for High-Speed Floating Point Units," IEEE Transactions on VLSI Systems, vol. 16, no. 7, 2008, pp. 837-850.

[25] J. D. Bruguera and T. Lang, "Leading-one prediction with concurrent position correction," IEEE Transactions on Computers, vol. 48, no. 10, 1999, pp. 1083-1097.

[26] A. Akkas, "Dual-mode quadruple precision floating-point adder," 9th Euromicro Conference on Digital System Design, Cavtat, Croatia, 2006, pp. 211-220.

[27] A. Akkas, "Dual-mode floating-point adder architectures," Journal of Systems Architecture, vol. 54, no. 12, 2008, pp. 1129-1142.

[28] A. Akkas and M. Schulte, "Dual-mode floating-point multiplier architectures with parallel operations," Journal of Systems Architecture, vol. 52, no. 10, 2006, pp. 549-562.

[29] A. Akkas and M. J. Schulte, "A Quadruple Precision and Dual Double Precision Floating-Point Multiplier," Euromicro Symposium on Digital System Design, 2003, pp. 76-81.

[30] A. Isseven and A. Akkas, "A Dual-mode quadruple precision floating-point divider," 40th Asilomar Conference on Signals, Systems and Computers, 2006, pp. 1697-1701.

[31] M. K. Jaiswal and Ray C.C. Cheung, "Unified Architecture for Double/Two-Parallel Single Precision Floating Point Adder," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no. 7, 2014, pp. 521-525.

[32] D. Tan, C. E. Lemonds and M. J. Schulte, "Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support," IEEE Transactions on Computers, vol. 58, no. 2, 2009, pp. 175-187.

[33] M. K. Jaiswal and Ray C.C. Cheung, "Configurable Architecture for Double/Two-Parallel Single Precision Floating Point Division," Proceedings of IEEE Computer Society Annual Symposium on VLSI, 2014, pp. 332-337.

[34] K. Manolopoulos, D. Reisis, “An Efficient Multiple Precision Floating-Point Multiplier,” 18th IEEE International Conference on Electronics, Circuits and Systems, 2011, pp. 153-156.

[35] A. Baluni and F. Merchant, “A Fully Pipelined Modular Multiple Precision Floating Point Multiplier With Vector Support,” International Symposium on Electronic System Design, 2011, pp. 45-50.

[36] L. Huang and L. Shen, "A New Architecture for Multiple Precision Floating-Point Multiply-Add Fused Unit Design," 18th IEEE Symposium on Computer Arithmetic, 2007, pp. 69-76.

[37] K. Manolopoulos and D. Reisis, “An Efficient Dual-Mode Floating-Point Multiply-Add Fused Unit,” in Proc. 17th IEEE International Conference on Electronics, Circuits and Systems, 2010, pp. 5-8.

[38] M. Gok and M. M. Ozbilen, "Multi-functional floating-point MAF designs with dot product support," Microelectronics Journal, vol. 39, 2007, pp. 30-43.

[39] A. K. Verma, P. Brisk and P. Ienne, "Hybrid LZA: A Near Optimal Implementation of the Leading Zero Anticipator," 14th Asia and South Pacific Design Automation Conference, 2009, pp. 203-209.

[40] N. K. Reddy, M. C. Sekhar, “A Novel Low Power Error Detection Logic for Inexact Leading Zero Anticipator in Floating Point Units,” 27th International Conference on VLSI Design, 2014, pp. 128-132.