# **Stochastic Inner Product Core for Digital FIR Filters**

MING MING WONG Nanyang Technological University School of Computer Science Engineering 50 Nanyang Ave, 639798 SINGAPORE mmwong@ntu.edu.sg

**DENNIS WONG** Heriot-Watt University Malaysia Institute of Sensors, Signals and Systems No 1 Jalan Venna P5/2 Precinct 5,62200 Putrajava MALAYSIA d.wong@hw.ac.uk

CISHEN ZHANG Swinburne University of Technology Hawthorn VIC 3122 AUSTRALIA cishenzhang@swin.edu.au

ISMAT HIJAZIN Swinburne University of Technology School of Software and Electrical Engineering School of Software and Electrical Engineering Hawthorn VIC 3122 AUSTRALIA ihijazin@swin.edu.au

Abstract: The computational operations of stochastic computing (SC) are governed by probability rules which is different from conventional arithmetic computations. Applications of SC to digital signal and image processing problems have been recently reported in the literature. To improve the computational performance of SC based finite impulse response (FIR) digital filters, a new stochastic inner product (multiply and accumulate) core with an improved scaling scheme is presented for improving the accuracy and fault tolerance performance of the filters. Taking into account the symmetric property of the coefficients of linear phase FIR filters, the proposed inner product core is designed using tree structured multiplexers which is capable of reducing the critical path and fault propagation in the stochastic circuitry. The designed inner product core can lead to construction of SC based light weight and multiplierless FIR digital filters. As a result, an SC based FIR digital FIR filter is implemented on Altera Cyclone V FPGA which operates on stochastic sequences of 256-bits length (8-bits precision level). Experimental results show that the developed filter has lower hardware cost, better accuracy and higher fault tolerance level compared with other stochastic implementations.

Key–Words: Stochastic computing, Inner product, Digital FIR filters

#### Introduction 1

Stochastic computing (SC) [1] is a computational technique with operations based on probability instead of arithmetic rules [2, 3, 4]. This technique can simplify mathematical functions, which are computationally demanding in binary computation, with simple logic operations and reduced hardware requirement. It is robust against noise. And it has a progressive precision characteristic that the precision of stochastic numbers (bit streams) increases as computation proceeds [2]. These advantages enabled SC's recent applications in signal and image processing, in particular, in realization and implementation of digital filters [5, 6]

It is noted that the existing scaling schemes for the SC FIR filters may considerably degrade the filter operation accuracy when the filter order is high. To deal with this problem, a new stochastic inner product core with an improved scaling scheme is presented for improving the accuracy and fault tolerance performance of SC based finite impulse response (FIR) digital filters. Taking into account the symmetric property of the coefficients of linear phase FIR filters, the proposed inner product core is designed using tree structured multiplexers which is capable of reducing the critical path and fault propagation in the stochastic circuitry. The designed inner product core can lead to construction of SC based light weight and multiplierless FIR digital filters.

The performance of the proposed SC based FIR filter is tested via hardware implementation on an FPGA using a case study on a 6th-order FIR digital filter. With the filter's order varying from 4th to 8thorder and each having cutoff frequencies ranging from  $0.2\pi$  to  $0.8\pi$ , empirical analysis is performed to evaluate the proposed FIR filter's accuracy levels, fault tolerance capabilities as well as the associated hardware

costs. The obtained results show that our proposed SC filter design outperforms other existing higher precision stochastic FIR filter designs.

The rest of the paper is organized as follows. Section 2 reviews stochastic computational elements of SC. The improved stochastic inner product core is presented in Section 3. The result of Section 3 is applied to the SC digital FIR filters design in Section 4 on a case study of the 6th order SC FIR filter, without loss of genrality. The experimental results (accuracy and fault tolerance analysis) as well as the hardware implementation for the application is reported and discussed in Section 5. Finally, some conclusion remarks are drawn in Section 6.

### **2** Basics of Stochastic Computation

The basic rule of SC is that the computational data (in bit-streams) are represented as stochastic sequences and are processed in form of digitized probabilities [3]. Naturally, the representations and all the involved computations always lie within the real-number unit interval [0, 1]. Stochastic representation can be coded in two formats: *SC-unipolar* and *SC-bipolar* [1].

In SC-unipolar format, the input s is a real number within the unit interval, i.e.  $0 \le s \le 1$ . As an example, a 2's complement binary input bit-stream  $\{0100\}_2$  is represented in stochastic bit-streams S, consisting of 4 of bit '1' out of  $2^4 = 16$  bits (remaining bits are zeros). This stochastic bit-streams S, is also interpreted as p = P(S = 1) = 4/16. On the other hand, in SC-bipolar format, the range of the real number input, s, is extended to  $-1 \le s \le 1$ . Consider a 2's complement binary input bit-stream  $\{1100\}_2$ . In SC-bipolar bit-streams S, the deterministic value is mapped to  $p = P(S = 1)/2 = 12/(16 \times 2) = 6/16$ .

In other words, stochastic representation observes the probability of 1s at arbitrary bit position in S. Such representation serves as the main reason for having high fault tolerance in SC. A single bit-flip in a long bit-stream causes a minor change in original logical value. On the contrary, a single bit-flip in the conventional 2's complement computation will result in huge error especially if the bit-flip occurs on the higher-order bit.

**Multiplication** of two inputs streams, which is computational intensive in conventional signed binary computing, can be performed using single logical gate in SC. Consider two stochastic input bit-streams,  $X_1$ and  $X_2$  and the output for their multiplication, Y, is derived as,

$$y = P(Y = 1)$$
  
=  $P(X_1 = 1)P(X_2 = 1)$   
+ $(1 - P(X_1 = 1))(1 - P(X_2 = 1))$ 

Stochastic multiplication in bipolar format is clearly a logical XNOR operation between input bit-streams,  $X_1$  and  $X_2$  in digital circuit. For unipolar format, the multiplication is performed using a logical AND operation instead. Stochastic multiplier for both unipolar and bipolar formats are as depicted in Figure 1.



Figure 1: Stochastic Multiplier for (i) SC-unipolar and (ii) SC-bipolar formats.

Addition in SC is performed using a special operation, termed as scaled addition. The addition is scaled such that the value always lies between the probability interval [0, 1]. With S being a constant scale, the sum of two independent stochastic bitstreams  $X_1$  and  $X_2$ , produces Y, defined as,

$$y = P(Y = 1)$$
  
=  $P(S = 1)P(X_1 = 1)$   
+ $(1 - P(S))(P(X_2 = 1))$   
=  $SX_1 + (1 - S)X_2$ 

Thus, multiplexer with conditional select line S, set as  $P(S) = \frac{1}{2}$  can be used to realize the scaled addition of two stochastic bit-streams in digital circuit.

**Subtraction** in SC is similar to the adder except that the stochastic scaled substractor requires an additional inverter and this only feasible in SC-bipolar format. Both the stochastic scaled adder and scaled substractor are illustrated in Figure 2.



Figure 2: Stochastic scaled adder/substractor

### **3** Stochastic Inner Product for FIR Filters

FIR filters are widely implemented in many DSP applications because of their filtering stability and linear-phase properties. FIR filters are generally characterized by their impulse response coefficients which sequentially perform inner product operations with the input signals. The inner product operations can be well approximated through SC. For example, the summation of the multiplication between the input vector  $\{X_0, X_1\}$  and the filter coefficient vector  $\{a_0, a_1\}$ can be derived using a single stochastic operation, the stochastic scaled addition, i.e.  $\frac{1}{2}(a_0X_0 + a_1X_1)$ . Through SC, the computational data are represented in stochastic bit-streams of  $2^n$  bits (*n* is precision level) and are processed in the form on digitized probabilities [2]. In terms of hardware, a stochastic scaled addition can be realized using a multiplexer with its conditional select line, S set as the scaling factor [2].

Unfortunately, the implicit scaling of  $\frac{1}{2}$  in stochastic scaled addition will significantly degrade the output accuracy if the amount of stochastic inputs involved is large [5]. Meanwhile, summation of small amount of stochastic inputs will result in growth of data variances due to the lack of precision level in SC as well as data correlation issues [5]. In other words, the accuracy level of the stochastic FIR filter will then be subjected to the filter order. A recent work by Chang and Parhi [5, 6] proposed an alternative inner product core with uneven weighted multiplexer as opposed to the conventional stochastic scaled addition. Given two filter coefficients  $\{a_0, a_1\}$ , the control probability signal is set to be  $\frac{|a_0|}{|a_0|+|a_1|}$  instead of  $\frac{1}{2}$ . Following this, the scaled addition is deduced as  $\frac{1}{|a_0|+|a_1|} (a_0 X_0 + a_1 X_1)$ . The simulation results showed that such inner product core can have higher accuracy level compared to the conventional stochastic inner product core.

It is observed that when one of the coefficients is greatly larger than the other i.e.  $a_1 \gg a_0$ , the weightage of  $a_0$  will become insignificant in the control probability signal, i.e.,  $\frac{|a_0|}{|a_0|+|a_1|} \approx \frac{|a_0|}{|a_1|}$ . As a result of the computational negligible coefficients of small weighting in higher order filters, the scaled addition approach proposed in [5] can have accuracy degradation in high order filters. To deal with this problem, this paper propose a new scheme for the filter coefficient scaling in the stochastic inner product. Under this scheme, the coefficients of equal (or near equivalent) weightings are paired together to form the scaling factor in the stochastic scaled addition.

A causal discrete-time FIR filter of order N(length M = N + 1) is generally described as  $y[n] = \sum_{k=0}^{N} a[k]x[n-k]$ , with y[n] as the output signal, x[n] is the input signal and  $\{a_0, a_1, a_2, a_3, \ldots, a_N\}$  are the filter coefficients. All the 4 types of linear phase FIR filters have symmetric parameters in absolute value. Therefore, the distribution of the absolute value of the filter's impulse response coefficients resembles a bell curve. The largest value is weighted at the center of the distribution and decreases gradually towards the first and the last coefficients, i.e.  $a_0 \approx a_N < a_1 \approx a_{N-1} < a_2 \approx a_{N-2} < \cdots < a_{\frac{N-1}{2}}$ . Hence, the scaled additions are performed according to the pairs of the FIR filter coefficients arranged such as follows.

- $P_0 = \{a_0, a_N\}, P_1 = \{a_1, a_{N-1}\}, P_2 = \{a_2, a_{N-2}\}, \dots$  and
- $P_{N'=\frac{N-1}{2}} = \left\{ a_{\frac{N-1}{2}}, a_{\frac{N+1}{2}} \right\}$  for even length (such as FIR Types II and IV),
- $P_{N'=\frac{N}{2}} = \left\{ a_{\frac{N}{2}} \right\}$  for odd length (such as FIR Types I and III).

Furthermore, note that  $P_0 < P_1 < P_2 < \cdots < P_0 < P_{N'}$  with  $N' = \frac{N-1}{2}$  for even filter length and  $N' = \frac{N}{2}$  for odd filter length. With that, the next round of scaled additions are performed following the pairing shown below.

The similar addition process is repeated until the inner product computation is completed. An example of the resultant stochastic inner product using the proposed approach is shown in (7).

### **4** Design of SC FIR Filters

Without lose of generality, consider a 6th-order linear phase Type I FIR filter, as shown in Figure 3, with its taps coefficients labeled as  $\{a_0, a_1, a_2, a_3, a_4, a_5, a_6\}$  and the input vectors listed as  $\{X_0, X_1, X_2, X_3, X_4, X_5, X_6\}$ . The inner product of the filter is derived as  $y = a_0X_0 + a_1X_1 + a_2X_2 +$  $a_3X_3 + a_4X_4 + a_5X_5 + a_6X_6$ . Using the proposed stochastic inner product, the final computation is described in (7) and is illustrated in Figure 4. Note that, both of the input vectors and filter coefficients are first



Figure 3: Structure of the SC 6th-order FIR filter.

converted into stochastic bit-streams using SNG modules [1], which are not shown in the figure.

$$Y_{11} = \left(\frac{|a_0|}{|a_0| + |a_6|}\right) X_0 + \left(\frac{|a_6|}{|a_0| + |a_6|}\right) X_6 \quad (1)$$

$$Y_{12} = \left(\frac{|a_1|}{|a_1| + |a_5|}\right) X_1 + \left(\frac{|a_5|}{|a_1| + |a_5|}\right) X_5 \quad (2)$$

$$Y_{13} = \left(\frac{|a_2|}{|a_2| + |a_4|}\right) X_2 + \left(\frac{|a_4|}{|a_2| + |a_4|}\right) X_4 \quad (3)$$

$$Y_{14} = \left(|a_3|X_3\right) \tag{4}$$

$$Y_{21} = \left(\frac{|a_0| + |a_6|}{|a_0| + |a_1| + |a_5| + |a_6|}\right)Y_{11} + \left(\frac{|a_1| + |a_5|}{|a_0| + |a_1| + |a_5| + |a_6|}\right)Y_{12}$$
(5)

$$Y_{22} = \left(\frac{|a_2| + |a_4|}{|a_2| + |a_3| + |a_4|}\right) Y_{13} + \left(\frac{|a_3|}{|a_2| + |a_3| + |a_4|}\right) Y_{14}$$
(6)

$$Y = \left(\frac{|a_0| + |a_1| + |a_5| + |a_6|}{\sum_{i=0}^{6} |a_i|}\right) Y_{21} + \left(\frac{|a_2| + |a_3| + |a_4|}{\sum_{i=0}^{6} |a_i|}\right) Y_{22}$$
(7)

With such filter's coefficients,  $a_0 \approx a_6, a_1 \approx a_5$ and  $a_2 \approx a_4$ , the scaled addition in (1), (2) and (3) can be performed using fixed scaling factor  $\frac{1}{2}$ . Therefore, the conditional probability selection line (which is determined by the scaling factor) of the correspondence multiplexers can share the same SNG modules to promote hardware cost reduction. The savings will be more prominent in higher order filter where there is a large amount of identical coefficients. In addition, our SC FIR filter is designed with precision level of 8bits, whereby the computations are performed using  $2^8 = 256$  bits only. The filter designs in [5, 6] are computed using  $2^{10} = 1024$  bits instead.

### **5** Experimental Results

Several simulations were performed to test the effectiveness and the efficiency of the proposed SC FIR filter. The metric of measurements included the output accuracy (error-to-signal power ratio), the fault tolerance and the hardware requirement and performance in FPGA implementation.

### 5.1 Accuracy Analysis

The new SC low-pass FIR filters, implemented in three different orders and each having four different cutoff frequencies, are evaluated for their accuracy levels. A total of 256 samples of input test signal is used in the test simulation. The test signal consists of a mixture of four sinusoidal waves of different frequencies padded with white noise. The accuracies of the proposed filters are measured in term of the errorto-signal power ratio and are benchmarked with the work reported in [5]. These results are as summarized in Table 1. The results from [5] showed the error ratio increases with higher filters' order. In constrast, our SC FIR filter presents consistently lower error ratio regardless of the filters' order. Further accuracy justification can be deduced by comparing the frequency response and the power spectral density (PSD) of the output signal deduced from both our SC filter and the ideal filter (see Figure 5). It is observed that the spectrum of our SC filter is very close to that of the ideal filter.

#### 5.2 Fault Tolerance Analysis

Apart from low hardware cost, SC is well recognized for being insusceptible towards fault as opposed to



Figure 4: The circuit implementation of the 6th order SC FIR filter.



Figure 5: Output spectrum and PSD derived using the FIR ideal filter and our SC FIR filter. Both filters are low-pass with 6th-order and the cutoff frequency at  $0.4\pi$ .

|        | Filter Cutoff Frequency |        |          |        |          |        |          |        |
|--------|-------------------------|--------|----------|--------|----------|--------|----------|--------|
| Filter | $0.2\pi$                |        | $0.4\pi$ |        | $0.6\pi$ |        | $0.8\pi$ |        |
| Order  | (i)                     | (ii)   | (i)      | (ii)   | (i)      | (ii)   | (i)      | (ii)   |
| 2      | 0.0050                  | 0.0037 | 0.0046   | 0.0025 | 0.0058   | 0.0013 | 0.0069   | 0.0004 |
| 4      | 0.0192                  | 0.0597 | 0.0123   | 0.0314 | 0.0120   | 0.0465 | 0.0070   | 0.0145 |
| 6      | 0.0136                  | 0.0648 | 0.0075   | 0.0462 | 0.0080   | 0.0637 | 0.0048   | 0.0626 |
| 8      | 0.0095                  | -      | 0.0107   | -      | 0.0114   | -      | 0.0076   | -      |

Table 1: Accuracy test results of (i) our proposed design (precision level of 8-bits) compared with (ii) [5] (precision level of 10-bits).

the conventional binary computing. Fault tolerance testing is conducted on our proposed SC 6th-order FIR filter with cutoff frequency at  $0.4\pi$ . The test is performed by randomly injecting various percentage of bit-flipping error in the input signals and the corresponding error-to-signal power ratio is measured and summarized in Table 2. The results show that percentage of random bit-flipping error ranging from 0.5% to 3.0% has minimal impact on the output accuracy of our proposed SC FIR filter. On the contrary, the conventional ideal FIR filter shows significant accuracy degradation as the injected bit-flipping error increases at every 0.5%. These results are further benchmarked with the work reported in [6]. The authors presented two SC 7th-order FIR filter with cutoff frequency at  $0.1\pi$  using direct form and lattice form. Both of their filters also exhibited higher error percentages in comparison to our work.

The multiplexers in our proposed inner product core are positioned in tree structure to avoid error propagation that tends to occur in long critical path. Therefore, with short critical path, the presented SC FIR filter has higher fault tolerance in comparison to the conventional FIR filter as well as the existing SC FIR filters.

### 5.3 Hardware Complexity

The proposed filter is implemented in Cyclone V 5CGXFC7D6F31C6 using Quartus II 11.1. Having the architectures clocked at 100MHz, the timing analysis of the architectures is deduced using TimeQuest Timing Analyzer. Meanwhile, the power consumption reports are derived from the PowerPlay Power Analyzer. The full hardware compilation result of the filter as well as its core units; the inner product and the SNG Module are summarized in Table 3.

## 6 Conclusion

A case study of a new SC FIR filter design using an improved stochastic inner product core was presented in this paper. Without the use of multiplier, the inner product core unit employed stochastic scaled addition with a new scaling scheme that paired the filter's coefficients in according to their weightage. The computation was realized using multiplexers positioned in tree structure, which in turn reduces the critical path as well as the fault propagation in the stochastic circuitry. Such design enhanced the computational accuracy and offered high fault tolerance in SC filter system. For hardware evaluation, a new SC 6th-order FIR filter with the cutoff frequency at  $0.4\pi$  on FPGA platform has been implemented and tested. Experimental results have shown that the presented SC FIR filter outperforms the conventional filter and the existing works in both metrics and also has low hardware cost.

### References:

- [1] B. R. Gaines, 'Stochastic computing', *Proceedings of the Spring Joint Computer Conference, New York, NY, USA*, pp. 149-156,(1967).
- [2] A. Alaghi, and J. P. Hayes, 'Survey of Stochastic Computing', ACM Trans. Embed. Comput. Syst., vol. 12, no. 2, pp. 19, (2013), .
- [3] W. Qian, X. Li, M. D. Riedel, K. Bazargan, and D. J. Lilja, 'An architecture for fault-tolerant computation with stochastic logic',*IEEE Transactions on Computers*, vol. **60**, no. 1, pp. 93-105, (2011).
- [4] B. Moons, and M. Verhelst, 'Energy-efficiency and accuracy of stochastic computing circuits in emerging technologies', *IEEE Journal on E,merging and Selected Topics in Circuits and Systems*, vol. **4**, no. 4, pp. 475-486, (2014).

- [5] Y. N. Chang, and K. K. Parhi, 'Architectures for digital filters using stochastic computing', 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2697-2701, (2013).
- [6] Y. Liu, and K. K. Parhi, 'Lattice fir digital filter architectures using stochastic computing', 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1027-1031, (2015).

Table 2: Error-to-signal power ratio analysis resultant from various percentage of random bit-flipping error in 6th-order FIR filter with cutoff frequency at  $0.4\pi$ .

| Filter              | Percentage of Bit-Flipping |        |        |        |        |        |  |
|---------------------|----------------------------|--------|--------|--------|--------|--------|--|
| Implementation      | 0.5%                       | 1.0%   | 1.5%   | 2.0%   | 2.5%   | 3.0%   |  |
| Our Work            | 0.0076                     | 0.0160 | 0.0209 | 0.0292 | 0.0355 | 0.0514 |  |
| Conventional Filter | 0.1180                     | 0.1874 | 0.3033 | 0.3351 | 0.4409 | 0.4843 |  |
| Direct Form [6]     | 0.0488                     | 0.0540 | -      | -      | -      | -      |  |
| Lattice Form [6]    | 0.0563                     | 0.0820 | -      | -      | -      | -      |  |

Table 3: Hardware review for the FPGA implementation of the SC 6th-order FIR filter with cutoff frequency at  $0.4\pi$ .

| Hardware Requirement/                  | SC FIR | Inner Product | SNG    |
|----------------------------------------|--------|---------------|--------|
| Performance                            | Filter | Core          | Module |
| Combinational ALUTs (112,960)          | 128    | 4             | 26     |
| Memory ALUTs (56,480)                  | 0      | 0             | 0      |
| Dedicated Logic Register (225,920)     | 159    | 1             | 33     |
| Total Register (225,920)               | 159    | 1             | 33     |
| Fmax (MHz)                             | 306.0  | 0             | 429.92 |
| Dynamic Thermal Power Dissipation (mW) | 2.11   | 0.04          | 0.94   |