# VLSI Implementation of Lattice Wave Digital Filters for Increased Sampling Rate Using Three Port Parallel Adaptors 

0 ( ( 1 \$ . 6+ , T $^{*}$ \$ 5 : \$ /<br>Netaji Subhas Institute of Technology<br>Division of ECE<br>Sec-3, Dwarka, New Delhi<br>INDIA<br>mishaagg@gmail.com

TARUN KUMAR RAWAT<br>Netaji Subhas Institute of Technology<br>Division of ECE<br>Sec-3, Dwarka, New Delhi<br>INDIA<br>tarundsp@gmail.com


#### Abstract

The second-order all-pass section is the main building block of lattice wave digital filters (WDFs). All-pass sections are conventionally realized using two port adaptors. In this paper, second-order all-pass sections are instead realized with three port parallel adaptors. These adaptors, implemented with canonic signed digit coefficients, are proposed to increase the maximal sampling frequency of lattice WDFs. The proposed implementation of three port parallel adaptors reduces the latency of the critical loop by reducing the number of components (adders and multipliers) in it. A further increase in maximal sampling frequency is obtained by integrating these three port parallel adaptors using carry propagation adders (CPA) designed with low power, high performance 1-bit full adders, registers as delay elements, and binary multipliers. Here, multipliers are implemented using a network of shifts and adders (or subtractors). An example of a filter implementation in which the proposed approaches are applied is presented. In this example, the multiple-constant multiplication technique is applied to reduce the number of adders in the implementation of the multipliers. The sections are integrated using Design Architect, simulated using the Eldonet tools of Mentor Graphics V2008, and tested by applying a number of input vectors. The results are compared with the conventional second-order all-pass sections. The comparison shows an increase in maximal sampling frequency of approximately $46 \%$ at the cost of about a $13 \%$ increase in area.


Key-Words: VLSI design, wave digital filters, three port adaptor, full adder, delays

## 1 Introduction

Wave digital filters (WDFs) constitute a wide class of infinite impulse response (IIR) digital filters that transform an analog network into a topologically equivalent digital filter [1]. A major advantage of WDFs over most other recursive filters is that they inherit fundamental properties of the analog network, such as low sensitivity and stability under finite-arithmetic conditions [2], [3]. In a Very Large Scale Integration (VLSI) implementation of a WDF, the silicon area, computational complexity, power consumption, and maximal achievable sampling rate are highly dependent on the coefficient word length [4]. Therefore, the word length should be as short as possible while remaining sufficient to satisfy the given filter specification [2]. Additionally, in highly customized VLSI implementations, a general multiplier element is very costly. It is therefore beneficial to carry out the multiplication of a data sample by each filter coefficient value using a sequence of shift-and-add (and/or subtract) operations. The arithmetic complexity of the shift-and-add multiplier can be reduced by using canonic signed digit code (CSDC) for the coefficient representation [5]. The average number of nonzero bits in a CSDC number is approximately $W_{d} / 3$, where $W_{d}$ is the coefficient wordlength. This implementation is called multiplierless.

Previous work has focused on the VLSI implementation of high speed WDFs using two port adaptors [4], [6]. A VLSI architecture for the two port adaptor circuit, described using carry-save, most-significant-bit-first arithmetic, has been used to achieve a significant increase in the sampling rate of WDFs [8]. The bit-level systolic array method has been applied to design a unit element WDF and a lattice WDF with the same specifications for increased sampling rate [7]. First- and second-order all-pass sections are the basic building blocks of lattice WDFs. An arithmetic transformation has been applied to these basic building blocks, which were implemented with carry-save arithmetic for increased maximal sampling rate in [9], [10].

In this paper, we replace the conventional Richards' second-order section with a three port parallel adaptor using bit-parallel arithmetic to improve the maximal sampling rate. The three port parallel adaptor is realized using logic components such as CPAs, which are implemented using low power, high performance full adders, inverters, and registers as delay units. These components are individually designed and implemented using $0.18 \mu \mathrm{~m}$ technology in full custom VLSI design. The registers are designed with a transmission gate based D flip-flop. The constant multipliers are implemented with CSDC coefficients and multiple-constant multiplication to save hardware [5], [10], [11], [12]. The lattice WDF is integrated using these logic components. The integration method for a low-pass lattice WDF is illustrated in Section 5 by a design example. The filters using conventional Richards' all-pass sections and using three port parallel adaptors are integrated to the same specification. This enables a proper comparison between their corresponding hardware realizations, presented in Section 6. The integrated filters are simulated and tested by applying a number of input vectors. The comparison shows that the latter design is more efficient than the conventional design in terms of maximal sampling rate, at the cost of a small area overhead. The sampling rate can be further improved by replacing the CPA with a carry-save adder (CSA) [9], [10]. Area is compared in terms of the number of transistors.

The rest of the paper is organized as follows. Section 2 describes lattice wave digital filters. Section 3 explains the first- and second-order all-pass sections. Section 4 explores the adaptor component designs using different logic styles. A design example of a fixed function low-pass lattice wave digital filter using conventional Richards' all-pass sections and using three port parallel adaptors is given in Section 5. A comparative analysis of the two different second-order sections is given in Section 6. Section 7 concludes the paper.

## 2 Lattice Wave Digital Filters

A lattice WDF is composed of two wave digital (WD) all-pass filters in parallel [1], [12], [13]. An $N^{\text {th }}$ order lattice WDF is shown in Fig. 1. Its transfer function can be written as

$$
\begin{equation*}
H(z)=\frac{H_{0}(z)+H_{1}(z)}{2} \tag{1}
\end{equation*}
$$

where $H_{0}(z)$ and $H_{1}(z)$ are WD all-pass filters. WD all-pass filters can be realized in many different ways [1], [14]. In this work, we only consider the cascade realization of first- and second-order all-pass sections. A first-order all-pass section can be realized using the Richards' structure, where a two port adaptor is used [10]. A second-order all-pass section can be realized using the Richards' structure or a three port adaptor [1]. By cascading first- and second-order all-pass sections, low-pass, high-pass, band-pass, and band-reject lattice WDFs can be designed.

Figure 1: Lattice wave digital filter realization of order $N$

The detailed discussion of first- and second-order all-pass sections is given in Section 3. These sections are recursive structures. Recursive structures in general require fewer arithmetic operations per sample than their nonrecursive counterparts. One limitation of recursive structures is the maximal sampling frequency $f_{\max }$ at which a filter can operate [1]. The maximal sampling frequency for a recursive algorithm described by a fully specified signal flow graph is [15]

$$
\begin{equation*}
f_{\max }=\frac{1}{T_{\min }}=\min \left\{\frac{N_{i}}{T_{\text {tot }}}\right\} \tag{2}
\end{equation*}
$$

where $T_{\text {min }}$ is the minimal sampling period, $T_{\text {tot }}$ is the total latency of the arithmetic operations, and $N_{i}$ is the number of delay elements in the directed loop $i$ [16]. The loop or loops that determine the maximal sampling frequency are called critical loops. Digital filters with a high maximal sampling frequency are suitable candidates for low power and high speed applications. The reason is that if the required sampling rate is lower than the maximal sampling rate, the excess speed can be used to reduce the power consumption via power supply voltage scaling techniques [17], [18]. Area can be minimized by clever hardware design [17]. From Eq. (2), we observe two factors that affect the maximal sampling rate. The first is the number of delay elements in the critical loop and the second is the total latency of the critical loop. The maximal sampling frequency can be increased by increasing the number of delay elements in the critical loop or by minimizing the critical loop latency. The latency can be reduced by using low-sensitivity filters, which result in short coefficients (low-latency multiplications), and by removing unnecessary operations from the critical loop via numerically equivalent transformations [10]. In this paper, we are mainly concerned with minimizing the critical loop latency. It is minimized by reducing the number of logic components in the critical loop and further minimized by reducing the critical delay of the individual logic components. Lattice WDFs have been realized with three port series and parallel adaptors using efficient bit-serial algorithms, and their cyclic scheduling has been studied [15]. A lattice WDF has also been realized using three port series adaptors for increased sampling rate [19].
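Equation (2) can be evaluated directly once each directed loop is described by its number of delay elements and its total latency. The following is a minimal Python sketch under that reading; the function name `max_sampling_frequency` and the example latencies are illustrative assumptions, not values from this paper.

```python
def max_sampling_frequency(loops):
    """Evaluate Eq. (2): f_max = min over loops of N_i / T_tot.

    `loops` is an iterable of (n_delays, total_latency_seconds) pairs,
    one pair per directed loop of the signal flow graph.
    Returns (f_max_hz, index_of_critical_loop).
    """
    rates = [n_delays / latency for n_delays, latency in loops]
    f_max = min(rates)
    return f_max, rates.index(f_max)


# Hypothetical example: two loops with one delay element each and
# assumed latencies of 10 ns and 25 ns (illustrative numbers only).
example_loops = [(1, 10e-9), (1, 25e-9)]
f_max, critical = max_sampling_frequency(example_loops)
print(f"f_max = {f_max / 1e6:.1f} MHz, critical loop index = {critical}")
```

The loop with the smallest ratio is the critical loop. Either adding a delay element to it or shortening its latency raises the bound, which are the two options discussed above.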

## 3 Realization of all-pass sections

A lattice WDF, shown in Fig. 1, consists of a parallel connection of two all-pass filter branches whose outputs are summed to produce the filter output. Each of the two filter branches is constructed using a cascade of first- and second-order all-pass sections. These sections are constructed from basic two port adaptors. The entire filter is realized as a network of these two port adaptors and delay elements. This architectural modularity makes it well suited for VLSI implementation.

Applying all-pass sections to the filtering problem has the advantage of producing an efficient realization in terms of the number of arithmetic operations for a given filter order. Implementation using two port adaptors leads to realizations that are canonic in both the number of multipliers and the number of delays. Additionally, it has been shown that all-pass filters of this form are almost self-scaling, since they implement bounded functions [20]. Hence, data wordlength growth between the all-pass sections is negligible, which removes the need for the inter-stage scaling that is often required with other cascade filter forms. The following equations are used to implement the two port adaptor from which the lattice WDF is built (see Figs. 2 and 3) [2], [3].

$$
\begin{align*}
& y_{1}=x_{2}+\gamma\left(x_{2}-x_{1}\right)  \tag{3}\\
& y_{2}=x_{1}+\gamma\left(x_{2}-x_{1}\right) \tag{4}
\end{align*}
$$

In these equations, $x_{1}, x_{2}$ are the inputs, $y_{1}, y_{2}$ are the outputs, and $\gamma$ is an adaptor coefficient lying in the range $[-1,1]$ [2], [4]. A first-order all-pass section, realized using Richards' structure, is shown in Fig. 2, where a two port adaptor is used with a delay element. Depending on the range of the $\gamma$ coefficient value, we obtain four types of symmetric two port adaptors [12], [21]: for type 1, $0.5<\gamma<1$ and $\alpha=1-\gamma$; for type 2, $0<\gamma<0.5$ and $\alpha=\gamma$; for type 3, $-0.5<\gamma<0$ and $\alpha=|\gamma|$; and for type 4, $-1<\gamma<-0.5$ and $\alpha=1+\gamma$. Figure 3 shows these four types of structures.

Figure 2: Richards' first-order all-pass section using two port adaptor

Figure 3: Different types of adaptor based on values of the $\gamma$ coefficient
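The four coefficient ranges and Eqs. (3) and (4) translate into a few lines of code. The sketch below is illustrative only: the function names `adaptor_type` and `two_port_adaptor` are assumptions, and boundary values of $\gamma$ are rejected because the ranges in the text are open intervals.

```python
def adaptor_type(gamma):
    """Return (type, alpha) for a symmetric two port adaptor coefficient.

    The ranges follow the four cases listed in the text; boundary values
    (0, +/-0.5, +/-1) are rejected because the text uses strict inequalities.
    """
    if 0.5 < gamma < 1.0:
        return 1, 1.0 - gamma
    if 0.0 < gamma < 0.5:
        return 2, gamma
    if -0.5 < gamma < 0.0:
        return 3, abs(gamma)
    if -1.0 < gamma < -0.5:
        return 4, 1.0 + gamma
    raise ValueError(f"gamma={gamma} is outside the four open ranges")


def two_port_adaptor(x1, x2, gamma):
    """Eqs. (3) and (4): outputs (y1, y2) of the two port adaptor."""
    diff = gamma * (x2 - x1)
    return x2 + diff, x1 + diff


print(adaptor_type(-0.618835168))
print(two_port_adaptor(1.0, 3.0, 0.5))
```

In every case $\alpha$ lies between 0 and 0.5, so the multiplier in each adaptor type works with a short positive coefficient.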

The transfer function for the first-order section is given by [15]

$$
\begin{equation*}
H(z)=\frac{-\alpha_{0} z+1}{z-\alpha_{0}} \tag{5}
\end{equation*}
$$

where $\alpha_{0}$ is the adaptor coefficient. Let us assume that the critical path delay of a multiplier is $T_{\mathrm{m}}$ and that of an adder is $T_{\mathrm{a}}$.

Figure 4: Richards' second-order all-pass section using two port adaptors

For a first-order section, the critical loop is shown by dotted lines in Fig. 2. Since this critical loop has one multiplier and two adders, the maximal sampling frequency $f_{\text {max }}$ is given by

$$
\begin{equation*}
f_{\max }=\frac{1}{T_{\mathrm{m}}+2 T_{\mathrm{a}}} \tag{6}
\end{equation*}
$$

A second-order all-pass section, realized using Richards' structure is shown in Fig. 4. The transfer function of this second-order section is given by [15]

$$
\begin{equation*}
H(z)=\frac{-\alpha_{1} z^{2}+\alpha_{2}\left(\alpha_{1}-1\right) z+1}{z^{2}+\alpha_{2}\left(\alpha_{1}-1\right) z-\alpha_{1}} \tag{7}
\end{equation*}
$$

where $\alpha_{1}$ and $\alpha_{2}$ are the adaptor coefficients. The critical loop is shown by dotted lines in Fig. 4. Since this critical loop has two multipliers and four adders, the maximal sampling frequency is

$$
\begin{equation*}
f_{\max }=\frac{1}{2 T_{\mathrm{m}}+4 T_{\mathrm{a}}} \tag{8}
\end{equation*}
$$

The three port parallel adaptor all-pass section is shown in Fig. 5. The transfer function of this section is given by [15]

$$
\begin{equation*}
H(z)=\frac{\left(\beta_{1}+\beta_{2}-1\right) z^{2}+\left(\beta_{1}-\beta_{2}\right) z+1}{z^{2}+\left(\beta_{1}-\beta_{2}\right) z+\left(\beta_{1}+\beta_{2}-1\right)} \tag{9}
\end{equation*}
$$

where $\beta_{1}$ and $\beta_{2}$ are the adaptor coefficients. Comparing Eqs. (7) and (9), we find [15]

$$
\begin{equation*}
\beta_{1}=\frac{\left(\alpha_{1}-1\right)\left(\alpha_{2}-1\right)}{2}, \beta_{2}=\frac{\left(1-\alpha_{1}\right)\left(1+\alpha_{2}\right)}{2} \tag{10}
\end{equation*}
$$
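The mapping in Eq. (10) can be checked numerically by comparing the denominator coefficients of Eqs. (7) and (9); because both sections are all-pass, each numerator is the mirror image of its denominator, so matching denominators is sufficient. The following sketch uses hypothetical helper names and an arbitrary test pair that is not taken from the paper.

```python
def alpha_to_beta(alpha1, alpha2):
    """Eq. (10): three port coefficients from the Richards' coefficient pair."""
    beta1 = (alpha1 - 1.0) * (alpha2 - 1.0) / 2.0
    beta2 = (1.0 - alpha1) * (1.0 + alpha2) / 2.0
    return beta1, beta2


def richards_denominator(alpha1, alpha2):
    """Denominator of Eq. (7) as (coefficient of z, constant term)."""
    return alpha2 * (alpha1 - 1.0), -alpha1


def three_port_denominator(beta1, beta2):
    """Denominator of Eq. (9) as (coefficient of z, constant term)."""
    return beta1 - beta2, beta1 + beta2 - 1.0


# Arbitrary test pair (an assumption for illustration only).
a1, a2 = 0.3, 0.7
b1, b2 = alpha_to_beta(a1, a2)
print(richards_denominator(a1, a2))
print(three_port_denominator(b1, b2))
```

The two printed pairs agree up to floating-point rounding, which reflects the identities $\beta_{1}+\beta_{2}-1=-\alpha_{1}$ and $\beta_{1}-\beta_{2}=\alpha_{2}\left(\alpha_{1}-1\right)$.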

Figure 5: Second-order all-pass section using three port parallel adaptor

The two critical loops are shown by dotted lines in Fig. 5. Loop 1 has one multiplier and three adders, and loop 2 has one multiplier and four adders. Since loop 2 contains more components, it is the critical loop. The maximal sampling frequency is given by

$$
\begin{equation*}
f_{\max }=\frac{1}{T_{\mathrm{m}}+4 T_{\mathrm{a}}} \tag{11}
\end{equation*}
$$

We observe that the critical loop of the Richards' second-order section contains two multipliers and four adders, whereas that of the three port parallel adaptor contains only one multiplier and four adders. For the latter realization, the price to pay is a somewhat longer coefficient wordlength. However, it has been found that the three port adaptor coefficients typically require only one extra bit to match the performance of the two port realization for a given coefficient wordlength [19].
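The gain implied by Eqs. (8) and (11) can be written as a ratio. The short derivation below only restates those two equations; it assumes that the same $T_{\mathrm{m}}$ and $T_{\mathrm{a}}$ apply to both structures, which ignores the slightly longer coefficient wordlength noted above.

$$
\begin{equation*}
\frac{f_{\max }^{\text {three port }}}{f_{\max }^{\text {Richards' }}}=\frac{2 T_{\mathrm{m}}+4 T_{\mathrm{a}}}{T_{\mathrm{m}}+4 T_{\mathrm{a}}}=1+\frac{T_{\mathrm{m}}}{T_{\mathrm{m}}+4 T_{\mathrm{a}}}
\end{equation*}
$$

The ratio approaches 2 when the multiplier latency dominates ($T_{\mathrm{m}} \gg T_{\mathrm{a}}$) and approaches 1 when it is negligible, so the benefit grows with the cost of the multiplier in the loop.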

## 4 Hardware Implementation

The fixed function three port parallel adaptor is implemented using power efficient components (full adders and $D$ flip-flops) using $0.18 \mu \mathrm{~m}$ technology in CMOS VLSI design. Multiple adders and/or subtractors are needed to realize the adaptor coefficients and all-pass sections. The first-order WD all-pass section design is based on a two port adaptor and a delay element. The second-order WD all-pass sections are constructed using a three port parallel adaptor and delays. The components are described as follows.


Figure 6: 1-bit Full adder circuit based on improved XOR-XNOR cell

### 4.1 Full Adder Design

Improving the 1-bit adder performance greatly enhances the execution of binary operations in digital circuits. In this paper, a 1-bit hybrid full adder is used to implement the lattice WDF [22]. The adder is shown in Fig. 6, where $A$, $B$, and $C$ are the inputs and $S$ and $C_{\text {out }}$ are the outputs. The transistor sizes are technology dependent [17]. This adder is used to design an $n$-bit CPA. It is observed that when full adder units are cascaded, the output voltage drops by one or more threshold voltages. To avoid this problem, the adders are cascaded with buffered inputs and outputs [22].

In this design, the adders operate on real-time input data samples, which may be positive or negative. Therefore, we assume that the input samples are in two's complement form. The timing performance of an adaptor depends only on the coefficient wordlength. Therefore, extending the wordlength inside an adaptor results in a die area penalty. The filter coefficients are represented in CSDC to reduce the number of nonzero coefficient bits. The bit-level implementation of the different adaptor operations requires three fundamental adder circuits (cells). These cells are shown in Fig. 7.

Cells 1, 2, and 3 are used to implement the two port and three port parallel adaptor structures. Cell 1, shown in Fig. 7(a), is a selectable subtraction unit that performs the operation $A-B$, where $A$ and $B$ are the inputs. Cell 2, shown in Fig. 7(b), is a selectable adder/subtractor unit that performs the operation $A+B$. Cell 3, shown in Fig. 7(c), is a selectable subtractor/adder unit that performs the operation $-A-B$. These adder cells are used to design the CPA. The multipliers are implemented with shift and add and/or subtract operations. Therefore, the same adder cells can be used to implement the multipliers simply by applying shifted versions of the input to them.

Figure 7: Adder cells

Figure 8: Transmission gate based $D$ flip-flop
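The arithmetic performed by the three cells can be illustrated with a behavioural model of a ripple-carry CPA on two's complement words. This is only a sketch of the operations $A-B$, $A+B$, and $-A-B$; it does not describe the transistor-level cells of Fig. 7, and forming $-A-B$ as the negation of $A+B$ in two passes is an assumption made here for simplicity. All function names are hypothetical.

```python
def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a ^ b))
    return s, carry


def ripple_cpa(a, b, carry_in, width):
    """Add two `width`-bit patterns bit by bit; the carry ripples from LSB to MSB.

    The final carry out is dropped, as in fixed-width two's complement addition.
    """
    result = 0
    carry = carry_in
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result


def to_signed(word, width):
    """Interpret a `width`-bit pattern as a two's complement integer."""
    return word - (1 << width) if word & (1 << (width - 1)) else word


def cell1(a, b, width):
    """A - B, computed as A + ~B + 1."""
    mask = (1 << width) - 1
    return ripple_cpa(a & mask, ~b & mask, 1, width)


def cell2(a, b, width):
    """A + B."""
    mask = (1 << width) - 1
    return ripple_cpa(a & mask, b & mask, 0, width)


def cell3(a, b, width):
    """-A - B, modelled here as ~(A + B) + 1 using two passes (an assumption)."""
    mask = (1 << width) - 1
    return ripple_cpa(~cell2(a, b, width) & mask, 0, 1, width)


print(to_signed(cell1(5, 12, 8), 8))
print(to_signed(cell2(-3, 10, 8), 8))
print(to_signed(cell3(3, 4, 8), 8))
```

In this model the carry passes through every bit position in sequence, so the latency of a CPA grows with the data wordlength.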

### 4.2 Delay Element

In this paper, we have designed a simple $D$ flip-flop, shown in Fig. 8, as the delay unit. A transmission gate based 2:1 multiplexer, a static inverter, and a buffer unit are used to build the $D$ flip-flop. Here, $0.18 \mu \mathrm{m}$ technology is used and the operating clock frequency is 100 MHz.

### 4.3 Multipliers

In this work, multiplication of a data sample by each filter coefficient value is performed using a sequence of shift and add and/or subtract operations. The arithmetic complexity of the shift-and-add multiplier is reduced by using CSDC to represent the coefficients [5]. The hardware cost is further reduced by using the multiple-constant multiplication technique, which exploits redundancy between the coefficients.
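A CSDC coefficient and the corresponding shift-and-add multiplication can be sketched as follows. The conversion uses the standard non-adjacent-form recoding, which is assumed here to match the CSDC representation used in the paper. The function names are hypothetical, and the coefficient 0.49630558 with 8 fractional bits is one of the values from the design example of Section 5.

```python
def to_csd(n):
    """Canonic signed digit form of a non-negative integer.

    Returns digits in {-1, 0, +1}, least significant first, with no two
    adjacent nonzero digits.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def quantize(coefficient, frac_bits):
    """Round a coefficient in [0, 1) to an integer count of 2**-frac_bits steps."""
    return round(coefficient * (1 << frac_bits))


def shift_add_multiply(x, digits):
    """Multiply integer x by the constant encoded in `digits` using only
    shifts, additions, and subtractions."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


q = quantize(0.49630558, 8)   # binary 0.01111111 has seven nonzero bits
csd = to_csd(q)               # CSD form is 2**7 - 2**0: two nonzero digits
print(q, csd, shift_add_multiply(100, csd))
```

The product returned by `shift_add_multiply` is scaled by $2^{W_{d}}$. In hardware, the scaling corresponds to a fixed wiring shift, and each nonzero digit beyond the first costs one adder or subtractor.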

## 5 Design Example

In this section, the design and implementation of a $9^{\text {th }}$ order fixed-point lattice WDF using two port adaptors and three port parallel adaptors, along with their multiplierless implementation, is presented. Consider a low-pass filter with the following specification [21, p. 12]: sampling frequency $=16000 \mathrm{~Hz}$, passband edge frequency $=3400 \mathrm{~Hz}$, stopband edge frequency $=4500 \mathrm{~Hz}$, passband ripple $=0.5 \mathrm{~dB}$, stopband attenuation $=50 \mathrm{~dB}$, filter type $=$ Chebyshev, and filter order $=9$.

### 5.1 Lattice WDF design using two port adaptors and delays

The block diagram of an $N^{\text {th }}$ order low-pass lattice WDF is shown in Fig. 1. Here, we consider a $9^{\text {th }}$ order low-pass lattice WDF, which is composed of one first-order and four second-order sections. The blocks of first- and second-order sections are replaced with their equivalent signal flow graphs shown in Figs. 2 and 4, respectively. The signal flow graph of a $9^{\text {th }}$ order low-pass lattice WDF using Richards' sections is shown in Fig. 9. The maximal sampling rate is determined by one of the critical loops in these filter sections. The maximal sampling frequencies of the all-pass sections of this structure are given in Eqs. (12) through (16).

$$
\begin{align*}
f_{\max \alpha_{0}} & =\frac{1}{T_{\mathrm{m}}+2 T_{\mathrm{a}}}  \tag{12}\\
f_{\max \alpha_{1}, \alpha_{2}} & =\frac{1}{2 T_{\mathrm{m}}+5 T_{\mathrm{a}}}  \tag{13}\\
f_{\max \alpha_{3}, \alpha_{4}} & =\frac{1}{2 T_{\mathrm{m}}+5 T_{\mathrm{a}}}  \tag{14}\\
f_{\max \alpha_{5}, \alpha_{6}} & =\frac{1}{2 T_{\mathrm{m}}+4 T_{\mathrm{a}}}  \tag{15}\\
f_{\max \alpha_{7}, \alpha_{8}} & =\frac{1}{2 T_{\mathrm{m}}+4 T_{\mathrm{a}}} \tag{16}
\end{align*}
$$



Figure 9: Signal flow graph of lattice WDF using Richards' second-order sections

These equations yield an $f_{\max }$ for the entire filter, which is given by

$$
\begin{align*}
f_{\max } & =\min \left\{\frac{1}{T_{\mathrm{m}}+2 T_{\mathrm{a}}}, \frac{1}{2 T_{\mathrm{m}}+5 T_{\mathrm{a}}}, \frac{1}{2 T_{\mathrm{m}}+5 T_{\mathrm{a}}},\right. \\
& \left.\qquad \frac{1}{2 T_{\mathrm{m}}+4 T_{\mathrm{a}}}, \frac{1}{2 T_{\mathrm{m}}+4 T_{\mathrm{a}}}\right\} \\
& =\frac{1}{2 T_{\mathrm{m}}+5 T_{\mathrm{a}}} \tag{17}
\end{align*}
$$

Now, the multipliers are implemented with shift and add operations using the multiple constant multiplication method, which is called a multiplierless implementation. The $\gamma$ coefficients, adaptor types, $\alpha$ coefficients, and their representations in binary and CSDC formats for this $9^{\text {th }}$ order low-pass lattice WDF are given in Table 1. Fig. 10 is obtained by replacing the multipliers with shift operations and CPA adders/subtractors using the CSDC coefficients given in Table 1. Depending on whether an addition or a subtraction is required, the adders are replaced with the adder cells (cell 1, cell 2, or cell 3) presented in Section 4. The maximal sampling frequency of each of the all-pass sections using the multiplierless implementation is given in Eqs. (18) through (22).

Figure 10: Multiplierless implementation of lattice WDF using Richards' second-order sections

Table 1: Low-pass filter parameters

| $\gamma_{i}, 0 \leq i \leq 8$ | Adaptor Type | $\alpha_{j}, 0 \leq j \leq 8$ | $\alpha_{\text {Binary }}$ | $\alpha_{C S D C}$ |
| :---: | :---: | :---: | :---: | :---: |
| 0.667713527 | 1 | 0.33228647 | 0.01010101 | 0.01010101 |
| -0.49630558 | 3 | 0.49630558 | 0.01111111 | $0.1000000 \overline{1}$ |
| 0.797917736 | 1 | 0.202082263 | 0.00110100 | $0.010 \overline{1} 0100$ |
| -0.618835168 | 4 | 0.381164832 | 0.01100010 | $0.10 \overline{1} 00010$ |
| 0.542641521 | 1 | 0.457358479 | 0.01110101 | $0.100 \overline{1} 0101$ |
| -0.766286584 | 4 | 0.233713416 | 0.00111100 | $0.01000 \overline{1} 00$ |
| 0.328193215 | 2 | 0.328193215 | 0.01010100 | 0.01010100 |
| -0.919144204 | 4 | 0.080855796 | 0.00010101 | 0.00010101 |
| 0.217705053 | 2 | 0.217705053 | 0.00111000 | $0.0100 \overline{1} 000$ |

$$
\begin{gather*}
f_{\max \alpha_{0}}=\frac{1}{2 T_{\mathrm{a}}+2 T_{\mathrm{a}}}=\frac{1}{4 T_{\mathrm{a}}}  \tag{18}\\
f_{\max \alpha_{1}, \alpha_{2}}=\frac{1}{3 T_{\mathrm{a}}+5 T_{\mathrm{a}}}=\frac{1}{8 T_{\mathrm{a}}}  \tag{19}\\
f_{\max \alpha_{3}, \alpha_{4}}=\frac{1}{4 T_{\mathrm{a}}+5 T_{\mathrm{a}}}=\frac{1}{9 T_{\mathrm{a}}}  \tag{20}\\
f_{\max \alpha_{5}, \alpha_{6}}=\frac{1}{3 T_{\mathrm{a}}+4 T_{\mathrm{a}}}=\frac{1}{7 T_{\mathrm{a}}}  \tag{21}\\
f_{\max \alpha_{7}, \alpha_{8}}=\frac{1}{3 T_{\mathrm{a}}+4 T_{\mathrm{a}}}=\frac{1}{7 T_{\mathrm{a}}} \tag{22}
\end{gather*}
$$

These equations yield an $f_{\text {max }}$ for the multiplierless implementation of the entire filter, which is given by

$$
\begin{equation*}
f_{\max }=\min \left\{\frac{1}{4 T_{\mathrm{a}}}, \frac{1}{8 T_{\mathrm{a}}}, \frac{1}{9 T_{\mathrm{a}}}, \frac{1}{7 T_{\mathrm{a}}}, \frac{1}{7 T_{\mathrm{a}}}\right\}=\frac{1}{9 T_{\mathrm{a}}} \tag{23}
\end{equation*}
$$

### 5.2 Lattice WDF design using three port parallel adaptors and delays

In this section, the blocks of second-order sections are replaced with the signal flow graph of the three port parallel adaptor shown in Fig. 5. The maximal sampling frequency is determined by one of the critical loops of the second-order all-pass sections in this filter. The maximal sampling frequency is the same for each of these all-pass sections using three port parallel adaptors and is given in Eq. (24).

$$
\begin{equation*}
f_{\max }=\frac{1}{T_{\mathrm{m}}+4 T_{\mathrm{a}}} \tag{24}
\end{equation*}
$$

Now, the multipliers are implemented with shift and add operations using the multiple constant multiplication method. Using Eq. (10), the $\beta$ coefficients and their binary and CSDC equivalents are calculated to implement the second-order all-pass sections using three port parallel adaptors; they are given in Table 2.

Table 2: $\beta$ coefficients of the low-pass filter

| $\beta_{k}, 1 \leq k \leq 8$ | $\beta_{\text {Binary }}$ | $\beta_{\text {CSDC }}$ |
| :---: | :---: | :---: |
| 0.20095334 | 0.001100111 | $0.010 \overline{1} 0100 \overline{1}$ |
| 0.30274107 | 0.010011011 | $0.010100 \overline{1} 0 \overline{1}$ |
| 0.1679029 | 0.001010110 | $0.010 \overline{1} 0 \overline{1} 0 \overline{1} 0$ |
| 0.45093234 | 0.011100111 | $0.100 \overline{1} 0100 \overline{1}$ |
| 0.2573983 | 0.010000100 | 0.010000100 |
| 0.50888833 | 0.100000101 | 0.100000101 |
| 0.359521 | 0.010111000 | $0.10 \overline{1} 00 \overline{1} 000$ |
| 0.55962328 | 0.100011111 | $0.10010000 \overline{1}$ |

The signal flow graph of a multiplierless $9^{\text {th }}$ order low-pass lattice WDF using three port parallel adaptors is shown in Fig. 11. In this figure, $\alpha_{0}$ is implemented with a two port adaptor. Fig. 11 is obtained by replacing the multipliers with shift and add operations using the CSDC coefficients given in Table 2. The maximal sampling frequency of each of the all-pass sections is given in Eqs. (25) through (28).

$$
\begin{align*}
& f_{\max \beta_{1}, \beta_{2}}=\frac{1}{2 T_{\mathrm{a}}+4 T_{\mathrm{a}}}=\frac{1}{6 T_{\mathrm{a}}}  \tag{25}\\
& f_{\max \beta_{3}, \beta_{4}}=\frac{1}{2 T_{\mathrm{a}}+4 T_{\mathrm{a}}}=\frac{1}{6 T_{\mathrm{a}}}  \tag{26}\\
& f_{\max \beta_{5}, \beta_{6}}=\frac{1}{T_{\mathrm{a}}+4 T_{\mathrm{a}}}=\frac{1}{5 T_{\mathrm{a}}}  \tag{27}\\
& f_{\max \beta_{7}, \beta_{8}}=\frac{1}{2 T_{\mathrm{a}}+4 T_{\mathrm{a}}}=\frac{1}{6 T_{\mathrm{a}}} \tag{28}
\end{align*}
$$

These equations yield an $f_{\text {max }}$ for the multiplierless implementation of the entire filter using three port parallel adaptors, which is given by

$$
\begin{equation*}
f_{\max }=\min \left\{\frac{1}{6 T_{\mathrm{a}}}, \frac{1}{6 T_{\mathrm{a}}}, \frac{1}{5 T_{\mathrm{a}}}, \frac{1}{6 T_{\mathrm{a}}}\right\}=\frac{1}{6 T_{\mathrm{a}}} \tag{29}
\end{equation*}
$$
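The comparison between the two multiplierless designs reduces to counting adder delays in the slowest section. The sketch below copies the loop lengths, in units of $T_{\mathrm{a}}$, from Eqs. (18) through (22) and Eqs. (25) through (28); the dictionary keys and the function name `critical_section` are illustrative assumptions.

```python
# Critical-loop lengths in units of the adder delay T_a, copied from
# Eqs. (18)-(22) for Richards' sections and Eqs. (25)-(28) for three port adaptors.
richards_loops = {"a0": 4, "a1,a2": 8, "a3,a4": 9, "a5,a6": 7, "a7,a8": 7}
three_port_loops = {"b1,b2": 6, "b3,b4": 6, "b5,b6": 5, "b7,b8": 6}


def critical_section(loops):
    """Return (section_name, adders_in_loop) for the slowest section.

    Ties are resolved in favour of the first section listed.
    """
    name = max(loops, key=loops.get)
    return name, loops[name]


name_r, n_r = critical_section(richards_loops)
name_t, n_t = critical_section(three_port_loops)
gain = n_r / n_t - 1.0   # relative increase in f_max, since f_max = 1 / (n * T_a)
print(name_r, n_r, name_t, n_t, f"{gain:.0%}")
```

Under this adder-count model the critical loops are $9 T_{\mathrm{a}}$ and $6 T_{\mathrm{a}}$ long, so the ratio of maximal sampling frequencies is $9 / 6=1.5$.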



Figure 11: Multiplierless implementation of lattice WDF using three port parallel adaptors

## 6 Results and Analysis

Comparing all second-order sections, the maximal sampling frequency of the low-pass lattice WDF designed using Richards' sections is $f_{\max }=\frac{1}{9 T_{\mathrm{a}}}$. Similarly, the maximal sampling frequency of the low-pass lattice WDF designed using three port parallel adaptors is $f_{\max }=\frac{1}{6 T_{\mathrm{a}}}$. Hence, an increase of about $50 \%$ in maximal sampling frequency is observed. This analysis shows that the low-pass lattice WDF designed using three port parallel adaptors is more efficient.

To further increase the maximal sampling frequency, CPA adders and registers are used to implement the two port and three port adaptors of the presented low-pass lattice WDF. The CPA is designed using cell 1, cell 2, or cell 3 full adder cells, according to the addition or subtraction operation required, using bit-parallel arithmetic. The CPA is simulated and tested in the simulation environment given in [22]. In this implementation, the sampling frequency depends on the adder delay only. For the example in Section 5, the minimal sampling periods of the multiplierless implementations of the two approaches are compared in Table 3. From this comparison, it is found that the maximal sampling frequency of the low-pass lattice WDF is improved significantly by using multiplierless three port parallel adaptors instead of two port adaptor all-pass sections.

Table 3: Minimum sampling period comparison of multiplierless second-order all-pass sections using Richards' sections and three port parallel adaptors

| Richards' sections (ns) | Three port parallel adaptors (ns) |
| :---: | :---: |
| $T_{\min \alpha_{1} \alpha_{2}}=22.194$ | $T_{\min \beta_{1} \beta_{2}}=16.982$ |
| $T_{\min \alpha_{3} \alpha_{4}}=25.360$ | $T_{\min \beta_{3} \beta_{4}}=17.253$ |
| $T_{\min \alpha_{5} \alpha_{6}}=20.670$ | $T_{\min \beta_{5} \beta_{6}}=14.576$ |
| $T_{\min \alpha_{7} \alpha_{8}}=22.354$ | $T_{\min \beta_{7} \beta_{8}}=17.350$ |

Reducing the latency of the critical loop improves the maximal sampling frequency. From Table 3, the minimal sampling period $T_{\text {min }}$ of the low-pass lattice WDF using Richards' second-order sections is 25.360 ns, and $T_{\text {min }}$ using three port parallel adaptors is 17.350 ns. Hence, the maximal sampling frequencies for Richards' sections and three port adaptors are $f_{\text {max }}=39.43218 \mathrm{MHz}$ and $f_{\text {max }}=57.636 \mathrm{MHz}$, respectively, so $f_{\text {max }}$ is increased by approximately $46 \%$. The area is measured in terms of the number of transistors for the two approaches. The implementation results are summarized in Table 4.
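The reported percentages follow from the tabulated values by simple arithmetic, which the sketch below reproduces. The numbers are copied from Table 3 and Table 4; the variable and function names are illustrative assumptions.

```python
# Minimum sampling periods in ns, copied from Table 3.
t_min_richards = [22.194, 25.360, 20.670, 22.354]
t_min_three_port = [16.982, 17.253, 14.576, 17.350]

# Transistor counts, copied from Table 4.
area_richards, area_three_port = 24750, 27940


def f_max_mhz(section_periods_ns):
    """The slowest section sets the filter rate: f_max = 1 / max(T_min)."""
    return 1e3 / max(section_periods_ns)


f_r = f_max_mhz(t_min_richards)
f_t = f_max_mhz(t_min_three_port)
speed_gain = f_t / f_r - 1.0
area_overhead = area_three_port / area_richards - 1.0
print(f"{f_r:.3f} MHz, {f_t:.3f} MHz, "
      f"+{speed_gain:.1%} speed, +{area_overhead:.1%} area")
```

The simulated speed gain (about 46%) is slightly below the 50% predicted by the adder-count model, which treats every adder in the loop as having the same delay $T_{\mathrm{a}}$.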

Table 4: Area comparison (in terms of transistor count) of low-pass lattice WDF using Richards' sections and three port parallel adaptors

|  | Richards' sections | Three port adaptors |
| :---: | :---: | :---: |
| Technology | $0.18 \mu \mathrm{~m}$ | $0.18 \mu \mathrm{~m}$ |
| Area (transistors) | 24750 | 27940 |

## 7 Conclusion

A novel approach to designing a fixed coefficient lattice WDF for increased maximal sampling frequency is presented. The maximal sampling frequency is increased by reducing the number of logic components in the critical loop and further increased by reducing the critical delay of the individual logic components. The three port parallel adaptor has fewer logic components in its critical loop than Richards' second-order section. In the example of Section 5, the maximal sampling frequency is improved by about $50 \%$ using three port parallel adaptors compared with the conventional Richards' sections. This is further improved by a multiplierless implementation of the lattice WDF using individual logic components such as hybrid full adders, delay elements, and simple logic circuitry. These components are designed and implemented using $0.18 \mu \mathrm{~m}$ technology in full custom VLSI design. Since multipliers are the most power- and area-consuming elements, we have replaced them with shift and add and/or subtract units. With the help of CSDC coefficients, the fixed coefficient multiplierless second-order Richards' sections and three port parallel adaptors are integrated for signed binary numbers. The adaptors are implemented in the given design example. The lattice WDF designed using three port parallel adaptors offers an increase of about $46 \%$ in $f_{\max }$ compared with the conventional Richards' design, at the cost of an approximately $13 \%$ increase in area. Here, area is measured in terms of transistor count.

## References:

[1] H. Johansson and L. Wanhammar, Wave digital filter structures for high-speed narrow-band and wide-band filtering, IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 46, no. 6, 1999, pp. 726-741.
[2] A. Fettweis, Wave digital filters: Theory and practice, Proceedings of the IEEE, vol. 74, no. 2, 1986, pp. 270-327.
[3] A. Fettweis, On adaptors for wave digital filters, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 23, no. 6, 1975, pp. 516-525.
[4] T. Wicks and S. Summerfield, VLSI implementation of high speed wave digital filters based on a restricted coefficient set, Proceedings IEEE International Symposium on Circuits and Systems, vol. 1, 1993, pp. 603-606.
[5] A. Avizienis, Signed-digit number representation for fast parallel arithmetic, IRE Transactions on Electronic Computers, vol. 10, no. 3, 1961, pp. 389-400.
[6] S. Summerfield, T. Wicks and S. Lawson, Design and VLSI architecture and implementation of wave digital filters using short signed digit coefficients, IEE Proceedings-Circuits, Devices and Systems, vol. 143, no. 5, 1996, pp. 259-266.
[7] S. Summerfield and S. Lawson, The design of wave digital filters using fully pipelined bit-level systolic arrays, Journal of VLSI Signal Processing, vol. 2, 1990, pp. 51-64.
[8] R.J. Singh and J.V. McCanny, High performance VLSI architecture for wave digital filtering, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol. 4, no. 4, 1992, pp. 269-278.
[9] H. Ohlsson, O. Gustafsson, H. Johansson and L. Wanhammar, Implementation of bit-parallel lattice wave digital filters with increased maximal sample rate, Proceedings IEEE, 2001, pp. 71-74.
[10] H. Ohlsson, Studies on implementation of digital filters with high throughput and low power consumption, Thesis No. 1031, Linköping University, Sweden, 2003.
[11] K. Johansson, O. Gustafsson and L. Wanhammar, Multiple constant multiplication for digit-serial implementation of low power FIR filters, WSEAS Transactions on Circuits and Systems, vol. 5, no. 7, 2006, pp. 1001-1008.
[12] J. Yli-Kaakinen and T. Saramäki, A systematic algorithm for the design of lattice wave digital filters with short-coefficient wordlength, IEEE Transactions on Circuits and Systems-I, vol. 54, no. 8, 2007, pp. 1838-1851.
[13] J. Yli-Kaakinen and T. Saramäki, Design of very low-sensitivity and low-noise recursive filters using a cascade of low order lattice wave digital filters, IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 46, 1999, pp. 906-914.
[14] L. Wanhammar, DSP Integrated Circuits, New York: Academic, 1999.
[15] O. Gustafsson and L. Wanhammar, Maximally fast scheduling of bit-serial lattice wave digital filters using three-port adaptor allpass sections, Proceedings Nordic Signal Processing Symposium, 2000, pp. 441-444.
[16] M. Renfors and Y. Neuvo, The maximum sampling rate of digital filters under hardware speed constraints, IEEE Transactions on Circuits and Systems, vol. 28, no. 3, 1981, pp. 196-202.
[17] S.M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, Tata McGraw-Hill, 3rd edn., 2003.
[18] A.P. Chandrakasan and R.W. Brodersen, Low Power Digital CMOS Design, Kluwer Academic, Norwell, MA, 1995.
[19] M.S. Anderson, S. Summerfield and S. Lawson, Realization of lattice wave digital filters using three-port adaptors, Electronics Letters, vol. 31, no. 8, 1995, pp. 628-629.
[20] L. Gazsi, Explicit formulas for lattice wave digital filters, IEEE Transactions on Circuits and Systems, vol. 32, no. 1, 1985, pp. 68-88.
[21] K. Venkat, Wave digital filtering using MSP430, Application Report SLAA331, Texas Instruments, 2006, pp. 1-25.
[22] C.H. Chang, G.U. Jiangmin and M. Zhang, A review of $0.18-\mu \mathrm{m}$ full adder performances for tree structured arithmetic circuits, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 6, 2005, pp. 686-695.

