Field of the Invention
The present invention relates to minimized area implementation
of a sum of products expression.

Background of the Invention
The implementation of Sum of Products (SOP) expression
can be used to realize signalprocessing applications on hardware. SOP implementation
is hereby discussed with reference to hardware realization of a digital filter.

A digital filter is an electronic circuit processing discrete
signal samples to perform a desired transfer function operation on the said discrete
signal samples. The digital filter is a Finite Impulse Response (FIR) or Infmite
Impulse Response (IIR) filter. Following components are used in the digital filter
implementation.

__Unit sample delay (11):__ Unit sample delay (Z^{-1}) is shown in Figure
1(a). Unit sample delay is a Parallel In Parallel Out (PIPO) register as it captures
the sample at its input on the next positive clock edge.

__Multiplier (12):__ Takes in an input 'x' and give multiplied output 'y' = 'a*x',
where 'a' is the multiplicand. Its symbol is a triangle with multiplicand by its
side as shown in Figure 1(b).

__'P1' bit adders (13):__ Takes in two inputs each of width 'p1' bits and gives
an added output of 'p1' bits. Hence this adder is called as the 'p1' bit parallel
adder and its symbol is '+' in a circle as shown in Figure 1(b).

__'P1' bit subtractor (15):__ Takes in two inputs each of width 'p1' bits and
subtracts one output from the other and gives an output of 'p1' bits. Hence this
subtractor is called as the 'p1' bit parallel subtractor and its symbol is shown
in a circle with '+' in it and '-' on one of the inputs as shown in Figure 1(b).

__Parallel shifter (14):__ Takes in an input 'x' and gives 'z' bit left shifted
output. Suppose 'x=101' and 'z=2' then 'y = 10100'. Its symbol is a line with '<<z'
on it (where 'z' can be any positive number) as shown in Figure 1(b).

Working of Unit sample Delay is evident from the Timing
diagram shown in Figure 1(c). The samples X0, X1, X2..X7 at the input of unit sample
delay in Figure 1(c) appear as delayed by one Clock (Clk) period as they are captured
by the next positive edge of the Clk.

The transfer function H (Z) of a FIR filter has the generic
form as given in equation (2a);
$$\mathrm{H}\left(\mathrm{Z}\right)={\displaystyle \sum _{\mathrm{k}=0}^{\mathrm{N}}{\mathrm{b}}_{\mathrm{k}}{\mathrm{z}}^{-\mathrm{k}}}$$

where b_{K} represents the coefficients, and z^{-k} represents delay
of k clock cycles.

Equation of FIR filter is given below:
$$\mathrm{y}\left(\mathrm{n}\right)={\displaystyle \sum _{\mathrm{k}=0}^{\mathrm{N}}{\mathrm{b}}_{\mathrm{k}}\mathrm{x}\left(\mathrm{n}-\mathrm{k}\right)}$$

where y (n) is the output and x (n-K) is the delayed input.

The FIR filter has different types of implementations in
hardware. Two of the important implementations of the FIR filter are direct form
of coefficient bank implementation as shown in Figure 2(a) and transposed form of
implementation as shown in Figure 2(b).

The direct form of the FIR filter has input X connected
to the first unit sample delay. Unit sample delays (11) are connected serially to
each other. Taps S_{0} to S_{N} obtained from unit sample delays
(11) are connected to respective coefficient multiplier (12) b_{0} to b_{N}.
The output of multipliers (12) is added through plurality of adders (13,15) to form
output Y of the filter.

The transposed form of the FIR filter (2b) has the input
X connected to the plurality of coefficient multipliers b_{0} to b_{N}
(12) resulting in taps S_{0} to S_{N}, The taps S_{0} to
S_{N} are connected to plurality of unit sample delays (11) and adders (13,15)
to form output Y of the filter.

Given below is the equation of the IIR filter:
$$\mathrm{H}\left(\mathrm{Z}\right)=\left({\displaystyle \sum _{\mathrm{m}=0}^{\mathrm{M}}{\mathrm{b}}_{\mathrm{m}}{\mathrm{z}}^{-\mathrm{m}}}\right)/\left(1-{\displaystyle \sum _{\mathrm{l}=1}^{\mathrm{L}}{\mathrm{a}}_{\mathrm{l}}{\mathrm{z}}^{-\mathrm{l}}}\right)$$

where H (Z) is the transfer function, b_{m} and a_{1} are the coefficients
and z^{-k} represent a delay of k clock cycles.

The equation of the IIR filter can also be written as in
equation (2d);
$$\mathrm{y}\left(\mathrm{n}\right)={\displaystyle \sum _{\mathrm{l}=1}^{\mathrm{L}}{\mathrm{a}}_{\mathrm{l}}\mathrm{y}\left(\mathrm{n}-\mathrm{l}\right)}+{\displaystyle \sum _{\mathrm{m}=0}^{\mathrm{M}}{\mathrm{b}}_{\mathrm{m}}\mathrm{x}\left(\mathrm{n}-\mathrm{m}\right)}$$

where y (n) is the output, y(n-1) is the delayed output and x(n-m) is the delayed
input.

The implementation of coefficient is implementation of
a number in hardware. Hence the term coefficient and number are used interchangeably
in the specification.

The structure of direct and transposed form IIR filter
is shown in figure 2(c). The direct form structure has a plurality of unit sample
delays (11), and two sets of direct form coefficient bank (1) formed by plurality
of adders (13,15) and plurality of the multipliers (12) connected to the taps. The
transposed form of IIR filter has plurality of unit sample delays (11), and two
sets of transposed form coefficient bank (1) formed by a plurality of adders (13,15)
and plurality of the multipliers (12) connected to the taps as shown in Figure 2(c).

The parallel direct or transposed form of coefficient bank
is the area consuming part in the parallel digital filter implementations. An existing
way of implementing the parallel direct or transposed form of coefficient bank using
the shift (12) and add (13,15) has been described in Figure 2(b). Left shift is
'0' cost in parallel implementations as it needs only appending lines set at '0'
on the left of the parallel bus. The remaining part of shift (12) and add (13,15)
implementation comprises adders.

The implementation of parallel multipliers using shift
and add is shown in figure 2(d).

Existing method for implementing direct form of coefficient
bank is described as follows:

The existing method of implementation of direct form of
coefficient bank has two parts, part1 is an algorithm shown in figure 2(e), which
takes coefficients taps as input and gives equation of direct form of coefficient
bank and horizontal sub expressions as output and part 2 is the generalized structure
shown in figure 2(g), which is obtained by mapping the equation of the direct form
of coefficient bank, and horizontal sub-expressions onto hardware. The method is
explained by using coefficient bank implementation for a given example:

Assuming that the example coefficients of an FIR filter
are given by, Set 1= {-79,1044,-5890,27916, 49362,-8382,1628,-154,63,126}.

The equation form of the direct form of coefficient bank
for the example is as given below:
$$\mathrm{Y}=\left[-79\right]\mathrm{S}0+\left[1044\right]\mathrm{S}1+\left[-5890\right]\mathrm{S}2+\left[27916\right]\mathrm{S}3+\left[49362\right]\mathrm{S}4+\left[-8382\right]\mathrm{S}5+\left[1628\right]\mathrm{S}6+\left[-154\right]\mathrm{S}7+\left[63\right]\mathrm{S}8+\left[126\right]\mathrm{S}9$$

Figure 2(f1) shows the direct form of implementation of
the example FIR filter. Structure (1) is the direct form of coefficient bank, that
comprises unit sample delays (11), multipliers and adders (12,13,15). The execution
of the existing method (Figure 2(e)) is explained with reference the coefficients
of Set 1.

__Step 1:__ This step illustrates the conversion of coefficients into Canonical
Signed Digit (CSD) representation. Here the value of 'i' is -1.
$$\begin{array}{l}-79=\mathrm{i}0\mathrm{i}0001;\mathrm{\hspace{1em}}1044=10000010100;\mathrm{\hspace{1em}}-5890=\mathrm{i}01001000000\mathrm{i}0;\mathrm{\hspace{1em}}27916=\\ 100\mathrm{i}0100010\mathrm{i}00\end{array}$$
$$\begin{array}{l}49362=10\mathrm{i}0000010\mathrm{i}010010;\mathrm{\hspace{1em}}-8382=\mathrm{i}0000\mathrm{i}01000010;\mathrm{\hspace{1em}}1628=10\mathrm{i}010\mathrm{i}00\mathrm{i}00;\mathrm{\hspace{1em}}-154\\ =\mathrm{i}0\mathrm{i}010\mathrm{i}0;\mathrm{\hspace{1em}}63=100000\mathrm{i};\mathrm{\hspace{1em}}126=100000\mathrm{i}0\end{array}$$

The direct form of coefficient bank with CSD representation
of coefficients is as given below:
$$\begin{array}{l}\mathrm{Y}=\left(2^0-2^4-2^6\right)\mathrm{S}0+\left(2^2+2^4+2^10\right)\mathrm{S}1+\left(-2^1+2^8+2^11-2^13\right)\mathrm{S}2+\left(-2^2+2^4+2^8-2^10-2^12+2^15\right)\mathrm{S}3+\left(2^1+2^4-2^6+2^8-2^14+2^16\right)\mathrm{S}4+\left(2^1+2^6-2^8-2^13\right)\mathrm{S}5+\\ \left(-2^2-2^5+2^7-2^9+2^11\right)\mathrm{S}6+\left(-2^1+2^3-2^5-2^7\right)\mathrm{S}7+\left(-2^0+2^6\right)\mathrm{S}8+\left(-2^1+2^7\right)\mathrm{S}9\end{array}$$
or
$$\mathrm{Y}=\mathrm{S}0-\mathrm{S}8+\left(-\mathrm{S}2+\mathrm{S}4+\mathrm{S}5-\mathrm{S}7-\mathrm{S}9+2\left(\mathrm{S}1-\mathrm{S}3-\mathrm{S}6+2\left(\mathrm{S}7+2\left(-\mathrm{S}0+\mathrm{S}1+\mathrm{S}3+\mathrm{S}4+2\left(-\mathrm{S}6-\mathrm{S}7+2\left(-\mathrm{S}0-\mathrm{S}4+\mathrm{S}5+\mathrm{S}8+2\left(\mathrm{S}6-\mathrm{S}7+\mathrm{S}9+2\left(\mathrm{S}2+\mathrm{S}3+\mathrm{S}4-\mathrm{S}5+2\left(-\mathrm{S}6+2\left(\mathrm{S}1-\mathrm{S}3+2\left(\mathrm{S}2+\mathrm{S}6+2\left(-\mathrm{S}3+2\left(-\mathrm{S}2-\mathrm{S}5+2\left(-\mathrm{S}4+2\left(\mathrm{S}3+2\left(\mathrm{S}4\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)$$

Number of adders required to implement equation (2f2) are
38.

Hereafter, Adders/ Subtractors are referred as adders for the ease of explanation.

__Step2:__ Horizontal subexpressions are formed from the CSD representation of
coefficients. Table 1 shows the horizontal optimization for the example. From Table1,
it is seen that sub expression in row 2^0 is -(S0-S8), and in row 2^6 it is (S0-S8).
Since S0-S8 is present in two 'rows', it is called as horizontal sub expression.

Similarly S1-S3, -S2+S5, S3+S4 are other horizontal sub
expressions. The equation of the direct form of coefficient bank of the example
after horizontal optimization is:
$$\mathrm{Y}=\mathrm{X}1+2\left(\mathrm{X}3+\mathrm{S}4-\mathrm{S}7-\mathrm{S}9+2\left(\mathrm{X}2-\mathrm{S}6+2\left(\mathrm{S}7+2\left(-\mathrm{S}0+\mathrm{S}1+\mathrm{X}4+2\left(-\mathrm{S}6-\mathrm{S}7+2\left(-\mathrm{X}1-\mathrm{S}4+\mathrm{S}5+2\left(\mathrm{S}6-\mathrm{S}7+\mathrm{S}9+2\left(-\mathrm{X}3+\mathrm{X}4+2\left(-\mathrm{S}6+2\left(\mathrm{X}2+2\left(\mathrm{S}2+\mathrm{S}6+2\left(-\mathrm{S}3+2\left(-\mathrm{S}2-\mathrm{S}5+2\left(\mathrm{S}4+2\left(\mathrm{S}3+2\left(\mathrm{S}4\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)$$

The equation can also be written as equation 2(i) for showing
hardware mapping.
$$\mathrm{Y}=\mathrm{X}1+2\left(\mathrm{E}1+2\left(\mathrm{E}2+2\left(\mathrm{S}7+2\left(\mathrm{E}3+2\left(-\mathrm{E}4+2\left(\mathrm{E}5+2\left(\mathrm{E}6+2\left(\mathrm{E}7+2\left(-\mathrm{S}6+2\left(\mathrm{X}2+2\left(\mathrm{E}8+2\left(-\mathrm{S}3+2\left(-\mathrm{E}9+2\left(-\mathrm{S}4+2\left(\mathrm{S}3+2\left(\mathrm{S}4\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)$$

where the expressions E1 = X3+S4-S7-S9, E2 =X2-S6, E3 = -S0+S1+X4, E4 = S6+S7, E5
=-X1- S4+S5, E6 = S6-S7+S9, E7 = -X3+X4, E8 = S2+S6, E9 =S2+S5, and horizontal sub
expressions X1 = S0-S8, X2 = S1-S3, X3 = -S2+S5, X4 = S3+S4.

Number of adders required to implement the equation (2i)
are 34.

The structure of direct form of coefficient bank from existing
method of implementations for the above example is shown in Figure2 (f2). In Figure
2(f2) sub-structures SS [X1 to X4] show the hardware mapping of horizontal sub expressions
X1 to X4, sub-structure SS [E1 to E9] are the hardware mapping of the expressions
E1 to E9 and sub-structure SS [Y] is the hardware mapping of the equation Y. The
generalized structure of direct form of coefficient bank from existing method of
implementations is shown in Figure2 (g), where the sub-structures SS [X1 to Xm]
show the hardware mapping of horizontal sub expressions X1 to Xm, sub-structures
SS [E1 to Ep] are the hardware mapping of expressions and E1 to Ep and SS [Y] is
the hardware mapping of the equation.

Figure 2(i1) shows the transposed form of implementation
of the example FIR filter. Structure (2) is the transposed form of coefficient bank
implementation.

Individual equations for the transposed form implementation
of the example coefficient bank are:

S0 = -79X, S1 = 1044X, S2 = -5890X, S3 = 27916X, S4 = 49362X, S5 = -8382X, S6 =
1628X, S7 = -154X, S8 = 63X, S9 = 126X.

The execution of the steps in the existing method (Figure
2(h)) are explained below:

__Step1__: Conversion of coefficients into Canonical Signed Digit (CSD) or any
other representation.

The conversion of individual coefficients to CSD with reference
to Set 1 is as given below:

- S0 = -79X = (i0i0001) X
- S1 = 1044X =(10000010100) X
- S2 = -5890X = (i01001000000i0) X
- S3 = 27916X = (100i0i0100010i00) X
- S4 = 49362X = (10i0000010i010010) X
- S5 = -8382X = (i0000i01000010) X
- S6 = 1628X = (10i010i00i00) X
- S7 = -154X = (i0i010i0) X
- S8 = 63X =(100000i) X
- S9 = 126X = (100000i0) X

__Step2:__ From the vertical sub expression form the CSD representation of coefficients.
Table 2 shows the vertical sub expressions for the above stated example. In Table
2, the expression (2^0 -2^6) is present in column S0 and in column S8 as -(2^0-2^6).
The expression is also present in columns S3 and S9, as (-2^2)(2^0-2^6) and (- 2^1)(2^0-2^6)
respectively. As the expression (2^0-2^6) is present in different columns, it is
called vertical sub expression. Similarly (1-(2^2)) and (1+(2^3)) are other vertical
sub expressions.

Equations of the coefficients after the vertical optimization:

- S0=-79X=(Y1-(2^4))X
- S1=1044X=((1+(2^2)+(2^8))*(2^2))X
- S2=-5890X=((-1+(2^7)+(2^10)*(Y2))*(2^1)) X
- S3=27916X=((-Y1+(2^2)-(2^8)-(2^10)+(2^13)*(2^2))X
- S4=49362X=((Y3-(2^5)*(Y2)+(2^13))*(Y2))*(2^1))X
- S5=-8382X=((1+(2^5)*(Y2)-(2^12))*(2^1))X
- S6=1628X((-Y3+(2^5)*Y2+2^9)*(2^2))X
- S7=-154X=((-1+(2^2)*Y2-(2^6))*(2^1))X
- S8=63X=(Y1)X
- S9=126X=(Y1*(2^1))X

where,
- Y1 = 1-(2^6), Y2=1-(2^2),Y3 = 1+(2^3) are vertical sub expressions.

It is observed from the above analyses that the number
of adders required to implement the transposed form coefficient bank are 20. For
the example the vertical sub expressions are mapped into hardware as sub-structures
SS (Y1, Y2,Y3) as shown in Figure 2(i2). The individual equations S0 to S9 of the
transposed form of coefficient bank are mapped into hardware as sub-structures SS
(SO to S9) as shown in Figure 2(i2). The sub-structures are formed from the plurality
of adders 13,15 and shifter 14.

The generalized structure of transposed form of coefficient
bank (2) from existing method of implementation is shown in Figure2 (j). It is formed
with sub-structures SS (Y1 to Ym) formed from mapping the vertical sub expressions
Y1 to Ym into hardware and sub-structures SS (SO to Sn) formed from mapping the
individual equations S0 to Sn into hardware. The sub-structures are formed from
the plurality of adders (13,15) and shifter (14). It is observed that the existing
structure of coefficient bank is inefficient, when high magnitude coefficients are
required to be implemented.

The constraints in the existing method for SOP expression
implementation are further illustrated by implementing a number with CSD representation.
The CSD representation of 45 = 10i0i01 and structure of the CSD implementation of
the number (45) is shown in Figure 2(k1). It is observed that the number of adders
required to implement 45 in CSD based implementation as in Figure 2(k1) are 3.

In CSD implementation the input is shifted by a value of
(2^z) (where 'z' is a positive number) and then added or subtracted from the other
shifts of the input. Another existing way implementation of a 45 in hardware is
9*5 and is shown in Figure 2(k2). Number of adders required implementing 45 as in
Figure 2(k2) are 2. The input x is multiplied by number 9 and then the result 9x
is multiplied with 5 to get an output of 45x.

It is evident from the implementation of the coefficient
multiplier 45 that there can be an area efficient way of implementing coefficients,
so as to reduce the number of adder/subtractors in the resultant hardware.

Thus, a requirement is felt for an efficient method to
achieve minimal area implementation of a Sum of Products expression.

Object and Summary of the Invention
It is an object of the present invention to provide a minimal
area integrated circuit implementation of a sum-of-products expression.

It is another object of the present invention to implement
direct form of coefficient bank using said sum-of -products expression.

It is yet another object of the present invention to implement
transposed form of coefficient bank using said sum-of-products expression.

To achieve the said objectives the present invention provides
a device for implementing a sum-of-products expression comprising:

- a first set of 2-input Shift-and-Add (2SAD) blocks receiving a first set of
the coefficients for generating a first set of partially optimized expression terms
by removing repeated odd fundamental coefficients and applying recursion therein
to generate minimal area sum-of-products implementation,
- a second set of 1-input Shift-and-Add (1SAD) blocks receiving a second set of
the coefficient of the expression, said second set obtained as a subset of said
first set of coefficients for generating a second set of partially optimized expression
terms by applying vertical optimization therein,
- a third set of 2SAD blocks receiving a combination of a third set of the coefficients
of the expression, a first subset of said first set of partially optimized expression
terms and a first subset of said second set of partially optimized expression terms
as inputs, for generating a third set of partially optimized expression terms by
applying horizontal optimization,
- a fourth set of 2SAD blocks receiving a combination of a fourth set of the coefficient
of the expression, a second subset of said first set of partially optimized expression
terms, a second subset of said second set of partially optimized expression terms,
and a first subset of said third set of partially optimized expression terms as
inputs, for generating a fourth set of partially optimized expression terms by applying
decomposition and factorization therein, and
- a fifth set of 2SAD blocks receiving said fourth set of partially optimized
expression terms for providing the final output.

Further, the present invention provides a method for creating
an implementation of a sum-of-products expression, comprising steps of:

- a. preparing a tabulated form of a set of coefficients,
- b. locating the common factors within said tabulated form to generate vertical
and/or horizontal sub expressions for area minimal realization of said sum of products
expression, said method includes the preprocessing steps of;
- c. simplifying said set of coefficients into an odd fundamental set and a multiplication
factor set,
- d. factorizing said odd fundamental set to remove repeated odd fundamental coefficients
to generate optimized sum of products expression for minimal area implementation,
- e. forming a common factor set comprising numbers that correspond to 2^n (+/-)
1, the highest value in the set being minimally greater than the highest value in
said odd fundamental set; and said odd fundamental set comprising minimized values
of repeated odd fundamental coefficients,
- f. decomposing each element of said odd fundamental set into a quotient, a remainder,
and a common factor, said common factor being selected from said common factor set
for providing the minimum absolute value of the remainder and optimal value of quotient,
- g. merging said quotient and remainder sets with said multiplication factor
set to generate a modified coefficient set, and
- h. recursively optimizing said modified coefficient set by applying steps 'c'
through 'g' until the smallest factor set of coefficients is obtained to thereby
generate the optimized sum-of-products expression for minimal area implementation.

Brief description of the accompanying drawings
The invention will now be described with reference to the
accompanying drawings.

- Figure 1(a) illustrates the existing structure of unit sample delay.
- Figure 1(b) illustrates the existing structures of 'p1' bit adder, 'p1' bit
saturator, multiplier and a 'z' bit left shifter.
- Figure 1(c) illustrates timing diagram to explain working of unit sample delay.
- Figure 2(a) illustrates the existing structure of direct form FIR filter.
- Figure 2(b) illustrates the existing structure of the transposed form FIR filter.
- Figure 2(c) illustrates the existing structure of direct and transposed form
IIR filter.
- Figure 2(d) illustrates the generalized implementation of sum of products expression
using shifters and adders.
- Figure 2(e) illustrates the existing method for implementation of direct form
of coefficient bank.
- Figure 2(f1) illustrates the existing direct form structure of FIR filter for
a given example.
- Figure 2(f2) illustrates the existing structure of direct form of coefficient
bank with Canonical Signed Digit (CSD) representation of coefficients for a given
example.
- Figure 2(g) illustrates the existing method for implementation of direct form
of coefficient bank.
- Figure 2(h) illustrates the existing method for implementation of transposed
form of coefficient bank.
- Figure 2(i1) illustrates the existing transposed form structure of FIR filter
for a given example.
- Figure 2(i2) illustrates the existing structure of -transposed form of coefficient
bank for a given example.
- Figure 2(j) illustrates existing generalized structure of transposed form of
coefficient bank.
- Figure 2(k1) illustrates the existing structure of a CSD represented single
coefficient '45'.
- Figure 2(k2) illustrates another existing structure of a single coefficient
'45'.
- Figure 3(a1) illustrates the method for implementation of a sum of products
expression in accordance with the instant invention.
- Figure 3(a2) illustrates the method for implementation of direct form of coefficient
bank with generalized equations in accordance with the instant invention.
- Figure 3(a2)-1 illustrates the convergent 2SAD(2 input Shift and Add) implementation
of Subexpression R1 in accordance with the instant invention.
- Figure 3(a2)-2 illustrates the 1SAD (1 input Shift and Add) implementation of
Subexpression Y1 in accordance with the instant invention.
- Figure 3(a3) illustrates the generalized structure generalized structure for
implementation of minimal area Sum of Products expression to realize direct form
coefficient bank.
- Figure 3(b1) illustrates the structure obtained after optimization 1 for implementation
of direct form of coefficient bank for a given example in accordance with the instant
invention.
- Figure 3(b2) illustrates the structure obtained after optimization 2 for implementation
of direct form of coefficient bank for a given example in accordance with the given
invention.
- Figure 3(b3) illustrates the structure obtained after expanding the cost 1 coefficients
in optimization 2 in accordance with the instant invention.
- Figure 3(b4) illustrates the structure obtained after optimization 3 for implementation
of direct form of coefficient bank for a given example in accordance with the instant
invention.
- Figure 3(c1) illustrates the method for implementation of transposed form of
coefficient bank (Using modified sum of products expression) in accordance with
the instant invention.
- Figure 3(c2) illustrates the method for realization of transposed form of coefficient
bank with generalized equations in accordance with the instant invention.
- Figure 3(c3) illustrates the generalized structure for implementation of minimal
area Sum of Products expression to realize the transposed form coefficient bank.
- Figure 3(d) illustrates the structure for implementation of transposed form
of coefficient bank for a given example according to the instant invention.

Detailed Description
Before going into details of the invented method for implementation
of the minimal area sum and products expression, the terms 'cost' and 'weight' are
introduced as follows:

__Cost:__ It is the hardware cost required in terms of adders to implement a
number. Adder, subtractor cost is referred as 'adder cost' for ease of explanation.

Cost to implement power of 2 (shifting) in parallel implementations
is '0'.Cost for implementing (2^z+1) or (2^z-1)(where 'z' is a positive number)
in parallel implementations is '1'.

__Weight:__ Weight of the cost 0's,cost 1's and numbers greater than cost 1's
in a number set is fixed as follows:

Weight of cost 0 (power of 2 coefficients) = 1,weight of cost 1's((2^z)+1) or (2^z)-1))
= 2 and weight of numbers greater than cost1's = 100 (Assumed for the ease of implementation).
The weight of the powers of two is fixed as '1' , because it requires a single adder
to connect to the elements in the number set. The weight of Cost 1 coefficients
is fixed as '2'.

The present invention provides a method for generating
a minimal area structure for a sum of products expression, said method has three
optimization steps involved in reduction of number of adders required to implement
the coefficient set and is illustrated as flow chart in Figure 3(a1).

Before explaining the step by step execution of the method
of the instant invention (Figure 3(a1)), the instant invention is explained with
the help of the optimization steps applied to the input set of coefficients as follows:

The input set of coefficients is a user defined complex
Sum-of-Products expression or an equation represented by a high magnitude coefficient
or a number, wherein the input subexpression in the instant case is Y=b_{0}
*S0 + b_{1} *S1 + b_{2} * S2 +.........+ b_{n}*Sn,

where {b_{0}, b_{1}, b_{2}.....b_{j},b_{k},b_{n}}
is the predefined (user input) coefficients, and

S0=X, S1=Z^{-1}X, S2=Z^{-2}X.....Sn=Z^{-n}X. (Here X is
a variable and Z^{-n} represents unit sample delay).

Optimization 1
This optimization is a recursive optimization (Figure 3
(a2)) that generates the modified quotient and remainder sets in the SOP expression
after optimizing the adder cost.

Steps 1, 2, 3 of optimization1 continues till all the numbers
in quotient and remainder set are optimized to Cost1s or powers of 2. Cost 1 coefficients
are considered as common factors.

Optimization is recursively applied so as to generate a minimal area output expression
and is hereafter also referred as Recursive Optimization.

Given below is the required form of the optimized Sum-of
Products expression.
$$\mathrm{Y}={\mathrm{cf}}_{11}\left\{{\mathrm{cf}}_{12}\left\{\dots \left\{{\mathrm{cf}}_{1\mathrm{m}}\left\{{\mathrm{QO}}_{1\mathrm{m}}\right\}+{\mathrm{RE}}_{1\mathrm{m}}\right\}\right\}\right\}+{\mathrm{cf}}_{21}\left\{{\mathrm{cf}}_{22}\left\{\dots \left\{{\mathrm{cf}}_{2\mathrm{m}}\left\{{\mathrm{QO}}_{2\mathrm{m}}\right\}+{\mathrm{RE}}_{2\mathrm{m}}\right\}\right\}\right\}+{\mathrm{cf}}_{31}\left\{\left\{{\mathrm{cf}}_{32}\dots \right\}\dots \right\}+\mathrm{RE}\left(\mathrm{Similar\; type\; of\; equation\; as\; QO}\right),$$

where cf_{11}, cf12,...cf_{1m},cf_{21},cf_{22},...cf_{2m}.
are common factors, and QO_{1m},QO_{2m}... are quotient expressions,
and RE_{1m},RE_{2m}... are remainder expressions,

Generalized QO_{1m},QO_{2m}..and.,RE_{1m},RE_{2m}.
are given as follows:
$${\mathrm{QO}}_{1\mathrm{m}}=\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{q}1\mathrm{m}0}\ast \mathrm{S}0+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{q}1\mathrm{m}1}\ast \mathrm{S}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{q}1\mathrm{m}\mathrm{n}}\ast \mathrm{S}\mathrm{n}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}\mathrm{o}\dots $$
$${\mathrm{QO}}_{2\mathrm{m}}=\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{q}2\mathrm{m}0}\ast \mathrm{S}0+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{q}2\mathrm{m}1}\ast \mathrm{S}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{q}2\mathrm{m}\mathrm{n}}\ast \mathrm{S}\mathrm{n}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}\mathrm{o}\dots $$
$${\mathrm{RE}}_{1\mathrm{m}}=\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{r}1\mathrm{m}0}\ast \mathrm{S}0+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{r}1\mathrm{m}1}\ast \mathrm{S}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{r}1\mathrm{m}\mathrm{n}}\ast \mathrm{S}\mathrm{n}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}\mathrm{o}\dots $$
$${\mathrm{RE}}_{2\mathrm{m}}=\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{r}2\mathrm{m}0}\ast \mathrm{S}0+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{r}2\mathrm{m}1}\ast \mathrm{S}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{D}}_{\mathrm{r}2\mathrm{m}\mathrm{n}}\ast \mathrm{S}\mathrm{n}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}\mathrm{o},$$

where D_{q1m0} to D_{r2mn} etc., are cost 1's,

where R1 to Ro are recursive sub expressions of form R1 = (Power of 2)*S0+.....(Power
of 2)*Sn.... Ro = (Power of 2)*S0+.....(Power of 2)*Sn,

where (Power of 2) is any power of 2.

Optimization 2
The expressions QO_{1m} to RE_{2m} obtained
by applying the method of the optimization 1 are tabulated in the form of Table
A as follows in order to factorize vertically. Further, the optimization 2 is hereafter
also referred to as Vertical Optimization.

Traversing the table column wise and assuming that D_{q1m1}
= D_{q2m1}, then the first vertical subexpression is given by Y1 = D_{q1m1}*S1.
Similarly there can be 'p' vertical subexpressions, where 'p' is any positive number.

Generalized equation after optimization 2 is given by:
$$\mathrm{Y}={\mathrm{cf}}_{11}\left\{{\mathrm{cf}}_{12}\left\{\dots \left\{{\mathrm{cf}}_{1\mathrm{m}}\left\{{\mathrm{QO}}_{1\mathrm{m}}\right\}+{\mathrm{RE}}_{1\mathrm{m}}\right\}\right\}\right\}+{\mathrm{cf}}_{21}\left\{{\mathrm{cf}}_{22}\left\{\dots \left\{{\mathrm{cf}}_{2\mathrm{m}}\left\{{\mathrm{QO}}_{2\mathrm{m}}\right\}+{\mathrm{RE}}_{2\mathrm{m}}\right\}\right\}\right\}+{\mathrm{cf}}_{31}\left\{\left\{{\mathrm{cf}}_{32}\dots \right\}\dots \right\}+\mathrm{RE};$$

where cf_{11}, cf12,...cf_{1m},cf_{21},cf_{22},...cf_{2m}...
etc., are common factors, and QO_{1m},QO_{2m}... are quotient expressions,
and RE_{1m},RE_{2m}... are remainder expressions,

where

QO_{1m} = (Power of 2)*D_{q1m0}*S0 +... + (Power of 2)*D_{q1mn}*Sn+
(Power of 2)*R1+...+(Power of 2)*Ro+(Power of 2)*Y1....+ (Power of 2)Yp.

RE_{1m}=(Power of 2)*D_{r1m0}*S0+...+(Power of 2)*D_{r1mn}*Sn+(Power
of 2)*R1+...+(Power of 2)*Ro+(Power of 2)*Y1.... +(Power of 2)*Yp;

where R1 to Ro are recursive subexpressions and Y1 to Yp are vertical subexpressions,

Y1 = D_{qmm1}*S1, Yn=D_{qmmn}*S_{n}.

Optimization 3
The input coefficient sets are subjected to horizontal
factorization here and the optimization is hereafter also referred to as horizontal
optimization.

Expanding Cost I s in QO_{nm} to RE_{nm}(nth
quotient and remainder expressions), and in the vertical subexpressions Y1 to Yp
to (2^z+1) or (2^z-1), where 'z' is any positive number. Assuming that cost 1's
in the expressions QO_{1m} to RE_{1m} have been expanded to D_{q1m0}=
(2^A0) + (2^B0), D_{q1mn}= (2^An) + (2^Bn), D _{rm10}= (2^A0)+ (2^B1),D_{r1mn}
= (2^An)+ (2^Bk). Substituting the expanded cost 1's in QO_{nm} and RE_{nm}
results in the following subexpression.

QO_{1m} = (2^A0)*S0+(2^B0)*S0.....(2^An)*Sn+(2^Bn)*Sn+(Power of 2)*R1+...
+(Power of 2)*Ro+(Power of 2)*Y1....+ (Power of 2)*Yp .. RE_{1m} = (2^A0)*S0+(2^B1)*S0....(2^An)*Sn+(2^Bk)*Sn+
(Power of 2)*R1+... +(Power of 2)*Ro+(Power of 2)*Y1....+ (Power of 2)Yp; where
(2^A0),(2^B0),(2^An),(2^Bn),(2^B1),(2^Bk) are powers of 2. The expressions QO_{1m}
to RE_{1m} are represented in Table B as follows.

Assuming that the expression ((2^A0)*S0+(2^An)*Sn)) row
1 of Table B is equal to the expression ((2^A0)*S0+(2^An)*Sn)) in row 2. Thus X1
= ((2^A0)*S0+(2^An)*Sn) is the derived horizontal sub expression.

Thus, the resultant horizontal sub expressions can be derived
as follows:
$$\mathrm{Y}={\mathrm{cf}}_{11}\left\{{\mathrm{cf}}_{12}\left\{\dots \left\{{\mathrm{cf}}_{1\mathrm{m}}\left\{{\mathrm{QO}}_{1\mathrm{m}}\right\}+{\mathrm{RE}}_{1\mathrm{m}}\right\}\right\}\right\}+{\mathrm{cf}}_{21}\left\{{\mathrm{cf}}_{22}\left\{\dots \left\{{\mathrm{cf}}_{2\mathrm{m}}\left\{{\mathrm{QO}}_{2\mathrm{m}}\right\}+{\mathrm{RE}}_{2\mathrm{m}}\right\}\right\}\right\}+{\mathrm{cf}}_{31}\left\{\left\{{\mathrm{cf}}_{32}\dots \right\}\dots \right\}+\mathrm{RE}\left(\mathrm{Similar\; type\; of\; equation\; as\; QO}\right),$$

where QO_{1m} = (Power of 2)*S0 +... (Power of 2)*Sn+(Power of 2)*R1+...(Power
of 2)*Ro+ (Power of 2)*Y1+...+(Power of 2)*Yp+(Power of 2)*X1+...+(Power of 2)*Xq,
RE_{1m} = (Power of 2)*S0 +... (Power of 2)*Sn+(Power of 2)*R1+...(Power
of 2)*Ro+ (Power of 2)*Y1+...+(Power of 2)*Yp+(Power of 2)*X1+...+(Power of 2)*Xq,
QO_{2m} = (Power of 2)*S0 +... (Power of 2)*Sn+(Power of 2)*R1+...(Power
of 2)*Ro+ (Power of 2)*Y1+...+(Power of 2)*Yp+(Power of 2)*X1+...+(Power of 2)*Xq,
RE_{2m}=(Power of 2)*S0 +... (Power of 2)*Sn+(Power of 2)*R1+...(Power of
2)*Ro+ (Power of 2)*Y1+...+(Power of 2)*Yp+(Power of 2)*X1+...+(Power of 2)*Xq,
where horizontal subexpressions X1 = (power of 2)*S0 +... (Power of 2)*Sn+(Power
of 2)*R1+...(Power of 2)*Ro+(Power of 2)*Y1+...+(Power of 2)*Yp,

Xq = (power of 2)*S0 +... (Power of 2)*Sn+(Power of 2)*R1+...(Power of

2)*Ro+(Power of 2)*Y1+...+(Power of 2)*Yp,

where R1 to Ro are recursive subexpressions, Y1 to Yp are vertical subexpressions
and X1 to Xq are horizontal subexpressions. Thus, the coefficient equation 'Y' is
implemented with expansion of common factors as (2^z+1) or (2^z-1) where z is a
positive number.

The method of the instant invention is hereby described
with reference to the structural implementation of the optimized sum-of-products
expression, the embodiment herein described pertaining to direct and transposed
form of digital filter realization.

Mapping of the generalized subexpressions and final expression
into a generalized hardware structure:

The generalized recursive subexpressions can be written
as;
$$\begin{array}{l}{\mathrm{R}}_{1}=\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{S}}_{0}+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{S}}_{\mathrm{n}},\\ \dots \\ {\mathrm{R}}_{\mathrm{o}}=\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{S}}_{0}+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast {\mathrm{S}}_{\mathrm{n}},\end{array}$$

where, Power of 2 is an integral power of 2.

Mapping generalized recursive subexpression into hardware
can be illustrated with the help of an example as follows:

Assuming that the recursive subexpression is represented
by the equation; R1 = S0 + S 1 +4*S5+ 4*S8 and the optimized subexpression be R1
= S0 +S1+4*(S5+S8).

The area efficient way, of implementing the subexpression
R1 is, implementing S0 + S1 and adding 4*(S5+S8) to it. (The area efficiency here
is in having incremental growth of precision at different levels. Any conventional
algorithm resulting in incremental growth of precision of adders at different levels
can be used to implement the subexpressions. Further, since the invention mainly
deals with reducing the number of adders not the precision of the adder's, the method
to reduce the precision of adders is not discussed in the document). This resultant
structure in figure 3(a2)-1 is called convergent set of 2SAD (2-input Shifter and
Adder).

The input set to the convergent structure in Figure (3a2)-1 is {S0, S1, S5,S8}.

As illustrated in figure 3(a2)-1, complete input set S0 to Sn is not fed to the
2SAD block, as subexpression R1 does not have all the inputs. Further the shifter
structure in Figure (3a2)-1 would depend on the powers of 2 applied to the coefficients
in the subexpression R1.

The convergent set of 2SAD structure can be extended to any subexpression with (power
of 2) coefficients for S0 to Sn. The subexpressions R1 to Ro map on to the converging
substructure SS [R_{1}] to SS [R_{o}] in hardware as shown in Figure
3(a3).

In the Vertical subexpression Y1 = D_{qmm1}*S1,
Yp=D_{qpmn} * S_{n. ,} D_{qmm1}, D_{qpmn} are the
numbers that can be expanded as 2^z + 1 or 2^z -1, where z is a positive number.

Mapping the generalized vertical subexpression into hardware
can be illustrated with the help of an example as follows:

Assuming that the vertical subexpression is represented
by the equation Y1 = 33*S1 = 32*S1 + S1. This can be implemented as 1SAD (1 input
shift and add) as shown in Figure (3a2-2).

Since the Vertical Subexpressions in would result in the substructures as shown
in Figure (3a2-2), it is therefore observed that the Y1 to Yp would form sub structures
; SS[Y1] ... SS[Yp] as illustrated in Figure 3(a3).

The left shifts in 1 SAD would depend on the powers of two assigned to the coefficients
D_{qmm1, ...} D_{qpmn}.

The horizontal sub expressions for the instant predefined
coefficient set inputs are given as below:
$$\begin{array}{l}\mathrm{X}1=\left(\mathrm{power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{S}0+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{S}\mathrm{n}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}1+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}\mathrm{o}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{Y}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{Y}\mathrm{p},\\ \dots \\ \mathrm{X}\mathrm{q}=\left(\mathrm{power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{S}0+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{S}\mathrm{n}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}1+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}\mathrm{o}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{Y}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{Y}\mathrm{p}\end{array}$$

The above subexpressions result in the similar convergent 2SAD substructure.

The substructures SS[X1] to SS[Xq] are illustrated in Figure 3(a3).

The quotient and remainder sub expressions QO_{1m} ... RE_{2m} ...
given below:
$${\mathrm{QO}}_{1\mathrm{m}}=\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{S}0+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{S}\mathrm{n}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}1+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}\mathrm{o}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{Y}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{Y}\mathrm{p}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{X}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{X}\mathrm{q},$$
$${\mathrm{RE}}_{1\mathrm{m}}=\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{S}0+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{S}\mathrm{n}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}1+\dots \left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{R}\mathrm{o}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{Y}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{Y}\mathrm{p}+\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{X}1+\dots +\left(\mathrm{Power\; of}\mathrm{\hspace{0.17em}}2\right)\ast \mathrm{X}\mathrm{q},$$

The resultant structure for the above steted sub expressions is analogous to the
convergent 2SAD substructures.

The substructure is further illustrated as SS [QO1m]...SS[RE1m]... in Figure 3(a3).

The final equation of the coefficient bank combining all
the quotient, and remainder expressions can be written as
$$\mathrm{Y}={\mathrm{cf}}_{11}\left\{{\mathrm{cf}}_{12}\left\{\dots \left\{{\mathrm{cf}}_{1\mathrm{m}}\left\{{\mathrm{QO}}_{1\mathrm{m}}\right\}+{\mathrm{RE}}_{1\mathrm{m}}\right\}\right\}\right\}+{\mathrm{cf}}_{21}\left\{{\mathrm{cf}}_{22}\left\{\dots \left\{{\mathrm{cf}}_{2\mathrm{m}}\left\{{\mathrm{QO}}_{2\mathrm{m}}\right\}+{\mathrm{RE}}_{2\mathrm{m}}\right\}\right\}\right\}+{\mathrm{cf}}_{31}\left\{\left\{{\mathrm{cf}}_{32}\dots \right\}\dots \right\}+\mathrm{RE}\left(\mathrm{Similar\; type\; of\; equation\; as\; QO}\right),$$

Expanding the common factor's cf_{11} ... as 2^z+1 or 2^z-1 would result
in another expression with coefficients as power of 2's which will result in convergent
set of 2SAD structure with input set{ QO_{1m} , RE_{1m} ...} .This
structure is shown as SS[Y] in Figure 3(a3).

Generalized Structure depicting the minimal area integrated
circuit implementation of the sum of products expression:

The generalized structure is shown in figure 3(a3) to map
results of the algorithm. The generalized structure contains taps S0 to Sn from
plurality of unit sample delays(11) connected to a group of sub structures SS[R1]
to SS[Ro] with inputs S0 to Sn each generating plurality of signals R1 to Ro. The
sub structures SS[Y1] to SS[Yp] with inputs S0 to Sn generate plurality of signals
Y1 to Yp, where p is any positive number; sub-structures SS[X1] to SS[Xq] with inputs
S0 to Sn,R1 to Ro and Y1 to Yp each generating signals X1 to Xq, where q is any
positive number, sub-structures SS[QO1m]...SS[RE1m] ... with inputs S0 to Sn,R1
to Ro,Y1 to Yp and X1 to Xq generating signals QO1m... RE1m etc., and substructure
SS[Y] formed from inputs QO1m... to RE1m... generating output Y, these sub-structures
contains plurality of adders (13, 15) and shifter 14,these sub-structures connected
together form coefficient bank of direct form of FIR filter or IIR filter.

The invented method for direct form implementation of the
coefficient bank can be used to implement single number also. This is done by applying
a single number at the input of the algorithm in Figure 3(a1). Single number given
to the algorithm would result in optimized shift and add hardware structure of the
number.

The method of the instant invention is hereby described
with reference to the given (Optimization 1) coefficient set as input.

__Direct form of Coefficient Bank implementation__: Decompose high cost terms
into some multiples of cost1's and power of 2's. The algorithm has three optimizations.

The three optimizations are explained with the help of
an example as follows:

__Optimization 1__: This is a recursive optimization. By this optimization an
individual coefficient 'coeff1' will be decomposed to:
$$\mathrm{coeff}1=\mathrm{cf}\left(\mathrm{qo}\right)+\mathrm{re}$$
or
$$\mathrm{coeff}1=\mathrm{cf}\left(\mathrm{qo}\right)-\mathrm{re};$$

where cf = common factor, qo = quotient, re = remainder.

Further, the coefficient set called 'coeffset' will be
decomposed to coeffset = cf (QO)+RE.

Note: The negative coefficients are not treated separately,
negative coefficient are implemented as positive coefficients with a subtractor
at the end of sub-structure while adding to other sub-structures.

The SOP realization for the example coefficient set are
described with reference to the steps involved in the method as illustrated in Figure
3(a1).

__Steps involved in Optimization 1:__
Step 1: Decompose the coefficient set into odd fundamentals
and multiplication factor set:

{coefficient set} = {odd fundamental set} * {multiplication factor set},

(Each coefficient is divided with 2 till it is odd. If it is of odd magnitude, then
it is moved to odd fundamental set. The number of 2's with which each coefficient
is divided is moved into multiplication factor set along with the sign of the coefficient)

For the given set of coefficients; the coefficient set is decomposed to odd fundamental
set and multiplication factor set as follows:

{-79,1044,-5890,27916,49362,-8382,1628,-154,63,126}= {79,261,2945,6979,
24681, 4191,407, 77,63,63} * {-1,4,-2,4,2,-2,4,-2,1,2}; where the Coefficient set={-79,1044,-5890,27916,49362,-8382,1628,-154,
63,126},

Odd fundamental set = {79,261,2945,6979, 24681, 4191, 407,
77,63,63}, individual elements in the odd fundamental set are called odd fundamentals.
Multiplication factor set = {-1,4,-2,4,2,-2,4,-2,1,2}

Step 2: Searching for repeated odd fundamentals other than
0 &1 and removing them from the odd fundamental set and form recursive sub expressions.
In the example odd fundamental set "63" is repeated odd fundamental, and the example
odd fundamental set changes as:
Two 0's have come in place of S8,S9 and one 63 has come in position of S10. The
recursive sub expression thus generated is R_{1} = S8 + 2S9.

Step 3: Checking the odd fundamental set has only 0's,
1's and cost1's and exiting from the optimization 1 with recursive sub-expressions
and equation of direct form of coefficient bank, else proceed to step 4 for optimizing
the numbers greater than the cost 1 numbers.

Step 4: From the odd fundamental set compute the "prec",
precision of maximum odd fundamental (max_num).
$$\mathrm{prec}=\mathrm{ceil}\left(\mathrm{log}2\mathrm{max\_num}\right)$$

Form the common factor set, which contains only cost1's,
cost1's are formed by (2^z+1) or (2^z- 1), where z varies from '1' to 'prec'.

For the example 'max_num' =24681, 'prec' = ceil (log2 (abs
(24681))) = 15 and the common factor set is {3,5,7,9...32769} formed by 2^z+1 or
2^z -1, z varying from '1' to '15'.

Step 5: Calculate the total weight of the numbers in odd
fundamental set wherein, the Total weight of numbers in odd fundamental set = weight
of each number in odd fundamental set.

The weight of each odd fundamental in example odd fundamental
set is in Table 3 as given below:
<u>Table 3</u>
Number
Weight

79
100

261
100

2945
100

6979
100

24681
100

4191
100

407
100

77
100

63
2

Total weight of numbers in odd fundamental set for the
example = 802,on similar lines weight of any number set can be calculated.

Step 6: For each common factor in the common factor set
decompose the odd fundamental set into a common factor , quotient and remainder.
$$\left\{\mathrm{Odd\; fundamental\; set}\right\}=\mathrm{cf}\ast \left\{\mathrm{QO}\right\}+\left\{\mathrm{RE}\right\};$$

where 'cf'-common factor, 'QO' is quotient set and 'RE' is remainder set,

total weight of decomposed odd fundamental set after setting common factor equal
to the weight of common factor + weight of quotient set +weight of remainder set.
For each common factor of the example starting from 3 to 32769 calculate total weight.

Choose the best common factor, which gives the lowest possible total weight. The
best common factor for given example is "63" as it reduces the total weight of the
odd fundamental set from "802" to "323", thus the lowest possible weight is achieved.

Odd coefficient set is decomposed into common factor, quotient
set and remainder set as follows:
$$\left\{79,261,2945,6979,24681,4191,407,77,0,0,63\right\}=63\left\{1,4,47,111,391,66,7,1,0,0,1\right\}+\left\{16,9,-16,-14,48,33,-34,14,0,0,0\right\}$$

where common factor = 63,

quotient set = {1,4,47,111,391,66,7,1,0,0,1}

remainder set= {16,9,-16,-14,48,33,-34,14,0,0,0}

Step 7: Multiply the QO and RE with multiplication factor
set and treat the resultant new_QO and new_RE as new coefficient sets.

Go to step1 with new coefficient sets 'new_QO' & 'new_RE'.

For the given example 'new_QO' = {1,4,47,111,391,66,7,1,0,0,1} * {-1,4,-2,4,2,-2,4,-2,0,0,1}
= {-1,16,-94,444,782,-132,28,-2,0,0,1}---- this number set passes through all the
steps in optimization 1 again and stops with '7' as common factor. Another number
set 'new_RE' = {16,9,-16,-14,48,33,-34,14,0,0,0}*{-1,4,-2,4,2,-2,4,-2,0,0,1} = {-16,36,32,-56,96,-66,-136,-28,0,0,0}
----- this contains all cost1's and pow_2's and stops at step 3 while passing through
the steps of optimization 1.

After the optimization 1 the coefficient equation of the
example number set for direct form implementation of coefficient bank is in equation
(3a).
$$\mathrm{Y}=\left[-16\right]\ast \mathrm{S}0+36\ast \mathrm{S}1+32\ast \mathrm{S}2+96\ast \mathrm{S}4+\left[-66\right]\ast \mathrm{S}5+\left[-136\right]\ast \mathrm{S}6+\left[-28\right]\ast \mathrm{R}2+63\left\{\left[-1\right]\ast \mathrm{S}0+16\ast \mathrm{S}1+4\ast \mathrm{S}2+\left[-4\right]\ast \mathrm{S}3+\left[-2\right]\ast \mathrm{S}4+\left[-132\right]\ast \mathrm{S}5+\left[-2\right]\ast \mathrm{S}7+1\ast \mathrm{R}1+7\left\{64\ast \mathrm{S}3+4\ast \mathrm{S}6+\left[-14\right]\ast \mathrm{R}3\right\}\right\}$$

where recursive sub expressions for the given equation are:

R1=1*S8+2*S9, R2=2*S3+1*S7, R3=1*S2+[-8]*S4.

The number of adders obtained after optimization 1 in the above equation are 29.

Coefficient equation (3a) can also be written as equation (3b).
$$\mathrm{Y}=\mathrm{RE}1+63\left(\mathrm{RE}2+7\left(\mathrm{QO}2\right)\right)$$

where expressions,
$$\mathrm{RE}1=\left[-16\right]\ast \mathrm{S}0+36\ast \mathrm{S}1+32\ast \mathrm{S}2+96\ast \mathrm{S}4+\left[-66\right]\ast \mathrm{S}5+\left[-136\right]\ast \mathrm{S}6+\left[-28\right]\ast \mathrm{R}2$$
$$\mathrm{RE}2=\left[-1\right]\ast \mathrm{S}0+16\ast \mathrm{S}1+4\ast \mathrm{S}2+\left[-4\right]\ast \mathrm{S}3+\left[-2\right]\ast \mathrm{S}4+\left[-132\right]\ast \mathrm{S}5+\left[-2\right]\ast \mathrm{S}7+1\ast \mathrm{R}1$$
$$\mathrm{QO}2=64\ast \mathrm{S}3+4\ast \mathrm{S}6+\left[-14\right]\ast \mathrm{R}3$$
and recursive sub expression in expressions (3c1,3c2,3c3) are R1=1*S8+2*S9, R2=2*S3+1*S7,
R3=1*S2+[-8]*S4.

The resultant structure after the optimization 1 from equation
(3b) is shown in Figure 3(b1). The input X, of the filter is given to plurality
of unit sample delays (11) resulting in signals S0 to S9. These signals are given
to a group of sub-structures that are defined for the following input/ output parameters:

- (a) Sub-structure SS [R1] with inputs S8,S9 results in signal R1
- (b) Sub-structure SS[R2] with inputs S7, S3 results in signal R2
- (c) Sub-structure SS[R3] with inputs S2,S4 results in signal R3,
- (d) Sub-structure SS[RE1] with inputs S0,S2, S4,R2,S1,S6,S5, results in signal
RE1,
- (e) Sub-structure SS[RE2] with inputs S0,S4,S7,S2,S3,R1,S1,S5 results in signal
RE2,
- (f) Sub-structure SS[Q02] with inputs S3,S6,R3 results in signal Q02;
- (g) Sub-structure SS[Y] with inputs,RE1, RE2,QO2 results in output Y of the
direct form of FIR filter, the group of substructures formed from plurality of adders(13,15)
and shifters(14) is called direct form of coefficient bank(1).

Note: After the end of optimization 1 the remainder and
quotient expressions contain only 0's, power of 2's,cost1's.

__Optimization 2 (vertical optimization):__
The table formed from quotient expression, and remainder
expressions is scanned column wise for possibility of taking cost1 coefficients
as common sub expression. This would further reduce the total weight of coefficient
equation. This optimization is explained with quotient and remainder expressions
of an example as follows:

The coefficient equation of the example in consideration
after optimization 1 is Equation (3a) as given below:
$$\mathrm{Y}=\left[-16\right]\ast \mathrm{S}0+36\ast \mathrm{S}1+32\ast \mathrm{S}2+96\ast \mathrm{S}4+\left[-66\right]\ast \mathrm{S}5+\left[-136\right]\ast \mathrm{S}6+\left[-28\right]\ast \mathrm{R}2+63\left\{\left[-1\right]\ast \mathrm{S}0+16\ast \mathrm{S}1+4\ast \mathrm{S}2+\left[-4\right]\ast \mathrm{S}3+2\ast \mathrm{S}4+\left[-132\right]\ast \mathrm{S}5+\left[-2\right]\ast \mathrm{S}7+1\ast \mathrm{R}1+7\left\{64\ast \mathrm{S}3+4\ast \mathrm{S}6+\left[-14\right]\ast \mathrm{R}3\right\}\right\}$$

where recursive sub expressions are R1=1*S8+2*S9, R2=2*S3+1*S7, R3=1*S2+[-8]*S4,

Step 8: separate the remainder and quotient expressions
in the coefficient equation.

Equation (3a) can also be written as
$$\mathrm{Y}=\mathrm{RE}1+63\left(\mathrm{RE}2+7\left(\mathrm{QO}2\right)\right)$$

where quotient(QO2) and remainder expressions(RE1,RE2) are
$$\mathrm{RE}1=\left[-16\right]\ast \mathrm{S}0+36\ast \mathrm{S}1+32\ast \mathrm{S}2+96\ast \mathrm{S}4+\left[-66\right]\ast \mathrm{S}5+\left[-136\right]\ast \mathrm{S}6+\left[-28\right]\ast \mathrm{R}2$$
$$\mathrm{RE}2=\left[-1\right]\ast \mathrm{S}0+16\ast \mathrm{S}1+4\ast \mathrm{S}2+\left[-4\right]\ast \mathrm{S}3+\left[-2\right]\ast \mathrm{S}4+\left[-132\right]\ast \mathrm{S}5+\left[-2\right]\ast \mathrm{S}7+1\ast \mathrm{R}1$$
$$\mathrm{QO}2=64\ast \mathrm{S}3+4\ast \mathrm{S}6+\left[-14\right]\ast \mathrm{R}3$$

Step 9:Find the vertical sub expressions in table formed
from separated quotient and remainder expressions by traversing the table column
wise, looking for common odd fundamental cost Is.Forming a table from quotient (3c3)
and remainder (3c1, 3c2) expressions for the given example. By writing coefficients
in each equation column wise under respective signal into a table as shown in Table4.

In Table 4, column 7 has sub expression -2*33 and sub expression
-4*33, the odd fundamental for both these sub expressions is 33. Hence instead of
calculating 33 twice we can calculate only once by taking 33*S5 as vertical sub
expression. This is called vertical sub expression as we are taking common odd fundamental
column wise in the table.

The example coefficient equation (3b) after substituting
the vertical sub expression results in equation 3(f):
$$\mathrm{Y}=\left[-16\right]\ast \mathrm{S}0+36\ast \mathrm{S}1+32\ast \mathrm{S}2+96\ast \mathrm{S}4+\left[-2\right]\ast \mathrm{Y}1+\left[-136\right]\ast \mathrm{S}6+\left[-28\right]\ast \mathrm{R}2+63\left\{\left[-1\right]\ast \mathrm{S}0+16\ast \mathrm{S}1+4\ast \mathrm{S}2+\left[-4\right]\ast \mathrm{S}3+\left[-2\right]\ast \mathrm{S}4+\left[-4\right]\ast \mathrm{Y}1+\left[-2\right]\ast \mathrm{S}7+1\ast \mathrm{R}1+7\left\{64\ast \mathrm{S}3+4\ast \mathrm{S}6+\left[-14\right]\ast \mathrm{R}3\right\}\right\}$$

where Y1 = 33*S5,

The equation (3f) can be rewritten as follows:
$$\mathrm{Y}=\mathrm{RE}1+63\left(\mathrm{RE}2+7\left(\mathrm{QO}2\right)\right)$$

where RE1= [-16]*S0+36*S1+32*S2+96*S4+[-2]*Y1+[-136]*S6+[-28]*R2

RE2 = [-1]*S0+16*S1+4*S2+[-4]*S3+[-2]*S4+[-4]*Y1+[-2]*S7+1*R1

Q02 = 64*S3+4*S6+[-14]*R3,

It is observed that the number of adders in coefficient
equation (3f) after optimizations 2 are 28.

The structure obtained after the optimization 2 (equation
(3g)) is illustrated in Figure 3(b2). The input X of the filter is given to plurality
of unit sample delays (11) resulting in signals S0 to S9, these signals are given
to a group of sub-structures that operate for the following input-output parameters:

- (a)Sub-structure SS [R1] with inputs S8,S9 results in signal R1,
- (b)Sub-structure SS[R2] with inputs S7, S3 results in signal R2,
- (c) Sub-structure SS[R3] with inputs S2,S4 results in signal R3,
- (d) Sub-structure SS[Y1] with inputs S5 results in signal Y1,
- (e) substructure SS[RE1] with inputs S0,S2, S4,R2,S1,S6,Y1 results in signal
RE1,
- (f) ub-structure SS[RE2] with inputs S0,S4,S7,S2,S3,R1,S1,Y1 results in signal
RE2, sub-structure SS[Q02] with inputs S3,S6,R3 results in signal Q02;
- (g)sub-structure SS[Y] with inputs RE1,RE2,QO2 results in output Y of the direct
form of FIR filter, the group of substructures formed from plurality of adders(13,15)
and shifters(14) is called direct form of coefficient bank(1).

__Optimization 3(horizontal optimization):__
Step 10: Expand cost 1s coefficients in quotient, remainder
expressions and vertical sub expression as (2^z + 1) or (2^z -1) where 'z' is a
positive number.

The resultant coefficient equation for the example is as given below:
$$\mathrm{Y}=\left\{32\ast \mathrm{S}1+4\ast \mathrm{S}1+32\ast \mathrm{S}2+128\ast \mathrm{S}4+\left[-128\right]\ast \mathrm{S}6+\left[-8\right]\ast \mathrm{S}6+\left[-32\right]\ast \mathrm{R}2+4\ast \mathrm{R}2+\left[-2\right]\ast \mathrm{Y}1+\left[-16\right]\ast \mathrm{S}0+\left[-32\right]\ast \mathrm{S}4+63\left\{16\ast \mathrm{S}1+4\ast \mathrm{S}2+\left[-4\right]\ast \mathrm{S}3+\left[-2\right]\ast \mathrm{S}7+\left[-4\right]\ast \mathrm{Y}1+\left[-1\right]\ast \mathrm{S}0+\left[-2\right]\ast \mathrm{S}4+\mathrm{R}1+7\left\{64\ast \mathrm{S}3+4\ast \mathrm{S}6+\left[-16\right]\ast \mathrm{R}3+2\ast \mathrm{R}3\right\}\right\}\right\}$$
or
$$\mathrm{Y}=\mathrm{RE}1+63\left(\mathrm{RE}2+7\left(\mathrm{QO}2\right)\right)$$

where expression RE1 = 32*S1+4*S1+32*S2+128*S4+[-128]*S6+[-8]*S6+ [-32]*R2+4*R2+[-
2]*Y1+[-16]*S0+[-32]*S4,

RE2 = 16*S1+4*S2+[-4]*S3+[-2]*S7+[-4]*Y1+[-1] S0+[-2]*S4 +R1

Q02 = 64*53+4*S6+[-16]*R3+2*R3.

The resultant structure after the optimization 2 from equation
(3i) is shown in Figure 3(b3). The input X of the filter is given to plurality of
unit sample delays (11) resulting in signals S0 to S9, these signals are given to
a group of sub-structures, sub-structure SS [R1] with inputs S8, S9 results in signal
R1, sub-structure SS [R2] with inputs S7, S3 results in signal R2, sub-structure
SS [R3] with inputs S2, S4 results in signal R3, sub-structure SS [Y1] with inputs
S5 results in signal Y1, substructure SS[RE1] with inputs S0,S2, S4,R2,S1,S6,Y1
results in signal RE1,sub-structure SS[RE2] with inputs S0,S4,S7,S2,S3,R1,S1,Y1
results in signal RE2, sub-structure SS[Q02] with inputs S3,S6,R3 results in signal
Q02;sub-structure SS[Y] with inputs RE1,RE2,QO2 results in output Y of the direct
form of FIR filter, the group of substructures formed from plurality of adders(13,15)
and shifters(14) is called direct form of coefficient bank(1).

Step 11: Find the optimal horizontal sub expressions in
table formed from separated quotient and remainder expressions by traversing table
row wise and looking for common horizontal sub expressions, there is possibility
of overlap in the horizontal sub expressions choose the best horizontal sub expressions
which reduces the total weight of the coefficient equation to the lowest as possible.

The quotient expressions (Q02) and remainder expression
(RE1, RE2) as given in equation (3i) in table form are in Table5.

Table 5: Expression RE1, RE2 & Q02 after step 10 expressed
in the tabular form as follows:

In table 5 the sub expression -16*S0-32*S4 of row2 and
sub expression -S0-2*S4 of row3 can be treated as single sub expression 'S0+2*S4'.
This is called horizontal sub expressions as we are selecting common sub expression
across the rows of the table. There are no overlaps for the given table.

Resultant coefficient equation obtained after optimization
3 for the example is:
$$\mathrm{Y}=\left\{32\ast \mathrm{S}1+4\ast \mathrm{S}1+32\ast \mathrm{S}2+128\ast \mathrm{S}4+\left[-128\right]\ast \mathrm{S}6+\left[-8\right]\ast \mathrm{S}6+\left[-32\right]\ast \mathrm{R}2+4\ast \mathrm{R}2+\left[-2\right]\ast \mathrm{Y}1+\left[-16\right]\ast \mathrm{X}1+63\left\{\left[-1\right]\ast \mathrm{X}1+\left[-2\right]\ast \mathrm{S}7+4\ast \mathrm{S}2+\left[-4\right]\ast \mathrm{S}3+16\ast \mathrm{S}1+\mathrm{R}1+\left[-4\right]\ast \mathrm{Y}1+7\left\{4\ast \mathrm{S}6+2\ast \mathrm{R}3+64\ast \mathrm{S}3+\left[-16\right]\ast \mathrm{R}3\right\}\right\}\right\}\mathrm{X}$$
or
$$\mathrm{Y}=\mathrm{RE}1+63\left(\mathrm{RE}2+8\mathrm{QO}2-\mathrm{QO}2\right)$$

where expressions RE1=32*S1+4*S1+32*S2+128*S4+[-128]*S6+[-8]*S6+ [-32]*R2+4*R2+[-2]*Y1+[-16]*X1

RE2 = [-1]*X1+[-2]*S7+4*S2+[-4]*S3+16*S1+R1+[-4]*Y1
$$\mathrm{QO}2=4\ast \mathrm{S}6+2\ast \mathrm{R}3+64\ast \mathrm{S}3+\left[-16\right]\ast \mathrm{R}3$$

Equation (3h) can further be simplified and written as:
$$\mathrm{Y}=\mathrm{RE}1+64\left(\mathrm{RE}2+8\mathrm{QO}2-\mathrm{QO}2\right)-\left(\mathrm{RE}2+8\mathrm{QO}2-\mathrm{QO}2\right)$$

The number of adders after all the three optimizations
are 27.

The resultant structure after the three optimizations(from
equation (3i)) is shown in Figure 3(b4). The input X of the filter is given to plurality
of unit sample delays (11) resulting in signals S0 to S9, these signals are given
to a group of sub-structures, sub-structure SS[R1] with inputs S8,S9 results in
signal R1,sub-structure SS[R2] with inputs S7, S3 results in signal R2,sub-structure
SS[R3] with inputs S2,S4 results in signal R3,sub-structure SS[Y1] with inputs S5
results in signal Y1,substructure SS[X1] with inputs S0,S4 results in signal X1,sub-structure
SS[RE1] with inputs X1,S2,R2,S1,S6,Y1 results in signal RE1,sub-structure SS[RE2]
with inputs X1,S7,S2,S3,R1,S1,Y1 results in signal RE2, sub-structure SS[Q02] with
inputs S3,S6,R3 results in signal Q02;sub-structure SS[Y] with inputs RE1,RE2,QO2
results in output Y of the direct form of FIR filter, the group of substructures
formed from plurality of adders(13,15) and shifters(14) is called direct form of
coefficient bank(1).

__Generalized Structure depicting the minimal area integrated circuit implementation
of the sum of products expression:__
The generalized structure is shown in figure 3(a3) to map
results of the algorithm, the generalized structure contains taps S0 to Sn from
plurality of unit sample delays(11) connected to a group of sub structures SS[R1]
to SS[Ro] with inputs S0 to Sn each generating plurality of signals R1 to Ro. The
sub structures SS[Y1] to SS[Yp] with inputs S0 to Sn generate plurality of signals
Y1 to Yp, where p is any positive number; sub-structures SS[X1] to SS[Xq] with inputs
S0 to Sn,R1 to Ro and Y1 to Yp each generating signals X1 to Xq, where q is any
positive number, sub-structures SS[QO1m]...SS[RE1m...] with inputs S0 to Sn,R1 to
Ro,Y1 to Yp and X1 to Xq generating signals QO1m... RE1m etc., and substructure
SS[Y] formed from inputs QO1m to RE1m generating output Y, these sub-structures
contains plurality of adders (13, 15) and shifter 14,these sub-structures connected
together form a direct form of FIR filter or IIR filter.

The invented method for direct form implementation of the
coefficient bank can be used to implement single number also. This is done by giving
a single number at the input of the algorithm in Figure 3(a1). Single number given
to the algorithm would result in a optimized shift and add hardware structure of
the number.

Method for implementation of sum of products expression
to thereby implement the transposed form of coefficient bank: (An embodiment of
the instant invention)

The algorithm involved in reduction of number of adders
required to implement the coefficient set is shown in Figure 3(c1). This algorithm
is explained with the help of generalized equations in Figure 3(c2)(extended to
2 pages). Part 2 of the method is a generalized structure to map the results obtained
from the algorithm as shown in Figure 3(c3).

The equations of the individual coefficients in transposed form of coefficient bank
for the example are:

S1 = 1044X, S2 = -5890X, S3 = 27916X, S4 = 49362X, S5 = -8382X, S6 = 1628X,S7 =
-154X,S8 = 63X,S9 = 126X

The following discussion is about the optimization of the coefficients for transposed
form of coefficient bank for the example using the algorithm in Figure 3(c1).

Step 1: Decompose the coefficient set into odd fundamentals
and multiplication factor set:
$$\left\{\mathrm{coefficient\; set}\right\}=\left\{\mathrm{odd\; fundamental\; set}\right\}\ast \left\{\mathrm{multiplication\; factor\; set}\right\}$$

(Each coefficient is divided with 2 till it is odd, when
it is odd, magnitude of odd number is moved to odd fundamental set. The number of
2's with which each coefficient is divided is moved into multiplication factor set
along with the sign of the coefficient)

Example coefficient set decomposes as follows:

{-79,1044,-5890,27916,49362,-8382,1628,-154,63,126}= {79,261,2945,6979,24681, 4191,
407, 77,63,63} * {-1,4,-2,4,2,-2,4,-2,1,2},

where {coefficient set} = {-79,1044,-5890,27916,49362,-8382,1628,-154, 63,126},
{odd fundamental set} = {79,261,2945,6979, 24681, 4191,407, 77,63,63} {multiplication
factor set} = {-1,4,-2,4,2,-2,4,-2,1,2}.

Step 2: Check if all the elements of the odd fundamental
set has all cost 1s and 1s as odd fundamentals. If yes implement cost 1 odd fundamental
as 2^z+1 or 2^z-1, where z = ceil (log2 (odd fundamental)), and use cost1 to implement
coefficients and 'exit', else proceed to step 3. In the example odd fundamental
set has only one cost 1 coefficient (63), and rest all are higher cost numbers,
hence proceed to step 3.

Step 3: Check if odd fundamental set has cost 1 odd fundamentals.
If yes proceed to step 4,else proceed to step 5.In the example odd fundamental set
has '63' which is the cost1 odd fundamental. Hence proceed to step 4.

__Note:__ Cost1 odd fundamental is computed as follows:

prec_of_number = ceil(log2(odd fundamental)).

If at any stage, odd fundamental = 2^z+1 or 2^z-1,z varying from 1 to prec_of_number,
then that odd fundamental is cost 1.

Step 4:Decompose 'odd fundamental set' into 'graph set'
and 'incomplete set', ('graph set' consists of cost 1 odd fundamentals from the
odd fundamental set, these cost1 odd fundamentals are implemented in a graph as
2^z+1 or 2^z-1, where z = ceil (log2 (odd fundamental))), ('incomplete set' consists
of un implemented odd fundamentals).

For the given example the odd fundamental set {79,261,2945,6979, 24681, 4191, 407,
77,63,63} is decomposed to {63} and {79,261,2945,6979,24681,4191,407,77};

where graph set = {63}, 63 is implemented in graph as (2^6)-1. The signals S8 and
S9 are obtained as S8 = ((2^6)-1)X, S9 = (2^1)*((2^6)-1)X, where coefficients 63
=((2^6)-1) & 126 =(2^1)*((2^6)-1), incomplete set = {79,261,2945,6979, 24681, 4191,
407, 77}.

Step 5:Form the 'incomplete set' from the odd fundamentals
greater than 1.When it is found in step 3 that there are no cost1 odd fundamentals,
then there is no 'graph set'. All numbers in the odd fundamental set which are greater
than 1 are moved to 'incomplete set'. In the example under consideration step 5
is not applicable at this stage.

Step 6: Determine the cost and coefficient equation of
numbers in incomplete set using invented algorithm in Figure 3(a1) and make a look
up table with cost and coefficient equation. Use the algorithm in Figure 3(a1) with
each number of incomplete set separately to obtain coefficient equation, 'cost of
implementation of a number in incomplete set' (cost) = cost of implementation of
each number in coefficient equation+number of adders or subtractors in the coefficient
equation.

Incomplete set for the example is {79,261,2945,6979, 24681,
4191, 407, 77}, cost & coefficient equation of each number in incomplete set found
from the algorithm in Figure 3(a1) is in look up table (Table 6).
Table 6: Cost & coefficient equation of each number in incomplete set:
Number
Cost
Coefficient equation

79
2
63+16

261
2
257+4

2945
3
2049+896

6979
4
7[-124+257(16)]

24681
3
1023(24)+127

4191
2
127*33

407
3
3(136)-1

77
3
-12+65(-1)

Look up table is obtained using the algorithm of Figure
3(a1) with each number processed by algorithm separately, for example number 79
when processed by the algorithm of Figure 3(a1) that gives common factor 63 with
remainder as 16.The cost of implementation of 79 is 2. Since 63 is a cost1 number
and there is a '+' to add 63 and 16. Hence it is noted in the look up table 6 shows
that 79 has cost of '2' and coefficient equation of 79 is '63+16'.Similarly for
all the other numbers in incomplete set coefficient equation and cost are noted
in Table 6.

Step7: Check if adder distance* for smallest cost, smallest
magnitude number (found from look up table formed in step 6) in incomplete set is
1. If yes proceed to step 8 else it means adder distance is greater than 1 or it
is not possible to implement the number from graph set and proceed to step 9.

__* Adder distance:__
Adder distance 1 of the selected number from incomplete
set = fundamental coefficients from the existing graph set + a power of 2.

For the example incomplete set = {79,261,2945,6979, 24681,
4191, 407, 77}, the smallest cost, smallest magnitude number in incomplete set from
Table 6 is '79',79 is having adder distance '1' as '79' = 63 (existing fundamental
from the graph set)+16.

Step 8: Use the graph set to implement the selected number
(from step7) in 'incomplete set' and remove the number from 'incomplete set' and
move it to 'graph set'. For the given example the number 79 has adder distance '1'
hence it is implemented in the form of the graph as '63+16' (with '-1' at the end
of the structure as coefficient is -79), and 79 is removed from 'incomplete set'
and is added to 'graph set'. Now graph set = {63,79}, incomplete set = {261,2945,6979,
24681, 4191, 407, 77}, now move to step 10.

Step 9: Use the coefficient equation in look up table obtained
in step6 to implement the number in incomplete set and add the odd fundamentals
in the coefficient equation into the graph set. For the example in step 7 it is
found that the number 79 has adder distance I.If the number to be implemented has
adder distance more than 1 then the implementation of the coefficient '79' can be
searched from the look up table.

Step 10: Check, whether the incomplete set is Null if yes
exit, if not go to the step 7. For the example as the incomplete set {261,2945,6979,
24681, 4191, 407, 77} is not a null set then go to step 7 with the incomplete set.
For the example step 7 selects 261 as next number to be implemented, and adder distance
for the number '261' is neither 1 nor 2. Hence step 9 is invoked and the implementation
in look up table 6 for number 261 i.e., (257+4)(with 4 at the end of the structure
as coefficient is 4*261=1044) is taken, and the odd fundamental 257 and 261 are
added to the graph.

Implementation of remaining numbers in incomplete set of
the example using the step7 to step 10 of algorithm in Figure 3(c1) results in equations
as follows,
$$\mathrm{S}1=1044\mathrm{X}=16+\left(257\right)\ast 4$$
$$\mathrm{S}2=-5890\mathrm{X}=-\left(7\right)\ast 256+\left(-2\right)\ast \left(2049\right)$$
$$\mathrm{S}3=27916\mathrm{X}=\left(7\right)\ast \left(\left(-124\right)+\left(257\right)\ast 16\right)$$
$$\mathrm{S}4=49362\mathrm{X}=\left(258\right)+\left(1023\right)\ast \left(48\right)$$
$$\mathrm{S}5=-8382\mathrm{X}=\left(127\right)\ast \left(-66\right)$$
$$\mathrm{S}6=1628\mathrm{X}=-4+\left(17\right)\left(96\right)$$
$$\mathrm{S}7=-154\mathrm{X}=\left(79-2\right)\ast \left(-2\right)$$

Figure 3(c3) shows the generalized structure for the transposed
form of the coefficient bank, where the structure is having a plurality of sub structures
SS [SO], SS[S1]...SS[Sn]. SS[S0] has X as input and S0 as output, SS[S1] has A and
X as input and S1 as output. The sub-structures are formed from adders (13,15) (
adders or subtractors are referred as adders for the ease of explanation), shifter(14),
to form transposed form of coefficient bank(2). The taps obtained from structure
of the individual coefficient equations are connected to unit sample delays (11)
to form transposed form of FIR / IIR filter.

Number of adders required to implement the graph for the
example in Figure 3(d) are 18.Resultant graph structure for all the coefficients
of the example for the transposed form of the coefficient bank (2) is shown in Figure
3(d). This structure is formed from the plurality of the adders (13,15) and the
shifter (14) connected together depending on coefficients. Part 2 of the method
is the generalized structure formed from the algorithm in part 1,the structure has
plurality of sub-structures SS[S0],SS[S1]... SS[Sn], where SS[S0] has X as input
and S0 as output, SS[S1] has A and X as input and S1 as output etc., the sub-structures
are formed from adders (13,15), shifter 14, to form transposed form of coefficient
bank(2),the taps obtained from structure of the individual coefficient equations
are connected to unit sample delays (11) to form a transposed form of FIR filter
or IIR filter.

Advantages of the invented method:
Number of adders using CSD based method and the invented
system to implement the example coefficient bank are tabulated in Table 7.
Table 7: Number of adders using CSD based method and invented method to implement
the example coefficient bank:
Existing (CSD)
Invented

Direct
34
27

Transposed
20
18

It is evident from the above table that the invented method
has less number of adders compared to existing method in implementation of direct
and transposed form of coefficient bank. Thus, it is observed that the instant invention
provides a minimal area implementation of a sum of products expression.