# Design of MIMO-OFDM systems Physical layer with minimal hardware complexity and Low Power Consumption

M.Saraswathi Assistant Professor Panimalar Engineering College Chennai, India m.saraswathi16@yahoo.com

Abstract— In this paper the design of 128/64 point Fast Fourier transform processor (FFT Processor) is proposed to support future generation Multiple Input Multiple Output Orthogonal Frequency Division Multiplexing (MIMO OFDM) based IEEE 802.11n wireless local area network base band processor. The pipelined mixed radix delay feedback (MRMDF) FFT architecture is proposed to provide a higher throughput rate combining the characteristics of both Single path Delay Feedback (SDF) which is used to reduce memory size and Multipath Delay Commutator (MDF) by using the multidatapath scheme. That is higher throughput rate can be provided by using four parallel data path. The proposed processor not only supports the operation of FFT in 128 point and 64 point but can also provide different throughput rates for 1-4 simultaneous data sequence to meet IEEE 802.11n requirements. Further, less complexity is needed in this deign compared with traditional four parallel approach. The hardware cost of memory and complex multipliers are less using this proposed FFT/IFFT processor, since this paper is done by using delay feedback and data scheduling approach. The higher order radix FFT algorithm reduces the number of complex multiplication. The proposed FFT/IFFT processor requires minimum memory since feedback approach is used for reordering the input data and the immediate results of each module. The mixed radix FFT algorithm is used to save power consumption and is measured as 296.96 mW and of Minimum period within 6.204ns (Maximum Frequency: 161.188MHZ) This paper is VLSI based and the language used is VHDL. The software used for simulation of this paper is XILINX 8.2.

Keywords-FFT/IFFT-Fastfouriertransform/InverseFast fourier transform, MIMO-OFDM-Multiple input multiple output- orthogonal frequency division multiplexing, throughputdata transfer from one location to another in a given amount of time.

## I. INTRODUCTION

The multiple input multiple output schemes have been proposed as an efficient solution for the high speed fourth generation (4G) wireless systems. Orthogonal frequency division multiplexing (OFDM) technique has high bandwidth efficiency and ability to reduce multi path channel effect. The combination of MIMO-OFDM is considered as a promising solution for enhancing the data rates of the next generation 4G wireless communication systems operating in frequency-selective based fading environments. The MIMO-OFDM system transmits multiple data streams; which requires several independent base band processors. Hence, its hardware and computational complexity increases compared with single-input single-output OFDM (SISO-OFDM) systems.

This technology has been employed in the physical layer specification of the emerging IEEE 802.11n standard to provide wireless LAN access services. Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT) are the crucial computational blocks to the base band multicarrier demodulation and modulation schemes in a physical layer of WLAN IEEE 802.11n.

FFT computations are also high and a better approach than a general purpose processor is required, to fulfill the requirements at a reasonable cost with lower power consumption and higher throughput. The High throughput Task Group which establishes IEEE 802.11n.standard is going to draw up the next-generation wireless local area network (WLAN) proposal based on the 802.11 a/g which is the current OFDM-based WLAN standards. The IEEE 802.11n standard based on the MIMO OFDM system provides very high data throughput rate from the original data rate 54 Mb/s to the data rate in excess of 600 Mb/s because the technique of the MIMO can increase the data rate by extending an OFDM-based system. However, the IEEE 802.11n standard also increases the computational and the hardware complexities greatly, compared with the current WLAN standards. It is a challenge to realize the physical layer of the MIMO OFDM system with minimal hardware complexity and power consumption-especially the computational complexity-in VLSI implementation.

The FFT/IFFT processor is one of the highest computational complexity modules in the physical layer of the IEEE 802.11n standard. According to IEEE 802.11n standard, the execution time of 128-point and 64-point with 1-4 simultaneous data sequences must be calculated within 3.6 or 4.0  $\mu$  s[5]. Therefore, if employing the traditional approach to solve the simultaneous multiple data sequences, several FFT/IFFT processors are needed in the physical layer of a MIMO OFDM system. Thus, the hardware complexity





of the physical layer in a MIMO OFDM system will be very high. The pipelined FFT architecture typically falls into one of the two following categories. One is multipath delay commutator (MDC) and the other is single-path delay feedback (SDF)[8]. In general, the MDC scheme can achieve higher throughput rate by using multiple data paths, while the SDF scheme needs less memory and hardware complexity with the delay feedback scheme. Besides, the operation of the complex multiplication consumes lots of power in the FFT processor. This paper meets the solution to the problems analyzed above.

# II. PROPOSED FFT/IFFT ARCHITECTURE FOR A MIMO-OFDM SYSTEM

This paper proposes an FFT processor with a novel multipath pipelined architecture to deal with the issue of the multiple data sequences for MIMO OFDM applications. The 128/64-point FFT with 1-4 simultaneous data sequences can be supported in this paper with minimal hardware complexity. Further the power consumption can also be saved by using higher radix FFT algorithm. Three-step radix-8 FFT algorithm is chosen in for design to save complex multiplications [11].

Since 128-point FFT is not a power of 8, the mixed-radix FFT algorithm combining two different FFT algorithms is needed. The mixed-radix multipath delay feedback (MRMDF) FFT architecture can provide higher throughput rate with minimal hardware cost by combining the features of MDC and SDF.

The features proposed MRMDF architecture are the following:

- Higher throughput rate can be provided by using four parallel data paths;
- The minimum memory is required by using the delay feedback approach to reorder the input data and the intermediate results of each module;
- The 128-point mixed-radix FFT algorithm is implemented to save power consumption;
- The number of complex multiplier is minimized by using the scheduling scheme and the specified constant multipliers.

# A. Algorithm used : (Mixed Radix 8-2 FFT with bit reversing output sequences for 128/64 point FFT)

Given a sequence x (n), N point discrete Fourier Transform (DFT) [13] is defined as

$$x(k) = \sum_{n=0}^{N-1} x(n) w_N^{kn} \quad \kappa = 0, 1, 2, ..., 127$$
(1)

Where x (n) and x (k) are complex number.

The twiddle factor is given  
by 
$$w_N^{nk} = e^{-j(2\pi nk/N)} = \cos\left(\frac{2\pi nk}{N}\right) - i\sin\left(\frac{2\pi nk}{N}\right) \square \square$$

Since 128 point FFT is not a power of 8, the mixed radix FFT algorithm, including the radix-2 and radix-8 FFT algorithms are needed.

$$N = QR \tag{3}$$

For 128 point FFT,  $Q = 8^2$  and R = 2

$$N = 64*2$$
 (4)

It is two dimensional FFT.

In the two dimensional FFT n is indexed by q and r, where  $0 \le q \le Q - 1$  and  $0 \le r \le R - 1$ 

Let N=128 we select the mapping of n:

$$n = Qr + q \begin{cases} r = 0, 1 \\ q = 0, 1, \dots, 63 \end{cases}$$
(5)

We select the mapping of k by 1 and m, where  $0 \le l \le Q - 1$  and  $0 \le m \le R - 1$ 

$$k = 2l + m \begin{cases} l = 0, 1, \dots, 63 \\ m = 0, 1 \end{cases}$$
(6)  
$$x(2l + m) = \sum_{q=0}^{63} \sum_{r=0}^{1} x(64r + q)W_{128}(64r + q)(2l + m)$$
$$= \sum_{q=0}^{63} \sum_{r=0}^{1} x(64r + q)W_{128}^{128rl}W_{128}^{qm}W_{128}^{64rm}W_{128}^{2ql}$$
$$= \sum_{q=0}^{63} \begin{cases} \sum_{r=0}^{1} x(64r + q)W_{2}^{rm} \\ r=0 \end{cases} W_{128}^{qm} W_{128}^{qm} W_{128}^{ql} (7)$$



Figure 1. The basic butterfly structure for mixed-radix 8-2FFT algorithm



The IFFT of an N-point sequence x (k), k=0, 1, 2, ..., N-1

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} x(k) w^{-nk} \quad (8)$$

# B. Block diagram of FFT/IFFT Architecture for MIMO OFDM System physical layer

In this proposed processor, the FFT architecture can not only deal with 1-4 simultaneous data sequences for MIMO-OFDM applications but also save lots of hardware complexity, compared with the traditional approach. Data in



Figure 2. Block diagram of Proposed FFT/IFFT Architecture for MIMO OFDM System physical layer

### 1) Module 1(Data Reordering)

Several different-size delay elements and a switch block are used for data reordering. This implement the operation of FFT and IFFT with 1-4 simultaneous data sequences more efficiently using data reordering. It avoids the data sequences multiplied by the same twiddle factor in each data path. The four adjoining sequences are separated by one delay unit. Then the separated data will be reordered among four sequences by the appropriate operation of the switch [13]. Finally, the separated data will be adjusted by the delay elements. The reordered data will be separated into 32 groups or 16 groups for 128- or 64-point, respectively. Each group contains four data sequences. The operation of the FFT is data dependent.



Figure 3. Block diagram for Data reordering (module 1)

## 2) Module 2 (Radix 2 FFT algorithm)

In general, four complex multipliers are needed in the four-parallel approach to implement a radix-2 FFT algorithm [11]. In Mixed Radix Multipath Delay Feedback (MRMDF) FFT Processor, a group concept is used to deal with multiple data sequences.



Figure 4. Block diagram for Radix 2 (module 2)

The block diagram of Module 2 consisting of a memory, 4 butterfly units of radix-2 FFT algorithm (BU\_2), two complex multipliers, two ROMs, and some multiplexers. If 64-point FFT/IFFT is operated in our scheme, the data will skip this module. The memory size can store the data of 16 groups containing 256 complex data. Because four data paths are adopted in our design, four memory banks are needed to provide four data from memory to BU\_2 simultaneously. Only1/8 period of cosine and sine waveforms are stored in ROM and the other period waveforms can be reconstructed by these stored values. The operation of BU\_2 is complex addition and complex subtraction from two input data.

Because radix-2 FFT algorithm is adopted in this module, BU\_2 can not start until both input sequences x(n) and x(64+n), when n=0,1,2......63 are available. Four BU\_2s generate eight output data according to radix-2 FFT algorithm; four output data will be fed to the Module 3 directly, while the other four output data will be stored in the memory. After the operation of BU\_2, there are 16 groups fed to the Module 3 and 16 groups fed back to the memory. Then, these data stored in the memory are read and are multiplied by the twiddle factors simultaneously before they are sent to Module 3.

# 3) Module 3 (Radix-8 FFT algorithm)

It consists of three stages and one modified complex multiplier, as shown in figure 5. The architecture is directly mapped from 3-step radix-8 FFT algorithm [5].



Figure 5. Block diagram of radix 8 FFT (Module 3)

Four identical components are used in each stage since four-parallel approach is adopted. In our design, four input data are loaded from the previous module simultaneously. The different data sequences A, B, C, and D is considered as



a group. The size of the delay element in three stages is 32 (8 group), 16 (4 group) and eight (2 group), respectively.

The function of delay element is to store the input data until the other available input data is received for the butterfly unit operation. The data of the first 8 groups are stored in the delay elements in the first stage. When the data of the next groups are valid from the input, eight data are fed to four butterfly unit from both the input and the delay element. The output data generated by the butterfly unit in the first stage or second stage are multiplied by a trivial twiddle factor before they are fed to the next stage. These twiddle factors can be implemented efficiently. However the four output data from the third stage need to be multiplied by the nontrivial twiddle factors simultaneously in the modified complex multiplier.

In the same group, the output data of each data path are multiplied by the same twiddle factors. It is inefficient to build four complex multipliers in four data paths for multiplying different twiddle factors simultaneously. So the proposed approach is to reduce the complexity of the complex multipliers. The twiddle factors of the modified complex multiplier are the real and imaginary parts of the twiddle factor and range from 0 to 49.

In practice, only nine sets of constant values, with 0 to 8 are needed because the other twiddle factors can be obtained by using the trigonometric function. Also these constant values can be realized more efficiently by using several adders and shifters. This approach saves the gate count by about 38 percent when compared to four-complex-multiplier approach. Also the performance of this approach is equivalent to that of the four complex multipliers.

# 4) Module 4 (Radix -8 FFT algorithm)

The Three-step radix-8 FFT algorithm is realized in radix 8. This corresponds to the last stage of FFT/IFFT processor. So the structure of module is modified to ensure that the FFT output data are correct. Some output data generated by the BU\_2 in the first stage and second stage are multiplied by the nontrivial twiddle factors before they are fed to the next stage.



Figure 6. Block diagram of radix 8 FFT (Module 4)

III. RESULT



Figure 7. FFT Processor Output

| 🚰 🖬 🗠 र व्या र 🏘 😤 व 🎜 📢         |             |             |           |  |
|----------------------------------|-------------|-------------|-----------|--|
|                                  | Voltage (V) | Current (mA | Power (mW |  |
| Vecint                           | 1.8         |             |           |  |
| Dynamic                          |             | 1.25        | 2.26      |  |
| Quiescent                        |             | 160.00      | 288.00    |  |
| Vcco33                           | 3.3         |             |           |  |
| Dynamic                          |             | 0.00        | 0.00      |  |
| Quiescent                        |             | 2.00        | 6.60      |  |
| Total Powe                       |             |             | 296.86    |  |
| Startup Curre                    |             | 500.00      |           |  |
| Battery Capacity (mA Hours) 0.00 |             |             |           |  |
| <                                |             |             | >         |  |
| Summary                          | Power S     | Current S   | Thermal   |  |
|                                  |             |             |           |  |

Figure 8. Power Consumption

# IV. PERFORMANCE

The proposed FFT/IFFT processor hardware costs in four-data-sequence 128-point FFT are as follows.

Memory size:

- Module 1: 16 words;
- Module 2: 256 words;
- Module 3: 224 words;
- Module 4: 16 words;

Total memory size is 514 words.

Complex multipliers: 16, where the complexity of modified complex multipliers.
 Complex addees: 48

| • Complex adders: 48   |                |                               |                          |  |  |
|------------------------|----------------|-------------------------------|--------------------------|--|--|
| MODULE2 Project Status |                |                               |                          |  |  |
| Project File:          | Module2.ise    | Current State:                | Synthesized              |  |  |
| Module Name:           | Module_2       | • Errors:                     | No Errors                |  |  |
| Target Device:         | xcv600e-8fg900 | <ul> <li>Warnings:</li> </ul> | 13 Warnings              |  |  |
| Product Version:       | ISE 9.1i       | • Updated:                    | Fri Jul 31 15:10:13 2009 |  |  |
|                        |                |                               | OFFIC                    |  |  |



| М                                   | ODULE2 Partition Summa     | ary         |             |
|-------------------------------------|----------------------------|-------------|-------------|
| No partition information was found. |                            |             |             |
|                                     |                            |             |             |
| Device Ut                           | ilization Summary (estimat | ted values) |             |
| Logic Utilization                   | Used                       | Available   | Utilization |
| Number of Slices                    | 3568                       | 6912        | 51%         |
| Number of Slice Flip Flops          | 2530                       | 13824       | 18%         |
| Number of 4 input LUTs              | 6667                       | 13824       | 48%         |
| Number of bonded IOBs               | 258                        | 512         | 50%         |
| Number of GCLKs                     | 1                          | 4           | 25%         |

| Detailed Reports          |         |                          |        |             |          |
|---------------------------|---------|--------------------------|--------|-------------|----------|
| Report Name               | Status  | Generated                | Errors | Warnings    | Infos    |
| Synthesis Report          | Current | Fri Jul 31 15:10:11 2009 | 0      | 13 Warnings | 12 Infos |
| Translation Report        |         |                          |        |             |          |
| Map Report                |         |                          |        |             |          |
| Place and Route Report    |         |                          |        |             |          |
| Static Timing Report      |         |                          |        |             |          |
| Bitgen Report             |         |                          |        |             |          |
| MODULE3 Partition Summary |         |                          |        |             |          |

No partition information was found.

| Device Utilization Summary (estimated values) |      |           |             |  |
|-----------------------------------------------|------|-----------|-------------|--|
| Logic Utilization                             | Used | Available | Utilization |  |
| Number of Slices                              | 5463 | 6912      | 79%         |  |
| Number of Slice Flip Flops                    | 8192 | 13824     | 59%         |  |
| Number of 4 input LUTs                        | 5384 | 13824     | 38%         |  |
| Number of bonded IOBs                         | 4066 | 512       | 794%        |  |
| Number of GCLKs                               | 1    | 4         | 25%         |  |

| Detailed Reports          |         |                          |        |              |       |
|---------------------------|---------|--------------------------|--------|--------------|-------|
| Report Name               | Status  | Generated                | Errors | Warnings     | Infos |
| Synthesis Report          | Current | Fri Jul 31 15:11:32 2009 | 0      | 113 Warnings | 0     |
| Translation Report        |         |                          |        |              |       |
| Map Report                |         |                          |        |              |       |
| Place and Route Report    |         |                          |        |              |       |
| Static Timing Report      |         |                          |        |              |       |
| Bitgen Report             |         |                          |        |              |       |
| MODULE4 Partition Summary |         |                          |        |              |       |

No partition information was found.

| Device Utilization Summary (estimated values) |      |           |             |  |  |
|-----------------------------------------------|------|-----------|-------------|--|--|
| Logic Utilization                             | Used | Available | Utilization |  |  |
| Number of Slices                              | 340  | 6912      | 4%          |  |  |
| Number of Slice Flip Flops                    | 512  | 13824     | 3%          |  |  |
| Number of 4 input LUTs                        | 337  | 13824     | 2%          |  |  |
| Number of bonded IOBs                         | 258  | 512       | 50%         |  |  |
| Number of GCLKs                               | 1    | 4         | 25%         |  |  |

| Detailed Reports       |         |                          |        |             |       |
|------------------------|---------|--------------------------|--------|-------------|-------|
| Report Name            | Status  | Generated                | Errors | Warnings    | Infos |
| Synthesis Report       | Current | Fri Jul 31 15:12:19 2009 | 0      | 62 Warnings | 0     |
| Translation Report     |         |                          |        |             |       |
| Map Report             |         |                          |        |             |       |
| Place and Route Report |         |                          |        |             |       |
| Static Timing Report   |         |                          |        |             |       |
| Bitgen Report          |         |                          |        |             |       |

| <b>F'</b> 0 | D '       | .1.     | C          | C 1      | 1 1    |
|-------------|-----------|---------|------------|----------|--------|
| Figure 9.   | Device ui | 111EV D | ertormance | tor each | module |
|             |           |         |            |          |        |

The proposed FFT architecture can not only implement three-step radix-8 FFT algorithm in 128-point FFT to reduce the number of complex multiplications but also provide four times throughput rate, compared with SDF scheme.

#### V. SUMMARY

Timing Summary:

Speed Grade: -5

Minimum period: 6.204ns (Maximum Frequency: 161.188MHz) Device utilization summary:

| Selected Device:            | 3s400pq208-5         |
|-----------------------------|----------------------|
| Number of Slices:           | 3033 out of 584 84%  |
| Number of Slice Flip Flops: | 5649 out of 7168 78% |
| Number of 4 input LUTs:     | 3244 out of 7168 45% |
| Number of IOs:              | 546                  |

#### VI. CONCLUSION

In this design, 64-point and 128-point FFT/IFFT can be supported. Based on the concept of data reordering and grouping, the processor can provide different throughput rates to deal with 1–4 simultaneous data sequences more efficiently. Furthermore, the hardware costs of memory and complex multiplier can be saved by adopting delay feedback and data scheduling approaches. And the number of complex multiplications can be reduced effectively by using higher radix FFT algorithm.

#### REFERENCES

- H.Sampath, S.Talwar, J.Tellado, V.Ereeg and A.Paulraj, "A fourth generation MIMO-OFDM broadband wireless system, Design, performance, dan field trial results, "IEEE communication Magazine, vol.40, no.9, pp143-149, sept.2002.
- [2] A.Van Zelst, Tiim C.W. Shenk, "Implementation of a MIMO-OFDM based wireless LAN system, "IEEE Trans. On signal processing, vol.52, no.2, pp483-494, feb.2004.
- [3] H.Boloskei and E.Zurich, "MIMO-OFDM wireless system: basics, perspectives and challenges, "IEEE Trans. Wireless comm. Vol.13, no.4, pp31-37, aug.2006.
- [4] G.L. Stuber, J.R. Barry, S.W. Mc Laughlin, Y.Li, M.A. Ingram, and T.H. Pratt, "Broadband MIMO-OFDM wireless communications", in Proc. IEEE vol.92, no.2, pp271-297, feb.2004.
- [5] Yu-weblin and Chen-Y; Lee, "Design of an FFT/IFFT processor for MIMO-OFDM system, "IEEE Transactions on circuits and systems, vol.54, no.4, April 2007.
- [6] S.Magar, S.Shen, G.Luikuo, M.Fleming and R.Aguilar, "An application specific DSP Chip set for 100-MHz data rates", in puoe. Int. conf. Acoustics, speech, signal process, vol.4, pp1998-1992, Apr.1988.
- [7] J.O "Brein, J.Mather and B.Holland," A 200 MIPS single chip 1k FFT processor, in proc. IEEE Int. solid-state circuit's conf. 1989, vol.36, and pp.166-167. 327.
- [8] H.Shousheng and M.Torkelson, "Designing pipeline FFT processor for OFDM (de) modulation," in proc URSI Int. symp on signals, system election, oct 1998, vol. 29, pp. 257-262.
- Beven M.Beas, "A Low-power, High performance, 1024-point FFT processor," IEEE Jornal of solid-state circuits, vol.34, No. 3, pp. 380-87, Mar.1999.
- [10] J.W.Cooley and J.W Turkey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput. Vol.5, No.5, pp.87-109, 1965.
- [11] P.Duhamel, "Algorithms meeting the lower bounds on the multiplicative complexity of length-2n DFT's and their connection with practical algorithms, "IEEE Trans. Acoust., speech, signal processing, vol.39, Sept.1990.
- [12] S.He and M.Torkelson, "Designing pipeline FFT processor for OFDM (de) modulation," in proc. IEEE URSI Int. symp. Signals, syst. Electron, 1998, pp.257-262.
- [13] M.Mohammed ismail,Dr.M.J.S.Rangachar,Dr.Ch.D.V.Pradesi Rao"VLSI Implementation and performance analysis of efficient mixed radix 8-2 FFT algorithm with bit reversal for the output sequences",IEEE URSI Int. symp. Signals, syst. Electron, 1998, pp.257-262.

