



FAST FOURIER TRANSFORM IMPLEMENTATION USING FIELD PROGRAMMABLE GATE ARRAY TECHNOLOGY FOR ORTHOGONAL FREQUENCY DIVISION MULTIPLEXING SYSTEMS

By

RAMA KRISHNA LOLLA

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2002


Copyright 2002 by RAMA KRISHNA LOLLA


To My Family & BABA


ACKNOWLEDGMENTS

I would like to extend my thanks to Dr. Fred J. Taylor for his suggestions at all the stages of the project. This project would not have taken shape without his guidance. I would like to thank my advisors, Dr. John G. Harris and Dr. John M. Shea, for their timely suggestions. I am also thankful to my colleagues in the High Speed Digital Architecture Laboratory for their support. I would also like to acknowledge the continuous support my family has given me during the course of my work.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    OFDM Overview
    FFT Algorithms Explored
    Thesis Organization
2 OFDM THEORY AND IMPLEMENTATION
    Description of the Wireless Channel
    History of OFDM
3 ALGORITHM THEORY AND DESCRIPTION
    Cooley-Tukey Algorithm
        Complexity Analysis
        Radix-2 Algorithm
        Radix-4 Algorithm
    Chirp-z Algorithm
4 FIELD PROGRAMMABLE GATE ARRAYS
    Power Calculations in FPGAs
    Costs Involved in FPGA Fabrication
    Comparison to other Technologies
5 IMPLEMENTATION DETAILS AND RESULTS
    Description of the Work
    Description of Tools Used
    Results and Conclusions
    Power Calculations
    Noise Tolerance
    Directions of Future Work

APPENDIX

A 16-BIT COOLEY-TUKEY IMPLEMENTATION
B 32-BIT COOLEY-TUKEY AND CHIRP-Z IMPLEMENTATION

LIST OF REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

Table
3.1 Time-domain index n resolved in terms of n1 and n2
3.2 Resolution of the frequency domain index k
4.1 Truth table of the function implemented in Figure (4.3)
5.1 Radix-2 Cooley-Tukey implementation with round off errors
5.2 Radix-4 Cooley-Tukey implementation with round off errors
5.3 Radix-2 Cooley-Tukey implementation without round off errors
5.4 Radix-4 Cooley-Tukey implementation without round off errors
5.5 Power calculations for Radix-2 8-point FFT


LIST OF FIGURES

Figure
2.1 Multipath Propagation
2.2 General Block Diagram of an OFDM communication system
3.1 Cooley-Tukey Algorithm Implementation
3.2 Radix-2 repetitive unit
3.3 Implementation of a Radix-2 8-point FFT unit
3.4 Radix-4 basic block
3.5 Chirp-z implementation
3.6 Chirp Signal
3.7 Phase response of the Chirp Signal shown in Figure 3.6
4.1 General structure of an FPGA
4.2 Programmable Interconnection Switch
4.3 A 3-input LUT implementation
5.1 Implementation of Multipliers: (a) shows the initial truncating configuration and (b) shows the truncation operation after one more level of processing
5.2 N-by-N-bit Pipelined Multiplier
5.3 Model used in the Thesis work
5.4 BER variations against SNR for an internal bus width of 16
5.5 BER variations against SNR for an internal bus width of 32
5.6 BER variations against SNR: Comparison of floating point results with modeled Radix-2 and Radix-4 8-point FFTs with 16- and 32-bit internal bus width


Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

FAST FOURIER TRANSFORM IMPLEMENTATION USING FIELD PROGRAMMABLE GATE ARRAY TECHNOLOGY FOR ORTHOGONAL FREQUENCY DIVISION MULTIPLEXING SYSTEMS

By

Rama Krishna Lolla

December 2002

Chair: Dr. Fred J. Taylor
Major Department: Electrical and Computer Engineering

Orthogonal Frequency Division Multiplexing (OFDM) is an emerging multi-carrier technique that uses Fast Fourier Transforms (FFTs) to modulate data onto sets of orthogonal frequencies. The core operation in an OFDM system is the FFT unit, which consumes a large amount of resources and power. The goal of this thesis was to study better implementation structures for the FFT. The Radix-2 and Radix-4 implementations of the Cooley-Tukey algorithm and the Chirp-z algorithm were implemented using Field Programmable Gate Array (FPGA) technology. The two's complement number system was used in the designs, and their performance was judged on the basis of implementation complexity and the amount of power consumed.


CHAPTER 1
INTRODUCTION

Orthogonal Frequency Division Multiplexing (OFDM) is an emerging multi-carrier technique that uses fast Fourier transforms (FFTs) to modulate data onto sets of orthogonal frequencies. Orthogonality enables the frequencies to overlap while still maintaining statistical independence. The transmitter uses an inverse FFT (IFFT) to convert the “frequency domain” data into the “time domain”, and the receiver converts the received signals back into the “frequency domain” using an FFT. The IFFT is similar in structure to the FFT; the difference is that the twiddle factors of one are the complex conjugates of those of the other [1]. This core operation is often the limiting factor in terms of the power consumed by an implementation. The objective of this thesis is to study implementations of the Cooley-Tukey and Chirp-z FFT algorithms in FPGA technology in order to arrive at a low-power, low-latency configuration.

OFDM Overview

OFDM efficiently overcomes the problems that plague most wireless channels. Multi-path propagation is a serious hazard: it introduces delay spread, so that multiple copies of the transmitted signal reach the receiver at different times. This causes the energy of one symbol of information to spill onto several successive symbols, a phenomenon called Inter Symbol Interference (ISI). OFDM reduces ISI by splitting the data across several simultaneous transmissions, which makes it possible to increase the transmission time of each symbol. OFDM also moves the equalization operation to the frequency domain, rather than the time domain as in single carrier systems.


The OFDM implementation also has excellent inter-carrier interference (ICI) performance. This is achieved by the FFT, whose frequency bands are harmonics of a fundamental frequency band. This ensures minimum cross talk between the sub-carriers, thereby reducing the ICI, and it does not require phase locking of the local oscillators. These properties make OFDM a much sought-after solution for combating delay spread in the wireless environment [1].

FFT Algorithms Explored

The Cooley-Tukey algorithm formulates an efficient way to reduce the number of complex multiplications. The algorithm allows the design to be configured in more than one way, depending on the fundamental repetitive unit used to build the longer FFTs [2-5]. Two such configurations are explored in this thesis: the Radix-2 (grouping in units of 2) and Radix-4 (grouping in units of 4) Cooley-Tukey implementations.

The Chirp-z algorithm provides greater frequency resolution that is independent of the sampling rate. Spectral resolution is greatly improved by mapping the contour closer to the poles in the z-domain. In the limiting case, when the chosen contour is the unit circle, the results match those of the Cooley-Tukey algorithm exactly. Its implementation contains blocks of circular convolutions that are usually implemented in charge coupled devices (CCDs) [2-5]. An attempt has been made in this thesis work to assemble the circular convolution blocks onto FPGAs.

The modules in this work were compiled onto the Altera FLEX10KE family of devices using the Altera MAX+PLUS II software. These FPGAs give maximum flexibility by allowing modifications in the design phase as well as the testing phase. The FPGAs can


often be used to reach the market more quickly and at a lower price. The internal blocks of the FPGA are usually standardized for each family of devices [6-9].

Thesis Organization

Chapter 2 describes the OFDM scheme and the ways it reduces the effects of ISI and ICI. Chapter 3 discusses the details of the Cooley-Tukey and the Chirp-z algorithms for the implementation of FFTs. Chapter 4 discusses FPGA technology and compares it with other technologies. Chapter 5 concludes the thesis with a detailed description of the actual work done, the results obtained and the inferences drawn.


CHAPTER 2
OFDM THEORY AND IMPLEMENTATION

Description of the Wireless Channel

The wireless communication channel introduces distortion into the signal. In most cases these effects can be modeled as a filter. The receiver is assumed to receive multiple copies of the transmitted sequence with different amplitudes and phases. This is mainly due to the different signal paths (Figure 2.1) and their associated path losses. The gains distributed along the multiple paths determine the coefficients of the filtering model of the channel. In addition, there are often variations along a path due either to a mobile environment or to climatic changes. These effects can be mitigated to some degree by increasing the transmitted power: the higher the power generated at the transmitter, the lower the error probability. A typical power spectrum consists of peaks and troughs signifying the distribution of power across the frequency bands. An ideal communication strategy assigns signals to the channels or bands having the highest gain. This is called the water-filling strategy for power allocation [1].

Multi-path propagation causes the signal to spread in time into successive signal values. This results in Inter-Symbol-Interference (ISI), in which the energy of one symbol overlaps the successive symbols. The overlap causes constructive interference at some instants and destructive interference at others, and it is a major concern in single carrier systems. Multi-carrier systems attempt to resolve this issue by dividing the high-speed data stream into several simultaneous transmissions (thus keeping the overall


data rate constant) so that each individual transmission has an increased transmission time. This reduces the probability that a symbol is misread at the receiver.

Figure 2.1 Multipath Propagation

Orthogonal Frequency Division Multiplexing (OFDM) is one of the prominent and effective multi-carrier transmission techniques. Frequency division multiplexing transmits symbols of information in separate frequency bands, with a “guard band” separating the carriers. OFDM is an advanced technique that eliminates the guard band while still allowing the data to be deciphered properly at the receiver. Parallel transmission is accomplished by applying some type of quadrature amplitude modulation (QAM) and then transmitting the data after performing an inverse Fourier transform. That is, an inverse Fourier transform is taken at the transmitter, and the received sequence is processed by an FFT.

The idea behind taking the inverse Fourier transform at the transmitter can be motivated as follows. Orthogonality is desired in a transmitted data sequence having known frequency, amplitude and phase. If the data at the transmitter is assumed to be already in the “frequency domain”, the IFFT produces a “time domain” array of signals subject to some form of frequency, amplitude and phase restrictions. At the receiver, the “frequency domain” data is regained by taking the FFT of the received “time-domain” sequence [1].
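The transmit and receive chain described above can be illustrated with a minimal sketch. It assumes an ideal, noiseless channel, an 8-point transform, and a 4-point QAM (QPSK) mapping; the names `QPSK`, `dft`, `ofdm_modulate` and `ofdm_demodulate` are hypothetical and chosen only for this illustration. A direct transform is used instead of a fast algorithm to keep the example short.

```python
import cmath

# Hypothetical 4-QAM (QPSK) mapping: two bits per sub-carrier symbol.
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}


def dft(x, inverse=False):
    """Direct O(N^2) Fourier transform; inverse=True gives the inverse transform."""
    n_points = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            acc += x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_points)
        out.append(acc / n_points if inverse else acc)
    return out


def ofdm_modulate(bits):
    """Map bit pairs to "frequency domain" symbols, then go to the "time domain"."""
    symbols = [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]
    return dft(symbols, inverse=True)


def ofdm_demodulate(samples):
    """Return to the "frequency domain" and pick the nearest constellation point."""
    bits = []
    for s in dft(samples):
        pair = min(QPSK, key=lambda b: abs(QPSK[b] - s))
        bits.extend(pair)
    return bits


bits = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
tx = ofdm_modulate(bits)       # 8 time-domain samples, one OFDM symbol
rx_bits = ofdm_demodulate(tx)  # ideal channel: received samples equal tx
print(rx_bits == bits)
```

With a real channel, the received samples would be filtered and noisy before demodulation; that part is deliberately left out of this sketch.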


History of OFDM

An analog implementation of a Fourier transform would involve using oscillators at the required frequencies, which is impractical. Oscillator drift in analog components is a major reason for the initial failure of this line of thought: the drift causes the carriers to lose orthogonality and results in a phenomenon called Inter Carrier Interference (ICI). Once digital implementations became available, the frequency drift problems were mitigated and OFDM research resumed [1]. The digital implementations almost meet the orthogonality criteria by remaining at a constant frequency. The block diagram of the OFDM link is shown in Figure (2.2).

Figure 2.2 General Block Diagram of an OFDM communication system

The quadrature amplitude modulation (QAM) stage produces in-phase and quadrature components that can be fed into the Fourier transform stage as the real and imaginary parts of the complex input, respectively. The Fourier transform can be thought of as a tool for simultaneously transmitting an array of symbols at various frequencies, giving the effect of a filter bank. Each sub-channel acts as a single carrier system and can be treated as such, allowing for some statistical dependence on one another. Single carrier systems usually have an equalizer in the time domain to nullify the effects of ISI. Multi-carrier systems like OFDM perform the equalization in the frequency domain to combat the ISI. ISI can be effectively eliminated by adding a “guard interval” at the end of each symbol, with the length of the guard time greater than the maximum tolerable delay spread. This reduces the spreading of signals onto successive symbols.

Efficient utilization of bandwidth is evident in the simultaneous transmissions over different ranges of frequencies. For an N-point FFT/IFFT channel with N simultaneous transmissions, the symbol time can be increased by a factor of N, thus reducing the ISI in the same proportion. The length of the FFT is, however, constrained by the growing design complexity of the transmitter and receiver modules. The choice of FFT length is therefore usually a trade-off between the complexity of implementation and the decipherability of data at the receiver.

Orthogonality is a concept that statistically quantifies the independence among the components of the skeleton structure used to describe a system. The projections of any signal along the components of the orthogonal system are sufficient to represent the signal. In actual implementations, the information is sent in the shape of sinc (sin x / x) pulses, so that the frequency domain representation consists of rectangular pulses. The sinc pulses have nulls at periodic intervals, and if the subcarriers are placed at that spacing, the maximum of each subcarrier occurs only where all other subcarrier contributions are zero. The inherent orthogonality in an OFDM system allows the spectra to overlap without causing any interference problems. This eliminates the steep bandpass filters required in other frequency division multiplexing systems, implying lower implementation complexity. The lack of orthogonality causes


some amount of cross talk between the subcarriers, and this phenomenon is called Inter Carrier Interference (ICI). The ICI must be minimized to establish a good communication link. Orthogonality is introduced into the system by ensuring that when the output corresponding to a particular sub-carrier is at its peak, there is minimum (ideally zero) contribution from the remaining sub-carriers. The sub-carrier spacing is thus determined by the null-to-null spacing of each transmission. This forces the correlation between the sub-carriers to zero.

Thus the OFDM system attempts to reduce the problems of ISI and ICI with very low implementation complexity. In single carrier systems these problems may not be effectively removed even after equalization. The implementation of OFDM, however, brings the issue of power usage into focus. The major power sink in the transmitter/receiver design is observed to be the IFFT/FFT, and for low power applications this issue must be dealt with extensively. Fortunately, the IFFT and FFT implementations for an arbitrary N-point transform differ only in their twiddle factors in most cases. There are many ways to implement the FFT. The underlying concept of the DFT and two algorithms for implementing the FFT (the Cooley-Tukey and Chirp-z algorithms) were studied as a part of this thesis work and were compared in terms of power and latency.
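The zero-correlation property, and its loss under frequency drift, can be checked numerically. The sketch below is only an illustration under assumed values (8 samples per symbol, sub-carrier indices 3 and 5, and a drift of 0.2 of the sub-carrier spacing); `subcarrier` and `correlation` are hypothetical helper names.

```python
import cmath


def subcarrier(k, n_points):
    """Samples of a complex sub-carrier with k cycles per symbol (k may be fractional)."""
    return [cmath.exp(2j * cmath.pi * k * n / n_points) for n in range(n_points)]


def correlation(a, b):
    """Normalized inner product of two sample sequences."""
    return sum(x * y.conjugate() for x, y in zip(a, b)) / len(a)


N = 8  # assumed number of samples per OFDM symbol
same = correlation(subcarrier(3, N), subcarrier(3, N))       # a carrier with itself
different = correlation(subcarrier(3, N), subcarrier(5, N))  # two orthogonal carriers
drifted = correlation(subcarrier(3.2, N), subcarrier(5, N))  # carrier 3 has drifted

print(abs(same), abs(different), abs(drifted))
```

Carriers placed exactly on the harmonic grid are uncorrelated, while the drifted carrier leaks a non-zero contribution into its neighbor, which is the ICI mechanism described above.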


CHAPTER 3
ALGORITHM THEORY AND DESCRIPTION

The algorithms used for this thesis work were the Cooley-Tukey and the Chirp-z transform algorithms. The main purpose of this study was to find a better FFT implementation structure for OFDM applications. The Cooley-Tukey algorithm and the Chirp-z algorithm are described below.

Cooley-Tukey Algorithm

Formally, the discrete Fourier transform (DFT) pair is given by Equations (3.1) and (3.2) as

Synthesis Equation (inverse DFT):
\[ x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j(2\pi/N)nk}, \qquad n \in [0, N-1] \tag{3.1} \]

Analysis Equation (DFT):
\[ X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j(2\pi/N)nk}, \qquad k \in [0, N-1] \tag{3.2} \]

where x[n] is the nth sample of an N-element time series and, correspondingly, X[k] is the kth harmonic of the N-point discrete Fourier transform of x[n]. The summations in both the synthesis and analysis equations exist only if all the values of x and X are bounded. The DFT assumes periodicity. The multiplying exponential coefficients are of unit magnitude and can be represented as equally spaced points along the unit circle in the z-domain, positioned according to their phase. A more detailed description of the properties of the DFT equations above can be found in [2-5]. The implementation of Equations (3.1) and (3.2) in their canonic form would require


$N(N-1)$ complex additions and $N^2$ complex multiplications. This can result in high implementation complexity and latency. In 1965 Cooley and Tukey published their simplification of this set of equations, which takes the form of the algorithm described below [5]. The complexity reduction is achieved by breaking long DFTs down into collections of smaller DFTs. The algorithm is described as follows.

Let $N$ be the number of points in the input sequence, and consider representing $N$ as the composite number
\[ N = N_1 N_2 \tag{3.3} \]
The time domain index $n$ and the frequency domain index $k$ are also resolved as
\[ n = N_1 n_2 + n_1, \qquad n_1 \in [0, N_1-1] \text{ and } n_2 \in [0, N_2-1] \tag{3.4} \]
\[ k = N_2 k_1 + k_2, \qquad k_1 \in [0, N_1-1] \text{ and } k_2 \in [0, N_2-1] \tag{3.5} \]
Substituting these in the DFT Equation (3.2),
\[ X[N_2 k_1 + k_2] = \sum_{n=0}^{N-1} x[N_1 n_2 + n_1]\, e^{-j(2\pi/N)(N_1 n_2 + n_1)(N_2 k_1 + k_2)} \tag{3.6} \]
With $W_N = e^{-j2\pi/N}$, the exponential in Equation (3.6) is of the form
\[ W_N^{(N_2 k_1 + k_2)(N_1 n_2 + n_1)} = W_N^{N_1 N_2 k_1 n_2} \cdot W_N^{N_2 k_1 n_1} \cdot W_N^{N_1 k_2 n_2} \cdot W_N^{k_2 n_1} \tag{3.7} \]
Using the relations
\[ W_{mN}^{m} = W_N, \qquad W_N^{N_1 N_2 k_1 n_2} = e^{-j2\pi k_1 n_2} = 1 \quad \forall\, n_2, k_1 \]
Equation (3.7) becomes
\[ W_N^{(N_2 k_1 + k_2)(N_1 n_2 + n_1)} = W_{N_1}^{k_1 n_1} \cdot W_{N_2}^{k_2 n_2} \cdot W_N^{k_2 n_1} \tag{3.8} \]
Substituting Equation (3.8) in Equation (3.6),


\[ X[N_2 k_1 + k_2] = \sum_{n_1=0}^{N_1-1} W_{N_1}^{n_1 k_1} \cdot W_N^{n_1 k_2} \cdot \left[ \sum_{n_2=0}^{N_2-1} x[N_1 n_2 + n_1]\, W_{N_2}^{n_2 k_2} \right] \tag{3.9} \]

The inner sum is clearly an $N_2$-point DFT for fixed $n_1$, and the outer sum is an $N_1$-point DFT for fixed $k_2$. There is also a gluing factor, known as the “twiddle factor”, $W_N^{n_1 k_2}$, which multiplies the inner sum for the fixed values of $k_2$ and $n_1$. Tables (3.1) and (3.2) show the unresolved indices (n, k) in terms of the resolved indices ((n1, n2), (k1, k2)).

Table 3.1 Time-domain index n resolved in terms of n1 and n2

  n1 \ n2    0       1         2         ...   N2-1
  0          0       N1        2*N1      ...   (N2-1)*N1
  1          1       N1+1      2*N1+1    ...   (N2-1)*N1+1
  2          2       N1+2      2*N1+2    ...   (N2-1)*N1+2
  ...        ...     ...       ...       ...   ...
  N1-1       N1-1    2*N1-1    3*N1-1    ...   N-1

Table 3.2 Resolution of the frequency domain index k

  k1 \ k2    0            1              2              ...   N2-1
  0          0            1              2              ...   N2-1
  1          N2           N2+1           N2+2           ...   2*N2-1
  2          2*N2         2*N2+1         2*N2+2         ...   3*N2-1
  ...        ...          ...            ...            ...   ...
  N1-1       (N1-1)*N2    (N1-1)*N2+1    (N1-1)*N2+2    ...   N-1
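The index mapping of Tables 3.1 and 3.2 and the two-stage structure of Equation (3.9) can be checked with a short software sketch. This is an illustration only, not the hardware design of this thesis; the function names `w`, `direct_dft` and `cooley_tukey` and the test values (N = 12 split as 3 x 4) are assumptions made for the example.

```python
import cmath


def w(n_points, exponent):
    """Twiddle factor W_N^exponent = exp(-j*2*pi*exponent/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def direct_dft(x):
    """Direct evaluation of the analysis equation (3.2)."""
    n_points = len(x)
    return [sum(x[n] * w(n_points, n * k) for n in range(n_points))
            for k in range(n_points)]


def cooley_tukey(x, n1_len, n2_len):
    """One level of the decomposition in Equation (3.9), with N = N1 * N2."""
    n_points = n1_len * n2_len
    assert len(x) == n_points
    out = [0j] * n_points
    # Inner N2-point DFTs, one for each fixed n1, over samples x[N1*n2 + n1].
    inner = [direct_dft([x[n1_len * n2 + n1] for n2 in range(n2_len)])
             for n1 in range(n1_len)]
    for k2 in range(n2_len):
        # Twiddle factors W_N^(n1*k2) glue the two stages together.
        twiddled = [inner[n1][k2] * w(n_points, n1 * k2) for n1 in range(n1_len)]
        # Outer N1-point DFT over n1 for this fixed k2.
        outer = direct_dft(twiddled)
        for k1 in range(n1_len):
            out[n2_len * k1 + k2] = outer[k1]  # k = N2*k1 + k2, Equation (3.5)
    return out


x = [complex(n % 5, n % 3) for n in range(12)]
reference = direct_dft(x)
decomposed = cooley_tukey(x, 3, 4)
max_err = max(abs(a - b) for a, b in zip(reference, decomposed))
print(max_err)
```

The small DFTs are still evaluated directly here; applying the same split recursively to them is what yields the fast algorithms discussed in the rest of the chapter.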


Thus the algorithm describes a means by which sets of $N_2$-point DFTs (for fixed $n_1$) interface with sets of $N_1$-point DFTs (for fixed $k_2$). The algorithm is illustrated in Figure (3.1).

Figure 3.1 Cooley-Tukey Algorithm Implementation

Complexity Analysis

The complexity of the entire N-point DFT implementation can be modeled as the complex multiplication and addition count associated with $N_1$ $N_2$-point FFT units, $N_2$ $N_1$-point FFT units and $N$ twiddle factors [5].
\[ \text{Multiplier Complexity} = N_1 \left[ (N_2)^2 \right] + N_2 \left[ (N_1)^2 \right] + N \]
\[ \text{Multiplier Complexity} = N (N_1 + N_2 + 1) \tag{3.10} \]
\[ \text{Addition Complexity} = N_1 \left[ N_2 (N_2 - 1) \right] + N_2 \left[ N_1 (N_1 - 1) \right] \]
\[ = N_1 N_2^2 - N_1 N_2 + N_2 N_1^2 - N_1 N_2 \]
\[ \text{Addition Complexity} = N (N_1 + N_2 - 2) \tag{3.11} \]
If $N$ can be resolved into a highly composite number
\[ N = N_1 \cdot N_2 \cdot N_3 \cdots N_n \tag{3.12} \]


then the multiplier complexity is approximately $N (N_1 + N_2 + N_3 + \dots + N_n)$. This is much less than the original $N^2$ for a direct implementation of Equation (3.1). It is common knowledge that a multiplier unit is more complex than a simple adder, so the complexity of the DFT units is often expressed in terms of the multiplier complexity alone.

Radix-2 Algorithm

When $N$ is of the form $N = 2^n$, it can be factored as N = 2 x 2 x 2 x ... x 2. Thus all the individual blocks that are implemented would be 2-point FFT blocks, which require no multiplications at all. All the butterfly coefficients would then be implemented as part of the gluing logic that connects the individual blocks. If the first level of factorization is N = 2 x N/2, i.e., $N_1 = 2$ and $N_2 = N/2$, then the time domain and frequency domain indices (n, k) can be written as
\[ n = 2 n_2 + n_1, \qquad n_1 \in [0, 1],\; n_2 \in [0, \tfrac{N}{2} - 1] \tag{3.13} \]
\[ k = \tfrac{N}{2} k_1 + k_2, \qquad k_1 \in [0, 1],\; k_2 \in [0, \tfrac{N}{2} - 1] \tag{3.14} \]
Substituting from Equations (3.13) and (3.14) in Equation (3.9),
\[ X\left[\tfrac{N}{2} k_1 + k_2\right] = \sum_{n_1=0}^{1} W_2^{n_1 k_1} \cdot W_N^{n_1 k_2} \cdot \sum_{n_2=0}^{N/2-1} x[2 n_2 + n_1]\, W_{N/2}^{n_2 k_2} \]
\[ = \sum_{n_2=0}^{N/2-1} x[2 n_2]\, W_{N/2}^{n_2 k_2} + (-1)^{k_1} \cdot W_N^{k_2} \cdot \left[ \sum_{n_2=0}^{N/2-1} x[2 n_2 + 1]\, W_{N/2}^{n_2 k_2} \right] \]
\[ X\left[\tfrac{N}{2} k_1 + k_2\right] = X_0'[k_2] + (-1)^{k_1} \cdot W_N^{k_2} \cdot X_1'[k_2] \tag{3.15} \]
where the terms $X_i'$ are (N/2)-point DFT units. The first term is a grouping of even-indexed terms in the time domain and the second term is a grouping of odd-indexed terms


in the time domain. Evaluating this result for $k_1 = 0$ and $k_1 = 1$ gives the lower and upper halves of the frequency domain output as
\[ X[k_2] = X_0'[k_2] + W_N^{k_2} \cdot X_1'[k_2], \qquad 0 \le k_2 \le \tfrac{N}{2} - 1 \tag{3.16} \]
\[ X\left[k_2 + \tfrac{N}{2}\right] = X_0'[k_2] - W_N^{k_2} \cdot X_1'[k_2], \qquad 0 \le k_2 \le \tfrac{N}{2} - 1 \tag{3.17} \]
A block representation of a basic radix-2 implementation unit is shown in Figure (3.2).

Figure 3.2 Radix-2 repetitive unit

This one level of reduction reduces the implementation complexity to the sum of the multiplier complexities of two N/2-point DFT units plus N/2 multiplications and N additions. If $N = 2^m$ and we factorize $N$ $m$ times, then the multiplier complexity is given by
\[ \text{Multiplier Complexity} = (N/2) \log_2(N) \tag{3.18} \]
and the addition complexity is given by
\[ \text{Addition Complexity} = N \log_2(N) \tag{3.19} \]
Further, if we examine the twiddle factors to be multiplied, we see that some of them are trivial multiplications by $W_N^0 = 1$, $(-1)^k$ or $(-j)^k$. A detailed diagram of a radix-2 8-point implementation is shown in Figure (3.3). The resulting algorithm is called a radix-2 fast Fourier transform (FFT).
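Equations (3.16) and (3.17) translate directly into a recursive routine. The following is a minimal software sketch, assuming the input length is a power of two; it is meant to illustrate the butterfly structure and is not the fixed-point FPGA design of this thesis. The names `fft_radix2` and `multiplier_count` are hypothetical.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # X0': N/2-point DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # X1': N/2-point DFT of the odd-indexed samples
    half = n_points // 2
    out = [0j] * n_points
    for k2 in range(half):
        t = cmath.exp(-2j * cmath.pi * k2 / n_points) * odd[k2]  # W_N^k2 * X1'[k2]
        out[k2] = even[k2] + t         # Equation (3.16)
        out[k2 + half] = even[k2] - t  # Equation (3.17)
    return out


def multiplier_count(n_points):
    """Multiplier complexity (N/2)*log2(N) of Equation (3.18), N a power of two."""
    return (n_points // 2) * (n_points.bit_length() - 1)


# A single complex tone with 3 cycles in 8 samples lands entirely in bin 3.
tone = [cmath.exp(2j * cmath.pi * 3 * n / 8) for n in range(8)]
spectrum = fft_radix2(tone)
print([round(abs(v), 6) for v in spectrum])
print(multiplier_count(8), 8 * 8)
```

For the 8-point case, Equation (3.18) gives 12 complex multiplications compared with 64 for the direct form, and several of those 12 are the trivial factors noted above.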


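Figure 3.3 Implementation of a Radix-2 8-point FFT unit

Radix-4 Algorithm

The radix-4 FFT algorithm goes a step further in reducing the complexity of a DFT implementation. When $N$ is a power of 4, i.e., N = 4 x 4 x 4 x ... x 4, $N$ can be factorized as N = 4 x (N/4), so that $N_1 = 4$ and $N_2 = N/4$. The time and frequency domain indices (n, k) can therefore be expressed as
\[ n = 4 n_2 + n_1, \qquad n_1 \in [0, 3],\; n_2 \in [0, \tfrac{N}{4} - 1] \tag{3.20} \]
\[ k = \tfrac{N}{4} k_1 + k_2, \qquad k_1 \in [0, 3],\; k_2 \in [0, \tfrac{N}{4} - 1] \tag{3.21} \]
Substituting in Equation (3.9) and using $W_4 = -j$,
\[ X\left[\tfrac{N}{4} k_1 + k_2\right] = \sum_{n_1=0}^{3} W_4^{n_1 k_1} \cdot W_N^{n_1 k_2} \cdot \sum_{n_2=0}^{N/4-1} x[4 n_2 + n_1]\, W_{N/4}^{n_2 k_2} \tag{3.22} \]
\[ X\left[\tfrac{N}{4} k_1 + k_2\right] = \sum_{n_2=0}^{N/4-1} x[4 n_2]\, W_{N/4}^{n_2 k_2} + (-j)^{k_1} W_N^{k_2} \sum_{n_2=0}^{N/4-1} x[4 n_2 + 1]\, W_{N/4}^{n_2 k_2} + (-1)^{k_1} W_N^{2 k_2} \sum_{n_2=0}^{N/4-1} x[4 n_2 + 2]\, W_{N/4}^{n_2 k_2} + (j)^{k_1} W_N^{3 k_2} \sum_{n_2=0}^{N/4-1} x[4 n_2 + 3]\, W_{N/4}^{n_2 k_2} \tag{3.23} \]
\[ X\left[\tfrac{N}{4} k_1 + k_2\right] = X_0'[k_2] + (-j)^{k_1} W_N^{k_2} X_1'[k_2] + (-1)^{k_1} W_N^{2 k_2} X_2'[k_2] + (j)^{k_1} W_N^{3 k_2} X_3'[k_2] \tag{3.24} \]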


The $X_i'$ terms in Equation (3.24) are all N/4-point FFT units of the grouped samples of type (4m+i). Thus the complexity reduces to the sum of the complexities of four N/4-point FFT units plus 3N/4 complex multiplications and 3N complex additions. Writing out the right hand side of the equation for each value of $k_1$ reduces this further.
\[ X[k_2] = \left( X_0'[k_2] + W_N^{2 k_2} X_2'[k_2] \right) + \left( W_N^{k_2} X_1'[k_2] + W_N^{3 k_2} X_3'[k_2] \right) \tag{3.25} \]
\[ X\left[k_2 + \tfrac{N}{4}\right] = \left( X_0'[k_2] - W_N^{2 k_2} X_2'[k_2] \right) - j \left( W_N^{k_2} X_1'[k_2] - W_N^{3 k_2} X_3'[k_2] \right) \tag{3.26} \]
\[ X\left[k_2 + \tfrac{N}{2}\right] = \left( X_0'[k_2] + W_N^{2 k_2} X_2'[k_2] \right) - \left( W_N^{k_2} X_1'[k_2] + W_N^{3 k_2} X_3'[k_2] \right) \tag{3.27} \]
\[ X\left[k_2 + \tfrac{3N}{4}\right] = \left( X_0'[k_2] - W_N^{2 k_2} X_2'[k_2] \right) + j \left( W_N^{k_2} X_1'[k_2] - W_N^{3 k_2} X_3'[k_2] \right) \tag{3.28} \]
From Equations (3.25), (3.26), (3.27) and (3.28), we observe that the number of complex additions reduces from 3N to 2N, because the bracketed sums and differences are shared between outputs. If the factorization is carried through all $\log_4(N) = \tfrac{1}{2}\log_2(N)$ stages, then
\[ \text{Multiplier Complexity} = (3N/8) \log_2(N) \tag{3.29} \]
and
\[ \text{Addition Complexity} = N \log_2(N) \tag{3.30} \]
This implementation can also be seen as a repetition of a fundamental unit, which is the radix-4 block shown in Figure (3.4).

Figure 3.4 Radix-4 basic block
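The four output equations (3.25) to (3.28) can be exercised with a small recursive sketch. It assumes the input length is a power of four, uses floating point arithmetic rather than the fixed-point arithmetic of the FPGA designs, and compares the result against a direct DFT; `direct_dft` and `fft_radix4` are hypothetical names used only here.

```python
import cmath


def direct_dft(x):
    """Direct evaluation of Equation (3.2), used only as a reference."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points)) for k in range(n_points)]


def fft_radix4(x):
    """Recursive radix-4 FFT; len(x) is assumed to be a power of four."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    quarter = n_points // 4
    # Xi'[k2]: N/4-point transforms of the samples x[4*n2 + i], i = 0..3.
    sub = [fft_radix4(x[i::4]) for i in range(4)]
    out = [0j] * n_points
    for k2 in range(quarter):
        w1 = cmath.exp(-2j * cmath.pi * k2 / n_points)  # W_N^k2
        a = sub[0][k2]
        b = w1 * sub[1][k2]
        c = w1 ** 2 * sub[2][k2]
        d = w1 ** 3 * sub[3][k2]
        # The sums (a + c), (a - c), (b + d), (b - d) are shared between outputs.
        out[k2] = (a + c) + (b + d)                     # Equation (3.25)
        out[k2 + quarter] = (a - c) - 1j * (b - d)      # Equation (3.26)
        out[k2 + 2 * quarter] = (a + c) - (b + d)       # Equation (3.27)
        out[k2 + 3 * quarter] = (a - c) + 1j * (b - d)  # Equation (3.28)
    return out


x = [complex(n % 5, n % 3) for n in range(16)]
max_err = max(abs(a - b) for a, b in zip(fft_radix4(x), direct_dft(x)))
print(max_err)
```

The multiplications by j and -j in the butterfly only swap and negate real and imaginary parts, which is why they do not count as true multiplications in hardware.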


We observe that the basic repetitive block in the radix-4 algorithm, just like the radix-2 block, does not contain any actual multiplications. The resulting algorithm is called a radix-4 FFT.

Chirp-z Algorithm

One feature of the Cooley-Tukey algorithm is that the frequency resolution is always tied to the number of points at the input of the FFT unit. The only way to increase the spectral resolution is to increase the number of data points at the input of the DFT unit. Also, the algorithm only maps onto equally spaced locations on the unit circle in the z-domain. As a result, the FFT can only provide constant bandwidth analysis in the form of N equal frequency bands of constant gain. An alternative is to map onto a contour close to the poles in the z-domain so that the spectral resolution is improved. The Chirp-z transform algorithm avoids these problems: it gives the freedom to choose the range of frequencies to be analyzed independently of the sampling rate, and the frequency response resolution is determined by the chosen contour in the z-domain. If the contour is chosen to be the unit circle, the Chirp-z transform and the Cooley-Tukey FFT algorithm produce the same result. The downside of the Chirp-z implementation is its higher implementation complexity and slower performance in comparison with the Cooley-Tukey FFT algorithm.

The algorithm modifies the z-domain mapping used to represent the FFT of the signal. If x(n) is an N-point sequence, its z-transform evaluated at the points $z_k$ is defined as
\[ X(z_k) = \sum_{n=0}^{N-1} x[n]\, z_k^{-n}, \qquad k \in [0, L-1] \tag{3.31} \]


Here $L$ represents the number of frequency domain outputs; clearly this is independent of the sampling rate. The z-domain contour is chosen to start at a point close to the poles of the system that is to be resolved. It is a continuous track, which could also be the unit circle as in the case of the DFT. Though we have only an N-point sequence x[n] in the time domain, we can have an L-point sequence $X[z_k]$ in the frequency domain. If $z_0 = r_0 e^{j\theta_0}$ is the origin of the contour, then the contour spirals either inwards or outwards, depending on the value of $R$, according to the equation
\[ z_k = z_0 \left( R\, e^{j\phi_0} \right)^k, \qquad k \in [0, L-1] \]
This results in
\[ z_k = r_0 e^{j\theta_0} \left( R\, e^{j\phi_0} \right)^k, \qquad k \in [0, L-1] \tag{3.32} \]
Here $(r_0, \theta_0)$ represents the origin of the contour spiral, $\phi_0$ is the angular step between successive points on the contour, and $R$ determines the convergence or divergence of the contour. If $R < 1$ the contour spirals inwards towards the origin, and if $R > 1$ the contour spirals away from the origin. If $R = 1$, the contour is a circle of radius $r_0$. Substituting Equation (3.32) in Equation (3.31),
\[ X[z_k] = \sum_{n=0}^{N-1} x[n] \left( r_0 e^{j\theta_0} \left( R\, e^{j\phi_0} \right)^k \right)^{-n} = \sum_{n=0}^{N-1} x[n] \left( r_0 e^{j\theta_0} \right)^{-n} \left( R\, e^{j\phi_0} \right)^{-nk} \tag{3.33} \]
If we define $V = R\, e^{j\phi_0}$,
\[ X[z_k] = \sum_{n=0}^{N-1} x[n] \left( r_0 e^{j\theta_0} \right)^{-n} V^{-nk} \tag{3.34} \]
To simplify this further, we use the relation


\[ nk = \tfrac{1}{2} \left[ n^2 + k^2 - (k - n)^2 \right] \tag{3.35} \]
in Equation (3.34):
\[ X[z_k] = \sum_{n=0}^{N-1} x[n] \left( r_0 e^{j\theta_0} \right)^{-n} V^{-n^2/2}\, V^{-k^2/2}\, V^{(k-n)^2/2} \]
\[ X[z_k] = V^{-k^2/2} \sum_{n=0}^{N-1} \left[ x[n] \left( r_0 e^{j\theta_0} \right)^{-n} V^{-n^2/2} \right] V^{(k-n)^2/2} \tag{3.36} \]
Defining the grouped term to represent a new sequence g(n),
\[ g(n) = x[n] \left( r_0 e^{j\theta_0} \right)^{-n} V^{-n^2/2}, \qquad n \in [0, N-1] \tag{3.37} \]
In the case of a circular mapping (R = 1),
\[ z_k = r\, e^{j2\pi k/N} \tag{3.38} \]
the $z_k$ are equally spaced points along a circle of radius $r$. Then
\[ X[z_k] = \sum_{n=0}^{N-1} x(n)\, r^{-n} e^{-j2\pi nk/N} = \sum_{n=0}^{N-1} \left[ x(n)\, r^{-n} \right] e^{-j2\pi nk/N} \]
Here the modified sequence is $y(n) = x(n)\, r^{-n}$, and it is sufficient to calculate the DFT of the modified sequence.

Returning to the more general case, consider a sequence h(n) defined as
\[ h(n) = V^{n^2/2} \tag{3.39} \]
Substituting from Equation (3.37) and Equation (3.39) in Equation (3.36),
\[ X(z_k) = V^{-k^2/2} \sum_{n=0}^{N-1} g(n)\, h(k - n) \tag{3.40} \]
Defining the convolution sum as another sequence y(k), we have


$$y(k) = \sum_{n=0}^{N-1} g(n) \, h(k-n) \tag{3.41}$$

Thus Equation (3.40) becomes

$$X(z_k) = \frac{y(k)}{h(k)} \tag{3.42}$$

Here the sequence y(n) is a convolution between a sequence g(n) of length N and a second sequence h(n) of infinite length. Taking an M-point segment of this infinite-length sequence for practical purposes, y(n) would be a sequence of length L given by

$$L = M - N + 1 \tag{3.43}$$

The convolution filter h(k) is usually implemented using charge coupled devices (CCD) or surface acoustic wave (SAW) devices. Since we want L frequency-domain outputs, the length of the convolution filter h(n) is taken to be

$$M = L + N - 1 \tag{3.44}$$

which implies that

$$-(N-1) \le n \le (L-1) \tag{3.45}$$

The computational complexity of the Chirp-z algorithm thus depends on M, requiring M*log2(M) complex multiplications. This can be compared with the N*L complex multiplications of direct computation as follows. When L is small, direct computation is more efficient, but when L is large, the Chirp-z transform is better. To calculate a DFT, set the contour parameters as $r_0 = R = 1$, $\theta_0 = 0$, $\phi_0 = 2\pi/N$ and L=N. Then h(n) from Equation (3.39) simplifies into

$$h(n) = \cos\left(\frac{\pi n^2}{N}\right) + j \sin\left(\frac{\pi n^2}{N}\right) = h_r(n) + j h_i(n) \tag{3.46}$$


and

$$h^{*}(n) = V^{-n^2/2} = \cos\left(\frac{\pi n^2}{N}\right) - j \sin\left(\frac{\pi n^2}{N}\right) \tag{3.47}$$

These coefficients are stored in a ROM for the pre-multiplications and post-multiplications. The algorithm is implemented as shown in Figure 3.5.

Figure 3.5 Chirp-z implementation

The sequence h(n) is a complex exponential whose argument grows as $n^2$. It can be thought of as a term of continuously increasing frequency, since $\omega n = n^2 \phi_0 / 2 = (n \phi_0 / 2) n$. This signal, shown in Figure 3.6, has an increasing frequency and sounds like the chirp of a bird.

Figure 3.6 Chirp signal

There are some interesting properties of this chirp signal which enhance the applications of the Chirp-z algorithm for the computation of the DFT. The phase of the


signal in Figure 3.6 is parabolic, as shown in Figure 3.7. The figure shows a linear region as well as curvature in the phase, so the phase can be expressed in the form $\text{Phase}(n) = \alpha n + \beta n^2$. Here $\alpha$ determines the linear region and $\beta$ determines the curvature in Figure 3.7.

Figure 3.7 Phase response of the chirp signal shown in Figure 3.6

An interesting point is that a chirp signal can be obtained from an impulse by passing the impulse through a system with unit magnitude response and the phase response shown in Figure 3.7, and vice versa. The reverse operation requires a system with unit magnitude response and the opposite phase response. This property of the chirp signal encourages its use in radar systems, which require short pulses with high energy. The relation between the tolerances of the chirp system and the DFT algorithm is obtained from the equation describing the incremental evolution factor $\phi_0$. Defining $f_2$ and $f_1$ to be the maximum and minimum operating frequencies, and with $F_s$ the sampling frequency,

$$\phi_0 = \frac{2\pi (f_2 - f_1)}{N F_s} \tag{3.48}$$

The frequency resolution in a conventional DFT system is given by

$$\phi_{DFT} = \frac{2\pi}{N} \tag{3.49}$$

From Equations (3.48) and (3.49), we see that

$$\phi_0 = \phi_{DFT} \, \frac{f_2 - f_1}{F_s} \tag{3.50}$$


Since the analyzed band $f_2 - f_1$ does not exceed $F_s$, this implies that the Chirp-z algorithm has a smaller frequency increment than the DFT for a given N. It also indicates that the number of points required to achieve a particular spectral resolution is smaller when using the Chirp-z algorithm.
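The derivation in Equations (3.37) to (3.42) can be checked numerically. The following Matlab sketch is a minimal illustration and not the implementation used in this work. It evaluates the DFT special case ($r_0 = R = 1$, $\theta_0 = 0$, $\phi_0 = 2\pi/N$, L = N) through the pre-multiplication, convolution and post-multiplication steps, and compares the result with the built-in fft. The test vector x and all variable names are assumptions chosen for illustration.

```matlab
% Chirp-z evaluation of an N-point DFT following Equations (3.37)-(3.42).
% Assumed contour: r0 = R = 1, theta0 = 0, phi0 = 2*pi/N, L = N.
N    = 8;
x    = (1:N) + 1j*(N:-1:1);              % arbitrary test input (illustrative)
phi0 = 2*pi/N;

% V^(m^2/2) with V = exp(j*phi0), written directly as a complex exponential
chirp = @(m) exp(1j*phi0*(m.^2)/2);

n = 0:N-1;
g = x .* conj(chirp(n));                 % g(n) = x(n)*V^(-n^2/2), Eq. (3.37)

m = -(N-1):(N-1);                        % lags required by Eq. (3.45) with L = N
h = chirp(m);                            % chirp filter h(n), Eq. (3.39)

yfull = conv(g, h);                      % linear convolution, length 3N-2
y     = yfull(N:2*N-1);                  % keep the lags k = 0 .. N-1, Eq. (3.41)

k = 0:N-1;
X = y ./ chirp(k);                       % X(z_k) = y(k)/h(k), Eq. (3.42)

err = max(abs(X - fft(x)));              % deviation from the built-in FFT
```

The convolution is written with conv for clarity; a hardware realization would replace it with the FIR, CCD or SAW filter discussed in the text.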


CHAPTER 4
FIELD PROGRAMMABLE GATE ARRAYS

Field programmable gate arrays (FPGAs) are a class of programmable devices that house large circuits with gate counts exceeding 20,000 gates, a count that is too large to fit onto a complex programmable logic device (CPLD). CPLDs have blocks of AND gates interfacing with blocks of OR gates. Unlike CPLDs, FPGAs have logic blocks interconnected by sets of programmable switches. The structure of a general FPGA is shown in Figure 4.1.

Figure 4.1 General structure of an FPGA

There are three types of blocks in the figure: I/O blocks, logic blocks and interconnection switches. The logic blocks, all identical and usually standardized for each family of devices, are arranged in a regular matrix. The I/O blocks interface the internal circuitry to the external pins. The interconnection switches connect the I/O blocks and the logic blocks. These switches, shown in Figure 4.2, are


programmable and form a connection between a horizontal and a vertical line based on the value of the SRAM cell ('0' for no connection, Vv ≠ Vh, and '1' for a formed connection, Vv = Vh).

Figure 4.2 Programmable interconnection switch

Each logic segment of a user program must be small enough to fit into a logic block. Each logic block is usually an implementation of look-up tables (LUTs), multiplexers or general gates. An LUT implementation of the three-input function f = X1X2 + X1X3 + X2X3 is shown in Figure 4.3, and its truth table is given in Table 4.1.

Figure 4.3 A 3-input LUT implementation


Table 4.1 Truth table of the function implemented in Figure 4.3

X1  X2  X3  f
0   0   0   0
0   0   1   0
0   1   0   0
0   1   1   1
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1

In Figure 4.3, each multiplexer is controlled by a single input, which decides which of its data inputs is passed to the output. Usually the number of inputs to the LUT is about five, which would then require 32 storage cells. Since FPGAs are volatile, they must be reprogrammed every time they are powered up. An alternative solution is to have a RAM/ROM memory block that automatically supplies the requisite values at power-on. Each of the memory cells holds a value of either '1' or '0'. The values that are fed into the SRAM cells are calculated using a simple protocol. If the entries of the truth table are placed in ascending order, the first level of multiplexers is controlled by the least significant bit (LSB), and the last level of multiplexers is controlled by the most significant bit (MSB), then the SRAM values are in the same order as the output values of the truth table. When a circuit is implemented in an FPGA, the logic blocks are programmed to realize the necessary functions and the programmable switches are programmed to make the suitable interconnections.
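The SRAM-ordering rule can be illustrated with a short Matlab model. This is a behavioral sketch only, under the assumption that X1 is the MSB and X3 the LSB of the LUT address; the names sram and lut are hypothetical. The eight SRAM cells hold the output column of Table 4.1 in ascending input order, and the multiplexer tree is modelled as an indexed read.

```matlab
% 3-input LUT for f = X1*X2 + X1*X3 + X2*X3 (Table 4.1), modelled as an
% 8-entry SRAM array addressed by the inputs (X1 = MSB, X3 = LSB).
sram = [0 0 0 1 0 1 1 1];                 % truth-table outputs, ascending order

% The multiplexer tree selects one SRAM cell; +1 because Matlab is 1-based.
lut = @(x1, x2, x3) sram(4*x1 + 2*x2 + x3 + 1);

% Check the LUT against the Boolean expression for all eight input patterns.
ok = true;
for addr = 0:7
    x1 = bitget(addr, 3);                 % most significant address bit
    x2 = bitget(addr, 2);
    x3 = bitget(addr, 1);                 % least significant address bit
    f  = (x1 & x2) | (x1 & x3) | (x2 & x3);
    ok = ok && (lut(x1, x2, x3) == f);
end
```

Changing the implemented function only requires a different sram vector, which is the sense in which the logic block is programmable.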


Power Calculations in FPGAs

Power dissipation in digital circuits is often the limiting factor in the utility of a particular circuit in an application. For the purpose of this thesis, Altera FLEX10KE FPGAs were used, and their power dissipation is given by the following equation [8].

$$\text{Power} = I_{CCINT} \cdot V_{CCINT} + \sum_{n=1}^{d} P_{DCn} + 0.5 \cdot OUT \cdot C_{AVE} \cdot V_O \cdot f_{MAX} \cdot tog_{IO} \cdot V_{CCIO} \tag{4.1}$$

where
$I_{CCINT}$ = no-load current in the device
$V_{CCINT}$ = no-load voltage VCC
d = number of DC outputs
$P_{DCn}$ = DC power of output n
$f_{MAX}$ = maximum frequency of operation
$tog_{IO}$ = average number of I/O pins toggling at each clock
$V_{CCIO}$ = DC power supply value
OUT = number of output and bi-directional pins
$C_{AVE}$ = average capacitance of the FPGA device
$V_O$ = voltage level of the high output state

Costs Involved in FPGA Fabrication

The actual cost of FPGA fabrication consists of the engineering costs and the price of the tools (software/hardware). The engineering cost for an FPGA design is much less than that of its ASIC counterpart, but the difference becomes most evident when the FPGA costs are compared with the manufacturing cost of ASIC devices. ASIC devices have high non-recurring engineering (NRE) costs and longer times to market. This is the major advantage of the FPGA. The break-even number for an FPGA design can be found as follows.

FPGA cost = Engineering costs & tools + total sales for all items sold

On the other hand,

ASIC cost = NRE + Engineering cost & tools + total sales for all items sold + Re-spin cost + Inventory costs + Accounting for future price reductions.


When the additional costs of the ASICs are considered, FPGAs are a much better choice for even a moderate volume of sales.

Comparison to Other Technologies

Programmability is a major advantage of FPGAs that is absent in ASIC implementations of digital circuits. Unlike FPGAs, ASICs cannot be changed at will. The design and testing cycles for an FPGA are much shorter than for an ASIC, and hence an FPGA-based product can be marketed much faster. However, for high-volume production, ASIC implementations are much cheaper. Since optimizations can be done down to the gate level in ASIC implementations, they are also more power efficient.
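The break-even volume implied by the two cost expressions can be sketched as follows. Every figure in this Matlab fragment is a hypothetical placeholder chosen only to make the arithmetic concrete; none of the values comes from this work or from vendor data. The fixed ASIC cost is assumed to lump together NRE, engineering, re-spin and inventory costs.

```matlab
% Break-even volume between an FPGA and an ASIC implementation.
% All cost figures are HYPOTHETICAL placeholders, not measured data.
fpga_fixed = 50e3;     % engineering costs and tools for the FPGA design
fpga_unit  = 40;       % cost per FPGA device
asic_fixed = 500e3;    % NRE + engineering + re-spin + inventory for the ASIC
asic_unit  = 10;       % cost per ASIC device

% Total cost for n units is fixed + n*unit; equate the two and solve for n.
n_breakeven = (asic_fixed - fpga_fixed) / (fpga_unit - asic_unit);

% Below the break-even volume the FPGA is cheaper, above it the ASIC is.
cost_fpga = @(n) fpga_fixed + n*fpga_unit;
cost_asic = @(n) asic_fixed + n*asic_unit;
```

With these placeholder numbers the two cost curves cross at 15,000 units; the point of the sketch is the structure of the comparison, not the specific value.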


CHAPTER 5
IMPLEMENTATION DETAILS AND RESULTS

This chapter describes the work done to obtain the results, the inferences drawn from them, and the tools used.

Description of the Work

The thesis work is based on modules built for the purposes of this thesis rather than on the standard modules provided with the Altera tools. This was done to obtain an in-depth understanding of pipelining and of FPGA concepts in general. The Cooley-Tukey and Chirp-z algorithms were implemented using fixed-point 2's complement integer arithmetic in VHDL and Verilog. The Cooley-Tukey FFTs were fit into the Altera FLEX10KE family of devices, but the Chirp-z FFTs were too cumbersome (24 multipliers of the type used in this work for a 4-point implementation) to fit onto the FPGAs. Matlab models were built for those designs which could be fit onto Altera FPGAs. These models were used for observing the performance of the FFTs in varying noise environments.

The fixed-point implementation allows power-efficient, high-speed operation at low cost, which makes it well suited to mobile/portable applications [10]. On the other hand, this implementation loses precision, thus decreasing the dynamic range and increasing the round-off noise. It is important to note that the complexity of the design increases exponentially as the internal bus width is increased. From the packaging point of view, the pin count is also a major issue. If the complex input were 16 bits wide, a parallel input for an N-point FFT (N>4) would be a virtual impossibility. A workaround for this problem is to provide only two 16-bit inputs that take in the real and imaginary data simultaneously at clocked intervals. The main drawback of this approach is that the FFT assumes a serial form, and hence the FFT operates N times slower.

The effect of the bus width is also visible at the output. Noise is introduced in the system due to insufficient representation of the numbers in the system. This type of noise is called round-off noise, and it propagates through the successive stages of the system. Initially, the system was designed with an internal bus width of only 16 bits throughout (all the twiddle factors and the outputs of all stages). This requires rounding off the output of a 16-by-16-bit multiplier from 32 bits to 16 bits (thereby losing 16 bits of precision), and 16-bit adders (losing a bit of precision at each unit). Tables 5.1 and 5.2 give the implementation details of the case where the round-off errors have not been eliminated. The deficiencies in the preliminary design were removed by employing the following techniques.

1. Each multiplier output is not truncated until it has passed through one more level of processing (Figure 5.1).
2. The conventional multiplier was replaced with a pipelined multiplier (Figure 5.2), which greatly reduced the speed bottleneck and increased the maximum operating frequency of the system.
3. All 16-bit adder/subtractor units were replaced with 32-bit units.
4. A serial-to-parallel converter is placed at the input unit and a parallel-to-serial converter is placed after the output unit to reduce the pin count.
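The benefit of the first technique, delaying truncation by one level, can be illustrated with integer arithmetic. The following Matlab fragment is a minimal sketch and not the VHDL/Verilog design: it assumes a Q15 scaling (15 fractional bits), uses two arbitrary operand pairs, and compares truncating each product before the adder with truncating once after a full-width addition.

```matlab
% Q15 fixed-point sketch: truncate each product to 16 bits before the adder
% (early) versus carrying full-width products into the adder and truncating
% once afterwards (late). Operand values are illustrative assumptions.
a = [12345 -23456];                      % Q15 integers (value = a / 2^15)
b = [ 3211   4567];
p = a .* b;                              % exact products, Q30 scaling

early = sum(floor(p / 2^15));            % truncate, then add (16-bit datapath)
late  = floor(sum(p) / 2^15);            % add at full width, truncate once

exact     = sum(p) / 2^15;               % unrounded reference in Q15 units
err_early = abs(exact - early);          % accumulates one error per product
err_late  = abs(exact - late);           % always below one LSB
```

Because floor(u) + floor(v) never exceeds floor(u + v), the late truncation error is never larger than the early one, which is consistent with the improvement reported when the truncation point was moved.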


Figure 5.1 Implementation of multipliers: (a) the initial truncating configuration; (b) the truncation operation after one more level of processing

Figure 5.2 N-by-N-bit pipelined multiplier

In general, the internal bus width was standardized to 32 bits. These adjustments greatly increased the complexity of the design. The pipelined multiplier required 3847 logic blocks, while the conventional Baugh-Wooley multiplier required only 2160 logic blocks for its implementation. Due to the pipelined multiplier implementation, the maximum possible operating frequency increased from about 7 MHz to about 58 MHz and 30 MHz for the Radix-2 and Radix-4 Cooley-Tukey 8-point FFT implementations, respectively. The pipelined multiplier was a 7-stage design built from smaller 4-by-4-bit multiplications. Tables 5.3 and 5.4 quantify the implementation issues of the case where round-off errors were eliminated to a great extent.
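The idea of assembling a wide multiplier from 4-by-4-bit multiplications can be sketched in Matlab as follows. This is an unsigned, purely arithmetic illustration under assumed operand values; it models neither the 2's complement handling of the Baugh-Wooley cells nor the seven pipeline registers of the actual design.

```matlab
% Unsigned 16-by-16-bit multiply assembled from sixteen 4-by-4-bit partial
% products. Operand values are illustrative assumptions.
x = 51234;                               % unsigned 16-bit operands
y = 40321;

% Split each operand into four 4-bit digits, least significant digit first.
xn = mod(floor(x ./ 16.^(0:3)), 16);
yn = mod(floor(y ./ 16.^(0:3)), 16);

acc = 0;
for i = 1:4
    for j = 1:4
        pp  = xn(i) * yn(j);             % one 4-by-4-bit multiplication (< 256)
        acc = acc + pp * 2^(4*(i+j-2));  % weight by the two digit positions
    end
end
```

In hardware, the partial products would be computed in parallel and the additions distributed over pipeline stages, which shortens the critical path at the expense of additional logic blocks.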


Description of Tools Used

The Altera MAX+PLUS II software was used to synthesize the VHDL/Verilog code. The software contains tools to compile, simulate and edit the floor plan of the design. The compiler was equally optimized for area and speed. The designs were fit into the FLEX10KE device family. The FLEX10KE family of devices has the following features:

- High gate density implementation
- Accommodates designs of about 200,000 typical gates
- 4096 SRAM bits per Embedded Array Block (EAB)
- Multi-volt I/O pins (2.5 V, 3.3 V or 5.0 V devices)
- Built-in Joint Test Action Group (JTAG) Boundary Scan Test (BST) circuitry available without consuming additional device logic
- Built-in low-skew clock distribution trees
- Flexible fast track interconnects
- Powerful I/O pins

Each FPGA has an embedded array (EAB) and a logic array (LAB), which are useful for efficient implementations. The embedded array is used in implementing a variety of memory functions, complex logic functions, microcontroller applications and data transforms. The logic array is used to implement a multitude of general logic functions. Each LAB has 8 logic elements (LEs) and a local interconnect, and each LE has a four-input look-up table (LUT), a programmable flip-flop and a dedicated signal path for carry and cascade functions. The FPGA device can be configured either serially or in parallel, synchronously or asynchronously. The average capacitance of the device remains unchanged for any operating frequency.


Figure 5.3 Model used in the thesis work

Figure 5.3 shows the model used in the thesis work. The IFFT is only a special case of the FFT implementation: smaller-point IFFT units replace the smaller-point FFTs, and the twiddle factors are replaced by their complex conjugates. The FFTs were fit into the FLEX10KE family of devices. Up to 4-point FFTs could be accommodated in a single device for both the radix-2 and radix-4 cases, but their 8-point implementations required multiple devices. This reduced the percentage utilization of the logic blocks to a great extent. Matlab models were built for the FFTs for the Cooley-Tukey and Chirp-z algorithms. The models were used for performance evaluations under varying noise conditions. The results for the implementation are given in the following section.

Results and Conclusions

Table 5.1 Radix-2 Cooley-Tukey implementation with round-off errors.

N  Multipliers  LC count  Delay (ns)  FFT variance  FFT precision*  IFFT variance  IFFT precision*
2  0            150       29.7        1.01E-09      15              2.46E-10       16
4  0            609       43.2        2.57E-09      15              1.66E-10       16
8  3            4151      154.4       2.94E-07      10              5.13E-09       13

Table 5.2 Radix-4 Cooley-Tukey implementation with round-off errors.

N  Multipliers  LC count  Delay (ns)  FFT variance  FFT precision*  IFFT variance  IFFT precision*
4  0            723       40.9        2.13E-09      14              1.40E-10       16
8  3            4114      145.6       3.02E-07      10              5.53E-09       13
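The variance and precision columns of Table 5.2 can be related numerically. The scaling used below is an assumption on our part and is not stated in the text: the precision is taken as the number of whole bits corresponding to the standard deviation of the error, that is, floor(-0.5*log2(variance)). Under that assumed reading, the short Matlab check below reproduces the four precision entries of Table 5.2.

```matlab
% Relating the variance and precision columns of Table 5.2.
% ASSUMPTION: precision (bits) = floor(-0.5*log2(variance)); this convention
% is inferred from the tabulated values and is not stated in the text.
variance  = [2.13e-9 1.40e-10 3.02e-7 5.53e-9];   % 4-pt FFT, 4-pt IFFT, 8-pt FFT, 8-pt IFFT
tabulated = [14 16 10 13];                          % precision columns of Table 5.2

bits = floor(-0.5 * log2(variance));               % bits implied by the variance

matches = isequal(bits, tabulated);
```

The same expression agrees with the remaining tables to within about one bit, so it should be treated as a consistency check rather than as the definition used to produce the tables.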


Table 5.3 Radix-2 Cooley-Tukey implementation without round-off errors.

N  Multipliers  LC count  Delay (ns)  FFT variance  FFT precision**  IFFT variance  IFFT precision**
2  0            244       49.9        6.20E-10      15               1.46E-10       16
4  0            795       67.1        2.53E-09      14               1.53E-10       16
8  3            5744      169.1       1.68E-07      11               2.53E-09       14

Table 5.4 Radix-4 Cooley-Tukey implementation without round-off errors.

N  Multipliers  LC count  Delay (ns)  FFT variance  FFT precision*  IFFT variance  IFFT precision*
4  0            1381      81.1        1.01E-19      31              7.09E-21       33
8  3            5699      171.1       1.72E-07      11              2.62E-09       14

The results in Tables 5.1 to 5.4 can be interpreted as follows. The IFFT equation (Equation 3.1) has a factor (1/N) in it. Since this can be achieved by a simple shifting operation, the IFFT has more precision than the FFT in all the Cooley-Tukey implementations. The number of logic blocks required for the implementation without round-off errors is considerably larger than when the round-off errors were present. The increase in the amount of hardware has a direct impact on the propagation delays. However, the use of a multi-stage multiplier allows a pipelined implementation, and this increases the throughput of the entire system. The Baugh-Wooley complex multiplier has about 2160 logic blocks, as compared to the 3847 logic blocks of the complex pipelined multiplier. As the length of the FFT increases, the number of multipliers also rises, and that increases the implementation complexity of the system. Attempts to fit a 16-point FFT unit in both cases (with and without the round-off errors) proved futile. The use of library-parameterized modules (LPMs) is a possible solution to this problem. But since the purpose of this thesis was also to judge the performance of the systems in terms of latency, the use of a pipelined multiplier became a necessity. It is common knowledge


that the greater the number of stages in a design implementation, the less precision is preserved. The Radix-4 FFT implementation has fewer stages than the Radix-2 implementation, and hence its precision is greater. The theoretical precision of a fixed-point implementation is given by

$$\text{Precision} = \log_2(\text{Variance}) \tag{5.1}$$

and is calculated in bits. The tables reveal that the observed precision is very close to the theoretical case in almost all cases.

Power Calculations

The power dissipation in an FPGA is the sum of the internal power dissipation and the power dissipation due to the I/O. It is given by the formula

$$\text{EstimatedPower} = P_{INT} + P_{IO} = I_{CCINT} \cdot V_{CCINT} + P_{DCOUT} + P_{ACOUT}$$

$$= (I_{CCstandby} + I_{CCactive}) \cdot V_{CCINT} + \left\{ \sum_{n=1}^{OUT \cdot h} \frac{V_{CC}^2}{R_{ioh}} + \sum_{n=1}^{OUT \cdot l} \frac{V_{CC}^2}{R_{iol}} \right\} + 0.5 \cdot OUT \cdot C_{Average} \cdot V_O \cdot f_{MAX} \cdot tog_{IO} \cdot V_{CCIO} \tag{5.2}$$

Here
h = percentage of high DC outputs
l = percentage of low DC outputs
OUT = number of total output and bi-directional pins in an FPGA device
$f_{MAX}$ = maximum possible operating frequency
$V_{CC}$ = supply voltage (could be 2.5, 3.3 or 5.0 volts)
$R_{ioh}$ = pull-down resistance + resistance calculated from the slope of the IOH characteristics in the device datasheet


$R_{iol}$ = pull-up resistance + resistance calculated from the slope of the IOL characteristics in the device datasheet
$tog_{IO}$ = percentage of switching expected in the outputs (usually assumed to be 12.5%)
$V_O$ = the output high voltage at a particular value of VCC (3.8, 3.3 and 2.5 V for a VCCIO of 5.0, 3.3 and 2.5 V, respectively)
$C_{Average}$ = average capacitance of the family of devices, as specified in the data sheet

Usually the higher-point FFTs extend over more than one device, and an accurate power calculation must take into account the power for each of these devices. The power calculation for the biggest design implemented, i.e., the Radix-2 8-point FFT, is arrived at as follows. The implementation distributes itself over three different devices: EPF10K130EFC672-1 (a), EPF10K200EBC600-1 (b) and EPF10K100EF484-1 (c). The power calculation must take into consideration the requirements of the three chips separately.

Table 5.5 Power calculations for the Radix-2 8-point FFT.

Device  O/P pins  Logic blocks  VCC/VCCIO  K    ICCActive (A)  Iccsup (mA)  PINT (mW)  Pdc (mW)  Pac (mW)
a       144       1765          2.5 V      4.6  0.0593         150          167.099    32.89     768
b       181       1966          2.5 V      4.8  0.0689         250          191.178    41.34     966
c       112       2013          2.5 V      4.5  0.0662         125          184.265    25.58     598

fMAX = 58.47 MHz, VCCINT = 2.5 V, VCCIO = 2.5 V, ICCstandby = 7.5 mA, Rioh = 1400 ohms, Riol = 1007 ohms
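The PINT column of Table 5.5 can be reproduced from the other tabulated quantities. The following Matlab sketch rests on two assumptions that are not spelled out in the text: that the active current follows the relation I_CCactive = K * f_MAX * N * tog_LC (in microamperes, with f_MAX in MHz and N the number of logic blocks), and that the logic-cell toggle rate tog_LC is 12.5%. With these assumptions the computed values agree with the table, which also suggests that the active-current column is expressed in amperes.

```matlab
% Internal power of the three devices in Table 5.5.
% ASSUMPTIONS: I_CCactive = K*f_MAX*N*tog_LC in microamperes (f_MAX in MHz),
% and tog_LC = 12.5%; neither is stated explicitly in the text.
f_max  = 58.47;                          % maximum operating frequency, MHz
V_cc   = 2.5;                            % VCCINT, volts
I_stby = 7.5;                            % ICCstandby, mA
tog_LC = 0.125;                          % assumed logic-cell toggle rate

K    = [4.6 4.8 4.5];                    % device constants for (a), (b), (c)
N_LC = [1765 1966 2013];                 % logic blocks used in each device

I_active = K .* f_max .* N_LC .* tog_LC / 1000;   % uA converted to mA
P_int    = (I_stby + I_active) * V_cc;            % internal power, mW

P_int_table = [167.099 191.178 184.265];          % PINT column of Table 5.5
max_dev     = max(abs(P_int - P_int_table));      % largest deviation, mW
```

The three logic-block counts add up to 5744, the LC count listed for the 8-point design in Table 5.3.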


Assuming 50% high outputs, 50% low outputs, and 1 k pull-up and pull-down resistors, the total power is 2975.253 mW for a radix-2 8-point implementation with a 2.5 V VCC.

Noise Tolerance

The performance of each of the designs was tested under varying noise conditions. The radix-2 and radix-4 implementations perform identically for an equal bus width and number of points (Figures 5.4 and 5.5). Increasing the bus width brought a minor improvement to the system, but when compared to the floating-point implementation (Figure 5.6), the improvement is insignificant. So increasing the bus width beyond 16 bits in these implementations would not be of much value. The noise performance obtained on changing the radix is much better in the case of the 32-bit bus width.

Figure 5.4 BER variations against SNR for an internal bus width of 16.


Figure 5.5 BER variations against SNR for an internal bus width of 32.

Figure 5.6 BER variations against SNR: comparison of floating-point results with modeled Radix-2 and Radix-4 8-point FFTs with 16- and 32-bit internal bus widths.


The Cooley-Tukey algorithm was more feasible for implementation in the FPGAs than the Chirp-z algorithm, because the Chirp-z implementation requires an excessive number of multipliers even for lower-point FFTs. The Chirp-z algorithm usually has the FIR convolution blocks implemented on CCDs or SAW devices, which reduces the number of multipliers to a minimum. The implementation complexity of the Chirp-z algorithm is 6N complex multiplications, compared to N*log2(N) for the radix-2 Cooley-Tukey implementation, which puts the break-even point for the algorithms at sixty-four points. So any implementation greater than a 64-point FFT would be better off using the Chirp-z algorithm. Since the standard library parameterized modules were not used in the design, even a 4-point FFT could not be configured using the Chirp-z algorithm.

For the smaller-point FFTs, if the data could be clocked out after the ROM coefficient pre-multiplications to an external device (CCD or SAW device) for the circular convolution, the implementation complexity would be only 2N complex multiplications on the FPGA. This would involve converting the digital data into analog form at the output of the FPGA (the input of the CCD) and a re-conversion into digital form after a certain time for clocking the data back into the FPGA. This analog implementation of the circular convolution also preserves the precision to a great extent. Though this type of implementation is theoretically feasible and also efficient in terms of precision, its practical implementation would be avoided, since it involves an intermediate digital-to-analog conversion followed by an analog-to-digital conversion. Most applications use longer-point FFTs, and so the Chirp-z algorithm is a much better choice in those cases. The Cooley-Tukey algorithm implementation is also highly desirable for the reduction of complexity associated with it.
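The break-even point quoted above follows directly from equating the two multiplication counts. The short derivation below uses only the 6N and N*log2(N) figures given in the text; the two numerical spot checks (N = 8 and N = 1024) are added for illustration.

```latex
% Break-even length between Chirp-z (6N) and radix-2 Cooley-Tukey (N log2 N)
\begin{align*}
6N &= N \log_2 N \\
\log_2 N &= 6 \\
N &= 2^{6} = 64
\end{align*}
% Spot checks:
%   N = 8:    6N = 48,   N \log_2 N = 24     (Cooley-Tukey needs fewer)
%   N = 1024: 6N = 6144, N \log_2 N = 10240  (Chirp-z needs fewer)
```

Below 64 points the Cooley-Tukey count is smaller, and above 64 points the Chirp-z count is smaller, in line with the conclusion drawn in the text.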


Directions of Future Work

This thesis used fast multipliers that were designed specifically to allow a comparison between general multipliers and multi-stage fast multipliers. This restricted the maximum possible FFT length to eight. The synthesis software provides multiplier modules that are highly optimized for each family of FPGA devices. The use of such modules may reduce the power consumption and latency. Since these multipliers are usually much smaller, a higher-point FFT implementation using a greater number of such multipliers is possible, and further research can be done with these multiplier modules. The Chirp-z algorithm provides for more efficient implementations with much less hardware, and more work can be done using such an implementation. Also, the amount of precision loss in such a design would be independent of the length of the FFT. The increased spectral resolution provided by this algorithm with much less hardware would be an incentive to pursue the study of implementations of this algorithm on FPGAs.


APPENDIX A
16-BIT COOLEY-TUKEY IMPLEMENTATION

%%%%%%%%% bwcell1.m %%%%%%%%
function [s]=bwcell1(a,b)
% Cell 1 of the Baugh Wooley Multiplier
s=a & b;
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwcell2.m %%%%%%%%
function [S,cout]=bwcell2(a,b,Sin,Cin)
% Cell 2 of the Baugh Wooley Multiplier
d=a & b;
[S,cout]=fulladder(d,Sin,Cin);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwcell3.m %%%%%%%%
function [s]=bwcell3(a,b)
% Cell 3 of the Baugh Wooley Multiplier
s=a & ~b;
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwcell4.m %%%%%%%%
function [s,cout]=bwcell4(a,b,Sin,Cin)
% Cell 4 of the Baugh Wooley Multiplier
c=~a & b;
[s,cout]=fulladder(c,Sin,Cin);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwcell5.m %%%%%%%%
function [s,cout]=bwcell5(x,y)
% Cell 5 of the Baugh Wooley Multiplier
b=x & y;
c=~x;
d=~y;
[s,cout]=fulladder(b,c,d);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwm16a.m %%%%%%%%
function [p]=bwm16a(x,y)
% This function multiplies two 16-bit signed numbers and gives a 16-bit signed number
% as output.


% The logic implemented is the Baugh Wooley Multiplier
t=zeros(16);
c=zeros(16);
% First row
for i=16:-1:2
    t(1,i) = bwcell1( x(i),y(16) );
end
t(1,1)=bwcell3(x(1),y(16));
% row 2
for i=16:-1:2
    [t(2,i),c(2,i)]=bwcell2(x(i),y(15),t(1,i-1),0);
end
t(2,1)=bwcell3(x(1),y(15));
% Row three to row 15
for j=3:15
    for i=16:-1:2
        [t(j,i),c(j,i)]=bwcell2(x(i),y(17-j),t(j-1,i-1),c(j-1,i));
    end
    t(j,1)=bwcell3(x(1),y(17-j));
end
% row 16
for i=16:-1:2
    [t(16,i),c(16,i)]=bwcell4(x(i),y(1),t(15,i-1),c(15,i));
end
[t(16,1),c(16,1)]=bwcell5(x(1),y(1));
% Last Row
[temp,cout(16)]=fulladder(x(1),y(1),t(16,16));
for i=16:-1:2
    [p(i),cout(i-1)]=fulladder(c(16,i),t(16,i-1),cout(i));
end
[p(1),nc]=fulladder(1,c(16,1),cout(1));
clear c
clear t
clear cout
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% com.m %%%%%%%%
function [b]=com(a)
% This function calculates the twos complement of a number
tmp1=~a;
tmp2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
[b,tmp]=ad16c(tmp1,tmp2);
clear tmp1
clear tmp2
clear a


return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% cplxmul16.m %%%%%%%%
function [pr,pi]=cplxmul16(xr,xi,yr,yi)
% This function multiplies two 16-bit signed complex inputs and gives a 16-bit signed
% complex output.
t0=bwm16a(xr,yr);
t1=bwm16a(xr,yi);
t2=bwm16a(xi,yr);
t3=bwm16a(xi,yi);
pr=sub16(t0,t3);
pi=add16(t2,t1);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% fulladder.m %%%%%%%%
function [s,cout]=fulladder(a,b,c)
tmp1=(a & ~b) | (~a & b);
s=(tmp1 & ~c) | (~tmp1 & c);
tmp2= a & b;
tmp3= b & c;
tmp4= c & a;
cout= (tmp2 | tmp3) | tmp4;
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% getformat.m %%%%%%%%
function [format]=getformat(a)
% function gets the input vector a into a format that allows only real transmissions
% to be possible in the time domain data.
lena=length(a);
format=[real(a(1)); a(2:lena); imag(a(1)); conj(a(2:lena))];
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% hexa2bin.m %%%%%%%%
% Function returns the string(NOT ARRAY) containing the binary equivalent of the input hexadecimal number
% for example
% if b='f74b'
% then hexa2bin(b) would be equal to '1111011101001011'
% and if b='074b'
% then hexa2bin(b) would be equal to '11101001011'
% NOTE THAT THE INITIAL ZEROS ARE NOT PRESENT IN THE RESULT
function [s]=hexa2bin(a)
s=dec2bin(hex2dec(a));


return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% hex2decim.m %%%%%%%%
intebits=6;
fracbits=16-intebits;
hexad=1010111100000000;
a=[0];
for i=1:16
    digi=mod(hexad,10);
    hexad=floor(hexad/10);
    a=[digi a];
end
a=a(1:16);
decim=0;
factor=2^(-fracbits);
for i=16:-1:2
    decim=decim+a(i)*factor;
    factor=factor*2;
end
decim
if a(1)==1
    decim=decim-2^(intebits-1);
end
decim
%%%%%%%%% END %%%%%%%%

%%%%%%%%% iterative.m %%%%%%%%
% This program does the performance analysis of the program for FFT in terms
% of the errors and its types, execution times, etc, over a number of iterations.
E_mean=[];
E_var=[];
mag_err=[];
sign_err=[];
magsign_err=[];
E_no=[];
ratio_vector=[];
N=8; % index of the fft to be analysed
for i=1:1000 % i is the iteration index
    % i
    % Initialization of variables for each iteration
    Error_mean=0;
    Error_variance=0;
    Avg_no_of_magnitude_errors=0;
    Avg_no_of_sign_errors=0;
    Avg_no_of_mag_and_sign_errors=0;
    % preparation of inputs
    inr=rand(1,N);
    ini=rand(1,N);


    cplx=inr+j*ini;
    cplxfft=fft(cplx);
    test=ramasfft8(cplx);
    right=0;
    signerr=0;
    magerr=0;
    magsignerr=0;
    Error_mean1=sqrt(mean((test-reshape(cplxfft,[N,1])).^2));
    E_mean=[E_mean Error_mean1];
end
Error_mean=mean(abs(E_mean))
Error_variance=var(abs(E_mean))
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct16.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,or15,oi15]=rad2ct16(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7,ir8,ii8,ir9,ii9,ir10,ii10,ir11,ii11,ir12,ii12,ir13,ii13,ir14,ii14,ir15,ii15)
% this function calculates the 16-point radix-2 cooley tukey fft transform
%
% GOVERNING EQUATIONS AND ANALYSIS
%
% N = 16 ; => (N1 = 2) & (N2 = 8)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1)
% k2,n2 => (0,1,2,...7)
%
% RESULTS :
% EXECUTION TIME :
% FRACTIONAL PRECISION :
% ERROR OVER A RUN OF 100 TIMES :
% AVERAGE NUMBER OF ERRORS :
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY :
% SIGN ERRORS ONLY :
% BOTH MAGNITUDE AND SIGN :
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3,tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct8(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6,ir8,ii8,ir10,ii10,ir12,ii12,ir14,ii14);
[tr8,ti8,tr9,ti9,tr10,ti10,tr11,ti11,tr12,ti12,tr13,ti13,tr14,ti14,tr15,ti15]=rad2ct8(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7,ir9,ii9,ir11,ii11,ir13,ii13,ir15,ii15);


tra0=shrega(tr0); tia0=shrega(ti0);
tra1=shrega(tr1); tia1=shrega(ti1);
tra2=shrega(tr2); tia2=shrega(ti2);
tra3=shrega(tr3); tia3=shrega(ti3);
tra4=shrega(tr4); tia4=shrega(ti4);
tra5=shrega(tr5); tia5=shrega(ti5);
tra6=shrega(tr6); tia6=shrega(ti6);
tra7=shrega(tr7); tia7=shrega(ti7);
tra8=shrega(tr8); tia8=shrega(ti8);
% 16-bit fixed-point twiddle factors W16^1 .. W16^4 (real and imaginary parts)
w16_r1=[0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0];
w16_i1=[1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1];
w16_r2=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w16_i2=[1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0];
w16_r3=[0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 0];
w16_i3=[1 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0];
w16_r4=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w16_i4=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
[tra9,tia9]=cplxmul16(tr9,ti9,w16_r1,w16_i1);
[tra10,tia10]=cplxmul16(tr10,ti10,w16_r2,w16_i2);
[tra11,tia11]=cplxmul16(tr11,ti11,w16_r3,w16_i3);
[tra12,tia12]=cplxmul16(tr12,ti12,w16_r4,w16_i4);
% W16^5 .. W16^7 are obtained from the stored values by symmetry
[tra13,tia13]=cplxmul16(tr13,ti13,w16_i1,w16_i3);
[tra14,tia14]=cplxmul16(tr14,ti14,w16_i2,w16_i2);
[tra15,tia15]=cplxmul16(tr15,ti15,w16_i3,w16_i1);
[or0,oi0,or8,oi8]=rad2ct2(tra0,tia0,tra8,tia8);
[or1,oi1,or9,oi9]=rad2ct2(tra1,tia1,tra9,tia9);
[or2,oi2,or10,oi10]=rad2ct2(tra2,tia2,tra10,tia10);
[or3,oi3,or11,oi11]=rad2ct2(tra3,tia3,tra11,tia11);
[or4,oi4,or12,oi12]=rad2ct2(tra4,tia4,tra12,tia12);
[or5,oi5,or13,oi13]=rad2ct2(tra5,tia5,tra13,tia13);
[or6,oi6,or14,oi14]=rad2ct2(tra6,tia6,tra14,tia14);
[or7,oi7,or15,oi15]=rad2ct2(tra7,tia7,tra15,tia15);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct16a.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,or15,oi15]=rad2ct16a(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7,ir8,ii8,ir9,ii9,ir10,ii10,ir11,ii11,ir12,ii12,ir13,ii13,ir14,ii14,ir15,ii15)
% this function calculates the 16-point cooley tukey fft transform using 4-point blocks


%
% GOVERNING EQUATIONS AND ANALYSIS
%
% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 4 k2;
% n = 4 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)
% RESULTS :
% EXECUTION TIME :
% FRACTIONAL PRECISION :
% ERROR OVER A RUN OF 100 TIMES :
% AVERAGE NUMBER OF ERRORS :
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY :
% SIGN ERRORS ONLY :
% BOTH MAGNITUDE AND SIGN :
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4(ir0,ii0,ir4,ii4,ir8,ii8,ir12,ii12);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4(ir1,ii1,ir5,ii5,ir9,ii9,ir13,ii13);
[tr8,ti8,tr9,ti9,tr10,ti10,tr11,ti11]=rad2ct4(ir2,ii2,ir6,ii6,ir10,ii10,ir14,ii14);
[tr12,ti12,tr13,ti13,tr14,ti14,tr15,ti15]=rad2ct4(ir3,ii3,ir7,ii7,ir11,ii11,ir15,ii15);
tra0=shrega(tr0); tia0=shrega(ti0);
tra1=shrega(tr1); tia1=shrega(ti1);
tra2=shrega(tr2); tia2=shrega(ti2);
tra3=shrega(tr3); tia3=shrega(ti3);
tra4=shrega(tr4); tia4=shrega(ti4);
tra8=shrega(tr8); tia8=shrega(ti8);
tra12=shrega(tr12); tia12=shrega(ti12);
% 16-bit fixed-point twiddle factors W16^1 .. W16^4 (real and imaginary parts)
w16_r1=[0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0];
w16_i1=[1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1];
w16_r2=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w16_i2=[1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0];
w16_r3=[0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 0];
w16_i3=[1 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0];
w16_r4=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w16_i4=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
[tra5,tia5]=cplxmul16(tr5,ti5,w16_r1,w16_i1);
[tra6,tia6]=cplxmul16(tr6,ti6,w16_r2,w16_i2);
[tra7,tia7]=cplxmul16(tr7,ti7,w16_r3,w16_i3);
[tra9,tia9]=cplxmul16(tr9,ti9,w16_r2,w16_i2);
[tra10,tia10]=cplxmul16(tr10,ti10,w16_r4,w16_i4);


[tra11,tia11]=cplxmul16(tr11,ti11,w16_i2,w16_i2);
[tra13,tia13]=cplxmul16(tr13,ti13,w16_r3,w16_i3);
[tra14,tia14]=cplxmul16(tr14,ti14,w16_i2,w16_i2);
[tra15,tia15]=cplxmul16(tr15,ti15,w16_i3,w16_r3);
[or0,oi0,or4,oi4,or8,oi8,or12,oi12]=rad2ct4(tra0,tia0,tra4,tia4,tra8,tia8,tra12,tia12);
[or1,oi1,or5,oi5,or9,oi9,or13,oi13]=rad2ct4(tra1,tia1,tra5,tia5,tra9,tia9,tra13,tia13);
[or2,oi2,or6,oi6,or10,oi10,or14,oi14]=rad2ct4(tra2,tia2,tra6,tia6,tra10,tia10,tra14,tia14);
[or3,oi3,or7,oi7,or11,oi11,or15,oi15]=rad2ct4(tra3,tia3,tra7,tia7,tra11,tia11,tra15,tia15);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct2.m %%%%%%%%
function [Or1,Oi1,Or2,Oi2]=rad2ct2(ir1,ii1,ir2,ii2)
% This function calculates the 2-point FFT of the two 16-bit inputs and gives
% two 16-bit outputs.
%
% GOVERNING EQUATIONS AND ANALYSIS
%
% out1=in1+in2;
% out2=in1-in2;
% RESULTS :
% EXECUTION TIME : 27.4 ns (delay in MAXPLUS) when fitted into "EP1K10FC256-1" device of ACEX 1K family
% FRACTIONAL PRECISION : 1 less than input precision
% ERROR OVER A RUN OF 100 TIMES : mean := -6.8272e-05 6.0807e-05i and variance := 4.0445e-09
% AVERAGE NUMBER OF ERRORS :
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 0.02
% BOTH MAGNITUDE AND SIGN : 0
Or1=add16(ir1,ir2);
Or2=sub16(ir1,ir2);
Oi1=add16(ii1,ii2);
Oi2=sub16(ii1,ii2);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct2ifft.m %%%%%%%%
function [Or1,Oi1,Or2,Oi2]=rad2ct2ifft(ir1,ii1,ir2,ii2)
% This function calculates the 2-point IFFT of the two 16-bit inputs and gives
% two 16-bit outputs. The butterfly is identical to that of the 2-point FFT.
Or1=add16(ir1,ir2);
Or2=sub16(ir1,ir2);
Oi1=add16(ii1,ii2);
Oi2=sub16(ii1,ii2);
return


49 %%%%%%%%% END %%%%%%%% %%%%%%%%% rad2ct32.m %%%%%%%% function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,or15,oi15,or16,oi16,or17,oi17,or18,oi18,or19,oi19,or20,oi20,or21,oi21,or22,oi22,or23,oi23,or24,oi24,or25,oi25,or26,oi26,or27,oi27,or28,oi28,or29,oi29,or30,oi30,or31,oi31]=rad2ct32(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7,ir8,ii8,ir9,ii9,ir10,ii10,ir11,ii11,ir12,ii12,ir13,ii13,ir14,ii14,ir15,ii15,ir16,ii16,ir17,ii17,ir18,ii18,ir19,ii19,ir20,ii20,ir21,ii21,ir22,ii22,ir23,ii23,ir24,ii24,ir25,ii25,ir26,ii26,ir27,ii27,ir28,ii28,ir29,ii29,ir30,ii30,ir31,ii31) %Function to calculate the 16 point radix 4 fft % % GOVERNING EQUATIONS AND ANALYSIS % % N = 16 ; => (N1 = 4) & (N2 = 4) % k = k1 + 2 k2; % n = 8 n1 + n2; % k1,n1 => (0,1,2,3) % k2,n2 => (0,1,2,3) % RESULTS : % EXECUTION TIME : % FRACTIONAL PRECISION : % ERROR OVER A RUN OF 100 TIMES : % AVERAGE NUMBER OF ERRORS : % ERROR TYPES---> % MAGNITUDE ERRORS ONLY : % SIGN ERRORS ONLY : % BOTH MAGNITUDE AND SIGN : [tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3,tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad4ct8(ir0,ii0,ir4,ii4,ir8,ii8,ir12,ii12,ir16,ii16,ir20,ii20,ir24,ii24,ir28,ii28); [tr8,ti8,tr9,ti9,tr10,ti10,tr11,ti11,tr12,ti12,tr13,ti13,tr14,ti14,tr15,ti15]=rad4ct8(ir1,ii1,ir5,ii5,ir9,ii9,ir13,ii13,ir17,ii17,ir21,ii21,ir25,ii25,ir29,ii29); [tr16,ti16,tr17,ti17,tr18,ti18,tr19,ti19,tr20,ti20,tr21,ti21,tr22,ti22,tr23,ti23]=rad4ct8(ir2,ii2,ir6,ii6,ir10,ii10,ir14,ii14,ir18,ii18,ir22,ii22,ir26,ii26,ir30,ii30); [tr24,ti24,tr25,ti25,tr26,ti26,tr27,ti27,tr28,ti28,tr29,ti29,tr30,ti30,tr31,ti31]=rad4ct8(ir3,ii3,ir7,ii7,ir11,ii11,ir15,ii15,ir19,ii19,ir23,ii23,ir27,ii27,ir31,ii31); tra0=shrega(tr0); tia0=shrega(ti0); tra1=shrega(tr1); tia1=shrega(ti1); tra2=shrega(tr2); tia2=shrega(ti2); tra3=shrega(tr3); tia3=shrega(ti3); tra4=shrega(tr4); tia4=shrega(ti4);


50 tra5=shrega(tr5); tia5=shrega(ti5); tra6=shrega(tr6); tia6=shrega(ti6); tra7=shrega(tr7); tia7=shrega(ti7); tra8=shrega(tr8); tia8=shrega(ti8); tra16=shrega(tr16); tia16=shrega(ti16); tra24=shrega(tr24); tia24=shrega(ti24); w32_1r=[0 0 1 1 1 1 1 0 1 1 0 0 0 1 0 1]; w32_1i=[1 1 1 1 0 0 1 1 1 0 0 0 0 1 0 0]; w32_2r=[0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; w32_2i=[1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1]; w32_3r=[0 0 1 1 0 1 0 1 0 0 1 1 0 1 1 0]; w32_3i=[1 1 0 1 1 1 0 0 0 1 1 1 0 0 1 0]; w32_4r=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; w32_4i=[1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0]; w32_5r=[0 0 1 0 0 0 1 1 1 0 0 0 1 1 1 0]; w32_5i=[1 1 0 0 1 0 1 0 1 1 0 0 1 0 1 0]; w32_6r=[0 0 0 1 0 1 1 1 1 0 0 0 1 1 1 0]; w32_6i=[1 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0]; w32_7r=[0 0 0 0 1 1 0 0 0 1 1 1 1 1 0 1]; w32_7i=[1 1 0 0 0 0 0 1 0 0 1 1 1 0 1 1]; w32_8r=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; w32_8i=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1]; [tra9,tia9]=cplxmul16(tr9,ti9,w32_1r,w32_1i); [tra10,tia10]=cplxmul16(tr10,ti10,w32_2r,w32_2i); [tra11,tia11]=cplxmul16(tr11,ti11,w32_3r,w32_3i); [tra12,tia12]=cplxmul16(tr12,ti12,w32_2r,w32_2i); [tra13,tia13]=cplxmul16(tr13,ti13,w32_4r,w32_4i); [tra14,tia14]=cplxmul16(tr14,ti14,w32_6r,w32_6i); [tra15,tia15]=cplxmul16(tr15,ti15,w32_7r,w32_7i); [tra17,tia17]=cplxmul16(tr17,ti17,w32_2r,w32_2i); [tra18,tia18]=cplxmul16(tr18,ti18,w32_4r,w32_4i); [tra19,tia19]=cplxmul16(tr19,ti19,w32_6r,w32_6i); [tra20,tia20]=cplxmul16(tr20,ti20,w32_8r,w32_8i); [tra21,tia21]=cplxmul16(tr21,ti21,w32_2i,w32_6i); [tra22,tia22]=cplxmul16(tr22,ti22,w32_4i,w32_4i); [tra23,tia23]=cplxmul16(tr23,ti23,w32_6i,w32_2i); [tra25,tia25]=cplxmul16(tr25,ti25,w32_3r,w32_3i); [tra26,tia26]=cplxmul16(tr26,ti26,w32_6r,w32_6i); [tra27,tia27]=cplxmul16(tr27,ti27,w32_1i,w32_7i); [tra28,tia28]=cplxmul16(tr28,ti28,w32_4i,w32_4i); [tra29,tia29]=cplxmul16(tr29,ti29,w32_7i,w32_1i); [tra30,tia30]=cplxmul16(tr30,ti30,w32_6i,w32_6r); [tra31,tia31]=cplxmul16(tr31,ti31,w32_3i,w32_3r); 
[or0,oi0,or8,oi8,or16,oi16,or24,oi24]=rad2ct4(tra0,tia0,tra8,tia8,tra16,tia16,tra24,tia24);


[or1,oi1,or9,oi9,or17,oi17,or25,oi25]=rad2ct4(tra1,tia1,tra9,tia9,tra17,tia17,tra25,tia25);
[or2,oi2,or10,oi10,or18,oi18,or26,oi26]=rad2ct4(tra2,tia2,tra10,tia10,tra18,tia18,tra26,tia26);
[or3,oi3,or11,oi11,or19,oi19,or27,oi27]=rad2ct4(tra3,tia3,tra11,tia11,tra19,tia19,tra27,tia27);
[or4,oi4,or12,oi12,or20,oi20,or28,oi28]=rad2ct4(tra4,tia4,tra12,tia12,tra20,tia20,tra28,tia28);
[or5,oi5,or13,oi13,or21,oi21,or29,oi29]=rad2ct4(tra5,tia5,tra13,tia13,tra21,tia21,tra29,tia29);
[or6,oi6,or14,oi14,or22,oi22,or30,oi30]=rad2ct4(tra6,tia6,tra14,tia14,tra22,tia22,tra30,tia30);
[or7,oi7,or15,oi15,or23,oi23,or31,oi31]=rad2ct4(tra7,tia7,tra15,tia15,tra23,tia23,tra31,tia31);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct4.m %%%%%%%%
function [or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4(ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4)
% This function computes the radix-2 4-point Cooley-Tukey FFT of the four input points.
%
% GOVERNING EQUATIONS AND ANALYSIS
%
% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)
% RESULTS :
% EXECUTION TIME : 40.0 ns delay when fitted into "EP1K100FC484-1" device of ACEX 1K family
% FRACTIONAL PRECISION : 2 bits less than input precision
% ERROR OVER A RUN OF 100 TIMES : -1.1922e-04 -1.2181e-04i with a variance of 0.3048
% AVERAGE NUMBER OF ERRORS :
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 0.03
% BOTH MAGNITUDE AND SIGN : 3.97
[tr1,ti1,tr2,ti2]=rad2ct2(ir1,ii1,ir3,ii3);
[tr3,ti3,tr4,ti4]=rad2ct2(ir2,ii2,ir4,ii4);
[or1,oi1,or3,oi3]=rad2ct2(tr1,ti1,tr3,ti3);
% Multiplication by -j: (tr4 + j*ti4)*(-j) = ti4 - j*tr4
[or2,oi2,or4,oi4]=rad2ct2(tr2,ti2,ti4,com(tr4));
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct4ifft.m %%%%%%%%


function [or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4ifft(ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4)
% This function computes the radix-2 4-point Cooley-Tukey IFFT of the four input points.
[tr1,ti1,tr2,ti2]=rad2ct2(ir1,ii1,ir3,ii3);
[tr3,ti3,tr4,ti4]=rad2ct2(ir2,ii2,ir4,ii4);
[or1,oi1,or3,oi3]=rad2ct2(tr1,ti1,tr3,ti3);
% Multiplication by +j: (tr4 + j*ti4)*(+j) = -ti4 + j*tr4
[or2,oi2,or4,oi4]=rad2ct2(tr2,ti2,com(ti4),tr4);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct8.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7]=rad2ct8(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
% This function calculates the 8-point radix-2 Cooley-Tukey FFT.
%
% GOVERNING EQUATIONS AND ANALYSIS
%
% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)
% RESULTS :
% EXECUTION TIME : 106.8 ns when fitted into "EP1K30QC208-1", "EP1K50FC484-1", "EP1K100FC484-1" devices of ACEX 1K family
% FRACTIONAL PRECISION : 5 fractional bits less than the input
% ERROR OVER A RUN OF 100 TIMES : -0.0010 0.0010 i
% AVERAGE NUMBER OF ERRORS :
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 1.26
% BOTH MAGNITUDE AND SIGN : 1.17
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);
tra0=shrega(tr0);
tia0=shrega(ti0);
tra1=shrega(tr1);
tia1=shrega(ti1);
tra2=shrega(tr2);
tia2=shrega(ti2);
tra3=shrega(tr3);


53 tia3=shrega(ti3); tra4=shrega(tr4); tia4=shrega(ti4); w8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; w8_i1=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0]; w8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; w8_i2=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; w8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0]; w8_i3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0]; [tra5,tia5]=cplxmul16(tr5,ti5,w8_r1,w8_i1); [tra6,tia6]=cplxmul16(tr6,ti6,w8_r2,w8_i2); [tra7,tia7]=cplxmul16(tr7,ti7,w8_r3,w8_i3); [or0,oi0,or4,oi4]=rad2ct2(tra0,tia0,tra4,tia4); [or1,oi1,or5,oi5]=rad2ct2(tra1,tia1,tra5,tia5); [or2,oi2,or6,oi6]=rad2ct2(tra2,tia2,tra6,tia6); [or3,oi3,or7,oi7]=rad2ct2(tra3,tia3,tra7,tia7); return %%%%%%%%% END %%%%%%%% %%%%%%%%% rad2ct8ifft.m %%%%%%%% function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7]=rad2ct8ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7) % this function calculates the 8-point radix-2 cooley tukey ifft transform [tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4ifft( ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6); [tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4ifft( ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7); tra0=shrega(tr0); tia0=shrega(ti0); tra1=shrega(tr1); tia1=shrega(ti1); tra2=shrega(tr2); tia2=shrega(ti2); tra3=shrega(tr3); tia3=shrega(ti3); tra4=shrega(tr4); tia4=shrega(ti4); w8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; w8_i1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0]; w8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; w8_i2=[0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1]; w8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];


w8_i3=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0];
[tra5,tia5]=cplxmul16(tr5,ti5,w8_r1,w8_i1);
[tra6,tia6]=cplxmul16(tr6,ti6,w8_r2,w8_i2);
[tra7,tia7]=cplxmul16(tr7,ti7,w8_r3,w8_i3);
[or0,oi0,or4,oi4]=rad2ct2ifft(tra0,tia0,tra4,tia4);
[or1,oi1,or5,oi5]=rad2ct2ifft(tra1,tia1,tra5,tia5);
[or2,oi2,or6,oi6]=rad2ct2ifft(tra2,tia2,tra6,tia6);
[or3,oi3,or7,oi7]=rad2ct2ifft(tra3,tia3,tra7,tia7);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad4ct4.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3]=rad4ct4(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
% This function calculates the radix-4 4-point Cooley-Tukey FFT
%
% GOVERNING EQUATIONS AND ANALYSIS
%
% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)
% RESULTS :
% EXECUTION TIME : 44.1 ns when fitted into "EP1K100FC484-1" device of ACEX 1K family
% FRACTIONAL PRECISION : 2 less than input
% ERROR OVER A RUN OF 100 TIMES : -1.1553e-04 1.1447e-04
% AVERAGE NUMBER OF ERRORS :
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 0.0450
% BOTH MAGNITUDE AND SIGN : 3.955
t0=add16(ir0,ir1);
t1=add16(ir2,ir3);
t2=add16(ii0,ii1);
t3=add16(ii2,ii3);
t4=sub16(ir0,ir2);
t5=sub16(ii0,ii2);
t6=sub16(ii1,ii3);
t7=sub16(ir1,ir3);
t8=add16(ir0,ir2);
t9=add16(ir1,ir3);
t10=add16(ii0,ii2);
t11=add16(ii1,ii3);
oi1=sub16(t5,t7);


oi2=sub16(t10,t11);
or2=sub16(t8,t9);
or3=sub16(t4,t6);
or0=add16(t0,t1);
oi0=add16(t2,t3);
or1=add16(t4,t6);
oi3=add16(t5,t7);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad4ct4ifft.m %%%%%%%%
function [or0,oi0,or3,oi3,or2,oi2,or1,oi1]=rad4ct4ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
% This function calculates the radix-4 4-point Cooley-Tukey IFFT. The body is the
% same as rad4ct4; only outputs 1 and 3 are interchanged in the output list.
%
% GOVERNING EQUATIONS AND ANALYSIS
%
% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)
% RESULTS :
% EXECUTION TIME : 44.1 ns when fitted into "EP1K100FC484-1" device of ACEX 1K family
% FRACTIONAL PRECISION : 2 less than input
% ERROR OVER A RUN OF 100 TIMES : -1.1553e-04 1.1447e-04
% AVERAGE NUMBER OF ERRORS :
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 0.0450
% BOTH MAGNITUDE AND SIGN : 3.955
t0=add16(ir0,ir1);
t1=add16(ir2,ir3);
t2=add16(ii0,ii1);
t3=add16(ii2,ii3);
t4=sub16(ir0,ir2);
t5=sub16(ii0,ii2);
t6=sub16(ii1,ii3);
t7=sub16(ir1,ir3);
t8=add16(ir0,ir2);
t9=add16(ir1,ir3);
t10=add16(ii0,ii2);
t11=add16(ii1,ii3);
oi1=sub16(t5,t7);
oi2=sub16(t10,t11);
or2=sub16(t8,t9);
or3=sub16(t4,t6);


or0=add16(t0,t1);
oi0=add16(t2,t3);
or1=add16(t4,t6);
oi3=add16(t5,t7);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad4ct8.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7]=rad4ct8(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
%
% GOVERNING EQUATIONS AND ANALYSIS
%
% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)
% RESULTS :
% EXECUTION TIME : 124.0 ns when fitted into devices of the ACEX 1K family
% FRACTIONAL PRECISION : 5 fractional bits less than the input
% ERROR OVER A RUN OF 100 TIMES : -9.6769e-04 9.6433e-04i with a variance of 0.1246
% AVERAGE NUMBER OF ERRORS :
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 1.12
% BOTH MAGNITUDE AND SIGN : 1.35

% N1=4 POINT FFT
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad4ct4(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad4ct4(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);

% MULTIPLICATION PHASE
w8_1r=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_1i=[1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1];
w8_2r=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_2i=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_3r=[1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1];
w8_3i=[1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1];
tr0m=shrega(tr0);
tr1m=shrega(tr1);
tr2m=shrega(tr2);
tr3m=shrega(tr3);
tr4m=shrega(tr4);


57 ti0m=shrega(ti0); ti1m=shrega(ti1); ti2m=shrega(ti2); ti3m=shrega(ti3); ti4m=shrega(ti4); [tr5m,ti5m]=cplxmul16(tr5,ti5,w8_1r,w8_1i); [tr6m,ti6m]=cplxmul16(tr6,ti6,w8_2r,w8_2i); [tr7m,ti7m]=cplxmul16(tr7,ti7,w8_3r,w8_3i); % N2=2 PoiNT FFT [or0,oi0,or4,oi4]=rad2ct2(tr0m,ti0m,tr4m,ti4m); [or1,oi1,or5,oi5]=rad2ct2(tr1m,ti1m,tr5m,ti5m); [or2,oi2,or6,oi6]=rad2ct2(tr2m,ti2m,tr6m,ti6m); [or3,oi3,or7,oi7]=rad2ct2(tr3m,ti3m,tr7m,ti7m); return %%%%%%%%% END %%%%%%%% %%%%%%%%% ramasfft.m %%%%%%%% function [b]=ramasfft(a) % Program to interface my fft engine in place of the standard FFT function %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% N=8; %%% ! MODIFY ! %%%%% shift=8; %%% ! MODIFY ! %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% or=zeros(N,1); oi=or; a_re=real(a); a_im=imag(a); a_r=zeros(N,16); a_i=zeros(N,16); o_r=a_r; o_i=a_i; for i=1:N a_r(i,:)=convert(a_re(i),2); a_i(i,:)=convert(a_im(i),2); end % 2-point [or1,oi1,or2,oi2]=rad2ct2(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:)); % 4-point [or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:)); [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8]=rad2ct8(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:));


58 % 16-point [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,or15,oi15,or16,oi16]=rad2ct16(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:)); % 32-point [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,or15,oi15,or16,oi16,or17,oi17,or18,oi18,or19,oi19,or20,oi20,or21,oi21,or22,oi22,or23,oi23,or24,oi24,or25,oi25,or26,oi26,or27,oi27,or28,oi28,or29,oi29,or30,oi30,or31,oi31,or32,oi32]=rad2ct16(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:),a_r(17,:),a_i(17,:),a_r(18,:),a_i(18,:),a_r(19,:),a_i(19,:),a_r(20,:),a_i(20,:),a_r(21,:),a_i(21,:),a_r(22,:),a_i(22,:),a_r(23,:),a_i(23,:),a_r(24,:),a_i(24,:),a_r(25,:),a_i(25,:),a_r(26,:),a_i(26,:),a_r(27,:),a_i(27,:),a_r(28,:),a_i(28,:),a_r(29,:),a_i(29,:),a_r(30,:),a_i(30,:),a_r(31,:),a_i(31,:),a_r(32,:),a_i(32,:)); o_r1=arr2dec(or1,shift); o_i1=arr2dec(oi1,shift); o_r2=arr2dec(or2,shift); o_i2=arr2dec(oi2,shift); o_r3=arr2dec(or3,shift); o_i3=arr2dec(oi3,shift); o_r4=arr2dec(or4,shift); o_i4=arr2dec(oi4,shift); o_r5=arr2dec(or5,shift); o_i5=arr2dec(oi5,shift); o_r6=arr2dec(or6,shift); o_i6=arr2dec(oi6,shift); o_r7=arr2dec(or7,shift); o_i7=arr2dec(oi7,shift); o_r8=arr2dec(or8,shift); o_i8=arr2dec(oi8,shift); % 2-point b=[o_r1+j*o_i1;o_r2+j*o_i2]; % 4-point b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4]; 
b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8]; % 16-point b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8;o_r9+j*o_i9;o_r10+j*o_i10;o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;o_r15+j*o_i15;o_r16+j*o_i16]; % b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8;o_r9+j*o_i9;o_r10+j*o_i10;o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;o_r15+j*o_i15;o_r16+j*o_i16;o_r1


59 7+j*o_i17;o_r18+j*o_i18;o_r19+j*o_i19;o_r20+j*o_i20;o_r21+j*o_i21;o_r22+j*o_i22;o_r23+j*o_i23;o_r24+j*o_i24;o_r25+j*o_i25;o_r26+j*o_i26;o_r27+j*o_i27;o_r28+j*o_i28;o_r29+j*o_i29;o_r30+j*o_i30;o_r31+j*o_i31;o_r32+j*o_i32]; return %%%%%%%%% END %%%%%%%% %%%%%%%%% ramasifft.m %%%%%%%% function [b]=ramasfft(a) % Program to interface my fft engine in place of the standard FFT function %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% N=8; %%% ! MODIFY ! %%%%% shift=5; %%% ! MODIFY ! %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% or=zeros(N,1); oi=or; a_re=real(a); a_im=imag(a); a_r=zeros(N,16); a_i=zeros(N,16); o_r=a_r; o_i=a_i; for i=1:N a_r(i,:)=convert(a_re(i),2); a_i(i,:)=convert(a_im(i),2); end % 4-point [or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4ifft(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:)); [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8]=rad2ct8ifft(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:)); % 16-point [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,or15,oi15,or16,oi16]=rad2ct16ifft(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:)); % 32-point 
[or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,or15,oi15,or16,oi16,or17,oi17,or18,oi18,or19,oi19,or20,oi20,or21,oi21,or22,oi22,or23,oi23,or24,oi24,or25,oi25,or26,oi26,or27,oi27,or28,oi28,or29,oi29,or30,oi30,or31,oi31,or32,oi32]=rad2ct32ifft(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:


60 ),a_r(11,:),a_i(11,:),a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:),a_r(17,:),a_i(17,:),a_r(18,:),a_i(18,:),a_r(19,:),a_i(19,:),a_r(20,:),a_i(20,:),a_r(21,:),a_i(21,:),a_r(22,:),a_i(22,:),a_r(23,:),a_i(23,:),a_r(24,:),a_i(24,:),a_r(25,:),a_i(25,:),a_r(26,:),a_i(26,:),a_r(27,:),a_i(27,:),a_r(28,:),a_i(28,:),a_r(29,:),a_i(29,:),a_r(30,:),a_i(30,:),a_r(31,:),a_i(31,:),a_r(32,:),a_i(32,:)); o_r1=arr2dec(or1,shift); o_i1=arr2dec(oi1,shift); o_r2=arr2dec(or2,shift); o_i2=arr2dec(oi2,shift); o_r3=arr2dec(or3,shift); o_i3=arr2dec(oi3,shift); o_r4=arr2dec(or4,shift); o_i4=arr2dec(oi4,shift); o_r5=arr2dec(or5,shift); o_i5=arr2dec(oi5,shift); o_r6=arr2dec(or6,shift); o_i6=arr2dec(oi6,shift); o_r7=arr2dec(or7,shift); o_i7=arr2dec(oi7,shift); o_r8=arr2dec(or8,shift); o_i8=arr2dec(oi8,shift); % 4-point b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4]; b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8]; % 16-point b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8;o_r9+j*o_i9;o_r10+j*o_i10;o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;o_r15+j*o_i15;o_r16+j*o_i16]; % b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8;o_r9+j*o_i9;o_r10+j*o_i10;o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;o_r15+j*o_i15;o_r16+j*o_i16;o_r17+j*o_i17;o_r18+j*o_i18;o_r19+j*o_i19;o_r20+j*o_i20;o_r21+j*o_i21;o_r22+j*o_i22;o_r23+j*o_i23;o_r24+j*o_i24;o_r25+j*o_i25;o_r26+j*o_i26;o_r27+j*o_i27;o_r28+j*o_i28;o_r29+j*o_i29;o_r30+j*o_i30;o_r31+j*o_i31;o_r32+j*o_i32]; return %%%%%%%%% END %%%%%%%% %%%%%%%%% shrega.m %%%%%%%% function [S]=shrega(a) % this function shifts the register contents to the right % while taking care of SIGN EXTENTION S=[a(1) a(1) a(1) a]; S=S(1:16); return %%%%%%%%% END %%%%%%%%
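As a quick check of what shrega does, the sketch below reproduces its three-position, sign-extending right shift on a 16-bit, MSB-first bit row and compares the result with integer division by 8. The helper `bits2int` and the inline `shr3` are illustrative names introduced here under the assumption of two's-complement words; they are not part of the thesis code.

```matlab
% Illustrative sketch (not part of the original listing).
% bits2int: hypothetical helper that reads a 1x16 MSB-first bit row
% as a two's-complement integer.
bits2int = @(b) sum(b(2:16) .* 2.^(14:-1:0)) - b(1) * 2^15;

% shr3: inline copy of the shrega logic. The sign bit is replicated
% three times and the three least significant bits are dropped.
shr3 = @(a) [a(1) a(1) a(1) a(1:13)];

a = [1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0];   % 0xFFCC, i.e. -52
s = shr3(a);

disp(bits2int(a));            % value before the shift
disp(bits2int(s));            % value after the shift
disp(floor(bits2int(a)/8));   % arithmetic shift by 3 equals floor division by 8
```

Because the sign bit is replicated, a negative word stays negative after the shift, which is why the shifted value matches floor division rather than truncation toward zero.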


%%%%%%%%% snrvary1.m %%%%%%%%
% This program reads data from a text file 'intext.txt' and sends the file data to the IFFT engine
% and may add noise to the transmitted data before receiving it and passing it to the FFT engine
% to decipher its output. The output is then written to a file called 'outtext.txt'.

% Gather FIDs for input and output files
infid= fopen('outtext.txt','r');
outfid=fopen('outtext2.txt','w');

% Noise Measure
snratio=[-20 -15 -10 -5 -4 -3 -2 -1 0];
lengsnr=length(snratio);

% Read Data from input file
[indata,incount]=fread(infid,'bit1');
%indata=randint(50,1);
%incount=50;
for g=1:lengsnr
    g
    % Initialize Output data
    outdata=[];
    outsymbol=[];
    N=8;
    datalen=incount;
    rema=mod(incount,2*N);
    if (rema~=0)
        indata=[indata;zeros(2*N-rema,1)];
        datalen=datalen+2*N-rema
    end
    %indata=randint(datalen/N,1);
    batches=datalen/N
    symbollist=[];
    % Encoding the input Data...
    disp('Encoding the Input Data ...');
    for i=0:batches-1
        symbollist=[symbollist; getsymbol(indata(N*i+1:N*(i+1)))];
    end
    disp('Encoding the Input Data .... Done');
    for i=0:batches/2-1
        z=getformat(symbollist(N*i+1:N*(i+1/2)));
        transmit=ramasifft8(z);
        powx=sum(abs(transmit.*transmit)); % Power of the transmitted window of data
        pownoise=powx * 10 ^ ( snratio(g) / 10 ); % Noise power calculation
        noise=rand(N,1);


        inputnoise=pownoise * noise;
        receiverinput=transmit+inputnoise;
        receive=ramasfft8(receiverinput);
        p=deformat(receive);
        outsymbol=[outsymbol; p];
        z=getformat(symbollist(N*(i+1/2)+1:N*(i+1)));
        transmit=ramasifft8(z);
        powx=sum(abs(transmit.*transmit)); % Power of the transmitted window of data
        pownoise=powx * 10 ^ ( snratio(g) / 10 ); % Noise power calculation
        noise=rand(N,1);
        inputnoise=pownoise * noise;
        receiverinput=transmit+inputnoise;
        receive=ramasfft8(receiverinput);
        p=deformat(receive);
        outsymbol=[outsymbol; p];
    end
    % Finding the nearest point in the given constellation
    disp('Fitting the received vector in the constellation');
    lenoutsymlist=length(outsymbol);
    symout=zeros(lenoutsymlist,1);
    for k=1:lenoutsymlist
        symout(k)=getpoint4qam(outsymbol(k));
    end
    % Decoding the output Data...
    disp('Decoding the Output Data ...');
    for i=0:batches/2-1
        y=symout(N*i+1:N*(i+1));
        outdata=[outdata; getnumber(y)];
    end
    disp('Decoding the Output Data .... Done');
    [numberoferrors(g),ber(g)]=biterr(abs(indata),abs(outdata));
end
count=fwrite(outfid,outdata,'bit1');
st=fclose('all');
errdata=indata-outdata;
plot(abs(indata),'ro');
plot(abs(indata),'r');
hold on
plot(abs(outdata),'kd');
plot(abs(outdata),'k');
plot(abs(errdata),'*');


plot(abs(errdata));
hold off
%%%%%%%%% END %%%%%%%%

%%%%%%%%% sub16.m %%%%%%%%
function [S]=sub16(a,b)
% This function calculates the difference between two 16-bit numbers and gives the
% result as a 16-bit word while avoiding an overflow, so one bit of precision is lost
% in the process.
T=com(b);
S=add16(a,T);
return
%%%%%%%%% END %%%%%%%%
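The sub16 routine above subtracts by complementing the second operand and adding. The following sketch illustrates that idea on plain 16-bit two's-complement words. It assumes that com forms the two's complement (invert all bits, add one), as com32 in Appendix B does for 32-bit words, and it deliberately ignores the one-bit precision loss of add16. The helper names `to_bits`, `to_int` and `negate` are hypothetical and exist only for this example.

```matlab
% Illustrative sketch (not part of the original listing).
W = 16;                                          % word length in bits

% Hypothetical helpers for this example only.
to_bits = @(v) dec2bin(mod(v, 2^W), W) - '0';    % integer -> 1xW bit row, MSB first
to_int  = @(b) sum(b(2:W) .* 2.^(W-2:-1:0)) - b(1) * 2^(W-1);
negate  = @(b) to_bits(to_int(double(~b)) + 1);  % invert every bit, then add one

a = to_bits(1200);
b = to_bits(345);

% a - b computed as a + (-b), wrapped back to W bits.
d = to_bits(to_int(a) + to_int(negate(b)));

disp(to_int(d));
```

Reusing the adder for subtraction in this way means a hardware design needs only one adder structure plus a complementer, which is presumably why the listing builds sub16 on top of add16 and com.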


APPENDIX B
32-BIT COOLEY-TUKEY AND CHIRP-Z IMPLEMENTATION

%%%%%%%% add40.m %%%%%%%%%
function [C]=add40(A,B)
% function [C]=add40(A,B)
% Adds two 32 bit numbers to produce a 40 bit output
%A,B --> in 32 bits
%C --> out 40 bits
Cin=0;
A=[A zeros(1,8)];
B=[B zeros(1,8)];
[tmp1,c0]=ad16c(A(25:40),B(25:40));
[tmp2,c1]=ad12c(A(13:24),B(13:24),c0);
[tmp3,c2]=ad12c(A(1:12),B(1:12),c1);
test=xor(A(1) ,B(1));
testbar=~test;
term1=tmp3(1) & test;
term2=c2 & testbar;
msb=term1 | term2;
C=[msb tmp3 tmp2 tmp1];
C=C(1:40);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% bm4.m %%%%%%%%%
function [P1]=bm4(X1,Y1)
X=X1(4:-1:1);
Y=Y1(4:-1:1);
%port(X,Y:in std_logic_vector(4 downto 1);
% P:out std_logic_vector(8 downto 1));
LOW=0;
for i= 1:4
    for j=1:4
        t(i+4*j-4)=X(i) & Y(j);
    end
end
P(1)=t(1);
[P(2),c(1)]=fulladder(t(5),LOW,t(2));
[a(1),c(2)]=fulladder(t(6),LOW,t(3));
[a(2),c(3)]=fulladder(t(7),LOW,t(4));
[P(3),c(4)]=fulladder(t(9),c(1),a(1));


65 [a(3),c(5)]=fulladder(t(10),c(2),a(2)); [a(4),c(6)]=fulladder(t(11),c(3),t(8)); [P(4),c(7)]=fulladder(t(13),c(4),a(3)); [a(5),c(8)]=fulladder(t(14),c(5),a(4)); [a(6),c(9)]=fulladder(t(15),c(6),t(12)); [P(5),c(10)]=fulladder(LOW,c(7),a(5)); [P(6),c(11)]=fulladder(c(10),c(8),a(6)); [P(7),P(8)]=fulladder(c(11),c(9),t(16)); P1=P(8:-1:1); return %%%%%%%% END %%%%%%%%% %%%%%%%% bm8.m %%%%%%%%% function[P1]=bm8(A1,B1) %port(X,Y:= std_logic_vector(7 downto 0); % P:out std_logic_vector(15 downto 0)); X=A1(8:-1:1); Y=B1(8:-1:1); LOW=0; for i = 1:8 for j = 1:8 t(i+8*j-8)=X(i) & Y(j); end end P(1)=t(1); [P(2),c(1)]=fulladder(t(9),LOW,t(2)); for i=2:7 [a(i-1),c(i)]=fulladder(t(i+1),LOW,t(i+8)); end [P(3),c(8)]=fulladder(t(17),c(1),a(1)); for i=1:5 [a(i+6),c(i+8)]=fulladder(t(i+17),a(i+1),c(i+1)); end [a(12),c(14)]=fulladder(t(16),c(7),t(23)); [P(4),c(15)]=fulladder(t(25),c(8),a(7)); for i=1:5 [a(i+12),c(i+15)]=fulladder(t(i+25),a(i+7),c(i+8)); end [a(18),c(21)]=fulladder(t(24),c(14),t(31)); [P(5),c(22)]=fulladder(t(33),c(15),a(13)); for i=1:5 [a(i+18),c(i+22)]=fulladder(t(i+33),a(i+13),c(i+15)); end [a(24),c(28)]=fulladder(t(32),c(21),t(39)); [P(6),c(29)]=fulladder(t(41),c(22),a(19)); for i=1:5


66 [a(i+24),c(i+29)]=fulladder(t(i+41),a(i+19),c(i+22)); end [a(30),c(35)]=fulladder(t(40),c(28),t(47)); [P(7),c(36)]=fulladder(t(49),c(29),a(25)); for i=1:5 [a(i+30),c(i+36)]=fulladder(t(i+49),a(i+25),c(i+29)); end [a(36),c(42)]=fulladder(t(48),c(35),t(55)); [P(8),c(43)]=fulladder(t(57),c(36),a(31)); for i=1:5 [a(i+36),c(i+43)]=fulladder(t(i+55),a(i+31),c(i+36)); end [a(42),c(49)]=fulladder(t(56),c(42),t(63)); [P(9),c(50)]=fulladder(LOW,c(43),a(37)); [P(10),c(51)]=fulladder(c(50),c(44),a(38)); [P(11),c(52)]=fulladder(c(51),c(45),a(39)); [P(12),c(53)]=fulladder(c(52),c(46),a(40)); [P(13),c(54)]=fulladder(c(53),c(47),a(41)); [P(14),c(55)]=fulladder(c(54),c(48),a(42)); [P(15),P(16)]=fulladder(c(55),c(49),t(64)); P1=P(16:-1:1); return %%%%%%%% END %%%%%%%%% %%%%%%%% bwcell1.m %%%%%%%%% function [P]=bwcell1(X,Y) P=X & Y; return %%%%%%%% END %%%%%%%%% %%%%%%%% bwcell2.m %%%%%%%%% function [SUMOUT,COUT]=bwcell2(X,Y,SUMIN, CIN) D=X & Y; [SUMOUT,COUT]=fulladder(D,SUMIN,CIN); return %%%%%%%% END %%%%%%%%% %%%%%%%% bwcell3.m %%%%%%%%% function [s]=bwcell3(a,b) % Cell 3 of the Baugh Wooley Multiplier s=a & ~b; return %%%%%%%% END %%%%%%%%% %%%%%%%% bwcell4.m %%%%%%%%% function [s,cout]=bwcell4(a,b,Sin,Cin)


% Cell 4 of the Baugh Wooley Multiplier
c=~a & b;
[s,cout]=fulladder(c,Sin,Cin);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% bwcell5.m %%%%%%%%%
function [s,cout]=bwcell5(x,y)
% Cell 5 of the Baugh Wooley Multiplier
b=x & y;
c=~x;
d=~y;
[s,cout]=fulladder(b,c,d);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% circconv.m %%%%%%%%%
function z=circconv(a,b)
n=length(a);
z=zeros(size(a));
if (n~=length(b))
    error('Unequal Vector lengths in the circular convolution argument');
end
x=1;
y=1;
for i=1:n
    z(i)=0;
    for j=1:n
        if (x>n)
            x=x-n;
        end
        if (y<1)
            y=y+n;
        end
        z(i)=z(i)+a(x)*b(y);
        x=x+1;
        y=y-1;
    end
    y=y+1;
end
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% com32.m %%%%%%%%%
function [b]=com32(a)
%function [b]=com32(a)


% returns the 32-bit two's complement of a number represented by the array 'a'
% for example,
% if a=[1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1 0 0 0 1 0 1]
% then b =[0 1 1 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1]
tmp=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
tmp1=~a;
[tmp3]= add40(tmp1,tmp);
b=tmp3(2:33);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% cplxmul16.m %%%%%%%%%
function [pr,pi]=cplxmul16( xr,xi,yr,yi,clk,resetn)
tmp0=pipemul16 (xr,yr,clk,resetn);
tmp1=pipemul16 (xr,yi,clk,resetn);
tmp2=pipemul16 (xi,yr,clk,resetn);
tmp3=pipemul16 (xi,yi,clk,resetn);
pra=sub40(tmp0,tmp3);
pia=add40(tmp2,tmp1);
pr=pra(1:32);
pi=pia(1:32);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% czt.m %%%%%%%%%
N=4;
ROM1=zeros(size(cplx));
ROM0=zeros(size(cplx));
sqr=zeros(size(cplx));
hi=zeros(size(cplx));
hr=zeros(size(cplx));
for i=1:N
    ROM0(i)=cos(pi*(i-1)*(i-1)/N);
    ROM1(i)=-sin(pi*(i-1)*(i-1)/N);
    hr(i)=ROM0(i);
    hi(i)=-ROM1(i);
end
g0= cplx .* ROM0;
g1= cplx .* ROM1;
hi
c=1;
o1=circconv(hr,g0);
o2=circconv(hi,g0);
o3=circconv(hr,g1);
o4=circconv(hi,g1);
a0=o1-o4;


a1=o2+o3;
s1=a0 .* a0;
s2=a1 .* a1;
a3=s1+s2;
sqr=sqrt(a3);
%%%%%%%% END %%%%%%%%%

%%%%%%%% deformat.m %%%%%%%%%
function [deformed]=deformat(formatin)
% Function reads the output of the FFT at the receiver and deformats it into
% its constituent symbols.
len=length(formatin);
deformed=[real(formatin(1))+j*imag(formatin(len/2))];
for i=2:len/2
    deformed=[deformed; formatin(i)];
end
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% fircos2.m %%%%%%%%%
function [z0,z1,z2,z3]=fircos2(x0,x1,x2,x3);
y0 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y1 = [0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; % 3b20h --> 0.9239
p0=bwm16a(x0,y0);
p1=bwm16a(x1,y1);
z0=add16(p0,p1);
q0=bwm16a(x0,y1);
q1=bwm16a(x1,y0);
z1=add16(q0,q1);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% fircos4.m %%%%%%%%%


70 function [z0,z1,z2,z3]=fircos4(x0,x1,x2,x3); %y0 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1 y1 = [0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; % 2d41 --> 0.7071 y3 = [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % c000 --> -1 y2 = [0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; % 2d41 --> 0.7071 p0=shrega(x0); p1=bwm16a(x1,y3); p2=bwm16a(x2,y2); p3=bwm16a(x3,y1); pr01=add16(p0,p1); pr23=add16(p2,p3); z0=add16(pr01,pr23); q0=bwm16a(x0,y1); q1=shrega(x1); q2=bwm16a(x2,y3); q3=bwm16a(x3,y2); qr01=add16(q0,q1); qr23=add16(q2,q3); z1=add16(qr01,qr23); r0=bwm16a(x0,y2); r1=bwm16a(x1,y1); r2=shrega(x2); r3=bwm16a(x3,y3); rr01=add16(r0,r1); rr23=add16(r2,r3); z2=add16(rr01,rr23); s0=bwm16a(x0,y3); s1=bwm16a(x1,y2); s2=bwm16a(x2,y1); s3=shrega(x3); sr01=add16(s0,s1); sr23=add16(s2,s3); z3=add16(sr01,sr23);


return
%%%%%%%% END %%%%%%%%%

%%%%%%%% fircos8.m %%%%%%%%%
function [z0,z1,z2,z3,z4,z5,z6,z7]=fircos8(x0,x1,x2,x3,x4,x5,x6,x7);
y0 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y1 = [0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; % 3b20 --> 0.9239
y3 = [1 1 0 0 0 1 0 0 1 1 0 1 1 1 1 1]; % c4df --> -0.9239
y2 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 0000 --> 0
y4 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y5 = [1 1 0 0 0 1 0 0 1 1 0 1 1 1 1 1]; % c4df --> -0.9239
y6 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 0000 --> 0
y7 = [0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; % 3b20 --> 0.9239
p0=bwm16a(x0,y0);
p1=bwm16a(x1,y7);
p2=bwm16a(x2,y6);
p3=bwm16a(x3,y5);
p4=bwm16a(x4,y4);
p5=bwm16a(x5,y3);
p6=bwm16a(x6,y2);
p7=bwm16a(x7,y1);
pr01=add16(p0,p1);
pr23=add16(p2,p3);
pr45=add16(p4,p5);
pr67=add16(p6,p7);
pri03=add16(pr01,pr23);
pri47=add16(pr45,pr67);
z0=add16(pri03,pri47);
q0=bwm16a(x0,y1);
q1=bwm16a(x1,y0);
q2=bwm16a(x2,y7);
q3=bwm16a(x3,y6);
q4=bwm16a(x4,y5);
q5=bwm16a(x5,y4);
q6=bwm16a(x6,y3);
q7=bwm16a(x7,y2);
qr01=add16(q0,q1);
qr23=add16(q2,q3);
qr45=add16(q4,q5);
qr67=add16(q6,q7);
qri03=add16(qr01,qr23);
qri47=add16(qr45,qr67);
z1=add16(qri03,qri47);
r0=bwm16a(x0,y2);
r1=bwm16a(x1,y1);
r2=bwm16a(x2,y0);
r3=bwm16a(x3,y7);
r4=bwm16a(x4,y6);
r5=bwm16a(x5,y5);
r6=bwm16a(x6,y4);
r7=bwm16a(x7,y3);
rr01=add16(r0,r1);
rr23=add16(r2,r3);
rr45=add16(r4,r5);
rr67=add16(r6,r7);
rri03=add16(rr01,rr23);
rri47=add16(rr45,rr67);
z2=add16(rri03,rri47);
s0=bwm16a(x0,y3);
s1=bwm16a(x1,y2);
s2=bwm16a(x2,y1);
s3=bwm16a(x3,y0);
s4=bwm16a(x4,y7);
s5=bwm16a(x5,y6);
s6=bwm16a(x6,y5);
s7=bwm16a(x7,y4);
sr01=add16(s0,s1);
sr23=add16(s2,s3);
sr45=add16(s4,s5);
sr67=add16(s6,s7);
sri03=add16(sr01,sr23);
sri47=add16(sr45,sr67);
z3=add16(sri03,sri47);
t0=bwm16a(x0,y4);
t1=bwm16a(x1,y3);
t2=bwm16a(x2,y2);
t3=bwm16a(x3,y1);
t4=bwm16a(x4,y0);
t5=bwm16a(x5,y7);
t6=bwm16a(x6,y6);
t7=bwm16a(x7,y5);
tr01=add16(t0,t1);
tr23=add16(t2,t3);
tr45=add16(t4,t5);
tr67=add16(t6,t7);
tri03=add16(tr01,tr23);
tri47=add16(tr45,tr67);
z4=add16(tri03,tri47);
u0=bwm16a(x0,y5);
u1=bwm16a(x1,y4);
u2=bwm16a(x2,y3);
u3=bwm16a(x3,y2);
u4=bwm16a(x4,y1);
u5=bwm16a(x5,y0);
u6=bwm16a(x6,y7);
u7=bwm16a(x7,y6);
ur01=add16(u0,u1);
ur23=add16(u2,u3);
ur45=add16(u4,u5);
ur67=add16(u6,u7);
uri03=add16(ur01,ur23);
uri47=add16(ur45,ur67);
z5=add16(uri03,uri47);
v0=bwm16a(x0,y6);
v1=bwm16a(x1,y5);
v2=bwm16a(x2,y4);
v3=bwm16a(x3,y3);
v4=bwm16a(x4,y2);
v5=bwm16a(x5,y1);
v6=bwm16a(x6,y0);
v7=bwm16a(x7,y7);
vr01=add16(v0,v1);
vr23=add16(v2,v3);
vr45=add16(v4,v5);
vr67=add16(v6,v7);
vri03=add16(vr01,vr23);
vri47=add16(vr45,vr67);
z6=add16(vri03,vri47);
w0=bwm16a(x0,y7);
w1=bwm16a(x1,y6);
w2=bwm16a(x2,y5);
w3=bwm16a(x3,y4);
w4=bwm16a(x4,y3);
w5=bwm16a(x5,y2);
w6=bwm16a(x6,y1);
w7=bwm16a(x7,y0);
wr01=add16(w0,w1);
wr23=add16(w2,w3);
wr45=add16(w4,w5);
wr67=add16(w6,w7);
wri03=add16(wr01,wr23);
wri47=add16(wr45,wr67);
z7=add16(wri03,wri47);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% firsin2.m %%%%%%%%%
function [z0,z1,z2,z3]=firsin2(x0,x1,x2,x3);
y0 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y1 = [0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; % 3b20 --> 0.9239
p0=bwm16a(x0,y0);
p1=bwm16a(x1,y1);
z0=add16(p0,p1);
q0=bwm16a(x0,y1);
q1=bwm16a(x1,y0);
z1=add16(q0,q1);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% firsin4.m %%%%%%%%%
function [z0,z1,z2,z3]=firsin4(x0,x1,x2,x3);
y1 = [0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; % 2d41h --> 0.7071
y3 = [0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; % 2d41 --> 0.7071
p0=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
p1=bwm16a(x1,y3);
p2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
p3=bwm16a(x3,y1);
pr01=add16(p0,p1);
pr23=add16(p2,p3);
z0=add16(pr01,pr23);
q0=bwm16a(x0,y1);
q1=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
q2=bwm16a(x2,y3);
q3=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
qr01=add16(q0,q1);
qr23=add16(q2,q3);
z1=add16(qr01,qr23);
r0=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
r1=bwm16a(x1,y1);
r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
r3=bwm16a(x3,y3);
rr01=add16(r0,r1);
rr23=add16(r2,r3);
z2=add16(rr01,rr23);
s0=bwm16a(x0,y3);
s1=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
s2=bwm16a(x2,y1);
s3=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
sr01=add16(s0,s1);
sr23=add16(s2,s3);
z3=add16(sr01,sr23);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% firsin8.m %%%%%%%%%
function [z0,z1,z2,z3,z4,z5,z6,z7]=firsin8(x0,x1,x2,x3,x4,x5,x6,x7);
y0 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 0000h --> 0
y1 = [0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 1]; % 187d --> 0.3827
y3 = [1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 0]; % e782 --> -0.3827
y2 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000 --> 1
y4 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 0000h --> 0
y5 = [1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 0]; % e782 --> -0.3827
y6 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000 --> 1
y7 = [0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 1]; % 187d --> 0.3827
p0=bwm16a(x0,y0);
p1=bwm16a(x1,y7);
p2=bwm16a(x2,y6);
p3=bwm16a(x3,y5);
p4=bwm16a(x4,y4);
p5=bwm16a(x5,y3);
p6=bwm16a(x6,y2);
p7=bwm16a(x7,y1);
pr01=add16(p0,p1);
pr23=add16(p2,p3);
pr45=add16(p4,p5);
pr67=add16(p6,p7);
pri03=add16(pr01,pr23);
pri47=add16(pr45,pr67);
z0=add16(pri03,pri47);
q0=bwm16a(x0,y1);
q1=bwm16a(x1,y0);
q2=bwm16a(x2,y7);
q3=bwm16a(x3,y6);
q4=bwm16a(x4,y5);
q5=bwm16a(x5,y4);
q6=bwm16a(x6,y3);
q7=bwm16a(x7,y2);
qr01=add16(q0,q1);
qr23=add16(q2,q3);
qr45=add16(q4,q5);
qr67=add16(q6,q7);
qri03=add16(qr01,qr23);
qri47=add16(qr45,qr67);
z1=add16(qri03,qri47);
r0=bwm16a(x0,y2);
r1=bwm16a(x1,y1);
r2=bwm16a(x2,y0);
r3=bwm16a(x3,y7);
r4=bwm16a(x4,y6);
r5=bwm16a(x5,y5);
r6=bwm16a(x6,y4);
r7=bwm16a(x7,y3);
rr01=add16(r0,r1);
rr23=add16(r2,r3);
rr45=add16(r4,r5);
rr67=add16(r6,r7);
rri03=add16(rr01,rr23);
rri47=add16(rr45,rr67);
z2=add16(rri03,rri47);
s0=bwm16a(x0,y3);
s1=bwm16a(x1,y2);
s2=bwm16a(x2,y1);
s3=bwm16a(x3,y0);
s4=bwm16a(x4,y7);
s5=bwm16a(x5,y6);
s6=bwm16a(x6,y5);
s7=bwm16a(x7,y4);
sr01=add16(s0,s1);
sr23=add16(s2,s3);
sr45=add16(s4,s5);
sr67=add16(s6,s7);
sri03=add16(sr01,sr23);
sri47=add16(sr45,sr67);
z3=add16(sri03,sri47);
t0=bwm16a(x0,y4);
t1=bwm16a(x1,y3);
t2=bwm16a(x2,y2);
t3=bwm16a(x3,y1);
t4=bwm16a(x4,y0);
t5=bwm16a(x5,y7);
t6=bwm16a(x6,y6);
t7=bwm16a(x7,y5);
tr01=add16(t0,t1);
tr23=add16(t2,t3);
tr45=add16(t4,t5);
tr67=add16(t6,t7);
tri03=add16(tr01,tr23);
tri47=add16(tr45,tr67);
z4=add16(tri03,tri47);
u0=bwm16a(x0,y5);
u1=bwm16a(x1,y4);
u2=bwm16a(x2,y3);
u3=bwm16a(x3,y2);
u4=bwm16a(x4,y1);
u5=bwm16a(x5,y0);
u6=bwm16a(x6,y7);
u7=bwm16a(x7,y6);
ur01=add16(u0,u1);
ur23=add16(u2,u3);
ur45=add16(u4,u5);
ur67=add16(u6,u7);
uri03=add16(ur01,ur23);
uri47=add16(ur45,ur67);
z5=add16(uri03,uri47);
v0=bwm16a(x0,y6);
v1=bwm16a(x1,y5);
v2=bwm16a(x2,y4);
v3=bwm16a(x3,y3);
v4=bwm16a(x4,y2);
v5=bwm16a(x5,y1);
v6=bwm16a(x6,y0);
v7=bwm16a(x7,y7);
vr01=add16(v0,v1);
vr23=add16(v2,v3);
vr45=add16(v4,v5);
vr67=add16(v6,v7);
vri03=add16(vr01,vr23);
vri47=add16(vr45,vr67);
z6=add16(vri03,vri47);
w0=bwm16a(x0,y7);
w1=bwm16a(x1,y6);
w2=bwm16a(x2,y5);
w3=bwm16a(x3,y4);
w4=bwm16a(x4,y3);
w5=bwm16a(x5,y2);
w6=bwm16a(x6,y1);
w7=bwm16a(x7,y0);
wr01=add16(w0,w1);
wr23=add16(w2,w3);
wr45=add16(w4,w5);
wr67=add16(w6,w7);
wri03=add16(wr01,wr23);
wri47=add16(wr45,wr67);
z7=add16(wri03,wri47);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% getformat.m %%%%%%%%%
function [format]=getformat(a)
% function gets the input vector a into a format that allows only real transmissions
% to be possible in the time domain data.
lena=length(a);
format=[real(a(1)); a(2:lena); imag(a(1)); conj(a(2:lena))];
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% icztsim.m %%%%%%%%%
function sqr=icztsim(cplx)
N=length(cplx);
ROM1=zeros(size(cplx));
ROM0=zeros(size(cplx));
sqr=zeros(size(cplx));
hi=zeros(size(cplx));
hr=zeros(size(cplx));
for i=1:N
    ROM0(i)=cos(pi*(i-1)*(i-1)/N);
    ROM1(i)=sin(pi*(i-1)*(i-1)/N);
    hr(i)=ROM0(i);
    hi(i)=ROM1(i);
end
g0= cplx .* ROM0;
g1= cplx .* ROM1;
c=1;
o1=circconv(hr,g0);
o2=circconv(hi,g0);
o3=circconv(hr,g1);
o4=circconv(hi,g1);
a0=o1-o4;
a1=o2+o3;
s1=a0 .* a0;
s2=a1 .* a1;
a3=s1+s2;
sqr=sqrt(a3);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% muln.m %%%%%%%%%
function [C]=muln(A,B)
%muln
%---
%A,B --> inputs (n downto 1)
%clk --> input 1 bit
%C --> output(2n downto 1)
n=length(A);
C=zeros(2*n,1);
if n==2
    C(1) =A(1) & B(1);
    x=A(2) & B(2);
    y= A(1) & B(2);
    z= A(2) & B(1);
    C(2)= xor(y,z);
    n=y & z;
    C(3)=xor(n,x);
    C(4)=n & x;
else
    p=muln(A(n:-1:n/2+1),B(n:-1:n/2+1));
    q=muln(A(n:-1:n/2+1),B(n/2:-1:1));
    r=muln(A(n/2:-1:1),B(n:-1:n/2+1));
    s=muln(A(n/2:-1:1),B(n/2:-1:1));
    tmp=[zeros(1,n/2); s(n:-1:n/2+1)];
    tmp2=tmp+r+r;
    tmp3=[zeros(1,n/2) tmp2(n:-1:n/2+1)];
    tmp4=tmp3+p;
    C=[tmp4 tmp2(n/2:-1:1) s(n/2:-1:1)];
end;
return;
%%%%%%%% END %%%%%%%%%

%%%%%%%% multiplier.m %%%%%%%%%
function [C]=muln(A,B)
%muln
%---
%A,B --> inputs (n downto 1)
%clk --> input 1 bit
%C --> output(2n downto 1)
n=length(A);
C=zeros(2*n,1);
if n==2 then
    C(1) =A(1) and B(1);
    x=A(2) and B(2);
    y= A(1) and B(2);
    z= A(2) and B(1);
    C(2)= y xor z;
    n=y and z;
    C(3)=n xor x;
    C(4)=n and x;
    return;
else
    p=muln(A(n:n/2+1),B(n:n/2+1));
    q=muln(A(n:n/2+1),B(n/2:1));
    r=muln(A(n/2:1),B(n:n/2+1));
    s=muln(A(n/2:1),B(n/2:1));
    tmp=[zeros(1,n/2) s(n:n/2+1)];
    tmp2=tmp+r+r;
    tmp3=[zeros(1,n/2) tmp2(n:n/2+1)];
    tmp4=tmp3+p;
    C=[tmp4 tmp2(n/2:1) s(n/2:1)];
    return;
end if;
%%%%%%%% END %%%%%%%%%
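The muln.m and multiplier.m listings above build an n-bit product from four half-width partial products. The following is a minimal sketch of that decomposition only, not a transcription of the thesis code. It assumes unsigned operands held as Python integers rather than bit arrays, a one-bit base case rather than the two-bit case in the listings, and n a power of two. With p, q, r, s denoting the high-high, high-low, low-high and low-low products, the result is p*2^n + (q+r)*2^(n/2) + s. The name `split_multiply` is hypothetical.

```python
def split_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers by recursive halving.

    Assumes n is a power of two and 0 <= a, b < 2**n.
    """
    if n == 1:
        # One-bit product is a logical AND.
        return a & b
    half = n // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    p = split_multiply(a_hi, b_hi, half)  # high x high
    q = split_multiply(a_hi, b_lo, half)  # high x low
    r = split_multiply(a_lo, b_hi, half)  # low x high
    s = split_multiply(a_lo, b_lo, half)  # low x low
    # Weight each partial product by the position of its operands.
    return (p << n) + ((q + r) << half) + s


if __name__ == "__main__":
    print(split_multiply(3, 3, 2))  # 9
```

In hardware the four recursive calls map to four half-width multipliers working in parallel, which is the structure that pipemul16 and pipemul8 later in this appendix follow with 8-bit and 4-bit blocks.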

PAGE 91

%%%%%%%% pipemul16.m %%%%%%%%%
function result=pipemul16(xa1,xb1,clk,resetn)
LOW=0;
ya=com16(xa1);
yb=com16(xb1);
if xa1(1)==1
    ta1=ya(9:16);
    ta2=ya(1:8);
else
    ta1=xa1(9:16);
    ta2=xa1(1:8);
end
if xb1(1)==1
    tb1=yb(9:16);
    tb2=yb(1:8);
else
    tb1=xb1(9:16);
    tb2=xb1(1:8);
end
tmp0_1=pipemul8(ta1,tb1,clk,resetn);
tmp0_2=pipemul8(ta2,tb1,clk,resetn);
tmp0_3=pipemul8(ta1,tb2,clk,resetn);
tmp0_4=pipemul8(ta2,tb2,clk,resetn);
tmp1_1=reg16(tmp0_1,resetn);
tmp1_2=reg16(tmp0_2,resetn);
tmp1_3=reg16(tmp0_3,resetn);
tmp1_4=reg16(tmp0_4,resetn);
tmp2_2=add1617(tmp1_2,tmp1_3);
tmp3_1=reg16(tmp1_1,resetn);
tmp3_2=reg17(tmp2_2,resetn);
tmp3_3=reg16(tmp1_4,resetn);
tc1=[tmp3_2(10:17) zeros(1,8)];
tc2=[zeros(1,7) tmp3_2(1:9)];
tmp4_1=add1617(tmp3_1,tc1);
tmp4_2=reg16(tc2,resetn);
tmp4_3=reg16(tmp3_3,resetn);
tmp5=ad16ca(tmp4_2,tmp4_3,tmp4_1(1));
p=[tmp5(2:17) tmp4_1(2:17)];
p1=com32(p);
if xa1(1)==xb1(1)
    result=p;
else
    result=p1;
end
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% pipemul8.m %%%%%%%%%
function p=pipemul8(xa1,xb1,clk,resetn)
LOW=0;
tb1=xb1(5:8);
tb2=xb1(1:4);
ta1=xa1(5:8);
ta2=xa1(1:4);
tmp0_1=bm4(ta1,tb1);
tmp0_2=bm4(ta2,tb1);
tmp0_3=bm4(ta1,tb2);
tmp0_4=bm4(ta2,tb2);
tmp1_1=reg8(tmp0_1,resetn);
tmp1_2=reg8(tmp0_2,resetn);
tmp1_3=reg8(tmp0_3,resetn);
tmp1_4=reg8(tmp0_4,resetn);
tmp2_2=add89(tmp1_2,tmp1_3);
tmp3_1=reg8(tmp1_1,resetn);
tmp3_2=reg9(tmp2_2,resetn);
tmp3_3=reg8(tmp1_4,resetn);
tc1=[tmp3_2(6:9) zeros(1,4)];
tc2=[zeros(1,3) tmp3_2(1:5)];
tmp4_1=add89(tmp3_1,tc1);
tmp4_2=reg8(tc2,resetn);
tmp4_3=reg8(tmp3_3,resetn);
tmp5=ad8ca(tmp4_2,tmp4_3,tmp4_1(1));
p=[tmp5(2:9) tmp4_1(2:9)];
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct2.m %%%%%%%%%
function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% This function calculates the 2-point FFT of two 32 bit complex inputs
% the result is a 2-point 16-bit complex output
tmpre0=add40(ir0,ir1);
tmpre1=sub40(ir0,ir1);
tmpi0=add40(ii0,ii1);
tmpi1=sub40(ii0,ii1);
ore0=tmpre0(1:16);
oi0=tmpi0(1:16);
ore1=tmpre1(1:16);
oi1=tmpi1(1:16);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct2ifft.m %%%%%%%%%
function [ore0,oi0,ore1,oi1]=rad2ct2ifft(ir0,ii0,ir1,ii1)
%function [ore0,oi0,ore1,oi1]=rad2ct2ifft(ir0,ii0,ir1,ii1)
% This function calculates the 2-point IFFT of two 32 bit complex inputs
% the result is a 2-point 16-bit complex output
tmpre0=add40(ir0,ir1);
tmpre1=sub40(ir0,ir1);
tmpi0=add40(ii0,ii1);
tmpi1=sub40(ii0,ii1);
ore0=tmpre0(1:16);
oi0=tmpi0(1:16);
ore1=tmpre1(1:16);
oi1=tmpi1(1:16);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct4.m %%%%%%%%%
function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3]=rad2ct4(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
%function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3]=rad2ct4(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
% This function calculates the 4-point FFT of four 32 bit complex inputs
% the result is a 4-point 16-bit complex output
% If the input is in the notation 32.32-x(32 bit input, x integer bits) then the output would be in the form
% 32.32-x-2 i.e., there would be a 4 bit shift in the output implying 2 bit precision is lost.
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% Statistics %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Statistics collected over 100 iterations
%Error_mean = -0.0001 + 1.0655i
%Error_variance = 5.4781
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.1300
%Avg_no_of_mag_and_sign_errors = 0
%Avg_no_of_right_results = 3.8700
%Avg_ratio = 1.0001
zero=0;
[tmpr0,tmpi0,tmpr2,tmpi2]=rad2ct2(ir0,ii0,ir2,ii2);
[tmpr1,tmpi1,tmpr3,tmpi3]=rad2ct2(ir1,ii1,ir3,ii3);
tr0=[tmpr0 zeros(1,16)];
tr1=[tmpr1 zeros(1,16)];
tr2=[tmpr2 zeros(1,16)];
tr3=[tmpr3 zeros(1,16)];
ti0=[tmpi0 zeros(1,16)];
ti1=[tmpi1 zeros(1,16)];
ti2=[tmpi2 zeros(1,16)];
ti3=[tmpi3 zeros(1,16)];
[ore0,oi0,ore2,oi2]=rad2ct2(tr0,ti0,tr1,ti1);
[tr3c]=com32(tr3);
[ore1,oi1,ore3,oi3]=rad2ct2(tr2,ti2,ti3,tr3c);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct4ifft.m %%%%%%%%%
function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3]=rad2ct4ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
%function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3]=rad2ct4ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
% This function calculates the 4-point IFFT of four 32 bit complex inputs
% the result is a 4-point 16-bit complex output
zero=0;
[tmpr0,tmpi0,tmpr2,tmpi2]=rad2ct2(ir0,ii0,ir2,ii2);
[tmpr1,tmpi1,tmpr3,tmpi3]=rad2ct2(ir1,ii1,ir3,ii3);
tr0=[tmpr0 zeros(1,16)];
tr1=[tmpr1 zeros(1,16)];
tr2=[tmpr2 zeros(1,16)];
tr3=[tmpr3 zeros(1,16)];
ti0=[tmpi0 zeros(1,16)];
ti1=[tmpi1 zeros(1,16)];
ti2=[tmpi2 zeros(1,16)];
ti3=[tmpi3 zeros(1,16)];
[ore0,oi0,ore2,oi2]=rad2ct2(tr0,ti0,tr1,ti1);
[ti3c]=com32(ti3);
[ore1,oi1,ore3,oi3]=rad2ct2(tr2,ti2,ti3c,tr3);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct8.m %%%%%%%%%
function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3,ore4,oi4,ore5,oi5,ore6,oi6,ore7,oi7]=rad2ct8(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% this function calculates the 8-point FFT of eight 32 bit complex inputs
% the result is a 8-point 16-bit complex output
%Error_mean = -0.2844 + 0.7659i
%Error_variance = 6.0012
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.0100
%Avg_no_of_mag_and_sign_errors = 7.9900
%Avg_no_of_right_results = 0
%Avg_ratio = 2.0006
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);
[tra0]=shrega(zeropad(tr0));
[tia0]=shrega(zeropad(ti0));
[tra1]=shrega(zeropad(tr1));
[tia1]=shrega(zeropad(ti1));
[tra2]=shrega(zeropad(tr2));
[tia2]=shrega(zeropad(ti2));
[tra3]=shrega(zeropad(tr3));
[tia3]=shrega(zeropad(ti3));
[tra4]=shrega(zeropad(tr4));
[tia4]=shrega(zeropad(ti4));
W8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
W8_i1=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
W8_i2=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
W8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_i3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
tr5=tr5(1:16);
ti5=ti5(1:16);
tr6=tr6(1:16);
ti6=ti6(1:16);
tr7=tr7(1:16);
ti7=ti7(1:16);
clk=1;
resetn=1;
[tra5,tia5]=cplxmul16(tr5,ti5,W8_r1,W8_i1,clk,resetn);
[tra6,tia6]=cplxmul16(tr6,ti6,W8_r2,W8_i2,clk,resetn);
[tra7,tia7]=cplxmul16(tr7,ti7,W8_r3,W8_i3,clk,resetn);
[ore0,oi0,ore4,oi4]=rad2ct2(tra0,tia0,tra4,tia4);
[ore1,oi1,ore5,oi5]=rad2ct2(tra1,tia1,tra5,tia5);
[ore2,oi2,ore6,oi6]=rad2ct2(tra2,tia2,tra6,tia6);
[ore3,oi3,ore7,oi7]=rad2ct2(tra3,tia3,tra7,tia7);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct8fft.m %%%%%%%%%
function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3,ore4,oi4,ore5,oi5,ore6,oi6,ore7,oi7]=rad2ct8fft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% this function calculates the 8-point FFT of eight 32 bit complex inputs
% the result is a 8-point 16-bit complex output
%Error_mean = -0.2844 + 0.7659i
%Error_variance = 6.0012
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.0100
%Avg_no_of_mag_and_sign_errors = 7.9900
%Avg_no_of_right_results = 0
%Avg_ratio = 2.0006
check(ir0);
check(ii0);
check(ir1);
check(ii1);
check(ir2);
check(ii2);
check(ir3);
check(ii3);
check(ir4);
check(ii4);
check(ir5);
check(ii5);
check(ir6);
check(ii6);
check(ir7);
check(ii7);
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4fft(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4fft(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);
[tra0]=shrega(zeropad(tr0));
[tia0]=shrega(zeropad(ti0));
[tra1]=shrega(zeropad(tr1));
[tia1]=shrega(zeropad(ti1));
[tra2]=shrega(zeropad(tr2));
[tia2]=shrega(zeropad(ti2));
[tra3]=shrega(zeropad(tr3));
[tia3]=shrega(zeropad(ti3));
[tra4]=shrega(zeropad(tr4));
[tia4]=shrega(zeropad(ti4));
W8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
W8_i1=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
W8_i2=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
W8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_i3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
tr5=tr5(1:16);
ti5=ti5(1:16);
tr6=tr6(1:16);
ti6=ti6(1:16);
tr7=tr7(1:16);
ti7=ti7(1:16);
clk=1;
resetn=1;
[tra5,tia5]=cplxmul16(tr5,ti5,W8_r1,W8_i1,clk,resetn);
[tra6,tia6]=cplxmul16(tr6,ti6,W8_r2,W8_i2,clk,resetn);
[tra7,tia7]=cplxmul16(tr7,ti7,W8_r3,W8_i3,clk,resetn);
[ore0,oi0,ore4,oi4]=rad2ct2fft(tra0,tia0,tra4,tia4);
[ore1,oi1,ore5,oi5]=rad2ct2fft(tra1,tia1,tra5,tia5);
[ore2,oi2,ore6,oi6]=rad2ct2fft(tra2,tia2,tra6,tia6);
[ore3,oi3,ore7,oi7]=rad2ct2fft(tra3,tia3,tra7,tia7);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct8ifft.m %%%%%%%%%
function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3,ore4,oi4,ore5,oi5,ore6,oi6,ore7,oi7]=rad2ct8ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
% this function calculates the 8-point IFFT of eight 32 bit complex inputs
% the result is a 8-point 16-bit complex output
%Error_mean = -0.2844 + 0.7659i
%Error_variance = 6.0012
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.0100
%Avg_no_of_mag_and_sign_errors = 7.9900
%Avg_no_of_right_results = 0
%Avg_ratio = 2.0006
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4ifft(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4ifft(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);
[tra0]=shrega(zeropad(tr0));
[tia0]=shrega(zeropad(ti0));
[tra1]=shrega(zeropad(tr1));
[tia1]=shrega(zeropad(ti1));
[tra2]=shrega(zeropad(tr2));
[tia2]=shrega(zeropad(ti2));
[tra3]=shrega(zeropad(tr3));
[tia3]=shrega(zeropad(ti3));
[tra4]=shrega(zeropad(tr4));
[tia4]=shrega(zeropad(ti4));
w8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_i1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_i2=[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
w8_i3=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
tr5=tr5(1:16);
ti5=ti5(1:16);
tr6=tr6(1:16);
ti6=ti6(1:16);
tr7=tr7(1:16);
ti7=ti7(1:16);
clk=1;
resetn=1;
[tra5,tia5]=cplxmul16(tr5,ti5,w8_r1,w8_i1,clk,resetn);
[tra6,tia6]=cplxmul16(tr6,ti6,w8_r2,w8_i2,clk,resetn);
[tra7,tia7]=cplxmul16(tr7,ti7,w8_r3,w8_i3,clk,resetn);
[ore0,oi0,ore4,oi4]=rad2ct2ifft(tra0,tia0,tra4,tia4);
[ore1,oi1,ore5,oi5]=rad2ct2ifft(tra1,tia1,tra5,tia5);
[ore2,oi2,ore6,oi6]=rad2ct2ifft(tra2,tia2,tra6,tia6);
[ore3,oi3,ore7,oi7]=rad2ct2ifft(tra3,tia3,tra7,tia7);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad4ct4.m %%%%%%%%%
function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3]=rad4ct4( ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
tmp0 =add40(ir0,ir1);
tmp1 =add40(ir2,ir3);
tmp2 =add40(ii0,ii1);
tmp3 =add40(ii2,ii3);
tmp8 =add40(ir0,ir2);
tmp9 =add40(ir1,ir3);
tmp10= add40(ii0,ii2);
tmp11= add40(ii1,ii3);
tmp4 =sub40(ir0,ir2);
tmp5 =sub40(ii0,ii2);
tmp7 =sub40(ir1,ir3);
tmp6 =sub40(ii1,ii3);
oi1 =sub40(tmp5,tmp7);
oi2 =sub40(tmp10,tmp11);
ore2= sub40(tmp8,tmp9);
ore3= sub40(tmp4,tmp6);
ore0= add40(tmp0,tmp1);
oi0 =add40(tmp2,tmp3);
ore1= add40(tmp4,tmp6);
oi3 =add40(tmp5,tmp7);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad4ct4ifft.m %%%%%%%%%
function [ore0,oi0,ore3,oi3,ore2,oi2,ore1,oi1]=rad4ct4ifft( ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
tmp0 =add40(ir0,ir1);
tmp1 =add40(ir2,ir3);
tmp2 =add40(ii0,ii1);
tmp3 =add40(ii2,ii3);
tmp8 =add40(ir0,ir2);
tmp9 =add40(ir1,ir3);
tmp10= add40(ii0,ii2);
tmp11= add40(ii1,ii3);
tmp4 =sub40(ir0,ir2);
tmp5 =sub40(ii0,ii2);
tmp7 =sub40(ir1,ir3);
tmp6 =sub40(ii1,ii3);
oi1 =sub40(tmp5,tmp7);
oi2 =sub40(tmp10,tmp11);
ore2= sub40(tmp8,tmp9);
ore3= sub40(tmp4,tmp6);
ore0= add40(tmp0,tmp1);
oi0 =add40(tmp2,tmp3);
ore1= add40(tmp4,tmp6);
oi3 =add40(tmp5,tmp7);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct8.m %%%%%%%%%
function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3,ore4,oi4,ore5,oi5,ore6,oi6,ore7,oi7]=rad2ct8(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% this function calculates the 8-point FFT of eight 32 bit complex inputs
% the result is a 8-point 16-bit complex output
%Error_mean = -0.2844 + 0.7659i
%Error_variance = 6.0012
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.0100
%Avg_no_of_mag_and_sign_errors = 7.9900
%Avg_no_of_right_results = 0
%Avg_ratio = 2.0006
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad4ct4(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad4ct4(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);
[tra0]=shrega(zeropad(tr0));
[tia0]=shrega(zeropad(ti0));
[tra1]=shrega(zeropad(tr1));
[tia1]=shrega(zeropad(ti1));
[tra2]=shrega(zeropad(tr2));
[tia2]=shrega(zeropad(ti2));
[tra3]=shrega(zeropad(tr3));
[tia3]=shrega(zeropad(ti3));
[tra4]=shrega(zeropad(tr4));
[tia4]=shrega(zeropad(ti4));
W8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
W8_i1=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
W8_i2=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
W8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_i3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
tr5=tr5(1:16);
ti5=ti5(1:16);
tr6=tr6(1:16);
ti6=ti6(1:16);
tr7=tr7(1:16);
ti7=ti7(1:16);
clk=1;
resetn=1;
[tra5,tia5]=cplxmul16(tr5,ti5,W8_r1,W8_i1,clk,resetn);
[tra6,tia6]=cplxmul16(tr6,ti6,W8_r2,W8_i2,clk,resetn);
[tra7,tia7]=cplxmul16(tr7,ti7,W8_r3,W8_i3,clk,resetn);
[ore0,oi0,ore4,oi4]=rad2ct2(tra0,tia0,tra4,tia4);
[ore1,oi1,ore5,oi5]=rad2ct2(tra1,tia1,tra5,tia5);
[ore2,oi2,ore6,oi6]=rad2ct2(tra2,tia2,tra6,tia6);
[ore3,oi3,ore7,oi7]=rad2ct2(tra3,tia3,tra7,tia7);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct8ifft.m %%%%%%%%%
function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3,ore4,oi4,ore5,oi5,ore6,oi6,ore7,oi7]=rad2ct8ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% this function calculates the 8-point IFFT of eight 32 bit complex inputs
% the result is a 8-point 16-bit complex output
%Error_mean = -0.2844 + 0.7659i
%Error_variance = 6.0012
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.0100
%Avg_no_of_mag_and_sign_errors = 7.9900
%Avg_no_of_right_results = 0
%Avg_ratio = 2.0006
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad4ct4ifft(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad4ct4ifft(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);
[tra0]=shrega(zeropad(tr0));
[tia0]=shrega(zeropad(ti0));
[tra1]=shrega(zeropad(tr1));
[tia1]=shrega(zeropad(ti1));
[tra2]=shrega(zeropad(tr2));
[tia2]=shrega(zeropad(ti2));
[tra3]=shrega(zeropad(tr3));
[tia3]=shrega(zeropad(ti3));
[tra4]=shrega(zeropad(tr4));
[tia4]=shrega(zeropad(ti4));
w8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_i1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_i2=[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
w8_i3=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
tr5=tr5(1:16);
ti5=ti5(1:16);
tr6=tr6(1:16);
ti6=ti6(1:16);
tr7=tr7(1:16);
ti7=ti7(1:16);
clk=1;
resetn=1;
[tra5,tia5]=cplxmul16(tr5,ti5,w8_r1,w8_i1,clk,resetn);
[tra6,tia6]=cplxmul16(tr6,ti6,w8_r2,w8_i2,clk,resetn);
[tra7,tia7]=cplxmul16(tr7,ti7,w8_r3,w8_i3,clk,resetn);
[ore0,oi0,ore4,oi4]=rad2ct2ifft(tra0,tia0,tra4,tia4);
[ore1,oi1,ore5,oi5]=rad2ct2ifft(tra1,tia1,tra5,tia5);
[ore2,oi2,ore6,oi6]=rad2ct2ifft(tra2,tia2,tra6,tia6);
[ore3,oi3,ore7,oi7]=rad2ct2ifft(tra3,tia3,tra7,tia7);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% ramasczt4.m %%%%%%%%%
function [b]=ramasczt4(a)
% Program to interface my fft engine in place of the standard FFT function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
N=length(a); %%% ! MODIFY ! %%%%%
shift=3; %%% ! MODIFY ! %%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
a_re=real(a);
a_im=imag(a);
a_r=zeros(N,32);
a_i=zeros(N,32);
for i=1:N
    a_r(i,:)=convert(a_re(i),32,2);
    a_i(i,:)=convert(a_im(i),32,2);
end
[or1,oi1,or2,oi2,or3,oi3,or4,oi4]=czt4(a_r(1,:),a_r(2,:),a_r(3,:),a_r(4,:));
o_r1=arr2dec(or1,2+shift);
o_i1=arr2dec(oi1,2+shift);
o_r2=arr2dec(or2,2+shift);
o_i2=arr2dec(oi2,2+shift);
o_r3=arr2dec(or3,2+shift);
o_i3=arr2dec(oi3,2+shift);
o_r4=arr2dec(or4,2+shift);
o_i4=arr2dec(oi4,2+shift);
b=[o_r1+j*o_i1; o_r2+j*o_i2; o_r3+j*o_i3; o_r4+j*o_i4];
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% ramasfft8.m %%%%%%%%%
function [b]=ramasfft8(a)
% Program to interface my fft engine in place of the standard FFT function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
N=length(a); %%% ! MODIFY ! %%%%%
shift=7; %%% ! MODIFY ! %%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
a_re=real(a);
a_im=imag(a);
a_r=zeros(N,32);
a_i=zeros(N,32);
for i=1:N
    a_r(i,:)=convert(a_re(i),32,2);
    a_i(i,:)=convert(a_im(i),32,2);
end
[or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8]=czt8(a_r(1,:),a_r(2,:),a_r(3,:),a_r(4,:),a_r(5,:),a_r(6,:),a_r(7,:),a_r(8,:));
o_r1=arr2dec(or1,2+shift);
o_i1=arr2dec(oi1,2+shift);
o_r2=arr2dec(or2,2+shift);
o_i2=arr2dec(oi2,2+shift);
o_r3=arr2dec(or3,2+shift);
o_i3=arr2dec(oi3,2+shift);
o_r4=arr2dec(or4,2+shift);
o_i4=arr2dec(oi4,2+shift);
o_r5=arr2dec(or5,2+shift);
o_i5=arr2dec(oi5,2+shift);
o_r6=arr2dec(or6,2+shift);
o_i6=arr2dec(oi6,2+shift);
o_r7=arr2dec(or7,2+shift);
o_i7=arr2dec(oi7,2+shift);
o_r8=arr2dec(or8,2+shift);
o_i8=arr2dec(oi8,2+shift);
b=[o_r1+j*o_i1; o_r2+j*o_i2; o_r3+j*o_i3; o_r4+j*o_i4; o_r5+j*o_i5; o_r6+j*o_i6; o_r7+j*o_i7; o_r8+j*o_i8];%o_r9+j*o_i9;o_r10+j*o_i10;o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;o_r15+j*o_i15;o_r16+j*o_i16;o_r17+j*o_i17;o_r18+j*o_i18;o_r19+j*o_i19;o_r20+j*o_i20;o_r21+j*o_i21;o_r22+j*o_i22;o_r23+j*o_i23;o_r24+j*o_i24;o_r25+j*o_i25;o_r26+j*o_i26;o_r27+j*o_i27;o_r28+j*o_i28;o_r29+j*o_i29;o_r30+j*o_i30;o_r31+j*o_i31;o_r32+j*o_i32];
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% ramasfft.m %%%%%%%%%
function [b]=ramasfft(a)
% Program to interface my fft engine in place of the standard FFT function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
N=length(a); %%% ! MODIFY ! %%%%%
shift=2; %%% ! MODIFY ! %%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
a_re=real(a);
a_im=imag(a);
a_r=zeros(N,32);
a_i=zeros(N,32);
for i=1:N
    a_r(i,:)=convert(a_re(i),32,2);
    a_i(i,:)=convert(a_im(i),32,2);
end
[or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:));%,a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:));%,a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:),a_r(17,:),a_i(17,:),a_r(18,:),a_i(18,:),a_r(19,:),a_i(19,:),a_r(20,:),a_i(20,:),a_r(21,:),a_i(21,:),a_r(22,:),a_i(22,:),a_r(23,:),a_i(23,:),a_r(24,:),a_i(24,:),a_r(25,:),a_i(25,:),a_r(26,:),a_i(26,:),a_r(27,:),a_i(27,:),a_r(28,:),a_i(28,:),a_r(29,:),a_i(29,:),a_r(30,:),a_i(30,:),a_r(31,:),a_i(31,:),a_r(32,:),a_i(32,:));
% 8-point
% [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8]=rad2ct8(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:));%,a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:),a_r(17,:),a_i(17,:),a_r(18,:),a_i(18,:),a_r(19,:),a_i(19,:),a_r(20,:),a_i(20,:),a_r(21,:),a_i(21,:),a_r(22,:),a_i(22,:),a_r(23,:),a_i(23,:),a_r(24,:),a_i(24,:),a_r(25,:),a_i(25,:),a_r(26,:),a_i(26,:),a_r(27,:),a_i(27,:),a_r(28,:),a_i(28,:),a_r(29,:),a_i(29,:),a_r(30,:),a_i(30,:),a_r(31,:),a_i(31,:),a_r(32,:),a_i(32,:));
% 16-point
% [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,or15,oi15,or16,oi16]=rad2ct16(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:));%,a_r(17,:),a_i(17,:),a_r(18,:),a_i(18,:),a_r(19,:),a_i(19,:),a_r(20,:),a_i(20,:),a_r(21,:),a_i(21,:),a_r(22,:),a_i(22,:),a_r(23,:),a_i(23,:),a_r(24,:),a_i(24,:),a_r(25,:),a_i(25,:),a_r(26,:),a_i(26,:),a_r(27,:),a_i(27,:),a_r(28,:),a_i(28,:),a_r(29,:),a_i(29,:),a_r(30,:),a_i(30,:),a_r(31,:),a_i(31,:),a_r(32,:),a_i(32,:));
o_r1=arr2dec(or1,2+shift);
o_i1=arr2dec(oi1,2+shift);
o_r2=arr2dec(or2,2+shift);
o_i2=arr2dec(oi2,2+shift);
o_r3=arr2dec(or3,2+shift);
o_i3=arr2dec(oi3,2+shift);
o_r4=arr2dec(or4,2+shift);
o_i4=arr2dec(oi4,2+shift);
b=[o_r1+j*o_i1 o_r2+j*o_i2 o_r3+j*o_i3 o_r4+j*o_i4];% o_r5+j*o_i5 o_r6+j*o_i6 o_r7+j*o_i7 o_r8+j*o_i8];%o_r9+j*o_i9;o_r10+j*o_i10;o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;o_r15+j*o_i15;o_r16+j*o_i16;o_r17+j*o_i17;o_r18+j*o_i18;o_r19+j*o_i19;o_r20+j*o_i20;o_r21+j*o_i21;o_r22+j*o_i22;o_r23+j*o_i23;o_r24+j*o_i24;o_r25+j*o_i25;o_r26+j*o_i26;o_r27+j*o_i27;o_r28+j*o_i28;o_r29+j*o_i29;o_r30+j*o_i30;o_r31+j*o_i31;o_r32+j*o_i32];
% 8 point
% b=[o_r1+j*o_i1 o_r2+j*o_i2 o_r3+j*o_i3 o_r4+j*o_i4 o_r5+j*o_i5 o_r6+j*o_i6 o_r7+j*o_i7 o_r8+j*o_i8];%o_r9+j*o_i9;o_r10+j*o_i10;o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;o_r15+j*o_i15;o_r16+j*o_i16;o_r17+j*o_i17;o_r18+j*o_i18;o_r19+j*o_i19;o_r20+j*o_i20;o_r21+j*o_i21;o_r22+j*o_i22;o_r23+j*o_i23;o_r24+j*o_i24;o_r25+j*o_i25;o_r26+j*o_i26;o_r27+j*o_i27;o_r28+j*o_i28;o_r29+j*o_i29;o_r30+j*o_i30;o_r31+j*o_i31;o_r32+j*o_i32];
% 16 point
% b=[o_r1+j*o_i1 o_r2+j*o_i2 o_r3+j*o_i3 o_r4+j*o_i4 o_r5+j*o_i5 o_r6+j*o_i6 o_r7+j*o_i7 o_r8+j*o_i8 o_r9+j*o_i9 o_r10+j*o_i10 o_r11+j*o_i11 o_r12+j*o_i12 o_r13+j*o_i13 o_r14+j*o_i14 o_r15+j*o_i15 o_r16+j*o_i16];%o_r17+j*o_i17;o_r18+j*o_i18;%o_r19+j*o_i19;o_r20+j*o_i20;o_r21+j*o_i21;o_r22+j*o_i22;o_r23+j*o_i23;o_r24+j*o_i24;o_r25+j*o_i25;o_r26+j*o_i26;o_r27+j*o_i27;o_r28+j*o_i28;o_r29+j*o_i29;o_r30+j*o_i30;o_r31+j*o_i31;o_r32+j*o_i32];
% c=fft(a);
%b-c'
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% reconvert.m %%%%%%%%%
function [decim]=reconvert(hexad1,bits,intebits);
% function converts a 16-bit binary STRING(NOT ARRAY) into its equivalent decimal number
% for example
% if b=0000011101001011 (BCD)
% then reconvert(b,2) would give a result of 0.1140
fracbits=bits-intebits;
a=[0];
digi=zeros(1,bits);
a=destring(hexad1);
decim=0;
factor=2^(-fracbits);
for j=bits:-1:2
    decim=decim+a(j)*factor;
    factor=factor*2;
end
if a(1)==1
    decim=decim-2^(intebits-1);
end
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% reg16.m %%%%%%%%%
function [q]=reg16(d,resetn)
% function [q]=reg16(d,resetn)
% Function simulates the behaviour of a 16-bit register
% when resetn='0' then q=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] irrespective of d
% when resetn='1' then q=d;
if resetn==0
    q=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
else
    q=d;
end;
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% romcoef.m %%%%%%%%%
for i=1:N
    % figure(i);
    plot(incrfac',abs(cumul(:,i)),'r');
end
hold off
N=4;
for n=1:N
    c(n)=cos(pi*(n-1)*(n-1)/N);
    disp(c(n));
    disp(stringize(convert(c(n),32,2)))
end
disp('sincoef');
for n=1:N
    s(n)=sin(pi*(n-1)*(n-1)/N);
    disp(s(n));
    disp(stringize(convert(s(n),32,2)))
end
%%%%%%%% END %%%%%%%%%

%%%%%%%% roufil.m %%%%%%%%%
function z=roufil(a,b)
z=[];
a
b
lena=length(a);
lenb=length(b);
for i=1:lena
    i
    tmp=a .* b
    z=[z sum(tmp)]
    a=[a(lena) a(1:lena-1)]
end
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% shrega.m %%%%%%%%%
function [R0M]=shrega(TR)
% Program to right shift a decimal point in a register by two places to the right
% with sign extension and to extend the word to 32 bit length
R0M=[TR(1) TR(1) TR(1) TR(1:29)];
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% snrvary1.m %%%%%%%%%
% This program reads data from a text file 'intext.txt' and sends the file data to the IFFT engine
% and may add noise to the transmitted data before receiving it and passing it to the FFT engine
% to decipher its output. The output is then written to a file called 'outtext.txt'.
% Gather FIDs for input and output files
infid= fopen('outtext.txt','r');
outfid=fopen('out.txt','w');
% Noise Measure
%snratio=[-20];
%
snratio=[-20:0];
lengsnr=length(snratio);
% Read Data from input file
[indata,incount]=fread(infid,'bit1');
N=4;
for g=1:lengsnr
    % Initialize Output data
    outdata=[];
    outsymbol=[];
    snrvalue=snratio(g)
    datalen=incount;
    rema=mod(incount,2*N);
    if (rema~=0)
        indata=[indata;zeros(2*N-rema,1)];
        datalen=datalen+2*N-rema;
    end
    %indata=randint(datalen/N,1);
    batches=datalen/N;
    symbollist=[];
    % Encoding the input Data...
    for i=0:batches-1
        symbollist=[symbollist; getsymbol(indata(N*i+1:N*(i+1)))];
    end
    for i=0:batches/2-1
        % i
        z=getformat(symbollist(N*i+1:N*(i+1/2)));
        transmit=ramasifft4(z);
        powx=sum(abs(transmit.*transmit)); % Power of the transmitted window of data
        pownoise=powx 10 ^ ( snratio(g) / 10 ); % Noise power calculation
        noise=rand(N,1);
        inputnoise=pownoise noise;
        receiverinput=transmit+inputnoise;
        receive=ramasfft4(receiverinput);
        p=deformat(receive);
        outsymbol=[outsymbol; p];
        z=getformat(symbollist(N*(i+1/2)+1:N*(i+1)));
        transmit=ramasifft4(z);
        powx=sum(abs(transmit.*transmit)); % Power of the transmitted window of data
        pownoise=powx 10 ^ ( snratio(g) / 10 ); % Noise power calculation
        noise=rand(N,1);
        inputnoise=pownoise noise;

        receiverinput=transmit+inputnoise;
        receive=ramasfft4(receiverinput);
        p=deformat(receive);
        outsymbol=[outsymbol; p];
    end
    % finding the nearest point in the given constellation
    lenoutsymlist=length(outsymbol);
    symout=zeros(lenoutsymlist,1);
    for k=1:lenoutsymlist
        symout(k)=getpoint4qam(outsymbol(k));
    end
    % Decoding the input Data...
    for i=0:batches/2-1
        y=symout(N*i+1:N*(i+1));
        outdata=[outdata; getnumber(y)];
    end
    lenoutdata=length(outdata);
    minim=min(lenoutdata,datalen);
    [numberoferrors(g),ber(g)]=biterr(abs(indata(1:minim)),abs(outdata(1:minim)));
end
save fft_snr_vary_r4c4
count=fwrite(outfid,outdata,'bit1');
st=fclose('all');
%plot(points,ber);
%%%%%%%% END %%%%%%%%%
%%%%%%%% sub16.m %%%%%%%%%
function [S]=sub16(a,b)
% This function calculates the difference between two 16-bit numbers and gives the
% result as a 16 bit word avoiding an overflow. So one bit of precision is lost in the
% process.
T=com(b);
test=xor(a(1),b(1));
testbar=~test;

[tmp,cout]=ad16c(T,a);
term1=test & cout;
term2=testbar & tmp(1);
shiftbit=term1 | term2;
S=[shiftbit tmp];
n=length(S);
S=[S zeros(1,40-n)];
S=S(1:40);
return
%%%%%%%% END %%%%%%%%%
%%%%%%%% sub40.m %%%%%%%%%
function [S]=sub40(a,b)
% This function calculates the difference between two 32-bit numbers and gives the
% result as a 32 bit word avoiding an overflow. So no precision is lost in the
% process.
T=com32(b);
S=add40(a,T);
return

N=8;
inr=rand(1,N);
ini=rand(1,N);
cplx=inr+j*ini;
cplxfft=fft(cplx);
test=ramasfft(cplx);
cplxout=fft(cplxifft);
testout=ramasfft(test);
plot(abs(testout),'kd');
hold on;
plot(abs(testout),'k');
plot(abs(cplxout),'r');
hold off;
mean(test-cplxfft)
max(test-cplxfft)
mean(testout-cplxout)

max(testout-cplxout)

for w=0:0.05:1
    wp=convert(w,16,2);
    wn=convert(-w,16,2);
    w
    corrcoef(wp,wn)
end
%%%%%%%% END %%%%%%%%%
%%%%%%%% zeropad %%%%%%%%%
function [z]=zeropad(a)
% pads the given input to 32 bits by appending zeros to fill the element
z=[a zeros(1,32-length(a))];
return
%%%%%%%% END %%%%%%%%%

LIST OF REFERENCES

[1] Prasad, Ramjee; Richard Van Nee, OFDM Wireless Multimedia Communications, Artech House, Boston, 2000.
[2] Chu, Eleanor; Alan George, Inside the FFT Blackbox, CRC Press, Boca Raton, 2000.
[3] Proakis, John G.; Dimitris G. Manolakis, Digital Signal Processing, Prentice Hall of India Private Limited, New Delhi, 2000.
[4] Burrus, C.S. and T.W. Parks, DFT/FFT and Convolution Algorithms Theory and Implementation, John Wiley & Sons, New York, 1985.
[5] Taylor, Fred and Jon Mellott, Hands-On Digital Signal Processing, McGraw-Hill, New York, 1998.
[6] Brown, Stephen and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design, McGraw-Hill, New York, 2000.
[7] Bellaouar, Abdellatif and Mohamed I. Elmasry, Low Power Digital VLSI Design: Circuits and Systems, Kluwer Publishers, Norwell, 1995.
[8] Altera Corporation, “Altera Corporation: The Programmable Solutions Company,” 1995-2002, link: www.altera.com, July 3, 2002.
[9] Xilinx Inc, “Xilinx: Programmable Logic Devices, FPGA & CPLD,” 1994-2002, link: www.xilinx.com, July 3, 2002.
[10] Salomon, O.; J. M. Green; H. Klar, “General Algorithms for a Simplified Addition of 2’s Complement Numbers,” IEEE Journal of Solid State Circuits, Vol. 30, No. 7, July 1995, pp. 839-844.
[11] Kraniauskas, Peter, “A Plain Man’s Guide To The FFT,” IEEE Signal Processing Magazine, April 1994, pp. 24-35.
[12] Ochiai, Hideki; Hideki Imai, “On Clipping for Peak Power Reduction of OFDM Signals,” Global Telecommunications Conference, 2000. GLOBECOM ’00. IEEE, Vol. 2, 2000, pp. 731-735.
[13] Zhao, Yuping; Sven-Gustav Haggman, “BER Analysis of OFDM Communication Systems with Intercarrier Interference,” International Conference on Communication Technology, ICCT ’98, October 22-24, 1998, Beijing, China, pp. S38-02-1 – S38-02-5.
[14] Wu, Yiyan; William Y. Zou, “Orthogonal Frequency Division Multiplexing: A Multi-Carrier Modulation Scheme,” IEEE Transactions on Consumer Electronics, Vol. 41, No. 3, August 1995, pp. 392-399.

[15] Li, Xiaodong; Leonard J. Cimini, “Effects of Clipping and Filtering on the Performance of OFDM,” IEEE Communications Letters, Vol. 2, No. 5, May 1998, pp. 131-133.
[16] Oliver, William D., “The Singing Tree: A Novel Interactive Musical Interface,” Master’s Thesis, Massachusetts Institute of Technology, 1997.
[17] Oraintara, Soontorn; Ying-Jui Chen; Truong Q. Nguyen, “Integer Fast Fourier Transform,” IEEE Transactions On Signal Processing, Vol. 50, No. 3, March 2002, pp. 607-618.
[18] Waggener, Bill, Pulse Code Modulation Techniques with Applications in Communications and Data Recording, Van Nostrand Reinhold, New York, 1995.

BIOGRAPHICAL SKETCH

Rama Krishna Lolla was born on February 15, 1979, at Machillipatnam, Andhra Pradesh, India. He attended Abhyudaya Cooperative Junior College in Hyderabad, Andhra Pradesh, India, and graduated in 1996. He received his Bachelor of Engineering from Birla Institute of Technology, Ranchi, India, in 2000.


Permanent Link: http://ufdc.ufl.edu/UFE0000563/00001

Material Information

Title: Fast Fourier Transform Implementation Using Field Programmable Gate Array Technology for Orthogonal Frequency Division Multiplexing Systems
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0000563:00001














FAST FOURIER TRANSFORM IMPLEMENTATION USING FIELD
PROGRAMMABLE GATE ARRAY TECHNOLOGY FOR ORTHOGONAL
FREQUENCY DIVISION MULTIPLEXING SYSTEMS












By

RAMA KRISHNA LOLLA


A THESIS PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE

UNIVERSITY OF FLORIDA


2002




























Copyright 2002

by

RAMA KRISHNA LOLLA






























To
My Family
&
BABA















ACKNOWLEDGMENTS

I would like to extend my thanks to Dr. Fred J. Taylor for his suggestions at all the

stages of the project. This project would not have taken shape without his guidance.

I would like to thank my advisors, Dr. John G. Harris and Dr. John M. Shea, for their

timely suggestions. I am also thankful to my colleagues in the High Speed Digital

Architecture Laboratory for their support.

I would also like to acknowledge the continuous support my family has given me

during the course of my work.
















TABLE OF CONTENTS
ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION

OFDM Overview
FFT Algorithms Explored
Thesis Organization

2 OFDM THEORY AND IMPLEMENTATION

Description of the Wireless Channel
History of OFDM

3 ALGORITHM THEORY AND DESCRIPTION

Cooley-Tukey Algorithm
    Complexity Analysis
    Radix-2 Algorithm
    Radix-4 Algorithm
Chirp-z Algorithm

4 FIELD PROGRAMMABLE GATE ARRAYS

Power Calculations in FPGAs
Costs Involved in FPGA Fabrication
Comparison to other Technologies

5 IMPLEMENTATION DETAILS AND RESULTS

Description of the Work
Description of Tools Used
Results and Conclusions
Power Calculations
Noise Tolerance
Directions of Future Work

APPENDIX

A 16-BIT COOLEY-TUKEY IMPLEMENTATION

B 32-BIT COOLEY-TUKEY AND CHIRP-Z IMPLEMENTATION

LIST OF REFERENCES

BIOGRAPHICAL SKETCH
















LIST OF TABLES


Table

3.1 Time-domain index n resolved in terms of n1 and n2

3.2 Resolution of the frequency domain index k

4.1 Truth table of the function implemented in Figure (4.3)

5.1 Radix-2 Cooley-Tukey implementation with round off errors

5.2 Radix-4 Cooley-Tukey implementation with round off errors

5.3 Radix-2 Cooley-Tukey implementation without round off errors

5.4 Radix-4 Cooley-Tukey implementation without round off errors

5.5 Power calculations for Radix-2 8-point FFT
















LIST OF FIGURES


Figure

2.1 Multipath Propagation

2.2 General Block Diagram of an OFDM communication system

3.1 Cooley-Tukey Algorithm Implementation

3.2 Radix-2 repetitive unit

3.3 Implementation of a Radix-2 8-point FFT unit

3.4 Radix-4 basic block

3.5 Chirp-z implementation

3.6 Chirp Signal

3.7 Phase response of the Chirp Signal shown in Figure 3.6

4.1 General structure of an FPGA

4.2 Programmable Interconnection Switch

4.3 A 3-input LUT implementation

5.1 Implementation of Multipliers (a) shows the initial truncating configuration and
Figure (b) shows the truncation operation after one more level of processing

5.2 N-by-N-bit Pipelined Multiplier

5.3 Model used in the Thesis work

5.4 BER variations against SNR for an internal bus width of 16

5.5 BER variations against SNR for an internal bus width of 32

5.6 BER variations against SNR: Comparison of floating point results with modeled
Radix-2 and Radix-4 8-point FFTs with 16- and 32-bit internal bus width















Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science

FAST FOURIER TRANSFORM IMPLEMENTATION USING FIELD
PROGRAMMABLE GATE ARRAY TECHNOLOGY FOR ORTHOGONAL
FREQUENCY DIVISION MULTIPLEXING SYSTEMS

By

Rama Krishna Lolla

December 2002


Chair: Dr. Fred J. Taylor
Major Department: Electrical and Computer Engineering

Orthogonal Frequency Division Multiplexing (OFDM) is an emerging multi-carrier

technique, which uses Fast Fourier Transforms (FFTs) to modulate the data onto sets of

orthogonal frequencies. The core operation in the OFDM systems is the FFT unit that

consumes a large amount of resources and power. The goal of this thesis was to study

better implementation structures for the FFT. The Radix-2 and Radix-4 implementations

of the Cooley-Tukey algorithm and the Chirp-z algorithm were implemented using the

Field Programmable Gate Array (FPGA) technology. The two's complement number

system was used in the designs, and their performance was judged on the basis of their

implementation complexity and amount of power consumed for implementation.














CHAPTER 1
INTRODUCTION

Orthogonal Frequency Division Multiplexing (OFDM) is an emerging multi-carrier

technique, which uses FFTs to modulate the data onto sets of orthogonal frequencies.

Orthogonality enables the frequencies to overlap while still maintaining statistical

independence. The transmitter uses an IFFT to convert the "frequency domain" data into

the "time domain" and the received signals are converted back into the "frequency

domain" by using an FFT at the receiver. An IFFT is similar in structure to the FFT; the

only difference is that the twiddle factors of one are the complex conjugates of the other [1].

This core operation is often the limiting technology when it comes to the power

consumed for its implementation. The objective of this thesis is to study the

implementations of Cooley-Tukey and Chirp-z FFT algorithms onto FPGA technology to

arrive at a low power, low latency configuration.

OFDM Overview

OFDM efficiently overcomes the problems that plague most wireless channels. Multi-path

propagation is a serious hazard: it introduces delay spread, so that multiple delayed copies

of the transmitted signal reach the receiver. This causes the energy of one symbol of

information to spill onto several successive symbols, a phenomenon called Inter-Symbol

Interference (ISI). OFDM reduces ISI by splitting the data into several simultaneous

transmissions, thus making it possible to increase the transmission time for

each symbol. OFDM moves the equalization operation to the frequency domain instead

of time domain as in the case of single carrier systems.









The OFDM implementation also performs well against Inter Carrier Interference (ICI). This is

achieved by the FFT, whose frequency bands are harmonics of the fundamental frequency band.

This ensures minimum cross talk between the sub-carriers, thereby reducing the ICI, and

does not require phase locking of the local oscillators. These properties of an

OFDM system are the much sought after solutions to combat the delay spread in the

wireless environment [1].

FFT Algorithms Explored

The Cooley-Tukey algorithm formulates an efficient way to reduce the

number of complex multiplications. The algorithm allows for configuring the design in

more than one way based on the fundamental repetitive unit used for implementing the

longer point FFTs [2-5]. Two such configurations are explored in this thesis. They are the

Radix-2 (grouping in units of 2) and Radix-4 (grouping in units of 4) Cooley-Tukey

implementations.

The Chirp-Z algorithm provides for greater frequency resolution that is independent of

the sampling rate. Spectral resolution is greatly improved by mapping the contour closer

to the poles in the z-domain. In the limiting case, when the contour chosen is a unit circle,

the results are a perfect match with the Cooley-Tukey algorithm. Its implementation has

blocks of circular convolutions that are usually implemented in CCDs [2-5]. An attempt

has been made in this thesis work to assemble the circular convolution blocks onto

FPGAs.

The modules in this work were compiled onto Altera FLEX10KE family of devices

using Altera MAX+PLUS II software. These FPGAs give maximum flexibility by

allowing modifications in the designing as well as the testing phase. The FPGAs can









often be used to gain quicker control over the market at a cheaper price. The internal

blocks of the FPGA are usually standardized for each family of device [6-9].

Thesis Organization

Chapter 2 describes the OFDM scheme and the ways it reduces the effects of ISI and

ICI. Chapter 3 discusses the details of the Cooley-Tukey and the Chirp-z algorithms for

the implementation of FFTs. Chapter 4 discusses the FPGA technology and its

comparisons with other technologies. Chapter 5 concludes the thesis work with a detailed

description of the actual work done, the results obtained and the inferences drawn.














CHAPTER 2
OFDM THEORY AND IMPLEMENTATION

Description of the Wireless Channel

The wireless communication channel introduces some non-linearity in the signal.

These nonlinear effects can be modeled in most cases as a filter. The receiver is

assumed to receive multiple copies of the transmitted sequence with different amplitudes

and phases. This is mainly due to the different signal paths (Figure 2.1) and their

associated path losses. The gains distributed along the multiple paths determine the

coefficients of the filtering model of the channel. In addition there are often variations

along a path either due to mobile environment or climatic changes. These effects can be

mitigated to some degree by increasing the transmitted power. The more power

generated at the transmitter, the lower the error probability. A typical power spectrum

consists of some peaks and troughs signifying the distribution of power at various

frequency bands. An ideal communication strategy assigns signals to those channels or

bands having the highest gain. This is called Water Filling Strategy for power allocation

[1].

Multi-path propagation effects cause the signal to spread in time into successive signal

values. This results in Inter-Symbol-Interference (ISI). Here the energy of one symbol

overlaps onto the successive symbols. This is a major concern in the case of single carrier

systems. This overlap causes constructive interference at some instances and destructive

interference at others. Multi-carrier systems attempt to resolve this issue by dividing the

high-speed data stream into several simultaneous transmissions (thus keeping the overall










data rate constant) so that each individual transmission has an increased transmission

time. This reduces the probability of each symbol being misread at the receiver.

[Figure: a transmitter and a receiver linked by a line-of-sight transmission, by reflections from immobile sources (buildings, ground reflection) and by reflections from mobile sources (vehicles)]

Figure 2.1 Multipath Propagation

Orthogonal Frequency Division Multiplexing (OFDM) is one of the prominent and

effective multi carrier transmission techniques. Frequency division multiplexing involves

transmitting of various symbols of information at various frequency bands with some

"guard band" to separate the carriers. OFDM is an advanced technique that eliminates the

use of the guard band even while retaining proper decipherability at the receiver.

Parallel transmission is accomplished by applying some type of quadrature amplitude

modulation (QAM) and then transmitting the data after performing an inverse Fourier

transform. That is, an inverse Fourier transform is taken at the transmitter and the

received sequence is processed by an FFT. The idea behind taking the inverse Fourier

transform at the transmitter can be motivated as follows. Orthogonality is desired in a

transmitted data sequence having known frequency, amplitude and phase. If the data is

assumed to be already in the "frequency domain" at the transmitter, the IFFT produces a

"time domain" array of signals subject to some form of frequency, amplitude and phase

restrictions. At the receiver, the "frequency domain" data is regained by taking the FFT

of the received "time-domain" sequence [1].
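
The transmit and receive relationship described above can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the implementation used in this work: the helper names `dft` and `idft`, the four sample QAM symbols and the ideal noiseless channel are all hypothetical choices, and a direct transform is used instead of a fast algorithm for clarity.

```python
import cmath

def dft(x):
    """Direct N-point DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Direct N-point inverse DFT, including the 1/N scaling."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Hypothetical 4-QAM symbols, one per sub-carrier ("frequency domain" data).
symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]

# Transmitter: the inverse transform produces the "time domain" sequence.
transmit = idft(symbols)

# Receiver (ideal, noiseless channel assumed): the forward transform
# regains the "frequency domain" symbols.
received = dft(transmit)

max_error = max(abs(r - s) for r, s in zip(received, symbols))
print(max_error < 1e-9)
```

With an ideal channel the recovered symbols match the transmitted ones up to floating-point rounding; any channel distortion or noise would appear as a deviation of `received` from `symbols`.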










History of OFDM

An analog implementation of a Fourier transform would involve using oscillators at the

required frequencies, which is impractical. Oscillator drift in analog components was a

major reason for the initial failure of this line of thought: it causes the carriers to

lose orthogonality and results in a phenomenon called Inter Carrier Interference (ICI). Once

digital implementations became available, the frequency drift problems were mitigated

and OFDM research resumed [1]. The digital implementations almost meet

the orthogonality criteria by remaining at a constant frequency. The block diagram of the

OFDM link is shown in Figure (2.2).

[Figure: block diagram with the labels Transmitter Stage, Parallelization, IFFT, Serialization Stage, Channel, noise, demapping, Receiver Stage]

Figure 2.2 General Block Diagram of an OFDM communication system


The quadrature amplitude modulation (QAM) stage produces in-phase and

quadrature components that can be fed into the Fourier transform stage as the real and

imaginary parts of the complex input respectively. The Fourier transform can be thought

of as a tool to simultaneously transmit an array of symbols at various frequencies giving

the effect of a filter bank. Each sub-channel acts as a single carrier system and can be

treated as such allowing for some statistical dependence on one another. Single carrier

systems usually have an equalizer in the time domain to nullify the effects of ISI. Multi-









carrier systems like OFDM have the equalizing phenomenon in the frequency domain to

combat the ISI. ISI can be effectively eliminated by adding a "guard interval" at the end

of each symbol, the length of the guard time being greater than the maximum tolerable

delay spread. This reduces the spreading effect of signals onto successive symbols.

Efficient utilization of bandwidth is evident in the simultaneous transmissions at a

different range of frequencies.

For an N-point FFT/IFFT channel with N simultaneous transmissions, the symbol

time can be increased by a factor of N, thus reducing the ISI in the same proportion. The

length of the FFT is however constrained by the exponentially increasing complexity of

design of the transmitter and receiver modules. So the choice of the length of FFT is

usually a trade off between the complexity of implementation and decipherability of data

at the receiver.

Orthogonality is a concept that statistically quantifies the independence among the

components of the skeleton structure for describing any system. The projections of any

signal along the components of the orthogonal system could sufficiently represent the

system. In actual implementations, the information is sent in the shape of sinc (sin x / x)

pulses so that the frequency domain representation would be rectangular pulses. The sinc

pulses have nulls at periodic intervals and if the subcarriers are placed at that spacing,

then the maxima of each subcarrier would occur only when all other subcarrier

contributions are zero. The inherent orthogonality in an OFDM system allows the

spectrum to overlap without causing any interference problems. This eliminates the usage

of any steep bandpass filters that are required for other frequency division multiplexing

systems implying lesser implementation complexity. The lack of orthogonality causes









some amount of cross talk between the subcarriers and this phenomenon is called the

Inter Carrier Interference (ICI). This ICI is to be minimized to establish a good

communication link. Orthogonality is introduced into the system by ensuring that when

the output corresponding to a particular sub-carrier is at its peak, there is minimum

(ideally zero) contribution from the remaining sub-carriers. The sub-carrier spacing is

thus determined by the null-null spacing of each transmission. This forces the correlation

between the sub-carriers to zero.
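
The zero correlation between sub-carriers can be checked numerically. The sketch below is an illustration only: the helper names `subcarrier` and `correlation` and the choice of N = 8 are assumptions made for the example. It samples the complex exponentials exp(j*2*pi*k*n/N) that the N-point transform uses as sub-carriers and evaluates their normalized inner products.

```python
import cmath

def subcarrier(k, N):
    """Samples of the k-th sub-carrier exp(j*2*pi*k*n/N), n = 0..N-1."""
    return [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]

def correlation(a, b):
    """Normalized inner product (1/N) * sum_n a[n] * conj(b[n])."""
    return sum(x * y.conjugate() for x, y in zip(a, b)) / len(a)

N = 8  # illustrative number of sub-carriers

# A sub-carrier correlated with itself gives unit magnitude.
same = abs(correlation(subcarrier(3, N), subcarrier(3, N)))

# The largest correlation magnitude over all pairs of distinct sub-carriers.
different = max(abs(correlation(subcarrier(p, N), subcarrier(q, N)))
                for p in range(N) for q in range(N) if p != q)

print(abs(same - 1.0) < 1e-9, different < 1e-9)
```

Distinct sub-carriers correlate to zero (up to rounding) only because they are spaced at integer multiples of the fundamental frequency; a frequency offset between them would make `different` non-zero, which is the ICI described above.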

Thus the OFDM system attempts to reduce the problems of ISI and ICI with very low

implementation complexity. This may not be effectively removed in the single carrier

systems even after equalization. The implementation of the OFDM however brings into

focus the issues of power usage. It is observed that the major power sink in the

transmitter/receiver design was the IFFT/FFT and for low power applications, this issue

must be dealt with extensively. Fortunately, the IFFT/FFT implementations for an

arbitrary N-point implementation vary only in their twiddle factors in most cases. There

are many ways to implement the FFT. The underlying concept of the DFT and two

algorithms (Cooley-Tukey and Chirp-z algorithms) to implement the FFT were studied as

a part of this thesis work and they were compared in terms of power and latency issues.















CHAPTER 3
ALGORITHM THEORY AND DESCRIPTION

The algorithms used for the purpose of this thesis work were the Cooley-Tukey and

the Chirp-z Transform Algorithms. The main purpose of this study was to devise a

better FFT implementation structure for OFDM applications. The following are

descriptions of the Cooley-Tukey algorithm and the Chirp-z algorithm.

Cooley-Tukey Algorithm

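Formally a discrete Fourier transform (DFT) pair is given by equations (3.1) and (3.2) as

Synthesis Equation:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j \frac{2\pi}{N} k n}, \quad \forall\, n \in [0, N-1] \tag{3.1}$$

Analysis Equation:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j \frac{2\pi}{N} k n}, \quad \forall\, k \in [0, N-1] \tag{3.2}$$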

where, x[n] is the nth sample of an N-element time series and correspondingly, X[k] is

the kth harmonic of an N-point discrete Fourier transform of x[n]. The summations in

both the synthesis and analysis equations exist only if all the values of x and X are

bounded. The DFT assumes periodicity. The multiplying exponential coefficients are of

unit magnitude and can effectively be represented as equally spaced points along a unit

circle mapped in the z-domain according to their phase. More detailed description of the

properties of the DFT equations described above can be found in [2-5]. The

implementation of the equations (3.1) and (3.2) in their canonic form would require










N(N-1) complex additions and N^2 complex multiplications. This can result in a high

implementation complexity and latency.
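
The operation count of the canonic form can be confirmed by evaluating Equation (3.2) literally and counting the complex operations. The function name `direct_dft_with_counts` and the 8-sample test vector are assumptions chosen for this sketch.

```python
import cmath

def direct_dft_with_counts(x):
    """Evaluate the DFT sum term by term, counting complex operations."""
    N = len(x)
    mults = 0
    adds = 0
    X = []
    for k in range(N):
        # First term of the sum: one multiplication, no addition.
        acc = x[0] * cmath.exp(-2j * cmath.pi * k * 0 / N)
        mults += 1
        # Remaining N-1 terms: one multiplication and one addition each.
        for n in range(1, N):
            acc += x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
            mults += 1
            adds += 1
        X.append(acc)
    return X, mults, adds

X, mults, adds = direct_dft_with_counts([1, 2, 3, 4, 5, 6, 7, 8])
print(mults, adds)  # N*N = 64 multiplications, N*(N-1) = 56 additions
```

The count treats every coefficient as a full complex multiplication, including the trivial ones by 1, which is the same convention as the N^2 figure quoted above.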

Cooley and Tukey published their simplification of this set of equations in 1965, which

took the form of the algorithm described below [5]. A complexity reduction is achieved by

breaking down long DFTs into collections of smaller FFTs. The algorithm is described as

follows.

Let N be the number of points in the input sequence. Consider representing N as the

composite number,

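$$N = N_1 N_2 \tag{3.3}$$

The time domain index n and the frequency domain index k are also resolved as

$$n = n_2 N_1 + n_1, \quad \forall\, n_1 \in [0, N_1 - 1] \text{ and } n_2 \in [0, N_2 - 1] \tag{3.4}$$

$$k = k_1 N_2 + k_2, \quad \forall\, k_1 \in [0, N_1 - 1] \text{ and } k_2 \in [0, N_2 - 1] \tag{3.5}$$

Substituting these in the DFT Equation (3.2),

$$X[k_1 N_2 + k_2] = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x[n_2 N_1 + n_1]\, e^{-j \frac{2\pi}{N} (n_2 N_1 + n_1)(k_1 N_2 + k_2)} \tag{3.6}$$

From Equation (3.6), writing $W_N = e^{-j 2\pi / N}$, the exponential is of the form

$$W_N^{(k_1 N_2 + k_2)(n_2 N_1 + n_1)} = W_N^{k_1 n_2 N_1 N_2}\, W_N^{k_1 n_1 N_2}\, W_N^{k_2 n_2 N_1}\, W_N^{k_2 n_1} \tag{3.7}$$

Using the relations,

$$W_N^{k_1 n_2 N_1 N_2} = W_N^{N k_1 n_2} = 1, \quad W_N^{k_1 n_1 N_2} = W_{N_1}^{k_1 n_1}, \quad W_N^{k_2 n_2 N_1} = W_{N_2}^{k_2 n_2}$$

Equation (3.7) becomes,

$$W_N^{(k_1 N_2 + k_2)(n_2 N_1 + n_1)} = W_{N_1}^{k_1 n_1}\, W_{N_2}^{k_2 n_2}\, W_N^{k_2 n_1} \tag{3.8}$$

Substituting Equation (3.8) in Equation (3.6),

$$X[k_1 N_2 + k_2] = \sum_{n_1=0}^{N_1-1} W_{N_1}^{k_1 n_1} \left[ W_N^{k_2 n_1} \left( \sum_{n_2=0}^{N_2-1} x[n_2 N_1 + n_1]\, W_{N_2}^{k_2 n_2} \right) \right] \tag{3.9}$$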


The inner sum is clearly an N2-point DFT for fixed n1 and the outer sum is an N1-point

DFT for fixed k2. There is also a gluing factor, known as the "twiddle factor", W_N^(k2 n1),

which multiplies the inner sum of products for the fixed values of k2 and n1.

Tables (3.1) and (3.2) show the unresolved indices (n, k) in terms of the resolved indices

((n1, n2), (k1, k2)).


Table 3.1 Time-domain index n resolved in terms of n1 and n2

n1 \ n2    0        1          2          ...    N2-1

0          0        N1         2*N1       ...    (N2-1)*N1
1          1        N1+1       2*N1+1     ...    (N2-1)*N1+1
2          2        N1+2       2*N1+2     ...    (N2-1)*N1+2
...        ...      ...        ...        ...    ...
N1-1       N1-1     2*N1-1     3*N1-1     ...    N-1


Table 3.2 Resolution of the frequency domain index k

k1 \ k2    0            1              2              ...    N2-1

0          0            1              2              ...    N2-1
1          N2           N2+1           N2+2           ...    2*N2-1
2          2*N2         2*N2+1         2*N2+2         ...    3*N2-1
...        ...          ...            ...            ...    ...
N1-1       (N1-1)*N2    (N1-1)*N2+1    (N1-1)*N2+2    ...    N-1










Thus the algorithm describes a means by which sets of N2-point DFTs (for fixed n1)

interface with sets of N1-point DFTs (for fixed k2). The algorithm is illustrated in Figure

(3.1).


Figure 3.1 Cooley-Tukey Algorithm Implementation
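
The two-stage structure of Equation (3.9) can be sketched in a few lines. This is an illustrative model under stated assumptions, not the hardware design of this work: the function names `dft` and `cooley_tukey`, the 12-point test vector and the factorization N1 = 3, N2 = 4 are hypothetical choices, and the small DFTs are evaluated directly.

```python
import cmath

def dft(x):
    """Direct DFT used for the small blocks and as the reference result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def cooley_tukey(x, N1, N2):
    """N-point DFT (N = N1*N2) built from N2-point and N1-point DFTs."""
    N = N1 * N2
    assert len(x) == N
    X = [0j] * N
    # Inner stage: for each fixed n1, an N2-point DFT over n2,
    # using the input index n = n2*N1 + n1 of Equation (3.4).
    inner = [dft([x[n2 * N1 + n1] for n2 in range(N2)]) for n1 in range(N1)]
    for k2 in range(N2):
        # Twiddle stage: multiply each inner result by W_N^(k2*n1).
        glued = [inner[n1][k2] * cmath.exp(-2j * cmath.pi * k2 * n1 / N)
                 for n1 in range(N1)]
        # Outer stage: for each fixed k2, an N1-point DFT over n1.
        outer = dft(glued)
        for k1 in range(N1):
            # Output index k = k1*N2 + k2 of Equation (3.5).
            X[k1 * N2 + k2] = outer[k1]
    return X

x = [complex(n, (n * n) % 5) for n in range(12)]
error = max(abs(a - b) for a, b in zip(cooley_tukey(x, 3, 4), dft(x)))
print(error < 1e-9)
```

The decomposed result agrees with the direct 12-point DFT up to rounding, and the index arithmetic in the two comments reproduces the mappings listed in Tables 3.1 and 3.2.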








The complexity of the entire N-point DFT implementation can be modeled as the

complex multiplication and addition count associated with N1 N2-point FFT units, N2

N1-point FFT units and N twiddle factors [5].

$$\text{MultiplierComplexity} = N_1 (N_2)^2 + N_2 (N_1)^2 + N$$

$$\Rightarrow \text{MultiplierComplexity} = N (N_1 + N_2 + 1) \tag{3.10}$$

$$\text{AdditionComplexity} = N_1 [N_2 (N_2 - 1)] + N_2 [N_1 (N_1 - 1)]$$

$$= N_1 N_2^2 - N_1 N_2 + N_2 N_1^2 - N_1 N_2$$

$$\Rightarrow \text{AdditionComplexity} = N (N_1 + N_2 - 2) \tag{3.11}$$

If N can be resolved into a highly composite number:

$$N = N_1 N_2 N_3 \cdots N_n \tag{3.12}$$









then the multiplier complexity is approximately N*(N1 + N2 + N3 + ... + Nn)

This is much lesser than the original N2 for a direct implementation shown in Equation

(3.1). It is common knowledge that a multiplier unit is more complex than a simple adder.

So the complexity of the DFT units is expressed often in terms of the multiplier

complexity alone.
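As a rough numerical illustration of these counts, the following Python sketch evaluates Equation (3.10) and the multi-factor approximation against the direct N^2 count. The function names are hypothetical and the counts are the idealized ones from the text, not measured hardware figures.

```python
import math

def direct_multiplies(N):
    """Multiplications for a direct N-point DFT."""
    return N * N

def two_factor_multiplies(N1, N2):
    """Equation (3.10): N*(N1 + N2 + 1) with N = N1*N2."""
    N = N1 * N2
    return N * (N1 + N2 + 1)

def multi_factor_multiplies(factors):
    """Approximate count N*(N1 + N2 + ... + Nn) for N = N1*N2*...*Nn."""
    N = math.prod(factors)
    return N * sum(factors)
```

For N = 64 factored as 4 x 4 x 4, the approximation gives 64*12 = 768 multiplications against 4096 for the direct form.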

Radix-2 Algorithm

When N is of the form N = 2^m, it can be factored as N = 2 x 2 x 2 x ... x 2. Thus all the individual blocks that are implemented are 2-point FFT blocks, which require no multiplications at all. All the butterfly coefficients are then implemented as part of the gluing logic that connects the individual blocks. If the first level of factorization is N = 2 x N/2, i.e., N1 = 2 and N2 = N/2, then the time-domain and frequency-domain indices (n, k) can be written as

$$n = 2 n_2 + n_1, \qquad n_1 \in [0,1],\ n_2 \in [0, N/2 - 1] \tag{3.13}$$

$$k = (N/2)\, k_1 + k_2, \qquad k_1 \in [0,1],\ k_2 \in [0, N/2 - 1] \tag{3.14}$$

Substituting from Equations (3.13) and (3.14) in Equation (3.9),

$$X[k_1 (N/2) + k_2] = \sum_{n_1=0}^{1} \sum_{n_2=0}^{N/2-1} x[2 n_2 + n_1]\, W_{N/2}^{k_2 n_2}\, W_N^{k_2 n_1}\, W_2^{k_1 n_1}$$

$$= \sum_{n_2=0}^{N/2-1} x[2 n_2]\, W_{N/2}^{k_2 n_2} + (-1)^{k_1} W_N^{k_2} \sum_{n_2=0}^{N/2-1} x[2 n_2 + 1]\, W_{N/2}^{k_2 n_2}$$

$$\Rightarrow X[k_1 (N/2) + k_2] = X_0[k_2] + (-1)^{k_1} W_N^{k_2} X_1[k_2] \tag{3.15}$$

where the terms X_i are (N/2)-point DFT units. The first term groups the even-indexed samples in the time domain and the second term groups the odd-indexed samples. Writing out the two cases k1 = 0 and k1 = 1 gives the lower and upper halves of the frequency-domain output:

$$X[k_2] = X_0[k_2] + W_N^{k_2} X_1[k_2], \qquad 0 \le k_2 \le N/2 - 1 \tag{3.16}$$

$$X[k_2 + N/2] = X_0[k_2] - W_N^{k_2} X_1[k_2], \qquad 0 \le k_2 \le N/2 - 1 \tag{3.17}$$

A block representation of a basic radix-2 implementation unit is shown in Figure (3.2).







Figure 3.2 Radix-2 repetitive unit

This one level of reduction reduces the implementation complexity to the sum of the multiplier complexities of two N/2-point DFT units plus N/2 multiplications and N additions. If N = 2^m and we factorize N m times, then the multiplier complexity is given by

$$\text{MultiplierComplexity} = (N/2) \log_2 (N) \tag{3.18}$$

and the addition complexity is given by

$$\text{AdditionComplexity} = N \log_2 (N) \tag{3.19}$$

Furthermore, some of the twiddle factors are only multiplications by 1, (-1)^k or (-j)^k, which need no actual multiplier. A detailed diagram of a radix-2 8-point implementation is shown in Figure (3.3). The resulting algorithm is called a radix-2 fast Fourier transform (FFT).
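A minimal recursive sketch of the radix-2 butterfly relations, X[k2] = X0[k2] + W_N^k2 X1[k2] and X[k2 + N/2] = X0[k2] - W_N^k2 X1[k2], is given below in floating-point Python. It only illustrates the recursion and is not the fixed-point VHDL/Verilog design of this thesis; the name `fft_radix2` is hypothetical.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of two")
    X0 = fft_radix2(x[0::2])   # (N/2)-point DFT of even-indexed samples
    X1 = fft_radix2(x[1::2])   # (N/2)-point DFT of odd-indexed samples
    X = [0j] * N
    for k2 in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k2 / N) * X1[k2]   # twiddle W_N^k2 times X1
        X[k2] = X0[k2] + t             # lower half of the spectrum
        X[k2 + N // 2] = X0[k2] - t    # upper half of the spectrum
    return X
```

Each level of the recursion uses N/2 twiddle multiplications and N additions, which is where the (N/2) log2(N) and N log2(N) counts come from.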





















Figure 3.3 Implementation of a Radix-2 8-point FFT unit


Radix-4 Algorithm

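The radix-4 FFT algorithm goes a step further in reducing the complexity of a DFT implementation. When N is a power of 4, i.e., N = 4 x 4 x 4 x ... x 4, N can be factorized as N = 4 x (N/4). Here N1 = 4 and N2 = N/4. The time and frequency domain indices (n, k) can therefore be expressed as

$$n = 4 n_2 + n_1, \qquad n_1 \in [0,3],\ n_2 \in [0, N/4 - 1] \tag{3.20}$$

$$k = (N/4)\, k_1 + k_2, \qquad k_1 \in [0,3],\ k_2 \in [0, N/4 - 1] \tag{3.21}$$

so that Equation (3.9) becomes

$$X[k_1 (N/4) + k_2] = \sum_{n_1=0}^{3} \sum_{n_2=0}^{N/4-1} x[4 n_2 + n_1]\, W_{N/4}^{k_2 n_2}\, W_N^{k_2 n_1}\, W_4^{k_1 n_1} \tag{3.22}$$

$$X[k_1 (N/4) + k_2] = \sum_{n_2=0}^{N/4-1} x[4 n_2]\, W_{N/4}^{k_2 n_2} + (-1)^{k_1} W_N^{2 k_2} \sum_{n_2=0}^{N/4-1} x[4 n_2 + 2]\, W_{N/4}^{k_2 n_2}$$
$$\qquad + (-j)^{k_1} W_N^{k_2} \sum_{n_2=0}^{N/4-1} x[4 n_2 + 1]\, W_{N/4}^{k_2 n_2} + (j)^{k_1} W_N^{3 k_2} \sum_{n_2=0}^{N/4-1} x[4 n_2 + 3]\, W_{N/4}^{k_2 n_2} \tag{3.23}$$

$$X[k_1 (N/4) + k_2] = X_0[k_2] + (-j)^{k_1} W_N^{k_2} X_1[k_2] + (-1)^{k_1} W_N^{2 k_2} X_2[k_2] + (j)^{k_1} W_N^{3 k_2} X_3[k_2] \tag{3.24}$$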









The X_i in Equation (3.24) are all N/4-point FFT units of grouped terms of type (4m+i). Thus the complexity reduces to the sum of the complexities of four N/4-point FFT units plus 3N/4 complex multiplications and 3N complex additions. Writing out the right-hand side for each value of k1 reduces this further:

$$X[k_2] = (X_0[k_2] + W_N^{2 k_2} X_2[k_2]) + (W_N^{k_2} X_1[k_2] + W_N^{3 k_2} X_3[k_2]) \tag{3.25}$$

$$X[(N/4) + k_2] = (X_0[k_2] - W_N^{2 k_2} X_2[k_2]) - j\,(W_N^{k_2} X_1[k_2] - W_N^{3 k_2} X_3[k_2]) \tag{3.26}$$

$$X[(N/2) + k_2] = (X_0[k_2] + W_N^{2 k_2} X_2[k_2]) - (W_N^{k_2} X_1[k_2] + W_N^{3 k_2} X_3[k_2]) \tag{3.27}$$

$$X[(3N/4) + k_2] = (X_0[k_2] - W_N^{2 k_2} X_2[k_2]) + j\,(W_N^{k_2} X_1[k_2] - W_N^{3 k_2} X_3[k_2]) \tag{3.28}$$

From Equations (3.25), (3.26), (3.27) and (3.28), we observe that the number of complex additions reduces from 3N to 2N, because the four bracketed sums and differences are shared between the outputs. So if the factorization is carried through all log4(N) stages, then

$$\text{MultiplierComplexity} = (3N/8) \log_2 (N) \tag{3.29}$$

and

$$\text{AdditionComplexity} = N \log_2 (N) \tag{3.30}$$

This implementation can also be seen as a repetition of a fundamental unit, which is the radix-4 block shown in Figure (3.4).

This implementation can also be seen as a repetition of a fundamental unit, which is

the radix-4 block shown in Figure (3.4).




Figure 3.4 Radix-4 basic block









We observe that the basic repetitive block in the radix-4 algorithm does not have any

actual multiplications just like the radix-2 block. The resulting algorithm is called a radix-

4 FFT.
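The four output relations of the radix-4 stage can be sketched as follows. This is a floating-point Python illustration under the assumption that N is a power of 4; the name `fft_radix4` and the local variable names are hypothetical. Note that the butterfly itself uses only additions, subtractions and multiplications by -j or +j; the only true multiplications are the three twiddle products.

```python
import cmath

def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of four."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 4:
        raise ValueError("length must be a power of four")
    X0, X1, X2, X3 = (fft_radix4(x[i::4]) for i in range(4))   # N/4-point sub-FFTs
    X = [0j] * N
    q = N // 4
    for k2 in range(q):
        w = cmath.exp(-2j * cmath.pi * k2 / N)        # W_N^k2
        a = X0[k2]
        b = w * X1[k2]                                # W_N^k2   * X1
        c = w * w * X2[k2]                            # W_N^2k2  * X2
        d = w * w * w * X3[k2]                        # W_N^3k2  * X3
        s02, d02 = a + c, a - c                       # shared sums and differences
        s13, d13 = b + d, b - d
        X[k2] = s02 + s13                             # k1 = 0
        X[k2 + q] = d02 - 1j * d13                    # k1 = 1
        X[k2 + 2 * q] = s02 - s13                     # k1 = 2
        X[k2 + 3 * q] = d02 + 1j * d13                # k1 = 3
    return X
```

Sharing `s02`, `d02`, `s13` and `d13` between the four outputs is what brings the addition count per stage from 3N down to 2N.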

Chirp-z Algorithm

One interesting feature of the Cooley-Tukey algorithm implementation is that the

frequency resolution is always related to the number of points at the input of the FFT

unit. The only way to increase the spectral resolution is to increase the number of data

points at the input of the DFT unit. Also, this algorithm only maps onto equally spaced

locations on the unit circular contour in the z-domain. As a result, the FFT can only

provide constant bandwidth analysis in the context of N equal frequency bands of

constant gain. An alternative way of implementation is to map onto a contour close to the

poles in the z-domain so that the spectral resolution is improved.

The Chirp-z transform algorithm avoids these problems by giving the freedom to

choose a range of frequencies to be analyzed independent of the sampling rate and define

the frequency response resolution to be determined by the chosen contour in the z-

domain. If the contour is chosen to be the unit circle, the Chirp-z transform and Cooley-

Tukey FFT algorithm produce the same result. The downside to the Chirp-z

implementation is its higher implementation complexity and slower performance in

comparison to the Cooley-Tukey FFT algorithm. The algorithm modifies the z-domain

mapping to represent the FFT of the signal.

If x(n) is an N-point sequence, the z-transform is defined as

$$X(z_k) = \sum_{n=0}^{N-1} x[n]\, z_k^{-n}, \qquad k \in [0, L-1] \tag{3.31}$$









Here L represents the number of frequency-domain outputs; clearly this is independent of the sampling rate. The z-domain contour is chosen to start at a point close to the poles of the system to be resolved, and it is a continuous track, which could also be the unit circle as in the case of the DFT. Though we have only an N-point sequence x[n] in the time domain, we can have an L-point sequence X[z_k] in the frequency domain. If $z_0 = r_0 e^{j\theta_0}$ is the origin of the contour, then the contour spirals either inwards or outwards based on the value of R according to the equation

$$z_k = z_0 (R\, e^{j\phi_0})^k, \qquad k \in [0, L-1]$$

This results in

$$z_k = r_0 e^{j\theta_0} (R\, e^{j\phi_0})^k, \qquad k \in [0, L-1] \tag{3.32}$$

Here (r0, θ0) represent the origin of the contour spiral, φ0 is the angular increment between successive points on the contour, and R determines the convergence or divergence of the contour. If R<1 the contour spirals inwards towards the origin, and if R>1 the contour spirals away from the origin. If R=1, then the contour is a circle of radius r0.
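The contour points can be generated directly from these parameters. The short Python sketch below assumes the form z_k = r0 e^(jθ0) (R e^(jφ0))^k; the function name `czt_contour` and its argument names are hypothetical.

```python
import cmath

def czt_contour(r0, theta0, R, phi0, L):
    """Return the L contour points z_k = r0*e^(j*theta0) * (R*e^(j*phi0))^k."""
    z0 = cmath.rect(r0, theta0)     # starting point of the contour
    step = cmath.rect(R, phi0)      # ratio between successive points
    return [z0 * step ** k for k in range(L)]
```

With r0 = R = 1, θ0 = 0 and φ0 = 2π/N the points are the N equally spaced locations on the unit circle used by the DFT; with R < 1 the magnitudes shrink from point to point, so the contour spirals inwards.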

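Substituting Equation (3.32) in Equation (3.31),

$$X[z_k] = \sum_{n=0}^{N-1} x[n]\, z_k^{-n} = \sum_{n=0}^{N-1} x[n]\, (r_0 e^{j\theta_0})^{-n} (R\, e^{j\phi_0})^{-nk} \tag{3.33}$$

If we define $V = R\, e^{j\phi_0}$,

$$X[z_k] = \sum_{n=0}^{N-1} x[n]\, (r_0 e^{j\theta_0})^{-n}\, V^{-nk} \tag{3.34}$$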


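To simplify this further, we use the relation

$$nk = \frac{k^2 + n^2 - (k-n)^2}{2} \tag{3.35}$$

in Equation (3.34).

$$X[z_k] = \sum_{n=0}^{N-1} x[n]\, (r_0 e^{j\theta_0})^{-n}\, V^{-k^2/2}\, V^{-n^2/2}\, V^{(k-n)^2/2}$$

$$\Rightarrow X[z_k] = V^{-k^2/2} \sum_{n=0}^{N-1} \left[ x[n]\, (r_0 e^{j\theta_0})^{-n}\, V^{-n^2/2} \right] V^{(k-n)^2/2} \tag{3.36}$$

Defining the grouped term to represent a new sequence g(n),

$$g(n) = x[n]\, (r_0 e^{j\theta_0})^{-n}\, V^{-n^2/2}, \qquad n \in [0, N-1] \tag{3.37}$$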

In the case of a circular mapping (R=1),

$$z_k = r\, e^{j 2\pi k / N} \tag{3.38}$$

the z_k are equally spaced points along a circle of radius r. Then

$$X[z_k] = \sum_{n=0}^{N-1} x(n)\, r^{-n}\, e^{-j 2\pi n k / N} = \sum_{n=0}^{N-1} \left[ x(n)\, r^{-n} \right] e^{-j 2\pi n k / N}$$

Here the modified sequence is y(n) = x(n)*r^(-n), and it is sufficient to calculate the DFT of the modified sequence.

Returning to the more general case, consider a sequence h(n) defined as

$$h(n) = V^{n^2/2} \tag{3.39}$$

Substituting from Equation (3.37) and Equation (3.39) in Equation (3.36),

$$X(z_k) = V^{-k^2/2} \sum_{n=0}^{N-1} g(n)\, h(k-n) \tag{3.40}$$

Defining the convolution sum as another sequence y(k), we have

$$y(k) = \sum_{n=0}^{N-1} g(n)\, h(k-n) \tag{3.41}$$

Thus Equation (3.40) becomes

$$X(z_k) = V^{-k^2/2}\, y(k) \tag{3.42}$$

Here the sequence y(n) is a convolution between a sequence g(n) of length N and a second sequence h(n) of infinite length. Taking an M-point segment of this infinite-length sequence for practical purposes, y(n) would be a sequence of length L given by

$$L = M + N - 1 \tag{3.43}$$

The convolution filter h(k) is usually implemented using charge coupled devices (CCD) or surface acoustic wave (SAW) devices. Since we try to obtain a frequency resolution of L points, the length of the convolution filter h(n) is taken to be

$$M = L - N + 1 = L - (N - 1) \tag{3.44}$$

which implies that

$$-(N-1) \le n \le (L-1). \tag{3.45}$$

The computational complexity of the Chirp-z algorithm thus depends on M, requiring M*log2(M) complex multiplications, compared to N*L for direct computation. When L is small, direct computation is more efficient, but when L is large, the Chirp-z transform is better.

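To calculate a DFT, set the contour parameters as:

r0 = R = 1, θ0 = 0, φ0 = 2π/N and L = N.

So, h(n) from Equation (3.39) simplifies into

$$h(n) = \cos\left(\frac{\pi n^2}{N}\right) + j \sin\left(\frac{\pi n^2}{N}\right) \tag{3.46}$$

$$\Rightarrow h(n) = h_r(n) + j\, h_i(n)$$

and

$$h^{-1}(n) = V^{-n^2/2} = \cos\left(\frac{\pi n^2}{N}\right) - j \sin\left(\frac{\pi n^2}{N}\right) \tag{3.47}$$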


These coefficients are implemented in a ROM for the pre-multiplications and post-

multiplications. The algorithm is implemented as shown in the following figure (Figure

3.5).
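The pre-multiplication, convolution and post-multiplication steps can be checked numerically with a short Python sketch for the DFT case (r0 = R = 1, θ0 = 0, φ0 = 2π/N, L = N). This is an illustrative floating-point model, assuming h(n) = V^(n²/2) with V = e^(jφ0); the convolution is evaluated directly rather than with a CCD/SAW filter or ROM tables, and the function name `chirp_z_dft` is hypothetical.

```python
import cmath

def chirp_z_dft(x):
    """DFT computed through the chirp-z steps: pre-multiply, convolve, post-multiply."""
    N = len(x)
    phi0 = 2 * cmath.pi / N

    def h(m):
        # Chirp sequence h(m) = V^(m^2/2) with V = e^(j*phi0); defined for any integer m.
        return cmath.exp(1j * phi0 * m * m / 2)

    # Pre-multiplication: g(n) = x[n] * V^(-n^2/2); since |V| = 1 this is conj(h(n)).
    g = [x[n] * h(n).conjugate() for n in range(N)]
    # Convolution y(k) = sum_n g(n) * h(k - n), written out directly.
    y = [sum(g[n] * h(k - n) for n in range(N)) for k in range(N)]
    # Post-multiplication: X(z_k) = V^(-k^2/2) * y(k).
    return [h(k).conjugate() * y[k] for k in range(N)]
```

Because V^(-k²/2) V^(-n²/2) V^((k-n)²/2) = V^(-nk), the result should match a direct DFT for any length N, not only powers of two.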


[Block diagram: ROMs holding hr(n) and hi(n), FIR filters]

Figure 3.5 Chirp-z implementation

The sequence h(n) is a complex exponential whose argument n²φ0/2 = (nφ0/2)n can be thought of as a continuously increasing frequency term nφ0/2. This signal, shown in Figure 3.6, has an increasing frequency and sounds like the chirp of a bird.












Figure 3.6 Chirp Signal

There are some interesting properties of this chirp signal which enhance the applications of the Chirp-z algorithm for the computation of the DFT. The phase of the signal in Figure 3.6 is parabolic, as shown in Figure 3.7. The figure shows a linear region as well as curvature in the phase, so the phase can be expressed in the form Phase(n) = α*n + β*n². Here α determines the linear region and β determines the curvature in Figure 3.7.



Figure 3.7 Phase response of the Chirp Signal shown in Figure 3.6

An interesting point is that a chirp signal can be completely recovered from an

impulse signal after passing through a system with unit magnitude and phase shown in

the Figure 3.7 and vice versa. The reverse system would require a system with unit

magnitude response and increasing phase (opposite to the original system). This property

of the chirp signal encourages its use in radar systems, which require short pulses with

higher energy.

The relation between the tolerances of the chirp system and the DFT algorithm is obtained from the equation describing the incremental evolution factor φ0. Defining f2 and f1 to be the maximum and minimum operating frequencies,


( 2)( f2-f) (3.48)

The frequency resolution in a conventional DFT system is given by

PDFT = (I-) (3.48)

From Equations 3.48 and 3.49, we see that

o = (DF2 () (3.48)








This implies that the Chirp-z algorithm has a smaller frequency tolerance for a given N. This also indicates that the number of points required to achieve a particular spectral resolution is always smaller when using the Chirp-z algorithm.














CHAPTER 4
FIELD PROGRAMMABLE GATE ARRAYS

Field programmable gate arrays (FPGAs) are a class of programmable devices which house large circuits with gate counts exceeding 20,000 gates, a count that is too large to fit onto a complex programmable logic device (CPLD). CPLDs have blocks of AND gates interfacing blocks of OR gates. Unlike the CPLDs, the FPGAs have logic blocks interconnected with sets of programmable switches. The structure of a general FPGA is shown in Figure (4.1).












Figure 4.1 General structure of an FPGA

There are three types of blocks in the figure, viz., I/O blocks, logic blocks and interconnection switches. The logic blocks, all identical and usually standardized for each family of devices, are arranged in a matrix. The I/O blocks interface the internal circuitry to the external pins. The interconnecting switches connect the I/O blocks and the logic blocks. These switches, shown in Figure (4.2), are programmable and form a connection between a horizontal and a vertical line based on the value of the SRAM cell ('0' for no connection, Vv ≠ Vh, and '1' for a formed connection, Vv = Vh).

[Schematic: horizontal line Vh, vertical line Vv, SRAM cell]

Figure 4.2 Programmable Interconnection Switch

Each logic segment of a user program must be small enough to fit into a logic block. Each logic block is usually an implementation of either look-up tables (LUTs), multiplexers or general gates. An LUT implementation of the three-input function f = X1X2 + X1X3 + X2X3 is shown in Figure (4.3), and its truth table is given in Table (4.1).



Figure 4.3 A 3-input LUT implementation









Table 4.1 Truth table of the function implemented in Figure (4.3)
X1   X2   X3   f
0    0    0    0
0    0    1    0
0    1    0    0
0    1    1    1
1    0    0    0
1    0    1    1
1    1    0    1
1    1    1    1


In Figure (4.3), each multiplexer is controlled by a single input, based on which the multiplexer decides which of its inputs to pass to the output. Usually the number of inputs to the LUT is about five, which would then require 32 storage cells. Since the FPGAs are volatile, they must be reprogrammed every time they are powered up. An alternative solution is to have a RAM/ROM memory block that automatically supplies these requisite values at power on. Each of the memory cells holds a value of either '1' or '0'. The values that are fed into the SRAM cells are calculated using a simple protocol. If the truth-table entries are placed in ascending order, the first level of multiplexers is controlled by the least significant bit (LSB), and the last level of multiplexers is controlled by the most significant bit (MSB), then the SRAM values are in the same order as the output values of the truth table. When a circuit is implemented in an FPGA, the logic blocks are programmed to realize the necessary functions and the programmable switches are programmed to make the suitable interconnections.
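The SRAM-plus-multiplexer-tree view of an LUT can be modeled in a few lines of Python. This is a behavioral sketch only; the function name `lut_eval` is hypothetical, and the SRAM contents are simply the output column of the truth table of f = X1X2 + X1X3 + X2X3, listed in ascending input order.

```python
def lut_eval(sram, inputs):
    """Evaluate an LUT built as a tree of 2-to-1 multiplexers.

    sram   -- list of 2**n stored bits, in ascending truth-table order
    inputs -- list of n input bits, most significant bit first
    """
    level = list(sram)
    # The first multiplexer level is driven by the LSB, the last level by the MSB.
    for bit in reversed(inputs):
        level = [level[i + 1] if bit else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

# SRAM contents for f = X1X2 + X1X3 + X2X3 (the truth-table output column).
MAJORITY_SRAM = [0, 0, 0, 1, 0, 1, 1, 1]
```

For example, `lut_eval(MAJORITY_SRAM, [1, 0, 1])` selects the entry for X1X2X3 = 101. A five-input LUT would use the same function with a 32-entry list.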









Power Calculations in FPGAs

Power dissipation in digital circuits is often the limiting factor in the utility of a

particular circuit in an application. For the purpose of this thesis, Altera FLEX10KE

FPGAs were used and their power dissipation is given by the following equation [8].

$$\text{Power} = (I_{CCINT} \times V_{CCINT}) + \sum_{n=1}^{d} P_{DCn} + 0.5 \times OUT \times C_{AVE} \times V_O \times f_{MAX} \times tog_{IO} \times V_{CCIO} \tag{4.1}$$

where
ICCINT = no-load current in the device
VCCINT = no-load voltage Vcc
d = number of DC outputs
PDCn = DC power of output n
fMAX = maximum frequency of operation
togIO = average number of I/O pins toggling at each clock
VCCIO = DC power supply value
OUT = number of output and bi-directional pins
CAVE = average capacitance of the FPGA device
Vo = voltage level of the high output state

Costs Involved in FPGA Fabrication

The actual cost of an FPGA implementation consists of the engineering costs and the tool (software/hardware) price. The engineering cost for an FPGA implementation is much less than that of its ASIC counterpart. The real advantage of the FPGA becomes evident when its cost is compared to the manufacturing cost of ASIC devices, which have high non-recurring engineering (NRE) costs and a longer time to market. The break-even number for the FPGA design can be found as follows.

FPGA cost = Engineering costs & tools + total sales for all items sold

On the other hand,

ASIC cost = NRE + Engineering cost & tools + total sales for all items sold + Re-spin cost + Inventory costs + Accounting for future price reductions.

When the additional costs of the ASICs are considered, the FPGAs are a much better choice for even a moderate amount of sales.

Comparison to other Technologies

Programmability is a major advantage of FPGAs that is absent in ASIC implementations of digital circuits. Unlike FPGAs, ASICs cannot be changed at will. The design and testing cycles for an FPGA are much shorter than for an ASIC, and hence FPGA designs can be brought to market much faster. However, for high-volume production, ASIC implementations are much cheaper. Since optimizations can be done down to the gate level in ASIC implementations, they are also more power efficient.














CHAPTER 5
IMPLEMENTATION DETAILS AND RESULTS

This chapter describes the work done to obtain the results, the tools used, and the inferences drawn from the results.

Description of the Work

The thesis work is based on modules built for the purposes of this thesis rather than

the standard modules provided along with Altera tools. This was done to obtain an in-

depth understanding of the pipelining and FPGA concepts in general.

The Cooley-Tukey and Chirp-Z algorithms were implemented using a fixed-point 2's

complement integer arithmetic in VHDL and Verilog. The Cooley-Tukey FFTs have

been fit into the Altera FLEX10KE family of devices but the Chirp-z FFTs were too

cumbersome (24 multipliers of the type used in this work for a 4-point implementation)

to fit onto the FPGAs. Matlab models were built for those designs which could be fit onto

Altera FPGAs. These modules were used for observing the performance of the FFTs in

varying noise environments. The fixed-point implementation allows power efficient high-

speed operations at low cost. This is very much suitable for the mobile/portable

applications [10]. This implementation on the other hand loses precision thus decreasing

the dynamic range and increasing the round off noise. It is important to note that the

complexity of design increases exponentially as the internal bus width is increased.

From the packaging point of view, the pin count is also a major issue. If the complex input were of 16-bit width, then a parallel input of an N-point FFT (N>4) would be a virtual impossibility. A workaround for this problem is to provide only two 16-bit inputs that take in the real and imaginary parts of the data simultaneously at clocked intervals. The main drawback of this implementation is that the FFT assumes a serial form, and hence the FFT operates N times slower.

The effect of the bus width is visible at the output also. Noise is introduced in the system due to insufficient representation of the numbers in the system. This type of noise is called round-off noise, and it propagates through the successive stages of the system. Initially the system was implemented with an internal bus width (all the twiddle factors and the outputs of all stages) of 16 bits only. This requires rounding the output of a 16-by-16-bit multiplier from 32 bits to 16 bits (thereby losing 16 bits of precision), and a 16-bit adder (losing 1/2 bit of precision at each unit). Tables 5.1 and 5.2 give the implementation details of the case where the round-off errors have not been eliminated. The deficiencies in the preliminary design were removed by employing the following techniques.

1. Each multiplier output is not truncated till it reaches one more level (Figure
(5.1)).

2. The conventional multiplier was replaced with a pipelined multiplier
(Figure 5.2), which greatly reduced the speed-bottleneck and increased the
maximum operating frequency of the system.

3. All 16-bit adder/subtraction units were replaced with 32-bit units.

4. A serial-to-parallel converter is placed at the input unit and a parallel-to-
serial converter is placed after the output unit to reduce the pin counts.
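The benefit of the first technique, delaying truncation by one level, can be illustrated with integer arithmetic in Python. The sketch below assumes that truncation keeps the upper 16 bits of a 32-bit two's complement value (an arithmetic right shift by 16); the function names are hypothetical, and overflow of the 32-bit sum is ignored for simplicity.

```python
def truncate16(p32):
    """Keep the upper 16 bits of a 32-bit two's complement value."""
    return p32 >> 16   # Python's >> on ints is an arithmetic (sign-preserving) shift

def sum_truncate_early(a, b, c, d):
    """Truncate each 16x16-bit product to 16 bits, then add (preliminary design)."""
    return truncate16(a * b) + truncate16(c * d)

def sum_truncate_late(a, b, c, d):
    """Add the full 32-bit products, then truncate once (one level later)."""
    return truncate16(a * b + c * d)
```

With a = c = 255 and b = d = 257, each product is 65535, just under one unit of the truncated scale. Truncating early discards both products entirely, while truncating after the addition keeps their combined contribution, so the late result is closer to the exact value of about 2.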


















Figure 5.1 Implementation of Multipliers (a) shows the initial truncating configuration
and Figure (b) shows the truncation operation after one more level of processing












Figure 5.2 N-by-N-bit Pipelined Multiplier
In general, the internal bus width was standardized to 32-bit width. These adjustments greatly increased the complexity in design. The pipeline multiplier required 3847 logic blocks, while the conventional Baugh-Wooley multiplier required only 2160 logic blocks for its implementation. Due to the pipeline multiplier implementation, there was an observed increase in the maximum possible operating frequency from about 7 MHz to about 58 MHz and 30 MHz for the Radix-2 and Radix-4 Cooley-Tukey 8-point FFT implementations respectively. The pipeline multiplier was a 7-stage case involving the usage of smaller 4-by-4 multiplications. Tables 5.3 and 5.4 quantify the implementation issues of the case where round-off errors were eliminated to a great extent.









Description of Tools Used

The Altera MAX+PLUS II software was used to synthesize the VHDL/Verilog code.

The software contained tools to compile, simulate and edit the floor plan of the design.

The compiler was equally optimized for area and speed. The designs were fit into the

FLEX10KE device family. The FLEX10KE family of devices has the following features:

* High gate density implementation

* Accommodates designs of about 200,000 typical gates

* 4096 SRAM bits per Embedded Array Block (EAB).

* Multi-volt I/O pins (2.5V, 3.3 V or 5.0 V devices)

* Built-in Joint Test Action Group (JTAG) Boundary Scan Test (BST) circuitry
available without consuming additional device logic.

* Built in low-skew clock distribution trees.

* Flexible fast track interconnects

* Powerful I/O pins

Each FPGA has an embedded array (EAB) and a logic array (LAB) which are useful

for efficient implementations. The embedded array is used in implementing a variety of

memory functions, complex logic functions, microcontroller applications and data

transform. The logic array is used to implement a multitude of general logic functions.

Each LAB has 8 logic elements (LEs) and a local interconnect and each LE has a four-

input look-up-table (LUT), a programmable flip-flop and a dedicated signal path for carry

and cascaded functions. The FPGA device can be configured either serially or in parallel, synchronously or asynchronously. The average capacitance of the device remains unchanged for any operating frequency.










[Block diagram: 4-QAM mapping, IFFT, AWGN channel noise, FFT, QAM demapping]

Figure 5.3 Model used in the Thesis work

Figure (5.3) shows the model used in the thesis work. The IFFT is only a special case

of the FFT implementation. Here smaller Point IFFT units replace the smaller Point FFTs

and the twiddle factors were replaced by their complex conjugates. The FFTs were fit

into the FLEX10KE family of devices. Up to 4-point FFTs could be accommodated using

a single device for both the radix-2 and radix-4 cases but their 8-point implementations

required multiple devices. This reduced the percentage utilization of the logic blocks to a

great extent. Matlab models were built for the FFTs for the Cooley-Tukey and Chirp-z

algorithms. The model was used for performance evaluations under varying noise

conditions. The results for the implementation are in the following results and inferences

section.
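The relationship between the FFT and the IFFT in this model, conjugated twiddle factors plus a 1/N scaling, can be sketched with a noise-free loopback in Python. This is an illustrative floating-point model rather than the Matlab models used in this work; the constellation labeling in `QAM4` and all function names are assumptions made for the example.

```python
import cmath

def dft(x, inverse=False):
    """N-point DFT; the inverse uses conjugated twiddle factors and a 1/N scaling."""
    N = len(x)
    sign = 1 if inverse else -1
    X = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for n in range(N))
         for k in range(N)]
    return [v / N for v in X] if inverse else X

# Hypothetical 4-QAM labeling: two bits per symbol.
QAM4 = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def qam_map(bits):
    """Map an even-length bit list onto 4-QAM symbols."""
    return [QAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def qam_demap(symbols):
    """Pick the nearest constellation point for each received symbol."""
    bits = []
    for s in symbols:
        bits.extend(min(QAM4, key=lambda b: abs(QAM4[b] - s)))
    return bits

def loopback(bits):
    """Transmitter IFFT followed directly by the receiver FFT (no channel noise)."""
    return qam_demap(dft(dft(qam_map(bits), inverse=True)))
```

With 16 input bits this models one 8-point OFDM symbol; adding noise samples between the two transforms would turn it into the AWGN experiment described above.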

Results and Conclusions

Table 5.1 Radix-2 Cooley-Tukey implementation with round-off errors.
                                       FFT        FFT          IFFT       IFFT
N   Multipliers  LC count  Delay (ns)  Variance   Precision*   Variance   Precision*
2   0            150       29.7        1.01E-09   15           2.46E-10   16
4   0            609       43.2        2.57E-09   15           1.66E-10   16
8   3            4151      154.4       2.94E-07   10           5.13E-09   13






Table 5.2 Radix-4 Cooley-Tukey implementation with round-off errors.
                                       FFT        FFT          IFFT       IFFT
N   Multipliers  LC count  Delay (ns)  Variance   Precision*   Variance   Precision*
4   0            723       40.9        2.13E-09   14           1.40E-10   16
8   3            4114      145.6       3.02E-07   10           5.53E-09   13









Table 5.3 Radix-2 Cooley-Tukey implementation without round-off errors.
                                       FFT        FFT           IFFT       IFFT
N   Multipliers  LC count  Delay (ns)  Variance   Precision**   Variance   Precision**
2   0            244       49.9        6.20E-10   15            1.46E-10   16
4   0            795       67.1        2.53E-09   14            1.53E-10   16
8   3            5744      169.1       1.68E-07   11            2.53E-09   14





Table 5.4 Radix-4 Cooley-Tukey implementation without round-off errors.
                                       FFT        FFT          IFFT       IFFT
N   Multipliers  LC count  Delay (ns)  Variance   Precision*   Variance   Precision*
4   0            1381      81.1        1.01E-19   31           7.09E-21   33
8   3            5699      171.1       1.72E-07   11           2.62E-09   14

The results in Tables 5.1 to 5.4 can be explained as follows. The IFFT equation (Equation 3.1) has a factor (1/N) in it. Since this can be achieved by a simple shifting operation, the IFFT has more precision than the FFT in all the Cooley-Tukey implementations. The number of logic blocks required for the implementation without round-off errors is considerably larger than when round-off errors were present. The increase in the amount of hardware has a direct impact on the propagation delays, but the usage of a multi-stage multiplier allows a pipelined implementation, which increases the throughput of the entire system. The Baugh-Wooley complex multiplier has about 2160 logic blocks as compared to the 3847 logic blocks of the complex pipelined multiplier. As the length of the FFT increases, the number of multipliers also rises, which increases the implementation complexity of the system. Attempts to fit a 16-point FFT unit in both cases (with and without round-off errors) proved futile. Usage of library-parameterized modules (LPMs) is a possible solution to this problem. But since the purpose of this thesis was also to judge the performance of the systems in terms of latency, the usage of a pipelined multiplier became a necessity. The greater the number of stages in a design implementation, the less precision is preserved. The Radix-4 FFT implementation has fewer stages than the Radix-2 implementation, and hence its precision is greater. The theoretical precision of a fixed-point implementation is given by

$$\text{Precision} = -\tfrac{1}{2} \log_2 (\text{Variance}) \tag{5.1}$$

and is calculated in bits. The tables reveal that the observed precision is very close to the theoretical value in almost all cases.
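The tabulated precision values can be cross-checked against the error variance with a one-line computation. The sketch below assumes that precision in bits is minus one half of log2 of the variance (equivalently, minus log2 of the standard deviation), which is what the tabulated values suggest; the function name `precision_bits` is hypothetical.

```python
import math

def precision_bits(variance):
    """Precision in bits, assuming precision = -0.5 * log2(variance)."""
    return -0.5 * math.log2(variance)
```

For instance, a variance of 2.46E-10 gives a value close to 16 bits, and a variance of 1.01E-19 gives a value between 31 and 32 bits, in line with the corresponding table entries.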

Power Calculations

The power dissipation in an FPGA is the aggregate sum of all the internal power

dissipation and the power dissipation due to the I/O. It is given by the formula


PowerEstimate = Pi + Po

{ CCINT VCCINT IJ { ACOUT + 1DCOUT I
L CCs tan dby + ICCactive ) VCCINT 1 (5.2)
(0.5 *OUT* CAverage V togo Vccio

+ h*OUT* + l*OUT* )
n=l n=l
Here

h = percentage of high DC outputs

l = percentage of low DC outputs

OUT = number of total output and bi-directional pins in an FPGA device

fMAX = maximum possible operating frequency

Vcc = supply voltage (could be 2.5, 3.3 or 5.0 volts)

Rioh = pull-down resistance + resistance calculated from the slope of the IOH characteristics in the device datasheet

Riol = pull-up resistance + resistance calculated from the slope of the IOL characteristics in the device datasheet

togIO = percentage of switching expected in the outputs (usually assumed as 12.5%)

Vo = the output high voltage at a particular value of Vccio (3.8, 3.3, 2.5 V for 5.0, 3.3, 2.5 V Vccio respectively)

CAverage = average capacitance of the family of devices specified in the data sheet

Usually the higher-point FFTs extend over more than one device, and an accurate power calculation must take into account the power for each of these devices. The power calculation for the biggest design implemented, i.e., the Radix-2 8-point FFT, is arrived at as follows. The implementation is distributed over three different devices, EPF10K130EFC672-1 (a), EPF10K200EBC600-1 (b) and EPF10K100EF484-1 (c). The power calculation must take into consideration the requirements of the three chips separately.

Table 5.5 Power calculations for Radix-2 8-point FFT.
Device  O/P pins  Logic blocks  Vcc/Vccio  K    ICCActive (A)  Iccsup (mA)  PINT (mW)  Pdc (mW)  Pac (mW)
a       144       1765          2.5V       4.6  0.0593         150          167.099    32.89     768
b       181       1966          2.5V       4.8  0.0689         250          191.178    41.34     966
c       112       2013          2.5V       4.5  0.0662         125          184.265    25.58     598

fmax = 58.47 MHz, VCCINT = 2.5 V, Vccio = 2.5 V

ICCstandby = 7.5 mA, Rioh = 1400 ohms, Riol = 1007 ohms











Assuming 50% high outputs, 50% low outputs and 1k pull-up and pull-down resistors, the total power is 2975.253 mW for a radix-2 8-point implementation with a 2.5 V Vcc.
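The internal power column of Table 5.5 can be reproduced from the standby and active currents. The Python sketch below assumes PINT = (ICCstandby + ICCactive) x VCCINT and that the tabulated ICCActive values (0.0593, 0.0689, 0.0662) are in amperes, which is the reading under which the PINT column is consistent; the function name and the dictionary layout are hypothetical.

```python
def internal_power_mw(icc_standby_ma, icc_active_a, vcc_int_v):
    """Internal power in mW: (ICCstandby + ICCactive) * VCCINT."""
    icc_active_ma = icc_active_a * 1000.0          # amperes to milliamperes
    return (icc_standby_ma + icc_active_ma) * vcc_int_v

# ICCActive per device as listed in Table 5.5 (assumed to be in amperes).
ICC_ACTIVE_A = {"a": 0.0593, "b": 0.0689, "c": 0.0662}
P_INT_MW = {dev: internal_power_mw(7.5, icc, 2.5) for dev, icc in ICC_ACTIVE_A.items()}
```

The small differences from the tabulated PINT values come from the rounding of ICCActive to three significant figures in the table.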


Noise Tolerance

The performance of each of the designs was tested under varying noise conditions. The radix-2 and radix-4 implementations perform identically for an equal bus width and number of points (Figures 5.4 and 5.5). Increasing the bus width brought a minor improvement to the system, but when compared to the floating-point implementation (Figure 5.6), the improvement is insignificant. So increasing the bus width beyond 16 bits in these implementations would not be of much value. The noise performance obtained by changing the radix is much better in the case of the 32-bit bus width.


[Plot: SNR vs BER for 16-bit internal bus width; curves for the radix-2 and radix-4 models; x-axis: SNR]

Figure 5.4 BER variations against SNR for an internal bus width of 16.









[Plot: SNR vs BER for 32-bit internal bus width; x-axis: SNR]


Figure 5.5 BER variations against SNR for an internal bus width of 32.


[Plot: SNR vs BER, floating-point FFT vs built models; curves: floating point, 16-bit Rad4-8pt, 16-bit Rad2-8pt, 32-bit Rad2-8pt, 32-bit Rad4-8pt; x-axis: SNR]

Figure 5.6 BER variations against SNR: comparison of floating-point results with modeled Radix-2 and Radix-4 8-point FFTs with 16- and 32-bit internal bus width.









The Cooley-Tukey algorithm was more feasible for implementation on the FPGAs than the Chirp-z algorithm because of the excessive number of multipliers the latter requires for lower-point FFTs. The Chirp-z algorithm usually has the FIR convolution blocks implemented on CCDs or SAW devices, which reduces the number of multipliers to a minimum. The implementation complexity of the Chirp-z algorithm is 6N complex multiplications compared to N*log2(N) for the radix-2 Cooley-Tukey implementation, which puts the break-even point for the algorithms at N = 64. So any implementation larger than a 64-point FFT would be better off using the Chirp-z algorithm. Since the standard library-parameterized modules were not used in the design, even a 4-point FFT could not be configured using the Chirp-z algorithm. For the smaller-point FFTs, if the data could be clocked out after the ROM coefficient pre-multiplications to an external device (CCD or SAW) for the circular convolution, the implementation complexity would be only 2N complex multiplications on the FPGA. This would involve converting the digital data into analog form at the output of the FPGA (input of the CCD) and re-converting it into digital form after a certain time for clocking the input into the FPGA. This analog implementation of the circular convolution also preserves the precision to a great extent. Though this type of implementation is theoretically feasible and also efficient in terms of precision, it would be avoided in practice since it involves intermediate digital-to-analog and analog-to-digital conversions. Most applications use longer FFTs, and so the Chirp-z algorithm is a much better choice in those cases. The Cooley-Tukey algorithm implementation is also highly desirable for the reduction of complexity associated with it.
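The break-even length quoted above follows from equating the two multiplication counts, 6N and N*log2(N). A small Python sketch, using only these two counts from the text and hypothetical function names, makes the comparison explicit for power-of-two lengths.

```python
def chirp_z_multiplies(n):
    """Complex multiplications for the Chirp-z implementation: 6N."""
    return 6 * n

def radix2_multiplies(n):
    """Complex multiplications for radix-2 Cooley-Tukey: N*log2(N), n a power of two."""
    return n * (n.bit_length() - 1)

def break_even_length(max_log2=20):
    """Smallest power-of-two length at which Chirp-z needs no more multiplications."""
    for m in range(1, max_log2 + 1):
        n = 1 << m
        if chirp_z_multiplies(n) <= radix2_multiplies(n):
            return n
    return None
```

At N = 64 both counts equal 384; for N = 128 the Chirp-z count is 768 against 896 for the radix-2 form.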









Directions of Future Work

This thesis used fast multipliers that were designed specifically to allow a
comparison between general multipliers and multi-stage fast multipliers.
This restricted the maximum possible FFT length to eight. The simulation
software provides multiplier modules that are highly optimized for each
family of FPGA devices. Using such modules may reduce power consumption and
latency. Since these multipliers are usually much smaller, a higher-point FFT
implementation using a greater number of them is possible, and further
research can be done with these multiplier modules.

The Chirp-z algorithm allows more efficient implementations with much less
hardware, and more work can be done using such an implementation. The
precision loss in such a design would also be independent of the length of
the FFT. The increased spectral resolution provided by this algorithm with
much less hardware would be an incentive to pursue the study of its
implementation on FPGAs.


















APPENDIX A
16-BIT COOLEY-TUKEY IMPLEMENTATION




%%%%%%%%% bwcell1.m %%%%%%%%
function [s]=bwcell1(a,b)
% Cell 1 of the Baugh Wooley Multiplier
s=a & b;
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwcell2.m %%%%%%%%
function [S,cout]=bwcell2(a,b,Sin,Cin)
% Cell 2 of the Baugh Wooley Multiplier
d=a & b;
[S,cout]=fulladder(d,Sin,Cin);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwcell3.m %%%%%%%%
function [s]=bwcell3(a,b)
% Cell 3 of the Baugh Wooley Multiplier
s=a & ~b;
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwcell4.m %%%%%%%%
function [s,cout]=bwcell4(a,b,Sin,Cin)
% Cell 4 of the Baugh Wooley Multiplier
c=~a & b;
[s,cout]=fulladder(c,Sin,Cin);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwcell5.m %%%%%%%%
function [s,cout]=bwcell5(x,y)
% Cell 5 of the Baugh Wooley Multiplier
b=x & y;
c=~x;
d=~y;
[s,cout]=fulladder(b,c,d);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% bwm16a.m %%%%%%%%
function [p]=bwm16a(x,y)
% This function multiplies two 16-bit signed numbers and gives a 16-bit
% signed number as output.
% The logic implemented is the Baugh Wooley Multiplier

t=zeros(16);
c=zeros(16);

% First row
for i=16:-1:2
    t(1,i)=bwcell1(x(i),y(16));
end
t(1,1)=bwcell3(x(1),y(16));

% Row 2
for i=16:-1:2
    [t(2,i),c(2,i)]=bwcell2(x(i),y(15),t(1,i-1),0);
end
t(2,1)=bwcell3(x(1),y(15));

% Rows 3 to 15
for j=3:15
    for i=16:-1:2
        [t(j,i),c(j,i)]=bwcell2(x(i),y(17-j),t(j-1,i-1),c(j-1,i));
    end
    t(j,1)=bwcell3(x(1),y(17-j));
end

% Row 16
for i=16:-1:2
    [t(16,i),c(16,i)]=bwcell4(x(i),y(1),t(15,i-1),c(15,i));
end

[t(16,1),c(16,1)]=bwcell5(x(1),y(1));

% Last row
[temp,cout(16)]=fulladder(x(1),y(1),t(16,16));
for i=16:-1:2
    [p(i),cout(i-1)]=fulladder(c(16,i),t(16,i-1),cout(i));
end
[p(1),nc]=fulladder(1,c(16,1),cout(1));

clear c
clear t
clear cout
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% com.m %%%%%%%%
function [b]=com(a)

% This function calculates the twos complement of a number
tmp1=~a;
tmp2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
[b,tmp]=ad16c(tmp1,tmp2);
clear tmp1
clear tmp2
clear a
return
%%%%%%%%% END %%%%%%%%
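
The invert-and-add-one rule used by com.m can be cross-checked outside MATLAB. The Python sketch below is an illustration under assumptions, not the thesis code: the function name is hypothetical, bit vectors are lists with the most significant bit first (as in the listings), and integer arithmetic stands in for the 16-bit adder.

```python
def twos_complement(bits: list[int]) -> list[int]:
    """Return the two's complement of a bit vector (MSB first), same width."""
    width = len(bits)
    value = int("".join(str(b) for b in bits), 2)
    # Invert all bits and add one, then wrap to the register width.
    negated = (~value + 1) & ((1 << width) - 1)
    return [int(ch) for ch in format(negated, f"0{width}b")]


if __name__ == "__main__":
    one = [0] * 15 + [1]
    print(twos_complement(one))  # the 16-bit pattern for -1
```

Applying the function twice returns the original vector, which is a quick sanity check for any input.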

%%%%%%%%% cplxmul16.m %%%%%%%%
function [pr,pi]=cplxmul16(xr,xi,yr,yi)

% This function multiplies two 16-bit signed complex inputs and gives a
% 16-bit signed complex output.

% Four real products of the real and imaginary parts
t0=bwm16a(xr,yr);
t1=bwm16a(xr,yi);
t2=bwm16a(xi,yr);
t3=bwm16a(xi,yi);

% Real part: xr*yr - xi*yi; imaginary part: xi*yr + xr*yi
pr=sub16(t0,t3);
pi=add16(t2,t1);
return
%%%%%%%%% END %%%%%%%%


%%%%%%%%% fulladder.m %%%%%%%%
function [s,cout]=fulladder(a,b,c)
tmp1=(a & ~b) | (~a & b);
s=(tmp1 & ~c) | (~tmp1 & c);
tmp2= a & b;
tmp3= b & c;
tmp4= c & a;
cout= (tmp2 | tmp3) | tmp4;
return
%%%%%%%%% END %%%%%%%%
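
As a cross-check of the full-adder logic, the following Python sketch enumerates the truth table. It is illustrative only: the function name is hypothetical, and the sum is written with XOR, which is logically equivalent to the AND/OR/NOT form used in the listing.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out) for input bits a, b, c."""
    s = a ^ b ^ c                        # sum bit is the parity of the inputs
    cout = (a & b) | (b & c) | (c & a)   # carry when at least two inputs are 1
    return s, cout


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, c)
        print(a, b, c, "->", s, cout)
```

For every input combination the identity a + b + c = s + 2*cout holds, which is the property the multiplier cells rely on.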

%%%%%%%%% getformat.m %%%%%%%%
function [format]=getformat(a)
% This function puts the input vector a into a format that allows only
% real transmissions to be possible in the time domain data.

lena=length(a);

format=[real(a(1)); a(2:lena); imag(a(1)); conj(a(2:lena))];
return
%%%%%%%%% END %%%%%%%%


%%%%%%%%% hexa2bin.m %%%%%%%%
% Function returns the string (NOT ARRAY) containing the binary
% equivalent of the input hexadecimal number
% for example
% if b='f74b'
% then hexa2bin(b) would be equal to '1111011101001011'
% and if b='074b'
% then hexa2bin(b) would be equal to '11101001011'
% NOTE THAT THE INITIAL ZEROS ARE NOT PRESENT IN THE RESULT
function [s]=hexa2bin(a)
s=dec2bin(hex2dec(a));










return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% hex2decim.m %%%%%%%%
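intebits=6;                 % integer bits, including the sign bit
fracbits=16-intebits;       % fractional bits
hexad=1010111100000000;     % 16-bit word entered as decimal digits 0/1
a=[0];
for i=1:16
    digi=mod(hexad,10);     % peel off the least significant digit
    hexad=floor(hexad/10);
    a=[digi a];
end
a=a(1:16);                  % a(1) is the sign bit
decim=0;
factor=2^(-fracbits);
for i=16:-1:2
    decim=decim+a(i)*factor;
    factor=factor*2;
end
decim
if a(1)==1
    decim=decim-2^(intebits-1);
end
decim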
%%%%%%%%% END %%%%%%%%
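
The fixed-point convention used by hex2decim.m (a 16-bit two's complement word with 6 integer bits, sign included, and 10 fractional bits) can be sketched compactly in Python. This is an illustration under assumptions, not the thesis code: the function name and the bit-string input format are hypothetical.

```python
def fixed_to_float(bits: str, int_bits: int = 6) -> float:
    """Decode a 16-character two's complement bit string as a fixed-point value.

    int_bits counts the integer bits including the sign bit, so the
    remaining 16 - int_bits bits are fractional.
    """
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 16-character string of 0s and 1s")
    frac_bits = 16 - int_bits
    raw = int(bits, 2)
    if bits[0] == "1":           # negative word: subtract 2^16
        raw -= 1 << 16
    return raw / (1 << frac_bits)


if __name__ == "__main__":
    # Same word as the hexad constant in the script above.
    print(fixed_to_float("1010111100000000"))
```

With the default format, the word used in the script decodes to -20.25: the magnitude bits sum to 11.75 and the sign bit contributes -32.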

%%%%%%%%% iterative.m %%%%%%%%
% This program does the performance analysis of the program for FFT in
% terms of the errors and its types, execution times, etc., over a number
% of iterations.

E_mean=[];
E_var=[];
magerr=[];
sign_err=[];
magsign_err=[];
E_no=[];
ratio_vector=[];
N=8; % length of the fft to be analysed

for i=1:1000
    % i is the iteration index
    % i
    % Initialization of variables for each iteration
    Error_mean=0;
    Error_variance=0;
    Avg_no_of_magnitude_errors=0;
    Avg_no_of_sign_errors=0;
    Avg_no_of_mag_and_sign_errors=0;

    % preparation of inputs
    inr=rand(1,N);
    ini=rand(1,N);
    cplx=inr+j*ini;

    cplxfft=fft(cplx);
    test=ramasfft8(cplx);

    right=0;
    signerr=0;
    magerr=0;
    magsignerr=0;

    Error_mean1=sqrt(mean((test-reshape(cplxfft,[N,1])).^2));
    E_mean=[E_mean Error_mean1];
end

Error_mean=mean(abs(E_mean))
Error_variance=var(abs(E_mean))
%%%%%%%%% END %%%%%%%%


%%%%%%%%% rad2ct16.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,...
    or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,...
    or15,oi15]=rad2ct16(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,...
    ir6,ii6,ir7,ii7,ir8,ii8,ir9,ii9,ir10,ii10,ir11,ii11,ir12,ii12,...
    ir13,ii13,ir14,ii14,ir15,ii15)
% This function calculates the 16-point radix-2 Cooley-Tukey FFT

% GOVERNING EQUATIONS AND ANALYSIS

% N = 16 ; => (N1 = 2) & (N2 = 8)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1)
% k2,n2 => (0,1,2,... 7)

% RESULTS
% EXECUTION TIME
% FRACTIONAL PRECISION
% ERROR OVER A RUN OF 100 TIMES
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY
% SIGN ERRORS ONLY
% BOTH MAGNITUDE AND SIGN

% Two 8-point FFTs on the even- and odd-indexed inputs
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3,tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct8(...
    ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6,ir8,ii8,ir10,ii10,ir12,ii12,ir14,ii14);
[tr8,ti8,tr9,ti9,tr10,ti10,tr11,ti11,tr12,ti12,tr13,ti13,tr14,ti14,...
    tr15,ti15]=rad2ct8(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7,ir9,ii9,...
    ir11,ii11,ir13,ii13,ir15,ii15);

tra0=shrega(tr0);
tia0=shrega(ti0);
tra1=shrega(tr1);
tia1=shrega(ti1);
tra2=shrega(tr2);
tia2=shrega(ti2);
tra3=shrega(tr3);
tia3=shrega(ti3);
tra4=shrega(tr4);
tia4=shrega(ti4);
tra5=shrega(tr5);
tia5=shrega(ti5);
tra6=shrega(tr6);
tia6=shrega(ti6);
tra7=shrega(tr7);
tia7=shrega(ti7);
tra8=shrega(tr8);
tia8=shrega(ti8);

% Twiddle factors
w16_r1=[0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0];
w16_i1=[1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1];
w16_r2=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w16_i2=[1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0];
w16_r3=[0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 0];
w16_i3=[1 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0];
w16_r4=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w16_i4=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];

[tra9,tia9]=cplxmul16(tr9,ti9,w16_r1,w16_i1);
[tra10,tia10]=cplxmul16(tr10,ti10,w16_r2,w16_i2);
[tra11,tia11]=cplxmul16(tr11,ti11,w16_r3,w16_i3);
[tra12,tia12]=cplxmul16(tr12,ti12,w16_r4,w16_i4);
[tra13,tia13]=cplxmul16(tr13,ti13,w16_i1,w16_i3);
[tra14,tia14]=cplxmul16(tr14,ti14,w16_i2,w16_i2);
[tra15,tia15]=cplxmul16(tr15,ti15,w16_i3,w16_i1);

% Final stage of 2-point butterflies
[or0,oi0,or8,oi8]=rad2ct2(tra0,tia0,tra8,tia8);
[or1,oi1,or9,oi9]=rad2ct2(tra1,tia1,tra9,tia9);
[or2,oi2,or10,oi10]=rad2ct2(tra2,tia2,tra10,tia10);
[or3,oi3,or11,oi11]=rad2ct2(tra3,tia3,tra11,tia11);
[or4,oi4,or12,oi12]=rad2ct2(tra4,tia4,tra12,tia12);
[or5,oi5,or13,oi13]=rad2ct2(tra5,tia5,tra13,tia13);
[or6,oi6,or14,oi14]=rad2ct2(tra6,tia6,tra14,tia14);
[or7,oi7,or15,oi15]=rad2ct2(tra7,tia7,tra15,tia15);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct16a.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,...
    or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,...
    or15,oi15]=rad2ct16a(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,...
    ir6,ii6,ir7,ii7,ir8,ii8,ir9,ii9,ir10,ii10,ir11,ii11,ir12,ii12,...
    ir13,ii13,ir14,ii14,ir15,ii15)
% This function calculates the 16-point radix-2 Cooley-Tukey FFT

% GOVERNING EQUATIONS AND ANALYSIS

% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)

% RESULTS :
% EXECUTION TIME
% FRACTIONAL PRECISION
% ERROR OVER A RUN OF 100 TIMES
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY
% SIGN ERRORS ONLY
% BOTH MAGNITUDE AND SIGN

% Four 4-point FFTs on the decimated inputs
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4(...
    ir0,ii0,ir4,ii4,ir8,ii8,ir12,ii12);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4(...
    ir1,ii1,ir5,ii5,ir9,ii9,ir13,ii13);
[tr8,ti8,tr9,ti9,tr10,ti10,tr11,ti11]=rad2ct4(...
    ir2,ii2,ir6,ii6,ir10,ii10,ir14,ii14);
[tr12,ti12,tr13,ti13,tr14,ti14,tr15,ti15]=rad2ct4(...
    ir3,ii3,ir7,ii7,ir11,ii11,ir15,ii15);

tra0=shrega(tr0);
tia0=shrega(ti0);
tra1=shrega(tr1);
tia1=shrega(ti1);
tra2=shrega(tr2);
tia2=shrega(ti2);
tra3=shrega(tr3);
tia3=shrega(ti3);
tra4=shrega(tr4);
tia4=shrega(ti4);
tra8=shrega(tr8);
tia8=shrega(ti8);
tra12=shrega(tr12);
tia12=shrega(ti12);

% Twiddle factors
w16_r1=[0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0];
w16_i1=[1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1];
w16_r2=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w16_i2=[1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0];
w16_r3=[0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 0];
w16_i3=[1 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0];
w16_r4=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w16_i4=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];

[tra5,tia5]=cplxmul16(tr5,ti5,w16_r1,w16_i1);
[tra6,tia6]=cplxmul16(tr6,ti6,w16_r2,w16_i2);
[tra7,tia7]=cplxmul16(tr7,ti7,w16_r3,w16_i3);
[tra9,tia9]=cplxmul16(tr9,ti9,w16_r2,w16_i2);
[tra10,tia10]=cplxmul16(tr10,ti10,w16_r4,w16_i4);
[tra11,tia11]=cplxmul16(tr11,ti11,w16_i2,w16_i2);
[tra13,tia13]=cplxmul16(tr13,ti13,w16_r3,w16_i3);
[tra14,tia14]=cplxmul16(tr14,ti14,w16_i2,w16_i2);
[tra15,tia15]=cplxmul16(tr15,ti15,w16_i3,w16_r3);

% Final stage of 4-point FFTs
[or0,oi0,or4,oi4,or8,oi8,or12,oi12]=rad2ct4(...
    tra0,tia0,tra4,tia4,tra8,tia8,tra12,tia12);
[or1,oi1,or5,oi5,or9,oi9,or13,oi13]=rad2ct4(...
    tra1,tia1,tra5,tia5,tra9,tia9,tra13,tia13);
[or2,oi2,or6,oi6,or10,oi10,or14,oi14]=rad2ct4(...
    tra2,tia2,tra6,tia6,tra10,tia10,tra14,tia14);
[or3,oi3,or7,oi7,or11,oi11,or15,oi15]=rad2ct4(...
    tra3,tia3,tra7,tia7,tra11,tia11,tra15,tia15);

return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct2.m %%%%%%%%
function [Or1,Oi1,Or2,Oi2]=rad2ct2(ir1,ii1,ir2,ii2)
% This function calculates the 2-point FFT of the two 16-bit inputs and
% gives two 16-bit outputs.

% GOVERNING EQUATIONS AND ANALYSIS

% out1=in1+in2;
% out2=in1-in2;

% RESULTS :
% EXECUTION TIME : 27.4 ns (DELAY in MAXPLUS) WHEN FITTED INTO
%                  "EP1K10FC256-1" DEVICE OF ACEX 1K FAMILY
% FRACTIONAL PRECISION : 1 less than input precision
% ERROR OVER A RUN OF 100 TIMES : mean := -6.8272e-05 -6.0807e-05i
%                                 and variance := 4.0445e-09
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 0.02
% BOTH MAGNITUDE AND SIGN : 0

Or1=add16(ir1,ir2);
Or2=sub16(ir1,ir2);
Oi1=add16(ii1,ii2);
Oi2=sub16(ii1,ii2);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct2ifft.m %%%%%%%%
function [Or1,Oi1,Or2,Oi2]=rad2ct2ifft(ir1,ii1,ir2,ii2)
% This function calculates the 2-point IFFT of the two 16-bit inputs and
% gives two 16-bit outputs. The 2-point butterfly is the same as in the
% forward transform.
Or1=add16(ir1,ir2);
Or2=sub16(ir1,ir2);
Oi1=add16(ii1,ii2);
Oi2=sub16(ii1,ii2);
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct32.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,...
    or8,oi8,or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,...
    or15,oi15,or16,oi16,or17,oi17,or18,oi18,or19,oi19,or20,oi20,...
    or21,oi21,or22,oi22,or23,oi23,or24,oi24,or25,oi25,or26,oi26,...
    or27,oi27,or28,oi28,or29,oi29,or30,oi30,or31,oi31]=rad2ct32(...
    ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7,...
    ir8,ii8,ir9,ii9,ir10,ii10,ir11,ii11,ir12,ii12,ir13,ii13,ir14,ii14,...
    ir15,ii15,ir16,ii16,ir17,ii17,ir18,ii18,ir19,ii19,ir20,ii20,...
    ir21,ii21,ir22,ii22,ir23,ii23,ir24,ii24,ir25,ii25,ir26,ii26,...
    ir27,ii27,ir28,ii28,ir29,ii29,ir30,ii30,ir31,ii31)
% Function to calculate the 32-point FFT from radix-4 8-point FFTs

% GOVERNING EQUATIONS AND ANALYSIS

% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)

% RESULTS :
% EXECUTION TIME
% FRACTIONAL PRECISION
% ERROR OVER A RUN OF 100 TIMES
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY
% SIGN ERRORS ONLY
% BOTH MAGNITUDE AND SIGN

[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3,tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad4ct8(...
    ir0,ii0,ir4,ii4,ir8,ii8,ir12,ii12,ir16,ii16,ir20,ii20,ir24,ii24,...
    ir28,ii28);
[tr8,ti8,tr9,ti9,tr10,ti10,tr11,ti11,tr12,ti12,tr13,ti13,tr14,ti14,...
    tr15,ti15]=rad4ct8(ir1,ii1,ir5,ii5,ir9,ii9,ir13,ii13,ir17,ii17,...
    ir21,ii21,ir25,ii25,ir29,ii29);
[tr16,ti16,tr17,ti17,tr18,ti18,tr19,ti19,tr20,ti20,tr21,ti21,tr22,ti22,...
    tr23,ti23]=rad4ct8(ir2,ii2,ir6,ii6,ir10,ii10,ir14,ii14,ir18,ii18,...
    ir22,ii22,ir26,ii26,ir30,ii30);
[tr24,ti24,tr25,ti25,tr26,ti26,tr27,ti27,tr28,ti28,tr29,ti29,tr30,ti30,...
    tr31,ti31]=rad4ct8(ir3,ii3,ir7,ii7,ir11,ii11,ir15,ii15,ir19,ii19,...
    ir23,ii23,ir27,ii27,ir31,ii31);

tra0=shrega(tr0);
tia0=shrega(ti0);
tra1=shrega(tr1);
tia1=shrega(ti1);
tra2=shrega(tr2);
tia2=shrega(ti2);
tra3=shrega(tr3);
tia3=shrega(ti3);
tra4=shrega(tr4);
tia4=shrega(ti4);
tra5=shrega(tr5);
tia5=shrega(ti5);
tra6=shrega(tr6);
tia6=shrega(ti6);
tra7=shrega(tr7);
tia7=shrega(ti7);
tra8=shrega(tr8);
tia8=shrega(ti8);
tra16=shrega(tr16);
tia16=shrega(ti16);
tra24=shrega(tr24);
tia24=shrega(ti24);

% Twiddle factors
w32_1r=[0 0 1 1 1 1 1 0 1 1 0 0 0 1 0 1];
w32_1i=[1 1 1 1 0 0 1 1 1 0 0 0 0 1 0 0];
w32_2r=[0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0];
w32_2i=[1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1];
w32_3r=[0 0 1 1 0 1 0 1 0 0 1 1 0 1 1 0];
w32_3i=[1 1 0 1 1 1 0 0 0 1 1 1 0 0 1 0];
w32_4r=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w32_4i=[1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0];
w32_5r=[0 0 1 0 0 0 1 1 1 0 0 0 1 1 1 0];
w32_5i=[1 1 0 0 1 0 1 0 1 1 0 0 1 0 1 0];
w32_6r=[0 0 0 1 0 1 1 1 1 0 0 0 1 1 1 0];
w32_6i=[1 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0];
w32_7r=[0 0 0 0 1 1 0 0 0 1 1 1 1 1 0 1];
w32_7i=[1 1 0 0 0 0 0 1 0 0 1 1 1 0 1 1];
w32_8r=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w32_8i=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];

[tra9,tia9]=cplxmul16(tr9,ti9,w32_1r,w32_1i);
[tra10,tia10]=cplxmul16(tr10,ti10,w32_2r,w32_2i);
[tra11,tia11]=cplxmul16(tr11,ti11,w32_3r,w32_3i);
[tra12,tia12]=cplxmul16(tr12,ti12,w32_2r,w32_2i);
[tra13,tia13]=cplxmul16(tr13,ti13,w32_4r,w32_4i);
[tra14,tia14]=cplxmul16(tr14,ti14,w32_6r,w32_6i);
[tra15,tia15]=cplxmul16(tr15,ti15,w32_7r,w32_7i);
[tra17,tia17]=cplxmul16(tr17,ti17,w32_2r,w32_2i);
[tra18,tia18]=cplxmul16(tr18,ti18,w32_4r,w32_4i);
[tra19,tia19]=cplxmul16(tr19,ti19,w32_6r,w32_6i);
[tra20,tia20]=cplxmul16(tr20,ti20,w32_8r,w32_8i);
[tra21,tia21]=cplxmul16(tr21,ti21,w32_2i,w32_6i);
[tra22,tia22]=cplxmul16(tr22,ti22,w32_4i,w32_4i);
[tra23,tia23]=cplxmul16(tr23,ti23,w32_6i,w32_2i);
[tra25,tia25]=cplxmul16(tr25,ti25,w32_3r,w32_3i);
[tra26,tia26]=cplxmul16(tr26,ti26,w32_6r,w32_6i);
[tra27,tia27]=cplxmul16(tr27,ti27,w32_1i,w32_7i);
[tra28,tia28]=cplxmul16(tr28,ti28,w32_4i,w32_4i);
[tra29,tia29]=cplxmul16(tr29,ti29,w32_7i,w32_1i);
[tra30,tia30]=cplxmul16(tr30,ti30,w32_6i,w32_6r);
[tra31,tia31]=cplxmul16(tr31,ti31,w32_3i,w32_3r);

[or0,oi0,or8,oi8,or16,oi16,or24,oi24]=rad2ct2(tra0,tia0,tra8,tia8,...
    tra16,tia16,tra24,tia24);
[or1,oi1,or9,oi9,or17,oi17,or25,oi25]=rad2ct2(tra1,tia1,tra9,tia9,...
    tra17,tia17,tra25,tia25);
[or2,oi2,or10,oi10,or18,oi18,or26,oi26]=rad2ct2(tra2,tia2,tra10,tia10,...
    tra18,tia18,tra26,tia26);
[or3,oi3,or11,oi11,or19,oi19,or27,oi27]=rad2ct2(tra3,tia3,tra11,tia11,...
    tra19,tia19,tra27,tia27);
[or4,oi4,or12,oi12,or20,oi20,or28,oi28]=rad2ct2(tra4,tia4,tra12,tia12,...
    tra20,tia20,tra28,tia28);
[or5,oi5,or13,oi13,or21,oi21,or29,oi29]=rad2ct2(tra5,tia5,tra13,tia13,...
    tra21,tia21,tra29,tia29);
[or6,oi6,or14,oi14,or22,oi22,or30,oi30]=rad2ct2(tra6,tia6,tra14,tia14,...
    tra22,tia22,tra30,tia30);
[or7,oi7,or15,oi15,or23,oi23,or31,oi31]=rad2ct2(tra7,tia7,tra15,tia15,...
    tra23,tia23,tra31,tia31);

return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct4.m %%%%%%%%
function [or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4(ir1,ii1,ir2,ii2,...
    ir3,ii3,ir4,ii4)
% This function computes the radix-2 4-point Cooley-Tukey FFT given
% the four points

% GOVERNING EQUATIONS AND ANALYSIS

% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)

% RESULTS :
% EXECUTION TIME : 40.0 ns delay when fitted into "EP1K100FC484-1"
%                  device of ACEX 1K family
% FRACTIONAL PRECISION : 2 bits less than input precision
% ERROR OVER A RUN OF 100 TIMES : -1.1922e-04 -1.2181e-04i with a
%                                 variance of 0.3048
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 0.03
% BOTH MAGNITUDE AND SIGN : 3.97

[tr1,ti1,tr2,ti2]=rad2ct2(ir1,ii1,ir3,ii3);
[tr3,ti3,tr4,ti4]=rad2ct2(ir2,ii2,ir4,ii4);

[or1,oi1,or3,oi3]=rad2ct2(tr1,ti1,tr3,ti3);
% Multiplication by -j: (tr4 + j*ti4)*(-j) = ti4 - j*tr4
[or2,oi2,or4,oi4]=rad2ct2(tr2,ti2,ti4,com(tr4));

return

%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct4ifft.m %%%%%%%%
function [or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4ifft(ir1,ii1,ir2,ii2,...
    ir3,ii3,ir4,ii4)
% This function computes the radix-2 4-point Cooley-Tukey IFFT given
% the four points

[tr1,ti1,tr2,ti2]=rad2ct2(ir1,ii1,ir3,ii3);
[tr3,ti3,tr4,ti4]=rad2ct2(ir2,ii2,ir4,ii4);

[or1,oi1,or3,oi3]=rad2ct2(tr1,ti1,tr3,ti3);
% Multiplication by +j: (tr4 + j*ti4)*j = -ti4 + j*tr4
[or2,oi2,or4,oi4]=rad2ct2(tr2,ti2,com(ti4),tr4);

return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct8.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,...
    or7,oi7]=rad2ct8(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,...
    ir6,ii6,ir7,ii7)
% This function calculates the 8-point radix-2 Cooley-Tukey FFT

% GOVERNING EQUATIONS AND ANALYSIS

% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)

% RESULTS :
% EXECUTION TIME : 106.8 ns when fitted into "EP1K30QC208-1",
%                  "EP1K50FC484-1", "EP1K100FC484-1" devices of ACEX 1K family
% FRACTIONAL PRECISION : 5 fractional bits less than the input
% ERROR OVER A RUN OF 100 TIMES : -0.0010 0.0010 i
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 1.26
% BOTH MAGNITUDE AND SIGN : 1.17

[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4(...
    ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4(...
    ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);

tra0=shrega(tr0);
tia0=shrega(ti0);
tra1=shrega(tr1);
tia1=shrega(ti1);
tra2=shrega(tr2);
tia2=shrega(ti2);
tra3=shrega(tr3);
tia3=shrega(ti3);
tra4=shrega(tr4);
tia4=shrega(ti4);

% Twiddle factors
w8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_i1=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
w8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_i2=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
w8_i3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];

[tra5,tia5]=cplxmul16(tr5,ti5,w8_r1,w8_i1);
[tra6,tia6]=cplxmul16(tr6,ti6,w8_r2,w8_i2);
[tra7,tia7]=cplxmul16(tr7,ti7,w8_r3,w8_i3);

[or0,oi0,or4,oi4]=rad2ct2(tra0,tia0,tra4,tia4);
[or1,oi1,or5,oi5]=rad2ct2(tra1,tia1,tra5,tia5);
[or2,oi2,or6,oi6]=rad2ct2(tra2,tia2,tra6,tia6);
[or3,oi3,or7,oi7]=rad2ct2(tra3,tia3,tra7,tia7);

return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad2ct8ifft.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,...
    or7,oi7]=rad2ct8ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,...
    ir5,ii5,ir6,ii6,ir7,ii7)
% This function calculates the 8-point radix-2 Cooley-Tukey IFFT

[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4ifft(...
    ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4ifft(...
    ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);

tra0=shrega(tr0);
tia0=shrega(ti0);
tra1=shrega(tr1);
tia1=shrega(ti1);
tra2=shrega(tr2);
tia2=shrega(ti2);
tra3=shrega(tr3);
tia3=shrega(ti3);
tra4=shrega(tr4);
tia4=shrega(ti4);

% Twiddle factors
w8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_i1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0];
w8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_i2=[0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1];
w8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
w8_i3=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0];

[tra5,tia5]=cplxmul16(tr5,ti5,w8_r1,w8_i1);
[tra6,tia6]=cplxmul16(tr6,ti6,w8_r2,w8_i2);
[tra7,tia7]=cplxmul16(tr7,ti7,w8_r3,w8_i3);

[or0,oi0,or4,oi4]=rad2ct2ifft(tra0,tia0,tra4,tia4);
[or1,oi1,or5,oi5]=rad2ct2ifft(tra1,tia1,tra5,tia5);
[or2,oi2,or6,oi6]=rad2ct2ifft(tra2,tia2,tra6,tia6);
[or3,oi3,or7,oi7]=rad2ct2ifft(tra3,tia3,tra7,tia7);

return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad4ct4.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3]=rad4ct4(ir0,ii0,ir1,ii1,...
    ir2,ii2,ir3,ii3)
% This function calculates the radix-4 4-point Cooley-Tukey FFT

% GOVERNING EQUATIONS AND ANALYSIS

% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)

% RESULTS :
% EXECUTION TIME : 44.1 ns when fitted into "EP1K100FC484-1" device of
%                  ACEX 1K family
% FRACTIONAL PRECISION : 2 less than input
% ERROR OVER A RUN OF 100 TIMES : -1.1553e-04 -1.1447e-04
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 0.0450
% BOTH MAGNITUDE AND SIGN : 3.955

t0=add16(ir0,ir1);
t1=add16(ir2,ir3);
t2=add16(ii0,ii1);
t3=add16(ii2,ii3);

t4=sub16(ir0,ir2);
t5=sub16(ii0,ii2);
t6=sub16(ii1,ii3);
t7=sub16(ir1,ir3);

t8=add16(ir0,ir2);
t9=add16(ir1,ir3);
t10=add16(ii0,ii2);
t11=add16(ii1,ii3);

oi1=sub16(t5,t7);
oi2=sub16(t10,t11);
or2=sub16(t8,t9);
or3=sub16(t4,t6);

or0=add16(t0,t1);
oi0=add16(t2,t3);
or1=add16(t4,t6);
oi3=add16(t5,t7);

return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad4ct4ifft.m %%%%%%%%
function [or0,oi0,or3,oi3,or2,oi2,or1,oi1]=rad4ct4ifft(ir0,ii0,ir1,ii1,...
    ir2,ii2,ir3,ii3)
% This function calculates the radix-4 4-point Cooley-Tukey IFFT

% GOVERNING EQUATIONS AND ANALYSIS

% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)

% RESULTS :
% EXECUTION TIME : 44.1 ns when fitted into "EP1K100FC484-1" device of
%                  ACEX 1K family
% FRACTIONAL PRECISION : 2 less than input
% ERROR OVER A RUN OF 100 TIMES : -1.1553e-04 -1.1447e-04
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 0.0450
% BOTH MAGNITUDE AND SIGN : 3.955

t0=add16(ir0,ir1);
t1=add16(ir2,ir3);
t2=add16(ii0,ii1);
t3=add16(ii2,ii3);

t4=sub16(ir0,ir2);
t5=sub16(ii0,ii2);
t6=sub16(ii1,ii3);
t7=sub16(ir1,ir3);

t8=add16(ir0,ir2);
t9=add16(ir1,ir3);
t10=add16(ii0,ii2);
t11=add16(ii1,ii3);

oi1=sub16(t5,t7);
oi2=sub16(t10,t11);
or2=sub16(t8,t9);
or3=sub16(t4,t6);

or0=add16(t0,t1);
oi0=add16(t2,t3);
or1=add16(t4,t6);
oi3=add16(t5,t7);

return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% rad4ct8.m %%%%%%%%
function [or0,oi0,or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,...
    or7,oi7]=rad4ct8(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,...
    ir6,ii6,ir7,ii7)

% GOVERNING EQUATIONS AND ANALYSIS

% N = 16 ; => (N1 = 4) & (N2 = 4)
% k = k1 + 2 k2;
% n = 8 n1 + n2;
% k1,n1 => (0,1,2,3)
% k2,n2 => (0,1,2,3)

% RESULTS :
% EXECUTION TIME : 124.0 ns when fitted into devices of the ACEX 1K
%                  family
% FRACTIONAL PRECISION : 5 fractional bits less than the input
% ERROR OVER A RUN OF 100 TIMES : -9.6769e-04 9.6433e-04i with a
%                                 variance of 0.1246
% AVERAGE NUMBER OF ERRORS
% ERROR TYPES--->
% MAGNITUDE ERRORS ONLY : 0
% SIGN ERRORS ONLY : 1.12
% BOTH MAGNITUDE AND SIGN : 1.35

% N1=4 POINT FFT
[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad4ct4(...
    ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad4ct4(...
    ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);

% MULTIPLICATION PHASE
w8_1r=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_1i=[1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1];
w8_2r=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_2i=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_3r=[1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1];
w8_3i=[1 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1];

tr0m=shrega(tr0);
tr1m=shrega(tr1);
tr2m=shrega(tr2);
tr3m=shrega(tr3);
tr4m=shrega(tr4);
ti0m=shrega(ti0);
ti1m=shrega(ti1);
ti2m=shrega(ti2);
ti3m=shrega(ti3);
ti4m=shrega(ti4);

[tr5m,ti5m]=cplxmul16(tr5,ti5,w8_1r,w8_1i);
[tr6m,ti6m]=cplxmul16(tr6,ti6,w8_2r,w8_2i);
[tr7m,ti7m]=cplxmul16(tr7,ti7,w8_3r,w8_3i);

% N2=2 POINT FFT
[or0,oi0,or4,oi4]=rad2ct2(tr0m,ti0m,tr4m,ti4m);
[or1,oi1,or5,oi5]=rad2ct2(tr1m,ti1m,tr5m,ti5m);
[or2,oi2,or6,oi6]=rad2ct2(tr2m,ti2m,tr6m,ti6m);
[or3,oi3,or7,oi7]=rad2ct2(tr3m,ti3m,tr7m,ti7m);

return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% ramasfft.m %%%%%%%%
function [b]=ramasfft(a)
% Program to interface my fft engine in place of the standard FFT
% function

N=8;     %%% MODIFY %%%%%
shift=8; %%% MODIFY %%%%%

or=zeros(N,1);
oi=or;
a_re=real(a);
a_im=imag(a);

a_r=zeros(N,16);
a_i=zeros(N,16);
o_r=a_r;
o_i=a_i;

for i=1:N
    a_r(i,:)=convert(a_re(i),2);
    a_i(i,:)=convert(a_im(i),2);
end

% 2-point
% [or1,oi1,or2,oi2]=rad2ct2(a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:));
% 4-point
% [or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4(a_r(1,:),a_i(1,:),...
%     a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:));

% 8-point (active; matches N=8 above)
[or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8]=rad2ct8(...
    a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),...
    a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:));

% 16-point
% [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,...
%     or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,...
%     or15,oi15,or16,oi16]=rad2ct16(...
%     a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),...
%     a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),...
%     a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),...
%     a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),...
%     a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:));
% 32-point
% [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,...
%     or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,...
%     or15,oi15,or16,oi16,or17,oi17,or18,oi18,or19,oi19,or20,oi20,...
%     or21,oi21,or22,oi22,or23,oi23,or24,oi24,or25,oi25,or26,oi26,...
%     or27,oi27,or28,oi28,or29,oi29,or30,oi30,or31,oi31,...
%     or32,oi32]=rad2ct32(...
%     a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),...
%     a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),...
%     a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),...
%     a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),...
%     a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:),a_r(17,:),a_i(17,:),...
%     a_r(18,:),a_i(18,:),a_r(19,:),a_i(19,:),a_r(20,:),a_i(20,:),...
%     a_r(21,:),a_i(21,:),a_r(22,:),a_i(22,:),a_r(23,:),a_i(23,:),...
%     a_r(24,:),a_i(24,:),a_r(25,:),a_i(25,:),a_r(26,:),a_i(26,:),...
%     a_r(27,:),a_i(27,:),a_r(28,:),a_i(28,:),a_r(29,:),a_i(29,:),...
%     a_r(30,:),a_i(30,:),a_r(31,:),a_i(31,:),a_r(32,:),a_i(32,:));

o_r1=arr2dec(or1,shift);
o_i1=arr2dec(oi1,shift);
o_r2=arr2dec(or2,shift);
o_i2=arr2dec(oi2,shift);
o_r3=arr2dec(or3,shift);
o_i3=arr2dec(oi3,shift);
o_r4=arr2dec(or4,shift);
o_i4=arr2dec(oi4,shift);
o_r5=arr2dec(or5,shift);
o_i5=arr2dec(oi5,shift);
o_r6=arr2dec(or6,shift);
o_i6=arr2dec(oi6,shift);
o_r7=arr2dec(or7,shift);
o_i7=arr2dec(oi7,shift);
o_r8=arr2dec(or8,shift);
o_i8=arr2dec(oi8,shift);

% 2-point b=[o_r1+j*o_i1;o_r2+j*o_i2];
% 4-point b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4];

% 8-point (active)
b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;...
   o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8];

% 16-point
% b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;...
%    o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8;o_r9+j*o_i9;o_r10+j*o_i10;...
%    o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;...
%    o_r15+j*o_i15;o_r16+j*o_i16];

% 32-point
% b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;...
%    o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8;o_r9+j*o_i9;o_r10+j*o_i10;...
%    o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;...
%    o_r15+j*o_i15;o_r16+j*o_i16;o_r17+j*o_i17;o_r18+j*o_i18;...
%    o_r19+j*o_i19;o_r20+j*o_i20;o_r21+j*o_i21;o_r22+j*o_i22;...
%    o_r23+j*o_i23;o_r24+j*o_i24;o_r25+j*o_i25;o_r26+j*o_i26;...
%    o_r27+j*o_i27;o_r28+j*o_i28;o_r29+j*o_i29;o_r30+j*o_i30;...
%    o_r31+j*o_i31;o_r32+j*o_i32];
return
%%%%%%%%% END %%%%%%%%

%%%%%%%%% ramasifft.m %%%%%%%%
function [b]=ramasifft(a)
% Program to interface my ifft engine in place of the standard IFFT
% function

N=8;     %%% MODIFY %%%%%
shift=5; %%% MODIFY %%%%%

or=zeros(N,1);
oi=or;
a_re=real(a);
a_im=imag(a);

a_r=zeros(N,16);
a_i=zeros(N,16);
o_r=a_r;
o_i=a_i;

for i=1:N
    a_r(i,:)=convert(a_re(i),2);
    a_i(i,:)=convert(a_im(i),2);
end

% 4-point
% [or1,oi1,or2,oi2,or3,oi3,or4,oi4]=rad2ct4ifft(a_r(1,:),a_i(1,:),...
%     a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:));

% 8-point (active; matches N=8 above)
[or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8]=rad2ct8ifft(...
    a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),...
    a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:));

% 16-point
% [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,...
%     or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,...
%     or15,oi15,or16,oi16]=rad2ct16ifft(...
%     a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),...
%     a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),...
%     a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),...
%     a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),...
%     a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:));
% 32-point
% [or1,oi1,or2,oi2,or3,oi3,or4,oi4,or5,oi5,or6,oi6,or7,oi7,or8,oi8,...
%     or9,oi9,or10,oi10,or11,oi11,or12,oi12,or13,oi13,or14,oi14,...
%     or15,oi15,or16,oi16,or17,oi17,or18,oi18,or19,oi19,or20,oi20,...
%     or21,oi21,or22,oi22,or23,oi23,or24,oi24,or25,oi25,or26,oi26,...
%     or27,oi27,or28,oi28,or29,oi29,or30,oi30,or31,oi31,...
%     or32,oi32]=rad2ct32ifft(...
%     a_r(1,:),a_i(1,:),a_r(2,:),a_i(2,:),a_r(3,:),a_i(3,:),a_r(4,:),a_i(4,:),...
%     a_r(5,:),a_i(5,:),a_r(6,:),a_i(6,:),a_r(7,:),a_i(7,:),a_r(8,:),a_i(8,:),...
%     a_r(9,:),a_i(9,:),a_r(10,:),a_i(10,:),a_r(11,:),a_i(11,:),...
%     a_r(12,:),a_i(12,:),a_r(13,:),a_i(13,:),a_r(14,:),a_i(14,:),...
%     a_r(15,:),a_i(15,:),a_r(16,:),a_i(16,:),a_r(17,:),a_i(17,:),...
%     a_r(18,:),a_i(18,:),a_r(19,:),a_i(19,:),a_r(20,:),a_i(20,:),...
%     a_r(21,:),a_i(21,:),a_r(22,:),a_i(22,:),a_r(23,:),a_i(23,:),...
%     a_r(24,:),a_i(24,:),a_r(25,:),a_i(25,:),a_r(26,:),a_i(26,:),...
%     a_r(27,:),a_i(27,:),a_r(28,:),a_i(28,:),a_r(29,:),a_i(29,:),...
%     a_r(30,:),a_i(30,:),a_r(31,:),a_i(31,:),a_r(32,:),a_i(32,:));

o_r1=arr2dec(or1,shift);
o_i1=arr2dec(oi1,shift);
o_r2=arr2dec(or2,shift);
o_i2=arr2dec(oi2,shift);
o_r3=arr2dec(or3,shift);
o_i3=arr2dec(oi3,shift);
o_r4=arr2dec(or4,shift);
o_i4=arr2dec(oi4,shift);
o_r5=arr2dec(or5,shift);
o_i5=arr2dec(oi5,shift);
o_r6=arr2dec(or6,shift);
o_i6=arr2dec(oi6,shift);
o_r7=arr2dec(or7,shift);
o_i7=arr2dec(oi7,shift);
o_r8=arr2dec(or8,shift);
o_i8=arr2dec(oi8,shift);

% 4-point b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4];

% 8-point (active)
b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;...
   o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8];

% 16-point
% b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;...
%    o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8;o_r9+j*o_i9;o_r10+j*o_i10;...
%    o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;...
%    o_r15+j*o_i15;o_r16+j*o_i16];

% 32-point
% b=[o_r1+j*o_i1;o_r2+j*o_i2;o_r3+j*o_i3;o_r4+j*o_i4;o_r5+j*o_i5;...
%    o_r6+j*o_i6;o_r7+j*o_i7;o_r8+j*o_i8;o_r9+j*o_i9;o_r10+j*o_i10;...
%    o_r11+j*o_i11;o_r12+j*o_i12;o_r13+j*o_i13;o_r14+j*o_i14;...
%    o_r15+j*o_i15;o_r16+j*o_i16;o_r17+j*o_i17;o_r18+j*o_i18;...
%    o_r19+j*o_i19;o_r20+j*o_i20;o_r21+j*o_i21;o_r22+j*o_i22;...
%    o_r23+j*o_i23;o_r24+j*o_i24;o_r25+j*o_i25;o_r26+j*o_i26;...
%    o_r27+j*o_i27;o_r28+j*o_i28;o_r29+j*o_i29;o_r30+j*o_i30;...
%    o_r31+j*o_i31;o_r32+j*o_i32];
return
%%%%%%%%% END %%%%%%%%


%%%%%%%%% shrega.m %%%%%%%%
function [S]=shrega(a)
% This function shifts the register contents three positions to the right
% while taking care of SIGN EXTENSION
S=[a(1) a(1) a(1) a];
S=S(1:16);
return
%%%%%%%%% END %%%%%%%%
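
The shift performed by shrega.m is an arithmetic right shift: the sign bit is replicated at the top and the lowest bits fall off. The Python sketch below illustrates the same idea on a list of bits (most significant bit first); the function name and the `shift` parameter are hypothetical, with the default of three chosen to match the three copies of a(1) in the listing.

```python
def shift_right_sign_extend(bits: list[int], shift: int = 3) -> list[int]:
    """Arithmetic right shift of an MSB-first bit vector, keeping its width."""
    width = len(bits)
    # Prepend copies of the sign bit, then truncate back to the original width.
    return ([bits[0]] * shift + bits)[:width]


if __name__ == "__main__":
    word = [1] + [0] * 11 + [1, 0, 0, 0]
    print(shift_right_sign_extend(word))
```

Each position shifted divides the represented value by two, so a three-position shift scales the intermediate results by 1/8 before the next stage.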









%%%%%%%%% snrvary1.m %%%%%%%%
% This program reads data from a text file 'intext.txt' and sends the
% file data to the IFFT engine
% and may add noise to the transmitted data before receiving it and
% passing it to the FFT engine
% to decipher its output. The output is then written to a file called
% 'outtext.txt'

% Gather FIDs for input and output files
infid= fopen('outtext.txt','r');
outfid=fopen('outtext2.txt','w');

% Noise Measure
snratio=[-20 -15 -10 -5 -4 -3 -2 -1 0];
lengsnr=length(snratio);
% Read Data from input file
[indata,incount]=fread(infid,'bit1');

%indata=randint(50,1);
%incount=50;
for g=1:lengsnr
g
% Initialize Output data
outdata=[];
outsymbol=[];

N=8;

datalen=incount;
rema=mod(incount,2*N);
if (rema~=0)
indata=[indata;zeros(2*N-rema,1)];
datalen=datalen+2*N-rema
end
%indata=randint(datalen/N,1);
batches=datalen/N
symbollist=[] ;

% Encoding the input Data...
disp('Encoding the Input Data ...');

for i=0:batches-1
symbollist=[symbollist; getsymbol(indata(N*i+1:N*(i+1)))];
end
disp('Encoding the Input Data .... Done');


for i=0:batches/2-1
z=getformat(symbollist(N*i+1:N*(i+1/2)));
transmit=ramasifft8(z);
powx=sum(abs(transmit.*transmit)); % Power of the transmitted window of data

pownoise=powx / 10 ^ ( snratio(g) / 10 ); % Noise power calculation

noise=rand(N,1);

inputnoise=pownoise * noise;

receiverinput=transmit+inputnoise;

receive=ramasfft8(receiverinput);
p=deformat(receive);
outsymbol=[outsymbol; p];
z=getformat(symbollist(N*(i+1/2)+1:N*(i+1)));
transmit=ramasifft8(z);
powx=sum(abs(transmit.*transmit)); % Power of the transmitted window of data

pownoise=powx / 10 ^ ( snratio(g) / 10 ); % Noise power calculation

noise=rand(N,1);

inputnoise=pownoise * noise;

receiverinput=transmit+inputnoise;

receive=ramasfft8(receiverinput);
p=deformat(receive);
outsymbol=[outsymbol; p];


end


% finding the nearest point in the given constellation
disp('Fitting the received vector in the constellation');
lenoutsymlist=length(outsymbol);
symout=zeros(lenoutsymlist,1);
for k=1:lenoutsymlist
symout(k)=getpoint4qam(outsymbol(k));
end


% Decoding the input Data...
disp('Decoding the Output Data ...');
for i=0:batches/2-1
y=symout(N*i+1:N*(i+1));
outdata=[outdata; getnumber(y)];
end
disp('Decoding the Output Data .... Done');

[numberoferrors(g),ber(g)]=biterr(abs(indata),abs(outdata));
end


count=fwrite(outfid,outdata,'bit1');
st=fclose('all');
errdata=indata-outdata;
plot(abs(indata),'ro');
plot(abs(indata),'r');
hold on
plot(abs(outdata),'kd');
plot(abs(outdata),'k');
plot(abs(errdata),'*');
plot(abs(errdata));
hold off
%%%%%%%%% END %%%%%%%%
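
The noise-power step in the loop above can be illustrated in isolation. This is a minimal sketch under the conventional definition SNR = signal power / noise power with the SNR given in dB; that definition, the test block x, and the variable name snr_db are assumptions made for illustration and are not taken from the listing.

```matlab
snr_db = [-20 -10 0];                 % a few SNR values in dB (assumed)
x = [1 -1 1 1 -1 1 -1 -1];            % hypothetical transmitted block
powx = sum(abs(x).^2);                % power of the block, 8 here
pownoise = powx ./ 10.^(snr_db/10);   % noise power for each SNR value
```

Under this definition, lowering the SNR by 10 dB raises the noise power by a factor of ten: the three values here come out as 800, 80 and 8.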

%%%%%%%%% sub16.m %%%%%%%%
function [S]=sub16(a,b)
% This function calculates the difference between two 16-bit numbers
% and gives the
% result as a 16 bit word avoiding an overflow. So one bit of precision
% is lost in the
% process.

T=com(b);
S=add16(a,T);
return
%%%%%%%%% END %%%%%%%%














APPENDIX B
32-BIT COOLEY-TUKEY AND CHIRP-Z IMPLEMENTATION

%%%%%%%% add40.m %%%%%%%%%
function [C]=add40(A,B)
% function [C]=add40(A,B)
% Adds two 32 bit numbers to produce a 40 bit output
%A,B --> in 32 bits
%C --> out 40 bits
Cin=0;
A=[A zeros(1,8)];
B=[B zeros(1,8)];
[tmp1,c0]=ad16c(A(25:40),B(25:40));
[tmp2,c1]=ad12c(A(13:24),B(13:24),c0);
[tmp3,c2]=ad12c(A(1:12),B(1:12),c1);
test=xor(A(1),B(1));
testbar=~test;
term1=tmp3(1) & test;
term2=c2 & testbar;
msb=term1 | term2;
C=[msb tmp3 tmp2 tmp1];
C=C(1:40);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% bm4.m %%%%%%%%%
function [P1]=bm4(X1,Y1)
X=X1(4:-1:1);
Y=Y1(4:-1:1);
%port(X,Y:in std_logic_vector(4 downto 1);
% P:out std_logic_vector(8 downto 1));
LOW=0;
for i=1:4
for j=1:4
t(i+4*j-4)=X(i) & Y(j);
end
end
P(1)=t(1);
[P(2),c(1)]=fulladder(t(5),LOW,t(2));
[a(1),c(2)]=fulladder(t(6),LOW,t(3));
[a(2),c(3)]=fulladder(t(7),LOW,t(4));
[P(3),c(4)]=fulladder(t(9),c(1),a(1));
[a(3),c(5)]=fulladder(t(10),c(2),a(2));
[a(4),c(6)]=fulladder(t(11),c(3),t(8));
[P(4),c(7)]=fulladder(t(13),c(4),a(3));
[a(5),c(8)]=fulladder(t(14),c(5),a(4));
[a(6),c(9)]=fulladder(t(15),c(6),t(12));
[P(5),c(10)]=fulladder(LOW,c(7),a(5));
[P(6),c(11)]=fulladder(c(10),c(8),a(6));
[P(7),P(8)]=fulladder(c(11),c(9),t(16));
P1=P(8:-1:1);
return
%%%%%%%% END %%%%%%%%%
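
The partial products t(i+4*j-4)=X(i) & Y(j) formed in bm4 are the bit products of an array multiplier. As a rough check, not part of the thesis code, summing each bit product with its binary weight 2^(i+j-2) should reproduce the ordinary integer product. The operand values below are arbitrary, and the fulladder network of the listing is deliberately replaced here by plain integer addition.

```matlab
X1 = [1 0 1 1];          % 11, MSB first (hypothetical operand)
Y1 = [0 1 1 0];          % 6,  MSB first (hypothetical operand)
X = X1(4:-1:1);          % LSB first, as in bm4
Y = Y1(4:-1:1);

P = 0;
for i = 1:4
    for j = 1:4
        % bit product X(i)&Y(j) carries the weight 2^((i-1)+(j-1))
        P = P + (X(i) & Y(j)) * 2^(i+j-2);
    end
end
```

For these operands the accumulated value P is 66, which is 11 times 6.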

%%%%%%%% bm8.m %%%%%%%%%
function [P1]=bm8(A1,B1)
%port(X,Y:in std_logic_vector(7 downto 0);
% P:out std_logic_vector(15 downto 0));
X=A1(8:-1:1);
Y=B1(8:-1:1);
LOW=0;
for i=1:8
for j=1:8
t(i+8*j-8)=X(i) & Y(j);
end
end
P(1)=t(1);
[P(2),c(1)]=fulladder(t(9),LOW,t(2));
for i=2:7
[a(i-1),c(i)]=fulladder(t(i+1),LOW,t(i+8));
end
[P(3),c(8)]=fulladder(t(17),c(1),a(1));
for i=1:5
[a(i+6),c(i+8)]=fulladder(t(i+17),a(i+1),c(i+1));
end
[a(12),c(14)]=fulladder(t(16),c(7),t(23));
[P(4),c(15)]=fulladder(t(25),c(8),a(7));
for i=1:5
[a(i+12),c(i+15)]=fulladder(t(i+25),a(i+7),c(i+8));
end
[a(18),c(21)]=fulladder(t(24),c(14),t(31));
[P(5),c(22)]=fulladder(t(33),c(15),a(13));
for i=1:5
[a(i+18),c(i+22)]=fulladder(t(i+33),a(i+13),c(i+15));
end
[a(24),c(28)]=fulladder(t(32),c(21),t(39));
[P(6),c(29)]=fulladder(t(41),c(22),a(19));
for i=1:5
[a(i+24),c(i+29)]=fulladder(t(i+41),a(i+19),c(i+22));
end
[a(30),c(35)]=fulladder(t(40),c(28),t(47));
[P(7),c(36)]=fulladder(t(49),c(29),a(25));
for i=1:5
[a(i+30),c(i+36)]=fulladder(t(i+49),a(i+25),c(i+29));
end
[a(36),c(42)]=fulladder(t(48),c(35),t(55));
[P(8),c(43)]=fulladder(t(57),c(36),a(31));
for i=1:5
% partial products t(58..62), following the t(i+8k+1) pattern of the rows above
[a(i+36),c(i+43)]=fulladder(t(i+57),a(i+31),c(i+36));
end
[a(42),c(49)]=fulladder(t(56),c(42),t(63));
[P(9),c(50)]=fulladder(LOW,c(43),a(37));
[P(10),c(51)]=fulladder(c(50),c(44),a(38));
[P(11),c(52)]=fulladder(c(51),c(45),a(39));
[P(12),c(53)]=fulladder(c(52),c(46),a(40));
[P(13),c(54)]=fulladder(c(53),c(47),a(41));
[P(14),c(55)]=fulladder(c(54),c(48),a(42));
[P(15),P(16)]=fulladder(c(55),c(49),t(64));
P1=P(16:-1:1);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% bwcell1.m %%%%%%%%%
function [P]=bwcell1(X,Y)
P=X & Y;
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% bwcell2.m %%%%%%%%%
function [SUMOUT,COUT]=bwcell2(X,Y,SUMIN, CIN)
D=X & Y;
[SUMOUT,COUT]=fulladder(D,SUMIN,CIN);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% bwcell3.m %%%%%%%%%
function [s]=bwcell3(a,b)
% Cell 3 of the Baugh Wooley Multiplier
s=a & ~b;
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% bwcell4.m %%%%%%%%%
function [s,cout]=bwcell4(a,b,Sin,Cin)
% Cell 4 of the Baugh Wooley Multiplier
c=~a & b;
[s,cout]=fulladder(c, Sin,Cin);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% bwcell5.m %%%%%%%%%
function [s,cout]=bwcell5(x,y)
% Cell 5 of the Baugh Wooley Multiplier
b=x & y;
c=~x;
d=~y;
[s,cout]=fulladder(b,c,d);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% circconv.m %%%%%%%%%
function z=circconv(a,b)
n=length(a);
z=zeros(size(a));
if(n~=length(b))
error('Unequal Vector lengths in the circular convolution argument');
end
x=1;
y=1;
for i=1:n
z(i)=0;
for j=1:n
if(x>n)
x=x-n;
end
if(y<1)
y=y+n;
end
z(i)=z(i)+a(x)*b(y);
x=x+1;
y=y-1;
end
y=y+1;
end
return
%%%%%%%% END %%%%%%%%%
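
The circular convolution computed by circconv can be cross-checked against the DFT product property. The sketch below is not part of the thesis code: it uses a direct modulo-index form of the same sum and compares it with ifft(fft(a).*fft(b)) from base MATLAB. The test vectors are arbitrary.

```matlab
a = [1 2 3 4];                       % arbitrary test vectors (assumed)
b = [1 0 -1 2];
n = numel(a);

z = zeros(1,n);
for i = 1:n
    for j = 1:n
        % the index into b wraps modulo n (1-based indexing)
        z(i) = z(i) + a(j)*b(mod(i-j,n)+1);
    end
end

zf  = real(ifft(fft(a).*fft(b)));    % same result via the DFT
err = max(abs(z - zf));
```

For these vectors the direct sum gives [2 4 10 4], and the FFT-based result agrees to within rounding error.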

%%%%%%%% com32.m %%%%%%%%%
function [b]=com32(a)
%function [b]=com32(a)
% returns the 32-bit two's complement of a number represented by the array 'a'
% for example,
% if   a=[1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1 0 0 0 1 0 1]
% then b=[0 1 1 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1]
tmp=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
tmp1=~a;
[tmp3]=add40(tmp1,tmp);
b=tmp3(2:33);
return
%%%%%%%% END %%%%%%%%%
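
The invert-and-add-one rule used by com32 can be reproduced without the add40 helper. The following minimal sketch, which is not the thesis implementation, uses a plain ripple carry over an MSB-first bit vector and the example word quoted in the com32 comments.

```matlab
a = [1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1 0 0 0 1 0 1];

b = double(~a);                 % one's complement
carry = 1;                      % add one, starting at the LSB
for k = numel(b):-1:1
    s = b(k) + carry;
    b(k) = mod(s,2);
    carry = floor(s/2);
end

expected = [0 1 1 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1];
```

The vector b produced this way matches the expected word given in the com32 example.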

%%%%%%%% cplxmul16.m %%%%%%%%%
function [pr,pi]=cplxmul16(xr,xi,yr,yi,clk,resetn)
tmp0=pipemul16(xr,yr,clk,resetn);
tmp1=pipemul16(xr,yi,clk,resetn);
tmp2=pipemul16(xi,yr,clk,resetn);
tmp3=pipemul16(xi,yi,clk,resetn);
pra=sub40(tmp0,tmp3);
pia=add40(tmp2,tmp1);
pr=pra(1:32);
pi=pia(1:32);
return
%%%%%%%% END %%%%%%%%%

%%%%%%%% czt.m %%%%%%%%%
N=4;
ROM1=zeros(size(cplx));
ROM0=zeros(size(cplx));
sqr=zeros(size(cplx));
hi=zeros(size(cplx));
hr=zeros(size(cplx));
for i=1:N
ROM0(i)=cos(pi*(i-1)*(i-1)/N);
ROM1(i)=-sin(pi*(i-1)*(i-1)/N);
hr(i)=ROM0(i);
hi(i)=-ROM1(i);
end
g0=cplx.*ROM0;
g1=cplx.*ROM1;
hi
c=1;
o1=circconv(hr,g0);
o2=circconv(hi,g0);
o3=circconv(hr,g1);
o4=circconv(hi,g1);
a0=o1-o4;
a1=o2+o3;
s1=a0.*a0;
s2=a1.*a1;
a3=s1+s2;
sqr=sqrt(a3);
%%%%%%%% END %%%%%%%%%
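
The structure of czt.m (multiply by a chirp, circularly convolve with the conjugate chirp, take the magnitude) follows from the identity nk = (n^2 + k^2 - (k-n)^2)/2. The sketch below is an independent floating-point check, not the thesis code. It assumes a real input and an even length N, so that the chirp exp(-j*pi*n^2/N) is periodic with period N and a length-N circular convolution suffices. The test data are arbitrary.

```matlab
N = 8;                          % even length (assumed)
x = [1 2 0 -1 3 1 -2 0.5];      % arbitrary real test data (assumed)
n = 0:N-1;

w = exp(-1j*pi*n.^2/N);         % chirp: cos(pi n^2/N) - j sin(pi n^2/N)
g = x.*w;                       % pre-multiplied input
h = conj(w);                    % chirp filter: cos(...) + j sin(...)

y = zeros(1,N);
for k = 0:N-1
    for m = 0:N-1
        y(k+1) = y(k+1) + g(m+1)*h(mod(k-m,N)+1);   % circular convolution
    end
end

X = w.*y;                       % post-multiplication restores the phase
err_full = max(abs(X - fft(x)));
err_mag  = max(abs(abs(y) - abs(fft(x))));
```

Because the post-multiplying chirp has unit magnitude, the magnitude of the convolution output already equals the magnitude of the DFT, which is why a magnitude-only computation can skip the final multiplication.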

%%%%%%%% deformat.m %%%%%%%%%

function [deformed]=deformat(formatin)
% Function reads the output of the FFT at the receiver and deformats it into
% its constituent symbols.

len=length(formatin);

deformed=[real(formatin(1))+j*imag(formatin(len/2))];

for i=2:len/2
deformed=[deformed; formatin(i)];
end
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% fircos2.m %%%%%%%%%

function [z0,z1,z2,z3]=fircos2(x0,x1,x2,x3);


y0 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y1 = [0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; % 3b20 --> 0.9239

p0=bwm16a(x0,y0);
p1=bwm16a(x1,y1);
z0=add16(p0,p1);

q0=bwm16a(x0,y1);
q1=bwm16a(x1,y0);
z1=add16(q0,q1);

return

%%%%%%%% END %%%%%%%%%

%%%%%%%% fircos4.m %%%%%%%%%









function [z0,z1,z2,z3]=fircos4(x0,x1,x2,x3);

%y0=[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; % 2d41 --> 0.7071
y3=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % c000 --> -1
y2=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; % 2d41 --> 0.7071

p0=shrega(x0);
p1=bwm16a(x1,y3);
p2=bwm16a(x2,y2);
p3=bwm16a(x3,y1);

pr01=add16(p0,p1);
pr23=add16(p2,p3);

z0=add16(pr01,pr23);

q0=bwm16a(x0,y1);
q1=shrega(x1);
q2=bwm16a(x2,y3);
q3=bwm16a(x3,y2);

qr01=add16(q0,q1);
qr23=add16(q2,q3);
z1=add16(qr01,qr23);

r0=bwm16a(x0,y2);
r1=bwm16a(x1,y1);
r2=shrega(x2);
r3=bwm16a(x3,y3);

rr01=add16(r0,r1);
rr23=add16(r2,r3);
z2=add16(rr01,rr23);

s0=bwm16a(x0,y3);
s1=bwm16a(x1,y2);
s2=bwm16a(x2,y1);
s3=shrega(x3);

sr01=add16(s0,s1);
sr23=add16(s2,s3);
z3=add16(sr01,sr23);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% fircos8.m %%%%%%%%%

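function [z0,z1,z2,z3,z4,z5,z6,z7]=fircos8(x0,x1,x2,x3,x4,x5,x6,x7);

% Coefficient names y0..y7 are assumed to follow the order listed.
y0=[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y1=[0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; % 3b20 --> 0.9239
y2=[1 1 0 0 0 1 0 0 1 1 0 1 1 1 1 1]; % c4df --> -0.9239
y3=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 0000 --> 0
y4=[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y5=[1 1 0 0 0 1 0 0 1 1 0 1 1 1 1 1]; % c4df --> -0.9239
y6=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 0000 --> 0
y7=[0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; % 3b20 --> 0.9239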


p0=bwm16a(x0,y0);
p1=bwm16a(x1,y7);
p2=bwm16a(x2,y6);
p3=bwm16a(x3,y5);
p4=bwm16a(x4,y4);
p5=bwm16a(x5,y3);
p6=bwm16a(x6,y2);
p7=bwm16a(x7,y1);

pr01=add16(p0,p1);
pr23=add16(p2,p3);
pr45=add16(p4,p5);
pr67=add16(p6,p7);
pri03=add16(pr01,pr23);
pri47=add16(pr45,pr67);
z0=add16(pri03,pri47);

q0=bwm16a(x0,y1);
q1=bwm16a(x1,y0);
q2=bwm16a(x2,y7);
q3=bwm16a(x3,y6);
q4=bwm16a(x4,y5);
q5=bwm16a(x5,y4);
q6=bwm16a(x6,y3);
q7=bwm16a(x7,y2);

qr01=add16(q0,q1);
qr23=add16(q2,q3);
qr45=add16(q4,q5);
qr67=add16(q6,q7);
qri03=add16(qr01,qr23);
qri47=add16(qr45,qr67);
z1=add16(qri03,qri47);

r0=bwm16a(x0,y2);
r1=bwm16a(x1,y1);
r2=bwm16a(x2,y0);
r3=bwm16a(x3,y7);
r4=bwm16a(x4,y6);
r5=bwm16a(x5,y5);
r6=bwm16a(x6,y4);
r7=bwm16a(x7,y3);

rr01=add16(r0,r1);
rr23=add16(r2,r3);
rr45=add16(r4,r5);
rr67=add16(r6,r7);
rri03=add16(rr01,rr23);
rri47=add16(rr45,rr67);
z2=add16(rri03,rri47);

s0=bwm16a(x0,y3);
s1=bwm16a(x1,y2);
s2=bwm16a(x2,y1);
s3=bwm16a(x3,y0);
s4=bwm16a(x4,y7);
s5=bwm16a(x5,y6);
s6=bwm16a(x6,y5);
s7=bwm16a(x7,y4);

sr01=add16(s0,s1);
sr23=add16(s2,s3);
sr45=add16(s4,s5);
sr67=add16(s6,s7);
sri03=add16(sr01,sr23);
sri47=add16(sr45,sr67);
z3=add16(sri03,sri47);

t0=bwm16a(x0,y4);
t1=bwm16a(x1,y3);
t2=bwm16a(x2,y2);
t3=bwm16a(x3,y1);
t4=bwm16a(x4,y0);
t5=bwm16a(x5,y7);
t6=bwm16a(x6,y6);
t7=bwm16a(x7,y5);

tr01=add16(t0,t1);
tr23=add16(t2,t3);
tr45=add16(t4,t5);
tr67=add16(t6,t7);
tri03=add16(tr01,tr23);
tri47=add16(tr45,tr67);
z4=add16(tri03,tri47);

u0=bwm16a(x0,y5);
u1=bwm16a(x1,y4);
u2=bwm16a(x2,y3);
u3=bwm16a(x3,y2);
u4=bwm16a(x4,y1);
u5=bwm16a(x5,y0);
u6=bwm16a(x6,y7);
u7=bwm16a(x7,y6);
ur01=add16(u0,u1);
ur23=add16(u2,u3);
ur45=add16(u4,u5);
ur67=add16(u6,u7);
uri03=add16(ur01,ur23);
uri47=add16(ur45,ur67);
z5=add16(uri03,uri47);

v0=bwm16a(x0,y6);
v1=bwm16a(x1,y5);
v2=bwm16a(x2,y4);
v3=bwm16a(x3,y3);
v4=bwm16a(x4,y2);
v5=bwm16a(x5,y1);
v6=bwm16a(x6,y0);
v7=bwm16a(x7,y7);
vr01=add16(v0,v1);
vr23=add16(v2,v3);
vr45=add16(v4,v5);
vr67=add16(v6,v7);
vri03=add16(vr01,vr23);
vri47=add16(vr45,vr67);
z6=add16(vri03,vri47);

w0=bwm16a(x0,y7);
w1=bwm16a(x1,y6);
w2=bwm16a(x2,y5);
w3=bwm16a(x3,y4);
w4=bwm16a(x4,y3);
w5=bwm16a(x5,y2);
w6=bwm16a(x6,y1);
w7=bwm16a(x7,y0);

wr01=add16(w0,w1);
wr23=add16(w2,w3);
wr45=add16(w4,w5);
wr67=add16(w6,w7);
wri03=add16(wr01,wr23);
wri47=add16(wr45,wr67);
z7=add16(wri03,wri47);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% firsin2.m %%%%%%%%%

function [z0,z1,z2,z3]=firsin2(x0,x1,x2,x3);


y0 = [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000h --> 1
y1 = [0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0]; % 3b20 --> 0.9239

p0=bwm16a(x0,y0);
p1=bwm16a(x1,y1);
z0=add16(p0,p1);

q0=bwm16a(x0,y1);
q1=bwm16a(x1,y0);
z1=add16(q0,q1);

return

%%%%%%%% END %%%%%%%%%

%%%%%%%% firsin4.m %%%%%%%%%

function [z0,z1,z2,z3]=firsin4(x0,x1,x2,x3);

y1 = [0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; % 2d41h --> 0.7071
y3 = [0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1]; % 2d41 --> 0.7071

p0=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
p1=bwm16a(x1,y3);
p2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
p3=bwm16a(x3,y1);

pr01=add16(p0,p1);
pr23=add16(p2,p3);

z0=add16(pr01,pr23);

q0=bwm16a(x0,y1);
q1=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
q2=bwm16a(x2,y3);
q3=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];

qr01=add16(q0,q1);
qr23=add16(q2,q3);
z1=add16(qr01,qr23);

r0=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
r1=bwm16a(x1,y1);
r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
r3=bwm16a(x3,y3);

rr01=add16(r0,r1);
rr23=add16(r2,r3);
z2=add16(rr01,rr23);

s0=bwm16a(x0,y3);
s1=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
s2=bwm16a(x2,y1);
s3=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];

sr01=add16(s0,s1);
sr23=add16(s2,s3);
z3=add16(sr01,sr23);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% firsin8.m %%%%%%%%%









function [z0,z1,z2,z3,z4,z5,z6,z7]=firsin8(x0,x1,x2,x3,x4,x5,x6,x7);

% Coefficient names y0..y7 are assumed to follow the order listed.
y0=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 0000h --> 0
y1=[0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 1]; % 187d --> 0.3827
y2=[1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 0]; % e782 --> -0.3827
y3=[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000 --> 1
y4=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 0000h --> 0
y5=[1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 0]; % e782 --> -0.3827
y6=[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]; % 4000 --> 1
y7=[0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 1]; % 187d --> 0.3827


p0=bwm16a(x0,y0);
p1=bwm16a(x1,y7);
p2=bwm16a(x2,y6);
p3=bwm16a(x3,y5);
p4=bwm16a(x4,y4);
p5=bwm16a(x5,y3);
p6=bwm16a(x6,y2);
p7=bwm16a(x7,y1);

pr01=add16(p0,p1);
pr23=add16(p2,p3);
pr45=add16(p4,p5);
pr67=add16(p6,p7);
pri03=add16(pr01,pr23);
pri47=add16(pr45,pr67);
z0=add16(pri03,pri47);

q0=bwm16a(x0,y1);
q1=bwm16a(x1,y0);
q2=bwm16a(x2,y7);
q3=bwm16a(x3,y6);
q4=bwm16a(x4,y5);
q5=bwm16a(x5,y4);
q6=bwm16a(x6,y3);
q7=bwm16a(x7,y2);

qr01=add16(q0,q1);
qr23=add16(q2,q3);
qr45=add16(q4,q5);
qr67=add16(q6,q7);
qri03=add16(qr01,qr23);
qri47=add16(qr45,qr67);
z1=add16(qri03,qri47);

r0=bwm16a(x0,y2);
r1=bwm16a(x1,y1);
r2=bwm16a(x2,y0);
r3=bwm16a(x3,y7);
r4=bwm16a(x4,y6);
r5=bwm16a(x5,y5);
r6=bwm16a(x6,y4);
r7=bwm16a(x7,y3);

rr01=add16(r0,r1);
rr23=add16(r2,r3);
rr45=add16(r4,r5);
rr67=add16(r6,r7);
rri03=add16(rr01,rr23);
rri47=add16(rr45,rr67);
z2=add16(rri03,rri47);

s0=bwm16a(x0,y3);
s1=bwm16a(x1,y2);
s2=bwm16a(x2,y1);
s3=bwm16a(x3,y0);
s4=bwm16a(x4,y7);
s5=bwm16a(x5,y6);
s6=bwm16a(x6,y5);
s7=bwm16a(x7,y4);

sr01=add16(s0,s1);
sr23=add16(s2,s3);
sr45=add16(s4,s5);
sr67=add16(s6,s7);
sri03=add16(sr01,sr23);
sri47=add16(sr45,sr67);
z3=add16(sri03,sri47);

t0=bwm16a(x0,y4);
t1=bwm16a(x1,y3);
t2=bwm16a(x2,y2);
t3=bwm16a(x3,y1);
t4=bwm16a(x4,y0);
t5=bwm16a(x5,y7);
t6=bwm16a(x6,y6);
t7=bwm16a(x7,y5);

tr01=add16(t0,t1);
tr23=add16(t2,t3);
tr45=add16(t4,t5);
tr67=add16(t6,t7);
tri03=add16(tr01,tr23);
tri47=add16(tr45,tr67);
z4=add16(tri03,tri47);

u0=bwm16a(x0,y5);
u1=bwm16a(x1,y4);
u2=bwm16a(x2,y3);
u3=bwm16a(x3,y2);
u4=bwm16a(x4,y1);
u5=bwm16a(x5,y0);
u6=bwm16a(x6,y7);
u7=bwm16a(x7,y6);
ur01=add16(u0,u1);
ur23=add16(u2,u3);
ur45=add16(u4,u5);
ur67=add16(u6,u7);
uri03=add16(ur01,ur23);
uri47=add16(ur45,ur67);
z5=add16(uri03,uri47);

v0=bwm16a(x0,y6);
v1=bwm16a(x1,y5);
v2=bwm16a(x2,y4);
v3=bwm16a(x3,y3);
v4=bwm16a(x4,y2);
v5=bwm16a(x5,y1);
v6=bwm16a(x6,y0);
v7=bwm16a(x7,y7);
vr01=add16(v0,v1);
vr23=add16(v2,v3);
vr45=add16(v4,v5);
vr67=add16(v6,v7);
vri03=add16(vr01,vr23);
vri47=add16(vr45,vr67);
z6=add16(vri03,vri47);

w0=bwm16a(x0,y7);
w1=bwm16a(x1,y6);
w2=bwm16a(x2,y5);
w3=bwm16a(x3,y4);
w4=bwm16a(x4,y3);
w5=bwm16a(x5,y2);
w6=bwm16a(x6,y1);
w7=bwm16a(x7,y0);

wr01=add16(w0,w1);
wr23=add16(w2,w3);
wr45=add16(w4,w5);
wr67=add16(w6,w7);
wri03=add16(wr01,wr23);
wri47=add16(wr45,wr67);
z7=add16(wri03,wri47);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% getformat.m %%%%%%%%%

function [format]=getformat(a)
% function gets the input vector a into a format that allows only real transmissions
% to be possible in the time domain data.

lena=length(a);

format=[real(a(1)); a(2:lena); imag(a(1)); conj(a(2:lena))];
return

%%%%%%%% END %%%%%%%%%
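
The purpose stated for getformat, real-valued time-domain data, rests on the conjugate symmetry of the spectrum. The sketch below illustrates the textbook form of that packing and is not a restatement of the listing: here the mirrored half is placed in reverse order, X(N-k) = conj(X(k)), which is an assumption of this example. The symbol values are hypothetical.

```matlab
a = [3+2j; 1-1j; -2+0.5j; 0.5+1j];   % four hypothetical symbols

% Bin 0 and the middle bin carry the real and imaginary parts of the
% first symbol; the upper half mirrors the lower half in reverse order.
X = [real(a(1)); a(2:end); imag(a(1)); conj(a(end:-1:2))];

x = ifft(X);                         % time-domain block
max_imag = max(abs(imag(x)));        % zero up to rounding error
```

With this ordering the 8-point inverse transform is real up to rounding error, so only one real sample stream needs to be transmitted.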

%%%%%%%% icztsim.m %%%%%%%%%

function sqr=icztsim(cplx)
N=length(cplx);
ROM1=zeros(size(cplx));
ROM0=zeros(size(cplx));
sqr=zeros(size(cplx));
hi=zeros(size(cplx));
hr=zeros(size(cplx));

for i=1:N
ROM0(i)=cos(pi*(i-1)*(i-1)/N);
ROM1(i)=sin(pi*(i-1)*(i-1)/N);
hr(i)=ROM0(i);
hi(i)=ROM1(i);
end
g0=cplx.*ROM0;
g1=cplx.*ROM1;

c=1;

o1=circconv(hr,g0);
o2=circconv(hi,g0);
o3=circconv(hr,g1);
o4=circconv(hi,g1);

a0=o1-o4;
a1=o2+o3;

s1=a0.*a0;
s2=a1.*a1;

a3=s1+s2;
sqr=sqrt(a3);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% muln.m %%%%%%%%%


function [C]=muln(A,B)
%muln
%----
%A,B --> inputs (n downto 1)
%clk --> input 1 bit
%C --> output(2n downto 1)
n=length(A);
C=zeros(2*n,1);

if n==2

C(1)=A(1) & B(1);
x=A(2) & B(2);
y=A(1) & B(2);
z=A(2) & B(1);
C(2)=xor(y,z);
n=y & z;
C(3)=xor(n,x);
C(4)=n & x;
else
p=muln(A(n:-1:n/2+1),B(n:-1:n/2+1));
q=muln(A(n:-1:n/2+1),B(n/2:-1:1));
r=muln(A(n/2:-1:1),B(n:-1:n/2+1));
s=muln(A(n/2:-1:1),B(n/2:-1:1));
tmp=[zeros(1,n/2) s(n:-1:n/2+1)];
tmp2=tmp+q+r; % both cross terms q and r contribute to the middle part
tmp3=[zeros(1,n/2) tmp2(n:-1:n/2+1)];
tmp4=tmp3+p;
C=[tmp4 tmp2(n/2:-1:1) s(n/2:-1:1)];
end;
return;

%%%%%%%% END %%%%%%%%%
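
The recursion in muln splits each operand into a high and a low half and combines four half-width products. The identity behind it can be checked on plain integers. This is a minimal sketch with hypothetical 8-bit operands, not the bit-vector code of the listing.

```matlab
A = 201; B = 118;                 % two hypothetical 8-bit operands
h = 16;                           % 2^(n/2) for n = 8

Ah = floor(A/h); Al = mod(A,h);   % high and low halves of A
Bh = floor(B/h); Bl = mod(B,h);   % high and low halves of B

p = Ah*Bh;                        % high x high
q = Ah*Bl;                        % high x low
r = Al*Bh;                        % low  x high
s = Al*Bl;                        % low  x low

C = p*h^2 + (q+r)*h + s;          % recombined product
```

Here C evaluates to 23718, the same as A*B, with the two cross terms q and r entering the middle part once each.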

%%%%%%%% multiplier.m %%%%%%%%%

function [C]=muln(A,B)
%muln
%----
%A,B --> inputs (n downto 1)
%clk --> input 1 bit
%C --> output(2n downto 1)
n=length(A);
C=zeros(2*n,1);

if n==2
then
C(1) =A(1) and B(1);
x=A(2) and B(2);
y= A(1) and B(2);
z= A(2) and B(1);
C(2)= y xor z;
n=y and z;
C(3)=n xor x;
C(4)=n and x;
return;
else
p=muln(A(n:n/2+1),B(n:n/2+1));
q=muln(A(n:n/2+1),B(n/2:1));
r=muln(A(n/2:1),B(n:n/2+1));
s=muln(A(n/2:1),B(n/2:1));
tmp=[zeros(1,n/2) s(n:n/2+1)];
tmp2=tmp+q+r;
tmp3=[zeros(1,n/2) tmp2(n:n/2+1)];
tmp4=tmp3+p;
C=[tmp4 tmp2(n/2:1) s(n/2:1)];
return;
end if;

%%%%%%%% END %%%%%%%%%










%%%%%%%% pipemul16.m %%%%%%%%%

function result=pipemul16(xa1,xb1,clk,resetn)
LOW=0;
ya=com16(xa1);
yb=com16(xb1);
if xa1(1)==1
ta1=ya(9:16);
ta2=ya(1:8);
else
ta1=xa1(9:16);
ta2=xa1(1:8);
end
if xb1(1)==1
tb1=yb(9:16);
tb2=yb(1:8);
else
tb1=xb1(9:16);
tb2=xb1(1:8);
end
tmp0_1=pipemul8(ta1,tb1,clk,resetn);
tmp0_2=pipemul8(ta2,tb1,clk,resetn);
tmp0_3=pipemul8(ta1,tb2,clk,resetn);
tmp0_4=pipemul8(ta2,tb2,clk,resetn);
tmp1_1=reg16(tmp0_1,resetn);
tmp1_2=reg16(tmp0_2,resetn);
tmp1_3=reg16(tmp0_3,resetn);
tmp1_4=reg16(tmp0_4,resetn);
tmp2_2=add1617(tmp1_2,tmp1_3);
tmp3_1=reg16(tmp1_1,resetn);
tmp3_2=reg17(tmp2_2,resetn);
tmp3_3=reg16(tmp1_4,resetn);
tc1=[tmp3_2(10:17) zeros(1,8)];
tc2=[zeros(1,7) tmp3_2(1:9)];
tmp4_1=add1617(tmp3_1,tc1);
tmp4_2=reg16(tc2,resetn);
tmp4_3=reg16(tmp3_3,resetn);
tmp5=ad16ca(tmp4_2,tmp4_3,tmp4_1(1));
p=[tmp5(2:17) tmp4_1(2:17)];
p1=com32(p);
if xa1(1)==xb1(1)
result=p;
else
result=p1;
end
return


%%%%%%%% END %%%%%%%%%

%%%%%%%% pipemul8.m %%%%%%%%%

function p=pipemul8(xa1,xb1,clk,resetn)
LOW=0;

tb1=xb1(5:8);
tb2=xb1(1:4);
ta1=xa1(5:8);
ta2=xa1(1:4);
tmp0_1=bm4(ta1,tb1);
tmp0_2=bm4(ta2,tb1);
tmp0_3=bm4(ta1,tb2);
tmp0_4=bm4(ta2,tb2);
tmp1_1=reg8(tmp0_1,resetn);
tmp1_2=reg8(tmp0_2,resetn);
tmp1_3=reg8(tmp0_3,resetn);
tmp1_4=reg8(tmp0_4,resetn);
tmp2_2=add89(tmp1_2,tmp1_3);
tmp3_1=reg8(tmp1_1,resetn);
tmp3_2=reg9(tmp2_2,resetn);
tmp3_3=reg8(tmp1_4,resetn);

tc1=[tmp3_2(6:9) zeros(1,4)];
tc2=[zeros(1,3) tmp3_2(1:5)];

tmp4_1=add89(tmp3_1,tc1);
tmp4_2=reg8(tc2,resetn);
tmp4_3=reg8(tmp3_3,resetn);
tmp5=ad8ca(tmp4_2,tmp4_3,tmp4_1(1));

p=[tmp5(2:9) tmp4_1(2:9)];
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct2.m %%%%%%%%%

function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% This function calculates the 2-point FFT of two 32 bit complex inputs
% the result is a 2-point 16-bit complex output

tmpre0=add40(ir0,ir1);
tmpre1=sub40(ir0,ir1);
tmpi0=add40(ii0,ii1);
tmpi1=sub40(ii0,ii1);

ore0=tmpre0(1:16);
oi0=tmpi0(1:16);
ore1=tmpre1(1:16);
oi1=tmpi1(1:16);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct2ifft.m %%%%%%%%%

function [ore0,oi0,ore1,oi1]=rad2ct2ifft(ir0,ii0,ir1,ii1)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% This function calculates the 2-point iFFT of two 32 bit complex inputs
% the result is a 2-point 16-bit complex output

tmpre0=add40(ir0,ir1);
tmpre1=sub40(ir0,ir1);
tmpi0=add40(ii0,ii1);
tmpi1=sub40(ii0,ii1);

ore0=tmpre0(1:16);
oi0=tmpi0(1:16);
ore1=tmpre1(1:16);
oi1=tmpi1(1:16);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct4.m %%%%%%%%%

function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3]=rad2ct4(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% This function calculates the 4-point FFT of four 32 bit complex inputs
% the result is a 4-point 16-bit complex output
% If the input is in the notation 32.32-x (32 bit input, x integer bits) then the output would
% be in the form
% 32.32-x-2 i.e., there would be a 4 bit shift in the output implying 2 bit precision is lost.

%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% Statistics %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Statistics collected over 100 iterations

%Error mean = -0.0001 + 1.0655i
%Error variance = 5.4781
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.1300
%Avg_no_of_mag_and_sign_errors = 0
%Avg_no_of_right_results = 3.8700
%Avg_ratio = 1.0001


zero=0;
[tmpr0,tmpi0,tmpr2,tmpi2]=rad2ct2(ir0,ii0,ir2,ii2);
[tmpr1,tmpi1,tmpr3,tmpi3]=rad2ct2(ir1,ii1,ir3,ii3);

tr0=[tmpr0 zeros(1,16)];
tr1=[tmpr1 zeros(1,16)];
tr2=[tmpr2 zeros(1,16)];
tr3=[tmpr3 zeros(1,16)];
ti0=[tmpi0 zeros(1,16)];
ti1=[tmpi1 zeros(1,16)];
ti2=[tmpi2 zeros(1,16)];
ti3=[tmpi3 zeros(1,16)];

[ore0,oi0,ore2,oi2]=rad2ct2(tr0,ti0,tr1,ti1);
[tr3c]=com32(tr3);
[ore1,oi1,ore3,oi3]=rad2ct2(tr2,ti2,ti3,tr3c);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct4ifft.m %%%%%%%%%

function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3]=rad2ct4ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% This function calculates the 4-point iFFT of four 32 bit complex inputs
% the result is a 4-point 16-bit complex output
zero=0;
[tmpr0,tmpi0,tmpr2,tmpi2]=rad2ct2(ir0,ii0,ir2,ii2);
[tmpr1,tmpi1,tmpr3,tmpi3]=rad2ct2(ir1,ii1,ir3,ii3);


tr0=[tmpr0 zeros(1,16)];
tr1=[tmpr1 zeros(1,16)];
tr2=[tmpr2 zeros(1,16)];
tr3=[tmpr3 zeros(1,16)];
ti0=[tmpi0 zeros(1,16)];
ti1=[tmpi1 zeros(1,16)];
ti2=[tmpi2 zeros(1,16)];
ti3=[tmpi3 zeros(1,16)];

[ore0,oi0,ore2,oi2]=rad2ct2(tr0,ti0,tr1,ti1);
[ti3c]=com32(ti3);
[ore1,oi1,ore3,oi3]=rad2ct2(tr2,ti2,ti3c,tr3);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct8.m %%%%%%%%%

function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3,ore4,oi4,ore5,oi5,ore6,oi6,ore7,oi7]= ...
rad2ct8(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% this function calculates the 8-point FFT of eight 32 bit complex inputs
% the result is an 8-point 16-bit complex output

%Error mean = -0.2844 + 0.7659i
%Error variance = 6.0012
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.0100
%Avg_no_of_mag_and_sign_errors = 7.9900
%Avg_no_of_right_results = 0
%Avg_ratio = 2.0006

[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);

[tra0]=shrega(zeropad(tr0));
[tia0]=shrega(zeropad(ti0));
[tra1]=shrega(zeropad(tr1));
[tia1]=shrega(zeropad(ti1));
[tra2]=shrega(zeropad(tr2));
[tia2]=shrega(zeropad(ti2));
[tra3]=shrega(zeropad(tr3));
[tia3]=shrega(zeropad(ti3));
[tra4]=shrega(zeropad(tr4));
[tia4]=shrega(zeropad(ti4));

W8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
W8_i1=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
W8_i2=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
W8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_i3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];

tr5=tr5(1:16);
ti5=ti5(1:16);
tr6=tr6(1:16);
ti6=ti6(1:16);
tr7=tr7(1:16);
ti7=ti7(1:16);
clk=1;
resetn=1;

[tra5,tia5]=cplxmul16(tr5,ti5,W8_r1,W8_i1,clk,resetn);
[tra6,tia6]=cplxmul16(tr6,ti6,W8_r2,W8_i2,clk,resetn);
[tra7,tia7]=cplxmul16(tr7,ti7,W8_r3,W8_i3,clk,resetn);

[ore0,oi0,ore4,oi4]=rad2ct2(tra0,tia0,tra4,tia4);
[ore1,oi1,ore5,oi5]=rad2ct2(tra1,tia1,tra5,tia5);
[ore2,oi2,ore6,oi6]=rad2ct2(tra2,tia2,tra6,tia6);
[ore3,oi3,ore7,oi7]=rad2ct2(tra3,tia3,tra7,tia7);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct8fft.m %%%%%%%%%

function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3,ore4,oi4,ore5,oi5,ore6,oi6,ore7,oi7]= ...
rad2ct8fft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
%function [ore0,oi0,ore1,oi1]=rad2ct2(ir0,ii0,ir1,ii1)
% this function calculates the 8-point FFT of eight 32 bit complex inputs
% the result is an 8-point 16-bit complex output

%Error mean = -0.2844 + 0.7659i
%Error variance = 6.0012
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.0100
%Avg_no_of_mag_and_sign_errors = 7.9900
%Avg_no_of_right_results = 0
%Avg_ratio = 2.0006

check(ir0);
check(ii0);
check(ir1);
check(ii1);
check(ir2);
check(ii2);
check(ir3);
check(ii3);
check(ir4);
check(ii4);
check(ir5);
check(ii5);
check(ir6);
check(ii6);
check(ir7);
check(ii7);

[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4fft(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4fft(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);

[tra0]=shrega(zeropad(tr0));
[tia0]=shrega(zeropad(ti0));
[tra1]=shrega(zeropad(tr1));
[tia1]=shrega(zeropad(ti1));
[tra2]=shrega(zeropad(tr2));
[tia2]=shrega(zeropad(ti2));
[tra3]=shrega(zeropad(tr3));
[tia3]=shrega(zeropad(ti3));
[tra4]=shrega(zeropad(tr4));
[tia4]=shrega(zeropad(ti4));

W8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
W8_i1=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
W8_i2=[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
W8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
W8_i3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];

tr5=tr5(1:16);
ti5=ti5(1:16);
tr6=tr6(1:16);
ti6=ti6(1:16);
tr7=tr7(1:16);
ti7=ti7(1:16);
clk=1;
resetn=1;

[tra5,tia5]=cplxmul16(tr5,ti5,W8_r1,W8_i1,clk,resetn);
[tra6,tia6]=cplxmul16(tr6,ti6,W8_r2,W8_i2,clk,resetn);
[tra7,tia7]=cplxmul16(tr7,ti7,W8_r3,W8_i3,clk,resetn);

[ore0,oi0,ore4,oi4]=rad2ct2fft(tra0,tia0,tra4,tia4);
[ore1,oi1,ore5,oi5]=rad2ct2fft(tra1,tia1,tra5,tia5);
[ore2,oi2,ore6,oi6]=rad2ct2fft(tra2,tia2,tra6,tia6);
[ore3,oi3,ore7,oi7]=rad2ct2fft(tra3,tia3,tra7,tia7);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% rad2ct8ifft.m %%%%%%%%%


function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3,ore4,oi4,ore5,oi5,ore6,oi6,ore7,oi7]= ...
rad2ct8ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3,ir4,ii4,ir5,ii5,ir6,ii6,ir7,ii7)
% this function calculates the 8-point iFFT of eight 32 bit complex inputs
% the result is an 8-point 16-bit complex output

%Error mean = -0.2844 + 0.7659i
%Error variance = 6.0012
%Avg_no_of_magnitude_errors = 0
%Avg_no_of_sign_errors = 0.0100
%Avg_no_of_mag_and_sign_errors = 7.9900
%Avg_no_of_right_results = 0
%Avg_ratio = 2.0006

[tr0,ti0,tr1,ti1,tr2,ti2,tr3,ti3]=rad2ct4ifft(ir0,ii0,ir2,ii2,ir4,ii4,ir6,ii6);
[tr4,ti4,tr5,ti5,tr6,ti6,tr7,ti7]=rad2ct4ifft(ir1,ii1,ir3,ii3,ir5,ii5,ir7,ii7);

[tra0]=shrega(zeropad(tr0));
[tia0]=shrega(zeropad(ti0));
[tra1]=shrega(zeropad(tr1));
[tia1]=shrega(zeropad(ti1));
[tra2]=shrega(zeropad(tr2));
[tia2]=shrega(zeropad(ti2));
[tra3]=shrega(zeropad(tr3));
[tia3]=shrega(zeropad(ti3));
[tra4]=shrega(zeropad(tr4));
[tia4]=shrega(zeropad(ti4));

w8_r1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_i1=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];
w8_r2=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_i2=[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
w8_r3=[1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0];
w8_i3=[0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 1];

tr5=tr5(1:16);
ti5=ti5(1:16);
tr6=tr6(1:16);
ti6=ti6(1:16);
tr7=tr7(1:16);
ti7=ti7(1:16);
clk=1;
resetn=1;

[tra5,tia5]=cplxmul16(tr5,ti5,w8_r1,w8_i1,clk,resetn);
[tra6,tia6]=cplxmul16(tr6,ti6,w8_r2,w8_i2,clk,resetn);
[tra7,tia7]=cplxmul16(tr7,ti7,w8_r3,w8_i3,clk,resetn);

[ore0,oi0,ore4,oi4]=rad2ct2ifft(tra0,tia0,tra4,tia4);
[ore1,oi1,ore5,oi5]=rad2ct2ifft(tra1,tia1,tra5,tia5);
[ore2,oi2,ore6,oi6]=rad2ct2ifft(tra2,tia2,tra6,tia6);
[ore3,oi3,ore7,oi7]=rad2ct2ifft(tra3,tia3,tra7,tia7);
return

%%%%%%%% END %%%%%%%%%

%%%%%%%% rad4ct4.m %%%%%%%%%

function [ore0,oi0,ore1,oi1,ore2,oi2,ore3,oi3]=rad4ct4(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)

tmp0 =add40(ir0,ir1);
tmp1 =add40(ir2,ir3);
tmp2 =add40(ii0,ii1);
tmp3 =add40(ii2,ii3);
tmp8 =add40(ir0,ir2);
tmp9 =add40(ir1,ir3);
tmp10=add40(ii0,ii2);
tmp11=add40(ii1,ii3);
tmp4 =sub40(ir0,ir2);
tmp5 =sub40(ii0,ii2);
tmp7 =sub40(ir1,ir3);
tmp6 =sub40(ii1,ii3);
oi1 =sub40(tmp5,tmp7);
oi2 =sub40(tmp10,tmp11);
ore2=sub40(tmp8,tmp9);
ore3=sub40(tmp4,tmp6);
ore0=add40(tmp0,tmp1);
oi0 =add40(tmp2,tmp3);
ore1=add40(tmp4,tmp6);
oi3 =add40(tmp5,tmp7);

return

%%%%%%%% END %%%%%%%%%
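
The add/subtract pattern of rad4ct4 is the multiplier-free 4-point DFT butterfly. As a floating-point cross-check, not part of the thesis code and with ordinary arithmetic standing in for add40 and sub40, the same eight expressions can be compared against fft from base MATLAB. The input values are arbitrary.

```matlab
x  = [1+2j, -3+0.5j, 2-1j, 0.25+4j];     % arbitrary test inputs (assumed)
ir = real(x);                            % ir(1..4) play the role of ir0..ir3
ii = imag(x);                            % ii(1..4) play the role of ii0..ii3

ore0 = (ir(1)+ir(2)) + (ir(3)+ir(4));
oi0  = (ii(1)+ii(2)) + (ii(3)+ii(4));
ore1 = (ir(1)-ir(3)) + (ii(2)-ii(4));
oi1  = (ii(1)-ii(3)) - (ir(2)-ir(4));
ore2 = (ir(1)+ir(3)) - (ir(2)+ir(4));
oi2  = (ii(1)+ii(3)) - (ii(2)+ii(4));
ore3 = (ir(1)-ir(3)) - (ii(2)-ii(4));
oi3  = (ii(1)-ii(3)) + (ir(2)-ir(4));

X   = [ore0+1j*oi0, ore1+1j*oi1, ore2+1j*oi2, ore3+1j*oi3];
err = max(abs(X - fft(x)));
```

The four outputs agree with the 4-point FFT to within rounding error; the twiddle factors of a 4-point transform are only +1, -1, +j and -j, so no multipliers are needed.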

%%%%%%%% rad4ct4ifft.m %%%%%%%%%

function [ore0,oi0,ore3,oi3,ore2,oi2,ore1,oi1]=rad4ct4ifft(ir0,ii0,ir1,ii1,ir2,ii2,ir3,ii3)

tmp0 =add40(ir0,ir1);
tmp1 =add40(ir2,ir3);
tmp2 =add40(ii0,ii1);
tmp3 =add40(ii2,ii3);
tmp8 =add40(ir0,ir2);
tmp9 =add40(ir1,ir3);
tmp10=add40(ii0,ii2);
tmp11=add40(ii1,ii3);
tmp4 =sub40(ir0,ir2);
tmp5 =sub40(ii0,ii2);
tmp7 =sub40(ir1,ir3);
tmp6 =sub40(ii1,ii3);
oi1 =sub40(tmp5,tmp7);
oi2 =sub40(tmp10,tmp11);
ore2=sub40(tmp8,tmp9);
ore3=sub40(tmp4,tmp6);
ore0=add40(tmp0,tmp1);
oi0 =add40(tmp2,tmp3);
ore1=add40(tmp4,tmp6);
oi3 =add40(tmp5,tmp7);

return

%%%%%%%% END %%%%%%%%%


%%%%%%%% rad2ct8.m %%%%%%%%%