************************************************************************ * * FFT for Interactive C * version 1.1 * * Public functions: * * int power_spectrum(char *data)... accepts pointer to 512-byte array, * the first 256 bytes of which represents signed 8-bit real-valued * data; performs simple power-spectrum estimation, without windowing * or binning; on return, array holds unsigned 8-bit spectral compo- * nents; return value is the index to the peak frequency * * int fft(char *data)... accepts pointer to 512-byte array, the first * half of which represents signed 8-bit real components, the second * half of which represents imaginary components; performs in-place * FFT; return value is number of times data had to be divided by 2 * * int inverse_fft(char *data)... same as fft() but array contains * complex-valued inverse transform * * The function argument is the address of the first byte of the array. * Because of the nonstandard way IC handles arrays, you'll need to call * using a form such as power_spectrum((int)&data[0]). * ************************************************************************ * * Version history: * * 1.0 7/31/01 George Musser (georgejr@musser.com) * Adapted from fft.c11 by * * Ron Williams * Department of Chemistry * Ohio University * Athens, OH 45701 * * This, in turn, was a modification of the 6800 FFT presented by * * Richard Lord * Byte Magazine, pp. 108-119 * February 1979 * * 1.1 8/4/01 gsm * power_spectrum() now returns peak frequency * ************************************************************************ * * Williams's FFT algorithm is written in ROMable code for the HC11. It * uses a sine look-up table for speed and stores its dynamic variables * in the machine stack. The user passes the address of a 512-byte array * of 8-bit signed numbers, with the real components in the first 256 * bytes and the imaginary components in the second. power_spectrum() * zeroes out the imaginary components. * * The fft() and inverse_fft() return value is a normalization exponent * equal to the number of times the data was divided by two to keep it * within 8-bit range during the transform. The power_spectrum() return * value is the index of the array component with the maximum absolute * value. This is easily related to the peak frequency. For data at * timesteps of * * 0, dt, ..., 127 dt, 128 dt, 129 dt, ..., 255 dt * * the values represent the power at periods of * * DC, 256 dt, ..., 256/127 dt, 2 dt, -256/127 dt, ..., -256 dt * * For real-valued data, the transform components at positive and nega- * tive periods are complex conjugates, so the power depends only on * the absolute value of the period. Accordingly, the return value is * the absolute value of the array index. Array component 128, which * corresponds to the Nyquist frequency, is a special case. * * Thus a power_spectrum() return value of 0 indicates that the DC * component dominates, a return value of 1 indicates that a frequency * of 1/256th of the sampling frequency dominates, and a return value * of 128 indicates that the Nyquist frequency dominates. * * To ensure that the determination of the peak is as accurate as possi- * ble, the user may want to apply a window function to the data before * passing it to power_spectrum(). I haven't implemented windowing here, * because, in fixed-point math, it would destroy precision and create * spurious harmonics. The calling routine can minimize the loss of * precision by scaling the data to fill the full 8-bit range -- an * operation that is easier to do in Interactive C. * * According to Williams's benchmarking, the computation takes 350 * milliseconds -- pretty fast, but remember that it hogs the processor, * since IC will not preempt a machine-language routine. I have not done * a great deal of testing, apart from some simple cases. * * Clever packing and unpacking of the data array could double the power- * spectrum computation, at the price of added complexity. ************************************************************************ * * offsets for local variables in FFT routine * REAL EQU 0 TOP EQU 2 ; gsm-added SIGN EQU 4 ; gsm-added CELNM EQU 5 CELCT EQU 6 PAIRNM EQU 7 CELDIS EQU 8 DELTA EQU 9 SCLFCT EQU $0A COSA EQU $0B SINA EQU $0C SINPT EQU $0D REAL1 EQU $0F REAL2 EQU $11 TREAL EQU $13 TIMAG EQU $14 TMP EQU $15 TMP2 EQU $16 ORG MAIN_START ************************************************************************ * * power_spectrum * * I split the original routine into two pieces: the actual FFT and the * power-spectrum calculation. That way, we can call the FFT directly if * we have complex-valued data (as we might if using the FFT for filter- * ing or waveform synthesis). * * The argument -- the address of the real-valued data array -- is passed * in the accumulator. * subroutine_power_spectrum * clear imaginary part of data array * * slight optimization saves 1019 cycles (gsm): * + use STAA instead of CLR, saving 2 cycles per iteration * + use X rather than Y, saving 2 cycles per iteration ADDD #$100 ; get address of imaginary part [4] XGDX ; load data address into X [3] CLRA ; A register is zero [2] CLRB ; B register holds count (0=256) [2] zero STAA $FF,X ; [4] DEX ; [3] DECB ; [2] BNE zero ; [3] * do the FFT PSHX ; save the data address XGDX ; put data address back in accumulator BSR subroutine_fft PULX ; grab the data address * calculate sum of absolute values * * formally, we should be calculating the sum of squares, but this is * close enough for government work * * also, remember where the maximum value is, at a cost of 7437 cycles; * the B register is the maximum value, and the location of that value * is stored temporarily on the stack (gsm-added 8/4/01) * * slight optimization saves 2046 cycles (gsm): * + use X rather than Y, saving 3 cycles per iteration * + use Y for the byte counter, saving 5 cycles per iteration smsq LDY #0 ; clear byte counter CLRB ; initialize maximum value [2] PSHY ; initialize index [5] sum LDAA 0,X ; get real component BPL sm1 ; force positive NEGA BVC sm1 ; watch for $80 CLRA ; which is really 0 sm1 STAA 0,X ; store real component [4] INX ; get imaginary component LDAA $FF,X DEX BPL sm2 ; force positive NEGA BVC sm2 ; watch for $80 again CLRA sm2 ADDA 0,X ; compute sum [4] STAA 0,X ; save back in real part of array CBA ; is this the new maximum? [2] BLS sm3 ; no [3] TAB ; yes [2] INS ; ...so pop old index [3] INS PSHY ; ...and push new one [5] sm3 INX ; inc X for next round INY ; done when byte counter reaches 256 CPY #$100 ; [5] BNE sum * done PULX ; move index to accumulator XGDX CMPB #$80 ; take absolute value of LSB BLS smex NEGB smex RTS ************************************************************************ * * fft * * Here's the main body of the FFT. The argument -- the address of the * complex-valued data array -- is passed in the accumulator. The * direction of the transform (0 for forward, 1 for reverse) is passed * in Y. * * fixed local-memory management, which didn't allow room on the stack * for nested calls or interrupt service (gsm) * * public entry points, which set direction of transform subroutine_fft LDY #0 BRA fftc11 subroutine_inverse_fft LDY #1 * initialize local variables fftc11 TSX ; top of stack for frame pointer XGDX ; to be placed in X SUBD #$17 ; subtract offset to make room XGDX ; X now has frame pointer TXS ; relocate stack underneath frame (gsm-added) STD REAL,X ; save data address (passed in accumulator) ADDD #$1FF ; needed in scale subroutine (gsm-added) STD TOP,X XGDY ; store direction of transform STAB SIGN,X CLR SCLFCT,X ; zero out the scale factor * do bit sorting * * added swapping of the imaginary part, too, in case the data is complex- * valued (gsm) * * slight optimization saves ~29000 cycles (gsm): * + expand rev1 loop and use A instead of TMP, saving 72 cycles * + store B in TMP2 rather than on the stack, saving 2 cycles * + use both index pointers during byte swap, saving 40 cycles LDAB #$FE ; setup start for bit reversal revbit STAB TMP2,X ; save copy of address [4] RORB ; rotate B right - bit 1 to carry ROLA ; rotate left - carry bit in RORB ; rotate B right - bit 2 to carry ROLA ; rotate left - carry bit in RORB ; rotate B right - bit 3 to carry ROLA ; rotate left - carry bit in RORB ; rotate B right - bit 4 to carry ROLA ; rotate left - carry bit in RORB ; rotate B right - bit 5 to carry ROLA ; rotate left - carry bit in RORB ; rotate B right - bit 6 to carry ROLA ; rotate left - carry bit in RORB ; rotate B right - bit 7 to carry ROLA ; rotate left - carry bit in RORB ; rotate B right - bit 8 to carry ROLA ; rotate left - carry bit in LDAB TMP2,X ; retrieve unbitreversed address [4] CBA ; make sure we only swap once [2] BHS noswap swap PSHX ; save frame pointer LDY REAL,X ; load base address LDX REAL,X ABY ; add unbitreversed address TAB ; add bitreversed address ABX LDAA 0,Y ; swap real components LDAB 0,X STAA 0,X STAB 0,Y INX ; prepare for 256-byte offset INY LDAA $FF,Y ; swap imaginary components LDAB $FF,X STAA $FF,X STAB $FF,Y PULX ; restore frame pointer LDAB TMP2,X ; get current address back noswap DECB ; decrement address BNE revbit ; do next if not done * special case of first pass of FFT * * added imaginary part, for which the Danielson-Lanczos formula is the * same as for the positive component (gsm) * * slight optimization saves 1658 cycles (gsm): * + use X instead of Y, netting 5x2 cycles per iteration * + use Y instead of TMP, netting 3 cycles per iteration JSR scale PSHX ; save frame pointer [4] LDX REAL,X ; set up data pointer [5] LDY #128 ; get number of cells [4] fpss LDAA 0,X ; get rm [4] LDAB 1,X ; get rn [4] PSHA ; make copy ABA ; rm'=rm+rn STAA 0,X ; save back in data array [4] PULA ; get rm again SBA ; rn'=rm-rn STAA 1,X ; put away [4] INX ; point to next pair [3] INX ; [3] LDAA $FE,X ; get im [4] LDAB $FF,X ; get in [4] PSHA ; make copy ABA ; im'=im+in STAA $FE,X ; save back in data array [4] PULA ; get im again SBA ; in'=im-in STAA $FF,X ; put away [4] DEY ; decrement # cells [4] BNE fpss ; go back if not done PULX ; restore frame pointer [5] * now, the FFT proper for passes 2 through N * * for inverse transforms, it should be enough just to negate the sine value four LDAA #64 ; # of cells is now 64 STAA CELNM,X ; store STAA DELTA,X ; so is delta LDAA #02 ; number of pairs is 2 STAA PAIRNM,X STAA CELDIS,X ; so is distance between npass JSR scale ; check for over-range LDAA CELNM,X ; get current cell # STAA CELCT,X ; store at cell counter LDY REAL,X STY REAL1,X ; get copy of data ncell LDY #sintab ; get address of sines STY SINPT,X ; save copy LDAA PAIRNM,X ; get current pairnm np1 PSHA ; save pair counter LDAB 64,Y ; get sine LDAA SIGN,X ; negate if inverse transform (gsm-added) BEQ ncos NEGB ncos LDAA 0,Y ; get cosine STAA COSA,X ; save copy STAB SINA,X ; ditto LDY REAL1,X ; point to top of data LDAB CELDIS,X ; get current offset ABY ; add to y for current STY REAL2,X ; copy it LDAA 0,Y ; get data point rn PSHA ; copy it LDAB COSA,X ; get cosine JSR smul ; rn*cos(a) STAA TREAL,X PULA ; get copy of rn LDAB SINA,X ; get sin(a) JSR smul ; rn*sin(a) STAA TIMAG,X ; store imaginary tmp INY LDAA $FF,Y ; get imaginary data PSHA ; save it LDAB SINA,X ; get sin(a) JSR smul ; in*sin(a) ADDA TREAL,X STAA TREAL,X ; tr=rn*cos+in*sin PULA ; get data back LDAB COSA,X ; get cosine JSR smul ; in*cos(a) SUBA TIMAG,X ; ti=in*cos-rn*sin STAA TIMAG,X LDY REAL1,X LDAA 0,Y ; get rm TAB ; save a copy ADDA TREAL,X ; rm'=rm+tr STAA 0,Y ; store new rm SUBB TREAL,X ; rn'=rm-tr LDY REAL2,X STAB 0,Y ; store new rn LDY REAL1,X INY STY REAL1,X ; save real1 for nxt LDAA $FF,Y ; get im TAB ; save copy ADDA TIMAG,X ; im'=im+ti STAA $FF,Y ; put back in array LDY REAL2,X INY SUBB TIMAG,X ; in'=im-ti STAB $FF,Y ; put back in array LDY SINPT,X LDAB DELTA,X ; increment sine pntr ABY STY SINPT,X ; save away PULA DECA ; dec pair counter BEQ ar1 ; gsm-added JMP np1 ; gsm-added ar1 LDY REAL1,X LDAB CELDIS,X ABY STY REAL1,X DEC CELCT,X BEQ ar3 JMP ncell ar3 LSR CELNM,X ; half cells BEQ done ; done when all cells ASL PAIRNM,X ; double pairs ASL CELDIS,X ; twice as far apart LSR DELTA,X ; delta is half JMP npass ; one more time! * return to calling program done LDAB SCLFCT,X ; scale factor is return value (in accumulator) CLRA ; zero MSB for IC's benefit (gsm-added) XGDX ; pop off all the local variables (gsm-added) ADDD #$17 XGDX TXS RTS ************************************************************************ * * subroutine for rescaling out-of-range data * * fixed array overflow, incorrect branches; moved top-of-data calculation * to main FFT routine (gsm) * * made out-of-range detection a subroutine, at a cost of 13 cycles (gsm) * * slight optimization saves 3584 cycles in each loop (gsm): * + reverse roles of X and Y, saving 3 cycles per iteration * + replaced ADDA #$80, LSRA, SUBA #$40 sequence with ASRA, saving 4 * cycles per iteration * scale XGDX ; move frame pointer to accumulator [3] XGDY ; thence to Y [4] * first, check whether any value lies outside the range (-64,64] scdow LDAA #$C0 ; -64 LDAB #$40 ; +64 BSR range ; sets carry if out of range BCC sexit ; if not, check whether we need to scale up * divide the whole array by 2 scl INC SCLFCT,Y ; keep track of scaling [7] LDX TOP,Y ; reset pointer [6] scl1 LDAA 0,X ; get data [4] ASRA ; divide by two [2] STAA 0,X ; store away [4] DEX ; bump pointer [3] CPX REAL,Y ; done when both [7] BHS scl1 ; imag and real done (gsm: was BNE) sexit XGDY ; put frame pointer [4] XGDX ; back into X [3] RTS #if 1==0 * check whether any value lies outside the range (-32,32] * * this code, as it stands, doesn't do any good -- multiplying everything * by 2 won't increase the precision of subsequent calcuations; but it * may provide inspiration for a cleverer rescaling algorithm someday * scup LDAA #$E0 ; -32 LDAB #$20 ; +32 BSR range ; sets carry if out of range BCS sexit ; if set, all is well * multiply the whole array by 2 sc2 DEC SCLFCT,Y ; keep track of scaling [7] LDX TOP,Y ; reset pointer [6] sc21 LDAA 0,X ; get data [4] LSLA ; multiply by two [2] STAA 0,X ; store away [4] DEX ; bump pointer [3] CPX REAL,Y ; done when both [7] BHS sc21 ; imag and real done BRA scup ; keep scaling up until data is in midrange #endif ************************************************************************ * * subroutine for checking whether any data is out of range: if any value * is outside the range (A,B], set the carry bit * * extracted from scale subroutine (gsm) * * slight optimization saves 1536 cycles (gsm): * + reverse roles of X and Y, saving 3 cycles per iteration * range LDX TOP,Y ; start at top of data [6] rtop CMPA 0,X ; check for minimum [4] BLO rnxt ; if less negative than A, don't fix CMPB 0,X ; check for maximum [4] BLO rexit ; if > B, go fix it rnxt DEX ; bump pointer [3] CPX REAL,Y ; done when both [7] BHS rtop ; imag and real done (gsm: was BNE) CLC rexit RTS ************************************************************************ * * subroutine for signed 8-bit multiplication * * A <- A*B * B represents number from -1 to 1, so only hi byte counts. * smul STAA TMP,X ; copy multiplier STAB TMP2,X ; ditto multiplicand * get absolute values TSTA ; check sign of multiplier BPL sk1 ; skip negation NEGA BVS sko ; check for $80 BEQ sko ; check for zero sk1 TSTB ; check multiplier sign BPL sk2 NEGB BVS sko ; check for $80 BEQ sko * do the multiplication sk2 MUL ; do multiplication ADCA #0 ; 8 bit conversion ASLA ; and correct for sine (so that 127=1.0) LDAB TMP2,X ; get original multiplicand EORB TMP,X ; check sign of result BPL out NEGA ; result is negative out RTS sko CLRA ; return zero to main RTS ************************************************************************ * * now for the sine lookup table * sintab FCB 127, 127, 127, 127, 126, 126, 126, 125, 125, 124 FCB 123, 122, 122, 121, 120, 118, 117, 116, 115, 113 FCB 112, 111, 109, 107, 106, 104, 102, 100, 98, 96 FCB 94, 92, 90, 88, 85, 83, 81, 78, 76, 73 FCB 71, 68, 65, 63, 60, 57, 54, 51, 49, 46 FCB 43, 40, 37, 34, 31, 28, 25, 22, 19, 16 FCB 12, 9, 6, 3, 0, -3, -6, -9, -12, -16 FCB -19, -22, -25, -28, -31, -34, -37, -40, -43, -46 FCB -49, -51, -54, -57, -60, -63, -65, -68, -71, -73 FCB -76, -78, -81, -83, -85, -88, -90, -92, -94, -96 FCB -98,-100,-102,-104,-106,-107,-109,-111,-112,-113 FCB -115,-116,-117,-118,-120,-121,-122,-122,-123,-124 FCB -125,-125,-126,-126,-126,-127,-127,-127,-127,-127 FCB -127,-127,-126,-126,-126,-125,-125,-124,-123,-122 FCB -122,-121,-120,-118,-117,-116,-115,-113,-112,-111 FCB -109,-107,-106,-104,-102,-100, -98, -96, -94, -92 FCB -90, -88, -85, -83, -81, -78, -76, -73, -71, -68 FCB -65, -63, -60, -57, -54, -51, -49, -46, -43, -40 FCB -37, -34, -31, -28, -25, -22, -19, -16, -12, -9 FCB -6, -3, 0, 3, 6, 9, 12, 16, 19, 22 FCB 25, 28, 31, 34, 37, 40, 43, 46, 49, 51 FCB 54, 57, 60, 63, 65, 68, 71, 73, 76, 78 FCB 81, 83, 85, 88, 90, 92, 94, 96, 98, 100 FCB 102, 104, 106, 107, 109, 111, 112, 113, 115, 116 FCB 117, 118, 120, 121, 122, 122, 123, 124, 125, 125 FCB 126, 126, 126, 127, 127, 127