                Brief Description of chkfft, by K1JT
                ------------------------------------

Discrete Fourier transforms (DFTs) are found at the root of most
digital signal processing tasks. In WSJT and its sister programs the
transforms are done using the FFTW library, and subroutine four2
provides a convenient interface to the library.  Program chkfft is a
command-line utility for measuring FFT execution times under a
variety of circumstances.

To compile chkfft in Linux:

$ gfortran -o chkfft chkfft.f90 four2a.f90 f77_wisdom.f90 gran.c -lfftw3f

To compile chkfft in Windows (you may need to customize the hard-coded
path shown here for libfftw3f-3.dll):

> gfortran -o chkfft chkfft.f90 four2a.f90 f77_wisdom.f90 gran.c \
  /JTSDK-QT/appsupport/runtime/libfftw3f-3.dll

To see a brief usage message, type chkfft at the command prompt:

$ chkfft
 Usage: chkfft <nfft | infile> nr nw nc np
        nfft:   length of FFT
        nfft=0: do lengths 2^n, n=2^4 to 2^23
        infile: name of file with nfft values, one per line
        nr:     0/1 to not read (or read) wisdom
        nw:     0/1 to not write (or write) wisdom
        nc:     0/1 for real or complex data
        np:     0-4 patience for finding best algorithm

As an example, to measure the speed of a complex DFT of length 131072:

#######################################################################
$ chkfft 131072 0 1 1 2

nfft:     131072   nr: 0   nw 1   nc: 1   np: 2

    NFFT     Time        rms      MHz   MFlops  iters  tplan
-------------------------------------------------------------
  131072  0.0021948  0.00000032  59.72  5076.1     231   2.9
#######################################################################

Program output shows that on the test machine the average time for one
forward (or inverse) transform of length N=131072 is about 2.2 ms,
corresponding to slightly over 5 GFlops computing speed.  The planning
time in FFTW was 2.9 s.
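
The MHz and MFlops columns appear to follow from NFFT and Time alone.
The sketch below is not taken from chkfft; it assumes MHz means
samples processed per microsecond and that MFlops uses the customary
FFTW benchmark convention of 5*N*log2(N) operations per complex
transform.  The function name fft_rates is hypothetical.  Both
assumptions are consistent with the complex-data rows shown in this
file; real-data runs (nc=0) may be counted differently.

```python
import math


def fft_rates(nfft, seconds):
    """Return (MHz, MFlops) for one complex transform of length nfft.

    Assumed conventions (inferred from the tabulated numbers, not from
    the chkfft source): MHz = samples per microsecond, and the flop
    count of one complex FFT is taken as 5 * N * log2(N).
    """
    mhz = nfft / seconds / 1e6
    mflops = 5.0 * nfft * math.log2(nfft) / seconds / 1e6
    return mhz, mflops


# Values from the N=131072 example: Time = 0.0021948 s
mhz, mflops = fft_rates(131072, 0.0021948)
print(f"{mhz:.2f} MHz  {mflops:.1f} MFlops")
```

For this row the computed values agree with the tabulated 59.72 MHz
and 5076.1 MFlops to the printed precision.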

Running the command again with parameter nr=1 will use the
"wisdom" already accumulated for complex N=131072 FFTs.  The execution
speed will be essentially the same, but no planning time is required:

#######################################################################
$ chkfft 131072 1 1 1 2

nfft:     131072   nr: 1   nw 1   nc: 1   np: 2

    NFFT     Time        rms      MHz   MFlops  iters  tplan
-------------------------------------------------------------
  131072  0.0021575  0.00000032  60.75  5164.0     235   0.0
#######################################################################

Optimized algorithms can compute DFTs much faster for lengths that are
the product of small integers.  Length N=131072 = 2^17 is a good
example, and FFTs of this length should be very efficient.  For
comparison, look at the speed for N=131071, a prime number.  The
average time is now about 7 times larger:

#######################################################################
C:\JTSDK-QT\src\wsjtx\lib>chkfft 131071 1 1 1 2

nfft:     131071   nr: 1   nw 1   nc: 1   np: 2

    NFFT     Time        rms      MHz   MFlops  iters  tplan
-------------------------------------------------------------
  131071  0.0153637  0.00000065   8.53   725.2      33   5.6
#######################################################################
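
To see why these two neighboring lengths behave so differently, it
helps to look at their prime factors.  The following is a minimal
sketch using plain trial division; the helper name prime_factors is
hypothetical and is not part of chkfft.

```python
def prime_factors(n):
    """Return the prime factors of n (with multiplicity) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        # Whatever remains is itself prime
        factors.append(n)
    return factors


for nfft in (131072, 131071):
    print(nfft, prime_factors(nfft))
```

The first length factors into seventeen 2s, while the second has no
factor other than itself, so no decomposition into smaller transforms
is available.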

Here's an example that measures execution times for all integral
power-of-2 lengths from 2^4 to 2^23:

#######################################################################
$ chkfft 0 1 1 1 2

nfft:          0   nr: 1   nw 1   nc: 1   np: 2

  n   N=2^n    Time        rms      MHz   MFlops  iters  tplan
---------------------------------------------------------------
 4      16  0.0000003  0.00000014  58.61  1172.2 1000000   0.0
 5      32  0.0000004  0.00000016  89.19  2229.6 1000000   0.0
 6      64  0.0000006  0.00000016 109.44  3283.2  866975   0.0
 7     128  0.0000009  0.00000021 135.92  4757.1  538369   0.0
 8     256  0.0000016  0.00000020 158.40  6335.8  313701   0.0
 9     512  0.0000032  0.00000021 162.53  7313.8  160943   0.1
10    1024  0.0000067  0.00000023 152.53  7626.5   75521   0.1
11    2048  0.0000136  0.00000025 150.42  8273.3   37239   0.2
12    4096  0.0000316  0.00000027 129.75  7784.8   16060   0.3
13    8192  0.0000720  0.00000026 113.75  7393.8    7040   0.5
14   16384  0.0001620  0.00000028 101.11  7078.0    3129   0.9
15   32768  0.0003227  0.00000030 101.53  7615.1    1571   1.7
16   65536  0.0010020  0.00000030  65.41  5232.5     506   4.1
17  131072  0.0021575  0.00000032  60.75  5164.0     235   0.0
18  262144  0.0053937  0.00000032  48.60  4374.2      94   3.6
19  524288  0.0190668  0.00000034  27.50  2612.2      27   6.8
20 1048576  0.0468001  0.00000035  22.41  2240.5      11   2.4
21 2097152  0.0936012  0.00000036  22.41  2352.5       6  31.6
22 4194304  0.1949997  0.00000037  21.51  2366.0       3   9.8
23 8388608  0.4212036  0.00000038  19.92  2290.3       2 112.9
#######################################################################

Test data for all transforms is Gaussian random noise of zero mean and
standard deviation 1.  Tabulated values of "rms" are the
root-mean-square differences between the original data and the
back-transformed data.
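
The round-trip check can be illustrated as follows.  This is only a
sketch of the idea, not the chkfft code: it uses a simple recursive
radix-2 FFT in pure Python instead of FFTW, works in double precision,
and assumes complex test data with unit-variance real and imaginary
parts.  The names fft and round_trip_rms are hypothetical.

```python
import cmath
import math
import random


def fft(x, sign=-1):
    """Unnormalized radix-2 FFT; len(x) must be a power of 2.

    sign=-1 gives the forward transform, sign=+1 the inverse
    (the caller divides by N after the inverse).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], sign)
    odd = fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def round_trip_rms(n, seed=1):
    """RMS difference between Gaussian noise and its forward+inverse FFT."""
    rng = random.Random(seed)
    data = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    back = [v / n for v in fft(fft(data, -1), +1)]
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(data, back)) / n)


print(round_trip_rms(1024))
```

Because this sketch computes in double precision, its rms values are
far smaller than those tabulated above, which are in the range
expected for single-precision arithmetic (chkfft is linked against
the single-precision library libfftw3f).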

File nfft.dat contains all numbers between 2^3 and 2^23 with no factor
greater than 7, followed by their factors.  These numbers are good
choices for FFT lengths.  File all_fft.out gives the result on one
machine of running the command

$ chkfft nfft.dat 0 1 1 2

Take note: this task may take as much as 24 hours, or even more!
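
A list of candidate lengths like the one in nfft.dat can be generated
in a few lines.  The sketch below is an illustration only: it is not
the program that produced nfft.dat, it assumes both endpoints are
included, and it does not attempt to reproduce the file's exact
layout.  The function name smooth_lengths is hypothetical.

```python
def smooth_lengths(lo=2**3, hi=2**23, primes=(2, 3, 5, 7)):
    """Return sorted integers in [lo, hi] whose prime factors all lie in primes."""
    values = {1}
    for p in primes:
        extended = set()
        for v in values:
            # Multiply by successive powers of p while staying within range
            while v <= hi:
                extended.add(v)
                v *= p
        values = extended
    return sorted(v for v in values if v >= lo)


lengths = smooth_lengths()
print(len(lengths), lengths[:10])
```

Writing a subset of these values to a file, one per line, gives an
input that matches the "infile" description in the usage message and
allows a much shorter run than the full list.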