<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html" />
<title>How to compile dSFMT</title>
<style type="text/css">
BLOCKQUOTE {background-color:#a0ffa0;
padding-left: 1em;}
</style>
</head>
<body>
<h2>How to compile dSFMT</h2>

<p>
This document explains how to compile dSFMT from a terminal on
UNIX-like systems (for example Linux, FreeBSD, Cygwin, or OS X).
I can't help with IDEs (Integrated Development Environments);
if you use one, please see your IDE's help to find out how to
enable the SIMD features of your CPU.
</p>

<h3>1. First Step: Compile test programs using Makefile.</h3>
<h4>1-1. Compile standard C test program.</h4>
<p>
Check that dSFMT.c and Makefile are in your current directory.
If not, <strong>cd</strong> to the directory that contains them.
Then type
</p>
<blockquote>
<pre>make std</pre>
</blockquote>
<p>
If that causes an error, try
</p>
<blockquote>
<pre>cc -DDSFMT_MEXP=19937 -o test-std-M19937 dSFMT.c test.c</pre>
</blockquote>
<p>
or
</p>
<blockquote>
<pre>gcc -DDSFMT_MEXP=19937 -o test-std-M19937 dSFMT.c test.c</pre>
</blockquote>
<p>
If the build succeeds, run the test program. Type
</p>
<blockquote>
<pre>./test-std-M19937 -v</pre>
</blockquote>
<p>
You will see many random numbers displayed on your screen.
To check that these random numbers are the correct output,
redirect the output to a file and <strong>diff</strong> it with
<strong>dSFMT.19937.out.txt</strong>, like this:</p>
<blockquote>
<pre>./test-std-M19937 -v > foo.txt
diff -w foo.txt dSFMT.19937.out.txt</pre>
</blockquote>
<p>
No output means the two files are the same, because <strong>diff</strong>
prints only the differences between them.
</p>
<p>
To measure the generation speed of dSFMT, type
</p>
<blockquote>
<pre>./test-std-M19937 -s</pre>
</blockquote>
<p>
Without optimization it is very slow. To make it fast, compile it
with the <strong>-O3</strong> option. If your compiler is gcc, you
should also specify the <strong>-fno-strict-aliasing</strong> option
together with <strong>-O3</strong>. Type
</p>
<blockquote>
<pre>gcc -O3 -fno-strict-aliasing -DDSFMT_MEXP=19937 -o test-std-M19937 dSFMT.c test.c
./test-std-M19937 -s</pre>
</blockquote>
<p>
If you are using gcc 4.0, you can get more performance out of dSFMT
by giving the additional options
<strong>--param max-inline-insns-single=1800</strong>,
<strong>--param inline-unit-growth=500</strong> and
<strong>--param large-function-growth=900</strong>.
</p>
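<p>
As a rough illustration of what a speed test such as <strong>-s</strong>
measures, the following sketch times a generation loop with the standard
<strong>clock()</strong> function. It is an assumption about the general
approach, not the code in test.c. The names <strong>standin_next</strong>
and <strong>time_generation</strong> are hypothetical, and the xorshift
generator is only a stand-in so that the example compiles without dSFMT.
</p>

```c
#include <stdint.h>
#include <time.h>

/* Stand-in generator (xorshift64), NOT dSFMT; used only for illustration. */
static uint64_t standin_state = 88172645463325252ULL;

static double standin_next(void) {
    standin_state ^= standin_state << 13;
    standin_state ^= standin_state >> 7;
    standin_state ^= standin_state << 17;
    /* keep the top 53 bits and scale them into [0, 1) */
    return (double)(standin_state >> 11) * (1.0 / 9007199254740992.0);
}

/* Generate count numbers and return the CPU time used, in seconds.
   The sum is stored in *sum_out so the loop is not optimized away. */
double time_generation(long count, double *sum_out) {
    double sum = 0.0;
    clock_t start = clock();
    for (long i = 0; i < count; i++) {
        sum += standin_next();
    }
    clock_t end = clock();
    if (sum_out != NULL) {
        *sum_out = sum;
    }
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

<p>
Building such a file once without optimization and once with
<strong>-O3</strong>, then comparing the returned times, gives a feel for
how much the compiler options matter.
</p>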

<h4>1-2. Compile SSE2 test program.</h4>
<p>
If your CPU supports SSE2 and you can use gcc version 3.4 or later,
you can build test-sse2-M19937. To do this, type
</p>
<blockquote>
<pre>make sse2</pre>
</blockquote>
<p>or type</p>
<blockquote>
<pre>gcc -O3 -msse2 -fno-strict-aliasing -DHAVE_SSE2=1 -DDSFMT_MEXP=19937 -o test-sse2-M19937 dSFMT.c test.c</pre>
</blockquote>
<p>If everything works well,</p>
<blockquote>
<pre>./test-sse2-M19937 -s</pre>
</blockquote>
<p>shows a much shorter time than <strong>./test-std-M19937 -s</strong>.</p>

<h4>1-3. Compile AltiVec test program.</h4>
<p>
If you are using a Macintosh computer with a PowerPC G4 or G5, and
your gcc version is 3.3 or later, you can build test-alti-M19937. To
do this, type
</p>
<blockquote>
<pre>make osx-alti</pre>
</blockquote>
<p>or type</p>
<blockquote>
<pre>gcc -O3 -faltivec -fno-strict-aliasing -DHAVE_ALTIVEC=1 -DDSFMT_MEXP=19937 -o test-alti-M19937 dSFMT.c test.c</pre>
</blockquote>
<p>If everything works well,</p>
<blockquote>
<pre>./test-alti-M19937 -s</pre>
</blockquote>
<p>shows a much shorter time than <strong>./test-std-M19937 -s</strong>.</p>
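<p>
Before defining <strong>HAVE_SSE2</strong> or <strong>HAVE_ALTIVEC</strong>,
it can help to confirm that the compiler really has the matching
instruction set enabled. The following is a minimal sketch, assuming a
gcc-compatible compiler that predefines <strong>__SSE2__</strong> when
SSE2 code generation is enabled and <strong>__ALTIVEC__</strong> when
AltiVec is enabled. <strong>enabled_simd</strong> is a hypothetical
helper, not part of dSFMT. It reports the compiler settings of the build,
not the capabilities of the CPU the program later runs on.
</p>

```c
#include <stdio.h>

/* Return the name of the SIMD instruction set enabled for this build,
   based on macros that gcc-compatible compilers are assumed to predefine. */
const char *enabled_simd(void) {
#if defined(__SSE2__)
    return "SSE2";
#elif defined(__ALTIVEC__)
    return "AltiVec";
#else
    return "none";
#endif
}

/* Print the result, for example from a small test program. */
void print_enabled_simd(void) {
    printf("SIMD enabled at compile time: %s\n", enabled_simd());
}
```

<p>
If this reports "none" even though the CPU supports SSE2 or AltiVec, the
option <strong>-msse2</strong> or <strong>-faltivec</strong> is probably
missing from the command line.
</p>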

<h4>1-4. Compile and check output automatically.</h4>
<p>
To build the test programs and check their output
automatically for all supported DSFMT_MEXP values of dSFMT, type
</p>
<blockquote>
<pre>make std-check</pre>
</blockquote>
<p>
To check the test programs optimized for SSE2, type
</p>
<blockquote>
<pre>make sse2-check</pre>
</blockquote>
<p>
To check the test programs optimized for OS X PowerPC AltiVec, type
</p>
<blockquote>
<pre>make osx-alti-check</pre>
</blockquote>
<p>
These commands may take some time.
</p>

<h3>2. Second Step: Use dSFMT pseudorandom number generator with
your C program.</h3>
<h4>2-1. Use sequential call and static link.</h4>
<p>
Here is a very simple program, <strong>sample1.c</strong>, which
calculates PI using the Monte Carlo method.
</p>
<blockquote>
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include "dSFMT.h"

int main(int argc, char* argv[]) {
    int i, cnt, seed;
    double x, y, pi;
    const int NUM = 10000;
    dsfmt_t dsfmt;

    /* use the first argument as the seed, or a default value */
    if (argc &gt;= 2) {
        seed = strtol(argv[1], NULL, 10);
    } else {
        seed = 12345;
    }
    cnt = 0;
    dsfmt_init_gen_rand(&amp;dsfmt, seed);
    /* count the random points of the unit square that fall
       inside the quarter circle of radius 1 */
    for (i = 0; i &lt; NUM; i++) {
        x = dsfmt_genrand_close_open(&amp;dsfmt);
        y = dsfmt_genrand_close_open(&amp;dsfmt);
        if (x * x + y * y &lt; 1.0) {
            cnt++;
        }
    }
    /* the fraction of points inside approximates PI / 4 */
    pi = (double)cnt / NUM * 4;
    printf("%f\n", pi);
    return 0;
}
</pre>
</blockquote>
<p>To compile <strong>sample1.c</strong> with dSFMT.c with a period of
about 2<sup>521</sup>, type</p>
<blockquote>
<pre>gcc -DDSFMT_MEXP=521 -o sample1 dSFMT.c sample1.c</pre>
</blockquote>
<p>If your CPU supports SSE2 and you want to use dSFMT optimized for
SSE2, type</p>
<blockquote>
<pre>gcc -msse2 -DDSFMT_MEXP=521 -DHAVE_SSE2 -o sample1 dSFMT.c sample1.c</pre>
</blockquote>
<p>If your computer is an Apple PowerPC G4 or G5 and you want to use
dSFMT optimized for AltiVec, type</p>
<blockquote>
<pre>gcc -faltivec -DDSFMT_MEXP=521 -DHAVE_ALTIVEC -o sample1 dSFMT.c sample1.c</pre>
</blockquote>
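<p>
sample1.c passes the result of <strong>strtol</strong> straight to the
generator, so a mistyped seed such as "abc" silently becomes 0. The
following is a minimal sketch of stricter parsing, assuming you would
rather fall back to a default seed on bad input.
<strong>parse_seed</strong> is a hypothetical helper, not part of dSFMT.
</p>

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal seed from text. Return fallback if text is empty,
   starts with '-', has trailing characters, or does not fit in 32 bits. */
uint32_t parse_seed(const char *text, uint32_t fallback) {
    char *end = NULL;
    unsigned long value;

    if (text == NULL || *text == '\0' || *text == '-') {
        return fallback;
    }
    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') {
        return fallback;    /* overflow, no digits, or trailing garbage */
    }
    if (value > 0xFFFFFFFFUL) {
        return fallback;    /* larger than a 32-bit seed */
    }
    return (uint32_t)value;
}
```

<p>
In sample1.c this could replace the <strong>strtol</strong> call, for
example as <strong>seed = parse_seed(argv[1], 12345)</strong>.
</p>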

<h4>2-2. Use block call and static link.</h4>
<p>
Here is <strong>sample2.c</strong>, a modified version of sample1.c.
The block call <strong>dsfmt_fill_array_close_open</strong> is
much faster than sequential calls, but it needs aligned
memory. The standard function for allocating aligned memory
is <strong>posix_memalign</strong>, but it isn't available on every
OS.
</p>
<blockquote>
<pre>
/* the feature test macro must be defined before any #include */
#define _XOPEN_SOURCE 600
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include "dSFMT.h"

int main(int argc, char* argv[]) {
    int i, j, cnt, seed;
    double x, y, pi;
    const int NUM = 10000;
    const int R_SIZE = 2 * NUM;
    int size;
    double *array;
    dsfmt_t dsfmt;

    if (argc &gt;= 2) {
        seed = strtol(argv[1], NULL, 10);
    } else {
        seed = 12345;
    }
    /* the array must hold at least the minimum block size */
    size = dsfmt_get_min_array_size();
    if (size &lt; R_SIZE) {
        size = R_SIZE;
    }
#if defined(__APPLE__) || \
    (defined(__FreeBSD__) &amp;&amp; __FreeBSD__ &gt;= 3 &amp;&amp; __FreeBSD__ &lt;= 6)
    printf("malloc used\n");
    array = malloc(sizeof(double) * size);
    if (array == NULL) {
        printf("can't allocate memory.\n");
        return 1;
    }
#elif defined(_POSIX_C_SOURCE)
    printf("posix_memalign used\n");
    if (posix_memalign((void **)&amp;array, 16, sizeof(double) * size) != 0) {
        printf("can't allocate memory.\n");
        return 1;
    }
#elif defined(__GNUC__) &amp;&amp; (__GNUC__ &gt; 3 || (__GNUC__ == 3 &amp;&amp; __GNUC_MINOR__ &gt;= 3))
    printf("memalign used\n");
    array = memalign(16, sizeof(double) * size);
    if (array == NULL) {
        printf("can't allocate memory.\n");
        return 1;
    }
#else /* in this case, gcc doesn't support SSE2 */
    array = malloc(sizeof(double) * size);
    if (array == NULL) {
        printf("can't allocate memory.\n");
        return 1;
    }
#endif
    cnt = 0;
    j = 0;
    dsfmt_init_gen_rand(&amp;dsfmt, seed);
    /* generate all random numbers with one block call */
    dsfmt_fill_array_close_open(&amp;dsfmt, array, size);
    for (i = 0; i &lt; NUM; i++) {
        x = array[j++];
        y = array[j++];
        if (x * x + y * y &lt; 1.0) {
            cnt++;
        }
    }
    free(array);
    pi = (double)cnt / NUM * 4;
    printf("%f\n", pi);
    return 0;
}
</pre>
</blockquote>
<p>To compile <strong>sample2.c</strong> with dSFMT.c with a period of
about 2<sup>2203</sup>, type</p>
<blockquote>
<pre>gcc -DDSFMT_MEXP=2203 -o sample2 dSFMT.c sample2.c</pre>
</blockquote>
<p>If your CPU supports SSE2 and you want to use dSFMT optimized for
SSE2, type</p>
<blockquote>
<pre>gcc -msse2 -DDSFMT_MEXP=2203 -DHAVE_SSE2 -o sample2 dSFMT.c sample2.c</pre>
</blockquote>
<p>If your computer is an Apple PowerPC G4 or G5 and you want to use
dSFMT optimized for AltiVec, type</p>
<blockquote>
<pre>gcc -faltivec -DDSFMT_MEXP=2203 -DHAVE_ALTIVEC -o sample2 dSFMT.c sample2.c</pre>
</blockquote>
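<p>
On systems where <strong>posix_memalign</strong> is available, the
allocation branch of sample2.c can be isolated in a small helper. The
following is a minimal sketch under that assumption;
<strong>alloc_aligned_doubles</strong> and <strong>is_aligned</strong>
are hypothetical names, not part of dSFMT. Passing a separate
<strong>void *</strong> variable avoids the
<strong>(void **)&amp;array</strong> cast, and the alignment check makes
the 16-byte requirement explicit.
</p>

```c
/* Request the POSIX declarations; this must come before every #include. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for count doubles on a 16-byte boundary.
   Returns NULL on failure; release the memory with free(). */
double *alloc_aligned_doubles(size_t count) {
    void *block = NULL;

    if (posix_memalign(&block, 16, sizeof(double) * count) != 0) {
        return NULL;
    }
    return (double *)block;
}

/* Return 1 if ptr is a multiple of alignment, otherwise 0. */
int is_aligned(const void *ptr, size_t alignment) {
    return ((uintptr_t)ptr % alignment) == 0;
}
```

<p>
In sample2.c the call would look like
<strong>array = alloc_aligned_doubles(size)</strong>, followed by the same
NULL check that the other branches use.
</p>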
<h4>2-3. Initialize dSFMT using dsfmt_init_by_array function.</h4>
<p>
Here is <strong>sample3.c</strong>, a modified version of sample1.c.
A 32-bit integer seed can produce only 2<sup>32</sup> different
initial states. To avoid this limitation, dSFMT
provides the <strong>dsfmt_init_by_array</strong> function, which
initializes the internal state array from an array of 32-bit
integers. The seed array may be larger than the internal state
array, and all of its elements are used for initialization, but
a very large array is wasteful.
</p>
<blockquote>
<pre>
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include "dSFMT.h"

int main(int argc, char* argv[]) {
    int i, cnt, seed_cnt;
    double x, y, pi;
    const int NUM = 10000;
    uint32_t seeds[100];
    dsfmt_t dsfmt;

    if (argc &gt;= 2) {
        /* use up to 100 characters of the argument, one per seed word */
        seed_cnt = 0;
        for (i = 0; (i &lt; 100) &amp;&amp; (i &lt; (int)strlen(argv[1])); i++) {
            seeds[i] = (unsigned char)argv[1][i];
            seed_cnt++;
        }
    } else {
        seeds[0] = 12345;
        seed_cnt = 1;
    }
    cnt = 0;
    dsfmt_init_by_array(&amp;dsfmt, seeds, seed_cnt);
    for (i = 0; i &lt; NUM; i++) {
        x = dsfmt_genrand_close_open(&amp;dsfmt);
        y = dsfmt_genrand_close_open(&amp;dsfmt);
        if (x * x + y * y &lt; 1.0) {
            cnt++;
        }
    }
    pi = (double)cnt / NUM * 4;
    printf("%f\n", pi);
    return 0;
}
</pre>
</blockquote>
<p>To compile <strong>sample3.c</strong>, type</p>
<blockquote>
<pre>gcc -DDSFMT_MEXP=1279 -o sample3 dSFMT.c sample3.c</pre>
</blockquote>
<p>Now the seed can be a string, like this:</p>
<blockquote>
<pre>./sample3 your-full-name</pre>
</blockquote>
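<p>
sample3.c stores one character in each 32-bit seed word, so most bits of
each word are zero for ordinary text. As an alternative sketch, the
string can be packed four bytes per word before it is handed to
<strong>dsfmt_init_by_array</strong>. <strong>pack_seed</strong> is a
hypothetical helper, not part of dSFMT, and the byte order (first
character in the low 8 bits) is an arbitrary choice made for this example.
</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the bytes of text into seeds, four bytes per 32-bit word, with the
   first byte of each group in the low 8 bits. At most max_words words are
   written. Returns the number of words written. */
size_t pack_seed(const char *text, uint32_t *seeds, size_t max_words) {
    size_t i = 0;
    size_t words = 0;

    while (text[i] != '\0' && i / 4 < max_words) {
        size_t w = i / 4;
        if (i % 4 == 0) {
            seeds[w] = 0;       /* start a new word */
            words = w + 1;
        }
        seeds[w] |= (uint32_t)(unsigned char)text[i] << (8 * (i % 4));
        i++;
    }
    return words;
}
```

<p>
The returned count would then be passed as the length argument, in place
of <strong>seed_cnt</strong> in sample3.c.
</p>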
</body>
</html>