<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html" />
<title>How to compile dSFMT</title>
<style type="text/css">
BLOCKQUOTE {background-color:#a0ffa0;
padding-left: 1em;}
</style>
</head>
<body>
<h2> How to compile dSFMT</h2>

<p>
This document explains how to compile dSFMT from a terminal on
UNIX-like systems (for example Linux, FreeBSD, Cygwin, and Mac OS X).
It does not cover IDEs (Integrated Development Environments); if you
use one, please see its help for how to enable the SIMD features of
your CPU.
</p>

<h3>1. First Step: Compile test programs using Makefile.</h3>
<h4>1-1. Compile standard C test program.</h4>
<p>
Check that dSFMT.c and Makefile are in your current directory.
If not, <strong>cd</strong> to the directory that contains them.
Then type
</p>
<blockquote>
<pre>make std</pre>
</blockquote>
<p>
If that causes an error, try
</p>
<blockquote>
<pre>cc -DDSFMT_MEXP=19937 -o test-std-M19937 dSFMT.c test.c</pre>
</blockquote>
<p>
or
</p>
<blockquote>
<pre>gcc -DDSFMT_MEXP=19937 -o test-std-M19937 dSFMT.c test.c</pre>
</blockquote>
<p>
If compilation succeeds, run the test program. Type
</p>
<blockquote>
<pre>./test-std-M19937 -v</pre>
</blockquote>
<p>
You will see many random numbers displayed on your screen.
To check that these numbers are the correct output,
redirect the output to a file and <strong>diff</strong> it with
<strong>dSFMT.19937.out.txt</strong>, like this:</p>
<blockquote>
<pre>./test-std-M19937 -v &gt; foo.txt
diff -w foo.txt dSFMT.19937.out.txt</pre>
</blockquote>
<p>
Silence means the two files are the same, because <strong>diff</strong>
reports only the differences between them.
</p>
<p>
To measure the generation speed of dSFMT, type
</p>
<blockquote>
<pre>./test-std-M19937 -s</pre>
</blockquote>
<p>
Without optimization the program is very slow. To make it fast,
compile it with the <strong>-O3</strong> option. If your compiler is
gcc, you should also specify the
<strong>-fno-strict-aliasing</strong> option together
with <strong>-O3</strong>. Type
</p>
<blockquote>
<pre>gcc -O3 -fno-strict-aliasing -DDSFMT_MEXP=19937 -o test-std-M19937 dSFMT.c test.c
./test-std-M19937 -s</pre>
</blockquote>
<p>
If you are using gcc 4.0, you can get more performance out of dSFMT
by adding the options
<strong>--param max-inline-insns-single=1800</strong>,
<strong>--param inline-unit-growth=500</strong> and
<strong>--param large-function-growth=900</strong>.
</p>

<h4>1-2. Compile SSE2 test program.</h4>
<p>
If your CPU supports SSE2 and you can use gcc version 3.4 or later,
you can build test-sse2-M19937. To do this, type
</p>
<blockquote>
<pre>make sse2</pre>
</blockquote>
<p>or type</p>
<blockquote>
<pre>gcc -O3 -msse2 -fno-strict-aliasing -DHAVE_SSE2=1 -DDSFMT_MEXP=19937 -o test-sse2-M19937 dSFMT.c test.c</pre>
</blockquote>
<p>If everything works well,</p>
<blockquote>
<pre>./test-sse2-M19937 -s</pre>
</blockquote>
<p>shows a much shorter time than <strong>test-std-M19937 -s</strong>.</p>

<h4>1-3. Compile AltiVec test program.</h4>
<p>
If you are using a Macintosh computer with a PowerPC G4 or G5, and
your gcc version is 3.3 or later, you can build test-alti-M19937. To
do this, type
</p>
<blockquote>
<pre>make osx-alti</pre>
</blockquote>
<p>or type</p>
<blockquote>
<pre>gcc -O3 -faltivec -fno-strict-aliasing -DHAVE_ALTIVEC=1 -DDSFMT_MEXP=19937 -o test-alti-M19937 dSFMT.c test.c</pre>
</blockquote>
<p>If everything works well,</p>
<blockquote>
<pre>./test-alti-M19937 -s</pre>
</blockquote>
<p>shows a much shorter time than <strong>test-std-M19937 -s</strong>.</p>
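<p>
If the SIMD build is not faster, it may help to confirm that the
compiler was actually targeting SSE2 or AltiVec. The following is a
minimal sketch, not part of dSFMT: it relies on the macros
<strong>__SSE2__</strong> and <strong>__ALTIVEC__</strong> that gcc
predefines when the corresponding instruction set is enabled, and the
helper names <strong>simd_mode</strong> and
<strong>format_simd_report</strong> are hypothetical.
</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of dSFMT): names the SIMD instruction
   set this translation unit was compiled for, judged only by the
   macros that gcc predefines. */
const char *simd_mode(void) {
#if defined(__SSE2__)
    return "sse2";
#elif defined(__ALTIVEC__)
    return "altivec";
#else
    return "none";
#endif
}

/* Hypothetical helper: writes a one-line report into buf and returns
   the value of snprintf. */
int format_simd_report(char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "SIMD target: %s", simd_mode());
}
```

<p>
Compile this with the same options as the test program and print the
report. Note that these macros only reflect compiler options; dSFMT
itself is switched by <strong>-DHAVE_SSE2</strong> or
<strong>-DHAVE_ALTIVEC</strong> as shown in the commands above.
</p>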

<h4>1-4. Compile and check output automatically.</h4>
<p>
To build the test programs and check their output
automatically for all DSFMT_MEXP values supported by dSFMT, type
</p>
<blockquote>
<pre>make std-check</pre>
</blockquote>
<p>
To check the test programs optimized for SSE2, type
</p>
<blockquote>
<pre>make sse2-check</pre>
</blockquote>
<p>
To check the test programs optimized for AltiVec on Mac OS X PowerPC, type
</p>
<blockquote>
<pre>make osx-alti-check</pre>
</blockquote>
<p>
These commands may take some time.
</p>

<h3>2. Second Step: Use dSFMT pseudorandom number generator with
your C program.</h3>
<h4>2-1. Use sequential call and static link.</h4>
<p>
Here is a very simple program, <strong>sample1.c</strong>, which
estimates PI using the Monte Carlo method.
</p>
<blockquote>
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include "dSFMT.h"

int main(int argc, char* argv[]) {
    int i, cnt, seed;
    double x, y, pi;
    const int NUM = 10000;
    dsfmt_t dsfmt;

    /* take the seed from the command line if one is given */
    if (argc &gt;= 2) {
        seed = strtol(argv[1], NULL, 10);
    } else {
        seed = 12345;
    }
    cnt = 0;
    dsfmt_init_gen_rand(&amp;dsfmt, seed);
    /* count the points that fall inside the quarter unit circle */
    for (i = 0; i &lt; NUM; i++) {
        x = dsfmt_genrand_close_open(&amp;dsfmt);
        y = dsfmt_genrand_close_open(&amp;dsfmt);
        if (x * x + y * y &lt; 1.0) {
            cnt++;
        }
    }
    pi = (double)cnt / NUM * 4;
    printf("%f\n", pi);
    return 0;
}
</pre>
</blockquote>
<p>To compile <strong>sample1.c</strong> with dSFMT.c so that the period
is a multiple of 2<sup>521</sup>-1, type</p>
<blockquote>
<pre>gcc -DDSFMT_MEXP=521 -o sample1 dSFMT.c sample1.c</pre>
</blockquote>
<p>If your CPU supports SSE2 and you want to use dSFMT optimized for
SSE2, type</p>
<blockquote>
<pre>gcc -msse2 -DDSFMT_MEXP=521 -DHAVE_SSE2 -o sample1 dSFMT.c sample1.c</pre>
</blockquote>
<p>If your computer is an Apple PowerPC G4 or G5 and you want to use
dSFMT optimized for AltiVec, type</p>
<blockquote>
<pre>gcc -faltivec -DDSFMT_MEXP=521 -DHAVE_ALTIVEC -o sample1 dSFMT.c sample1.c</pre>
</blockquote>
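<p>
sample1.c passes the command-line argument to <strong>strtol</strong>
without checking the result, so an argument such as "abc" silently
becomes seed 0. The following is a minimal sketch of a stricter
parser, assuming the seed is meant to be an unsigned 32-bit decimal
integer. The helper name <strong>parse_seed</strong> is hypothetical
and not part of dSFMT.
</p>

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of dSFMT): parses text as an unsigned
   decimal 32-bit seed. Returns 1 and stores the value in *seed on
   success. Returns 0 if text is empty, contains a minus sign, has
   trailing characters, or does not fit in 32 bits. */
int parse_seed(const char *text, uint32_t *seed) {
    char *end;
    unsigned long value;

    assert(text != NULL && seed != NULL);
    if (strchr(text, '-') != NULL) {
        return 0;               /* strtoul would silently negate */
    }
    errno = 0;
    value = strtoul(text, &end, 10);
    if (end == text || *end != '\0') {
        return 0;               /* no digits, or trailing garbage */
    }
    if (errno == ERANGE || value > 0xffffffffUL) {
        return 0;               /* out of range for 32 bits */
    }
    *seed = (uint32_t)value;
    return 1;
}
```

<p>
In sample1.c, one could call <strong>parse_seed(argv[1], &amp;s)</strong>
and fall back to the default seed 12345 when it returns 0.
</p>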

<h4>2-2. Use block call and static link.</h4>
<p>
Here is <strong>sample2.c</strong>, a modification of sample1.c.
The block call <strong>dsfmt_fill_array_close_open</strong> is
much faster than sequential calls, but it needs aligned
memory. The standard function for allocating aligned memory
is <strong>posix_memalign</strong>, but it is not available on every
OS.
</p>
<blockquote>
<pre>
/* feature test macros must be defined before any header is included */
#define _XOPEN_SOURCE 600
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include "dSFMT.h"

int main(int argc, char* argv[]) {
    int i, j, cnt, seed;
    double x, y, pi;
    const int NUM = 10000;
    const int R_SIZE = 2 * NUM;
    int size;
    double *array;
    dsfmt_t dsfmt;

    if (argc &gt;= 2) {
        seed = strtol(argv[1], NULL, 10);
    } else {
        seed = 12345;
    }
    /* the block call needs at least the minimum array size */
    size = dsfmt_get_min_array_size();
    if (size &lt; R_SIZE) {
        size = R_SIZE;
    }
#if defined(__APPLE__) || \
    (defined(__FreeBSD__) &amp;&amp; __FreeBSD__ &gt;= 3 &amp;&amp; __FreeBSD__ &lt;= 6)
    printf("malloc used\n");
    array = malloc(sizeof(double) * size);
    if (array == NULL) {
        printf("can't allocate memory.\n");
        return 1;
    }
#elif defined(_POSIX_C_SOURCE)
    printf("posix_memalign used\n");
    if (posix_memalign((void **)&amp;array, 16, sizeof(double) * size) != 0) {
        printf("can't allocate memory.\n");
        return 1;
    }
#elif defined(__GNUC__) &amp;&amp; (__GNUC__ &gt; 3 || (__GNUC__ == 3 &amp;&amp; __GNUC_MINOR__ &gt;= 3))
    printf("memalign used\n");
    array = memalign(16, sizeof(double) * size);
    if (array == NULL) {
        printf("can't allocate memory.\n");
        return 1;
    }
#else /* in this case, gcc doesn't support SSE2 */
    array = malloc(sizeof(double) * size);
    if (array == NULL) {
        printf("can't allocate memory.\n");
        return 1;
    }
#endif
    cnt = 0;
    j = 0;
    dsfmt_init_gen_rand(&amp;dsfmt, seed);
    /* generate all the random numbers with one block call */
    dsfmt_fill_array_close_open(&amp;dsfmt, array, size);
    for (i = 0; i &lt; NUM; i++) {
        x = array[j++];
        y = array[j++];
        if (x * x + y * y &lt; 1.0) {
            cnt++;
        }
    }
    free(array);
    pi = (double)cnt / NUM * 4;
    printf("%f\n", pi);
    return 0;
}
</pre>
</blockquote>
<p>To compile <strong>sample2.c</strong> with dSFMT.c so that the period
is a multiple of 2<sup>2203</sup>-1, type</p>
<blockquote>
<pre>gcc -DDSFMT_MEXP=2203 -o sample2 dSFMT.c sample2.c</pre>
</blockquote>
<p>If your CPU supports SSE2 and you want to use dSFMT optimized for
SSE2, type</p>
<blockquote>
<pre>gcc -msse2 -DDSFMT_MEXP=2203 -DHAVE_SSE2 -o sample2 dSFMT.c sample2.c</pre>
</blockquote>
<p>If your computer is an Apple PowerPC G4 or G5 and you want to use
dSFMT optimized for AltiVec, type</p>
<blockquote>
<pre>gcc -faltivec -DDSFMT_MEXP=2203 -DHAVE_ALTIVEC -o sample2 dSFMT.c sample2.c</pre>
</blockquote>
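<p>
The aligned allocation in sample2.c can be isolated into a small
helper. The following is a minimal sketch for systems that provide
<strong>posix_memalign</strong>; the helper names
<strong>alloc_aligned_doubles</strong> and
<strong>is_aligned16</strong> are hypothetical and not part of dSFMT.
It passes a <strong>void *</strong> variable to posix_memalign instead
of casting the address of a <strong>double *</strong>.
</p>

```c
#define _POSIX_C_SOURCE 200112L /* must come before every #include */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of dSFMT): allocates count doubles on
   a 16-byte boundary. Returns NULL on failure. Release the result
   with free(). */
double *alloc_aligned_doubles(size_t count) {
    void *p = NULL;

    assert(count > 0);
    if (posix_memalign(&p, 16, sizeof(double) * count) != 0) {
        return NULL;
    }
    return (double *)p;
}

/* Returns 1 if ptr lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *ptr) {
    return ((uintptr_t)ptr % 16) == 0;
}
```

<p>
The returned pointer could then be passed to
<strong>dsfmt_fill_array_close_open</strong> in place of the array
allocated in sample2.c. On systems without posix_memalign, the other
branches of sample2.c still apply.
</p>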
<h4>2-3. Initialize dSFMT using dsfmt_init_by_array function.</h4>
<p>
Here is <strong>sample3.c</strong>, a modification of sample1.c.
A 32-bit integer seed can produce only 2<sup>32</sup> distinct
initial states. To avoid this limitation, dSFMT
provides the <strong>dsfmt_init_by_array</strong> function, which
initializes the internal state array from an array of 32-bit
integers. This sample uses it. The seed array may be
larger than the internal state array, and all of its elements
are used for initialization, but a very large array is
wasteful.
</p>
<blockquote>
<pre>
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include "dSFMT.h"

int main(int argc, char* argv[]) {
    int i, cnt, seed_cnt;
    double x, y, pi;
    const int NUM = 10000;
    uint32_t seeds[100];
    dsfmt_t dsfmt;

    if (argc &gt;= 2) {
        /* use up to 100 characters of the argument, one per seed word */
        seed_cnt = 0;
        for (i = 0; (i &lt; 100) &amp;&amp; ((size_t)i &lt; strlen(argv[1])); i++) {
            seeds[i] = (unsigned char)argv[1][i];
            seed_cnt++;
        }
    } else {
        seeds[0] = 12345;
        seed_cnt = 1;
    }
    cnt = 0;
    dsfmt_init_by_array(&amp;dsfmt, seeds, seed_cnt);
    for (i = 0; i &lt; NUM; i++) {
        x = dsfmt_genrand_close_open(&amp;dsfmt);
        y = dsfmt_genrand_close_open(&amp;dsfmt);
        if (x * x + y * y &lt; 1.0) {
            cnt++;
        }
    }
    pi = (double)cnt / NUM * 4;
    printf("%f\n", pi);
    return 0;
}
</pre>
</blockquote>
| 351 |
<p>To compile <strong>sample3.c</strong>, type</p> |
| 352 |
<blockquote> |
| 353 |
<pre>gcc -DDSFMT_MEXP=1279 -o sample3 dSFMT.c sample3.c</pre> |
| 354 |
</blockquote> |
| 355 |
<p>Now, seed can be a string. Like this:</p> |
| 356 |
<blockquote> |
| 357 |
<pre>./sample3 your-full-name</pre> |
| 358 |
</blockquote> |
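<p>
sample3.c stores one character in each 32-bit seed word, so at most
100 characters of the string are used. As an alternative, the
following minimal sketch packs four bytes into each word, which lets
the same 100-word array hold up to 400 characters. The helper name
<strong>pack_seed_string</strong> is hypothetical and not part of
dSFMT, and the resulting seeds differ from those sample3.c produces
for the same string.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of dSFMT): packs the bytes of text
   into seeds, four bytes per 32-bit word, lowest byte first.
   Returns the number of words written, at most max_words. */
int pack_seed_string(const char *text, uint32_t *seeds, int max_words) {
    size_t len, i;
    int words = 0;

    assert(text != NULL && seeds != NULL && max_words > 0);
    len = strlen(text);
    for (i = 0; i < len && (int)(i / 4) < max_words; i++) {
        if (i % 4 == 0) {
            seeds[i / 4] = 0;   /* start a new word */
            words++;
        }
        seeds[i / 4] |= (uint32_t)(unsigned char)text[i] << (8 * (i % 4));
    }
    return words;
}
```

<p>
The return value can be passed as the length argument
of <strong>dsfmt_init_by_array</strong>. For an empty string it is 0,
in which case a default seed such as the 12345 used in sample3.c is
the safer choice.
</p>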
</body>
</html>