-
Notifications
You must be signed in to change notification settings - Fork 12
/
example.html
305 lines (284 loc) · 14.5 KB
/
example.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="icon" type="image/png" href="favicon.png" />
<link rel="stylesheet" href="css/style.css" type="text/css" media="all" />
<!-- WIN IE Style Sheets -->
<!--[if IE]>
<![if gte IE 5.5]>
<![if gte IE 7]><link rel="stylesheet"
type="text/css" media="screen,projection"
href="css/ie.css" />
<![endif]>
<![if lt IE 7]><link rel="stylesheet"
type="text/css" media="screen,projection"
href="css/ie.css" />
<![endif]>
<![endif]>
<![if lt IE 5.5]>
<link rel="stylesheet"
type="text/css" media="screen,projection"
href="css/ie.css" />
<![endif]>
<![endif]-->
<title>Intel® Implicit SPMD Program Compiler</title>
</head>
<body>
<div id="wrap">
<div id="wrap2">
<div id="header">
<h1 id="logo">Intel® Implicit SPMD Program Compiler</h1>
<div id="slogan">An open-source compiler for high-performance SIMD programming on
the CPU and GPU</div>
</div>
<div id="nav">
<div id="nbar">
<ul>
<li><a href="index.html">Overview</a></li>
<li><a href="features.html">Features</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li id="selected"><a href="documentation.html">Documentation</a></li>
<li><a href="perf.html">Performance</a></li>
</ul>
</div>
</div>
<div id="content-wrap">
<div id="sidebar">
<div class="widgetspace">
<h1>Resources</h1>
<ul class="menu">
<li><a href="http://github.com/ispc/ispc">GitHub page</a></li>
<li><a href="https://github.com/ispc/ispc/discussions">Discussions on GitHub</a></li>
<li><a href="http://github.com/ispc/ispc/issues">Issues on Github</a></li>
<li><a href="https://github.com/orgs/ispc/projects/1">Release planning board</a></li>
<li><a href="https://github.com/ispc/ispc/blob/main/CONTRIBUTING.md">Contributing guide</a></li>
<li><a href="http://github.com/ispc/ispc/wiki">Wiki on Github</a></li>
</ul>
</div>
</div>
<div id="content">
<h1>A Simple ispc Example</h1>
<p>Here is a walkthrough of a simple example of using <tt>ispc</tt> to
compute an image of the Mandelbrot set. The full source code for this
example is in the <tt>examples/mandelbrot</tt> directory of
the <tt>ispc</tt> distribution; below is a lightly modified version of
that example. (For example, we have elided some of the details related
to benchmarking and writing the final output image that are in the
actual <tt>mandelbrot.cpp</tt> file there.)
</p>
<p>First we have some code in the main program
file, <tt>mandelbrot.cpp</tt>, which is implemented in C++. We mostly
have a regular C++ source file that sets a number of variables that drive
the overall computation and allocates memory to store the result of the
Mandelbrot set computation. The call to <tt>mandelbrot_ispc</tt> is a
regular subroutine call, though in this case it's calling out to code
that was written in <tt>ispc</tt> rather than in C/C++. Because
the <tt>ispc</tt> compiler generates regular object files using standard
C/C++ calling conventions, this kind of close interoperation between the
two languages is straightforward.
</p>
<p><code>
int main() {<br />
unsigned int width = 768, height = 512;<br />
float x0 = -2., x1 = 1.;<br />
float y0 = -1., y1 = 1.;<br />
int maxIterations = 256;<br />
int *buf = new int[width*height];<br />
<br />
mandelbrot_ispc(x0, y0, x1, y1, width, height, maxIterations, buf);<br />
<br />
// write output...<br />
}<br />
</code></p>
<p>
Here is the start of the definition of the <tt>mandelbrot_ispc()</tt>
function that the <tt>ispc</tt> compiler compiles. (This function is
defined in the <tt>mandelbrot.ispc</tt> file.) There are a four main
things to notice here.
</p>
<ol>
<li>The <tt>export</tt> qualifier indicates that this function should be
made available to be called by C/C++ code, not just from
other <tt>ispc</tt> functions.</li>
<li>This function directly takes parameters passed by the C/C++ code as
regular function parameters. No API calls or a runtime library are
necessary to set parameter values to pass to this function.</li>
<li>The parameters are all tagged with the <tt>uniform</tt>
qualifier. This qualifier indicates that a variable has the same
value for all of the concurrently-executing program instances; it is
discussed in more detail in the <a href="ispc.html">ispc User's
Manual</a>. Any values passed from the application to the <tt>ispc</tt>
program must be <tt>uniform</tt>.</li>
<li>The pointer to the output buffer is passed as an unsized array. Just
like in C, arrays in function calls are converted to pointers to the
first element of the array. (This parameter could also be declared to
just be a pointer.) Like the other function parameters, this buffer is
passed from the application to the <tt>ispc</tt> code without any copying
or reformatting, as would be required with a driver model based
implementation.</li>
</ol>
<p><code>
export void mandelbrot_ispc(uniform float x0, uniform float y0, <br />
uniform float x1, uniform float y1,<br />
uniform int width, uniform int height, <br />
uniform int maxIterations,<br />
uniform int output[]) {<br />
</code></p>
The function implementation starts by computing the step between pixels in
the x and y directions.
<p><code>
float dx = (x1 - x0) / width;<br />
float dy = (y1 - y0) / height;<br />
</code></p>
<p>
Next, the function loops over the pixels of the image with a doubly-nested
loop. The outer loop is a regular loop over scanlines, while the inner
loop introduces a new looping construct provided
by <tt>ispc</tt>, <tt>foreach</tt>. In general, <tt>foreach</tt> specifies
parallel iteration over an n-dimensional range of integer values. In this
case, it specifies parallel iteration over a 1-D range of pixels in a
scanline. Each time through the loop, the iteration variable <tt>i</tt>
takes on values corresponding to a small range of the domain.
</p>
<p>For example, if <tt>ispc</tt> was generating code to run in 8-wide
parallel fashion for an 8-wide AVX SIMD unit, <tt>i</tt> would have the
values (0,1,2,3,4,5,6,7) across the group of executing program instances
the first time the loop executed, (8,9,10,11,12,13,14,15) the second
time the loop executed, and so forth. See the other examples in
the <tt>ispc</tt> distribution for other examples of using <tt>foreach</tt>
to perform this mapping for other computations.
</p>
<p><code>
for (uniform int j = 0; j < height; j++) {<br />
foreach (i = 0 ... width) {<br />
</code></p>
<p>
Inside these loops, we first find the floating-point <tt>x</tt>
and <tt>y</tt> values that correspond to the pixels at which to compute the
Mandelbrot iteration counts; note that these are also straightforward
calculations. (Recall how the <tt>i</tt> variable is initialized by
the <tt>foreach</tt> loop to see how values derived from <tt>i</tt> have
different values across program instances, in this case corresponding to a
number of <tt>x</tt> positions at which to perform the Mandelbrot set test.
</p>
<p><code>
float x = x0 + i * dx;<br />
float y = y0 + j * dy;<br />
</code></p>
<p>
Once it has figured out the coordinates at which to compute the number of
iterations, the program finds the appropriate index into the output array
for them and calls a subroutine to do the actual computation.
</p>
<p><code>
int index = j * width + i;<br />
output[index] = mandel(x, y, maxIterations);<br />
}<br />
}<br />
}<br />
</code></p>
<p>
Here is the implementation of the <tt>mandel()</tt> routine. It takes real
and imaginary coordinates and a maximum iteration count and applies the
iterative computation that defines the Mandelbrot set. Although this
appears to be a serial function, only computing one result for one pair of
input coordinates, recall that <tt>ispc</tt>'s execution model is SPMD; the
program compiles down to run in parallel across the elements of the SIMD
unit, for example computing four results from four input coordinates in
parallel on a 4-wide SSE unit.
</p>
<p><code>
static inline int mandel(float c_re, float c_im, int count) {<br />
float z_re = c_re, z_im = c_im;<br />
int i;<br />
for (i = 0; i < count; ++i) {<br />
if (z_re * z_re + z_im * z_im > 4.)<br />
break;<br />
float new_re = z_re*z_re - z_im*z_im;<br />
float new_im = 2.f * z_re * z_im;<br />
z_re = c_re + new_re;<br />
z_im = c_im + new_im;<br />
}<br />
return i;<br />
}<br />
</code></p>
<p>To compile this example, install the <tt>ispc</tt> compiler in a
directory that is in your <tt>PATH</tt>, and run <tt>make</tt> on Mac OS
X or Linux, or build using the <tt>mandelbrot.vcxproj</tt> build file
with MSVC 2010. Note that the <tt>ispc</tt> compiler generates regular
object files that you can just link with object files compiled with C/C++
compilers. (Here is the <a href="mandelbrot.txt">assembly language
output</a> from compiling the <tt>mandelbrot.ispc</tt> source file for
the AVX target with the <tt>--emit-asm</tt> command-line option.)</p>
<p><code>
% make<br />
ispc -O2 --target=avx mandelbrot.ispc -o objs/mandelbrot_ispc.o -h
objs/mandelbrot_ispc.h<br />
g++ mandelbrot.cpp -Iobjs/ -O3 -Wall -c -o objs/mandelbrot.o<br />
g++ mandelbrot_serial.cpp -Iobjs/ -O3 -Wall -c -o
objs/mandelbrot_serial.o<br />
g++ -Iobjs/ -O3 -Wall -o mandelbrot objs/mandelbrot.o objs/mandelbrot_ispc.o
objs/mandelbrot_serial.o -lm<br />
%<br />
</code></p>
<p>
After the executable is built, running it benchmarks a serial
implementation of computing the Mandelbrot set and the <tt>ispc</tt>
version. (You may want to compare the files <tt>mandelbrot.ispc</tt>
and <tt>mandelbrot_serial.cpp</tt> to see the few syntactic differences
between a C++ implementation of this computation and the <tt>ispc</tt>
implementation.) On an Second Generation Intel® Core i7 CPU, doing
this computation in parallel across the SIMD lanes of a single core gives a
substantial speedup:
</p>
<p><code>
% ./mandelbrot<br />
[mandelbrot ispc]: [58.570] million cycles<br />
[mandelbrot serial]: [335.005] millon cycles<br />
(5.72x speedup from ISPC)<br />
%<br />
</code></p>
<p>
Here is a downscaled version of the image that the program generates:
</p>
<center>
<img src="mandelbrot.png" width="384" height="256" alt="Mandelbrot set image"/>
</center>
<p>Note that all of the exposition up until now has been related to
harvesting the available parallelism from the SIMD unit in a single CPU
core. To fully use the system's computational resources also requires
parallelism across the CPU cores—modern CPUs may from two to ten or
more processing cores, each of which has a SIMD vector unit.
</p>
<p> Thanks to the fact that <tt>ispc</tt> cleanly inter-operates with
application C/C++ code through a regular function call, one option would
be to rewrite the application code to be multi-threaded (for example,
using the task scheduler provided by
the <a href="http://threadingbuildingblocks.org/">Intel® Thread
Building Blocks</a> or
the <a href="http://msdn.microsoft.com/en-us/library/dd504870.aspx">Microsoft
Concurrency Runtime</a>) and to then have parallel tasks that were in turn
partially or fully implemented in <tt>ispc</tt>.
</p>
<p>The second option for parallelizing across cores is to use the built-in
functionality for launching concurrent tasks that <tt>ispc</tt> offers.
The <tt>ispc</tt> language provides constructs for launching asynchronous
tasks that can execute concurrently with the rest of the program; see the
directory <tt>examples/mandelbrot_tasks</tt> in the <tt>ispc</tt>
distribution to see how to parallelize across both cores and the SIMD
units.
</p>
<br />
</div>
</div>
<div class="clearfix"></div>
<div id="footer"> © <strong>Intel Corporation</strong> | Valid <a href="http://validator.w3.org/check?uri=referer">XHTML</a> | <a href="http://jigsaw.w3.org/css-validator/check/referer">CSS</a> | ClearBlue by: <a href="http://www.themebin.com/">ThemeBin</a>
<!-- Please Do Not remove this link, thank u -->
</div>
</div>
<!-- End Wrap2 -->
</div>
<!-- End Wrap -->
</body>
</html>