Overview
This directory contains a simple example that sums values in a tree.
The example exhibits some speedup, but not a lot, because it quickly saturates
the system bus on a multiprocessor. For good speedup, there need to be
more computation cycles per memory reference. The point of the example
is to teach how to use the raw task interface, so the computation is
deliberately trivial.
The performance of this example is better when objects are allocated
by the scalable_allocator instead of
the default "operator new".  The reason is that the scalable_allocator typically
packs small objects more tightly than the default "operator new", resulting in
a smaller memory footprint, and thus more efficient use of cache and virtual memory.
In addition, the scalable_allocator performs better for multi-threaded allocations.
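The allocator choice described above can be sketched as a template parameter. This is a minimal illustration, not the code in this directory: TreeNode, make_nodes, and free_nodes are hypothetical names, and the sketch defaults to std::allocator so that it builds without TBB.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node type for illustration; the example's real
// declarations live in common.h.
struct TreeNode {
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    long value = 0;
};

// Allocates and constructs `count` nodes one at a time through Alloc,
// imitating the many small allocations made while building a tree.
// Assumes Alloc uses plain pointers (true for std::allocator).
template <typename Alloc = std::allocator<TreeNode>>
std::vector<TreeNode*> make_nodes(std::size_t count, Alloc alloc = Alloc()) {
    using Traits = std::allocator_traits<Alloc>;
    std::vector<TreeNode*> nodes;
    nodes.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        TreeNode* p = Traits::allocate(alloc, 1);
        Traits::construct(alloc, p);  // value-initializes the node
        p->value = static_cast<long>(i);
        nodes.push_back(p);
    }
    return nodes;
}

// Destroys and releases nodes obtained from make_nodes with the same Alloc.
template <typename Alloc = std::allocator<TreeNode>>
void free_nodes(std::vector<TreeNode*>& nodes, Alloc alloc = Alloc()) {
    using Traits = std::allocator_traits<Alloc>;
    for (TreeNode* p : nodes) {
        Traits::destroy(alloc, p);
        Traits::deallocate(alloc, p, 1);
    }
    nodes.clear();
}
```

Assuming TBB is installed and the program is linked against the tbbmalloc library, including "tbb/scalable_allocator.h" and calling make_nodes<tbb::scalable_allocator<TreeNode>>(n) would route the same allocations through the scalable_allocator.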
Files
- SerialSumTree.cpp
- Sums sequentially.
- SimpleParallelSumTree.cpp
- Sums in parallel without any fancy tricks.
- OptimizedParallelSumTree.cpp
- Sums in parallel, using "recycling" and "continuation-passing" tricks. 
    In this case, it is only slightly faster than the simple version.
- common.h
- Shared declarations.
- main.cpp
- Main program that parses command-line options and runs the algorithm.
- Makefile
- Makefile for building the example.
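The difference between the sequential and the simple parallel sum can be sketched as follows. This is a rough illustration only: it uses std::async rather than the raw task interface that the example files teach, and Node, build_tree, serial_sum, and parallel_sum are hypothetical names, not the declarations in common.h.

```cpp
#include <future>
#include <memory>

// Hypothetical node type for illustration.
struct Node {
    long value = 0;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

// Builds a roughly balanced tree with `count` nodes, each holding 1,
// so the expected sum equals the node count.
std::unique_ptr<Node> build_tree(long count) {
    if (count <= 0) return nullptr;
    auto root = std::make_unique<Node>();
    root->value = 1;
    long rest = count - 1;
    root->left = build_tree(rest / 2);
    root->right = build_tree(rest - rest / 2);
    return root;
}

// Sequential sum: visit every node on the calling thread.
long serial_sum(const Node* node) {
    if (!node) return 0;
    return node->value + serial_sum(node->left.get()) +
           serial_sum(node->right.get());
}

// Parallel sum: the left subtree is summed on another thread while the
// caller sums the right subtree. Below `depth` levels the work falls
// back to serial_sum so the number of threads stays bounded.
long parallel_sum(const Node* node, int depth) {
    if (!node) return 0;
    if (depth <= 0) return serial_sum(node);
    auto left = std::async(std::launch::async, parallel_sum,
                           node->left.get(), depth - 1);
    long right = parallel_sum(node->right.get(), depth - 1);
    return node->value + left.get() + right;
}
```

Each node contributes a single addition per memory reference, which is why a computation of this shape tends to be limited by memory bandwidth rather than by the number of threads. On some toolchains the sketch needs a threading flag such as -pthread.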
Directories
- msvs
- Contains Microsoft* Visual Studio* 2005 workspace for building and running the 
    example (Windows* systems only).
- xcode
- Contains Xcode* IDE workspace for building and running the example (OS X*
    systems only).
To Build
General build directions can be found here.
Usage
- tree_sum -h
- Prints the help for command-line options.
- tree_sum [n-of-threads=value] [number-of-nodes=value] [silent] [stdmalloc]
- tree_sum [n-of-threads [number-of-nodes]] [silent] [stdmalloc] 
- n-of-threads is the number of threads to use; a range of the form low[:high], where low and optional high are non-negative integers or 'auto' for the default.
- number-of-nodes is the number of nodes in the tree.
- silent suppresses all output except the elapsed time.
- stdmalloc causes the default "operator new" to be used for memory allocations instead of the scalable_allocator.
- To run a short version of this example, e.g., for use with Intel® Parallel Inspector:
- Build a debug version of the example (see the build directions).
- Run it with a small problem size and the desired number of threads, e.g., tree_sum 4 100000.
Copyright © 2005-2013 Intel Corporation.  All Rights Reserved.
Intel is a registered trademark or trademark of Intel Corporation
or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.