Group Title: Performance analysis tools for partitioned global-address-space programming models
Title: Talk abstract
 Material Information
Title: Talk abstract
Physical Description: Book
Language: English
Creator: Leko, Adam
Publisher: Leko et al.
Place of Publication: Gainesville, Fla.
Publication Date: 2006
Copyright Date: 2006
 Record Information
Bibliographic ID: UF00094706
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.


Performance Analysis Tools for Partitioned Global-Address-Space Programming Models

Adam Leko1, Hung-Hsun Su1, Dan Bonachea2, Max Billingsley III1,
Hans Sherburne1, Bryan Golden1, and Alan D. George1

1Department of Electrical and Computer Engineering
University of Florida
{leko, su, billingsley, sherburne, golden,

2Department of Computer Science
University of California at Berkeley

Extended Abstract (Oral presentation with demonstration)

The Partitioned Global-Address-Space (PGAS) programming model provides important productivity advantages over traditional
parallel programming models. However, because PGAS implementations are complex, languages and libraries using PGAS models
currently have little to no support from existing performance tools. We have designed the Global Address Space Performance (GASP)
tool interface, which is flexible enough to be used with any PGAS model while allowing developers of existing performance tools
to reuse their tools' infrastructure to quickly add support for programming languages and libraries using PGAS models.
Additionally, we have developed Parallel Performance Wizard (PPW), a performance tool designed for PGAS models, which
provides a strong proof of concept for the GASP interface.


The GASP interface was born out of the need to support several different PGAS models in our PPW tool. As we studied different
PGAS implementations, we soon realized that supporting each of them would require substantial development effort, because the
flexibility of PGAS models allows for many implementation techniques. This development effort is prohibitive for performance
tool developers and inhibits wide-scale support for PGAS languages in performance tools.

To encourage performance tool support for PGAS models, we created a simple, model-independent, portable, and flexible performance
tool interface based on callbacks that is especially useful for capturing performance data from programs using PGAS models. In our
talk, we will give a high-level overview of the GASP tool interface, showing how the interface has been successfully implemented in
Berkeley UPC and how it can be easily applied to upcoming high-productivity programming languages such as X10,
Fortress, and Chapel. The presentation will include empirical results demonstrating the instrumentation overheads of GASP for
several UPC applications, and the scalability of the approach. A major goal of this presentation is to raise awareness of the GASP
interface so that it gains momentum among both performance tool developers and PGAS model implementers.
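
The callback idea can be illustrated with a small sketch. This is not the GASP API: the class names, event kinds, and argument list below are hypothetical and exist only to show how a runtime might report start and end events to whatever tool has registered a callback, with the source location taken from the example in Figure 1.

```python
import time
from collections import defaultdict

# Hypothetical event kinds; a real interface would define its own set.
START, END = "start", "end"


class ToolInterface:
    """Model-side half: the runtime reports events to a registered callback."""

    def __init__(self):
        self._callback = None  # with no tool attached, events are simply dropped

    def register(self, callback):
        self._callback = callback

    def notify(self, kind, event, thread, filename, line):
        if self._callback is not None:
            self._callback(kind, event, thread, filename, line,
                           time.perf_counter())


class TimingTool:
    """Tool-side half: pairs start/end events and accumulates elapsed time."""

    def __init__(self):
        self._open = {}                    # (thread, event, site) -> start time
        self.elapsed = defaultdict(float)  # (event, site) -> seconds
        self.calls = defaultdict(int)      # (event, site) -> completed calls

    def __call__(self, kind, event, thread, filename, line, now):
        site = f"{filename}:{line}"
        if kind == START:
            self._open[(thread, event, site)] = now
        elif kind == END:
            began = self._open.pop((thread, event, site))
            self.elapsed[(event, site)] += now - began
            self.calls[(event, site)] += 1


def instrumented_barrier(iface, thread):
    """Stand-in for a runtime operation wrapped with notifications."""
    iface.notify(START, "barrier", thread, "ft.c", 1943)
    # ... the real operation would run here ...
    iface.notify(END, "barrier", thread, "ft.c", 1943)


iface = ToolInterface()
tool = TimingTool()
iface.register(tool)
for t in range(4):  # simulate four threads reaching the same barrier
    instrumented_barrier(iface, t)
```

In this sketch the model implementation only needs to call `notify` at the right places, and the tool never has to know how the model is implemented, which is the separation of concerns the paragraph above describes.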


PPW is a performance tool specifically geared towards maximizing user productivity for tuning programs using PGAS models. The
tool features easy-to-use compiler wrapper scripts that control the instrumentation process, and support for profile and trace data.
Additionally, PPW provides a cross-platform user interface with PGAS-specific visualizations, including an array layout visualization
feature for UPC programs, a communication visualization that gives a detailed breakdown of inter-thread communication (Figure 2),
and profile data viewers that give a high-level overview of program performance (Figure 1). The tool also includes a SLOG-2 trace
export for use with the scalable Jumpshot trace viewer (Figure 3). Finally, PPW integrates support for hardware counters through the
PAPI library so users may perform detailed analysis of architectural behavior in computational regions of their applications.
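
As a rough illustration of the kind of breakdown a communication visualization might be built on, the sketch below aggregates put and get records into a thread-by-thread matrix of operation counts and average payload sizes in bytes per operation. The record layout `(initiator, affinity, op, nbytes)` and the function name are assumptions made for this example, not PPW's data format.

```python
def communication_matrix(events, num_threads):
    """Return (ops, avg_bytes) indexed as [initiating thread][data-affinity thread]."""
    ops = [[0] * num_threads for _ in range(num_threads)]
    total = [[0] * num_threads for _ in range(num_threads)]
    for initiator, affinity, op, nbytes in events:
        if op not in ("put", "get"):
            continue  # only one-sided transfers are counted in this sketch
        ops[initiator][affinity] += 1
        total[initiator][affinity] += nbytes
    avg_bytes = [
        [total[i][j] / ops[i][j] if ops[i][j] else 0.0
         for j in range(num_threads)]
        for i in range(num_threads)
    ]
    return ops, avg_bytes


# Hypothetical trace records: (initiator, affinity, operation, payload bytes)
events = [
    (0, 1, "get", 4096),
    (0, 1, "get", 8192),
    (1, 0, "put", 1024),
    (1, 1, "get", 512),
]
ops, avg_bytes = communication_matrix(events, 2)
```

Each cell of `avg_bytes` could then be mapped to a color or bar height, one cell per pair of threads.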

As part of our talk, we will give a brief demonstration of the PPW tool, showing how the tool uses the GASP interface and how the
visualizations aid users in pinpointing bottlenecks in their applications.

Project References

PPW: ppw includes alpha release of PPW tool
GASP: includes draft GASP specification
Berkeley UPC:

[Screenshot: PPW "Tree Table" tab. Metric: Time; Thread: All Threads]

Name           Total        Self         Min         Max         Count   Sub. Count
Application    46,285,153   23,506,038   2,892,224   2,893,122   16      4664
upc_notify     2,124,062    2,124,062    1           49,577      736     0
ft.c:491       1,412        1,412        13          104         16      0
ft.c:623       293          293          11          27          16      0
ft.c:1943      121,198      121,198      11          2,988       128     0
ft.c:330       5,510        5,510        26          453         16      0
ft.c:339       3,051        3,061        17          344         16      0
ft.c:364       137,362      137,362      9           1,956       96
ft.c:962       1,587        1,587        2           147         96      0
ft.c:971       18,969       18,969       1           556         96      0
ft.c:989       14,907       14,907       4           176         96      0
ft.c:387       12,104       12,104       4           810         16      0

1938    int i;
1939    long unsigned int chunk = NTDIVNP / THREADS;
1941    timer_start( T_ALLTOALL );
1943    upc_barrier;
1944    /* XXX Fortran version uses an MPI_alltoall() here */
1946    for ( i = 0; i < THREADS; i++ )
1947    {
1948        upc_memget( (dcomplex *)&dst[MYTHREAD].cell[chunk*i],
1949                    &src[i].cell[chunk*MYTHREAD],
1950                    sizeof( dcomplex ) * chunk );
1951    }
1953    upc_barrier;
1955    timer_stop( T_ALLTOALL );

Figure 1: Calltree profile viewer. The display shows profile data
for a 16-node run of the NAS FT benchmark alongside the
original source code.
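
The profile viewer in Figure 1 reports both a total and a self value for each entry. A common profiler convention, assumed here rather than taken from the PPW documentation, is that self time equals a node's total (inclusive) time minus the total time of its direct children. The sketch below applies that convention to a toy call tree; the node names and numbers are invented for illustration and are not taken from the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    total: int                      # inclusive time, arbitrary units
    children: list = field(default_factory=list)

    @property
    def self_time(self):
        # Time spent in this node that is not attributed to any child.
        return self.total - sum(child.total for child in self.children)


def flatten(node, depth=0):
    """Yield (depth, name, total, self) rows in tree-table order."""
    yield depth, node.name, node.total, node.self_time
    for child in node.children:
        yield from flatten(child, depth + 1)


root = Node("Application", 1000, [
    Node("region_a", 300),
    Node("region_b", 450, [Node("region_c", 50)]),
])
rows = list(flatten(root))
```

A tree-table display can then indent each row by its depth, so a large gap between total and self points the user toward the children worth expanding.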

[Screenshot: PPW "Communication" tab. Metric: Avg Payload Size; Operation: Puts + Gets; Payload Size: All Payloads; axis: Data Affinity Thread, 0 to 15; unit: bytes/op]

Figure 2: Communication visualization. This viewer allows one
to view payload size statistics for gets and puts broken down by
payload size.
[Screenshot: Jumpshot timeline view of the SLOG-2 trace. Horizontal axis: Time (seconds), visible range roughly 0.46 to 0.60. The pop-up for the selected event reads: duration = 1.174 msec; [0]: time = 0.571450, LineID = 0; [1]: time = 0.572624, LineID = 0; Source code: ft.c:1950]


Figure 3: SLOG-2 export viewed in the Jumpshot viewer. Right-clicking on events in the timeline
brings up detailed information about that event, including source code information. This type of
viewer is very useful for visually detecting load imbalances, such as the imbalance of time spent in
upc_notify in thread 1 in this example.
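
The same kind of imbalance can also be checked numerically from trace intervals. The sketch below sums the time each thread spends in one event and reports the thread with the largest total together with the ratio of that total to the mean. The interval layout `(thread, event, start, end)`, the microsecond values, and the max-over-mean metric are assumptions chosen for illustration; they are not part of PPW or the SLOG-2 format.

```python
from collections import defaultdict


def time_per_thread(intervals, event):
    """Sum the duration of every interval of `event`, per thread."""
    totals = defaultdict(int)
    for thread, name, start, end in intervals:
        if name == event:
            totals[thread] += end - start
    # Threads that never executed the event are absent from the result.
    return dict(totals)


def imbalance(totals):
    """Return (slowest thread, max/mean ratio); 1.0 means perfectly balanced."""
    mean = sum(totals.values()) / len(totals)
    worst = max(totals, key=totals.get)
    return worst, totals[worst] / mean


# Hypothetical intervals in microseconds: (thread, event, start, end)
intervals = [
    (0, "upc_notify", 100, 120),
    (1, "upc_notify", 100, 300),
    (2, "upc_notify", 100, 120),
    (3, "upc_notify", 100, 120),
    (1, "upc_memget", 300, 350),
]
totals = time_per_thread(intervals, "upc_notify")
worst, ratio = imbalance(totals)
```

With these made-up numbers, thread 1 stands out in the same way a single long bar stands out in a timeline view.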
