Location: Rm 2018, VTCRC, 1880 Pratt Drive, Blacksburg, VA

Invited Speaker: Judy Qiu, Indiana University

Title: Towards HPC-ABDS: An Initial Experience Optimizing Hadoop for Scalable High Performance Data Analytics

Abstract: As data grows in both volume and complexity, a runtime environment needs to integrate with community infrastructure that supports interoperable, sustainable, and high-performance data analytics. One solution is to converge the Apache Big Data Stack with High Performance Computing cyberinfrastructure (HPC-ABDS) into well-defined and implemented common building blocks, providing richness in capabilities and productivity. HPC-ABDS, with about 300 packages, aims to provide these building blocks in library form, so that they can be reused by higher-level applications and tuned for a specific problem domain, such as machine learning. Harp is an open source project from Indiana University that builds on our earlier work, Twister and Twister4Azure. We implemented Harp as a library that plugs into Hadoop and enables users to run complex data analysis algorithms on both clouds and supercomputers. We ran Harp with K-means, Graph Layout, and Multidimensional Scaling algorithms on realistic application datasets over 4096 cores on the IU BigRed II Supercomputer (Cray/Gemini), where we achieved linear speedup. This demonstrates the portability of HPC-ABDS to HPC and, eventually, Exascale systems.


[1] Judy Qiu, Shantenu Jha, Andre Luckow, Geoffrey C. Fox. Towards HPC-ABDS: An Initial High-Performance Big Data Stack. Accepted to the proceedings of the ACM 1st Big Data Interoperability Framework Workshop: Building Robust Big Data Ecosystem, NIST special publication, March 13-21, 2014.

[2] Bingjing Zhang, Yang Ruan, Judy Qiu. Harp: Collective Communication on Hadoop. Proceedings of the IEEE International Conference on Cloud Engineering (IC2E 2015).

Short Bio: Judy Qiu is an assistant professor of Computer Science at Indiana University. Her general area of research is data-intensive computing at the intersection of Cloud and HPC multicore technologies. This includes a specialization in programming models that support iterative computation, from storage to analysis, and that can scalably execute data-intensive applications. Her research has been funded by NSF, NIH, Microsoft, Google, Intel, and Indiana University. She is the director of a new Intel Parallel Computing Center (IPCC) site at IU. She is the recipient of an NSF CAREER Award in 2012, the Indiana University Trustees Award for Teaching Excellence in 2013-2014, and the Indiana University Outstanding Junior Faculty Award in 2015.