Big Data Workshop – July 3, 2014

Tools to Tackle Big Data

Click on each presenter’s name below to download their presentation slides.

Fields from health care to social media benefit from leveraging big data, and there are many parallel world-class research efforts underway here at U of T to address the challenges. This one-day workshop unites investigators from across the university, who share the common goal of tackling high-impact societal problems in need of multi-faceted solutions. Discovering new and valuable approaches to these complex issues demands cross-disciplinary collaboration between fields ranging from engineering to astrophysics, mathematics to public health.

Faculty and graduate students are invited to join us on Thursday, July 3, 2014 at the Bahen Centre for Information Technology, room 1130. Attend the talks or submit posters for presentation.

Directions to the Bahen Centre for Information Technology—Room 1130, 40 St. George Street, University of Toronto

Tentative Agenda

Time | Session | Title
10:00 - 10:15 | Welcome |
10:15 - 10:45 | Nick Koudas - Computer Science | A retrospection of social media analytics
10:45 - 11:15 | Coffee |
11:15 - 11:45 | Ue-Li Pen - Astrophysics | Big Data in Radio Astronomy
11:45 - 12:15 | Brendan Frey - ECE | Data Science
12:15 - 1:30 | Lunch & Posters |
1:30 - 2:00 | Stephen Strother - Medical Biophysics | Neuroinformatics Pipelines in the BrainCODE Neuroscience Data Centre
2:00 - 2:30 | Cristiana Amza - ECE | Scheduling and Guided Search in Highly Parametrized Spaces for Neuroscience Applications
2:30 - 3:30 | Coffee |
3:30 - 4:00 | Blair Adamache & Bertrand Brelier - IBM | Big Data Research Projects run by IBM with 7 Ontario Universities; Data Mining in High Energy Physics
4:00 - 4:30 | Jason Anderson - ECE | From Software to Circuits: Open Source High-Level Synthesis for FPGA-based Processor/Accelerator Systems
4:30 - 5:00 | Paul Chow - ECE | Seeking Opportunities for Hardware Acceleration in Big Data Analytics
5:00 | Closing remarks |

Nick Koudas

Abstract: This talk will provide an overview of the work that we have been conducting over the last ten years on collecting, storing and analyzing social media data. I will present an overview of the BlogScope system, an early social media analytics platform; Grapevine, a social news system; and Peckalytics. In each case, I will outline the main challenges and the technology we developed to address them. These include algorithmic challenges as well as system and design optimizations that enabled us to support real-time analysis at scale.

Bio: Nick Koudas received a bachelor's degree from the University of Patras in Greece, an MSc from the University of Maryland at College Park and a PhD from the University of Toronto. He co-founded Sysomos, a social media monitoring and analytics company, and served as its CEO. He conducts research in all aspects of data management and analysis. At present he is primarily interested in big data analytics and social media. He has received two best paper awards for his research and has served as the research DB track chair for CIKM 2010, as research program co-chair for VLDB 2011 and as the industrial program chair for SIGMOD 2013. He has also served as a member of the G7 at the Creative Destruction Lab at the Rotman School of Management. He was named the 2011 inventor of the year by the University of Toronto (1st prize) and serves as an advisor at Jolt and Extreme Startups in Toronto.

Ue-Li Pen

Abstract: Global very-long-baseline interferometry (VLBI) generates petabytes of data that are shipped around the world and processed. We describe a new Toronto VLBI initiative which combines the Algonquin Radio Observatory and the SOSCIP BlueGene/Q (BGQ) for this task.

Bio: PhD: Princeton, 1995. Postdoc: Harvard Junior Fellow, 1995-1998. At Toronto since 1998. Currently associate director, Canadian Institute for Theoretical Astrophysics.

Brendan Frey

Abstract: Big data is useless without interpretation. The field of data science is about extracting knowledge from data. This talk will discuss my group’s activities in the field of data science, with a particular focus on my interdisciplinary collaborations that involve engineers, computer scientists, biologists and medical researchers.

Bio: Dr. Brendan J. Frey is a Professor at the University of Toronto, with appointments in Engineering and Medicine. He conducts research in the fields of genome biology and machine learning. Dr. Frey holds the Canada Research Chair in Biological Computation, and is a Fellow of the Canadian Institute for Advanced Research, the Institute of Electrical and Electronics Engineers and the American Association for the Advancement of Science. He has received several distinctions, including the John C. Polanyi Award, the E.W.R. Steacie Fellowship and Canada’s Top 40 Leaders Under 40 Award. Dr. Frey has consulted for several industrial research and development laboratories in Canada, the United States and England, and he is currently on the Technical Advisory Board of Microsoft Research. His former students and postdoctoral fellows include professors, industrial researchers and developers at universities and industrial laboratories across Canada, the United States and Europe.

Stephen Strother

Abstract: I will provide an overview of the BrainCODE neuroscience data repository being developed for the Ontario Brain Institute (OBI) by a consortium of Ontario research groups including the Ontario Cancer Biomarker Network; the Rotman Research Institute (RRI), Baycrest; the Electronic Health Information Laboratory, Ottawa; and the High Performance Computing Virtual Laboratory, Kingston. Within BrainCODE we are developing standardized processing and analysis pipelines without restricting the broader range of tools that researchers may wish to use. I will illustrate such pipelines with the specific goal of optimizing the processing and analysis of BOLD fMRI data to discover reliable brain networks that support task performance and are linked with behavioral responses.

Bio: Dr. Strother studied Physics and Mathematics at Auckland University, New Zealand, and received a PhD in Electrical Engineering from McGill University, Montreal in 1986, where he developed early Positron Emission Tomography (PET) techniques at the Montreal Neurological Institute. After a fellowship at Memorial Sloan Kettering Cancer Center, New York, in 1989 he joined the VA Medical Center, Minneapolis as senior PET Physicist, and Assistant Professor of Radiology at the University of Minnesota, where he became Professor of Radiology in 2002. In 2004 he moved to Toronto as a senior scientist at the Rotman Research Institute (RRI), Baycrest, where he is Associate Site Leader in the multi-institutional Centre for Stroke Recovery (CSR), and Professor of Medical Biophysics at the University of Toronto. His research interests include neuroinformatics with a focus on optimization of PET, EEG and fMRI/MRI neuroimaging pipelines using statistical and machine learning techniques for research and clinical applications spanning the brain’s lifespan. He initiated and has led the neuroinformatics developments within CSR and RRI, Baycrest since 2007, and currently leads the neuroinformatics imaging group of the BrainCODE data repository for the Ontario Brain Institute. He is also a cofounder of Predictek, Inc., and ADMdx in Chicago, medical analysis and diagnostics companies; an Associate Editor for Human Brain Mapping; and a past member of the Neurotechnology Review Committee and past chairman of an international Neuroinformatics Standards Committee at the National Institutes of Health, USA.

Cristiana Amza

Abstract: Neuroscience applications process large data sets consisting of functional Magnetic Resonance Imaging (fMRI) brain images through a set of pipelined, parametrized operations. Overall, a brain model is built for determining correlations across brain images of several patients (subjects) with given accuracy and reproducibility optimization goals. While the application may process large amounts of data, the operations performed are also compute- and memory-intensive (e.g., eigenvalue decomposition). We study methods for scheduling tasks efficiently, as well as for optimizing task selection, in order to reduce the overall run-time and allow interactive modelling. Our platform is a cloud infrastructure we have built ourselves on top of a heterogeneous cluster of multi-core CPU and GPU components.

Bio: Cristiana Amza is an associate professor with the Department of Electrical and Computer Engineering at the University of Toronto. Cristiana received her B.S. degree in Computer Engineering from Bucharest Polytechnic Institute in 1991, and her M.S. and Ph.D. degrees in Computer Science from Rice University in 1997 and 2003, respectively. Her research interests are in infrastructure design for distributed systems that can automatically adapt to a changing environment and workload through self-managing, self-tuning and self-healing. Her most recent work focuses on the design, implementation and evaluation of adaptive algorithms for data analytics in distributed systems.

Blair Adamache

Abstract: An overview of how IBM high-performance computing is aiding big data research projects in disciplines such as computer engineering, electronic health records, and physics.

Bio: Project Executive, IBM Canada Research & Development Centre/SOSCIP Scientific Advisory Committee Member – Blair Adamache has spent over 25 years working in the software industry, the majority of it in relational database software development. His roles have included market development, product planning, and leadership in quality assurance, release management, multi-platform development, customer support, and software maintenance. In the latter capacity, he managed teams across Europe, Asia, and North America and had responsibility for some of IBM’s largest and most profitable middleware. Most recently, he has led development of database replication software, and worked in technical sales for IBM’s PureData analytics appliances. He has published numerous relational database articles on the web, and also holds a patent in the field of data integrity. He has a B.A. (McGill) and M.A. (Waterloo), and has also studied at Ryerson and the University of Toronto. He currently maintains offices at IBM Canada in Markham, and at the University of Toronto, and is the Project Executive for the IBM Canada Research and Development Centre.

Bertrand Brelier

Abstract: This talk will review the techniques used to analyze petabytes of data using a distributed cloud system for the Large Hadron Collider experiments.

Bio: Dr. Bertrand Brelier, Research Scientist, IBM, obtained his PhD in High Energy Physics at the Université de Montréal and at the Université Joseph Fourier, Grenoble, France. He then worked for the University of Toronto on the ATLAS experiment. He supports Southern Ontario Smart Computing Innovation Platform (SOSCIP) projects on the BlueGene/Q platform.

Jason Anderson

Abstract: Implementing computations in hardware rather than software can provide orders of magnitude improvement in computational speed and energy efficiency. However, hardware design is challenging compared to writing software, and hardware expertise is comparatively rare. In this talk, we will describe a high-level synthesis tool, called LegUp, being developed at the University of Toronto. LegUp accepts a standard C program as input and automatically compiles the program to a hybrid architecture containing a processor and custom hardware accelerators. LegUp, along with a set of benchmark C programs, is open source and freely downloadable, providing a powerful platform that can be leveraged for new research on a wide range of high-level synthesis topics. The tool has been downloaded by over 1000 groups from around the world since its initial release in March 2011. The talk will give an overview of LegUp’s current capabilities, as well as current research directions.

Bio: Jason Anderson received the B.Sc. degree in computer engineering from the University of Manitoba, and the M.A.Sc. and Ph.D. degrees in electrical and computer engineering (ECE) from the University of Toronto (U of T). He is an Associate Professor with the Department of ECE, U of T. From 1997 to 2008, he was with the field-programmable gate array (FPGA) implementation tools group at Xilinx, Inc., in San Jose, CA, and Toronto, ON. From 2005 to 2008, he managed groups at Xilinx focused on placement, routing, and strategic projects. He became a Principal Engineer at Xilinx in 2007. He joined the ECE Department at Toronto in 2008. He has received five awards for excellence in undergraduate teaching, holds 25 U.S. patents, and has authored over 60 papers in refereed journals and symposia. His research interests include all aspects of computer-aided design (CAD), architecture and circuits for FPGAs.

Paul Chow

Abstract: Field-Programmable Gate Arrays (FPGAs) have been used successfully as accelerators for many applications because of the ability to build application-specific engines and memory systems.  FPGAs are particularly well-suited to the processing of fine-grain data, pattern matching, streaming data applications, and networking.  FPGAs have also been shown to have significant advantages from the power perspective.  In this talk, I will present an overview of FPGA capabilities and work we are doing that we believe could be useful in the Big Data field.  While we believe there can be significant benefit to using FPGAs in Big Data applications, we lack knowledge about real applications and seek possible collaborations.

Bio: Paul Chow is a Professor in the Department of Electrical & Computer Engineering at the University of Toronto where he holds the Dusan and Anne Miklas Chair in Engineering Design.  Prior to joining UofT in 1988 he was at the Computer Systems Laboratory at Stanford University, Stanford, CA, as a Research Associate, where he was a major contributor to an early RISC microprocessor design called MIPS-X, one of the first microprocessors with an on-chip instruction cache and the root of many concepts used in processors today.  His research interests include high performance computer architectures, reconfigurable computing, embedded and application-specific processors, and field-programmable gate array architectures and applications.