Challenging Web-Scale Graph Analytics with Apache Spark

Wednesday, June 28, 2017 - 18:30
NYC Data Science
New York

Hello, NYC data scientists! We're excited to announce our next meetup, featuring a talk by Joseph Bradley, an engineer at Databricks and a committer on the Apache Spark project. Come hear Joseph talk about graph analytics at scale with Spark!

The meetup is hosted at Datadog's new HQ in the New York Times building. Datadog is sponsoring food and drink for the meetup as well.

Schedule:

6:30pm: Arrive, mingle, grab a bite and a drink
7:00pm: Welcome from JM Saponaro (Datadog)
7:05pm: "Challenging Web-Scale Graph Analytics with Apache Spark" by Joseph Bradley (Databricks)
7:50pm: Q&A
8:00pm: Wrap-up

Joseph's talk:

Graph analytics has a wide range of applications, from information propagation and network flow optimization to fraud and anomaly detection. The rise of social networks and the Internet of Things has given us complex web-scale graphs with billions of vertices and edges. To extract the hidden gems within those graphs, however, you need tools that can analyze them easily and efficiently.

At Spark Summit 2016, Databricks introduced GraphFrames, which implemented graph queries and pattern matching on top of Spark SQL to simplify graph analytics. In this talk, you’ll learn about work that has made graph algorithms in GraphFrames faster and more scalable. For example, new implementations of algorithms such as connected components incorporate algorithmic improvements based on recent research, as well as performance improvements from Spark DataFrames. Discover lessons learned from scaling the implementation from millions to billions of nodes, compare its performance with other popular graph libraries, and hear about real-world applications.

About Joseph:

Joseph Bradley is an Apache Spark Committer and PMC member working as a Machine Learning Software Engineer at Databricks. Previously, he was a postdoc at UC Berkeley after receiving his Ph.D. in Machine Learning from Carnegie Mellon. 


620 8th Ave, 45th Floor