Bay Area Apache Spark Meetup @ Pinterest, San Francisco

Date: Tuesday, August 22, 2017 - 18:30
Source: Bay Area Spark Meetup
Attendees: 192
City: San Francisco

Join us for an evening at the Bay Area Apache Spark Meetup, featuring tech talks from Pinterest and Databricks about Apache Spark at scale.

Thanks to Pinterest for hosting and sponsoring this meetup.

Agenda:

6:30 - 7:00 pm  Mingling & Refreshments

7:00 - 7:10 pm Welcome opening remarks, announcements, acknowledgments, and introductions.

7:10 - 7:50 pm Pinterest Tech Talk 1 

7:50 - 8:30 pm Databricks Tech Talk 2

8:30 - 9:00 pm Mingling


Tech-Talk 1:  Large-scale batch processing at Pinterest with Apache Spark

Abstract: 

Pinterest is a data product, and we rely heavily on processing large amounts of data for use cases ranging from discovery products to business metric computation. Spark has been present at Pinterest since 2014, but only last year did it start to attract large-scale use cases, and their number has kept growing since. We are going to talk about Pinterest’s journey with Spark so far and the technical challenges we have faced running a large-scale data infrastructure in the cloud.

One of the many use cases for large-scale batch processing at Pinterest is our experiment framework. At Pinterest, we rely heavily on A/B experiments to make decisions about products and features. Every day we aim to have experiment results ready by 10 a.m. so we can make fast and well-grounded decisions. With more than 1,000 experiments running daily, crunching billions of records for more than 175 million Pinners, we need a reliable pipeline to support our growth and meet our service-level agreement. We will discuss how we built the experiment framework to speed up computation and make it more scalable and performant.

Bio(s):

Tien T. Nguyen is a software engineer at Pinterest. He works on large-scale analysis systems.

Jooseong Kim is a software engineer on the big data platform team at Pinterest, working on data infrastructure such as YARN, MapReduce, and Spark. Before Pinterest, he worked on database kernels.

Tech-Talk 2: Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark

Abstract: Deep Learning has shown tremendous success, yet it often requires a lot of effort to leverage its power. Existing Deep Learning frameworks require writing a lot of code to work with a model, let alone to do so in a distributed manner.

In this talk, we’ll survey the state of Deep Learning at scale and introduce Deep Learning Pipelines, a new open-source package for Apache Spark. This package simplifies Deep Learning in three major ways:

• It has a simple API that integrates well with enterprise Machine Learning pipelines.

• It automatically scales out common Deep Learning patterns, thanks to Spark.

• It enables exposing Deep Learning models through the familiar Spark APIs, such as MLlib and Spark SQL.

We will then look at the complex problem of image classification using Deep Learning and Spark. Using Deep Learning Pipelines, we will show:

• how to build deep learning models in a few lines of code;

• how to scale common tasks like transfer learning and prediction; and

• how to publish models in Spark SQL.

Bio(s):

Sue Ann Hong is a software engineer on the Machine Learning team at Databricks, where she contributes to MLlib and the Deep Learning Pipelines library. She got her Ph.D. at CMU studying machine learning and distributed optimization, and worked as a software engineer at Facebook in Ads and Commerce.

Tim Hunter is a software engineer at Databricks and contributes to the Apache Spark MLlib project. He has been building distributed Machine Learning systems with Spark since version 0.2, before Spark was an Apache Software Foundation project.

How to Enter Pinterest

Pinterest is located at 580 7th St., San Francisco 94103. Our main entrance is marked with the green arrow.

https://lh6.googleusercontent.com/BstWMotzXkz8M9wPzDoBflzjj1YB5RKy0nA-Nu0VGVRcE4zQinUc947CH3EcAgpqCYGfFPfX5iu5c7cG_OG-KC_YuWajaQZ6u6NQDySxGSw9Z_B00St4jQhuvWPybfE7I47B0MZs

Transportation

We are located four blocks from the San Francisco Caltrain stop on 4th Street. Head northwest on 4th Street towards Townsend Street. Turn left onto Townsend and right onto 7th. Our office is on the left-hand side a block after Brannan Street.

Parking Instructions

Pinterest does not have on-site parking. There is metered street parking near the building as well as several parking garages within walking distance:

• 833 Bryant Street (Ampco parking): $15 per day

• 675 Townsend Street (Parking Concepts): $15 per day

• 35 Gilbert Street (HPM): $15 per day

Pinterest does not have on-site bike parking either. If you plan to bring a bike, bring a lock and lock it up outside the Pinterest premises.

Pinterest

580 7th St