Beyond Word2Vec: Recent Developments in Document Embedding

Tuesday, October 24, 2017 - 18:00
SF Bayarea Machine Learning
San Francisco

Main Talk: Beyond Word2Vec: Recent Developments in Document Embedding - Andrew Blevins (Metis)

Abstract: It is easy to be amazed by the seemingly magical power of word2vec. But in real business use cases, we rarely need to understand single words. So how do we apply the power of word2vec to phrases, sentences, paragraphs, or entire documents? We will compare various techniques for generating useful representations of documents of indeterminate length and look at ways of comparing those methods.

We will start with bag-of-words approaches and TFIDF. From there we will look at dimensionality reduction techniques like LSA or NMF. After that, we will look at word2vec and sense2vec and various ways to aggregate those word vectors, including summing, weighting, clustering, Chinese restaurant processes, Gensim Doc2vec and developing parse tree representations. Finally, we will look at RNN methods such as LSTMs using Keras. Along the way, we will look at ways to evaluate each of these methods and discuss strengths and weaknesses.
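As a rough illustration of two of the techniques listed above (TFIDF, and weighting word vectors to aggregate them into a document vector), here is a minimal Python sketch. It is not taken from the talk: the whitespace tokenizer, the hand-made 2-D word vectors, and the function names are hypothetical stand-ins for a real tokenizer and a trained word2vec model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately naive assumption)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: tf * idf} dict per document.

    tf  = raw count / document length
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            term: (c / len(toks)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


def weighted_doc_vector(term_weights, word_vectors, dim):
    """Average word vectors into one document vector, weighting each by TF-IDF.

    Words missing from word_vectors (or with zero weight) are skipped;
    returns a zero vector if nothing is left to average.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for term, w in term_weights.items():
        vec = word_vectors.get(term)
        if vec is None or w == 0:
            continue
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0:
        return total
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy "word2vec" vectors; a real model would have ~100-300 dims.
word_vectors = {
    "cat": [1.0, 0.1],
    "kitten": [0.9, 0.2],
    "stock": [0.1, 1.0],
    "market": [0.2, 0.9],
}
docs = ["the cat sat", "the kitten sat", "the stock market fell"]

weights = tfidf(docs)
v0, v1, v2 = (weighted_doc_vector(w, word_vectors, dim=2) for w in weights)
print("cat vs kitten doc:", round(cosine(v0, v1), 3))
print("cat vs stock doc: ", round(cosine(v0, v2), 3))
```

Note that "the" appears in every document, so its IDF (and weight) is zero, and the first two documents share no weighted vocabulary except "sat". Their TFIDF vectors therefore overlap only on that one word, while the averaged embeddings still come out close because the toy vectors for "cat" and "kitten" are close. This is the gap that the embedding-based methods in the talk aim to close.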

Bio: Andrew comes to Metis from LinkedIn, where he worked as a data scientist on projects including executive dashboarding, education, profile inference, and skills standardization. He is passionate about helping people make rational decisions and building cool data products. Prior to that, he worked on fraud modelling at IMVU (the lean startup) and studied applied physics at Cornell. Andrew grew up on a sheep farm in North Idaho. He loves snowboarding, traveling, scotch, and reading about all kinds of nerdy topics.

Lightning Talk: Machine Learning at TrueAccord - Nadav Samet (TrueAccord)

Abstract: TrueAccord reinvents debt collection and empowers consumers to regain financial health. Using machine learning and behavioral analytics, we replace the majority of human-to-human interactions with human-to-machine interactions and make a significant impact on millions of consumers in one of the most regulated industries in the US.

Bio: Nadav has over 20 years of coding experience, with more than 7 years as a solutions engineer for various startups. He began his career at the elite technological unit of the IDF’s Intelligence Corps where he specialized in data and network analysis.

Tentative Schedule:

6:00pm-6:45pm -- pre-reception

6:45pm-7:00pm -- lightning talk

7:00pm-8:00pm -- main talk

8:00pm-8:30pm -- post-reception


Thanks to TrueAccord for hosting, food, and drinks!

Thanks to Intel for supporting video recording.


NOTE: For TrueAccord compliance purposes, attendees will have to sign a release indicating they only had access to the public speaking area of the office (i.e., this is not an NDA).

Intel ML Contest:

Intel has also created a mini contest for all participants. If you have an ML project and want to showcase it, share it, or collaborate on it with others, submit it to DevMesh. Everyone who submits a project to DevMesh will get remote access to Machine Learning Servers. On top of that, the best projects will be selected, and each winner will receive a $50 gift card.

Instructions to join DevMesh:

Create a new account at

Join your dedicated group - Artificial Intelligence West Coast

To submit a project, click on “add a project”
*When submitting your project, make sure to select “Artificial Intelligence West Coast” as your group.

To receive invitations to Intel webinars, as well as news and tools for Machine Learning and Deep Learning, register at this link


303 2nd Street