Big Data and Machine Learning - London - Meetup #6

Tuesday, July 18, 2017 - 18:30

PLEASE NOTE: Limit of 80 attendees (see below)

Welcome to Meetup #6, and what we hope will be another interesting evening of presentations and lightning talks. We encourage you to participate in the Q&A session, and we hope that the networking gives you the opportunity to meet the presenters, other attendees and the organisers of the Meetup.

The agenda is listed below, followed by further details about the main presentations and their presenters.

We look forward to meeting many of you at this Meetup. For those who are unable to join us, we hope to see you at one of our other meetups throughout the year.

Please also note that there is a maximum limit of 80 attendees for this event. However, in common with other Meetups, we unfortunately see quite a high no-show rate (despite pleading with people to release their places if they find they are unable to attend). We have decided to raise the number of attendees who can register, in the hope that we can get closer to a full house. We will, however, have to stick to a maximum of 60 through the doors, so on the night entry will be on a first come, first served basis. Make sure you turn up early to guarantee entry!

If you have already RSVP’d “YES”, but find you are no longer able to attend, please make the effort to release your space ASAP to give those on the Waiting List the opportunity to attend – THANKS!

Should you wish to contact me, email me at [masked].

Kindest regards



18:30  Drinks reception and networking

18:55  Welcome (5 mins)

19:00  R Language: What is R? Why R? Why not R?

            Prof. Algirdas Pakštas

            (30 mins)

19:30  Row store vs column store saving a second.  So what?

            Rob Dempsey

            (30 mins)

20:00  The Lab Series

            Mark Whalley

            (30 mins)

20:30  SHOUT OUT

            1. Manuel Timita - Illustreets

            2. TBC

            3. TBC

            (15 mins)

20:45  Networking /  'Beer & Pizza'

21:30  Close


R Language: What is R? Why R? Why not R?

This talk will explore the origin and usage of the R programming language, which is an important tool for development in the numeric analysis and machine learning spaces. According to the experts, "R is the most popular language used in the field of statistics". It is a free, open-source, powerful and highly extensible environment which comes with a lot of pre-packaged functionality. The R programmable environment uses command-line scripting, which allows users to design, store and re-use a series of complex data-analysis steps, which in turn makes it easier for others to validate research results and check for errors. According to TIOBE's year-end ratings of language searches, there was a noted increase in R language search popularity in 2014; for comparison, Java, PHP and C++ remain highly popular but are dropping. And why not R? It can be daunting because R syntax is different from that of many other languages. A brief comparison of the pros and cons of using R and Python for data science will be presented.

Prof. Algirdas Pakštas

Vilnius University, Institute of Mathematics and Informatics, Department of Systems Analysis


Row store vs column store saving a second.  So what?

Why you would choose a column store over a row store for an analytics database.
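As a rough illustration of the distinction (not taken from the talk itself), the sketch below holds the same small table in a row-oriented and a column-oriented layout. The records, field names and functions are all hypothetical. The point it shows is that an aggregate over one attribute only needs to touch that attribute's column in the column layout, whereas the row layout walks every whole record.

```python
# Hypothetical flight-position records; names and values are illustrative only.
rows = [
    {"icao": "4CA2D6", "altitude_ft": 37000, "speed_kt": 450},
    {"icao": "400F2B", "altitude_ft": 12000, "speed_kt": 280},
    {"icao": "3C6444", "altitude_ft": 35000, "speed_kt": 470},
]


def avg_altitude_row_store(records):
    # Row layout: every record is visited in full, even though only
    # one of its attributes is needed for the aggregate.
    return sum(r["altitude_ft"] for r in records) / len(records)


# Column layout: each attribute is kept in its own sequence.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def avg_altitude_column_store(cols):
    # Only the single column needed by the aggregate is read.
    altitudes = cols["altitude_ft"]
    return sum(altitudes) / len(altitudes)


print(avg_altitude_row_store(rows))
print(avg_altitude_column_store(columns))
```

Both functions return the same answer; the difference in a real database would be in how much data has to be read from disk to get it.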

Rob Dempsey

Solution-driven database specialist / data scientist with fifteen years’ experience project leading, architecting and developing applications and database products.


The Lab Series

In this continuation of The Lab Series, Mark will outline the next phase of the Mini Project for capturing Automatic Dependent Surveillance-Broadcast (ADS-B) signals to track the position, speed and other metrics of commercial aircraft.

So far we have discussed the background to ADS-B, and how to install and configure DUMP1090 to capture and decode the digital broadcast signals from aircraft transponders. This was followed by sessions which introduced the various outputs from DUMP1090 and outlined a method for transforming and feeding this data into Kafka topics. They also showed how, using Vertica’s inbuilt tools for integrating with Apache Kafka, we can populate a series of Vertica tables in near real time and make this data immediately available for querying and analytics. The last session demonstrated how streaming data loaded into Vertica in near real time can be interrogated immediately to provide visualisations of that data, including using geospatial elements and the Google Maps API to plot the positions of aircraft.
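To give a flavour of the transformation step, here is a minimal sketch of turning one decoded message into a JSON document that could then be published to a Kafka topic. It is not the method used in the Mini Project. It assumes the message arrives as a BaseStation-style comma-separated "MSG" line with 22 fields; the field names, the sample line and the function sbs_line_to_json are illustrative assumptions, and the publishing step itself is left out.

```python
import json

# Assumed field order for a BaseStation-style "MSG" line (an assumption,
# not confirmed by the Mini Project material).
SBS_FIELDS = [
    "message_type", "transmission_type", "session_id", "aircraft_id",
    "hex_ident", "flight_id", "date_generated", "time_generated",
    "date_logged", "time_logged", "callsign", "altitude", "ground_speed",
    "track", "latitude", "longitude", "vertical_rate", "squawk",
    "alert", "emergency", "spi", "is_on_ground",
]

# Fields converted to numbers; everything else stays as text.
NUMERIC_FIELDS = {
    "altitude", "ground_speed", "track",
    "latitude", "longitude", "vertical_rate",
}


def sbs_line_to_json(line):
    """Return a JSON string for one MSG line, or None if it is not usable."""
    parts = [p.strip() for p in line.strip().split(",")]
    if len(parts) != len(SBS_FIELDS) or parts[0] != "MSG":
        return None
    record = {}
    for name, value in zip(SBS_FIELDS, parts):
        if value == "":
            continue  # empty fields are simply omitted from the record
        record[name] = float(value) if name in NUMERIC_FIELDS else value
    return json.dumps(record, sort_keys=True)


# Made-up sample line for illustration only.
sample_line = (
    "MSG,3,1,1,4CA2D6,1,2017/07/18,18:30:00.000,2017/07/18,18:30:00.000,"
    ",37000,,,51.4700,-0.4543,,,0,0,0,0"
)
print(sbs_line_to_json(sample_line))
```

Each JSON string produced this way is one self-describing record, which is a convenient shape for a message on a topic that a database loader later reads.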

In this session, the last in the current phase of The Lab Series, we will look at some of the advanced analytics, machine learning and predictive analytics components of Vertica and how these can be applied to the flight tracking data.

As previously discussed, The Lab Series presentations at the F2F Meetups are primarily intended to discuss what is happening in the Mini Projects that are covered in more detail during the ONLINE Meetup events and workshops/hackathons.  These events have been recorded and are available on request.

Further details of The Lab Series can be found on the Agenda of Meetup #1.

Mark Whalley

From the early 1980s, Mark worked with Michael Stonebraker's Ingres RDBMS and then column-store big data analytic technologies. In 2016, he joined HPE Big Data Platform as a Systems Engineer specialising in Vertica and Vertica SQL in Hadoop.

Hewlett Packard Enterprise, 1 Aldermanbury Square