Streaming Data Pipelines and Kafka as a Message Queue

Date: Wednesday, September 20, 2017 - 18:30
Source: Apache Kafka London
Attendees: 170
City: London

Join us on Slack

6:30 pm - Doors open, food + drinks, networking

7:00 pm - Talk - "Look Ma, no Code! Building Streaming Data Pipelines with Apache Kafka" with Robin Moffatt from Confluent

Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again! Companies new and old are recognising the importance of a low-latency, scalable, fault-tolerant data backbone in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple sources and systems, enabling low-latency analytics, event-driven architectures and the population of multiple downstream systems. These data pipelines can be built using configuration alone. In this talk, we'll see how easy it is to stream data from a database such as Oracle into Kafka using the Kafka Connect API. We'll then use KSQL to filter, aggregate and join it to other data, and stream the results from Kafka out into multiple targets such as Elasticsearch and MySQL. All of this can be accomplished without a single line of code! Why should Java geeks have all the fun?
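
For a flavour of what "configuration alone" can look like, here is a hypothetical KSQL snippet (the stream, column and topic names are invented for illustration, not taken from the talk). A Kafka Connect source connector would populate the input topic, and a sink connector would stream the derived topic on into a target such as Elasticsearch:

    -- Register a stream over a topic populated by a Connect source connector
    CREATE STREAM orders (order_id INT, customer VARCHAR, amount DOUBLE)
      WITH (KAFKA_TOPIC='oracle-orders', VALUE_FORMAT='JSON');

    -- Continuously filter it into a new Kafka topic, ready for a sink
    -- connector to deliver to Elasticsearch or MySQL
    CREATE STREAM big_orders AS
      SELECT * FROM orders WHERE amount > 1000;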

Robin is a Partner Technology Evangelist at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle ACE Director. His career has always involved data, from the old worlds of COBOL and DB2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing and optimization. He blogs at www.confluent.io/blog/author/robin and rmoff.net (and previously http://ritt.md/rmoff) and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time. 


7:50 pm - Talk - "Kafka as a Message Queue: can you do it, and should you do it?" with Adam Warski from SoftwareMill 

Using Kafka's offset commit mechanism, we can implement a message processing system with at-least-once delivery guarantees. However, we can only acknowledge processing of all messages up to a given point (offset). That’s very often enough, but not always.
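
As a minimal, hypothetical sketch of that mechanism with the plain Kafka consumer API (the topic and group names are invented here), note that the commit acknowledges the whole polled batch, not individual messages:

    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import scala.collection.JavaConverters._

    object AtLeastOnceConsumer extends App {
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("group.id", "my-processor")    // illustrative group id
      props.put("enable.auto.commit", "false") // we commit manually, below
      props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

      val consumer = new KafkaConsumer[String, String](props)
      consumer.subscribe(Collections.singletonList("my-topic")) // illustrative topic

      while (true) {
        val records = consumer.poll(500)
        records.asScala.foreach(r => process(r.value)) // process first...
        // ...then commit: a crash before this line means the whole batch is
        // re-read on restart (at-least-once). The commit covers *all* messages
        // up to the current position; there is no per-message acknowledgment.
        consumer.commitSync()
      }

      def process(value: String): Unit = println(s"processing: $value")
    }
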
Sometimes we need selective, out-of-order message acknowledgments, like in a “traditional” message queue. If a message is not acknowledged for a given period of time, it should be re-delivered. Can this be implemented on top of Kafka? Sure! (By the way: this is similar to how Amazon’s SQS works.)
In the talk I’ll describe the architecture and implementation of a message queue built on Kafka: kmq. We’ll go through two crucial components: the queue client and the message redelivery tracker. There will be some live coding, some slides, and a couple of demos. 
We’ll also look at the performance (which is surprisingly good) and latency, as well as possible problems that such an approach can cause, such as “error-flooding”.
Kmq is open-source and available at github.com/softwaremill/kmq.
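
To make the marker idea concrete, here is a simplified, hypothetical sketch (this is not kmq's actual API; the topic names and marker encoding are invented). The client writes a "start" marker before processing and an "end" marker, the selective acknowledgment, after it; the redelivery tracker mentioned above consumes the markers topic and re-publishes any message whose start marker has no matching end marker within the timeout:

    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import scala.collection.JavaConverters._

    object MarkerBasedQueueClient extends App {
      val consumerProps = new Properties()
      consumerProps.put("bootstrap.servers", "localhost:9092")
      consumerProps.put("group.id", "queue-clients")
      consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

      val producerProps = new Properties()
      producerProps.put("bootstrap.servers", "localhost:9092")
      producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

      val consumer = new KafkaConsumer[String, String](consumerProps)
      val producer = new KafkaProducer[String, String](producerProps)
      consumer.subscribe(Collections.singletonList("queue"))

      while (true) {
        val records = consumer.poll(500).asScala
        records.foreach { r =>
          val id = s"${r.partition}/${r.offset}" // identifies the queue message
          // the start marker carries the payload so the tracker can redeliver it
          producer.send(new ProducerRecord("markers", id, s"start:${r.value}"))
        }
        // Offsets on "queue" can be committed straight away: redelivery is
        // driven by the markers topic, not by these offsets.
        consumer.commitSync()
        records.foreach { r =>
          val id = s"${r.partition}/${r.offset}"
          process(r.value) // may fail or take too long...
          // ...and only this end marker acknowledges this single message
          producer.send(new ProducerRecord("markers", id, "end"))
        }
      }

      def process(value: String): Unit = println(s"processing: $value")
    }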

Adam is one of the co-founders of SoftwareMill, where he codes mainly using Scala and other interesting technologies. He's involved in open-source projects such as MacWire, ScalaClippy, Quicklens, ElasticMQ and others. He's been a speaker at major conferences such as JavaOne, LambdaConf, Devoxx and ScalaDays.
Apart from writing closed- and open-source software, in his free time he tries to read the Internet on various (functional) programming-related subjects. Any ideas or insights usually end up as a blog post (softwaremill.com/blog).

Centrica Connected Home

20 Rathbone Place, W1T 1HY