I’ve been doing some noodling around with Confluent’s Kafka Connect recently, as part of gaining a wider understanding of Kafka. If you’re not familiar with Kafka Connect, this page gives a good idea of the thinking behind it.
One issue that I hit defeated my Google-fu so I’m recording it here to hopefully help out fellow n00bs.
The pipeline that I’d set up looked like this:
- Eneco’s Twitter Source, streaming tweets to a Kafka topic
- Confluent’s HDFS Sink, streaming the tweets to HDFS and automagically defining a Hive table over them

It worked great, but only if I didn’t enable the Hive integration part.
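For context, here is a minimal sketch of how the HDFS Sink half of such a pipeline might be defined and submitted to a Kafka Connect worker through its REST API (POST /connectors). It is illustrative only, not the exact configuration used above. The connector name, topic name, HDFS and metastore addresses, worker URL and flush.size value are all assumptions. The properties shown (hive.integration, hive.metastore.uris, schema.compatibility) are the ones that switch on the Hive part. Confluent's documentation notes that Hive integration needs schema.compatibility set to BACKWARD, FORWARD or FULL.

```python
import json
import urllib.request

# Assumed Kafka Connect worker REST endpoint (8083 is the usual default port).
CONNECT_URL = "http://localhost:8083/connectors"


def hdfs_sink_config(topic: str, hdfs_url: str, metastore_uri: str, hive: bool = True) -> dict:
    """Build a request body for Confluent's HDFS Sink, optionally with Hive integration.

    The connector name and flush.size below are illustrative assumptions.
    """
    config = {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": topic,
        "hdfs.url": hdfs_url,
        "flush.size": "3",  # commit a file to HDFS after this many records
    }
    if hive:
        config.update({
            "hive.integration": "true",
            "hive.metastore.uris": metastore_uri,
            # Hive integration requires BACKWARD, FORWARD or FULL compatibility.
            "schema.compatibility": "BACKWARD",
        })
    return {"name": "hdfs-sink-tweets", "config": config}


def submit(body: dict) -> int:
    """POST the connector definition to the Connect worker and return the HTTP status."""
    req = urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical topic and host names; substitute your own.
    body = hdfs_sink_config("twitter", "hdfs://localhost:8020", "thrift://localhost:9083")
    print(json.dumps(body, indent=2))
    # submit(body)  # uncomment to create the connector on a running worker
```

Passing hive=False gives the variant without Hive integration, which is the configuration that worked in my case.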