"transforms" : "addTimestampToTopic",
"transforms.addTimestampToTopic.type" : "org.apache.kafka.connect.transforms.TimestampRouter",
"transforms.addTimestampToTopic.topic.format" : "${topic}_${timestamp}",
"transforms.addTimestampToTopic.timestamp.format": "YYYY-MM-dd"
Loading CSV data into Confluent Cloud using the FilePulse connector
The FilePulse connector from Florian Hussonnois is a really useful connector for Kafka Connect which enables you to ingest flat files including CSV, JSON, XML, etc. into Kafka. You can read more about it in its overview here. Other connectors for ingesting CSV data include kafka-connect-spooldir (which I wrote about previously), and kafka-connect-fs.
Here I’ll show how to use it to stream CSV data into a topic in Confluent Cloud. You can apply the same config pattern to any other secured Kafka cluster.
Using ksqlDB to process data ingested from ActiveMQ with Kafka Connect
The ActiveMQ source connector creates a Struct holding the value of the message from ActiveMQ (as well as its key). This is as would be expected. However, you can encounter challenges in working with the data if the ActiveMQ data of interest within the payload is complex. Things like converters and schemas can get really funky, really quick.
Kafka Connect JDBC Sink deep-dive: Working with Primary Keys
The Kafka Connect JDBC Sink can be used to stream data from a Kafka topic to a database such as Oracle, Postgres, MySQL, DB2, etc.
It supports many permutations of configuration around how primary keys are handled. The documentation details these. This article aims to illustrate and expand on this.
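As a rough illustration of one such permutation, the fragment below is a minimal sketch of a sink that takes the primary key from the Kafka message key and upserts on it. It assumes the message key is a primitive (for example a string or integer) and that the target column should be called id; that column name is a placeholder, not something from the original article.

```json
"connector.class" : "io.confluent.connect.jdbc.JdbcSinkConnector",
"insert.mode"     : "upsert",
"pk.mode"         : "record_key",
"pk.fields"       : "id"
```

Other values for pk.mode (none, kafka, record_value) change where the key columns come from, which is where most of the permutations arise.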
Kafka Connect - SQLSyntaxErrorException: BLOB/TEXT column … used in key specification without a key length
I got the error SQLSyntaxErrorException: BLOB/TEXT column 'MESSAGE_KEY' used in key specification without a key length with Kafka Connect JDBC Sink connector (v10.0.2) and MySQL (8.0.23)
Running a self-managed Kafka Connect worker for Confluent Cloud
Confluent Cloud is not only a fully-managed Apache Kafka service, but also provides important additional pieces for building applications and pipelines including managed connectors, Schema Registry, and ksqlDB. Managed Connectors are run for you (hence, managed!) within Confluent Cloud - you just specify the technology with which you want to integrate in or out of Kafka and Confluent Cloud does the rest.
Creating topics with Kafka Connect
When Kafka Connect ingests data from a source system into Kafka it writes it to a topic. If you have set auto.create.topics.enable = true on your broker then the topic will be created when written to. If auto.create.topics.enable = false (as it is on Confluent Cloud and many self-managed environments, for good reasons) then you can tell Kafka Connect to create those topics first. This was added in Apache Kafka 2.6 (Confluent Platform 6.0) - prior to that you had to create the topics yourself, otherwise the connector would fail.
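To make that concrete, here is a minimal sketch of the settings a source connector can carry so that Kafka Connect creates its topics. The values 3 and 6 are purely illustrative, and this assumes the worker has topic creation enabled (topic.creation.enable, which defaults to true).

```json
"topic.creation.default.replication.factor" : 3,
"topic.creation.default.partitions"         : 6
```

Both settings go in the source connector's own configuration rather than on the broker.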
Kafka Connect - Deep Dive into Single Message Transforms
KIP-66 was added in Apache Kafka 0.10.2 and brought new functionality called Single Message Transforms (SMT). Using SMT you can modify the data and its characteristics as it passes through the Kafka Connect pipeline, without needing additional stream processors. For things like manipulating fields, changing topic names, conditionally dropping messages, and more, SMT are a perfect solution. If you get to things like aggregation, joining streams, and lookups then SMT may not be the best for you and you should head over to Kafka Streams or ksqlDB instead.
🎄 Twelve Days of SMT 🎄 - Day 12: Community Transformations
🎄 Twelve Days of SMT 🎄 - Day 11: Predicate and Filter
Apache Kafka 2.6 included KIP-585 which adds support for defining predicates against which transforms are conditionally executed, as well as a Filter Single Message Transform to drop messages - which in combination means that you can conditionally drop messages.
As part of Apache Kafka, Kafka Connect ships with pre-built Single Message Transforms and Predicates, but you can also write your own. The API for each is documented: Transformation / Predicate. The predicates that ship with Apache Kafka are:
- RecordIsTombstone - The value part of the message is null (denoting a tombstone message)
- HasHeaderKey - Matches if a header exists with the name given
- TopicNameMatches - Matches based on topic
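Combining the two might look like the sketch below, which drops tombstone messages by attaching the RecordIsTombstone predicate to a Filter transform. The aliases dropTombstones and isTombstone are arbitrary labels chosen for this example.

```json
"transforms"                          : "dropTombstones",
"transforms.dropTombstones.type"      : "org.apache.kafka.connect.transforms.Filter",
"transforms.dropTombstones.predicate" : "isTombstone",
"predicates"                          : "isTombstone",
"predicates.isTombstone.type"         : "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
```

Adding "transforms.dropTombstones.negate" : "true" would invert the condition, so that only the tombstones are kept.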
🎄 Twelve Days of SMT 🎄 - Day 10: ReplaceField
The ReplaceField Single Message Transform has three modes of operation on fields of data passing through Kafka Connect:
- Include only the fields specified in the list (whitelist)
- Include all fields except the ones specified (blacklist)
- Rename field(s) (renames)
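A minimal sketch of two of these modes chained together is shown here: one transform drops a field and a second renames another. The field names cc_num, ts, and event_ts are placeholders for illustration. Newer Apache Kafka releases also accept include and exclude in place of whitelist and blacklist, so check the documentation for the version you run.

```json
"transforms"                  : "dropCC,renameTS",
"transforms.dropCC.type"      : "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.dropCC.blacklist" : "cc_num",
"transforms.renameTS.type"    : "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.renameTS.renames" : "ts:event_ts"
```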
🎄 Twelve Days of SMT 🎄 - Day 9: Cast
The Cast Single Message Transform lets you change the data type of fields in a Kafka message, supporting numerics, string, and boolean.
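As a sketch, casting is configured with a spec listing field:type pairs. The field names cost and units below are made up for the example.

```json
"transforms"                : "castTypes",
"transforms.castTypes.type" : "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.castTypes.spec" : "cost:float32,units:int16"
```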
🎄 Twelve Days of SMT 🎄 - Day 8: TimestampConverter
The TimestampConverter Single Message Transform lets you work with timestamp fields in Kafka messages. You can convert a string into a native Timestamp type (or Date or Time), as well as Unix epoch - and the same in reverse too.
This is really useful to make sure that data ingested into Kafka is correctly stored as a Timestamp (if it is one), and also enables you to write a Timestamp out to a sink connector in a string format that you choose.
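For example, a string field could be converted to a native Timestamp with something like the following sketch. The field name txn_date and the date pattern are assumptions for illustration; the pattern has to match however the string is actually formatted in your data.

```json
"transforms"                       : "convertTS",
"transforms.convertTS.type"        : "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.convertTS.field"       : "txn_date",
"transforms.convertTS.format"      : "yyyy-MM-dd HH:mm:ss",
"transforms.convertTS.target.type" : "Timestamp"
```

Setting target.type to string (with the same format option) does the reverse when writing to a sink.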
🎄 Twelve Days of SMT 🎄 - Day 7: TimestampRouter
Just like the RegexRouter, the TimestampRouter can be used to modify the topic name of messages as they pass through Kafka Connect. Since the topic name is usually the basis for the naming of the object to which messages are written in a sink connector, this is a great way to achieve time-based partitioning of those objects if required. For example, instead of streaming messages from Kafka to an Elasticsearch index called cars, they can be routed to monthly indices e.g. cars_2020-10, cars_2020-11, cars_2020-12, etc.
The TimestampRouter takes two arguments: the format of the final topic name to generate, and the format of the timestamp to put in the topic name (based on SimpleDateFormat).
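A sketch of the monthly routing described above could look like this. The alias monthlyIndex is arbitrary, and the pattern uses lower-case yyyy because upper-case YYYY means week-year in SimpleDateFormat and can give the wrong year around New Year.

```json
"transforms"                               : "monthlyIndex",
"transforms.monthlyIndex.type"             : "org.apache.kafka.connect.transforms.TimestampRouter",
"transforms.monthlyIndex.topic.format"     : "${topic}_${timestamp}",
"transforms.monthlyIndex.timestamp.format" : "yyyy-MM"
```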
🎄 Twelve Days of SMT 🎄 - Day 6: InsertField II
We kicked off this series by seeing on day 1 how to use InsertField to add in the timestamp to a message passing through the Kafka Connect sink connector. Today we’ll see how to use the same Single Message Transform to add in a static field value, as well as the name of the Kafka topic, partition, and offset from which the message has been read.
"transforms" : "insertStaticField1",
"transforms.insertStaticField1.type" : "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.insertStaticField1.static.field": "sourceSystem",
"transforms.insertStaticField1.static.value": "NeverGonna"
🎄 Twelve Days of SMT 🎄 - Day 5: MaskField
If you want to mask fields of data as you ingest from a source into Kafka, or write to a sink from Kafka with Kafka Connect, the MaskField Single Message Transform is perfect for you. It retains the field whilst replacing its value.
To use the Single Message Transform you specify the field to mask, and its replacement value. To mask the contents of a field called cc_num you would use:
"transforms" : "maskCC",
"transforms.maskCC.type" : "org.apache.kafka.connect.transforms.MaskField$Value",
"transforms.maskCC.fields" : "cc_num",
"transforms.maskCC.replacement" : "****-****-****-****"
🎄 Twelve Days of SMT 🎄 - Day 4: RegExRouter
If you want to change the topic name to which a source connector writes, or the object name that’s created on a target by a sink connector, the RegexRouter is exactly what you need.
To use the Single Message Transform you specify the pattern in the topic name to match, and its replacement. To drop a prefix of test- from a topic you would use:
"transforms" : "dropTopicPrefix",
"transforms.dropTopicPrefix.type" : "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropTopicPrefix.regex" : "test-(.*)",
"transforms.dropTopicPrefix.replacement" : "$1"
🎄 Twelve Days of SMT 🎄 - Day 3: Flatten
The Flatten Single Message Transform (SMT) is useful when you need to collapse a nested message down to a flat structure.
To use the Single Message Transform you only need to reference it; there’s no additional configuration required:
"transforms" : "flatten",
"transforms.flatten.type" : "org.apache.kafka.connect.transforms.Flatten$Value"
🎄 Twelve Days of SMT 🎄 - Day 2: ValueToKey and ExtractField
Setting the key of a Kafka message is important as it ensures correct logical processing when consumed across multiple partitions, as well as being a requirement when joining to messages in other topics. When using Kafka Connect the connector may already set the key, which is great. If not, you can use these two Single Message Transforms (SMT) to set it as part of the pipeline based on a field in the value part of the message.
To use the ValueToKey Single Message Transform, specify the name of the field (id) that you want to copy from the value to the key:
"transforms" : "copyIdToKey",
"transforms.copyIdToKey.type" : "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.copyIdToKey.fields" : "id",
🎄 Twelve Days of SMT 🎄 - Day 1: InsertField (timestamp)
You can use the InsertField Single Message Transform (SMT) to add the message timestamp into each message that Kafka Connect sends to a sink.
To use the Single Message Transform, specify the name of the field (timestamp.field) that you want to add to hold the message timestamp:
"transforms" : "insertTS",
"transforms.insertTS.type" : "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.insertTS.timestamp.field": "messageTS"