Declarative Resource Management for Real-time ETL with Decodable
Note: This post originally appeared on the Decodable blog.
So you’ve built your first real-time ETL pipeline with Decodable: congratulations! Now what?
Note: This post originally appeared on the Decodable blog.
You’d think once was enough. Having already written about the trouble that I had getting Flink SQL to write to S3 (including on MinIO), this should now be a moot issue for me. Right? RIGHT?!
Note: This post originally appeared on the Decodable blog.
Amazon Managed Service for Apache Flink (MSF) is one of several hosted Flink offerings. As my colleague Gunnar Morling described in his recent article, it can be used to run a Flink job that you’ve written in Java or Python (PyFlink). But did you know that this isn’t the only way—or perhaps even the best way—to have your Flink jobs run for you?
Note: This post originally appeared on the Decodable blog.
Sometimes it’s not possible to have too much of a good thing, and whilst this blog may look at first glance rather similar to the one that I published just recently, today we’re looking at a 100% pure Apache solution. Because who knows, maybe you prefer rolling your own tech stack instead of letting Decodable do it for you 😉.
Note: This post originally appeared on the Decodable blog.
One of the things that I love about SQL is the power that it gives you to work with data in a declarative manner. I want this thing…go do it. How should it do it? Well, that’s a problem for the particular engine, not me. As a language with a pedigree of multiple decades and no sign of waning (despite a wobbly patch for some whilst NoSQL figured out it actually wanted to be NewSQL 😉), it’s the lingua franca of data systems.
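To make that concrete, here’s a minimal sketch (the table and column names are hypothetical, purely for illustration): the query declares only the result that’s wanted, and the engine is left to decide on access paths, join order, and execution plan.

```sql
-- Hypothetical table and columns, for illustration only.
-- We declare *what* we want; the engine decides *how* to compute it.
SELECT status,
       COUNT(*) AS order_count
FROM   orders
WHERE  order_date >= DATE '2024-01-01'
GROUP  BY status;
```

Nowhere does the query say how to scan the table, which index (if any) to use, or how to parallelise the aggregation; that’s the engine’s problem.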
Note: This post originally appeared on the Decodable blog.
Apache Iceberg is an open table format. It combines the benefits of data lakes (open standards, cheap object storage) with the good things that data warehouses offer, like first-class support for tables and SQL capabilities including in-place updates to data, time travel, and transactions. With Databricks’ recent acquisition of Tabular—one of the main companies contributing to Iceberg—it’s clear that Iceberg has emerged as one of the primary contenders in this space.
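As a rough sketch of what those capabilities look like in practice, here’s a hypothetical Iceberg table called db.orders queried with Spark SQL syntax (the table, columns, and timestamp are all made up for illustration):

```sql
-- Hypothetical Iceberg table db.orders, using Spark SQL syntax.

-- Row-level update, committed as an atomic snapshot on the table:
UPDATE db.orders
SET    status = 'SHIPPED'
WHERE  order_id = 42;

-- Time travel: query the table as it looked at an earlier point in time.
SELECT *
FROM   db.orders TIMESTAMP AS OF '2024-06-01 00:00:00';
```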
Note: This post originally appeared on the Decodable blog.
Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your hosts and esteemed curators of said content are Gunnar Morling and Robin Moffatt (your editor-in-chief for this edition). Feel free to send our way any choice nuggets that you think we should feature in future editions.
I do my best to keep, if not abreast of, then at least aware of what’s going on in the world of data. That includes RDBMS, event streaming, stream processing, open source data projects, data engineering, object storage, and more. If you’re interested in the same, then you might find this blog useful, because I’m sharing my sources :)
Let’s not bury the lede: it was DNS. However, unlike the meme ("It’s not DNS, it’s never DNS. It was DNS"), I didn’t even have an inkling that DNS might be the problem.
I’m writing a new blog about streaming Apache Kafka data to Apache Iceberg and wanted to provision a local Kafka cluster to pull data from remotely. I got this working nicely just last year using ngrok to expose the broker to the interwebz, so figured I’d use this again. Simple, right?
Nope.
After a break from using AWS I had reason to reacquaint myself with it today, and did so via the CLI. The AWS CLI is pretty intuitive and has a good help text system, but one thing that kept frustrating me was that after closing the help text, the screen cleared—so I couldn’t copy the syntax out to use in my command!
The same thing happened when I ran a command that returned output: the screen cleared.
Here’s how to fix either, or both, of these issues.
Note: This post originally appeared on the Decodable blog.
I never meant to write this blog. I had a whole blog series about Flink SQL lined up…and then I started to write it and rapidly realised that one’s initial exposure to Flink and Flink SQL can be somewhat, shall we say, interesting. Interesting, as in the curse, "may you live in interesting times". Because as wonderful and as powerful as Flink is, it is not a simple beast to run for yourself, even as a humble developer just trying out some SQL.
Note: This post originally appeared on the Decodable blog.
Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your hosts and esteemed curators of said content are Gunnar Morling and Robin Moffatt (your editor-in-chief for this edition). Feel free to send our way any choice nuggets that you think we should feature in future editions.
At this year’s Kafka Summit I’m planning to continue the tradition of going for a run (or walk) with anyone who’d like to join in. This started back at Kafka Summit San Francisco in 2019 over the Golden Gate Bridge and has continued since then. Whilst London’s Docklands might not offer quite the same experience, it’ll be fun nonetheless.
This year Kafka Summit London includes a dedicated track for talks about Apache Flink. This reflects the continued rise in interest in and use of Apache Flink within the streaming community, as well as the focus that Confluent (the hosts of Kafka Summit) has on it.
I’m looking forward to being back at Kafka Summit. I will be speaking on Tuesday afternoon, room hosting on Wednesday morning, and hanging out at the Decodable booth in between too.
Here’s a list of all the Flink talks, including the title, time, and speaker. You can find more details, and the full Kafka Summit agenda, here.
Note: This post originally appeared on the Decodable blog.
The SQL Gateway in Apache Flink provides a way to run SQL in Flink from places other than the SQL Client. This includes using a JDBC driver (which opens up a multitude of clients), a Hive client via the HiveServer2 endpoint, and calling the REST endpoint directly.
Note: This post originally appeared on the Decodable blog.
I will wager you half of my lottery winnings from 2023[1] that you’re going to encounter this lovely little error at some point on your Flink SQL journey:
Note: This post originally appeared on the Decodable blog.
Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your hosts and esteemed curators of said content are Gunnar Morling and Robin Moffatt (your editor-in-chief for this edition). Feel free to send our way any choice nuggets that you think we should feature in future editions.
Note: This post originally appeared on the Decodable blog.
In the previous blog post I looked at the role of catalogs in Flink SQL, the different types, and some of the quirks around their configuration and use. If you are new to Flink SQL and catalogs, I would recommend reading that post just to make sure you’re not making some of the same assumptions that I mistakenly did when looking at this for the first time.
Note: This post originally appeared on the Decodable blog.
When you’re using Flink SQL, you’ll run queries that interact with objects.
An INSERT against a TABLE, a SELECT against a VIEW, for example.
These objects are defined using DDL—but where do these definitions live?
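By way of a hedged illustration of where that question leads (all names below are hypothetical), the answer hinges on the catalog in use: with Flink’s default in-memory catalog a table definition lives only for the duration of the session, whereas a persistent catalog such as Hive stores the DDL durably.

```sql
-- Hypothetical names throughout. With the default in-memory catalog,
-- this table definition disappears when the SQL session ends.
CREATE TABLE orders (
  order_id   BIGINT,
  status     STRING,
  order_date TIMESTAMP(3)
) WITH (
  'connector' = 'datagen'
);

-- A persistent catalog (here, Hive) stores definitions in the Hive
-- Metastore, so they survive session restarts.
CREATE CATALOG my_hive WITH (
  'type'          = 'hive',
  'hive-conf-dir' = '/opt/hive/conf'
);

USE CATALOG my_hive;
SHOW TABLES;
```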
At Decodable we migrated our docs platform onto Antora. I wrote previously about my escapades in getting cross-repository authentication working using Personal Access Tokens (PATs). These are fine for a single user, but they’re tied to that user, which isn’t a good practice for deployment in this case.
In this article I’ll show how to use GitHub Apps and Installation Access Tokens (IATs) instead, and go into some detail on how we’ve deployed Antora. Our GitHub repositories are private, which makes it extra gnarly.