Readers of a certain age and RDBMS background will probably remember the Northwind, or HR, or OE databases - or quite possibly not just remember them but still be using them. Hardcoded sample data is fine, and it’s great for repeatable tutorials and examples - but it’s boring as heck to build an example on the same data set for the 100th time.
I’ve written before about one of my favourite resources for mocking data, Mockaroo, and how you can even use it to stream mock data into Kafka. Other mock data generators for Kafka include kafka-connect-datagen and Voluble.
Sometimes though, you just want some real, live, warts-and-all data. Fortunately, there has been a real shift among governments and public bodies in recent years towards open data. Here is a list of some of my (UK-centric) resources. Many have a mix of live and static datasets.
- Northern Data Hub - Bradford City Council data, including the car park live stream that I used in this talk
- Data Mill North - 685 published datasets from Leeds City Council
- data.gov.uk - Huge listing of open data provided by the UK government
- UK Environment Agency flood-monitoring API - this is one of my favourites, because not only do you get a live feed of river levels from around the UK, you get to make awful puns about streams (geddit?!)
- Transport for London (TfL) - Great source of data about the capital’s transport system, including lots of live feeds
- Network Rail - a nice feed of data all about the UK rail network. I had fun with this data here :)
What are your go-to sources for real data? Let me know and I’ll add them to this list.