Reflections of a Developer on LLMs in January 2026

Published in AI, Stumbling Into AI, Claude Code at https://rmoff.net/2026/01/27/reflections-of-a-developer-on-llms-in-january-2026/

Funnily enough, Charles Dickens was talking about late 18th century Europe rather than the state of AI and LLMs in 2026, but here goes:

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair.

For the last few weeks I’ve been coming back to this quotation, again and again. It is the best of times (so far) for AI—you can literally describe an idea for a program or website, and it’s generated for you. Hallucinations are becoming fewer. This is so much more than simply guessing the next word. Honestly, it’s a sufficiently advanced technology that really is indistinguishable from magic (with apologies to Arthur C. Clarke). Whether I’d call this the age of wisdom…I’m not sure yet ;)

But at the same time… it is the worst of times, the age of foolishness, the season of darkness. Bot-farms spewing divisive nonsense all over social media no longer need to copy and paste their false statements in a way that's easily spotted; instead they can write custom text at scale whilst still giving the illusion of a real person behind each fake account. Combine human greed with the speed at which LLMs can generate content and you have an infinite flow of slop spurting all over the internet like a farmer's muck spreader gone awry. AI voice agents are getting better and are being used to scam people with realistic, targeted calls that would previously have been uneconomical to make at the scale necessary to reap a reward. AI-generated pictures are being used to create hoaxes and to flood social media with dangerous rage-bait.

Baby? There went the bathwater 🔗

It might be the best & worst of times, but that doesn’t mean you have to pick sides.

Having lived through the advent of cloud computing through to where it is now, I can see real parallels in how developers in the tech industry are approaching AI. Some, particularly vendors & VCs, are "all in". Others believe it's a fad or a straight-up con and will give you a list of reasons why. Both extremes are utterly and completely wrong.

If you’re the kind of Bare Metalsson character who believed the cloud was nonsense (iT’s JuSt SoMeOnE eLsE’s CoMpUtEr!11!!) and took pride in racking your own servers (each of which has its own cute name), you’re probably also burying your head in the sand when it comes to using LLMs with cries of BuT It HaLlUcInAtEs AnD yOu CaN’t TrUsT iT!!11. And, just as running a homelab with servers and containers named after Star Wars characters is fun but you wouldn’t use the same approach at work, refusing to acknowledge that AI today has the potential to make you more productive as a developer starts to look somewhat childish or irresponsible.

Just because AI makes shit up sometimes doesn't mean it's never a useful tool for the right job. Strikingly, in the last month or two the list of jobs for which you can use it has suddenly grown drastically. The online chatter has moved from "omg you wouldn't let an LLM code for you" to "omg how do we review all these PRs", because guess what: all of a sudden people are letting LLMs generate code for them.

AI, and specifically LLMs, is a valuable tool for developers, and one that we need to recognise if we're not to get left behind.

LLMs are a tool that is evolving…rapidly 🔗

Picture a capuchin monkey sat on its haunches, using a stone to crack open a nut. Rudimentary, but effective. Would we as developers use a stone when we needed a hammer to bang in a nail? No, that would be stupid: we use the right tool for the job, of course. The hammer is an evolution of the crude stone, and we use it because it's the best tool for the job. But once the hammer drill came along, did we cling to our manual hammer when we had a nail to bang into a brick wall? Again, no, that would be stupid. We want to use the best tool for the job.

It's the same evolution of tooling happening in AI. LLMs are a tool. Magical, bamboozling, at times hilariously wrong; but a tool that is evolving not over centuries or longer, but over weeks and months.

I’m just talking about developer productivity; nothing deeper 🔗

Some people fundamentally object to LLMs on principle, citing their use of resources or their threat to mankind. Personally, I believe the cat is out of the bag and the horse has bolted…we're way past that. Pandora's box is open, and you and I are not shutting it.

What I would observe is that if you’re working in IT, and you’re not already adopting AI and understanding what it can (and can’t) do for you, you might find yourself with a lot more time to discuss these opinions alongside the hansom cab drivers who figured that the motor engine was a fad and stuck with their horses.

Put somewhat more confrontationally: you may as well be against the internet, or the combustion engine, or atomic energy. All have awful uses and implications; all also serve a role that cannot be overstated. What LLMs are enabling is truly of seismic impact, and I cannot fathom a path forward in which they do not continue to be central to how we do things with computers.

Appeal to authority 🔗

Not convinced by my reasoning above? How about these folk:

Not a fan of DHH? How about Charity Majors:

this year was for AI what 2010 was for the cloud: the year when AI stopped being satellite, experimental tech and started being the mainstream, foundational technology. At least in the world of developer tools. It doesn’t mean there isn’t a bubble. Of COURSE there’s a fucking bubble. Cloud was a bubble. The internet was a bubble. Every massive new driver of innovation has come with its own frothy hype wave. But the existence of froth doesn’t disprove the existence of value.

To those of you who are deeply pessimistic around the use of AI in software delivery, the old quote from John Maynard Keynes comes to mind:

"The market can remain irrational longer than you can remain solvent".

For a considered look at the uses of LLMs, Bryan Cantrill wrote an excellent RFD: Using LLMs at Oxide

Read the above linked articles, and also check out Scott Werner’s post "The Only Skill That Matters Now" which puts it even more clearly into focus, with a nice analogy about how "skating to the puck" is no longer a viable strategy. The long and short of it is that the rate of change in AI means you have no idea where the puck will even be.

The Junior Developer Analogy Holds 🔗

I read an article a while back that I found again here, in which a hospital consultant described their view of LLMs thus:

"Think of it as the most brilliant, talented, often drunk intern you could imagine,"

This was in May 2023 (eons ago, in LLM years).

As an end user of LLMs, I think this mental model really does work. If you're a senior+ developer, think of an LLM as a very eager junior developer working for you. They're bright-eyed and bushy-tailed, and goddamnit they talk too much, don't listen enough, and make stupid mistakes. But…give them a job to do, point them in the right direction, and iterate with them under close supervision…and suddenly you're finding yourself a lot more productive. Tutored well, a junior developer becomes a force-multiplier, a mini-me.

A common instinct amongst inexperienced senior+ developers tasked with looking after a junior can unfortunately be "I've not got time to show them this, I'll do it myself". As any decent developer knows, that's a short-sighted and flawed way of developing others (as well as oneself). Mentoring and teaching and nurturing juniors is one step back, two steps forward. And…the same goes for an LLM. Do you have to keep telling them the same thing more than once? Yes. Do they write code that drives you into fits of rage with its idiocy and overcomplexity? Yes. Do they improve each time, and ultimately free up more time for you to think about the bigger picture of system design and implementation? Yes.

Working with Claude Code over the past few weeks really has got me convinced that we’ve now taken a step forward where time invested in learning how to use it (because there is a learning curve) is time that’s well spent.

Previously, using an LLM was not much more than typing explain nuclear fission in the style of peter rabbit (or various cargo-culting "prompt engineering" techniques). Now you have to learn about context windows and the magical file called CLAUDE.md and prompting to get the most out of it for coding, and that’s ok. Some tools are simple (pick up a hammer and hit something) and others require more understanding (I’m not using a chainsaw anytime soon without training on it first).

Where the analogy falls down 🔗

Junior developers are humans. They get tired, they need rest breaks, they need feeding, and at some point they want to go home. LLMs, on the other hand, will keep on going so long as you keep feeding them tokens.

The impact of this on you as their boss is substantial. You might task your junior developer with a piece of work and they'll return to you later that day, perhaps with a few interruptions to clarify a point. Claude Code, on the other hand, is like an eager puppy, bounding back and forth and demanding your attention every minute or so. I'm still trying to work out how to balance the dopamine hit of each interaction bringing another astounding chunk of functionality with the impact that the rapid context switching has on my brain.

Interacting with Claude Code feels a bit like the hit we get from scrolling short video feeds. One more prompt…one more video…


Because the feedback loop is so fast, it’s also very easy to get drawn down a rabbit hole of changes and either end up on a side-quest from one’s intended task, or lose sight of the big picture and end up meandering aimlessly through some Frankenstein-like development path that feels fruitful because of the near-instantaneous results but which is ultimately flawed.

Project report: Claude and [A]I play at being webdevs 🔗

I used to speak at a lot of conferences and meetups, and published my talks on a site called noti.st. It’s free to use, but you could pay for bells and whistles including a custom domain, which I duly did: talks.rmoff.net.

My background is databases and SQL; I can spell HTML (see, I just did) and am aware of CSS and can fsck about in the Chrome devtools to fiddle with the odd detail…but basically frontend webdev is completely beyond me. That meant I was more than happy to pay someone else to host my talks for me on an excellent platform.

This was a few years ago, and the annual renewal of the plan was starting to bite—over £100 for what was basically static content that I barely ever changed (I’ve only done three talks since 2021). So I decided to see what Claude Code/Opus 4.5 could do, and signed up for the £18/month "Pro" plan.

The way Claude Code works is nothing short of amazing. You use natural language to tell it what to do…and it does it.

I started off by prompting it with something like this:

I would like to migrate my noti.st-based site at https://noti.st/rmoff/ to a static site like my blog at rmoff.net which is built on hugo.

What I actually said is kinda irrelevant, because it’s not precise. It doesn’t care about typos; it captures the intent.

Claude Code then poked around the two sites and probably asked me some questions (did I want to import all the content, what kind of style, etc.), and then spat out a Python script to do a one-time ingest of all the content from noti.st. After seeking permission it then ran the script, debugging the errors that were thrown until it was happy it had a verbatim copy of the data.

Along the way it'd report in on what it was doing and I could steer it—much the same way you would a junior developer. For example, on noti.st a slide deck's PDF is exploded out into individual images so that a user can browse it online. This meant a crap-ton of images which I didn't care about, but Claude Code assumed I would want them and so started grabbing them.

Claude then proceeded to build and populate a site to run locally. There were plenty of mistakes, as well as plenty of yak-shaving ("hmm can you move this bit to there, and change the shade of that link there"). This can be part of the danger with Claude. It will never roll its eyes and sigh at you when you ask for the hundredth amendment to your original spec, so it’s easy to get sucked into endless fiddling and tweaking.

I found I quickly burnt through my Pro token allowance, which actually served well as a gatekeeper on my time, forcing me to step back until the tokens were refreshed. After four early-morning/late-night sessions around my regular work, I cut over my DNS, and you can see the results at https://talks.rmoff.net/.


The key things that Claude Code did that I couldn't get ad hoc chat sessions (or even Cursor) to do last year include:

  • Planning out a full project like this one, from the overview down to every detail

  • Talking the talk (writing the code) and walking the walk (independently running the code, fixing errors, evaluating logic problems, etc)

  • Rapidly iterating over design ideas, including discussing them and not just responding one-way to instructions

  • Discussing deployment options, including working through challenges given the cumulative size of the PDFs

  • Explaining and building and executing and testing the deployment framework

Before the sceptics jump in with their well, ackchuallyyy: my point is not that I couldn't theoretically have done this without Claude. It's that it took, cumulatively, perhaps eight hours, and half of that will have been learning how to interact effectively with Claude. It's that it's a single terminal into which one types; that's it. No explosion of tabs. No rabbit-holes of threads trying to figure this stuff out. One place. That fixes its own errors. That writes code that you could never have written without a serious investment of time.

Would I apply for a frontend engineering job? Heck no!
Does my new site stand up to scrutiny? Probably not.
Will real frontend devs look at the code and be slightly sick in their mouths? Perhaps.

Does this weaken my point? Not in the slightest!

£18-worth of Claude Code (less, if you pro-rata it over the month) and I’ve saved myself an ongoing annual bill of £100, built a custom website that looks exactly as I want it, has exactly the functionality that I want—oh, and was a fuck-ton of fun to build too :)

Does it matter that I didn’t write the code and don’t understand it? 🔗

Not whilst I have access to Claude ;)

I realise that in reading this the choler will be rising in some seasoned software engineers. After all, who is this data engineer poncing around pretending to build websites?

And that’s perhaps the crux of it: I’m a data engineer, branching out into something I couldn’t do before, courtesy of Claude.

I would definitely use Claude to help me write SQL queries and generate DDL, but I’d be damned if I’d put my name to a pull request with a single byte of code that I couldn’t explain—because that’s my job.

I like Oxide’s words here:

However powerful they may be, LLMs are but a tool, ultimately acting at the behest of a human. Oxide employees bear responsibility for the artifacts we create, whatever automation we might employ to create them.

So I can have fun building a website that’s just my personal site and only on me if it fails. But if I’m writing code as a professional for my job, it’s on me to make sure that it’s code I can put my name to.

Claude tips 🔗

Playwright 🔗

If you’re doing any kind of webdev work, follow Kris Jenkins' tip and use Playwright so that Claude can "see" as it develops. You can manually take screenshots and paste those into Claude too if you want (including ones you’ve annotated with observations and instructions) but in general and particularly for regression testing, Playwright is an excellent addition.

Because this is Claude, you don’t need to actually know how to configure Playwright or run its tests, or anything like that. You just tell Claude: "Use Playwright to test the changes". And it does. Oh, and it’ll install it for you if you don’t have it already.

🛎️ Ding Dong 🔔 🔗

Claude will sometimes ask for permission to do something, or tell you it’s finished its current task. If you’ve got it sat in a terminal window behind your other work you may not realise this, so adding a sound prompt can be useful. In your ~/.claude/settings.json include:

  "hooks": {
    "Notification": [
      { "hooks": [ { "type": "command", "command": "afplay /System/Library/Sounds/Funk.aiff" } ] }
    ],
    "Stop": [
      { "hooks": [ { "type": "command", "command": "afplay /System/Library/Sounds/Ping.aiff" } ] }
    ]
  }

Obviously, you can waste a lot of time customising it to use just the right sound effect from your favourite 1980s arcade game.

You might not want to always do this; see my observation above about context switching and continuous interruptions.
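One gotcha: if the settings file doesn't parse cleanly, the hooks won't fire. After editing it, a quick syntax check saves head-scratching. A minimal sketch, assuming python3 is on your PATH:

```shell
# Check that ~/.claude/settings.json is still valid JSON after editing it;
# json.tool exits non-zero on a syntax error.
if python3 -m json.tool ~/.claude/settings.json > /dev/null 2>&1; then
  echo "settings.json: OK"
else
  echo "settings.json: missing or invalid"
fi
```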

Keep an eye on cost 🔗

Depending on how you pay for Claude (fixed plans, or per API calls) you’ll discover sooner or later that it can be quite expensive. You can include the cost of the current session in the status line by adding this to the same config file as above, ~/.claude/settings.json:

  "statusLine": {
    "type": "command",
    "command": "input=$(cat); cwd=$(echo \"$input\" | jq -r '.workspace.current_dir'); tin=$(echo \"$input\" | jq -r '.context_window.total_input_tokens'); tout=$(echo \"$input\" | jq -r '.context_window.total_output_tokens'); mid=$(echo \"$input\" | jq -r '.model.id'); mname=$(echo \"$input\" | jq -r '.model.display_name'); used=$(echo \"$input\" | jq -r '.context_window.used_percentage // \"--\"'); if [[ \"$mid\" == *\"opus\"* ]]; then cost=$(echo \"scale=4; ($tin * 15 + $tout * 75) / 1000000\" | bc); elif [[ \"$mid\" == *\"haiku\"* ]]; then cost=$(echo \"scale=4; ($tin * 0.80 + $tout * 4) / 1000000\" | bc); else cost=$(echo \"scale=4; ($tin * 3 + $tout * 15) / 1000000\" | bc); fi; printf \"\\e[36m◆\\e[0m \\e[1m\\e[96m%s\\e[0m \\e[36m◆\\e[0m \\e[35m%s\\e[0m \\e[36m▸\\e[0m \\e[33mTokens:\\e[0m \\e[32m%'d\\e[0m↓ \\e[34m%'d\\e[0m↑ \\e[36m●\\e[0m \\e[93mCtx Used:\\e[0m \\e[92m%s%%\\e[0m \\e[36m●\\e[0m \\e[1m\\e[31mCost: \\$%s\\e[0m\" \"$mname\" \"$cwd\" \"$tin\" \"$tout\" \"$used\" \"$cost\""
  },

It’ll look something like this:

(screenshot of the status line showing model, directory, token counts, context usage, and cost)

Also check out ccusage, which uses the Claude log data to calculate usage and break it down in different ways; this can help you optimise how you use Claude.

 ╭───────────────────────────────────────────╮
 │                                           │
 │  Claude Code Token Usage Report - Weekly  │
 │                                           │
 ╰───────────────────────────────────────────╯

┌───────────┬────────────────────┬──────────┬──────────┬───────────┬────────────┬─────────────┬───────────┐
│ Week      │ Models             │    Input │   Output │     Cache │ Cache Read │       Total │      Cost │
│           │                    │          │          │    Create │            │      Tokens │     (USD) │
├───────────┼────────────────────┼──────────┼──────────┼───────────┼────────────┼─────────────┼───────────┤
│ 2026      │ - claude-3-5-haiku │   39,694 │   80,640 │ 4,462,577 │ 28,392,013 │  32,974,924 │    $26.35 │
│ 01-25     │ - sonnet-4-5       │          │          │           │            │             │           │
├───────────┼────────────────────┼──────────┼──────────┼───────────┼────────────┼─────────────┼───────────┤

Learn a bit about the models (ask Claude) 🔗

Different Claude models (Opus, Sonnet, Haiku) cost different amounts, and you can optimise your spend by learning a bit about their relative strengths. I found that asking Claude itself was useful: using Opus (the most capable model), describe what you're going to want it to do and ask which model it would recommend. Like all of this LLM malarkey, none of it is absolute, but I found its recommendations useful (i.e. the models it recommended were cheaper and did achieve what I needed them to).

Think of it as having different pairs of running shoes in your closet—different ones are going to be suited to different tasks. You’re not going to wear your $200 carbon-plate running shoes to kick the ball around the park, are you?

Master the tooling 🔗

Go read up on things like:

  • Context windows—what the LLM knows about what you’re doing

  • Context rot—the more that’s in the LLM’s context window, the less effective it can sometimes become

  • CLAUDE.md—where Claude makes a note of what it is you’re building and core principles, toolsets, etc

    • You can get a lot of value by spending some time on this so that you can restart your session when you need to (e.g. to clear the context window) without having Claude 'forget' too much of the basics of what you’ve told it

    • Work with Claude on this file—literally say, look at your CLAUDE.md, I have to keep telling you to do x, how can you remember it better. If you give it permission, it’ll then go and update its own file

  • Use plan mode and accept-change (shift-tab) judiciously. If you just YOLO it and accept changes without seeing the plan you’ll often end up with a very busy fool going in the wrong direction. Claude is your servant (for now) and it’s up to you to boss it around firmly as needs be.

  • Watch out for Claude spinning its wheels—if you see it trying to repeatedly fix something and getting stuck you might be burning a ton of tokens on something that it’s misunderstood or doesn’t actually matter
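To make the CLAUDE.md point concrete, here's the rough shape mine took for the talks site. Entirely illustrative—the file is free-form markdown, and everything below (paths, rules, commands) is an example, not a prescription:

```markdown
# talks.rmoff.net

Static Hugo site holding my conference talks, migrated from noti.st.

## Core principles
- Content lives under `content/talks/`; never edit generated files in `public/`
- Keep the design minimal; no JavaScript frameworks
- PDFs are large: don't commit new ones without asking first

## Toolset
- Hugo for the build; `hugo server` for local preview
- Playwright for visual checks after any layout change

## Things I keep having to repeat
- Dates are rendered as e.g. "27 Jan 2026", not ISO format
```

The "things I keep having to repeat" section is exactly the kind of content that stops a fresh session (with an empty context window) from re-making old mistakes.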

Claude Code is not just about churning out code 🔗

I've been experimenting with a few non-coding examples, in each case pairing Claude with basic-memory and an Obsidian vault.

  • Proofreading my blog (here’s the prompt, if you’re interested; PRs welcome 😉).


    I also have a Raycast AI Preset to do this, but am finding myself more and more reaching for Claude’s terminal window. It works well because I write my blog posts in Asciidoc, which Claude can read and edit directly (if I ask it to).

  • Planning a holiday. Iteratively building up a spec with Claude that captures the requirements of the holiday; it can then help with itineraries and checklists, discuss areas, etc. As with the coding project above, having one window with which to interact is really powerful.

  • Acting as a running coach. Plugging in Garmin and Strava data via MCP, I can capture all of my running and health info and discuss planned workouts with Claude, even weaving in notes from past physio appointments. Obviously I am not following it blindly, but as an exercise (geddit?!) in integration and LLMs, it's pretty fun.

My call to action: FAFO 🔗

That’s it. Go fuck around, and find out.

Exciting things are happening. Yes, the hype and BS are real and nauseating; but that doesn't stop the underlying value being real too.

