
74 Hours in Kansas – Part 1


Long-time readers will react to me saying “I go to a lot of car shows” with an emphatic “DUH!”

Before the blog, when I lived in Phoenix, I attended several Barrett-Jackson auctions, which, if you’re not buying, are just big car shows where every car is for sale. I went to a couple of Copper State Rally shows, where all the cars embarked on thousand-mile tours. The first place I saw an Elise was at an English car show there. Since I got the Elise, I’ve entered the Colorado Concours and the English Motoring Conclave several times. I’ve been to a couple of dozen Cars and Coffee events. I’ve taken tours of restoration shops where I’ve seen multi-million-dollar Bugattis and Ferraris. I’ve seen exotics, muscle cars, race cars, hot rods, antiques, low-riders, motorcycles, tractors, and fire trucks. I’ve never had to drive more than about thirty miles to get to any of these.

So, when a Lotus Colorado member told us about a big car show at McPherson College in McPherson, Kansas, my first thought was, “That’s a long way to go for a car show!” I think it’d be a fun trip to go to the Pebble Beach Concours in Monterey, California. It’s one of the world’s great car shows. Although it’s more than twice as far as McPherson, it’s through some fine scenery and over twisty roads through the Rockies and Sierras. To get to McPherson, it’s eastern Colorado and western Kansas.

And how does the show in McPherson compare to any others I’ve been to? They have a renowned auto restoration curriculum there, and the students entered a car they worked on into the Pebble Beach show and won the top prize. The show is run by the students, and several of the cars on display are student projects.

Club members didn’t express much interest, and I had pretty much decided not to go when Chad called and said he’d drive us in his Maverick rather than the Lotus. I thought, “What the heck,” and said I’d go. We both made it a condition that we include some interesting side trips to sweeten the pot. In nearby Hutchinson, there’s a salt mine you can tour, and there’s the Cosmosphere, a space museum. As a bonus, on the drive back, we can stop at the site of a World War II Japanese internment camp. (A tip of the hat to Jim for his helpful suggestions.)

So that was the plan: salt mine, space museum, car show, and concentration camp.

Thursday was the drive to Hutchinson. There’s not much point in describing the route or the views. After checking in at the hotel, we went to the Salt City Brewing Company for beer and dinner.

Strataca

About a century and a half ago, a man drilled for oil but found salt instead. Today, you descend in a hoist 650 feet down to the mine, where you find over 150 miles of tunnels, a small sample of which you are allowed to explore.

We did the basic tour and added the Lantern Tour, where we were taken deeper into the darkness. The guide compared it to the surface of the moon: no wind, no weather, nothing to disturb the footprints miners made 80 or 90 years ago. It wasn’t worth the effort to haul the miners’ trash back to the surface, so we occasionally came across piles of perfectly preserved trash – cardboard dynamite boxes still like new (but empty of dynamite), newspapers and magazines and those conical water fountain cups looking as if they were discarded yesterday.

Generally, the caverns are fifty feet wide, separated by fifty-foot-wide pillars, making a sort of giant waffle iron. The walls are salt, the ceiling is salt, the floor is salt. It looks like rock, stratified by bands of dark and light. We were told the salt is 95% pure, with some formations reaching 99%. We were also told not to lick the walls. The salt mined here is used on icy roads and as cattle feed. There is red salt in places, but they don’t mine it because cattle won’t eat red salt.

So, what is there to see in a salt mine, other than salt? First, there’s the obvious display of the mining equipment used over the decades, along with helpful videos explaining how the salt was (and still is) mined. After several such exhibits, we turned a corner to find … a Civil Defense shelter! As a child of the 60s, I’m well familiar with the lore. But before now, I’d never seen what someone hiding from nuclear holocaust might eat. I imagined stacks of canned green beans (and was not disappointed to see them), but didn’t realize that crackers, biscuits, and carbohydrate supplements were distributed in giant cans, along with 17-gallon drums of water, complete with instructions to turn the drum into a commode.

Also, because of the constant temperature and lack of humidity, a salt mine is a great place to store things you want to preserve, such as paper documents, computer tapes, and old films and movie memorabilia.

Cosmosphere

Now and then, I come across something that seems out of place. The world’s foremost pre-war Bugatti restoration shop used to be in Berthoud, Colorado, a town so small it has no traffic signals. How did that happen?

The Cosmosphere is a space museum that rivals the Air and Space Museum at the Smithsonian. How did such an impressive museum come to be in Kansas? Florida or Houston would be obvious choices. Huntsville or Pasadena, maybe. But Hutchinson, Kansas? Go figure.

It concentrates on space, not aircraft, so it’s not as big, but the collection of space artifacts exceeds what I saw at the Smithsonian. Some of the exhibits here are on loan from the Smithsonian, and some are from private collections, but much of what’s on display at the Cosmosphere is from their own collection.

There are a few aircraft here, like the SR-71 Blackbird. How do you get your SR-71 inside a museum? That’s a trick question: you build the museum around the plane.

Their exhibits cover the entire history of manned spaceflight, from the origins in Nazi weapons (the V2 was the basis for the Redstone rocket) to a SpaceX Merlin engine. I was particularly impressed by the quantity of Soviet gear here. I was tempted to joke that this is the entire collection of Soviet space capsules that didn’t blow up on launch or on landing.

I was surprised to learn that the Cosmosphere restores these artifacts. It’s not like restoring an eighty-year-old car that can be driven on the road – the spacecraft here in Kansas are only restored to look functional. Nobody is going to fire up that rocket engine or launch this capsule. Still, how do you go about getting a job as a restorer of antique Soviet spacecraft?

These guys restored a V2 they found in a barn. It’s fairly common to hear of rare old cars found in barns, but a V2? Incredible. And it’s not just “barn finds”. They have the Liberty Bell 7 capsule. It flew the second manned mission of the Mercury program, a sub-orbital flight carrying Gus Grissom. The capsule sank to the bottom of the ocean after they got Grissom out. It was recovered from the ocean floor in 1999 and was restored by the museum. Amazing.

I assume the name “Cosmosphere” is a play on Cosmonaut. I recently learned the origin of the word “Cosmonaut”. I thought it was simply from “cosmos,” an alternative name for the universe. Instead, it comes from “cosmism” – a Russian philosophical movement integrating science, religion, and metaphysics into a unified worldview and characterized by the belief in humanity’s cosmic destiny, the potential for immortality, and the use of technological advancements to achieve control over nature and explore space. Believers in cosmism imagined immortality for everyone and the resurrection of all past people. (Now I can’t help but wonder if Philip José Farmer looked into it before writing To Your Scattered Bodies Go.)

After exploring space technology, we continued our exploration of local brew pubs. Tonight it was Sandhills Brewing. As a fan of fruit sours and goses, I liked their selection of beers. No kitchen here, but the food truck outside had a selection of tasty foods.

That’s it for Friday.



AI can rewrite open source code—but can it rewrite the license, too?


Computer engineers and programmers have long relied on reverse engineering as a way to copy the functionality of a computer program without copying that program's copyright-protected code directly. Now, AI coding tools are raising new questions about how that "clean room" rewrite process plays out legally, ethically, and practically.

Those issues came to the forefront last week with the release of a new version of chardet, a popular open source Python library for automatically detecting character encoding. The library was originally written by coder Mark Pilgrim in 2006 and released under an LGPL license that placed strict limits on how it could be reused and redistributed.
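For readers who haven't used it, chardet's core API is a single detect() call that guesses the encoding of a byte string. A minimal usage sketch (the exact result depends on the input and the library version):

```python
import chardet

raw = "Déjà vu".encode("latin-1")
result = chardet.detect(raw)
# Returns a dict along the lines of {'encoding': ..., 'confidence': ..., 'language': ...}
print(result["encoding"], result["confidence"])
```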

Dan Blanchard took over maintenance of the repository in 2012 but waded into some controversy with the release of version 7.0 of chardet last week. Blanchard described that overhaul as "a ground-up, MIT-licensed rewrite" of the entire library built with the help of Claude Code to be "much faster and more accurate" than what came before.

Speaking to The Register, Blanchard said that he has long wanted to get chardet added to the Python standard library but that he didn't have the time to fix problems with "its license, its speed, and its accuracy" that were getting in the way of that goal. With the help of Claude Code, though, Blanchard said he was able to overhaul the library "in roughly five days" and get a 48x performance boost to boot.

Not everyone has been happy with that outcome, though. A poster using the name Mark Pilgrim surfaced on GitHub to argue that this new version amounts to an illegitimate relicensing of Pilgrim's original code under a more permissive MIT license (which, among other things, allows for its use in closed-source projects). As a modification of his original LGPL-licensed code, Pilgrim argues this new version of chardet must also maintain the same LGPL license.

"Their claim that it is a 'complete rewrite' is irrelevant, since they had ample exposure to the originally licensed code (i.e., this is not a 'clean room' implementation)," Pilgrim wrote. "Adding a fancy code generator into the mix does not somehow grant them any additional rights. I respectfully insist that they revert the project to its original license."

Whose code is it, anyway?

In his own response to Pilgrim, Blanchard admits that he has had "extensive exposure to the original codebase," meaning he didn't have the traditional "strict separation" usually used for "clean room" reverse engineering. But that tradition was set up for human coders as a way "to ensure the resulting code is not a derivative work of the original," Blanchard argues.

In this case, Blanchard said that the new AI-generated code is "qualitatively different" from what came before it and "is structurally independent of the old code." As evidence, he cites JPlag similarity statistics showing that a maximum of 1.29 percent of any chardet version 7.0.0 file is structurally similar to the corresponding file in version 6.0.0. Comparing version 5.2.0 to version 6.0.0, on the other hand, finds up to 80 percent similarity in some corresponding files.
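JPlag itself is a Java tool, but the underlying idea of scoring how similar two source files are can be sketched with Python's standard-library difflib. This is a crude character-level stand-in for JPlag's token-based structural comparison, not what Blanchard actually ran, and the file paths below are hypothetical:

```python
from difflib import SequenceMatcher
from pathlib import Path

def file_similarity(path_a: str, path_b: str) -> float:
    """Return a rough 0-1 similarity ratio between two source files."""
    a = Path(path_a).read_text()
    b = Path(path_b).read_text()
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical checkouts of two chardet releases:
score = file_similarity("chardet-6.0.0/universaldetector.py",
                        "chardet-7.0.0/detector.py")
print(f"{score:.2%} similar")
```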

"No file in the 7.0.0 codebase structurally resembles any file from any prior release," Blanchard writes. "This is not a case of 'rewrote most of it but carried some files forward.' Nothing was carried forward."

Blanchard says starting with a "wipe it clean" commit and a fresh repository was key in crafting fresh, non-derivative code from the AI. Credit: Dan Blanchard / Github

Blanchard says he was able to accomplish this "AI clean room" process by first specifying an architecture in a design document and writing out some requirements to Claude Code. After that, Blanchard "started in an empty repository with no access to the old source tree and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code."

There are a few complicating factors to this straightforward story, though. For one, Claude explicitly relied on some metadata files from previous versions of chardet, raising direct questions about whether this version is actually "derivative."

For another, Claude's models are trained on reams of data pulled from the public Internet, which means it's overwhelmingly likely that Claude has ingested the open source code of previous chardet versions in its training. Whether that prior "knowledge" means that Claude's creation is a "derivative" of Pilgrim's work is an open question, even if the new code is structurally different from the old.

And then there's the remaining human factor. While the code for this new version was generated by Claude, Blanchard said he "reviewed, tested, and iterated on every piece of the result using Claude. ... I did not write the code by hand, but I was deeply involved in designing, reviewing, and iterating on every aspect of it." Having someone with intimate knowledge of earlier chardet code take such a heavy hand in reviewing the new code could also have an impact on whether this version can be considered a wholly new project.

Brave new world

All of these issues have predictably led to a huge debate over the legality of chardet version 7.0.0 across the open source community. "There is nothing 'clean' about a Large Language Model which has ingested the code it is being asked to reimplement," Free Software Foundation Executive Director Zoë Kooyman told The Register.

But others think the "Ship of Theseus"-style arguments that often emerge in code licensing dust-ups don't apply as much here. "If you throw away all code and start from scratch, even if the end result behaves the same, it’s a new ship," open source developer Armin Ronacher said in a blog post analyzing the situation.

The legal status of AI-generated code is still largely unsettled. Credit: Getty Images

Old code licenses aside, using AI to create new code from whole cloth could also create its own legal complications going forward. Courts have already said that AI can't be the inventor on a patent or the copyright holder on a piece of art but have yet to rule on what that means for the licensing of software created in whole or in part by AI. The issues surrounding potential "tainting" of an open source license with this kind of generated code can get remarkably complex remarkably quickly.

Whatever the outcome here, the practical impact of being able to use AI to quickly rewrite and relicense many open source projects—without nearly as much effort on the part of human programmers—is likely to have huge knock-on effects throughout the community.

"Now the process of rewriting is so simple to do, and many people are disturbed by this," Italian coder Salvatore "antirez" Sanfilippo wrote on his blog. "There is a more fundamental truth here: the nature of software changed; the reimplementations under different licenses are just an instance of how such nature was transformed forever. Instead of combating each manifestation of automatic programming, I believe it is better to build a new mental model and adapt."

Others put the sea change in more alarming terms. "I'm breaking the glass and pulling the fire alarm!" open source evangelist Bruce Perens told The Register. "The entire economics of software development are dead, gone, over, kaput! ... We have been there before, for example when the printing press happened and resulted in copyright law, when the scientific method proliferated and suddenly there was a logical structure for the accumulation of knowledge. I think this one is just as large."


Your Data is Made Powerful By Context (so stop destroying it already) (xpost)


In logs as in life, the relationships are the most important part. AI doesn’t fix this. It makes it worse.

(cross-posted)

After twenty years of devops, most software engineers still treat observability like a fire alarm — something you check when things are already on fire.

Not a feedback loop you use to validate every change after shipping. Not the essential, irreplaceable source of truth on product quality and user experience.

This is not primarily a culture problem, or even a tooling problem. It’s a data problem. The dominant model for telemetry collection stores each type of signal in a different “pillar”, which rips the fabric of relationships apart — irreparably.

Your observability data is self-destructing at write time

The three pillars model works fine for infrastructure[1], but it is catastrophic for software engineering use cases, and will not serve for agentic validation.

But why? It’s a flywheel of compounding factors, not just one thing, but the biggest one by far is this:

✨Data is made powerful by context✨

The more context you collect, the more powerful it becomes

Your data does not become linearly more powerful as you widen the dataset, it becomes exponentially more powerful. Or if you really want to get technical, it becomes combinatorially more powerful as you add more context.

I made a little Netlify app here where you can enter how many attributes you store per log or trace, to see how powerful your dataset is.

  • 4 fields? 6 pairwise combos, 15 possible combinations.
  • 8 fields? 28 pairwise combos, 255 possible combinations.
  • 50 fields? 1,225 pairwise combos, 1.1 quadrillion (2^50 - 1) possible combinations, as seen in the screenshot below.
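Those figures are just the number of two-field pairs (n choose 2) and the count of non-empty subsets of n fields (2^n - 1). A quick Python check of the arithmetic (my sketch, not the app's code):

```python
from math import comb

for n in (4, 8, 50):
    pairwise = comb(n, 2)  # two-field combinations: C(n, 2)
    subsets = 2**n - 1     # every non-empty subset of the n fields
    print(f"{n} fields: {pairwise:,} pairwise combos, {subsets:,} combinations")
```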

When you add another attribute to your structured log events, it doesn’t just give you “one more thing to query”. It gives you new combinations with every other field that already exists.

The wider your data is, the more valuable the data becomes. Click on the image to go futz around with the sliders yourself.

Note that this math is exclusively concerned with attribute keys. Once you account for values, the precision of your tooling goes higher still, especially if you handle high cardinality data.

Data is made valuable by relationships

“Data is made powerful by context” is another way of saying that the relationships between attributes are the most important part of any data set.

This should be intuitively obvious to anyone who uses data. How valuable is the string “Mike Smith”, or “21 years old”? Stripped of context, they hold no value.

By spinning your telemetry out into siloes based on signal type, the three pillars model ends up destroying the most valuable part of your data: its relational seams.

AI-SRE agents don’t seem to like three pillars data

I posted something on LinkedIn yesterday and got a pile of interesting comments. One came from Kyle Forster, founder of an AI-SRE startup called RunWhen, who linked to an article he wrote called “Do Humans Still Read Logs?”

Humpty Dumpty traced every span, Humpty Dumpty had a great plan.

In his article, he noted that less than 30% of their AI SRE tool's queries went to "traditional observability data", i.e. metrics, logs, and traces. Instead, they used the instrumentation generated by other AI tools to wrap calls and queries. His takeaway:

Good AI reasoning turns out to require far less observability data than most of us thought when it has other options.

My takeaway is slightly different. After all, the agent still needed instrumentation and telemetry in order to evaluate what was happening. That’s still observability, right?

But as Kyle tells it, the agents went searching for a richer signal than the three pillars were giving them. They went back to the source to get the raw, pre-digested telemetry with all its connective tissue intact. That’s how important it was to them.

Huh.

You can’t put Humpty back together again

I’ve been hearing a lot of “AI solves this”, and “now that we have MCPs, AI can do joins seamlessly across the three pillars”, and “this is a solved problem”.

Mmm. Joins across data siloes can be better than nothing, yes. But they don’t restore the relational seams. They don’t get you back to the mathy good place where every additional attribute makes every other attribute exponentially more valuable. At agentic speed, that reconstruction becomes a bottleneck and a failure surface.

Humpty Dumpty stored all the state, Humpty Dumpty forgot to replicate.

Our entire industry is trying to collectively work out the future of agentic development right now. The hardest and most interesting problems (I think) are around validation. How do we validate a change rate that is 10x, 100x, 1000x greater than before?

I don’t have all the answers, but I do know this: agents are going to need production observability with speed, flexibility, TONS of context, and some kind of ontological grounding via semantic conventions.

In short: agents are going to need precision tools. And context (and cardinality) are what feed precision.

Production is a very noisy place

Production is a noisy, rowdy place of chaos, particularly at scale. If you are trying to do anomaly detection with no a priori knowledge of what to look for, the anomaly has to be fairly large to be detected. (Or else you’re detecting hundreds of “anomalies” all the time.)

But if you do have some knowledge of intent, along with precision tooling, these anomalies can be tracked and validated even when they are exquisitely minute. Like even just a trickle of requests[2] out of tens of millions per second.

Let’s say you work for a global credit card provider. You’re rolling out a code change to partner payments, which are “only” tens of thousands of requests per second — a fraction of your total request volume of tens of millions of req/sec, but an important one.

This is a scary change, no matter how many tests you ran in staging. To test this safely in production, you decide to start by rolling the new build out to a small group of employee test users, and oh, what the hell — you make another feature flag that lets any user opt in, and flip it on for your own account.

You wait a few days. You use your card a few times. It works (thank god).

On Monday morning you pull up your observability data and select all requests containing the new build_id or commit hash, as well as all of the feature flags involved. You break down by endpoint, then start looking at latency, errors, and distribution of request codes for these requests, comparing them to the baseline.
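As a concrete illustration, here is roughly what that first cut could look like over an export of wide events in pandas. Every file, column, and flag name here is hypothetical; real observability tools expose the same operation through their own query interfaces:

```python
import pandas as pd

# One row per request, one column per attribute (a "wide event").
events = pd.read_parquet("events.parquet")

# Hypothetical build id and feature flag column for the new rollout.
is_canary = (events["build_id"] == "pay-svc-7f3c") & events["flag.partner_payments_v2"]
canary, baseline = events[is_canary], events[~is_canary]

# Break down by endpoint; compare latency and errors against the baseline.
report = pd.DataFrame({
    "canary_p95_ms": canary.groupby("endpoint")["duration_ms"].quantile(0.95),
    "baseline_p95_ms": baseline.groupby("endpoint")["duration_ms"].quantile(0.95),
    "canary_error_rate": canary.groupby("endpoint")["error"].mean(),
})
print(report.sort_values("canary_p95_ms", ascending=False))
```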

Hm — something doesn’t seem quite right. Your test requests aren’t timing out, but they are taking longer to complete than the baseline set. Not for all requests, but for some.

Further exploration lets you isolate the affected requests to a set with a particular query hash. Oops… how’d that n+1 query slip in undetected?

You quickly submit a fix, ship a new build_id, and roll your change out to a larger group: this time, it’s going out to 1% of all users in a particular region.

The anomalous requests may have been only a few dozen per day, spread across many hours, in a system that served literally billions of requests in that time.

Humpty Dumpty: assembled, redeployed, A patchwork of features half-built, half-destroyed. “It’s not what we planned,” said the architect, grim. “But the monster is live — and the monster is him.”

Precision tooling makes them findable. Imprecise tooling makes them unfindable.

How do you expect your agents to validate each change, if the consequences of each change cannot be found?[3]

Well, one might ask, how have we managed so far? The answer is: by using human intuition to bridge the gaps. This will not work for agents. Our wisdom must be encoded into the system, or it does not exist.

Agents need speed, flexibility, context, and precision to validate in prod

In the past, excruciatingly precise staged rollouts like these have been mostly the province of your Googles and Facebooks. Progressive deployments have historically required a lot of tooling and engineering resources.

Agentic workflows are going to make these automated validation techniques much easier and more widely used; at the exact same time, agents developing to spec are going to require a dramatically higher degree of precision and automated validation in production.

It is not just the width of your data that matters when it comes to getting great results from AI. There’s a lot more involved in optimizing data for reasoning, attribution, or anomaly detection. But capturing and preserving relationships is at the heart of all of it.

In this situation, as in so many others, AI is both the sickness and the cure[4]. Better get used to it.


1 — Infrastructure teams use the three pillars for one extremely good reason: they have to operate a lot of code they did not write and cannot change. They have to slurp up whatever metrics or logs the components emit and store them somewhere.

2 — Yes, there are some complications here that I am glossing past, ones that start with ‘s’ and rhyme with “ampling”. However, the rich data + sampling approach to the cost-usability balance is generally satisfied by dropping the least valuable data. The three pillars approach to the cost-usability problem is generally satisfied by dropping the MOST valuable data: cardinality and context.

3 — The needle-in-a-haystack is one visceral illustration of the value of rich context and precision tooling, but there are many others. Another example: wouldn’t it be nice if your agentic task force could check up on any diffs that involve cache key or schema changes, say, once a day for the next 6-12 months? These changes famously take a long time to manifest, by which time everyone has forgotten that they happened.

4 — One sentence I have gotten a ton of mileage out of lately: “AI, much like alcohol, is both the cause of and solution to all of life’s problems.”


I don't know if my job will still exist in ten years


In 2021, being a good software engineer felt great. The world was full of software, with more companies arriving every year who needed to employ engineers to write their code and run their systems. I knew I was good at it, and I knew I could keep doing it for as long as I wanted to. The work I loved would not run out.

In 2026, I’m not sure the software engineering industry will survive another decade. If it does, I’m certain it’s going to change far more than it did in the last two decades. Maybe I’ll figure out a way to carve out a lucrative niche supervising AI agents, or maybe I’ll have to leave the industry entirely. Either way, the work I loved is going away.

Tasting our own medicine

It’s unseemly to grieve too much over it, for two reasons. First, the whole point of being a good software engineer in the 2010s was that code provided enough leverage to automate away other jobs. That’s why programming was (and still is) such a lucrative profession. The fact that we’re automating away our own industry is probably some kind of cosmic justice. But I think any working software engineer today is worrying about this question: what will be left for me to do, once AI agents have fully diffused into the industry?

The other reason it’s unseemly is that I’m probably going to be one of the last to go. As a staff engineer, my work has looked kind of like supervising AI agents since before AI agents were a thing: I spend much of my job communicating in human language to other engineers, making sure they’re on the right track, and so on. Junior and mid-level engineers will suffer before I do. Why hire a group of engineers to “be the hands” of a handful of very senior folks when you can rent instances of Claude Opus 4.6 for a fraction of the price?

Overshooting and undershooting

I think my next ten years are going to be dominated by one question: will the tech industry overshoot or undershoot the capabilities of AI agents?

If tech companies undershoot - continuing to hire engineers long after AI agents are capable of replacing them - then at least I’ll hold onto my job for longer. Still, “my job” will increasingly mean “supervising groups of AI agents”. I’ll spend more time reviewing code than I do writing it, and more time reading model outputs than my actual codebase.

If tech companies tend to overshoot, it’s going to get a lot weirder, but I might actually have a better position in the medium term. In this world, tech companies collectively realize that they’ve stopped hiring too soon, and must scramble to get enough technical talent to manage their sprawling AI-generated codebases. As the market for juniors dries up, the total number of experienced senior and staff engineers will stagnate, driving up the demand for my labor (until the models get good enough to replace me entirely).

Am I being too pessimistic?

Of course, the software engineering industry has looked like it was dying in the past. High-level programming languages were supposed to let non-technical people write computer code. Outsourcing was supposed to kill demand for software engineers in high-cost-of-living countries. None of those prophecies of doom came true. However, I don’t think that’s much comfort. Industries do die when they’re made obsolete by technology. Eventually a crisis will come along that the industry can’t just ride out.

The most optimistic position is probably that somehow demand for software engineers increases, because the total amount of software rises so rapidly, even though you now need fewer engineers per line of software. This is widely referred to as the Jevons effect. Along these lines, I see some engineers saying things like “I’ll always have a job cleaning up this AI-generated code”.

I just don’t think that’s likely. AI agents can fix bugs and clean up code as well as they can write new code: that is, better than many engineers, and improving each month. Why would companies hire engineers to manage their AI-generated code instead of just throwing more and better AI at it?

If the Jevons effect holds, I think we would have to be hitting some kind of AI programming plateau where the tools are good enough to produce lots of code (we’re here already), but not quite good enough to maintain it. This is prima facie plausible. Every software engineer knows that maintaining code is harder than writing it. But unfortunately, I don’t think it’s true.

My personal experience of using AI tools is that they’re getting better and better at maintaining code. I’ve spent the last year or so asking almost every question I have about a codebase to an AI agent in parallel while I look for the answer myself, and I’ve seen them go from hopeless to “sometimes faster than me” to “usually faster than me and sometimes more insightful”.

Right now, there’s still plenty of room for a competent software engineer in the loop. But that room is shrinking. I don’t think there are any genuinely new capabilities that AI agents would need in order to take my job. They’d just have to get better and more reliable at doing the things they can already do. So it’s hard for me to believe that demand for software engineers is going to increase over time instead of decrease.

Final thoughts

It sucks. I miss feeling like my job was secure, and that my biggest career problems would be grappling with things like burnout: internal struggles, not external ones. That said, it’s a bit silly for software engineers to complain when the automation train finally catches up to them.

At least I’m happy that I recognized that the good times were good while I was still in them. Even when the end of zero-interest rates made the industry less cosy, I still felt very lucky to be a software engineer. Even now I’m in a better position than many of my peers, particularly those who are very junior to the industry.

And hey, maybe I’m wrong! At this point, I hope I’m wrong, and that there really is some je ne sais quoi human element required to deliver good software. But if not, I and my colleagues are going to have to find something else to do.

edit: This post got some comments on Hacker News. Some commenters are doubtful, either because they don’t think AI coding is very good, or because they think human creativity/big-picture thinking/attention to detail will always be valuable. Others think ten years is way too optimistic. The top comment repeats the irony that I describe in the third paragraph of this post.

edit: This post also got some comments on the Serbian r/programming subreddit, some excellent comments on Tildes, which is a new one to me, and some more comments on lobste.rs.

LeMadChef (Denver, CO):

My experience using the latest models (in May 2026) is not the same as the author’s. Legacy code is still too high a hurdle for today’s models. I am currently working on a Windows-to-web version of my application, and I’ve been struggling with a complex bit of code that is still not a 1:1 copy of the legacy code. I don’t know how the legacy code works (I don’t know all the edge conditions, but I do have access to the source), and after two weeks I still don’t have a 100% compliant new version of the code that passes simple tests.

Can coding agents relicense open source through a “clean room” implementation of code?


5th March 2026

Over the past few months it’s become clear that coding agents are extraordinarily good at building a weird version of a “clean room” implementation of code.

The most famous version of this pattern is when Compaq created a clean-room clone of the IBM BIOS back in 1982. They had one team of engineers reverse engineer the BIOS to create a specification, then handed that specification to another team to build a new ground-up version.

This process used to take multiple teams of engineers weeks or months to complete. Coding agents can do a version of this in hours—I experimented with a variant of this pattern against JustHTML back in December.

There are a lot of open questions about this, both ethically and legally. These appear to be coming to a head in the venerable chardet Python library.

chardet was created by Mark Pilgrim back in 2006 and released under the LGPL. Mark retired from public internet life in 2011 and chardet’s maintenance was taken over by others, most notably Dan Blanchard who has been responsible for every release since 1.1 in July 2012.

Two days ago Dan released chardet 7.0.0 with the following note in the release notes:

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate!

Yesterday Mark Pilgrim opened #327: No right to relicense this project:

[...] First off, I would like to thank the current maintainers and everyone who has contributed to and improved this project over the years. Truly a Free Software success story.

However, it has been brought to my attention that, in the release 7.0.0, the maintainers claim to have the right to “relicense” the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a “complete rewrite” is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a “clean room” implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.

Dan’s lengthy reply included:

You’re right that I have had extensive exposure to the original codebase: I’ve been maintaining it for over a decade. A traditional clean-room approach involves a strict separation between people with knowledge of the original and people writing the new implementation, and that separation did not exist here.

However, the purpose of clean-room methodology is to ensure the resulting code is not a derivative work of the original. It is a means to an end, not the end itself. In this case, I can demonstrate that the end result is the same — the new code is structurally independent of the old code — through direct measurement rather than process guarantees alone.

Dan goes on to present results from the JPlag tool—which describes itself as “State-of-the-Art Source Code Plagiarism & Collusion Detection”—showing that the new 7.0.0 release has a max similarity of 1.29% with the previous release and 0.64% with the 1.1 version. Other release versions had similarities more in the 80-93% range.

He then shares critical details about his process, highlights mine:

For full transparency, here’s how the rewrite was conducted. I used the superpowers brainstorming skill to create a design document specifying the architecture and approach I wanted based on the following requirements I had for the rewrite [...]

I then started in an empty repository with no access to the old source tree, and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code. I then reviewed, tested, and iterated on every piece of the result using Claude. [...]

I understand this is a new and uncomfortable area, and that using AI tools in the rewrite of a long-standing open source project raises legitimate questions. But the evidence here is clear: 7.0 is an independent work, not a derivative of the LGPL-licensed codebase. The MIT license applies to it legitimately.

Since the rewrite was conducted using Claude Code there are a whole lot of interesting artifacts available in the repo. 2026-02-25-chardet-rewrite-plan.md is particularly detailed, stepping through each stage of the rewrite process in turn—starting with the tests, then fleshing out the planned replacement code.

There are several twists that make this case particularly hard to confidently resolve:

  • Dan has been immersed in chardet for over a decade, and has clearly been strongly influenced by the original codebase.
  • There is one example where Claude Code referenced parts of the codebase while it worked, as shown in the plan—it looked at metadata/charsets.py, a file that lists charsets and their properties expressed as a dictionary of dataclasses.
  • More complicated: Claude itself was very likely trained on chardet as part of its enormous quantity of training data—though we have no way of confirming this for sure. Can a model trained on a codebase produce a morally or legally defensible clean-room implementation?
  • As discussed in this issue from 2014 (where Dan first openly contemplated a license change) Mark Pilgrim’s original code was a manual port from C to Python of Mozilla’s MPL-licensed character detection library.
  • How significant is the fact that the new release of chardet used the same PyPI package name as the old one? Would a fresh release under a new name have been more defensible?

I have no idea how this one is going to play out. I’m personally leaning towards the idea that the rewrite is legitimate, but the arguments on both sides of this are entirely credible.

I see this as a microcosm of the larger question around coding agents for fresh implementations of existing, mature code. This question is hitting the open source world first, but I expect it will soon start showing up in Compaq-like scenarios in the commercial world.

Once commercial companies see that their closely held IP is under threat I expect we’ll see some well-funded litigation.

LeMadChef (Denver, CO):

My opinion is that this is not a true "clean room" implementation. Since the original is open source, the LLM was almost certainly trained on the source code. One cannot claim "clean room" status with a tool that has been trained on the source.

This is like claiming you have a "clean room" implementation of Moby Dick when you have read Moby Dick several times in your life.

John Berkey, “The Sightless Bird”

[Image: An abstracted, mostly white spaceship that looks a little like a bird, complete with a head-like front portion and several swoops and splashes of bright orange that look like plumage.]


LeMadChef (Denver, CO):

Love his art!