“On the hottest and coldest days, when demand for electricity peaks and the price soars, the bitcoin miners either sell power back to providers at a profit or stop mining for a fee, paid by ERCOT. Doing so has become more lucrative than mining itself. In August 2023 Riot collected $32m from curtailing mining and just $8.6m from selling bitcoin.”
The developer tools community remains active in Twitter’s dying light. On Tuesday, August 27th, in a now-deleted tweet and blog post, Matteo Collina boasted benchmarks across several JS frameworks that showed his fastify outperformed alternatives in server-side rendering performance:
There was just one problem: the implementations were written by a large language model. In a paragraph that will live as a cautionary tale, Collina discloses (my emphasis):
For this, we needed to generate a non-trivial sample document that includes a large number of elements, to have a very large page for the test, and consequently have more running time to capture each libraries’ performance. So, we asked an LLM to write some code to draw a spiral into a container using divs[…]
For the authors of the other frameworks, the benchmarks did not pass the smell test. Svelte’s Rich Harris was the first I saw on the scene, and he made a single-line fix that immediately changed the outcome of the benchmarks:
tiles = [...tiles, { x, y, id: idCounter++ }]
became, in Rich’s pull request,
tiles.push({ x, y, id: idCounter++ })
In the Svelte implementation only, spread is used to add tile instances to the tiles array. This has performance implications: on each iteration of the loop that creates tiles, a brand-new array has to be allocated and every existing tile copied into it, which makes tile creation quadratic and hobbles Svelte’s numbers. Using push(), by contrast, doesn’t incur that overhead on each cycle. It preserves and mutates the existing array.
Both cases are, strictly speaking, working code. But the two work very differently.
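To make the difference concrete, here is a minimal sketch of the two patterns in isolation (illustrative only, not the benchmark’s actual code; n and idCounter are stand-ins):

let idCounter = 0;
const n = 10000;

// Spread: allocates a fresh array and copies every existing tile on each
// iteration, so building n tiles performs on the order of n²/2 element copies.
let tiles = [];
for (let i = 0; i < n; i++) {
  tiles = [...tiles, { x: i, y: i, id: idCounter++ }];
}

// push(): mutates the existing array in place, so building n tiles stays linear.
const tilesPushed = [];
for (let i = 0; i < n; i++) {
  tilesPushed.push({ x: i, y: i, id: idCounter++ });
}

Both loops end up with the same data; only the first re-copies everything it has already built on every iteration, which is the overhead Rich’s one-line change removed.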
These are the risks of using LLMs for business. Collina and his colleagues have, at this point, lost any labor savings they might have enjoyed from generating this code. They’re reviewing pull requests from aggrieved maintainers of other projects, they’re responding to messages, they’re doing damage control, making apologies… To say nothing of the sector-wide labor costs of other maintainers correcting the record, and the benchmark code.
All because the LLM isn’t actually able to apply any sort of judgment or discernment. The process for generating this code, for example, did not catch that the Svelte loop was implemented anomalously compared to the approaches in the other frameworks. The different implementations were likely the output of distinct runs of the LLM.
But more than that, the LLM is not considering the performance implications of spread vs. push because the LLM cannot consider anything. It’s a loom that generates patterns of information, endlessly.
Sometimes that ends up as working code. Last month, I used Claude to get a demo application working for a class I was teaching on the social consequences of algorithms. I wanted a toy Twitter whose feed the class could manipulate in real time by applying boosts and penalties to certain kinds of content.
Claude had me up and running in half an hour, and everything worked great. To my shock! I’d never gotten an LLM to generate multiple, interdependent components that worked together successfully in the past. This was very helpful to me, and given my own time constraints, made the difference between a nice idea and a working demo.
But there’s a big difference between code written for toys and demos and code written for accomplishing a business purpose. In the fastify case, we see that the LLM is no match for the judgment of real humans.
There are obvious uses and benefits to a technology that can extrude workable patterns of information, especially code. But uncritical use of that code is going to create a real mess.
There’s a whole job category implied by this LLM future:
People who, like these framework maintainers, apply their experience and judgment to remediating LLM lapses.
This article gives a flavor of "A lie can travel halfway around the world while the truth is putting on its shoes."
Specifically this: "These are the risks of using LLMs for business. Collina and his colleagues have, at this point, lost any labor savings they might have enjoyed from generating this code. They’re reviewing pull requests from aggrieved maintainers of other projects, they’re responding to messages, they’re doing damage control, making apologies… To say nothing of the sector-wide labor costs of other maintainers correcting the record, and the benchmark code."
My students break out into groups of four, each quartet gathered around a collection of pink and green post-it notes.
For the first few minutes, the room remains quiet while each student jots down a personal story about one way that cars have impacted their lives. As the students finish and begin to read their stories to each other, the air fills with the gentle hum of conversation.
The first quartet to finish their discussion approaches the whiteboard. Two of their post-its share a theme: stories about a car getting a family member to the hospital quickly. Those post-its get placed on the whiteboard together. The other two stories—one about working at a car wash as a teenager, the other about getting hit by a car turning right on red while cycling through an intersection—go to separate spots on the whiteboard.
More groups approach the board and attach their post-its. Stories under the same theme go together, creating several little collections. Each collection tends to have one color of post-it (green for perceived positive impact, pink for perceived negative impact). A few cases have mixed colors: for example, a few additional post-its join the one about working at a car wash. In the next step, when I ask students to label each collection, this grouping will receive the label “car-related jobs.” Students either loved or hated the jobs.
For homework, I ask the students to do two things. First, they add digital versions of their post-its to an enormous Google slide, grouped together and labeled just like our whiteboard. Second, they collect more car-related stories from three sources:
Their own experiences: 2 more post-its
Two additional friends or family members: 2 more post-its
A Twitter hashtag—each student selects a different one from a list I compiled to broaden the range of perspectives beyond those available inside the classroom. The hashtags bring in the anecdotes of mechanics, cyclists, pedestrians, traffic engineers, pulmonologists, and a variety of other groups. Again, 2 more post-its.
By the next class period, the Google slide’s septupled post-it population includes several more groups, which we finish labeling together. Our original list includes car-related jobs, emergency medical transport, cycling accidents, parking costs, noise. Now we add several more: pollution, storefront visibility, status symbols, child safety, travel time, carbon footprint, and others.
Next, I offer each student an unlimited number of three types of emojis:
A small fire for “consequences”
A human silhouette for “prevalence”
A snake for “sneakiness”
I ask each student to place the fire emoji next to each label for which the outcomes sound dire to them: most students put these, for example, on the labels with obvious and immediate life or death outcomes. Similarly, they apply the silhouette emoji next to labels that they think affect a lot of people, and the snake next to labels that they think people might not notice or think about relative to the size of the outcome.
We’re doing an impact assessment not unlike the risk assessments I ask of students in software engineering classes. This class, though, focuses not on code, but on what code does to Us—and on how we, its authors, can evaluate and influence that.
The next homework assignment lets each student select a different category label from the slide. They’ll each do some research and come back with a short report comparing the class’s emojified impact assessments to actual literature and findings. Where are we accurate? Where do our estimates not match the information students find?
Over the course of the nine-week quarter, we do this three-session activity three times.
Each time we start from our own anecdotes, then broaden our search to additional perspectives, and finally compare our own assessments to population data and available studies. First, we discuss the impact of cars. Then, we discuss the impact of the consumer internet. Finally, we do generative text and image models—in other words, the thing colloquially termed “AI.”
Three times sounds like overkill, but I have found that prior practice on less nascent, less hotly contested developments like cars and the internet helps defuse conflict on the third topic. After we do the internet, I ask students to reflect on the patterns of impact they’ve seen across the car and internet discussions. Their comparisons draw out some common themes:
Both cars and the internet boast some excellent anecdotal outcomes in isolated cases.
Both cars and the internet boast some positive outcomes for millions.
Both cars and the internet have produced negative outcomes that affect a lot of the same people who experience the positive outcomes, plus some collection more.
Many of the negative outcomes emerge from a failure to predict, and guard against, emergent properties that appear once the technologies become popular and widely accessible.
The direst consequences often affect different people than the most glittery benefits, and those divisions tend to break along class lines.
I also ask students to reflect on the patterns they witness in the difference between their impact assessments before and after their research projects. Again, common themes arise:
Students tend to overestimate the proportion of the population that experiences the positive outcomes relative to the negative outcomes (worth noting here that the opportunity to participate in a graduate program tends to select for class).
Students tend to drastically overindex on their own experience when estimating the impact (positive or negative) of a technology on any given life globally.
When faced with an explicit menu of positive and negative impacts, students tend to rapidly identify new-to-them opportunities to retain positive outcomes while reducing or eliminating negative outcomes.
When I then tell students we’re about to do this exercise a third time for “AI,” they approach with curiosity. They wonder whether the patterns they’ve observed will appear again. They hope to beat their prior collective accuracy at assessing impact in each category label.
A few weeks ago, my day job hosted a training about ChatGPT.
Members of the company expressed both wanton excitement and deep concern about the normalization of its use in the workplace.
Afterwards, a colleague of mine asked me about the discussion. He’d been using the products since GPT-2 in 2019, and he really enjoys them. He wanted to know: “outside of speculating net benefit, why so many concerns with Gen AI? And what are they?”
Then he shared an anecdote: the chatbot had helped his wife uncover the root of some health issues she was having that were otherwise terminal (sic). He also described really enjoying using ChatGPT, personally. From his perspective, sure, there are environmental concerns and social problems with oppression, but these don’t overwhelm the positives.
We talked about the topic for a while.
Full disclosure: I don’t think “net impact” makes sense to estimate at scale.
I understand why businesses like to calculate this. They want to compare cost to revenue to figure out whether they’ll have more money or less money as a result of building their product. But besides a particular, well-circumscribed set of individuals deciding whether a project lives or dies, I don’t have much use for the calculation beyond philosophical navel-gazing.
How could we possibly approach an accurate estimate? We’re not just talking about money here: we’re talking about general impact. That amount doesn’t come in a convenient, single numerical unit. It’s instead delivered in a million different currencies, from health to ecstasy to money to misery. You can’t convert these into like metrics.
And we don’t get to scope our analysis to a relatively homogeneous customer base like companies get to do. Global impact, by definition, includes everyone, and it taunts us with the question “impact to whom?” It ain’t as small a world as the specious aphorism claims. Most of us live within ten miles of neighborhoods in which no one thinks like us, and we’ve never had a single person from there over for board game night, and we never will. By traveling twelve thousand miles east or west, most of us can find ourselves in a land whose language we can’t even begin to comprehend, whose cultural customs would tie us in knots, and whose celebrities we’ve never heard of. The mini-worlds into which we isolate by building networks of people like ourselves—those worlds are small. THE world? Enormous, and capable of absorbing a kaleidoscope of impact patterns whose news will never, ever reach us.
I don’t ask my students to decide whether the arrival of cars, or the internet, or generative models were good or bad for society. In fact, I want them to reach the conclusion that they’re wholly unqualified to do so. I watch them discover the patterns of inaccuracy in their assumptions about impact. Then I watch them look at their post-it maps and brainstorm ways to improve the net impact regardless of where the navel-gazing lands on its current value. For me, this is enough.
But for folks who have spent years in industry, hounded by executives to consider the bottom line first and foremost, I understand the motivation to peg a theoretical impact number relative to zero. My conversation with my colleague starts there.
We begin, like the students, 135 years before ChatGPT: with cars.
A car permits someone, my colleague notes, to be driven to the hospital when found in critical condition and unreachable by an ambulance. It’s true that cars allow individuals to drive other individuals to hospitals. I’d place this among the “excellent anecdotal outcomes” students notice in their assessments. Along with that, the annual death toll of car crashes in the US alone—that “same people who experience the positive outcomes, plus some collection more” group that my students identify—is in the tens of thousands. This does not count pedestrian deaths or cyclist deaths: just people in cars. Then there are deaths caused by comorbidities with asthma, whose rates have risen due to pollution from vehicle exhaust. A dire outcome, and one that affects many who live in polluted areas without the financial resources to access a car themselves. “Okay,” my colleague concedes, “but what about the distribution of food and access to information?” A fair point. As it happens, most commercial freight travels by train or ship or semi-truck, not cars. But then, of course, what about the drive to the grocery store? That last leg requires a car, doesn’t it? For some, yes. For others, no. For still others living in food deserts, the answer might be yes if they had the resources to access a car, which again, many don’t.
Reluctant to get too deep in the weeds and lose the original thread, we move on to discussing the impact of the internet. I like this example because often folks discuss the value of generative models, not in terms of what they can do right now, but in terms of what they theoretically should be able to do 30 years from now. The consumer internet is a little over 30 years old and aptly demonstrates the ambiguity of attempted impact calculations.
I ask my colleague if the internet has improved the world. He replies “You and I are talking right now using it, so yes.” I point out “you and me. The top 2% of privilege. How many have the opportunity to interact like this?” The official number, as of this writing: 5.35 billion. That’s about two-thirds of the population, officially; I couldn’t find definitive details on what amount of activity constitutes using the internet, nor what the network strength distribution looks like over those people.
It’s worth noting that we (myself included), people reading this, generally consider the existence of the internet to be a well-received development. It’s also worth noting that our impression comes from personal reference and survivorship bias. For us to hear someone’s opinion on this, they have to either have access to the internet, or they have to have a relationship to us (which means, given that we have access to the internet and given that we network extremely homogeneously, they almost certainly have access to the internet).
I’m not saying kill the internet and live on subsistence farms, which is what my colleague accused me of next. It’s just that, for a software engineer to estimate a global 30 year positive impact for a nascent technology because that person has had a positive individual experience with the technology, doesn’t mean a lot. This is a key takeaway for my students from the estimation exercise. During both the hashtag part and also the research project part, they tend to learn about impacts of cars, the internet, and generative models that they had no idea existed. They tend to conclude—this is my favorite part—that they might not be in the best position, based on their current knowledge and perspective, to estimate the technology’s global impact. They also tend to conclude that they are more likely to see and experience many of the benefits while avoiding, and even remaining oblivious to, some of the costs.
When we reached the topic of generative models, my colleague put it this way:
“If I distribute the environmental impact to every human being, and then measure the net positive of Gen AI for that individual, given that so much of the world doesn’t even have access to that technology let alone cars, I can see where it’s definitely a net negative.”
Again, setting aside attempts to estimate global net impact, what impacts do we know about from generative models? We know that they bring joy to many who can afford a license and chat to them regularly. We know that thousands of content creators applaud them as a tool for brainstorming (and many, it must be said, outright print and distribute designs generated entirely by the models). We know that the models offer assistance to folks who need to write documents in English for work, who speak English as a second language, or who struggle with writing.
We also know that the model creators’ unfettered use of copyright material without creators’ consent constitutes outright thievery. The environmental impact of these models, climate scientists agree, drives us toward catastrophe. OpenAI’s exploitation of global south labor at a wage of two cents per hour, only to turn around and take ten billion from Microsoft, clearly consolidates global wealth to, basically, 1-3 dudes.
Our ethical struggle with generative models derives in part from the fact that we…sort of can’t have them ethically, right now, to be honest. We have known how to build models like this for a long time, but we did not have the necessary volume of parseable data available until recently—and even then, to get it, companies have to plunder the internet. Sitting around and waiting for consent from all the parties that wrote on the internet over the past thirty years probably didn’t even cross Sam Altman’s mind. But also, I can tell you, based on numbers I have seen as an engineer at a company that asks for user consent before changing almost anything, it wouldn’t work. Too many people would say no. The project would not obtain sufficient consenters for companies to have enough data to build generative models.
On the environmental front, fans of generative model technology insist that eventually we’ll possess sufficiently efficient compute power to train and run these models without the massive carbon footprint. That is not the case at the moment, and we don’t have a concrete timeline for it. Again, waiting around for a thing we don’t have yet doesn’t appeal to investors or executives.
The exploitation part is unfortunately a time-honored tradition in global enterprise. The tech industry specifically gets away with new and more atrocious versions all the time by dint of creating things that regulatory bodies have never seen, let alone regulated. Tech companies move quickly; regulatory bodies move slowly. It took the Sherman Anti-Trust Act almost fifteen years to catch up to Google, and the sentence from that ruling is expected to take another 3-5 years. I understand, and even agree with, calls to regulate what folks have named the AI industry. But gurl, that will not be a quick process.
The invention of the automobile offers prescient precedent here. Though it’s hard to argue against the utility of cars to those who can access them, emergent developments after their introduction dragged the “net impact” meter down. The United States built cities around the assumption that everyone had a car. This forced residents to depend on them—a situation we compare unfavorably to life in cities with reliable transit, bicycle infrastructure, and pedestrian options. As cars got bigger, survival in car crashes favored the larger car, resulting in a car size arms race. Could we have done all this in a better way, one that better leveraged cars for positive impact while avoiding some of the negative emergent properties? I’d like to believe we could have.
Could we theoretically improve the net benefit of any given technical development today if we make efforts to maximize its positive outcomes and mitigate its negative ones? I believe so, and I believe that’s basically the option available to us.
Fine, Chelsea; you want us all to abandon modernity, live in huts, and gather berries?
Jesus, guys, we’ve been over this. Crying luddism every time somebody critiques your pet thing is a super weird look.
I do want practitioners to do three things.
1. I want people to recognize and acknowledge the massive difference between “what this thing has done for me” and “what this thing is doing for everyone.”
I think the internet has made life better for me. I am immensely grateful for that. I work at Mozilla because I believe in the internet and I want to maximize its positive impact. I think it’s okay to acknowledge that just because something was good for me doesn’t mean it saved the world. GenAI saved my colleague’s wife’s life (or, I’d argue, his wife used a tool in a positive way to save her own life). It’s reasonable for him to believe in it. I am grateful that they had it available to do that. That’s different from calling it a global net positive, and that’s okay.
2. I want people to think about “net impact” in a more nuanced, complex way than product net revenue.
The emergent outcomes of a technical shift on the entirety of global society cannot be reduced to the same mathematics we use to decide if a widget made a profit. I don’t think we have to start huge; I have my students start with personal anecdotes and then broaden from there. The approach I use for them is abbreviated, a little hamfisted, and certainly imperfect. But it captures more nuance than we often do when we make a gut estimate about whether a development was good for the world. I’d like to see engineering practitioners more regularly engage in a practice of organized, collective investigation and reflection.
3. I want people to arrive, not at an estimate of net impact, but at a set of ideas for improving that net impact.
This is what I think is within our purview to execute against anyway. The last time I did this exercise I watched students arrive, unprompted, at a long and varied list of ideas for how they might shape a future with generative models in it. “Can we build tools for content creators to proactively thwart their materials’ digestion by online scrapers?” “Can technical innovation permit us to minimize energy use in retraining somehow?” “What is universal basic data income, and is there any path to viability for that?” “How…how might someone like me chart a course to becoming a subject matter expert for regulation development?”
I hope these questions inform students’ choices as practitioners in the field. And I think arriving at similar questions can inform the choices of current practitioners. It’s tempting to sit around and speculate about whether a parallel universe without <insert development here> might be better off. But we don’t live in that universe; we live in this one. This universe is the one where we get to answer that question: not with navel-gazing, but with empirical evidence derived from attempts to drive change.
Unless I am wrong (which I could be), I expect this post to conclude my writing on this topic for some time. I have other subjects I’d like to discuss here, including some interesting books I’ve read lately. Stay tuned for more, and if you can’t get enough, join me on Patreon: I put out audio recordings of blog posts every Monday and maintain a few regular written columns.
"It’s just that, for a software engineer to estimate a global 30 year positive impact for a nascent technology because that person has had a positive individual experience with the technology, doesn’t mean a lot. This is a key takeaway for my students from the estimation exercise."
I have been worried about the state of FOSS in general, and having read these two posts is as good an excuse as any for getting the rudimentary outline of that worry out onto the page.
Short version: my mental model of FOSS is that it’s a function of industry and labour surplus:
Industry. The software industry has extremely high margins (products that are both non-rivalrous and non-excludable will do that) and historically easy access to investment because of both low interest rates and the pervasive belief among the financial class that successful tech companies grow exponentially for extended periods.
Labour. Even though coders come from varying backgrounds, once they have a career many, if not most, become relatively high income middle class with significant spare time. A non-trivial number of coders in California also have moderate wealth from being secondary or tertiary beneficiaries of industry liquidity events which lets them work on FOSS as much as they want. (This is what keeps a surprising number of FOSS projects afloat.)
Industry surplus also leads to labour surplus in that companies let coders work on related FOSS projects during work.
The derived FOSS surplus generates billions, if not trillions, of dollars of value for the economy, and most of the costs – the cost of creation, the opportunity cost, and the cost of OSS competing with your more lucrative proprietary products – are absorbed by the makers.
We’re in a period where the surpluses that the FOSS surplus is derived from are decreasing:
High interest rates decrease available investment.
Less investment in any software that isn’t “AI”, which itself doesn’t really do real open source. (I know of exactly one LLM that both has openly licenced code and has documented the training well enough for it to be deterministically replicated. That might qualify as OSS if you squint. The rest isn’t even in the ballpark.)
COVID-era growth reverting to the mean, triggering a reassessment of tech industry growth.
Industry management pop culture has fixated on lay-offs as a magic cure and increased coder unemployment leads to less time for OSS. And OSS participation doesn’t really improve your chances of employment any more, unless it’s adjacent to the decidedly non-free generative model sector.
OSS burnout. Very few FOSS projects are lucky enough to have grown a sustainable and supportive community. Most of the time, it seems to be a never-ending parade of angry demands with very little reward. When the software labour market was growing steadily, burned out maintainers often got replaced by fresh-eyed graduates or coders who relied on the project at work.
Companies in many sectors are cutting costs after years of overspending.
As the surplus decreases, the costs associated with FOSS participation become less tenable to most organisations.
Why compete with AWS or similar services that will offer your own OSS projects at a dramatically lower price?
Why subsidise projects of little to no strategic value that don’t contribute anything meaningful to the bottom line?
Why spend time on OSS when other work is likely to have higher ROI?
Why give your work away to an industry that treats you as disposable?
Anecdotally, F/OSS also seems to be losing users:
There’s less funding for non-AI software startups, which are usually very heavy OSS users.
Some are reaching for LLM-generated code before they even look for an OSS project, both disconnecting those projects from opportunities to grow a sustainable community and nullifying the strategic advantage of having made an OSS solution for a problem. Note that the models are originally trained on that OSS.
People who are unemployed or jaded by the software industry have fewer side projects, because – let’s be honest – there are healthier hobbies available.
Best case scenario, seems to me, is that Free and Open Source Software enters a period of decline. After all, that’s generally what happens to complex systems with less investment. Worst case scenario is a vicious cycle leading to a collapse:
Declining surplus and burnout leads to maintainers increasingly stepping back from their projects.
Many of these projects either bitrot into serious bugs or get taken over by malicious actors who are highly motivated because they can’t rely on pervasive memory bugs for their exploits anymore.
OSS increasingly gets a reputation (deserved or not) for being unsafe and unreliable.
That decline in users leads to even more maintainers stepping back.
Some of this is an inevitable correction. The JS npm ecosystem, for example, is almost certainly unsustainable in its current form and has coasted on years of overinvestment in useless startups and Microsoft’s blatant attempts to own the entirety of software development.
But a “correction” is still destructive if you are unknowingly relying on an unsustainable system.
We don’t yet know which parts of the FOSS system are sustainable, and which are coasting on the hot air of startup funding, the post-acquisition savings of startup employees, and Microsoft’s belief that, currently, paying for all of this shit helps them make money somewhere else.
And because we don’t know, it’s hard not to worry about all of it.
I love doing research. My first reaction to a gnarly problem is to try to discover everything I can about it and, crucially, to see how others have tackled it.
It’s not just because other people’s work often means I have to do less of my own; experiencing the plurality of expression that’s available out there is a thing of joy.
For example, a problem that regularly comes up when you’re making software that lets users upload their own files is ensuring the consistency and integrity of said files.
“How do you do that?” is a solid question to research.
I don’t need to discover the nuances and full back story of the field, just enough to not fall into the Chesterton’s fence trap – know the why of the status quo before criticising it.
Through my research, I discovered that there are, effectively, three ways to do this:
You generally need both a hash and a checksum and if you are potentially in an adversarial environment – like, y’know, the internet – the hash probably needs to be a secure cryptographic one.
Checksums and CRCs are incredibly fast – optimised versions can use as little as a single CPU instruction for each pass – and are sensitive to the specific kinds of changes that are associated with transmission errors, network errors or file corruption.
Hashes assert a unique identity for a file but are generally slower and usually checked at a later stage where it might be more tricky to retry the request.
You could use only a fast non-cryptographic hash such as xxHash (an excellent non-cryptographic hash, if you’re looking for one), but the ones with a shorter bit length don’t actually assert uniqueness (collisions in a 64-bit hash can happen by accident and lead to bugs in production that are hard to fix, especially if the system is specifically built around the hash fitting in a 64-bit BigInt; a rough estimate of the odds follows below), and the longer ones can still be fooled by an adversary, which might let them substitute a user’s file without your systems noticing. So, they’re about as useful as the even shorter checksums but usually a tad slower.
Cryptographic hashes can be quite fast these days. Not as fast as the non-cryptographic ones, but plenty fast for this purpose. Even supposedly unsafe ones, like SHA-1, can be relatively safe if the rest of the system is sensibly designed, which is why the world hasn’t fallen over yet even though git still defaults to SHA-1.
And if you’re going to use a cryptographic hash, you should generally use whatever comes with your platform because that’s the likeliest to be hardware accelerated. For the web, that means SHA-256, SHA-384, or SHA-512. Fast algorithms like Blake3 are sometimes fast enough to make up for the lack of hardware acceleration, especially if implemented using native code or WASM – in my own benchmarks WASM implementations of Blake3 even outperformed the browser’s built-in cryptographic hashing for some use cases, but not by a large enough margin for it to matter in production.
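As for why a shorter non-cryptographic hash can’t assert uniqueness, the standard birthday approximation is enough to show the risk. A minimal sketch, assuming an ideal, uniformly distributed 64-bit hash and illustrative item counts:

// Approximate probability of at least one collision among n random b-bit
// hashes, using the birthday approximation 1 - exp(-n(n-1) / 2^(b+1)).
function collisionProbability(n, bits = 64) {
  const space = 2 ** bits; // 2^64 possible hash values
  return 1 - Math.exp(-(n * (n - 1)) / (2 * space));
}

collisionProbability(5e9); // ≈ 0.49: roughly even odds by ~5 billion items
collisionProbability(1e8); // ≈ 0.0003: small, but not "never" at production scale

Not a reason to avoid fast hashes, just a reminder that with too few bits “unique identity” is a statistical claim rather than a guarantee.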
So, to condense days, if not weeks, of research and experimentation – a process that was a lot of fun: store files with a 32-bit CRC integer (first line of defence) and a cryptographic hash (primary defence) and check one or the other at various points in your systems. I’m glossing over a few details here and there and criminally oversimplifying a bunch of things, but in general you’re going to want both.
(By the way, my favourite implementation of CRC has to be this one in JS, which extracts the indecipherable implementation from the generally excellent but opaquely written fflate and rewrites it into much clearer code.)
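To make the recommendation concrete, here is a minimal sketch of that kind of setup (an illustration, not anyone’s production code), using a bitwise CRC-32 and the platform’s SHA-256 via the Web Crypto API (crypto.subtle.digest, available in modern browsers and in Node 18+):

// First line of defence: CRC-32 (reflected, polynomial 0xEDB88320). Cheap to
// compute and sensitive to the kinds of bit flips transmission errors cause.
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i];
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Primary defence: a cryptographic hash from the platform, asserting the
// file's identity.
function sha256Hex(bytes) {
  return crypto.subtle.digest('SHA-256', bytes).then((digest) =>
    [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, '0')).join('')
  );
}

// Store both alongside the uploaded file; re-check the CRC cheaply and often,
// and the hash wherever identity actually matters.
const file = new TextEncoder().encode('pretend this is an uploaded file');
sha256Hex(file).then((hash) => console.log(crc32(file).toString(16), hash));

Which of the two you check at which point in the pipeline is exactly the kind of detail being glossed over above.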
The reason I wrote the above is that, beyond the fact that I’m boring enough to find this stuff interesting, after 28? 29? years of doing this (that can’t be right) my head is full of shit like this, and my first reaction to getting asked on social media about something tangential to a topic I’ve researched used to be to go out and explain it in an extended info-dump. Like I did above.
The problem is, and I say this with love, that all too many of you are gaping assholes.
This problem, though not unique to Mastodon, is especially acute there.
The issue is threefold:
Too many of the responses are bad faith arguments from people intentionally or unintentionally being assholes. They engage in conversation specifically to be syphilitic dicks in your face.
It’s often hard to tell the difference between when somebody is starting a bad faith argument or when somebody is just being, y’know, German or Nordic where a more brusque manner of engagement is more normal. I can handle that. I get it in spades here in Iceland, so I have quite a bit of practice in spotting it and I think I generally can.
Even when I know for a fact, or am reasonably certain, that the interaction is being made in good faith, Mastodon and other social networks surface replies to followers, which means that any reply whatsoever has a good chance of attracting assholes looking for an opportunity to posture themselves into the conversation.
It’s reached a point where my first reaction to any reply is a mild anxiety attack followed by me closing the tab (or app) and ignoring it for the rest of the day. Simple replies and casual conversation are fine – they’re the foundation that weak social ties are built on – but anything that resembles the start of a “debate” makes my face twitch.
So, apologies in advance if I don’t reply to your reply. It’s not you, it’s one of the many symptoms of my general social media burnout. I’m not asking you to stop sending whatever good faith questions you might have my way. I just probably won’t answer.
And extra apologies if I do actually reply, because then there’s a good chance you got a serialised info-dump like the above piled into the replies.
She was a national treasure. It’s interesting to watch these both as historical glimpses and as prompts for thinking about the areas of computing where we have and have not made significant progress.