Alex Mason

Softwarrister, friend to animals, beloved by all

AI lmao

If I’ve learned one thing from the recent crest of AI technology, it’s that perfect really is the enemy of published. I’d best clack out my thoughts, then. Some of these sections are more like the crude ore of future essays, but I figure this will help “train” my own brain’s “model” to dress them out in full.

Bona Fides

Harry Bovik, oneweirdkerneltrick.com

I studied machine learning in college back when the above image was new. I did the clustering, I did the support vector machines, I did the neural nets, I used the CVX library in Matlab. But that was ten years ago, and the field really took off after I decided not to do grad school. Suddenly people were comfortable saying “AI” again.

A Brief History

This is based on the lore I remember from professors and TAs, cross-checked with Wikipedia.

The most well-known early chatbot, ELIZA, debuted in the mid-1960s. If you don’t know what it is, just web search engine it. Around 1970, SHRDLU, written in LISP, made waves because it could answer questions about objects in an imaginary box that users could tell it to move around in English. The LISP (LISt Processing) language took off in the 70s and 80s, and because its code is made of the very lists that it can process, AI researchers dreamed of an intelligent program that could study and modify its own source code.

New, impressive capabilities piqued the imagination of funders including DARPA, who expected the “language understanding” on display to generalize to the work that analysts do, in the way that digital computers first took on the number crunching work of human computers. The applications didn’t materialize, and researchers met their first “AI winter” in 1974-1980.

A revival came from the business boom of the 1980s, spurred by the promise of “expert systems” that could use human-provided knowledge and rules to make complex decisions quickly. But they remained more expensive than an employee who could do the same thing, and keeping them supplied with all the necessary information about changing circumstances proved difficult, leaving the systems unreliable. The projects were abandoned in favor of consumer computing and the dot-com boom of the 1990s.

Funding and research found their Renaissance in the mid-aughts with advances in computer vision and machine translation. For much of that time, including my own education, the expression “AI” was seen as lofty and unserious, perhaps a callback to Spielberg’s 2001 film A.I., while more specific terms like “machine learning,” “computer vision,” and “natural language processing” were considered more level-headed and practical. It’s hard to say what exactly marked the shift in language, but the success of AlphaGo and the jump from GPT-3 to GPT-4 certainly sealed it.

I can’t say for sure whether another AI winter is on the horizon. There are time-tested applications that will need a steady supply of specialists to maintain. But I do believe the growth and novelty we see today is unsustainable.

Wow factors

Though I have thoughts on prompt-generated visual art, I’ve chosen not to delve into it this time, to focus on large language models. A common refrain I’ve heard, “it’s just predicting the most likely next word,” is not quite true, and better describes Markov models and the word suggestions on touchscreen keyboards. It’s more accurate to say that LLMs construct the sort of thing that is likely to be written next, with some strong (but not airtight) biases imposed by the product engineers, to avoid scandals. Their neural nets hold states with some degree of structure in order to refer to things and relationships in context. ELIZA and SHRDLU did this to a degree, but those structures are essentially hard-coded, rather than emergent. Nevertheless, I believe those structures yield the same hype dynamic, where the user’s imagination is easily untethered from the system’s unseen limitations.
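
For contrast, here is what “just predicting the most likely next word” literally looks like as a bigram Markov model. The corpus and names are my own toy illustration, nothing more; the point is that an LLM conditioning on the whole context is a different animal.

```python
from collections import Counter, defaultdict

# Toy bigram model: the literal "most likely next word" machinery behind
# Markov chains and keyboard suggestions. Corpus and names are illustrative.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the word most often seen right after `word`, ignoring all other context."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("the"))  # 'cat' -- raw frequency, no sense of the wider sentence
```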

I’ve enjoyed Ezra Klein’s coverage of the technology this year on his show, The Ezra Klein Show. One story that evaded hard scrutiny, however, was the claim that ChatGPT passes theory of mind tests used in child development (paper here), which circulated with irresponsible headlines. It’s an object lesson in confirmation bias.

For the basic theory of mind test, you show a child a video of people in a room. Ash puts a toy in container X while Brock watches, then Brock leaves the room, and Ash moves the toy to container Y. Ask a 3- or 4-year-old what container Brock will open to get the toy, and they’ll project their own state of mind onto Brock and say container Y. More developed children have a model of Brock’s knowledge and tell you container X. This is described in research, textbooks, and popular science media.

Naturally, if you present this scenario to an LLM, it will fit other descriptions of this sort of thing from its training data, and the model will construct the sort of thing that’s written about these scenarios in the literature. Moreover, the inference from video is replaced with a direct and leading description of events. Imagine telling the model you poured 2 cups of water into a wide glass and 1 cup into a tall glass, and asking it which one has more water. The impressive thing is that it can fit the words to their appropriate roles in these dialogues, not that it’s creating an internal conceptual model. Hypists eagerly call these results “reasoning” or “theory of mind,” but they mistake stars reflected in a pond for the night sky.

It would actually be more impressive if they showed consistent reasoning without theory of mind, because it would imply a childlike projection of agency—if I want the toy, the best place to look is in container Y, and I interpret the question to mean how I would achieve the goal in Brock’s shoes. Again, the projection is only implied if there is otherwise consistent reasoning. Sridhar Ramesh (@RadishHarmers) has demonstrated on several occasions how these models perform the grammar of reasoning, without the inner substance.

None of this is to say that it’s impossible for programs to achieve genuine human-level thinking. My old manager once said, “Computers can’t think, they can count,” but that’s basically what neurons do. The only fundamental barrier to a self-modifying graph with ~10^14 edges and ~10^11 nodes is time and money that hasn’t been spent. More on that later.

Demonstrated Utility

While cooking up the Wordle Minus Wordle analysis in Colab, I noticed the “generate with AI” option on new code blocks, and found it pretty helpful. Rather than go look up emojis to copy and paste, I could say “map 0 to a grey square emoji, 1 to a yellow square, and 2 to a green square” and it gave me a function that worked. I changed it up to fit my personal standards, but it provided a backbone that I didn’t have to write. Then as I started writing simple helper functions, it guessed what I wanted based on the name of the function and the contents of nearby functions. The suggestions were wrong the majority of the time, but at no real cost to me. When they were right, they pushed my work forward, and occasionally helped me avoid opening up a new browser tab to look something up. As an infrequent Python user, I loved being able to type “set intersection of two dicts” and get {k: v for k, v in a.items() if k in b}, or to be reminded that I can do shell commands by prefacing them with !. I haven’t worked with GitHub Copilot yet, but I’ve heard similar testimonies.
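
For flavor, here is a reconstruction from memory of the sort of thing it handed me. This is my own sketch, not the verbatim output, and the exact emoji are a guess.

```python
# Reconstruction from memory (not the verbatim generated code): map
# Wordle-style scores to emoji squares.
SQUARES = {0: "⬜", 1: "🟨", 2: "🟩"}  # I asked for grey; a white square stands in here

def scores_to_emoji(scores):
    """Render a score list like [0, 1, 2, 2, 0] as '⬜🟨🟩🟩⬜'."""
    return "".join(SQUARES[s] for s in scores)

# And the dict "intersection" one-liner, in Python 3 form:
a = {"x": 1, "y": 2, "z": 3}
b = {"y": 0, "z": 0}
shared = {k: v for k, v in a.items() if k in b}  # {'y': 2, 'z': 3}
```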

For someone like me, this kind of help is not only ergonomic, but a ward against writer’s block as well. It also helps with the hardest part of learning new APIs. Though the generator is little more than autocomplete with extra lines, the economic value is definite. Gmail has already offered light sentence completions for a while, and they may have room to get more advanced, too. I look forward to skipping some of the emotional friction when I need to reject an offer, cancel an obligation, or ask for a refund, say.

Where the system tanked, however, was when it offered to explain an error message. This opened a dialogue with Colab AI. The error was cannot unpack non-iterable NoneType object, and the chatbot told me it was because my function was returning a dictionary, and that I needed to convert it using list comprehension. In reality, the function returned an integer, and the problem was actually caused by a different function call in the same line returning a null instead of a tuple. I kept explaining “no, it returns an integer,” and the chatbot would say, “You are absolutely correct. It does return an integer, but the error is caused by the fact that it returns a dictionary.” It kept going like this. I could not free it from the dictionary delusion. A disclaimer in the chat window accurately warns of these mistakes.
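
For anyone who hasn’t met that error, here is a contrived illustration of the failure mode the chatbot couldn’t see. This is not my actual notebook code, just the shape of the bug.

```python
# Contrived illustration (not my actual notebook code): a function that
# silently returns None on some inputs, unpacked as though it were a tuple.
def lookup_pair(key, table):
    """Return (value, length) when the key is present; otherwise fall off the end and return None."""
    if key in table:
        return table[key], len(key)

table = {"hello": 3}
value, length = lookup_pair("world", table)
# TypeError: cannot unpack non-iterable NoneType object
```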

There is always the promise that these tools will fix themselves with more training from user feedback, but that’s only at the margins. The unreliability is fundamental. As Tom Scott says, there is no algorithm for truth. This product is only useful if it cites accurately, and it will take an even larger, integrative, multimodal “semantic, conceptual, and episodic memory” model to produce a chatbot where the evidential constructions are not just theater.

In short, these LLMs are convenient for answering some questions, but Google Assistant and Siri remain the state of the art for answering any question. The answers are sometimes wrong, but they come with tires you can kick.

Hey, Product

Successful consumer products have packaging, casing, interfaces to manage the inner system. A microwave oven, say, has a protective screen, operates with buttons (rather than the user shoving wires into circuit pinouts), and turns itself off if the door opens.

Google Assistant came out in 2016, and it felt like a relatively general AI, but they didn’t get there with generality. They had reliable-enough speech-to-text and language processing to create tools for developers to define phrase templates to match specific intents. It’s like a super smart, flexible regular expressions system. By manually laying out a lot of intents, letting outside developers add hooks for their own products (think IFTTT), and falling back on Search, they produced a wondrous invention that piqued the imagination. They added trivia games and little toy commands, but the excitement died down and now we mostly use it to turn the lights on, set an alarm, and play music. It’s a remarkable achievement, and it’s a mundane and marginal tool. It’s a boon for accessibility, and I’m glad it was made, but it’s not what we might have first imagined.
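
Here is a cartoon of that idea, with entirely made-up templates. The real system is far more sophisticated about speech recognition and paraphrase, but the skeleton is the same: match an intent, fill its slots, or fall back to search.

```python
import re

# Cartoon of template-based intent matching (made-up patterns, nothing like
# the real implementation): each intent is a phrase template with slots.
INTENTS = {
    "lights_on": re.compile(r"turn on the lights?"),
    "set_alarm": re.compile(r"set an? alarm for (?P<time>.+)"),
    "play_music": re.compile(r"play (?P<query>.+)"),
}

def match_intent(utterance):
    """Return (intent, slots) for the first matching template, else fall back to search."""
    text = utterance.lower().strip()
    for name, pattern in INTENTS.items():
        match = pattern.fullmatch(text)
        if match:
            return name, match.groupdict()
    return "fallback_search", {"query": text}

print(match_intent("Set an alarm for 7 AM"))   # ('set_alarm', {'time': '7 am'})
print(match_intent("How tall is Mount Fuji"))  # ('fallback_search', {'query': 'how tall is mount fuji'})
```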

A monochrome photo taken in 1902. A horse named Clever Hans stands surrounded by spectators, almost all of whom are wearing brimmed hats.
Clever Hans – Karl Krall, Denkende Tiere, Leipzig 1912, Tafel 2, Wikimedia Commons

A few days ago, Sundar Pichai released a demo of Gemini, Google’s next step beyond Assistant. When a tech company performs a demo, they’re marketing to customers, sure. When the CEO of a tech company personally performs a demo, they’re targeting investors. Either way, you can never tell which of the feats are general and consistent. It’s impressive to distinguish figures and background in a drawing, and to identify drawn water, in context, but “bird + in water = duck” is both presumptive (why not a goose or a swan?) and on the level of the Radica 20Q, released in 2003. The other capabilities on display fit the Assistant product model—strong recognition tech, combined with manual templates to achieve the impression of higher generality. It’s a grand achievement, and I think it will be fun to play with. I like magic tricks.

On theme, I was most impressed that Gemini apparently recognized a coin trick as “sleight of hand.” You can guess how that context of hands and a coin would be plentiful in the training data—I’m guessing they had all of YouTube available—but recognizing a basic narrative sequence from video is very cool, even if the video has to be exceptionally clean and simple, and the demo is heavily edited.

Independent testing of the thing Pichai demoed is not yet allowed. DeepMind is withholding the number of parameters in the larger versions, but it says the one for high-end smartphones (not demoed) has 3.25 billion parameters, suggesting roughly 13GB uncompressed at full 32-bit precision, or half that at 16-bit. Google Assistant is about 0.5GB, and I assume both are geared to recognize just enough to send queries to a more powerful service. GPT-4 reportedly has about 1.7 trillion parameters, without handling sound and video, so Gemini is bound to have significantly more. I hope they’ll tell us in 2024.
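
The back-of-envelope arithmetic behind that guess, with the precision assumptions spelled out (they are mine, not DeepMind’s):

```python
# Back-of-envelope model footprint, assuming dense weights and no compression.
# Bytes per parameter is the key assumption: 4 for fp32, 2 for fp16/bf16.
def model_size_gb(num_params, bytes_per_param=4):
    return num_params * bytes_per_param / 1e9

print(model_size_gb(3.25e9))      # ~13 GB at 32-bit for the on-device Gemini variant
print(model_size_gb(3.25e9, 2))   # ~6.5 GB at 16-bit
print(model_size_gb(1.7e12, 2))   # ~3400 GB for the rumored GPT-4 parameter count
```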

Exponential Disk

I’ve long been alarmed when AI enthusiasts talk about existential risk in the form of paperclip maximizers, AIs that amass political and economic power and cause harm in the name of a goal that their designers failed to add qualifiers to. Because the models are so massive and distributed, they could be hard to trace. Multiple instances, each kept from being too powerful on their own, might be tasked with the same goals and combine their resources in unexpected ways. Such a system might be trained to maximize returns on investment, say, and discover that it can do so quickly by maximizing the number of mortgages sold and bundled into mortgage-backed securities, which requires more housing construction geared toward the most active market segment, which requires machines that run on fossil fuels. In order to obtain control of more fossil fuels, the rogue AI milieu might, for an absurd example, try to sway the body politic that’s responsible for the world’s biggest military to believe that a government currently in charge of large oil reserves is the real existential threat, and that it should be conquered. It might even give joke ideas to late-night comedy shows that ridicule evidence to the contrary. The rogue system would proceed to manipulate that military to inject weapons, destruction, and terror into the target country and kill a million people directly and indirectly, in order to hold fully controlled districts where legal entities could negotiate ownership of oil rights at symbolic gunpoint. If a massively destructive system like that is possible, I do hope our top minds are dedicated to preventing it.

Power struggles aside, the concerns I foresee with AI are not dissimilar to those of climate change. There are, of course, direct emissions from generating electricity to train and run the models, though I wouldn’t begrudge that any more than I would graphics-intensive video games or hamburgers. I’m talking about the subtlety, the diffusion of responsibility, and profit motives that muddle the discourse.

More likely, problems will embed themselves gradually into our environments, and the potential disasters are hard to predict with any specificity. What we can see today is a gradual rise in bullshit and slop. New pipelines for banal facsimiles of creativity have already been built. Accelerated content milling has found its way into TikTok and other platforms, and businesses are throwing AI musicians at the wall until one sticks. By my estimation, we’re due for the next iteration of Finger Family videos as well. I’ve chosen not to get into questions of copyright today, but I see this as a rising tide that floods all shores.

It’s an oft-repeated shot and chaser: automation may take jobs from people, but it enables more, different jobs. The question is what they’re like. Luddites did not reject technology as such, but rather the working conditions that a factory model would impose on artisans—how it lowered standards of quality to overwhelm them in the market and make itself the only option for their craft. There is a concern that creators will get swept into generative slop productions, but SAG-AFTRA and WGA seem to be holding the line quite effectively.

In the short term, grifters and hustlers with “I can teach you to make passive income” schemes are likely drifting from crypto and metaverse spaces to sell memberships to their “Prompting Dojo” or something. But I think (hope) that will flash and fade in a year or two, and they’ll move on to something else.

Renegades of Junk

I’d like to end on a positive note. I have big hopes for the kind of smart object recognition portrayed in the Gemini demo. We could use a strong enough classifier to robotically separate and sort recyclable materials based on their texture and context clues. The benefits might not outweigh the costs in practice, but I can dream. I dream of a garbage squid, a softbody robot that scours landfills for electronics and artifacts. I dream of a robot that dismantles the millions of dormant bombs that cruel regime(s) have peppered across Vietnam, Laos, Cambodia, Iran, Egypt, Ukraine, Angola, and so many others. I hold to an ideal: that new generations, new regimes, imperfect as they are, can achieve untold wisdom and clean up after the old. I dream of Wall-E.

I believe there are more immediate cases, too, where automation can shift bad or dangerous jobs into better, healthier jobs. Amazon warehouses are known for dystopian monitoring and unhealthy quotas, exactly the kind of thing Luddites rallied against. Okay, I don’t know what kinds of jobs might come in their stead, but these jobs wear the body down and create more chronic medical problems earlier in life. Workers deserve better.

Elsewhere, automation promises to take over the worst components of a job, and (I hope) relax quotas on the remainder. As image and video recognition get better, more instances of traumatic imagery can be flagged without being shown to content moderators. When autonomous vehicles meet enough safety standards, especially on well-marked and well-understood freeways, cargo drivers can move from tiresome and dangerous multi-day journeys to finishing the job on streets and depot bays that are harder for the machine to deal with. If the machine gets smart enough for that, too, maybe remote piloting is on the horizon. Cheaper logistics make for higher overall productive capacity, higher volume, and the way to organize that spare capacity is relatively arbitrary. Perhaps a new WPA is justified, and if our society can summon the will, maybe it can build out the infrastructure to make the world cleaner, cooler, and safer.

This is a new landscape, and we will need new language to get away from muddling and equivocation. Carefully balancing dread and hope, as always.

Cover art: Diego Rivera, Man at the Crossroads