A Tireless Font of Knowledge

I’d like to share an analogy that’s been bouncing around my head about the impact of large language models on the makeup of software products. Then, more interestingly, I’d like to take this analogy to its natural conclusion, which will serve as a personal forecast for how the emergence of LLMs may affect the evolutionary trajectory of software. Note that I’m not speaking of changes in personal productivity or how code is written, but of an evolution in the internals of software products.

The Man-made Glacier

With the knowledge on the internet and other curated datasets compacted into a portable vessel of a couple billion parameters, world-leading LLMs contain an incredible amount of potential energy. The generative distribution is unimaginably complex. But amazingly, with the appropriate prompting methods, we can reliably elicit latent concepts embedded within the broader output distribution. Alternatively, we can fine-tune an LLM – i.e. slightly reshape the vessel – to make some desired concepts bubble up and become more saliently represented.

My analogy is that LLMs are like a glacier atop Mt. Shasta (or any other large store of frozen water at high elevation). It’s an exploitable resource and a store of potential energy. Glacier water flows downhill via rivers and streams; through a complex plumbing infrastructure of pipes, dams, and treatment facilities, humanity has turned that water into an invaluable resource that is instantly available to many households with the turn of a spigot. We shower, steep tea, and cook spaghetti with ease. It powers household appliances that obviate menial labor and dramatically improve quality of life for all who have regular, uninterrupted water access.

I’m comparing training an LLM (i.e. compressing textually-encoded human knowledge into a neural network) to creating an artificial glacier and airlifting it atop the nearest mountain. Relatedly, the LLM abstractions and software infrastructure currently taking shape – e.g. prompting as programming, or RAG search systems – are about figuring out how to get the water of knowledge to flow toward the desired destination with the right characteristics: fresh, reliable, equitable, and at scale.

Products such as LLM search, service chatbots, writing assistants, and coding copilots allow you to experience the output directly, or feel the water running over your hands. It’s like taking a shower instead of fetching water from the well and drawing a bath. With these products, life will clearly get easier and more streamlined in certain respects.

Concurrently, developers are contributing to an ecosystem of apps where LLMs can perform actions on your behalf: navigate a web form, order an Uber, book a flight. Per this glacier analogy, we can think of this line of work as linking the water line into a separate apparatus (think sprinkler system or washing machine). It’s a readily available, tireless resource that powers the next stage of the Rube Goldberg machine of your life, perhaps automating a nagging, recurring item off the to-do list. The human’s role then shifts from laborer to maintainer.

One Step Further

Let’s get into speculative territory by taking both sides of the analogy one step further. Clearly, water plays a larger role as a resource in our societal infrastructure beyond being piped into our homes. While moving, treating, and cleaning water consumes energy, water is also used for energy production, forming the water-energy nexus. Energy production with water happens directly via hydroelectric generation, or indirectly via hydraulic fracturing, cooling thermoelectric plants, or processing materials needed to build photovoltaic cells. In short, water is a resource that can be exploited to either directly generate energy or enable access to alternative energy stores.

These alternative energy sources or sinks – direct hydroelectric power, solar-powered batteries, or a barrel of crude oil – each have distinct characteristics with varying use cases. Figuring out energy conversion has empowered an entire other realm of complex machinery. We can now turn on the lamp in the living room with the flick of a switch, and charge our EVs from our solar-powered batteries.

In the next section, let’s bring this analogy home. Instead of piping the output of an LLM directly to the user, how can we use LLMs to produce or refine alternative stores of value and potential energy? And what software mechanisms or features could these alternative stores bring into the realm of possibility, or make dramatically easier to maintain?

The Current State

Behind the scenes of a modern software product, there are many critical ML-backed workflows linking various components and keeping the lights on. I’m not speaking of user-facing ML models (e.g. content curation or metric forecasting), but of models deployed internally and maintained as a bridge or translation layer between two internal systems. Some examples include:

  • Time series forecasting and anomaly detection models, serving as input to downstream flagging or recommendation systems.
  • Cleaning, structuring, reformatting or summarizing potentially noisy user/machine generated data, populating intermediate data stores for downstream analytics usage.
  • Extracting information from or tagging user-provided or web-scraped content, for semantic querying or piping content to the right locations.
  • User behavior categorization, as input for product research or collaborative filtering recommendation models.

As any machine learning practitioner will happily tell you, training, deploying, and monitoring each new model is a non-trivial upfront and ongoing investment. The high variance of problem domains means a correspondingly high variance in datasets, model architectures, and inference stacks. After shipping the model, when business or downstream usage requirements invariably change (e.g. an input feature is removed, or an output class is added), a laborious renovation of the entire model lifecycle is triggered.

An Evolved Software Stack

Indirect Energy Production

I’m not here to advocate for LLMs as a silver bullet for the hardships mentioned above, but my point is that in many of the above scenarios, leveraging LLMs with few-shot or zero-shot in-context learning could dramatically reduce friction in the ML lifecycle. I’m calling this category of value generation – raising the efficiency or lowering the time-to-market of familiar ML components – “indirect energy production”.

LLMs are pretty good at few-shot classification, few-shot time series forecasting, and translation tasks that involve restructuring inputs while preserving semantic content. Imagine that instead of standing up multiple models to handle each category of task mentioned above, we fully leaned into optimizing few-shot in-context learning prompts (a minimal sketch follows the list below). Of course, maintaining test datasets for model evaluation is still critical. In this scenario, which parts of the ML lifecycle become easier to manage?

  • Instead of bootstrapping a model and dataset from scratch, it becomes an order of magnitude easier to stand up a baseline model with acceptable (or even exemplary) performance.
  • Training data management and model training would become prompt versioning and optimization instead. This would make change management easier by removing the need to summarily relabel or reformat an enormous training dataset.
  • Inference infrastructure becomes more homogeneous, and thus easier to monitor and optimize. We no longer have to host many models with disparate architectures and input/output requirements.
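
To make that concrete, here is a minimal sketch of the kind of versioned, few-shot classification prompt that could replace one of the bespoke tagging models above. The `call_llm` function, the `PromptSpec` structure, and the ticket-tagging labels are illustrative assumptions, not any particular provider’s API.

```python
from dataclasses import dataclass


# Placeholder for whichever hosted-model client you use; not a real library call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM provider here")


@dataclass(frozen=True)
class PromptSpec:
    """A versioned prompt artifact: the unit we manage instead of a trained model."""
    version: str
    labels: tuple[str, ...]
    few_shot_examples: tuple[tuple[str, str], ...]  # (text, label) pairs

    def render(self, text: str) -> str:
        shots = "\n".join(f"Text: {t}\nLabel: {l}" for t, l in self.few_shot_examples)
        return (
            f"Classify the text into one of: {', '.join(self.labels)}.\n\n"
            f"{shots}\n\nText: {text}\nLabel:"
        )


# Adding an output class becomes a prompt revision plus re-evaluation, not a retraining cycle.
TICKET_TAGGER_V2 = PromptSpec(
    version="ticket-tagger/2",
    labels=("billing", "bug_report", "feature_request", "other"),
    few_shot_examples=(
        ("I was charged twice this month.", "billing"),
        ("The export button crashes the app.", "bug_report"),
    ),
)


def classify(text: str, spec: PromptSpec = TICKET_TAGGER_V2) -> str:
    raw = call_llm(spec.render(text)).strip().lower()
    # Never trust free-form output blindly; anything outside the label set gets a fallback.
    return raw if raw in spec.labels else "other"
```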

On the flip side, what doesn’t change?

  • Production model monitoring remains critical, since model drift in live usage is still a concern.
  • Model evaluation remains critical, and thus test dataset management is still vitally important.
  • The chance of noisy data and edge cases provoking unexpected model outputs may even increase, making input cleaning and output validation crucial (see the sketch below).
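
As a concrete example of that last point, here is a small sketch of an output-validation layer around an LLM-backed extraction step. The field names, schema, and `call_llm` placeholder are assumptions made for illustration, not any specific product’s interface.

```python
import json


# Placeholder for whichever hosted-model client you use.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM provider here")


# The fields (and types) we expect the model to return; illustrative only.
EXPECTED_FIELDS = {"order_id": str, "issue_type": str, "refund_requested": bool}


def extract_ticket_fields(ticket_text: str) -> dict | None:
    """Ask the model for structured output, then validate it before anything downstream sees it."""
    prompt = (
        "Return a JSON object with exactly these keys: "
        f"{sorted(EXPECTED_FIELDS)}.\n\nTicket:\n{ticket_text}"
    )
    raw = call_llm(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed output: count it, alert on the rate, fall back upstream
    for key, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(parsed.get(key), expected_type):
            return None  # schema drift is a monitoring signal, not a silent pass-through
    return parsed
```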

Direct Energy Production

If easing aspects of the ML lifecycle constitutes “indirect energy production”, then what does “direct energy production” look like? What is the equivalent of running a hydroelectric generator?

Most obviously, LLMs are revolutionizing the formerly labor-intensive tasks of data labeling and model bootstrapping. With a niche machine learning need, a knack for prompt engineering, and a few hundred dollars, you can derive a well-sized, reasonably accurate labeled dataset within a few hours – something that would have taken a seriously non-trivial amount of crowdsourced manpower just a year or two ago.
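
As a sketch of what that bootstrapping might look like in practice, the snippet below asks an LLM to weakly label a corpus. The sentiment labels, output path, and `call_llm` placeholder are illustrative assumptions; any real run would still deserve a human spot-check of the results.

```python
import csv


# Placeholder for whichever hosted-model client you use.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM provider here")


LABELS = ("positive", "negative", "neutral")  # invented label set for illustration


def label_corpus(texts: list[str], out_path: str = "weak_labels.csv") -> None:
    """Bootstrap a weakly labeled dataset by asking an LLM for one label per example.

    Treat the result as weak supervision: spot-check a sample by hand before
    using it as ground truth or as an evaluation set.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        for text in texts:
            prompt = (
                f"Label the sentiment of the text as one of {', '.join(LABELS)}. "
                f"Answer with the label only.\n\nText: {text}"
            )
            label = call_llm(prompt).strip().lower()
            writer.writerow([text, label if label in LABELS else "UNSURE"])
```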

In another area, researchers studying weak-to-strong generalization are exploring whether a “weaker” model can effectively supervise a “stronger” one: the smaller model provides the training labels, and the larger model’s superior generalization capabilities allow it to surpass its teacher. Continued success in this direction could mean that, through a managed series of teacher-student training regimens involving progressively larger models, we can trade up in performance on a particular task, potentially past the point of human performance.
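
Schematically, that “trade up through progressively larger students” idea might look like the loop below. Every function here is a hypothetical placeholder sketching the shape of the procedure, not an implementation of any published weak-to-strong recipe.

```python
def generate_labels(teacher_model, unlabeled_texts):
    """Use the current (weaker) teacher to produce labels for unlabeled data. Placeholder."""
    raise NotImplementedError


def finetune(base_model, labeled_pairs):
    """Fine-tune a larger base model on the teacher's labels; return the student. Placeholder."""
    raise NotImplementedError


def weak_to_strong_ladder(seed_model, student_models, unlabeled_texts):
    """Each rung: the previous student becomes the teacher for the next, larger student.

    The hope is that each student generalizes beyond the noise in its teacher's
    labels, so task performance climbs rung by rung.
    """
    teacher = seed_model
    for student in student_models:  # assumed ordered from smaller to larger
        weak_labels = generate_labels(teacher, unlabeled_texts)
        teacher = finetune(student, list(zip(unlabeled_texts, weak_labels)))
    return teacher
```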

Conclusion

This personally amusing thought experiment was inspired by the fact that the increased prevalence of LLM usage in the tech industry is correlated with accelerated build-ups of data centers, and thus a literal increase in aggregation and consumption of water for cooling purposes. I also have a yen to think beyond chat-based LLM applications. Chat is a compelling demonstration of the power of generative models, but I also believe that this conflation of LLMs with chat interferes with a more widespread understanding of the authentic value of the underlying technology. By leveraging LLMs outside of consumer-facing use cases, what kind of architectural shifts might we see as the software abstractions around LLMs harden, and LLMs become more of a cog to be properly fit into a larger machine?

Hopefully the analogy was not too much of a stretch, and certain aspects of the forecast ring true to other practitioners.
