When ‘convenience’ is used as a selling point, there’s always a hidden price to pay. Whether fast food or a digital personal assistant, the risks are rarely understood at the outset. We have learned that, when it comes to the crunch, the cost of convenience is more often than not a price we are not willing to pay.
Convenience has often come at the price of our health, the exploitation of our personal data, or the unethical treatment of workers who are ‘conveniently’ kept out of sight and out of mind. With generative AI and Large Language Model based AI rapidly spreading across the world, the questions should be asked: are the public and businesses aware of the risks they are being exposed to, and do they know the dangerous cost to individuals, society and business?
A foundation built on sand
It’s important to understand how Large Language Models and Generative AI work in order to identify the root causes of how things could go awry. In layman’s terms, a Large Language Model (LLM) like GPT works by analysing massive amounts of text data to learn patterns, relationships, and context between words and phrases.
Imagine a super-advanced predictive text tool that doesn’t just guess the next word, but can generate entire paragraphs based on the input it receives. These models break your input down into smaller pieces (tokens), process them using layers of mathematical functions called neural networks, and predict the most likely and relevant response. The model doesn’t “understand” in a human sense but generates text that aligns with the patterns it has learned from its training data.
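As a rough illustration of that “predictive text on steroids” idea, the toy Python sketch below (an analogy, not a real LLM) simply counts which word tends to follow which in a tiny corpus and then “predicts” the most frequent follower. Real models replace the counting with billions of learned parameters, but the pattern-based prediction principle is the same.

```python
# Toy illustration of "predict the next token": count which word follows
# which in a tiny corpus, then pick the most frequent follower. Real LLMs
# learn these patterns with billions of parameters rather than counts.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat slept on the sofa".split()

# Build a table of "which token follows which" (a crude stand-in for training)
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(token: str) -> str:
    """Return the most likely next token seen during 'training'."""
    if token not in followers:
        return "<unknown>"
    return followers[token].most_common(1)[0][0]

print(predict_next("the"))   # -> 'cat' (seen most often after 'the')
print(predict_next("sat"))   # -> 'on'
```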
Transformers in LLMs like GPT are not fully transparent. While certain parts of their workings, like the attention mechanism, can be analysed and visualised (e.g., seeing which words or tokens are given more importance during processing), the overall decision-making process is extremely complex and often opaque.
The model’s layers and millions (or billions) of parameters interact in ways that are difficult to trace or explain. This makes it challenging to pinpoint why the model produces a specific response or to guarantee that its reasoning aligns with human logic. In this sense, while there are tools and techniques to understand parts of its functionality, the overall system is more of a “black box” when it comes to full transparency.
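To make the “partly visible, mostly opaque” point concrete, here is a minimal sketch of the scaled dot-product attention calculation – the one piece that can be inspected. The query, key and value matrices are random stand-ins purely for illustration, not weights from any real model.

```python
# Minimal sketch of scaled dot-product attention, the part of a transformer
# whose workings can be visualised: the weight matrix shows how much each
# token "looks at" every other token. Q, K and V are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["The", "cat", "sat"]
d = 4                                  # embedding size (tiny, for illustration)

Q = rng.normal(size=(len(tokens), d))  # queries
K = rng.normal(size=(len(tokens), d))  # keys
V = rng.normal(size=(len(tokens), d))  # values

scores = Q @ K.T / np.sqrt(d)          # similarity between every pair of tokens
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                   # each token's output mixes in the others

# 'weights' is what attention visualisers display: row i shows how strongly
# token i attends to each token j.
print(np.round(weights, 2))
```

Even with these weights laid out in front of you, knowing where the model looked says very little about why it produced a particular answer – the stacked layers that turn those mixtures into a response remain the black box described above.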
Essentially, this breaks the ability to understand how a conclusion was reached and, crucially, what data was used to formulate an answer. In business and in many other walks of life, the inability to explain and validate the reasoning behind a conclusion would negate any trust in the answer – in law, in journalism, in education and in any regulated business, for example. Setting a standard of ‘it’ll do, even though I can’t explain why’ is a dangerous precedent.
‘It must be true, I found it on the internet’
The most important problem with LLM / GPT based AI – and one that is likely too far gone to fix now – is data. Rubbish in, rubbish out. Whilst this may seem obvious, it is worth exploring the depth of the risk posed by the data used to train these future tools of convenience, in the context of how they are being sold to us.
We all know that the internet is a little dodgy at the best of times. Google search results have for a while now not been the most relevant, as billed; instead we are directed to the highest bidder looking to sell their wares to us. Not quite what we signed up for, but the change has been incremental enough over time that most won’t have noticed.
Then there is social media. By its very definition, social media is predominantly user-generated content, and the great proportion is opinion or narratives being driven for a cause, whether that is political, commercial or ideological. Not quite the truth-based internet that Tim Berners-Lee hoped for back in the day.
Whilst it doesn’t take a rocket scientist to figure out that the majority of ‘the internet’ is not the haven of trusted data we would stake our lives and business decisions on, there are also discrepancies in the claims made by the big companies pushing their LLM models as being ‘trained on all the data in the world’. Were this true, it would be deeply concerning. The use of copyrighted material should also be considered; however, that is an entire article on its own.
The fact remains that companies tend not to be in the business of sharing their critical data publicly, nor do they give public access to their accounting systems, sales CRMs, customer lists, bank accounts or HR systems. This means much of the data that actually matters for doing business is not being considered. In reality, nor should it be, as that would be open season for malpractice and create a legal hailstorm.
If you know the data being used is probably flaky, and you cannot find out how any conclusions are being drawn from that questionable data, are we really willing to accept the answer as fact and truth? It is convenient to save the time spent researching, learning, understanding, thinking critically and making decisions – but at what cost? Accepting an AI answer at face value, knowing the above, should set alarm bells ringing if those decisions can have any consequence for your life or business.
AI Hallucinations – Digital blagging
If you’re not aware of LLM / GPT based AI hallucinations, you probably should be. In layman’s terms, it means the model just makes things up. Why? Who knows – it just does, especially when it comes to facts. Hallucinations create ‘alternative facts’ which then end up in the datasets that future iterations of the model learn from. Given enough cycles of self-learning that incorporate these alternative truths, the output presented to the user – who will have little desire to check or verify it – becomes the accepted conclusion. Say something enough times and it becomes the truth, apparently. If that doesn’t concern you, then probably very little will.
On a positive note, whilst there is no doubt that LLM / GPT technology carries deep-rooted risks that have caused, and will continue to cause, serious real-world harm to both people and businesses, this is an extremely powerful technology that undeniably has the ability to perform extremely useful functions and could yield massive benefits. The fact remains, however, that this AI – and all AI – can only be as good as the data it uses.
How could we look at this?
Instead of having one AI that does everything and descends into uncontrolled chaos, we believe in a more structured, disciplined approach to building Trusted and Transparent AI, such as the LGM, that respects data, its source and its integrity, and then builds around that foundation capabilities that can perform powerful functions without sacrificing trust and truth.