Large Language Models
Welcome to the LLM Hub. Here we will focus on what LLMs are and how to work with them when building Conversational AI Assistants and Chatbots.
We will start with a basic overview of what LLMs are and how they actually work. Then we will dive into the topic of how to work with them. We will look at the four main ways of working with LLMs: Prompt Design & Prompt Engineering, Knowledge Bases, Fine-tuning and Pre-Training an LLM.
What are LLMs & How do they Work?
What is an LLM?
When a new technology really wows us and gets us excited, it becomes a part of us. We make it ours and we anthropomorphize it. We project human-like qualities onto it, and this can hold us back from understanding what we are actually dealing with.
So let's consider a few questions, mainly: what is an LLM, and what are its limitations?
Perhaps these questions and ideas will illuminate our understanding:
- Are LLMs a program?
- Are LLMs a knowledge base?
- Do LLMs know anything?
- If an LLM is a program, how does it compute over its 70-100 billion parameters in only a few seconds?
- If an LLM is a knowledge base, why does it need to predict?
- How can an LLM with billions of parameters, trained on pretty much the entire internet, fit on a 100GB drive?
- What are some simple tasks that LLMs can't do?
Hopefully, these questions dispel some of the mystique around LLMs. Many widely held beliefs about them are contradictory or simply wrong.
First, LLMs are not knowledge bases, and they are not really programs either. They are statistical representations of knowledge bases.
In other words, an LLM like GPT-4 has hundreds of billions of parameters into which it has condensed the statistical patterns of its training data. It doesn't have any knowledge, but it has the patterns of knowledge. When you ask it a question, it predicts the answer based on its statistical model.
How LLMs Work
LLMs condense knowledge into patterns: words, word order, and how they relate to each other. These are represented mathematically via tokens. When you ask an LLM a question, your question is turned into tokens, and based on those tokens the LLM predicts what token should come next.
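To make "predicting the next token" concrete, here is a drastically simplified sketch: a bigram model that counts which word follows which in a toy corpus. Real LLMs learn these patterns with neural networks over subword tokens, not raw counts, and the corpus here is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the model's training data.
corpus = "the cat sat on the mat the cat ate the fish".split()

# "Training": count which token follows which. This bigram table is a
# stand-in for the statistical patterns an LLM condenses out of its data.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(token):
    """Predict the most likely next token given the current one."""
    return following[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" -- it follows "the" most often above
```

The same idea scales up: an LLM is, loosely, an enormously richer version of this table, learned rather than counted, conditioned on your whole tokenized question instead of a single word.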
GPT-style models like ChatGPT read in one direction (left to right) and are great at generating output. Encoder models like BERT process text bidirectionally and can predict both future and past tokens.
One way to think about this: your question acts like a filter. The filter is essentially the tokenized expression of your question, and it shapes what comes out. If it's a coffee filter, what comes out is probably coffee.
Limitations
Why do good models do bad things? The answer lies in the models and how they are built.
LLMs are Statistical Representations of Knowledge Bases. They have taken the world's information and knowledge and boiled it down to statistical principles.
These principles are like icons. Icons represent something much more than what they are: low-resolution images that stand for a much bigger chunk of information. They give you a lot more information than meets the eye.
Additionally, LLMs were trained on biased data. We know this because the internet is full of biased data. For example, most of the internet is in English and represents Western values, yet most of the global population doesn't speak English or hold Western values.
When we combine both low-resolution models and bias we are bound to have hallucinations and poor accuracy.
How does this work and when does it happen?
It happens when you ask a detailed question about something specific. In our example, if you ask detailed questions about the icon, the model might make up those details in a way that conforms to its biases.
Getting Started with LLMs
Overview
There are a number of ways to use LLMs effectively and get the most out of them. Let's look at some of them.
Prompt Design
Prompt design is the practice of crafting prompts that guide a language model toward the intended answer. A well-designed prompt is a clear, effective instruction or question, phrased much as you would ask a real person. Language models depend on good prompt design to produce accurate, high-quality responses.
Prompt engineering differs from prompt design: design is about creating high-quality prompts, while engineering is about managing how they are used. A well-rounded AI strategy for enterprises requires both.
Prompt Engineering
Prompt Engineering is the practice of developing and optimizing prompts to use language models for a variety of tasks efficiently.
Prompt engineers focus on improving performance and may use domain-specific knowledge, keywords, and NLP techniques, such as adding examples of desired results and prompt chaining, to reach the desired output.
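Here is a small sketch of two of those techniques: adding examples of desired results (few-shot prompting) and prompt chaining. The task, the example questions, and the helper names are illustrative assumptions, not part of any particular library.

```python
def few_shot_prompt(examples, question):
    """Prepend worked examples so the model can infer the desired format."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {question}\nA:"

examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = few_shot_prompt(examples, "What is the capital of Italy?")
print(prompt)

# Prompt chaining: feed the answer from step one (which would come from
# the model in practice) into a second, narrower prompt.
step_one_answer = "Rome"
step_two = f"Write one sentence about the history of {step_one_answer}."
```

The few-shot examples show the model the expected format, while chaining breaks a complex task into smaller steps, each with a focused prompt.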
Prompt Tuning
Prompt tuning uses “soft prompts” that are generated by a small set of learnable parameters. “Soft” prompts are continuous feature vectors that can be optimized through gradient-based methods.
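To give a feel for what "optimized through gradient-based methods" means, here is a toy illustration. The frozen model is replaced by a made-up quadratic loss with a known optimum; real prompt tuning instead backpropagates a task loss through a frozen LLM to update the soft prompt's embedding vectors.

```python
# A "soft prompt" is a continuous vector of learnable parameters.
# Here we pretend a known target vector solves the task, and recover it
# by plain gradient descent on the loss sum((p - t)^2).
target = [0.5, -1.0, 2.0]       # stand-in for the ideal soft prompt
soft_prompt = [0.0, 0.0, 0.0]   # learnable parameters, zero-initialized
learning_rate = 0.1

for _ in range(200):
    # Gradient of the loss with respect to each parameter: 2 * (p - t).
    grads = [2 * (p - t) for p, t in zip(soft_prompt, target)]
    soft_prompt = [p - learning_rate * g for p, g in zip(soft_prompt, grads)]

print([round(p, 3) for p in soft_prompt])  # converges toward the target
```

The key point survives the simplification: the prompt is not text but numbers, adjusted a little at a time in whatever direction reduces the loss.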
Knowledge Bases
LLMs can use Retrieval-Augmented Generation (RAG) to pull information from our Knowledge Bases.
With Knowledge Bases, the LLM works the same way a librarian does. The librarian should know everything about what is in her library. She would know exactly which chapter of which book to suggest to a visitor who asked a certain question.
In more technical terms, this is a semantic search engine. Embeddings are vector representations of document parts, and they make it possible to describe mathematically what each section actually means. By comparing embeddings, we can figure out which parts of a text have the same meaning as other parts. This is the heart of the retrieval process.
Based on the question, you first retrieve the most relevant information from your internal knowledge base. You then augment the normal generation step by passing this relevant information directly to the generator component.
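The retrieve-then-augment flow can be sketched as follows. Word-count vectors stand in for real embeddings, and the documents are invented; a production system would use a neural embedding model and send the augmented prompt to an LLM.

```python
import math
from collections import Counter

documents = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday through Friday.",
    "Shipping to Europe takes about two weeks.",
]

def embed(text):
    """Toy 'embedding': a word-count vector (real systems use dense vectors)."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question):
    """Return the document whose embedding is closest to the question's."""
    q = embed(question)
    return max(documents, key=lambda d: cosine(q, embed(d)))

# Augment: pass the retrieved context directly to the generator.
context = retrieve("How long do refunds take?")
prompt = f"Answer using this context:\n{context}\nQuestion: How long do refunds take?"
print(context)
```

Swapping the toy `embed` for a real embedding model is the only conceptual change needed to turn this into a working RAG pipeline.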
We have created an entire section with case studies on how to design a knowledge base so that they are easily digestible by LLMs.
Fine Tuning an LLM
Fine-tuning is the process of training a large language model (LLM) to a specific task or domain of knowledge. It involves re-training a pre-trained model on a smaller, targeted dataset. The process adjusts the model's weights based on the data, making it more tailored to the application's unique needs.
For example, an LLM used for diagnosing diseases based on medical transcripts can be fine-tuned with medical data. This LLM will offer far superior performance compared to the base model, which lacks the required medical knowledge.
Fine-tuning can help you create highly accurate language models, tailored to your specific business use cases.
Fine-tuning a large language model can be expensive and complicated. The new dataset is labeled with examples relevant to the target task, and the model adjusts its parameters and internal representations until it is well-suited for that task.
Here are some considerations when fine-tuning an LLM:
- Your dataset needs to represent the target domain or task.
- You need enough training examples in your data for the model to learn patterns.
- You might not be able to mimic GPT with an open source model.
- Fine-tuning a large language model (LLM) can cost between $0.0004 and $0.0300 per 1,000 tokens. The cost depends on the type of model you're using and the fine-tuning algorithm you choose. Some algorithms are more computationally expensive than others.
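Much of the practical work is preparing that labeled dataset. Below is a sketch of one common convention, a JSONL file of prompt/completion pairs; the exact field names vary by provider, and the medical examples are invented placeholders, not real training data.

```python
import json

# Invented examples echoing the medical-transcript use case above.
examples = [
    {"prompt": "Patient reports chest pain and shortness of breath.",
     "completion": "Possible cardiac issue; recommend ECG."},
    {"prompt": "Patient reports itchy rash after eating peanuts.",
     "completion": "Possible allergic reaction; recommend allergy testing."},
]

# JSONL: one self-contained JSON training example per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

lines = open("train.jsonl").read().splitlines()
print(len(lines))  # 2
```

Check your provider's documentation for the required field names and minimum dataset size before uploading a file like this.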
There are also a few disadvantages to fine-tuning:
- You will need to maintain the upkeep of your model.
- Your model will essentially be an expert in a domain instead of a librarian that retrieves information. This makes it more difficult to update, change, or remove information from the model.
- The model can blend concepts into new ways of communicating ideas, which can be inaccurate.
- It can be difficult to uncover where certain answers come from.
Pre-Training Model
Pre-training is the process of training a model on a large corpus of text, usually containing billions of words. This phase helps the model to learn the structure of the language, grammar, and even some facts about the world. It’s like teaching the model the basic rules and nuances of a language. Imagine teaching a child the English language by reading a vast number of books, articles, and web pages. The child learns the syntax, semantics, and common phrases but may not yet understand specific technical terms or domain-specific knowledge.
Training a large language model (LLM) can cost millions of dollars. The cost of training a single model can range from $3 million to $12 million. However, the cost of training a model on a large dataset can be even higher, reaching up to $30 million.
Challenges & Limitations
Bias
What is Bias? Is it ever a good thing? It's very important that we are on the same page about what it means so we can understand it better.
Here is the Webster definition of Bias:
- an inclination of temperament or outlook; especially: a personal and sometimes unreasoned judgment: PREJUDICE
- an instance of such prejudice
- deviation of the expected value of a statistical estimate from the quantity it estimates, and/or systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others

Transitive verb:

- to give a settled and often prejudiced outlook to ("his background biases him against foreigners")
- to apply a slight negative or positive voltage to (something, such as a transistor)
The most important insight here is that bias has its roots in our preferences. For example, if you prefer coffee over tea, you are more likely to show bias toward coffee. You might believe that more people drink coffee, that coffee sharpens your mental focus, even that it is healthier. At the bare minimum, you will have more information about coffee, which will skew how you view both coffee and tea.
How does Bias Form?
Every interaction has three components. For example, as you're reading this sentence, there are the words on the screen, you the reader, and the meaning you are gathering from this information.
The first stage is attention: you are paying attention to this instead of something else. The second aspect is your perspective: you are seeing these ideas from a point of view that is limited in time and space. Consider how different this perspective would be if you had read this five years ago. Lastly, there is meaning-making: all of these words will mean something to you. Depending on background and education, consider how differently an engineer, a linguist, and a conversational designer would interpret this paragraph.
Perceptions & Bias:
1. Attention: The world has too much information. Based on what we value, we decide where to look and what facts to pay attention to a-priori.
BIAS: By doing so, we are implying that some information is more important than other information. We are showing a preference.
2. Perspective: We only see objects from a point of view. Our perspective skews how we see the things we are paying attention to.
BIAS: Seeing a limited perspective, or only one side of an object, event, person, or topic, leaves us open to confirmation bias, selection bias, sampling bias, reporting bias, volunteer bias, publication bias, and more.
3. Meaning Making: We turn limited data on limited things into meaning. Meaning-making is a process that involves our identities, beliefs, culture, personalities, etc. For example, consider how differently an adult and a child would interpret the same event.
BIAS: The entire knowledge base is a construct, something we fabricate. It is a useful invention that doesn't exist outside of us.
To properly address BIAS, we need to be aware of it at every stage of the process.
Hallucinations & Poor Accuracy
Why do good models do bad things? As covered in the Limitations section above, LLMs are low-resolution statistical representations of the world's knowledge, trained on biased data. When you ask a detailed question about something specific, the model may invent those details in a way that conforms to its biases, producing hallucinations and poor accuracy.
Prompt Tuning
Prompt tuning utilizes 'soft prompts', which are created by the AI itself. The AI can keep tuning a prompt until it reaches an ideal prompt for completing a task or goal. The advantages are efficiency, cost, and automation; one key disadvantage is that the prompt is uninterpretable, since it is a string of numbers.
By selectively combining Prompt Design, Prompt Tuning, and Prompt Engineering, we can solve many of the issues plaguing LLMs and increase response quality and accuracy.
Fine Tuning Tutorial
Solutions
Knowledge Bases
Using your own knowledge base as a primary point of information for your chatbot project is a great start to improving quality.
To increase the odds that the bot will answer correctly, you need to design and organize the information in your knowledge base so that the LLM derives correct vector representations for the document parts.
There are a number of variables to consider, such as ontologies, taxonomies, and semantics. The hierarchical structure, the macro context, the micro context, and tagging are all connected, and the words you choose matter.
Right now we are conducting experiments to discover the best techniques for organizing knowledge bases to ensure correct vector representations within the document parts.
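One simple organizing technique along these lines is to keep each chunk's macro context attached to its micro context, for example by prefixing every paragraph with its section heading before embedding it. The article text and heading convention below are invented for illustration.

```python
# Sketch: split a knowledge-base article into chunks that keep their
# heading context, so each embedded "document part" carries both the
# macro context (the heading) and the micro context (the paragraph).
article = """# Returns
Items can be returned within 30 days.
# Shipping
We ship worldwide from our warehouse."""

def chunk_with_headings(text):
    """Yield 'Heading: paragraph' strings ready for embedding."""
    chunks, heading = [], ""
    for line in text.splitlines():
        if line.startswith("# "):
            heading = line[2:]        # remember the current section
        elif line.strip():
            chunks.append(f"{heading}: {line.strip()}")
    return chunks

for c in chunk_with_headings(article):
    print(c)
```

A chunk like "Returns: Items can be returned within 30 days." embeds closer to a question about returns than the bare sentence would, which is one reason hierarchical structure and tagging matter.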
Want to learn more on how to do this? We created an entire section for Knowledge Bases which includes experiments!
Prompt Tuning
Use a combination of Prompt Design, Prompt Engineering, and Prompt Tuning to elicit the most appropriate answers from an LLM.
Fine-Tuning
Fine-tuning can be used on proprietary data to create new and unique experiences. In some ways, we think this is more in line with products and productizing knowledge.
Future of the Internet
Knowledge is more than just information. It can open a person's eyes to seeing the world in a whole new way; it has the power to reveal. A good example of this is Amazon: using your smartphone, you can see what a couch would look like in your living room. They have taken the information about the couch and turned it into an experience. This is where the internet is going.