Edited, memorised or added to reading queue

on 05-Apr-2025 (Sat)

Large language models, or LLMs, generate new text by repeatedly predicting the next word they should output.
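
To make that loop concrete, here is a minimal Python sketch of "repeatedly predicting the next word." The predict_next_word function is hypothetical, standing in for whatever trained model actually produces the prediction; the point is only the append-and-predict structure.

    def generate(predict_next_word, prompt_words, num_words):
        # Start from the prompt and repeatedly ask the model for one more word.
        words = list(prompt_words)
        for _ in range(num_words):
            next_word = predict_next_word(words)  # hypothetical model call
            words.append(next_word)               # the prediction becomes part of the next input
        return " ".join(words)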

Large language models are built by using supervised learning to train a model to repeatedly predict the next word. For example, if an AI system has read on the Internet a sentence like "my favorite drink is lychee bubble tea," then that single sentence would be turned into a lot of A-to-B data points for the model to learn to predict the next word.
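
A short sketch of that conversion (my construction, not code from the course): each prefix of the sentence becomes an input A, and the word that follows becomes the label B.

    sentence = "my favorite drink is lychee bubble tea".split()

    # One sentence becomes many (A, B) pairs: A is the words so far, B is the next word.
    pairs = [(sentence[:i], sentence[i]) for i in range(1, len(sentence))]

    for a, b in pairs:
        print(" ".join(a), "->", b)
    # my -> favorite
    # my favorite -> drink
    # my favorite drink -> is
    # ... up to: my favorite drink is lychee bubble -> tea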

When you train a very large AI system on a lot of data, say hundreds of billions or even over a trillion words, then you get a large language model like ChatGPT that, given an initial piece of text called a prompt, is very good at generating some additional words in response to that prompt. The description I presented here does omit some technical details, like how the model learns to follow instructions rather than just predict the next word found on the Internet, and how developers make the model less likely to generate inappropriate outputs, such as ones that exhibit discrimination or hand out harmful instructions. If you're interested, you can learn more about these details in the course Generative AI for Everyone. At the heart of LLMs, though, is this technology that learns from a lot of data to predict what the next word is, using supervised learning.
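
If you want to see this prompt-to-completion behavior directly, the Hugging Face transformers library (my choice of tool here, not something the course specifies) makes it a few lines. GPT-2 is a small, older model, picked only because it is free to download and run locally.

    # pip install transformers torch
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("My favorite drink is", max_new_tokens=10)
    print(result[0]["generated_text"])  # the prompt plus some generated words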

Supervised learning just learns input-to-output, or A-to-B, mappings. On one hand, input-to-output, A-to-B, seems quite limiting. But when you find the right application scenario, this turns out to be incredibly valuable.
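
As a minimal sketch of what "learning an A-to-B mapping" looks like in code (scikit-learn and the toy numbers are my choices, not the course's):

    from sklearn.linear_model import LinearRegression

    A = [[1], [2], [3], [4]]   # inputs, e.g. house sizes in thousands of square feet
    B = [150, 280, 410, 540]   # outputs, e.g. prices in thousands of dollars

    model = LinearRegression().fit(A, B)  # learn the A -> B mapping from examples
    print(model.predict([[5]]))           # apply it to an unseen input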

With modern AI, with neural networks and deep learning, what we saw was that if you train a small neural network, then as you feed in more data, performance keeps getting better for much longer. If you train an even slightly larger neural network, say a medium-sized neural net, the performance is better still. And if you train a very large neural network, then the performance just keeps on getting better and better. For applications like speech recognition, online advertising, and building self-driving cars, where having a high-performance, highly accurate system is important, this has enabled these AI systems to get much better, and has made products like speech recognition much more acceptable and valuable to users and to companies.

Now here are a couple of implications of this figure. If you want the best possible levels of performance, you need two things. One is that it really helps to have a lot of data; that's why you sometimes hear about big data. Having more data almost always helps. The second thing is that you want to be able to train a very large neural network. The rise of fast computers, including Moore's law, but also the rise of specialized processors such as graphics processing units, or GPUs, which you'll hear more about in a later video, has enabled many companies, not just the giant tech companies, to train large neural nets on a large enough amount of data to get very good performance and drive business value.

In fact, it was also this type of scaling, increasing the amount of data and the size of the models, that was instrumental to the recent breakthroughs in training generative AI systems, including the large language models we discussed just now. The most important idea in AI has been machine learning, and specifically supervised learning, which means A-to-B or input-output mappings. What enables it to work really well is data.
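
The figure itself isn't reproduced here, but the schematic shape it describes is easy to redraw. The curves below are made up purely to illustrate the trend, not real measurements.

    import numpy as np
    import matplotlib.pyplot as plt

    data = np.linspace(0, 10, 200)
    # Larger networks saturate later, so they keep benefiting from more data.
    for capacity, label in [(1.0, "small NN"), (2.0, "medium NN"), (4.0, "large NN")]:
        performance = capacity * (1 - np.exp(-data / capacity))
        plt.plot(data, performance, label=label)

    plt.xlabel("amount of data")
    plt.ylabel("performance")
    plt.legend()
    plt.show()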

You might decide that the size of the house is A and the price of the house is B, and have an AI system learn this input-to-output, or A-to-B, mapping. Now, rather than pricing a house based just on its size, you might say, well, let's also collect data on the number of bedrooms of each house. In that case, A can be both of the first two columns, and B can be just the price of the house. So given a table of data, given a dataset, it's actually up to you, up to your business use case, to decide what is A and what is B. Data is often unique to your business. This is an example of a dataset that a real estate agency might have if they're trying to help price houses, and it's up to you to decide what is A and what is B, and how to choose these definitions of A and B to make them valuable for your business.
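
In code, choosing A and B is just choosing which columns are inputs and which is the label. The numbers below are made up, standing in for the real estate agency's table.

    import pandas as pd

    df = pd.DataFrame({
        "size_sqft": [1000, 2000, 3000],
        "bedrooms":  [2, 3, 4],
        "price":     [200_000, 380_000, 560_000],
    })

    # One choice: A is the size only, B is the price.
    A, B = df[["size_sqft"]], df["price"]

    # Another choice: A is both of the first two columns, B is still the price.
    A, B = df[["size_sqft", "bedrooms"]], df["price"]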