
🤖 100 Days of Generative AI – Day 2 – LLM Tokens vs Parameters? 🤖


We often hear or read statements like: Llama 3.1 comes in 8 billion, 70 billion, and 405 billion parameter versions and was trained on 15 trillion tokens, while GPT-3 has 175 billion parameters and was trained on hundreds of billions of tokens (there are no official numbers for GPT-4, but it was trained with many more parameters and tokens). But what exactly are parameters and tokens? Here is my attempt to explain them in simple terms.

✅ Parameters: Think of parameters as the weights a language model learns during training. These weights determine how the model processes and generates text. The more parameters a model has, the more capacity it has to capture and generate complex language patterns.
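To make this concrete, here is a minimal PyTorch sketch (a toy example of my own, not how any particular LLM is actually built): the "parameters" are simply all the learned weights and biases, and you can count them directly.

```python
import torch.nn as nn

# A tiny toy language model: an embedding layer followed by a projection
# back to the vocabulary. Real LLMs stack many transformer blocks, but the
# parameter count is computed the same way: every learned weight and bias.
model = nn.Sequential(
    nn.Embedding(num_embeddings=50_000, embedding_dim=256),  # vocab size x embedding size
    nn.Linear(256, 50_000),                                   # project back to the vocabulary
)

total_params = sum(p.numel() for p in model.parameters())
print(f"{total_params:,} parameters")  # about 25.7 million for this toy model
```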

✅ Tokens: Tokens are fragments of text that the model processes. Depending on the model design, a token can be as short as one character or as long as one word or more. During training, the model reads and learns from many tokens. The more tokens it is trained on, the more text the model has been exposed to, and the richer its understanding of the language.
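As a quick illustration, here is a small sketch using the Hugging Face `transformers` library (assuming it is installed) to see how GPT-2's tokenizer splits a sentence into sub-word tokens:

```python
from transformers import AutoTokenizer

# Load GPT-2's tokenizer (downloads the tokenizer files the first time).
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization breaks text into smaller pieces."
tokens = tokenizer.tokenize(text)
print(tokens)
# Roughly: ['Token', 'ization', 'Ġbreaks', 'Ġtext', 'Ġinto', 'Ġsmaller', 'Ġpieces', '.']
# "Tokenization" is split into two sub-word tokens; "Ġ" marks a leading space.
```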

✅ Why do we need tokens? Can’t we feed the model directly?
One of the main reasons we use tokens instead of feeding text directly to a deep learning model is that neural networks cannot process raw text (strings); they require numeric inputs. Tokenization converts text into numeric representations (usually integer IDs) that the model can work with. After tokenization, those token IDs are typically mapped to embeddings (dense vectors) before being processed by the model.
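Here is a short sketch of that pipeline, again using GPT-2's tokenizer plus a stand-in embedding layer (a real model would use its own trained embedding matrix): text becomes integer IDs, and the IDs become dense vectors.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Step 1 – tokenize: raw string -> integer token IDs.
ids = tokenizer.encode("Hello world")
print(ids)  # a short list of integers, one ID per token

# Step 2 – embed: integer IDs -> dense vectors.
# (A randomly initialized embedding layer, just to show the shape of the data.)
embedding = nn.Embedding(num_embeddings=tokenizer.vocab_size, embedding_dim=768)
vectors = embedding(torch.tensor(ids))
print(vectors.shape)  # torch.Size([number_of_tokens, 768])
```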