GenAI Pointers (Primarily Text Domain)
summary = "GenAI Notes for Text - Mostly pointers, not details."
Terms Worth Understanding
Tokenization
Encoding/Decoding
This video by Andrej Karpathy is a good resource for understanding tokenization: he walks through an implementation of a tokenizer.
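To make encoding/decoding concrete, here is a minimal sketch of byte-level byte-pair encoding (BPE), a common tokenizer scheme for GPT-style models. It is an illustration under simplifying assumptions, not a production tokenizer. The function names (`train`, `encode`, `decode`) are hypothetical, and the sketch skips the regex pre-splitting and special tokens that real tokenizers add. Training starts from the 256 raw byte values and repeatedly merges the most frequent adjacent pair into a new token id. Encoding replays those merges in the order they were learned, and decoding maps ids back to bytes.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, vocab_size):
    """Learn up to (vocab_size - 256) merges on top of the 256 raw byte values."""
    assert vocab_size >= 256
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> new id, in the order the merges were learned
    for new_id in range(256, vocab_size):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Text -> token ids, applying learned merges earliest-first."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        pairs = set(zip(ids, ids[1:]))
        # choose the pair that was learned earliest; unknown pairs rank last
        pair = min(pairs, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids, merges):
    """Token ids -> bytes -> text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dicts preserve insertion order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

For example, training on `"low lower lowest"` with `vocab_size=260` learns four merges, and `decode(encode(text, merges), merges)` returns the original string using fewer ids than there are bytes.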
Terms Worth Being Aware Of
Worth knowing at least well enough to understand the motivation behind each.
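- Fill-in-the-Middle (FIM)
  - Making models learn to infill text
  - Training documents are split into prefix, middle, and suffix, then reordered so a left-to-right model learns to generate the middle conditioned on both sides (useful for code completion inside an editor)
- Multi-Query Attention (MQA)
  - All query heads share a single key/value head, which shrinks the KV cache and speeds up autoregressive decoding
- Grouped-Query Attention (GQA)
  - A middle ground between multi-head attention and MQA: query heads are split into groups, and each group shares one key/value head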
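A minimal sketch of how FIM training data can be built, using the prefix-suffix-middle (PSM) ordering: a document is cut at two random points and the pieces are rearranged so the middle comes last. The sentinel strings `<PRE>`, `<SUF>`, `<MID>`, `<EOT>` and the function names are hypothetical placeholders. Real models define their own special tokens.

```python
import random

# Hypothetical sentinel strings; each real model defines its own special tokens.
PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"


def to_fim_example(doc, rng):
    """Cut a non-empty doc at two random points and reorder it as prefix, suffix, middle."""
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Still trained left-to-right, but the suffix now precedes the middle,
    # so predicting the middle is conditioned on text from both sides.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}{EOT}"


def fim_prompt(prefix, suffix):
    """At inference time, supply prefix and suffix; the model continues with the middle."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"
```

At inference, whatever the model generates after `<MID>` (up to its end token) is the infilled text.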
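MQA and GQA differ from standard multi-head attention only in how many key/value heads exist. The pure-Python sketch below shows this with a single function. `grouped_query_attention` and `kv_cache_elements` are hypothetical names, and for brevity the sketch omits batching, causal masking, and the learned Q/K/V projections. With as many KV heads as query heads it reduces to ordinary multi-head attention. With one KV head it is MQA, and anything in between is GQA. The second function shows the motivation: the K/V cache kept during decoding scales with the number of KV heads.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grouped_query_attention(q, k, v):
    """Scaled dot-product attention where groups of query heads share K/V heads.

    q:    [num_q_heads][seq_len][head_dim]
    k, v: [num_kv_heads][seq_len][head_dim]
    num_kv_heads == num_q_heads -> standard multi-head attention
    num_kv_heads == 1           -> multi-query attention (MQA)
    in between                  -> grouped-query attention (GQA)
    """
    num_q_heads, num_kv_heads = len(q), len(k)
    assert num_q_heads % num_kv_heads == 0, "query heads must split evenly into groups"
    group_size = num_q_heads // num_kv_heads
    out = []
    for h in range(num_q_heads):
        kv = h // group_size  # consecutive query heads map to the same K/V head
        keys, values = k[kv], v[kv]
        head_dim = len(values[0])
        head_out = []
        for query in q[h]:
            scale = 1.0 / math.sqrt(len(query))
            weights = softmax([dot(query, key) * scale for key in keys])
            # output for this position: attention-weighted sum of the value vectors
            head_out.append(
                [sum(w * val[d] for w, val in zip(weights, values)) for d in range(head_dim)]
            )
        out.append(head_out)
    return out


def kv_cache_elements(num_kv_heads, seq_len, head_dim, num_layers):
    """Cached K and V entries during decoding; fewer KV heads means a smaller cache."""
    return 2 * num_layers * num_kv_heads * seq_len * head_dim
```

For instance, with 32 query heads, going from 32 KV heads to 8 (GQA) makes `kv_cache_elements` four times smaller, and going to 1 (MQA) makes it 32 times smaller, while the number of query heads stays the same.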
Software
Tokenization/Embeddings
General Thoughts
- Earlier approaches to feeding inputs into intelligent systems required a well-defined input structure and shape to be decided and fixed up front. Newer approaches (e.g. Transformers) do away with that and accept variable-length inputs. They also do not require the data to be processed sequentially: self-attention relates every position to every other position in a single step. This is powerful because it lets these architectures capture "context" far more effectively.