GenAI Pointers (Primarily Text Domain)

summary = "GenAI Notes for Text - Mostly pointers, not details."

Terms Worth Understanding

Tokenization

Encoding/Decoding

This video by Andrej Karpathy is a good source for understanding tokenization: he walks through an implementation of a tokenizer step by step.
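To make encoding and decoding concrete, here is a minimal sketch of a byte-level byte pair encoding (BPE) tokenizer. It is not the code from the video. The function names (`train`, `encode`, `decode`, `merge`, `most_common_pair`) are made up for illustration, and it leaves out things a production tokenizer has, such as regex pre-splitting and special tokens.

```python
from collections import Counter

def most_common_pair(ids):
    """Return the most frequent adjacent pair of token ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from the raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> new id, kept in the order the rules were learned
    for step in range(num_merges):
        pair = most_common_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # ids 0..255 are reserved for single bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges

def encode(text, merges):
    """Text -> token ids: start from bytes, then apply the merges in learned order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids

def decode(ids, merges):
    """Token ids -> text: expand every id back to its bytes, then decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

text = "aaabdaaabac"
merges = train(text, num_merges=3)
ids = encode(text, merges)
print(ids, "->", decode(ids, merges))
```

Encoding compresses the byte sequence by replacing frequent pairs with new ids, and decoding reverses that exactly, so `decode(encode(text))` returns the original text.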

Terms Worth Being Aware Of

These are worth knowing at least well enough to understand the motivation behind each.

Fill In the Middle (FIM)
Making models learn to infill text
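One way to picture FIM is as a data transformation applied before ordinary left-to-right training: a document is cut into prefix, middle, and suffix, and the middle is moved to the end so the model learns to generate it given both sides. The sketch below assumes that prefix-suffix-middle ordering. The sentinel strings and function names are hypothetical; real models define their own special tokens.

```python
import random

# Hypothetical sentinel strings, used here only for illustration.
PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def to_fim_example(document, rng):
    """Cut the document at two random points and reorder it as prefix, suffix, middle."""
    i, j = sorted(rng.randrange(len(document) + 1) for _ in range(2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

def from_fim_example(example):
    """Undo to_fim_example (assumes the document itself contains no sentinel strings)."""
    rest = example[len(PRE):]
    prefix, rest = rest.split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix

doc = "def add(a, b):\n    return a + b\n"
example = to_fim_example(doc, random.Random(0))
print(example)
```

At inference time the same layout lets a left-to-right model infill: supply the prefix and suffix, end the prompt with the middle sentinel, and let the model generate the missing span.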
Multi Query Attention (MQA)
Grouped Query Attention (GQA)
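The motivation for MQA and GQA is easiest to see in the size of the key/value cache kept during generation: several query heads share one key/value head, so fewer keys and values need to be stored and read. The sketch below is only bookkeeping, not an attention implementation. The model dimensions and both function names are hypothetical, chosen for illustration.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Index of the key/value head that a given query head reads from."""
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Cache size for one sequence: two tensors (keys and values) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical model shape, not taken from any particular model.
N_LAYERS, N_Q_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 4096

# Standard multi-head attention has one KV head per query head,
# MQA has a single KV head, and GQA sits in between.
for name, n_kv in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    mapping = [kv_head_for_query_head(h, N_Q_HEADS, n_kv) for h in range(N_Q_HEADS)]
    size_mib = kv_cache_bytes(N_LAYERS, n_kv, HEAD_DIM, SEQ_LEN) / 2**20
    print(f"{name}: {n_kv} KV heads, cache {size_mib:.0f} MiB, query head 5 -> KV head {mapping[5]}")
```

With these made-up dimensions, going from 32 to 8 key/value heads shrinks the cache by a factor of 4, and MQA shrinks it by a factor of 32. GQA is the middle ground between full multi-head attention and MQA.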

Software

Tokenization/Embeddings

General Thoughts

  • Earlier approaches to feeding inputs to intelligent systems required the input structure and shape to be defined and fixed up front. Newer approaches do away with that constraint and accept variable-length inputs. The data also does not have to be processed sequentially. Together, these properties let the new architectures capture "context" far more effectively.
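As a rough illustration of the point above, and assuming the "newer approaches" are attention-based models, here is a stripped-down self-attention function with no learned weights. The same code accepts a sequence of any length, and each output position is computed from all input positions at once instead of step by step. The function names are made up for this sketch.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Unparameterized self-attention: each output is a weighted mix of all inputs."""
    dim = len(vectors[0])
    outputs = []
    # Each position depends only on the inputs, never on another position's
    # output, so the iterations of this loop could run in any order or in parallel.
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)])
    return outputs

short_seq = [[1.0, 0.0], [0.0, 1.0]]
long_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [0.0, 0.0]]
print(len(self_attention(short_seq)), len(self_attention(long_seq)))
```

Nothing in the function fixes the sequence length ahead of time. The "context" for each position is simply the set of weights it assigns to every other position.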