LLM Learning Initial
Welcome to the world of Large Language Models
This summer vacation, I will begin learning the fundamental principles of LLMs (Large Language Models), along with their downstream techniques and applications. I hope the journey will be interesting and fruitful!
All the code will be refactored and organized in LLM Learn.
LLM Learning Materials
- Courses and Videos
  - CS25: Transformers United V5: advanced architectures for Transformers.
  - Transformers by 3Blue1Brown: a great visual demo!
  - Advanced Natural Language Processing: a very good course with detailed lecture notes and homework. (I will try to finish that project next semester.)
  - LLMs and Transformers: comes with several discussion topics (lecture notes, blogs, and papers are included).
  - Dive into LLMs: a Chinese-language course on learning large language models.
  - Learning Large Language Models from scratch: the Stanford course on learning LLMs.
  - Large Language Models by DataWhale: a Chinese-language course.
- Several books and code repositories:
  - Hands on Large Language Models (English version)
    - Original English version: Hands on Large Language Models
  - Hands on Large Language Models (Chinese version)
  - Build a Large Language Model (From Scratch)
- Projects:
  - LLM Hero to Zero: build a simple GPT from scratch.
Tool Usage
- HuggingFace for downloading models and datasets (a download sketch follows this list)
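For quick reference, here is a minimal sketch of pulling a model and a dataset from the HuggingFace Hub with the transformers and datasets libraries. The checkpoint and dataset names (gpt2, wikitext) are just examples I picked, not necessarily the ones used in this repo.

```python
# Minimal sketch: download (and cache) a model, tokenizer, and dataset
# from the HuggingFace Hub. Names below are example choices.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

model_name = "gpt2"  # example checkpoint; swap in whatever model you are studying

# Fetches the tokenizer and model weights from the Hub into the local cache.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Datasets are fetched the same way.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
print(model.config.model_type, len(dataset))
```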
LLM Learning Contents
- Basic architecture for Large Language Models
  - Attention Mechanism (Attention is all you need!) ✅ (a PyTorch sketch follows this list)
  - RNN, LSTM, GRU (will be covered in the future)
  - Seq2Seq Model ✅
  - Transformer Architecture ✅
  - Other basic NLP knowledge (word embeddings, etc.)
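Since the attention item above is checked off, here is a minimal scaled dot-product attention sketch in plain PyTorch, following softmax(QK^T / sqrt(d_k)) V from Attention Is All You Need; the shapes and function name are my own choices, not taken from any of the courses above.

```python
# Minimal scaled dot-product attention (Vaswani et al., 2017).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k); mask broadcastable to (batch, seq, seq)."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep the softmax stable.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution per query
    return weights @ v                       # weighted sum of value vectors

# Tiny usage example with random tensors.
q = k = v = torch.randn(2, 5, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 16])
```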
- Pre-Training for LLMs
  - Loading Datasets
  - Self-supervised Learning (a next-token-prediction sketch follows this list)
  - More advanced architectures for LLM pre-training; see the Advanced Structure part.
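As a reminder of what self-supervised learning means in pre-training, here is a sketch of the next-token-prediction objective used by GPT-style models: the labels come from the data itself by shifting the token sequence. The tiny embedding-plus-linear model is a placeholder for a real LLM, and all names are mine.

```python
# Minimal sketch of the self-supervised next-token-prediction objective.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
# Placeholder "model": embedding followed by a vocabulary projection.
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))

tokens = torch.randint(0, vocab_size, (4, 16))   # (batch, seq_len) token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens <= t

logits = model(inputs)                           # (batch, seq_len-1, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())  # no human annotation needed: the labels are the data
```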
- Post-Training for LLMs
  - Quantization for Model Optimization
  - Knowledge Distillation (a distillation-loss sketch follows this list)
  - Fine-tuning Techniques
    - SFT
    - RFT
    - RLHF (Reinforcement Learning from Human Feedback)
  - LLM Evaluation
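For the knowledge-distillation item, here is a sketch of the classic soft-label loss (Hinton et al., 2015), where a student matches the teacher's temperature-softened output distribution; the random logits below stand in for real model outputs.

```python
# Minimal sketch of the knowledge-distillation loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

student = torch.randn(8, 100)  # (batch, vocab) logits from a small model
teacher = torch.randn(8, 100)  # logits from a larger, frozen teacher
print(distillation_loss(student, teacher).item())
```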
- Advanced Structure for LLMs
  - Sparse Attention & Lightning Attention
  - KV cache
  - Mixture of Experts (MoE)
  - LoRA: Low-Rank Adaptation of Large Language Models (a LoRA layer sketch follows this list)
  - PPO, GRPO, DPO, etc. (Deep Reinforcement Learning)
  - Test-time compute for LLMs (after training)
  - LLM Reasoning (CoT, ToT, etc.)
    - Recommended blog: "Why We Think" by Lilian Weng
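For the LoRA item, here is a toy adapter layer following the paper's update rule, h = Wx + (alpha/r) * BAx, with the pretrained weight W frozen and only the low-rank factors A and B trainable. The module itself is my own minimal version, not the peft library's implementation.

```python
# Minimal sketch of a LoRA linear layer (Hu et al., 2021).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained layer stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base output plus the scaled low-rank update (B @ A) x.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```

Only A and B (a tiny fraction of the full weight matrix) receive gradients, which is why LoRA fine-tuning is so much cheaper than full fine-tuning.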
- LLM Downstream Applications
  - This section will be recorded in the future.
  - RAG (Retrieval-Augmented Generation); a toy retrieval sketch follows this list
  - LangChain Community
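For the RAG item, here is a toy sketch of the retrieve-then-generate flow: score documents against the query, put the best match into the prompt as context, and let an LLM answer from it. Word overlap stands in for real embedding similarity here, and no actual model call is made.

```python
# Toy sketch of the retrieval step in RAG.
docs = [
    "LoRA adds trainable low-rank adapters to frozen weights.",
    "The KV cache stores past keys and values to speed up decoding.",
    "MoE layers route each token to a few expert networks.",
]

def score(query: str, doc: str) -> int:
    """Word-overlap score; a real system would use embedding similarity."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

query = "how does the KV cache speed up decoding?"
best = max(docs, key=lambda d: score(query, d))  # retrieve the best document
prompt = f"Context: {best}\nQuestion: {query}\nAnswer:"
print(prompt)  # an LLM would now generate an answer grounded in the context
```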
Updating Status
- 2025/07/28: Finished two long-standing blog posts: AINN-Attention & AINN-Transformer.
  - Finished the tutorial on the basic Attention mechanism and the Transformer architecture.
Current Todo List
- Finish the implementation code of the Transformer module in dl2ai.
- Work through courses on word embeddings and basic NLP knowledge.