LLM Learning Initial
Welcome to the world of Large Language Models
This summer vacation, I will begin learning the fundamental principles of LLMs (Large Language Models), along with their downstream techniques and applications. I hope the journey will be interesting and fruitful!
All the code will be refactored and organized in LLM Learn.
LLM Learning Materials
- Courses and Videos
  - CS25: Transformers United V5: advanced architectures for Transformers.
  - Transformers by 3Blue1Brown: a great visual demo!
  - Advanced Natural Language Processing: a very good course with detailed lecture notes and homework. (I will try to finish that project next semester.)
  - LLMs and Transformers: comes with several discussion topics (lecture notes, blogs, and papers are included).
  - Dive into LLMs: a Chinese-language course on learning large language models.
  - Learning Large Language Models from scratch: the Stanford course on learning LLMs.
  - Large Language Models by DataWhale: a Chinese-language course.
- Several books and code repositories:
  - Hands on Large Language Models (English version)
    - Original English version: Hands on Large Language Models
  - Hands on Large Language Models (Chinese version)
  - Build a Large Language Model (From Scratch)
- Projects:
  - LLM Hero to Zero: build a simple GPT from scratch.
Tool Usage
- HuggingFace for downloading models and datasets (a download sketch follows this list)
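For quick reference, here is a minimal sketch of pulling a model and a dataset from the HuggingFace Hub with the transformers and datasets libraries. The checkpoint and dataset names (gpt2, wikitext) are just examples I picked, not necessarily the ones used in this repo.

```python
# Minimal sketch: download (and cache) a model, tokenizer, and dataset
# from the HuggingFace Hub. Names below are example choices.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

model_name = "gpt2"  # example checkpoint; swap in whatever model you are studying

# Fetches the tokenizer and model weights from the Hub into the local cache.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Datasets are fetched the same way.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
print(model.config.model_type, len(dataset))
```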
LLM Learning Contents
- Basic architecture for Large Language Models
  - Attention Mechanism (Attention is all you need!) ✅ (a PyTorch sketch follows this list)
  - RNN, LSTM, GRU (will be covered in the future)
  - Seq2Seq Model ✅
  - Transformer Architecture ✅
  - Other basic NLP knowledge (word embeddings, etc.)
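Since the attention item above is checked off, here is a minimal scaled dot-product attention sketch in plain PyTorch, following softmax(QK^T / sqrt(d_k)) V from Attention Is All You Need; the shapes and function name are my own choices, not taken from any of the courses above.

```python
# Minimal scaled dot-product attention (Vaswani et al., 2017).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k); mask broadcastable to (batch, seq, seq)."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep the softmax stable.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution per query
    return weights @ v                       # weighted sum of value vectors

# Tiny usage example with random tensors.
q = k = v = torch.randn(2, 5, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 16])
```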
- Pre-Training for LLMs
  - Loading Datasets
  - Self-supervised Learning (a next-token-prediction sketch follows this list)
  - More advanced architectures for LLM pre-training; see the Advanced Structure part.
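As a reminder of what self-supervised learning means in pre-training, here is a sketch of the next-token-prediction objective used by GPT-style models: the labels come from the data itself by shifting the token sequence. The tiny embedding-plus-linear model is a placeholder for a real LLM, and all names are mine.

```python
# Minimal sketch of the self-supervised next-token-prediction objective.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
# Placeholder "model": embedding followed by a vocabulary projection.
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))

tokens = torch.randint(0, vocab_size, (4, 16))   # (batch, seq_len) token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens <= t

logits = model(inputs)                           # (batch, seq_len-1, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())  # no human annotation needed: the labels are the data
```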
- Post-Training for LLMs
  - Quantization for Model Optimization
  - Knowledge Distillation (a distillation-loss sketch follows this list)
  - Fine-tuning Techniques
    - SFT
    - RFT
    - RLHF (Reinforcement Learning from Human Feedback)
  - LLM Evaluation
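For the knowledge-distillation item, here is a sketch of the classic soft-label loss (Hinton et al., 2015), where a student matches the teacher's temperature-softened output distribution; the random logits below stand in for real model outputs.

```python
# Minimal sketch of the knowledge-distillation loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

student = torch.randn(8, 100)  # (batch, vocab) logits from a small model
teacher = torch.randn(8, 100)  # logits from a larger, frozen teacher
print(distillation_loss(student, teacher).item())
```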
- Advanced Structure for LLMs
  - Sparse Attention & Lightning Attention
  - KV cache
  - Mixture of Experts (MoE)
  - LoRA: Low-Rank Adaptation of Large Language Models (a LoRA layer sketch follows this list)
  - PPO, GRPO, DPO, etc. (Deep Reinforcement Learning)
  - Test-time compute for LLMs (after training)
  - LLM Reasoning (CoT, ToT, etc.)
    - Recommended blog: "Why We Think" by Lilian Weng
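For the LoRA item, here is a toy adapter layer following the paper's update rule, h = Wx + (alpha/r) * BAx, with the pretrained weight W frozen and only the low-rank factors A and B trainable. The module itself is my own minimal version, not the peft library's implementation.

```python
# Minimal sketch of a LoRA linear layer (Hu et al., 2021).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained layer stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base output plus the scaled low-rank update (B @ A) x.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```

Only A and B (a tiny fraction of the full weight matrix) receive gradients, which is why LoRA fine-tuning is so much cheaper than full fine-tuning.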
- LLM Downstream Applications
  - This section will be recorded in the future.
  - RAG (Retrieval-Augmented Generation); a toy retrieval sketch follows this list
  - LangChain Community
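For the RAG item, here is a toy sketch of the retrieve-then-generate flow: score documents against the query, put the best match into the prompt as context, and let an LLM answer from it. Word overlap stands in for real embedding similarity here, and no actual model call is made.

```python
# Toy sketch of the retrieval step in RAG.
docs = [
    "LoRA adds trainable low-rank adapters to frozen weights.",
    "The KV cache stores past keys and values to speed up decoding.",
    "MoE layers route each token to a few expert networks.",
]

def score(query: str, doc: str) -> int:
    """Word-overlap score; a real system would use embedding similarity."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

query = "how does the KV cache speed up decoding?"
best = max(docs, key=lambda d: score(query, d))  # retrieve the best document
prompt = f"Context: {best}\nQuestion: {query}\nAnswer:"
print(prompt)  # an LLM would now generate an answer grounded in the context
```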
Updating Status
- 2025/07/28: Finished two long-standing blog posts: AINN-Attention & AINN-Transformer.
  - Finished the tutorial on the basic Attention mechanism and the Transformer architecture.
Current Todo List
- Finish the implementation code of the Transformer module in dl2ai.
- Work through courses on word embeddings and basic NLP knowledge.