Blog Word Counter
Blog Word Counter - README
- Source code: GitHub Repo
- The current update status is available here.
Overview
The Blog Word Counter is a Python utility designed to help bloggers and content creators track their writing progress by counting words in Markdown (.md) files. This tool is particularly useful for:
- Tracking writing milestones
- Measuring content production over time
- Analyzing writing habits
- Maintaining consistency in blog post lengths
The tool handles both English and Chinese content, providing accurate word counts for multilingual bloggers. It generates detailed logs of word counts per article and summary statistics of your entire blog repository.
Features
- Bilingual Support: Accurately counts words in both English and Chinese
- Directory Scanning: Processes all Markdown files in a directory recursively
- Progress Tracking: Maintains historical logs of your word counts
- Statistics Generation: Provides total article count and aggregate word count
- JSON Output: Produces machine-readable output for further analysis
Installation
Clone the repository from GitHub with git clone.
Usage
By default, the script looks for Markdown files in /mnt/d/Blog/source/_posts. Replace this path with your own posts directory, then run the script with Python.
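If editing the path in the source each time is inconvenient, one option is to accept the directory on the command line. The sketch below is a hypothetical addition, not part of the script as described here: parse_args and DEFAULT_POSTS_DIR are illustrative names, and only the default path comes from this README.

```python
import argparse

# Default path quoted in this README; replace it with your own posts directory.
DEFAULT_POSTS_DIR = "/mnt/d/Blog/source/_posts"


def parse_args(argv=None):
    """Parse an optional positional directory argument (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        description="Count words in Markdown blog posts."
    )
    parser.add_argument(
        "directory",
        nargs="?",  # the argument may be omitted
        default=DEFAULT_POSTS_DIR,
        help="directory containing .md files (default: %(default)s)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Scanning {args.directory}")
```

With this in place, the directory could be passed as the first argument, and the default would apply when it is omitted.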
Expected Output
Output Files
The script generates two types of output files:
- Individual Article Logs: Timestamped JSON files in the log/ directory containing word counts for each article
- Summary Statistics: A total.json file with aggregate statistics
Code Explanation
Core Functions
1. count_words(text: str = "")
This function handles the actual word counting logic.
Key Features:
- Uses regular expressions to identify Chinese characters ([\u4e00-\u9fff])
- Counts English words using word boundary detection (\b[a-zA-Z]+\b)
- Combines counts for bilingual content
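Implementation Details:
The function finds every Chinese character and every English word with the two patterns above, then returns the sum of the two counts. Each Chinese character counts as one word.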
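The counting rule can be sketched as follows. This is a minimal reconstruction from the two patterns quoted above, not the original source; the constant names CHINESE_CHAR and ENGLISH_WORD are illustrative.

```python
import re

# Patterns quoted in this README; the constant names are illustrative.
CHINESE_CHAR = re.compile(r"[\u4e00-\u9fff]")
ENGLISH_WORD = re.compile(r"\b[a-zA-Z]+\b")


def count_words(text: str = "") -> int:
    """Return the number of Chinese characters plus the number of English words."""
    chinese_count = len(CHINESE_CHAR.findall(text))
    english_count = len(ENGLISH_WORD.findall(text))
    return chinese_count + english_count


# 2 English words + 2 Chinese characters
print(count_words("Hello world 你好"))  # prints 4
```

One behavior worth checking against your own posts: in Python 3, \b treats Chinese characters as word characters, so an English word written directly next to Chinese text with no space or punctuation in between is not matched by this pattern.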
2. count_words_in_directory(directory: str)
Processes all Markdown files in a directory recursively.
Key Features:
- Walks through the directory tree using os.walk()
- Filters for .md files
- Returns a list of dictionaries with filename and word count
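Implementation Details:
The function walks the tree with os.walk(), skips any file whose name does not end in .md, reads each remaining file, passes its text to count_words(), and appends a dictionary with the filename and word count to the result list.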
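A minimal sketch of the directory scan, under a few assumptions: the dictionary keys "filename" and "word_count" are guesses at the output shape, files are assumed to be UTF-8 encoded, and count_words is repeated in compact form so the block runs on its own.

```python
import os
import re


def count_words(text: str = "") -> int:
    """Chinese characters plus English words, as described in section 1."""
    chinese = re.findall(r"[\u4e00-\u9fff]", text)
    english = re.findall(r"\b[a-zA-Z]+\b", text)
    return len(chinese) + len(english)


def count_words_in_directory(directory: str) -> list:
    """Return one dictionary per .md file found under directory."""
    results = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.endswith(".md"):
                continue  # ignore non-Markdown files
            path = os.path.join(root, name)
            # Assumption: posts are stored as UTF-8 text.
            with open(path, encoding="utf-8") as f:
                text = f.read()
            # Assumption: key names are illustrative, not confirmed by the source.
            results.append({"filename": name, "word_count": count_words(text)})
    return results
```

Storing only the bare filename keeps the log compact, but two posts with the same name in different subdirectories would then be indistinguishable; recording the path relative to the scanned directory is one alternative.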
3. Logging and Statistics Functions
get_log_filename(log_dir="log")
- Creates timestamped log filenames
- Ensures log directory exists
save_to_json(blog_word_counts)
- Saves individual article counts to JSON
do_calculate(blog_word_counts)
- Computes total articles and words
- Includes timestamp in output
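The three helpers above might look roughly like the sketch below. Several details are assumptions rather than facts from the source: the log_YYYYMMDD_HHMMSS.txt naming is inferred from the example filename in this README, the log_dir and output_path keyword arguments on save_to_json and do_calculate are hypothetical additions that make the functions easier to test, the summary key names are illustrative, and it is assumed that do_calculate is the function that writes total.json.

```python
import json
import os
from datetime import datetime


def get_log_filename(log_dir="log"):
    """Create log_dir if needed and return a timestamped log path."""
    os.makedirs(log_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return os.path.join(log_dir, f"log_{timestamp}.txt")


def save_to_json(blog_word_counts, log_dir="log"):
    """Write the per-article counts to a timestamped file and return its path."""
    path = get_log_filename(log_dir)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese filenames readable in the log.
        json.dump(blog_word_counts, f, ensure_ascii=False, indent=2)
    return path


def do_calculate(blog_word_counts, output_path="total.json"):
    """Compute totals, write them to output_path, and return the summary."""
    summary = {
        # Assumption: key names are illustrative.
        "total_articles": len(blog_word_counts),
        "total_words": sum(item["word_count"] for item in blog_word_counts),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, ensure_ascii=False, indent=2)
    return summary
```

Because each run writes a new timestamped log while total.json is overwritten, the log/ directory accumulates the history and total.json always reflects the latest run.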
Main Execution Flow
- Directory scanning and word counting
- Log file generation
- Summary statistics calculation
- Output file creation
Example Outputs
Individual Article Log (log/log_20250421_190058.txt)
Summary Statistics (total.json)