Python Pythonic
Python Advanced Programming
Introduction
简单来说,Pythonic不仅仅指编写出能够运行的 Python 代码,更深层次地,它代表了一种符合 Python 语言哲学和最佳实践的编程风格。简单来说,Pythonic 的代码就是“用 Python 的思维方式”写出来的代码。
它强调代码的可读性、简洁性、明确性以及效率。
Table of Contents
-
Meta Programming
-
Duck Typing
-
Callable & Iterable
-
Generator
-
Functional Programming & FCC (First Class Citizen)
Meta Programming
元编程(Meta Programming) 是一种编程技术,允许程序在运行时动态地生成、分析或修改代码结构,甚至改变自身的行为。它的核心思想是“编写能够操作代码的代码”,从而提升抽象层次,减少重复工作,增强灵活性。
Core Principles
- 代码即数据(Code as Data)
将代码视为可操作的数据结构(如字符串、抽象语法树/AST),允许程序像处理普通数据一样生成或修改代码。 - 运行时与编译时
- 编译时元编程:在编译阶段生成或转换代码(如宏、模板)。
- 运行时元编程:在程序运行时动态修改行为(如反射、动态方法生成)。
- 自省(Introspection)
程序能够分析自身的结构(如获取类的方法、检查参数类型)。 - 反射(Reflection)
在运行时动态调用或修改代码(如通过类名实例化对象、动态调用方法)。
这么说貌似有点抽象了!我们来看一个具体的例子:
Hello world!
1 |
|
程序将会输出Hello, world
。exec()
是 Python 中用来执行 Python 代码字符串的(比如 exec("print(1+1)")
)。
这就是Meta Programming中Code as Data的思想,即编写操作代码的代码。
如果这样想的话,各种脚本文件(bash,Dockerfile,CMakelists)不都是Code as Data的思想吗?
当然,你也可以使用os
库实现更加高级的操作:
1 |
|
在有了最基本的感知之后,我们来一条一条分析Meta Programming的基本思想。
编译时 & 运行时元编程
其实C++中的宏就是一种编译时元编程。即在编译的阶段将#define
的东西转换进程序中然后开始编译。
当然这不止于C++,例如Rust中的println!()
就是宏而不是函数。
Decorators
下面本文将要介绍装饰器(Decorators)。
语法糖(英语:Syntactic sugar)是由英国计算机科学家彼得·兰丁发明的一个术语,指计算机语言中添加的某种语法,这种语法对语言的功能没有影响,但是更方便程序员使用。语法糖让程序更加简洁,有更高的可读性。语法糖通常是常见操作的简写,也可以用另一种更冗长的形式表达:程序员可以选择使用较短的形式还是较长的形式,但通常会使用较短的形式,因为它更短,更容易输入和阅读。
装饰器是Python中一种强大的语法糖,允许在不修改原函数代码的情况下扩展或修改函数的行为。例如运行下面的代码:
1 |
|
程序输出如下:
1 |
|
可以看到,装饰器在不改变原函数的情况下实现了对函数的包裹,即内部的wrapper()
,这样相当于函数控制函数,同时各函数之间的代码相互独立。
我们来看两个更加复杂的装饰器:
- Time Decorator
1 |
|
程序的输出结果:
1 |
|
- A more advanced time decorator!
1 |
|
This decorator is specially designed for class method’s usage, rather than a simple function.
- Decorator for type check (Reflection for Meta Programming)
因为Python是一种弱类型的编程语言,我们可以在写一个decorator来实现简单的语法类型检查。
*args
代表接受任意的关键字参数,而**kwargs
可以接受任意的位置参数。
1 |
|
Python的类型审查相对比较松散,因此通过类型审查的decorator可以使Python函数的编写变的更加安全。
1 |
|
- Decorators with parameters
1 |
|
Applications
Make & Makefile
Linux中有自带的make
工具,可以实现编译的脚本自动化,和.sh
非常的类似,但是相比于Bash脚本更加的高级。
- Makefile:主要用于构建和管理项目,特别是编译型语言的编译过程
- Bash脚本:通用的自动化脚本工具,用于执行各种系统任务
Makefile很大的优势是他会最小化编译的成本,例如如果没有任何的源代码或者dependencies发生改变,Makefile不会重新编译这部分的dependencies。同时,Makefile有自己的语法框架,相比于灵活的bash脚本更加的规整一点。
例如,在编译C++程序的时候:
1 |
|
Makefile要求代码中要给出目标(target)、依赖(dependencies)和命令(commands)。例如在语句:
1 |
|
program
就是最后编译需要生成的binary files,而main.o utils.o
就是这一个语句需要使用的dependencies,然后在第二行就使用特定的命令行语句来实现。
这一点相对于Bash脚本来说更规整,尤其对于C++这种需要编译语言。(不过Cmakelists才是神中神)
我们再来举一个例子:
1 |
|
运行make
之后,程序的输出结果如下:
1 |
|
不过谁会这样使用Makefile呢?🤣不如直接写bash
Duck Typing
Introduction to Duck Typing
Duck Typing is a programming concept often used in dynamically-typed languages, where the type or class of an object is determined by its behavior (methods and properties) rather than its explicit inheritance or interface implementation. The term originates from the phrase:
“If it walks like a duck and quacks like a duck, then it must be a duck.”
In essence, Duck Typing focuses on what an object can do rather than what it is. This approach provides flexibility and promotes code reusability, but it also requires careful handling to avoid runtime errors due to missing methods or properties.
Key Characteristics of Duck Typing
-
Behavior Over Type:
- An object’s suitability is determined by the presence of specific methods or properties, not by its class or type.
- For example, if an object has a
quack()
method and awalk()
method, it can be treated as a “duck” regardless of its actual class.
-
Flexibility:
- Code can work with any object that implements the required behavior, making it highly adaptable and reusable.
- This eliminates the need for strict type hierarchies or interfaces.
-
Common in Dynamic Languages:
- Duck Typing is frequently used in dynamically-typed languages like Python, Ruby, and JavaScript, where type checking is done at runtime rather than compile time.
Example
Below is an example that demonstrates Duck Typing using a Duck
class, a Person
class, and even a Chicken
class. The key idea is that any object with the required methods (quack()
and walk()
) can be treated as a “duck.”
1 |
|
Explanation of the Code
-
Duck Typing in Action:
- The
check_if_duck
function does not check the type of theanimal
object. Instead, it checks whether the object has thequack()
andwalk()
methods. - If the object has these methods, it is treated as a “duck,” regardless of whether it is an instance of
Duck
,Person
, or evenChicken
.
- The
-
Flexibility and Reusability:
- The
Chicken
class is not a duck, but it can still be treated as one because it implements the required methods. - This demonstrates how Duck Typing allows objects of different types to be used interchangeably, as long as they exhibit the expected behavior.
- The
-
Runtime Risks:
- If an object passed to
check_if_duck
does not have the required methods (e.g., aDog
class withoutquack()
andwalk()
), the program will raise anAttributeError
at runtime.
- If an object passed to
Iterable
Iterator
In Python, an iterable is an object that can be iterated over. More technically, an iterator is an object which implements the iterator protocol, which consist of the methods __iter__()
and __next__()
. An iterator is an object that can be iterated upon, meaning that you can traverse through all the values.
-
__iter__()
: init an iterator -
__next__()
: get the next element of the iterator.
For examples:
-
Lists, tuples and Sets and are all iterable objects!
-
For custom classes, you need to implement yourself!
-
They are iterable containers which you can get an iterator from.
-
All these objects have an
iter()
method to get an iterator.
Traditional Index & Generalized Index
Traditional Index allows you to directly access an element at a specific position using an integer, like my_list[0]
or my_string[2]
.
However, in generalized iterable object, we cannot access to an element at a specific position, (就像字符串和链表的区别,为了内存友好,对于较大的迭代器对象不可能将一整段存储全部读入,但是迭代器的设置只需要读入一个单个元素就可以)
for 循环通常和迭代器搭配使用。或者说for循环的本质,就是调用
__iter__()
和__next__()
方法的过程。
1 |
|
1 |
|
1 |
|
1 |
|
迭代器的迭代器是他本身:当一个对象是本身就是迭代器时,它的 __iter__()
方法(也就是 iter() 函数在后台调用的方法)被设计成直接返回 self
(也就是它自己)。
这保证了无法重置迭代器以及惰性求值。并且,迭代器的消耗是一个不可逆过程。
1 |
|
对于自定义的类和对象,需要完成__iter__()
和 __next__()
的方法实现,之后就可以通过for循环进行调用:
1 |
|
Iterable
如果一个类型是iterable的,那其就可以实现很多更加高级的操作。
- Unpacking
1 |
|
Something more interesting…
1 |
|
-
Other methods
find()
andjoin()
instr
merge()
in List & Dictupdate()
in dict
itertools
itertools: Functions creating iterators for efficient looping
This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python.
The module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.
See itertools for more info.
product()
: generating a list for cartesian product, equivalent to a nested for-loop.
1 |
|
permutations(p[,r])
: r-length tuples, all possible orderings, no repeated elements
1 |
|
combinations(p, r)
: r-length tuples, in sorted order, no repeated elements
1 |
|
combinations_with_replacement
: r-length tuples, in sorted order, with repeated elements.
collection
Container Datatype | Usage Description |
---|---|
namedtuple() |
Factory function for creating tuple subclasses with named fields. |
deque |
List-like container with fast appends and pops on either end. |
ChainMap |
Dict-like class for creating a single view of multiple mappings. |
Counter |
Dict subclass for counting hashable objects. |
OrderedDict |
Dict subclass that remembers the order entries were added. |
defaultdict |
Dict subclass that calls a factory function to supply missing values. |
UserDict |
Wrapper around dictionary objects for easier dict subclassing. |
UserList |
Wrapper around list objects for easier list subclassing. |
UserString |
Wrapper around string objects for easier string subclassing. |
For more info, you can visit: Collections
Structural Pattern Matching (PEP 636)
Source Code: PEP 636
First, we can have a small demo, where split()
is a string function which returns a list of the substrings in the string, using sep as the separator string.
1 |
|
The problem with that line of code is that it’s missing something: what if the user types more or fewer than 2 words? To prevent this problem you can either check the length of the list of words, or capture the ValueError that the statement above would raise. Or, we can use a string matching instead!
The match statement evaluates the “subject” (the value after the match keyword), and checks it against the pattern (the code next to case). A pattern is able to do two different things:
Verify that the subject has certain structure. In your case, the [action, obj]
pattern matches any sequence of exactly two elements. This is called matching.
It will bind some names in the pattern to component elements of your subject. In this case, if the list has two elements, it will bind action = subject[0]
and obj = subject[1]
.
If there’s a match, the statements inside the case block will be executed with the bound variables. If there’s no match, nothing happens and the statement after match is executed next.
- For example, if I use
match()
in previous situation:
1 |
|
It is much more convenient and easier than switch
in C++!
- Or you can match specific value:
1 |
|
- Or patterns:
1 |
|
1 |
|
- Adding conditions for simple if judgement
1 |
|
Generator
迭代器和可迭代属性是Python抽象编程中非常重要的属性,他不仅和Python中常见的数据结构相耦合(列表,元组,字典等),并且灵活性高,也支持自定义类型。下面,我们重点关注一类built-in的可迭代数据结构:列表及其高阶玩法。
List Comprehension
请看下面的demo:
1 |
|
-
A list comprehension consists of brackets [ ] containing an expression followed by a for clause, then zero
or more for or if clauses. -
Format:
-
[expression for target in iterable if condition]
: single loop -
[expression for target1 in iterable1 if condition1 for target2 in iterable2 if condition2 .. for targetN in iterableN if conditionN]
: multiple loop
-
-
The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.
List is Fast
CPython 是 Python 最常用的解释器,它是用 C 语言实现的。这意味着 Python 代码在执行时,很多底层操作实际上是通过 C 语言代码完成的。C 语言的执行速度远快于 Python 本身。当你编写纯粹的 Python 循环或逻辑时,解释器需要执行大量的字节码操作,这些操作本身就带有 Python 层面的开销。但是,像列表推导式、内置函数(如 sum()、map()、filter() 等)以及某些列表方法(如 append()、sort() 等)在 CPython 内部都有高度优化的 C 语言实现。当你的代码调用这些 C 实现的部分时,速度会大大提升。
-
能用系统操作就不要用for循环,Python已经优化好了
-
能用列表推导 & 切片等操作就不要用for循环
SIMD 是一种处理器架构或指令集,允许 CPU 在单个指令周期内同时处理多个数据点。例如,如果你要对一个数组中的每个元素进行相同的加法操作,SIMD 可以在一次操作中同时对多个元素执行加法,而不是一个一个地处理。
Generator Functions
使用range()
而不是list
生成数据就是为了减少存储,只在需要的时候生成即可。
-
Generator functions (available since 2.3) are coded as normal def statements, but use yield statements to return results one at a time, suspending and resuming their state between each. 当一个函数包含 yield 关键字时,它就不再是一个普通的函数,而变成了生成器函数。调用生成器函数不会立即执行函数体内的代码,而是返回一个生成器对象(Generator Object)。
-
Generator expressions (available since 2.4) are similar to the list comprehensions of the prior section, but they return an object that produces results on demand instead of building a result list.
-
函数执行到 yield 后暂停,并返回一个值。
-
函数的状态(包括局部变量)会被保存下来。
-
下次调用 next() 时,函数会从上次暂停的地方继续执行。
-
一个生成器函数可以包含多个 yield 表达式,每次调用 next() 返回一个不同的值。
-
1 |
|
1 |
|
Generator Expressions
Syntactically, generator expressions are just like normal list comprehensions, and support all their syntax, including if filters and loop nesting, but they are enclosed in parentheses instead of square brackets (like tuples, their enclosing parentheses are often optional)
1 |
|
当你把一个生成器对象作为参数传递给 list() 构造函数时,list() 会完全消耗这个生成器。它会从生成器中取出所有可用的值,直到生成器耗尽(即抛出 StopIteration 异常),然后用这些值构建一个完整的列表。一旦生成器被消耗,它就不能再次生成值了,因为它是一个一次性迭代器。
Functional Programming
函数式编程(Functional Programming,简称 FP)是一种编程范式,它将计算视为数学函数的求值,并强调避免状态改变和可变数据。
-
纯函数(Pure Functions):
-
相同输入,相同输出: 每次给定相同的输入,函数总是产生相同的输出。
-
无副作用: 函数不会修改其外部的任何数据(比如全局变量)或对外部产生任何可见的影响(比如打印到控制台、修改文件等)。
-
-
不可变性(Immutability):
- 数据一旦创建就不能被修改。如果需要改变数据,你会创建一个新的数据副本,而不是原地修改。
-
函数是一等公民(First-Class Functions):
-
函数可以像任何其他数据类型(如数字、字符串)一样被处理。
-
这意味着函数可以作为参数传递给其他函数,可以作为其他函数的返回值,也可以赋值给变量。
-
-
避免共享状态(No Shared State):
- 程序中的不同部分不会共享和修改同一个可变状态。这使得代码更模块化,减少了并发编程中的复杂性。
See more in Functional Programming
LEGB
LEGB is an acronym that stands for:
-
L - Local: The innermost scope, where names defined inside the current function are found. Variables created within a function (not declared
global
ornonlocal
) live here. -
E - Enclosing (Nonlocal): The scope of any outer functions that wrap the current function. If you have nested functions, names in the immediate parent function’s scope are considered “enclosing.”
-
G - Global: The module-level scope. Names defined outside of any function at the top level of a Python file (script) are global.
-
B - Built-in: The broadest scope, containing names pre-defined in Python, like
print()
,len()
,True
,False
,None
,list()
, etc.
When Python tries to find the definition of a name (like a variable, function, or class) in your code, it follows this specific order:
- First, it looks in the
Local
scope. If it finds the name there, it uses it. - If not found locally, it moves to the
Enclosing
scope(s). It searches outwards through any nested functions. - If still not found, it checks the
Global
scope. - Finally, if the name isn’t found anywhere else, it checks the
Built-in
scope.
If the name is not found in any of these four scopes, Python raises a NameError
.
1 |
|
variable x
and y
is local to outer()
but enclosing to inner()
.
The nonlocal
keyword is used to work with variables inside nested functions, where the variable should not belong to the inner function.