Python Cookbook Notes - 5

2020-01-15

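1. Iterators and generators

Iteration is lazy: an iterator hands out items one at a time on demand instead of reading the whole sequence up front.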

2. Under the hood

>>> items = [1, 2, 3]
>>> # Get the iterator
>>> it = iter(items) # Invokes items.__iter__()
>>> # Run the iterator
>>> next(it) # Invokes it.__next__()
1
>>> next(it)
2

The iter() function simplifies the code here: iter(s) simply returns the underlying iterator by calling s.__iter__(), in the same way that len(s) calls s.__len__().
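As a rough illustration of the protocol above, a for loop can be written out by hand with iter() and next(). This is only a sketch; the helper name manual_loop is made up for this note.

```python
def manual_loop(iterable):
    """Collect items the way a for loop would, using iter() and next() directly."""
    collected = []
    it = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(it)      # calls it.__next__()
        except StopIteration:    # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected


result = manual_loop([1, 2, 3])
print(result)
```

The StopIteration handling is exactly the detail that a for statement hides.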

3. Building your own

Building a container

class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node({!r})'.format(self._value)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)


class MyNumbers:
    def __iter__(self):
        self.a = 1
        return self

    # Never raises StopIteration, so this iterator is infinite
    def __next__(self):
        x = self.a
        self.a += 1
        return x
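MyNumbers never stops on its own. A bounded variant, sketched here under the hypothetical name CountTo (not part of the original notes), ends the iteration by raising StopIteration from __next__():

```python
class CountTo:
    """Hypothetical bounded counterpart of MyNumbers: produces 1..limit."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        self.a = 1               # reset the counter each time iteration starts
        return self

    def __next__(self):
        if self.a > self.limit:
            raise StopIteration  # tells for loops and list() to stop
        x = self.a
        self.a += 1
        return x


numbers = list(CountTo(3))
print(numbers)
```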

Building a function: generators
def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x += increment

>>> for n in frange(0, 2, 0.5):
...     print(n)
...
0
0.5
1.0
1.5
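
A single yield statement anywhere in a function is enough to turn it into a generator. Unlike a normal function, a generator can only be used for iteration.
A function or method that uses a yield statement is called a generator function. When called, such a function always returns an iterator object that can execute the function body: calling the iterator's iterator.__next__() method runs the function until it supplies a value with a yield statement. When the function executes a return statement or runs off the end, a StopIteration exception is raised and the iterator has reached the end of the values it returns.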

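The key feature of a generator function is that it only runs in response to the next operations issued during iteration. Once the generator function returns, iteration stops. The for statement normally used for iteration handles these details automatically, so you don't need to worry about them.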
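One consequence of the behaviour described above is that a generator object can be consumed only once. A small sketch, reusing frange from earlier (the variable names are just for illustration):

```python
def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x += increment


gen = frange(0, 2, 0.5)
first_pass = list(gen)    # runs the generator to completion
second_pass = list(gen)   # already exhausted, so nothing is produced
print(first_pass, second_pass)
```

To iterate again, call frange() again to get a fresh generator.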

How it works under the hood

>>> def countdown(n):
...     print('Starting to count from', n)
...     while n > 0:
...         yield n
...         n -= 1
...     print('Done!')
...

>>> # Create the generator, notice no output appears
>>> c = countdown(3)
>>> c
<generator object countdown at 0x1006a0af0>

>>> # Run to first yield and emit a value
>>> next(c)
Starting to count from 3
3

>>> # Run to the next yield
>>> next(c)
2

>>> # Run to next yield
>>> next(c)
1

>>> # Run to next yield (iteration stops)
>>> next(c)
Done!
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
StopIteration

4. Depth-first traversal

The simplest way to implement iteration on an object is to use a generator function.

class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node({!r})'.format(self._value)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()

# Example
if __name__ == '__main__':
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    root.add_child(child1)
    root.add_child(child2)
    child1.add_child(Node(3))
    child1.add_child(Node(4))
    child2.add_child(Node(5))

    for ch in root.depth_first():
        print(ch)
    # Outputs Node(0), Node(1), Node(3), Node(4), Node(2), Node(5)

5. Reverse iteration

Use the built-in reversed() function.
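- Reverse iteration only works if the object's size can be determined in advance (it supports len() and integer indexing) or if it implements the __reversed__() special method. If neither is the case, you have to convert the object to a list first
  f = open('somefile')
  for line in reversed(list(f)):
      print(line, end='')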
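To make the two conditions concrete, here is a small sketch. Squares is a hypothetical class that supports only len() and integer indexing, which is enough for reversed(); a generator offers neither, so reversed() rejects it.

```python
class Squares:
    """Hypothetical sequence: supports len() and integer indexing only."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * i


backwards = list(reversed(Squares(4)))   # works without __reversed__()


def gen():
    yield from (1, 2, 3)


try:
    reversed(gen())                       # a generator has no known size
    generator_reversible = True
except TypeError:
    generator_reversible = False

print(backwards, generator_reversible)
```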

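Defining a reverse iterator makes the code much more efficient, because the data no longer has to be loaded into a list before that list is iterated in reverse.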

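class Countdown:  # class name and __init__ are assumed; the notes show only the two methods
    def __init__(self, start):
        self.start = start

    # Forward iterator
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

    # Reverse iterator
    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1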

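6. Generator functions with extra state

If you want a generator to expose extra state to the user, remember that you can simply implement it as a class and put the generator code in its __iter__() method.
If you drive the iteration without a for loop, you have to call iter() first.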
>>> f = open('somefile.txt')
>>> lines = linehistory(f)
>>> next(lines)
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: 'linehistory' object is not an iterator

>>> # Call iter() first, then start iterating
>>> it = iter(lines)
>>> next(it)
'hello world\n'
>>> next(it)
'this is a test\n'
>>>
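
The linehistory class itself is not shown in these notes. A minimal sketch of what such a class could look like, assuming it keeps the most recent lines in a deque; the histlen parameter, the history attribute and the sample text are all assumptions made for this example.

```python
from collections import deque
import io


class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)   # extra state exposed to the caller

    def __iter__(self):
        # The generator code lives in __iter__(), so the object is iterable
        # but is not itself an iterator.
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line


f = io.StringIO('hello world\nthis is a test\nlast line\n')  # stand-in for a file
lines = linehistory(f)
it = iter(lines)             # required before calling next()
first = next(it)
second = next(it)
print(first, second, list(lines.history))
```

Because the state lives on the instance, the caller can inspect lines.history at any point during the iteration.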

7. Slicing an iterator

itertools.islice()

>>> def count(n):
...     while True:
...         yield n
...         n += 1
...
>>> c = count(0)
>>> c[10:20]
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable

>>> # Now using islice()
>>> import itertools
>>> for x in itertools.islice(c, 10, 20):
...     print(x)
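
One thing worth keeping in mind is that islice() consumes the underlying iterator: the skipped items are read and discarded, and the iterator cannot be rewound. A small sketch using the same count() generator:

```python
import itertools


def count(n):
    while True:
        yield n
        n += 1


c = count(0)
window = list(itertools.islice(c, 10, 20))   # items 10..19
after = next(c)                               # the generator has moved past them
print(window, after)
```

If the earlier items are needed again, convert the data to a list first.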

8. Skipping the start of an iterable

itertools.dropwhile()

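Suppose you are reading a source file that starts with a few comment lines. To skip those leading comment lines, you can do the following.
Only the leading lines that satisfy the test are skipped; after that, no further items are tested or filtered.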
>>> from itertools import dropwhile
>>> with open('/etc/passwd') as f:
...     for line in dropwhile(lambda line: line.startswith('#'), f):
...         print(line, end='')
...
If you already know exactly how many items to skip, you can use itertools.islice() instead.
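The two approaches can be compared side by side. This sketch uses a made-up list of lines in place of a real file:

```python
from itertools import dropwhile, islice

# Made-up sample data standing in for the lines of a file
lines = [
    '# comment 1\n',
    '# comment 2\n',
    'root:x:0\n',
    '# not at the start\n',
    'daemon:x:1\n',
]

# Drops only the leading run of comment lines; the later '#' line is kept.
by_test = list(dropwhile(lambda line: line.startswith('#'), lines))

# Skips a known number of items; None as the stop value means "to the end".
by_count = list(islice(lines, 2, None))

print(by_test == by_count)
```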

9. Permutations and combinations

itertools.permutations(): permutations

>>> from itertools import permutations
>>> items = ['a', 'b', 'c']
>>> for p in permutations(items, 2):
...     print(p)
...
('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')

itertools.combinations(): combinations

>>> from itertools import combinations
>>> for c in combinations(items, 2):
...     print(c)
...
('a', 'b')
('a', 'c')
('b', 'c')

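itertools.combinations_with_replacement(): combinations in which the same item may be chosen more than once

>>> from itertools import combinations_with_replacement
>>> for c in combinations_with_replacement(items, 3):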
...     print(c)
...
('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')
>>>

10. Iterating with index values

enumerate()

Maintaining an extra counter variable by hand, compared with enumerate():

lineno = 1
for line in f:
    # Process line
    ...
    lineno += 1
    
# Same loop with enumerate(); start=1 keeps the count 1-based like the manual counter
for lineno, line in enumerate(f, 1):
    # Process line
    ...

Unpacking tuples

data = [ (1, 2), (3, 4), (5, 6), (7, 8) ]

# Correct!
for n, (x, y) in enumerate(data):
    ...
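
A short sketch of why the parentheses matter. enumerate() yields pairs of the form (n, item), so when item is itself a tuple it has to be unpacked in a nested pattern; the variable names here are for illustration only.

```python
data = [(1, 2), (3, 4), (5, 6), (7, 8)]

# start=1 makes the counter 1-based; (x, y) unpacks each tuple in place
rows = [(n, x + y) for n, (x, y) in enumerate(data, 1)]

try:
    for n, x, y in enumerate(data):   # wrong: each item is (n, (x, y))
        pass
    flat_unpack_ok = True
except ValueError:
    flat_unpack_ok = False

print(rows, flat_unpack_ok)
```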

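11. Iterating over multiple sequences

zip(): processing data in pairs

zip(a, b) creates an iterator that produces tuples (x, y), where x comes from a and y comes from b. Iteration stops as soon as one of the sequences is exhausted, so the length of the iteration equals the length of the shortest input.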
>>> for x, y in zip(xpts, ypts):
...     print(x, y)

>>> zip(a, b)
<zip object at 0x1007001b8>
Although it is uncommon, zip() can take more than two sequences as arguments.
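
A small sketch of both points: zip() with three inputs, and the common pattern of pairing keys with values to build a dict. The sample data is made up for this example.

```python
a = [1, 2, 3]
b = [10, 11, 12]
c = ['x', 'y', 'z', 'extra']

triples = list(zip(a, b, c))          # stops at the shortest input

headers = ['name', 'shares', 'price']
values = ['ACME', 100, 490.1]
record = dict(zip(headers, values))   # pair each key with its value

print(triples, record)
```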

itertools.zip_longest()

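Iterates to the length of the longest input; missing values are filled with None, or with the fillvalue argument if one is given.

>>> from itertools import zip_longest
>>> a = [1, 2, 3]
>>> b = ['w', 'x', 'y', 'z']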
>>> for i in zip_longest(a,b):
...     print(i)
...
(1, 'w')
(2, 'x')
(3, 'y')
(None, 'z')
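
The padding value can be changed with the fillvalue keyword argument. A minimal sketch, assuming the same kind of inputs as in the session above:

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']

# Pad the shorter input with 0 instead of the default None
padded = list(zip_longest(a, b, fillvalue=0))
print(padded)
```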

12. Iterating over items in separate containers

itertools.chain()

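Accepts one or more iterables as arguments and creates an iterator that returns the items of each iterable in turn. This is much more efficient than concatenating the sequences first and then iterating, and it saves memory.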

>>> from itertools import chain
>>> a = [1, 2, 3, 4]
>>> b = ['x', 'y', 'z']
>>> for x in chain(a, b):
...     print(x)
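
chain() also works when the inputs are of different types, where concatenation with + would not. A sketch with made-up container names:

```python
from itertools import chain

active_items = {'a'}            # a set (hypothetical working data)
inactive_items = ['b', 'c']     # a list

# One loop over both containers, with no combined copy being built
seen = [item.upper() for item in chain(active_items, inactive_items)]

try:
    combined = active_items + inactive_items   # set + list is not supported
    plus_works = True
except TypeError:
    plus_works = False

print(seen, plus_works)
```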

13. Flattening a nested sequence

from collections.abc import Iterable  # the collections.Iterable alias was removed in Python 3.10

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            yield from flatten(x, ignore_types)  # pass ignore_types down to nested levels
        else:
            yield x

items = [1, 2, [3, 4, [5, 6], 7], 8]
# Produces 1 2 3 4 5 6 7 8
for x in flatten(items):
    print(x)

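isinstance(x, Iterable) checks whether an element is iterable. If it is, yield from emits all the values produced by the recursive flatten() call, as if it were a subroutine.
- Routine
  A simple way to think about it: a relatively self-contained piece of code written as a separate module is a function. We can write many functions in our own program to make it modular. These modules, or functions, are not necessarily exported (that is, made available to other programs) and may be used only inside the current program. In that case they are just standalone functions, not routines.
  If we write these functions as the exported functions of a DLL, the programmer writing the DLL can still think of what it provides as functions, but for a program that later calls the DLL, the exported functions (or services) are routines. The basic notion of a "routine" therefore carries the meaning of a "subprogram for routine tasks".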
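
Returning to flatten(): the ignore_types check is what keeps strings from being expanded into single characters. Without it, a one-character string is itself iterable and the recursion would never bottom out. A sketch with sample names made up for this example:

```python
from collections.abc import Iterable


def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            yield from flatten(x, ignore_types)
        else:
            yield x


names = ['Dave', 'Paula', ['Thomas', 'Lewis']]
flat = list(flatten(names))       # strings are kept whole
print(flat)
```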

14. Iterating in sorted order over merged sorted iterables

heapq.merge()

>>> import heapq
>>> a = [1, 4, 7, 10]
>>> b = [2, 5, 6, 11]
>>> for c in heapq.merge(a, b):
...     print(c)
...
1
2
4
5
6
7
10
11
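
heapq.merge() expects each input to be sorted already and produces its output lazily, so it can be applied to very long inputs. A small sketch; the ranges are arbitrary and are never fully materialised:

```python
import heapq
from itertools import islice

evens = range(0, 10**9, 2)      # large, already-sorted inputs
odds = range(1, 10**9, 2)

# Only the first six merged items are ever computed
first_six = list(islice(heapq.merge(evens, odds), 6))
print(first_six)
```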

15. Replacing a while loop

CHUNKSIZE = 8192  # assumed chunk size in bytes; not defined in the original notes

def reader(s):
    while True:
        data = s.recv(CHUNKSIZE)
        if data == b'':
            break
        process_data(data)
        
def reader2(s):
    for chunk in iter(lambda: s.recv(CHUNKSIZE), b''):
        process_data(chunk)
        
>>> import sys
>>> f = open('/etc/passwd')
>>> for chunk in iter(lambda: f.read(10), ''):
...     n = sys.stdout.write(chunk)
...

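A little-known feature of iter() is that it optionally accepts a callable and a sentinel (terminating) value. Used this way, it creates an iterator that calls the callable repeatedly until the return value equals the sentinel.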
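
The two-argument form can be tried without a socket or a real file. This sketch uses io.StringIO as a stand-in; the sample text is arbitrary.

```python
import io

f = io.StringIO('abcdefghijklmnopqrstuvwxy')   # stand-in for a real file

# f.read(10) is called repeatedly until it returns '' (end of data)
chunks = list(iter(lambda: f.read(10), ''))
print(chunks)
```

The sentinel itself is never produced: iteration ends as soon as the callable returns it.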