Input:
#!/usr/bin/env python
from itertools import count, islice, tee

r = islice(count(), 5)
iterator_1, iterator_2 = tee(r)
print "iterator_1,iterator_2:", list(iterator_1), list(iterator_2)
# tee() can return any number n of independent iterators; the default is two.

print "\nExample 2:"
r_2 = islice(count(), 5)
i1, i2 = tee(r_2)
print 'r_2:',
for i in r_2:
    print i,
    if i > 1:
        break
print
print 'i1,i2:', list(i1), list(i2)
Output:
iterator_1,iterator_2: [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
Example 2:
r_2: 0 1 2
i1,i2: [3, 4] [3, 4]
Itertools is a great library, but some methods definitely receive more attention than others – for instance, I'd wager that chain is a lot more well-known than tee. Python's documentation describes tee as follows:
Return n independent iterators from a single iterable.
It also adds this caveat:
Once tee() has made a split, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed. (Example 2 above illustrates exactly this.)
Caution: the rest of this post digs into the internals. If you are not yet comfortable with iterators and generators, learn about them first and then come back.
More on this is covered at this URL:
http://jezng.com/2012/06/inside-python-tee/
However, the documentation doesn't really explain why you might want to use tee instead of copying the iterable n times. The short answer is that tee() uses some heuristics and strategies to make the generation of these n iterators more memory efficient. This article will peek inside Python's internals and explore these strategies in more detail.
Iterables, Iterators, and the Iterator Protocol
To understand the basis for the optimizations, it is essential to know how Python's iterator protocol works. (More experienced Pythonistas might want to skim this section – it's here for completeness.)
Iterables are Python objects that define an __iter__() method which, when called, returns an iterator. Iterators define the __iter__() method as well, but implement it by simply returning the iterator itself. Iterators also implement a next() method, which returns the next element in a sequence. To signal the end of a sequence, next() raises a StopIteration exception. Any object that implements both __iter__ and next in the aforementioned manner is said to implement the iterator protocol.
To actually use these iterators and iterables, we turn to Python's classic for item in sequence loop. Here, sequence can be either an iterator or an iterable – it doesn't matter, because the loop will begin by implicitly invoking sequence's __iter__ method, which will return an iterator. Then, on each round through the loop, the Python interpreter will invoke the iterator's next method, assign the result to item, and continue to do so until next raises StopIteration.
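To make the protocol concrete, here is a minimal sketch (my own example, not from the docs) of a Python 2-style iterator that counts down; it implements both __iter__ and next exactly as described above:

class Countdown(object):
    """A tiny iterator that counts down from n to 1."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        # An iterator's __iter__ simply returns itself.
        return self
    def next(self):
        if self.n <= 0:
            # Signal the end of the sequence to the for-loop machinery.
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

for item in Countdown(3):
    print item,    # prints: 3 2 1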
Perhaps you are wondering why Python makes the distinction between iterables and iterators, and why iterators don't have prev or rewind methods. These two questions are in fact related. It can be expensive or complicated to implement these other methods – for instance, reading sequentially forwards from a file corresponds simply to fread() in C, but rewinding to the start is a more expensive fseek call, and there is no efficient way to read sequentially in reverse. More importantly, prev and rewind are not essential for iteration – they can be built upon the primitive next method. For instance, if we really wish to access the earlier contents of the file in Python, we could buffer the earlier results ourselves. However, buffering all the earlier values is generally inefficient and wasteful: many programs have no need to access earlier values of an iterator, and those that do tend to limit themselves to one or two elements before the current one. Rewinding an iterable is a more common use case, but the same thing can be achieved by obtaining an entirely new iterator that points to the start of the sequence. To do that, we need a factory that produces iterators – which is exactly what an iterable is.
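As a quick illustration of that factory idea (my own snippet), calling iter() on the same list twice hands back two independent iterators, which is how "rewinding" is done in practice:

data = [10, 20, 30]
it1 = iter(data)
print list(it1)    # [10, 20, 30] -- the first pass exhausts it1
it2 = iter(data)   # "rewind" by asking the iterable for a fresh iterator
print it2.next()   # 10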
In sum, Python's iterator protocol is simple to implement, and it works well for the common use case of for .. in loops.
Sometimes Buffering is Useful
The starkness of Python's iterator design means that more complicated use cases will need to build their own abstractions on top of it. Fortunately, its simplicity also means that it is easy to build upon. tee() is one such abstraction, built to efficiently create n independent iterators. Consider file I/O again: we might not want to create n file iterators by calling fopen() over and over, especially if n is large. Nor would we want to copy iterators that do heavy computation each time we called next(). In both cases buffering is a better solution, and this is what tee does, collating in one central list the values returned by the original iterator. Its return value consists of n tee iterator objects, each of which stores a single integer index that indicates the next element it should return from the list. The list itself is populated lazily – whenever any of the n iterators asks for a value at an index that is not yet in the buffer, tee() calls next on the underlying iterator, then caches and returns the new value.
With this implementation detail in mind, the reasoning behind the documentation's caveat becomes clearer – advancing the original iterator outside of tee() would cause some (or all) values to never make it into the cache.
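A quick way to watch this buffering happen (a small experiment of mine, not from the original post): drain one tee iterator first, and the other is then served entirely from the cached values rather than from the source:

from itertools import count, islice, tee

a, b = tee(islice(count(), 5))
print list(a)    # [0, 1, 2, 3, 4] -- pulls everything from the source into the cache
print list(b)    # [0, 1, 2, 3, 4] -- served from the cache; the source is already exhausted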
Sometimes Buffering is Not Useful
Buffering is not always the most efficient way to create n independent iterators. For example, next could be a computationally cheap function, and it would be more efficient for us to just copy the iterator n times. We can indicate this fact by defining a __copy__ method on our iterator. If tee() detects the presence of __copy__, it will copy the iterator instead of doing any buffering. Moreover, it will only do this copying n - 1 times – one of the returned iterators will simply be the original one that was passed in! Since the original iterator should never be used after being passed to tee(), this makes perfect sense; the end result will appear the same to the library consumer.
In fact, tee objects themselves implement __copy__! Since any tee object is backed by a central buffer, the most efficient way to duplicate one is to have the copy point to the same buffer. So not only does tee duplicate any iterator efficiently, it also ensures that future duplications are efficient.
Notably, as a consequence of this optimization, passing in a non-tee object that implements __copy__ will return a tuple of copies of that object, not tee objects. In practice, the type of the object should not matter as long as one sticks to using it solely as an iterator.
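Here is a small experiment along those lines (my own sketch, written against the Python 2-era CPython behaviour this article describes; the Repeater class is made up). Because the iterator defines __copy__, tee() hands back Repeater objects instead of buffering tee objects:

from itertools import tee

class Repeater(object):
    """A hypothetical iterator whose next() is cheap, so copying beats buffering."""
    def __init__(self, value, times, position=0):
        self.value = value
        self.times = times
        self.position = position
    def __iter__(self):
        return self
    def next(self):
        if self.position >= self.times:
            raise StopIteration
        self.position += 1
        return self.value
    def __copy__(self):
        # Cheap to duplicate: just remember how far along we are.
        return Repeater(self.value, self.times, self.position)

a, b = tee(Repeater('x', 3))
print type(a), type(b)    # two Repeater instances, not itertools.tee objects
print list(a), list(b)    # ['x', 'x', 'x'] ['x', 'x', 'x']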
Efficient Buffering: The gritty details
If the total number of items generated by the iterator is very large, storing all of the generated values might take up a lot of memory. If we keep track of the positions of all the tee objects created by a single tee() call, we can delete old values once every one of those positions has moved past them. (Since iterators can only progress forwards, we know that these values will never be needed again.) With CPython's reference counting, this is actually quite simple: the buffer can be implemented as a singly linked list, with each tee object holding a pointer to the next element that it will return. As a natural consequence, once the last tee iterator is done with a buffer element, the element's reference count drops to zero and it gets garbage collected.
However, linked lists perform worse than arrays in terms of memory fragmentation, and they incur a lot of overhead in memory allocations and deallocations. As a compromise, CPython uses linked arrays – a linked list whose individual elements are small arrays of cached values, sized so that each element fits neatly into whole cache lines.
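To make the "linked arrays" idea concrete, here is a rough sketch of my own (not CPython's code; the cell count of 4 is arbitrary, and CPython packs far more values per link). Each link holds a fixed-size batch of cached values, and each follower holds only a link pointer plus a small offset:

from itertools import count, islice

class BufferLink(object):
    """One node of the linked-array buffer: a small batch of values plus a link."""
    CELLS = 4    # arbitrary for illustration; CPython uses a much larger value

    def __init__(self, iterator):
        self.iterator = iterator   # the underlying iterator, shared by all links
        self.values = []
        self.nextlink = None

    def get(self, offset):
        # Fill this link lazily from the shared iterator, up to 'offset'.
        while offset >= len(self.values):
            self.values.append(next(self.iterator))
        return self.values[offset]

    def advance(self):
        # Create the next link on demand; followers that fall behind reuse it.
        if self.nextlink is None:
            self.nextlink = BufferLink(self.iterator)
        return self.nextlink


class Follower(object):
    """A tee-like iterator: a pointer into the linked buffer plus a small offset."""
    def __init__(self, link):
        self.link = link
        self.offset = 0

    def __iter__(self):
        return self

    def next(self):
        if self.offset >= BufferLink.CELLS:
            # Drop our reference to the exhausted link; once the last follower
            # does this, reference counting can reclaim its cached values.
            self.link = self.link.advance()
            self.offset = 0
        value = self.link.get(self.offset)
        self.offset += 1
        return value


source = islice(count(), 10)
head = BufferLink(source)
f1, f2 = Follower(head), Follower(head)
print [f1.next() for _ in range(6)]   # [0, 1, 2, 3, 4, 5]
print [f2.next() for _ in range(6)]   # [0, 1, 2, 3, 4, 5], served from the buffer

In a real tee only the follower objects hold references to the links, so as soon as every follower has advanced past a link, reference counting frees that whole batch of cached values at once.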
A Python Implementation
Python's documentation gives a pure-Python implementation of tee(), but it simplifies things and leaves out the buffering and copy semantics. I've written up a slightly more involved one that reflects the optimizations described above. Unlike the version in the docs, this implementation passes the standard library's test suite. It still makes some concessions to simplicity: for instance, buffering uses a single list, instead of the linked arrays described above.
class TeeIterator(object):
    def __new__(cls, tee_data):
        if isinstance(tee_data, TeeData):
            # Wrap an existing shared buffer, starting at its beginning.
            self = super(TeeIterator, cls).__new__(cls)
            self.tee_data = tee_data
            self.index = 0
        else:
            # Anything else is treated as an iterable to tee from.
            self = TeeIterator.from_iterable(tee_data)
        return self

    def __copy__(self):
        # The copy shares the central buffer and resumes from the same position.
        rv = TeeIterator(self.tee_data)
        rv.index = self.index
        return rv

    def __iter__(self):
        return self

    def next(self):
        rv = self.tee_data[self.index]
        self.index += 1
        return rv

    @classmethod
    def from_iterable(cls, iterable):
        if isinstance(iterable, cls):
            # Duplicating an existing tee iterator just shares its buffer.
            return iterable.__copy__()
        tee_data = TeeData(iter(iterable))
        return TeeIterator(tee_data)


class TeeData(object):
    """The central buffer shared by all iterators returned from one tee() call."""
    def __init__(self, iterator):
        self.iterator = iterator
        self.buffer = []

    def __getitem__(self, index):
        # Populate the buffer lazily, one value at a time.
        if index == len(self.buffer):
            self.buffer.append(next(self.iterator))
        return self.buffer[index]


def tee(iterable, n=2):
    if n < 0:
        raise ValueError("n must be >= 0")
    elif n == 0:
        return ()
    if hasattr(iterable, '__copy__'):
        # Cheaply copyable iterators are duplicated directly, with no buffering.
        copyable = iterable
    else:
        copyable = TeeIterator.from_iterable(iterable)
    result = [copyable]
    for i in xrange(1, n):
        result.append(copyable.__copy__())
    return tuple(result)
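A quick sanity check of the implementation above (my own usage example; it calls the tee() defined here, not itertools.tee):

i1, i2, i3 = tee((c for c in 'abcde'), 3)
print i1.next(), i1.next()    # a b -- pulls 'a' and 'b' into the shared buffer
print list(i2)                # ['a', 'b', 'c', 'd', 'e'] -- cached values first, then the rest
print list(i3)                # ['a', 'b', 'c', 'd', 'e']
print list(i1)                # ['c', 'd', 'e'] -- resumes after the two values it already consumed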
See also this simple recipe showing off itertools.tee: http://code.activestate.com/recipes/305588-simple-example-to-show-off-itertoolstee/