- search: `twarc.py --search ferguson > tweets.json`
- stream: `twarc.py --stream ferguson > tweets.json`
- hydrate: `twarc.py --hydrate ids.txt > tweets.json`
Notice that twarc no longer decides what filename to use, nor does it attempt to pick up where it left off by reading the last tweet id from a previous file. That behavior was dropped because it predated the ability to stream directly. twarc.py now just writes line-oriented JSON to stdout, which you can send wherever you want, including compressing it along the way:
```
twarc.py --search ferguson | gzip - > tweets.json.gz
```
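Because every line is a complete JSON document, reading the output back takes only a few lines. Here is a minimal sketch using just the Python standard library, assuming a file such as the `tweets.json.gz` produced above; `read_tweets` is a hypothetical helper for illustration, not part of twarc:

```python
import gzip
import json


def read_tweets(path):
    """Yield one parsed tweet (a dict) per non-empty line of the file.

    Hypothetical helper, not part of twarc. Paths ending in .gz are
    decompressed on the fly; anything else is read as plain text.
    """
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt', encoding='utf-8') as lines:
        for line in lines:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

Since it is a generator, something like `for tweet in read_tweets('tweets.json.gz'): ...` should handle large files without loading them into memory all at once.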
The three command line modes map directly onto programmatic usage. You first create a `Twarc` instance and then call its `search`, `stream`, or `hydrate` method:
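```python
from twarc import Twarc

t = Twarc()

# search: tweets matching the query
for tweet in t.search('ferguson'):
    print(tweet)

# stream: matching tweets as they arrive (runs until interrupted)
for tweet in t.stream('ferguson'):
    print(tweet)

# hydrate: full tweets for a file of tweet ids
for tweet in t.hydrate(open('ids.txt')):
    print(tweet)
```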
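To get the same line-oriented JSON from your own program that the command line writes, you can serialize each tweet onto its own line. The sketch below assumes the methods yield tweets as plain dicts; `write_tweets` is a hypothetical helper, not part of twarc, and it accepts any iterable so it can be tried without Twitter credentials:

```python
import json
import sys


def write_tweets(tweets, out=sys.stdout):
    """Write each tweet as one line of JSON; return how many were written.

    Hypothetical helper, not part of twarc. `tweets` can be any iterable
    of JSON-serializable dicts, and `out` any writable text file object.
    """
    count = 0
    for tweet in tweets:
        out.write(json.dumps(tweet) + '\n')
        count += 1
    return count
```

Under that assumption, `write_tweets(t.search('ferguson'))` would be roughly the programmatic counterpart of `twarc.py --search ferguson`.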
The nice thing about these changes is that they consolidate and simplify the rate-limiting logic and remove about a third of the code base. Please give it a try and let us know how it goes!