Mrjob

Latest version: v0.7.4

0.3.2

* Docs:
  * 'Testing with mrjob' section in docs (includes 321)
  * MRJobRunner.counters() included in docs (321)
  * terminate_idle_job_flows is spelled correctly in docs (339)
* Running jobs:
  * local mode:
    * Allow non-string jobconf values again (this changed in v0.3.0)
    * Don't split *.gz files (333)
  * emr mode:
    * Spot instance support via ec2_*_instance_bid_price and renamed instance
      type/number options (219)
    * ami_version option to allow switching between EMR AMIs (306)
    * 'Error while reading from input file' displays correct file (358)
    * python_bin used for bootstrap_python_packages instead of just 'python'
      (355)
    * Pooling works with bootstrap_mrjob=False (347)
    * Pooling makes sure a job flow has space for the new job before joining
      it (324)
* EMR tools:
  * create_job_flow no longer tries to use an option that does not exist
    (349)
  * report_long_jobs tool alerts on jobs that have run for more than X hours
    (345)
  * mrboss no longer spells stderr 'stsderr'
  * terminate_idle_job_flows counts jobs with pending (but not running)
    steps as idle (365)
  * terminate_idle_job_flows can terminate job flows near the end of a
    billable hour (319)
  * audit_usage breaks down job flows by pool (239)
  * Various tools (e.g. audit_usage) get list of job flows correctly (346)
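
The idea behind terminating idle job flows near the end of a billable hour (319) comes down to a little arithmetic. The following is a minimal sketch of that arithmetic, not mrjob's implementation: the function names, the one-hour billing period, and the five-minute threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumption for this sketch: usage is billed in whole hours.
BILLING_PERIOD = timedelta(hours=1)


def time_to_end_of_hour(started, now):
    """Return the time left in the job flow's current billable hour."""
    elapsed = now - started
    # Portion of the current (partial) billing period already used.
    into_period = elapsed % BILLING_PERIOD
    return BILLING_PERIOD - into_period


def should_terminate(started, now, idle, threshold=timedelta(minutes=5)):
    """Hypothetical policy: terminate an idle job flow only when little
    of the hour that has already been paid for remains."""
    return idle and time_to_end_of_hour(started, now) <= threshold
```

Under these assumptions, an idle job flow started at 10:00 would be left alone at 12:30 and terminated at 12:56, since waiting costs nothing extra until the next hour begins.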

0.3.1

* Instance-type command-line arguments always override mrjob.conf (Issue 311)
* Fixed crash in mrjob.tools.emr.audit_usage (Issue 315)
* Tests now use unittest; python setup.py test now works (Issue 292)

0.3.0

* Configuration:
  * Saner mrjob.conf locations (Issue 97):
    * ~/.mrjob is deprecated in favor of ~/.mrjob.conf
    * searching in PYTHONPATH is deprecated
    * MRJOB_CONF environment variable for custom paths
* Defining Jobs (MRJob):
  * Combiner support (Issue 74)
  * *_init() and *_final() methods for mappers, combiners, and reducers
    (Issue 124)
  * mapper/combiner/reducer methods no longer need to contain a yield
    statement if they emit no data
  * Protocols:
    * Protocols can be anything with read() and write() methods, and are
      instances by default (Issue 229)
    * Set protocols with the *_PROTOCOL attributes or by re-defining the
      *_protocol() methods
    * Built-in protocol classes cache the encoded and decoded value of the
      last key for faster decoding during reducing (Issue 230)
    * --*protocol switches and aliases are deprecated (Issue 106)
  * Set Hadoop formats with HADOOP_*_FORMAT attributes or the hadoop_*_format()
    methods (Issue 241)
    * --hadoop-*-format switches are deprecated
    * Hadoop formats can no longer be set from mrjob.conf
  * Set jobconf with JOBCONF attribute or the jobconf() method (in addition
    to --jobconf)
  * Set Hadoop partitioner class with --partitioner, PARTITIONER, or
    partitioner() (Issue 6)
  * Custom option parsing (Issue 172)
  * Use mrjob.compat.get_jobconf_value() to get jobconf values from environment
* Running jobs:
  * All modes:
    * All runners are Hadoop-version aware and use the correct jobconf and
      combiner invocation styles (Issue 111)
    * All types of URIs can be passed through to Hadoop (Issue 53)
    * Speed up steps with no mapper by using cat (Issue 5)
    * Stream compressed files with cat() method (Issue 17)
    * hadoop_bin, python_bin, and ssh_bin can now all take switches (Issue 96)
    * job_name_prefix option is gone (was deprecated)
    * Better cleanup (Issue 10):
      * Separate cleanup_on_failure option
      * More granular cleanup options
    * Cleaner handling of passthrough options (Issue 32)
  * emr mode:
    * job flow pooling (Issue 26)
    * vastly improved log fetching via SSH (Issue 2)
      * New tool: mrjob.tools.emr.fetch_logs
    * default Hadoop version on EMR is 0.20 (was 0.18)
    * ec2_instance_type option now only sets instance type for slave nodes
      when there are multiple EC2 instances (Issue 66)
    * New tool: mrjob.tools.emr.mrboss for running commands on all nodes and
      saving output locally
  * inline mode:
    * Supports cmdenv (Issue 136)
    * Passthrough options can now affect steps list (Issue 301)
  * local mode:
    * Runs 2 mappers and 2 reducers in parallel by default (Issue 228)
    * Preliminary Hadoop simulation for some jobconf variables (Issue 86)
* Misc:
  * boto 2.0+ is now required (Issue 92)
  * Removed debian packaging (should be handled separately)
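
As the Protocols entries for 0.3.0 note, a protocol can be any object with read() and write() methods. The class below is a minimal sketch of that shape in plain Python. It assumes read() takes one line and returns a (key, value) pair, and write() takes a key and a value and returns one line. The class name, the tab-separated JSON encoding, and the caching details are illustrative assumptions, not mrjob's own code. The cache imitates the last-key idea from Issue 230.

```python
import json


class TabSeparatedJSONProtocol(object):
    """Hypothetical protocol: one 'key<TAB>value' line, both JSON-encoded."""

    def __init__(self):
        # Remember the most recently decoded key. Reducer input is
        # sorted by key, so consecutive lines often share the same key.
        self._last_raw_key = None
        self._last_key = None

    def read(self, line):
        """Decode one line into a (key, value) pair."""
        raw_key, raw_value = line.split('\t', 1)
        if raw_key != self._last_raw_key:
            # Only decode the key when it differs from the previous one.
            self._last_raw_key = raw_key
            self._last_key = json.loads(raw_key)
        return self._last_key, json.loads(raw_value)

    def write(self, key, value):
        """Encode a (key, value) pair as one line."""
        return '%s\t%s' % (json.dumps(key), json.dumps(value))


protocol = TabSeparatedJSONProtocol()
line = protocol.write('word', 3)
key, value = protocol.read(line)
```

According to the changelog, an object like this would be attached to a job through the *_PROTOCOL attributes or the *_protocol() methods. The exact attribute names are not spelled out in these notes, so they are left out of the sketch.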

0.2.8

* Fix log parsing crash dealing with timeout errors
* Make mr_travelling_salesman.py work with simplejson
* Add emr_additional_info option, to support EMR beta features
* Remove debian packaging (should be handled separately)
* Fix crash when creating tmp bucket for job in us-east-1

0.2.7

* All runner options can be set from the command line (Issue 121)
  * Including for mrjob.tools.emr.create_job_flow (Issue 142)
* New EMR options:
  * availability_zone (Issue 72)
  * bootstrap_actions (Issue 69)
  * enable_emr_debugging (Issue 133)
* Read counters from EMR log files (Issue 134)
* Clean old files out of S3 with mrjob.tools.emr.s3_tmpwatch (Issue 9)
* EMR parses and reports job failure due to steps timing out (Issue 15)
* EMR bootstrap files are no longer made public on S3 (Issue 70)
* mrjob.tools.emr.terminate_idle_job_flows handles custom hadoop streaming
  jars correctly (Issue 116)
* LocalMRJobRunner separates out counters by step (Issue 28)
* bootstrap_python_packages works regardless of tarball name (Issue 49)
* mrjob always creates temp buckets in the correct AWS region (Issue 64)
* Catch abuse of __main__ in jobs (Issue 78)
* Added mr_travelling_salesman example
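
Separating counters by step (Issue 28) amounts to keeping one counter mapping per step instead of a single merged one. The sketch below is illustrative only and is not LocalMRJobRunner's code. It assumes each step's stderr is available as a list of lines and that counters are reported in the Hadoop Streaming form reporter:counter:group,counter,amount. The function names are hypothetical.

```python
from collections import defaultdict

COUNTER_PREFIX = 'reporter:counter:'


def parse_step_counters(stderr_lines):
    """Sum the counters reported in one step's stderr lines."""
    counters = defaultdict(lambda: defaultdict(int))
    for line in stderr_lines:
        line = line.strip()
        if not line.startswith(COUNTER_PREFIX):
            continue  # ordinary log output, not a counter update
        # Assumes exactly three comma-separated fields after the prefix.
        group, counter, amount = line[len(COUNTER_PREFIX):].split(',')
        counters[group][counter] += int(amount)
    # Convert to plain dicts so the result is easy to compare and print.
    return {group: dict(c) for group, c in counters.items()}


def counters_by_step(stderr_per_step):
    """Return a list with one {group: {counter: amount}} dict per step."""
    return [parse_step_counters(lines) for lines in stderr_per_step]
```

Keeping the results in a list indexed by step means a counter incremented in step 1 cannot be confused with a counter of the same name incremented in step 2.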

0.2.6

* Set Hadoop to run on EMR with --hadoop-version (Issue 71).
  * Default is still 0.18, but will change to 0.20 in mrjob v0.3.0.
* New inline runner, for testing locally with a debugger
* New --strict-protocols option, to catch unencodable data (Issue 76)
* Added steps_python_bin option (for use with virtualenv)
* mrjob no longer chokes when asked to run on an EMR job flow running
  Hadoop 0.20 (Issue 110)
* mrjob no longer chokes on job flows with no LogUri (Issue 112)
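
The --strict-protocols option (Issue 76) concerns what happens when a value cannot be encoded. A minimal sketch of the distinction follows. It assumes JSON encoding, and it assumes that the lenient behaviour is to skip the bad record and count it. The function name and the lenient behaviour are assumptions for illustration, not a description of mrjob internals.

```python
import json


def encode_records(records, strict=False):
    """Encode (key, value) pairs as tab-separated JSON lines.

    Returns (lines, skipped). With strict=True, unencodable data raises
    TypeError instead of being skipped.
    """
    lines, skipped = [], 0
    for key, value in records:
        try:
            lines.append('%s\t%s' % (json.dumps(key), json.dumps(value)))
        except TypeError:
            if strict:
                raise  # surface the bad record immediately
            skipped += 1  # assumed lenient behaviour: count and move on
    return lines, skipped
```

For example, a record whose value is a Python set cannot be JSON-encoded: in this sketch it is silently counted in lenient mode and raises in strict mode, which is the kind of failure a strict option is meant to expose early.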
