Mrjob

Latest version: v0.7.4

0.5.0

* supports Python 3 (989)
* requires boto 2.35.0 or newer (980)
* removed many workarounds for S3 and EMR (980), IAM (1062)
* jobs:
  * is_mapper_or_reducer() is now is_task() (1072)
  * mr() no longer takes positional arguments (814)
  * removed jar() (use mrjob.step.JarStep)
  * removed testing methods parse_counters() and parse_output()
* protocols:
  * protocols are strict by default (724)
  * JSON protocols use ujson when available, then simplejson (1002, 1266)
  * can explicitly choose Standard, Simple or Ultra JSON protocol
  * raw protocols handle bytes or unicode depending on Python version
  * can explicitly choose Text or Bytes protocol
* mrjob.step:
  * JarStep only takes "args" and "main_class" keyword args
  * removed MRJobStep (use MRStep)
* runners:
  * All runners:
    * totally revamped log handling (1123)
    * runner status/log messages are less noisy (1044)
    * don't bootstrap mrjob if interpreter is set (1041)
    * fs methods path_exists() and path_join() are now exists() and join()
    * deprecation warning: use runner.fs explicitly (1146)
    * changes to cleanup options:
      * removed IS_SUCCESSFUL (use ALL)
      * LOCAL_SCRATCH is now LOCAL_TMP (318)
      * new HADOOP_TMP option handles HDFS cleanup (1261)
      * REMOTE_SCRATCH is now CLOUD_TMP (1261)
    * base_tmp_dir option is now local_tmp_dir (318)
    * non-inline runners raise StepFailedException on step failure (1219)
    * steps_python_bin defaults to current python interpreter (1038)
    * _job_name is now _job_key (982)
  * EMR:
    * default AWS region is us-west-2 (1025)
    * default instance type is m1.medium (992)
    * visible_to_all_users defaults to true (1016)
    * matches your minor version of Python 2 on 3.x and 4.x AMIs (1265)
    * 4.x AMIs are supported (1105)
    * added --release-label switch (--ami-version 4.x.y also works)
    * can fetch counters and probable cause of failure on 3.x and 4.x AMIs
    * SSH tunnel now works on 3.x and 4.x AMIs (1013)
    * ssh_tunnel_to_job_tracker option is now ssh_tunnel
    * correctly fetch step logs by step ID (1117)
    * bootstrap_python option
    * s3_scratch_uri option is now s3_tmp_dir (318)
    * aws_region is no longer inferred from s3_tmp_dir
    * create/select temp bucket in same region as EMR jobs (687)
    * added iam_endpoint option (1067)
    * removed s3_conn args from methods in EMRJobRunner and S3Filesystem
    * S3 Filesystem:
      * connect to each S3 bucket on appropriate endpoint (1028)
      * fall back to default if we can't get bucket location (1170)
      * removed special treatment of _$folder$ keys
      * removed deprecated S3Filesystem method get_s3_folder_keys()
      * recurse "subdirectories" even if uri lacks trailing / (1183)
    * removed iam_job_flow_role option (use iam_instance_profile)
    * custom hadoop_streaming_jar gets properly uploaded
    * job cleanup temporarily disabled (1241)
    * pooling respects key pair (1230)
    * idle cluster self-termination respects non-streaming jobs (1145)
    * deprecated "latest" AMI version not passed through to EMR (1269)
    * emr_job_flow_id option is now cluster_id (1082)
    * emr_job_flow_pool_name is now pool_name (1082)
    * pool_emr_job_flows is now pool_clusters (1082)
  * Hadoop:
    * works out of the box on most Hadoop setups (1160)
    * works out of the box inside EMR (2.x, 3.x, and 4.x AMIs)
    * counters are parsed from Hadoop binary stderr in YARN (1153)
    * can find logs and probable cause of failure in YARN (1195)
    * will search in <output dir>/_logs, to support Cloudera (565)
    * HDFS Filesystem:
      * use fs -ls -R and fs -rm -R in YARN (1152)
      * mkdir() now uses -p on YARN (991)
      * fs.du() now works on YARN (1155)
      * fs.du() now returns 0 for nonexistent files instead of erroring
      * fs.rm() now uses -skipTrash
    * dropped support for Hadoop prior to 0.20.203 (1208)
    * added hadoop_log_dirs option
    * hdfs_scratch_dir option is now hadoop_tmp_dir (318)
    * hadoop_home is deprecated
    * uses -D and correct property name when step has no reduces (1213)
  * Inline/Local:
    * runner.fs raises IOError if passed URIs (1185)
    * version-agnostic by default (735)
    * removed ignored hadoop_extra_args and hadoop_streaming_jar opts (1275)
    * inline runner uses multiple splits by default (1276)
* removed mrjob.compat.get_jobconf_value() (use jobconf_from_env())
* removed mrjob.compat methods to support Hadoop prior to 0.20.203:
  * supports_combiners_in_hadoop_streaming()
  * supports_new_distributed_cache_options()
  * uses_generic_jobconf()
* removed mrjob.conf.combine_cmd_lists()
* removed fetch-logs tool (1127)
* mrjob subcommands use "cluster" rather than "job-flow" (1082)
  * create-job-flow is now create-cluster
  * terminate-idle-job-flows is now terminate-idle-clusters
  * terminate-job-flow is now terminate-cluster
* Python-version-specific mrjob-x and mrjob-x.y commands (1104)
* use followlinks=True with os.walk()
* all internal constants/functions/methods explicitly start with _ (681)
* mrjob.util:
  * file_ext() takes filename, not path
  * random_identifier() moved here from mrjob.aws
  * buffer_iterator_to_line_iterator() is now to_lines()
  * to_lines() no longer appends a newline to data (819)
  * removed extract_dir_for_tar()
  * gunzip_stream() now yields chunks, not lines
  * removed hash_object()
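
Many of the 0.5.0 entries are pure renames, so existing configs can be updated mechanically. The sketch below assumes your runner options are held in a plain Python dict (for example, one you loaded from mrjob.conf yourself). The helper migrate_opts(), both mapping tables, and the assumption that cleanup and cleanup_on_failure hold either a string or a list of strings are illustrative only; the tables cover just the renames listed in this release.

```python
# Hypothetical helper, not part of mrjob: rewrites pre-0.5.0 option names
# and cleanup values (taken from the 0.5.0 entries) in a plain options dict.

OPTION_RENAMES = {
    "base_tmp_dir": "local_tmp_dir",
    "s3_scratch_uri": "s3_tmp_dir",
    "hdfs_scratch_dir": "hadoop_tmp_dir",
    "ssh_tunnel_to_job_tracker": "ssh_tunnel",
    "iam_job_flow_role": "iam_instance_profile",
    "emr_job_flow_id": "cluster_id",
    "emr_job_flow_pool_name": "pool_name",
    "pool_emr_job_flows": "pool_clusters",
}

CLEANUP_RENAMES = {
    "IS_SUCCESSFUL": "ALL",
    "LOCAL_SCRATCH": "LOCAL_TMP",
    "REMOTE_SCRATCH": "CLOUD_TMP",
}

# Options whose values are cleanup types (assumed: a string or a list).
CLEANUP_KEYS = ("cleanup", "cleanup_on_failure")


def _rename_cleanup(value):
    if isinstance(value, str):
        return CLEANUP_RENAMES.get(value, value)
    return [CLEANUP_RENAMES.get(v, v) for v in value]


def migrate_opts(opts):
    """Return a new dict with old option names and cleanup values renamed."""
    migrated = {}
    for key, value in opts.items():
        new_key = OPTION_RENAMES.get(key, key)
        if new_key in migrated:
            # Old and new spellings were both set; refuse to guess.
            raise ValueError("option %r is set more than once" % new_key)
        if new_key in CLEANUP_KEYS:
            value = _rename_cleanup(value)
        migrated[new_key] = value
    return migrated


print(migrate_opts({"base_tmp_dir": "/tmp/mrjob",
                    "cleanup": ["LOCAL_SCRATCH", "REMOTE_SCRATCH"]}))
# {'local_tmp_dir': '/tmp/mrjob', 'cleanup': ['LOCAL_TMP', 'CLOUD_TMP']}
```

Raising on a conflict, rather than silently preferring one spelling, is a choice made for this sketch so that a half-migrated config is noticed instead of quietly losing a value.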

0.4.6

* PyYAML>=3.08 is required
* !clear tag in conf files (1162)
* combine_lists() and combine_path_lists() can handle scalars (1172)
* include: paths in conf files are relative to real path of conf file (1166)
* mrjob.conf.combine_cmd_lists() is deprecated (1168)
* EMR runner: pool_wait_minutes can now be loaded from mrjob.conf (1070)
* support for wheel packaging format (1140)
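
The combine_lists() change (1172) matters when one config file gives a list and another gives a single value for the same option. The function below, combine_lists_sketch(), is a hypothetical illustration under assumed semantics: values are concatenated in order, None is skipped, and a lone string or number counts as a one-item list. It is not mrjob.conf's implementation; check the mrjob.conf reference for the exact rules.

```python
def combine_lists_sketch(*seqs):
    """Concatenate seqs into one list (hypothetical sketch, not mrjob.conf).

    None values are skipped; strings, bytes, and other non-iterable values
    are treated as one-item lists instead of being iterated over.
    """
    result = []
    for seq in seqs:
        if seq is None:
            # Unset value: contributes nothing.
            continue
        if isinstance(seq, (str, bytes)) or not hasattr(seq, "__iter__"):
            # Scalar: wrap it rather than splitting a string into characters.
            result.append(seq)
        else:
            result.extend(seq)
    return result


print(combine_lists_sketch(["a", "b"], None, "c", 3))  # ['a', 'b', 'c', 3]
```

Treating strings as scalars is the important design point: without that check, extend() would split "c" into individual characters.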

0.4.5

* boto>=2.6.0 is required (used to be 2.2.0)
* runners:
  * EMR:
    * moved off deprecated DescribeJobFlows API (876)
    * time-to-end-of-hour now uses creation time, not "start" time
    * aws_security_token for temporary credentials (1003)
    * Use AWS managed policies when creating IAM objects (1026)
    * Fall back to default role/instance profile when no IAM access (1008)
    * added emr_tags option (1058)
    * added get_ami_version() method
    * hadoop_version option no longer has any effect (1017)
  * Hadoop:
    * --hadoop-home switch now works (1037)
* EMR tools:
  * added switches for AWS connection options etc. (1087)
  * mrboss is available from command line tool: mrjob boss [args]
  * terminate_idle_job_flows:
    * less prone to race condition (910)
    * prints results to stdout in dry_run mode (1102)
    * job flows stuck in STARTING state no longer considered idle
  * report_long_jobs reports job flows stuck in STARTING state
  * collect_emr_stats and job_flow_pool are deprecated
* more efficient decoding of bz2 files
* mrjob.retry.RetryWrapper raises exception when out of tries (1093)
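
The RetryWrapper change (1093) means a wrapped call surfaces its error once the retries are used up. The sketch below shows that behaviour in plain Python; call_with_retries() and its is_retriable, max_tries, and backoff parameters are hypothetical and are not RetryWrapper's actual constructor or signature.

```python
import time


def call_with_retries(fn, is_retriable, max_tries=3, backoff=0.0):
    """Call fn(), retrying on retriable errors; re-raise once tries run out.

    Hypothetical helper illustrating the behaviour, not mrjob's RetryWrapper.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return fn()
        except Exception as e:
            # Non-retriable errors, and any error on the final try, propagate
            # to the caller instead of being swallowed.
            if not is_retriable(e) or attempt == max_tries:
                raise
            time.sleep(backoff)


calls = []


def flaky():
    # Fails twice with a transient error, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transient failure")
    return "ok"


print(call_with_retries(flaky, lambda e: isinstance(e, IOError)))  # ok
```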

0.4.4

* runners:
  * EMR:
    * Create IAM objects as needed (unbreaks mrjob for new accounts) (999)
    * --iam-job-flow-role renamed to --iam-instance-profile (1001)
    * new --iam-service-role option (1005)

0.4.3

* jobs:
  * MRStep's constructor treats kwarg=None the same as not setting it (970)
  * parse_counters() and parse_output() are deprecated (829)
  * self.mr is deprecated in favor of MRStep (815)
* runners:
  * All runners:
    * You can now set strict_protocols from mrjob.conf (726)
    * new --no-strict-protocols command-line option
    * streaming output from closed runner shows a warning (853)
  * EMR:
    * --check-input-paths and --no-check-input-paths options (864)
    * skip (very slow) validation of s3 buckets if boto < 2.25.0 (865)
    * Fix for max_hours_idle bug that was terminating job flows early (932)
    * --emr-api-param allows users to pass additional parameters to boto's EMR API (879)
      * unset parameters with --no-emr-api-param
    * bootstrap_python_packages (deprecated) now works on 3.x EMR AMIs (863)
    * Use TERMINATE_CLUSTER instead of deprecated TERMINATE_JOB_FLOW (974)
    * updated EC2 instance type data for pooling (995)
  * Hadoop:
    * exclude hadoop source jars when looking for streaming jar (861)
    * Fixed mkdir_on_hdfs for Hadoop version 2.x (923)
    * Fixed hadoop_bin on Windows (843)
  * Local:
    * bootstrap mrjob by default (984)
  * Inline:
    * fix for add_file_option() (851)
    * cd to job's working directory before instantiating mrjob class (988)
* Use pytest to run tests (898)
* collect-emr-active-stats subcommand (947)
* Use xtrace flag to get more output during bootstrap (943)
* Fixed log printouts for command line tools (901)
* Fix to avoid interpreting Windows paths as URIs (880)
* Better error message when ssh keyfile is missing (858)
* Update EMR tool ISO8601 parsing to be consistent with EMR runner (869)
* Dropped support for Python 2.5 (713)
* Dropped support for the 1.x EMR AMI series, which uses Python 2.5
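
To illustrate the MRStep constructor change (970): passing reducer=None now means the same as leaving reducer out. The class below, StepSketch, is a hypothetical and heavily simplified stand-in, not mrjob's MRStep; it only shows the pattern of dropping None-valued keyword arguments before storing them, and its small set of allowed keys is an assumption.

```python
class StepSketch:
    """Hypothetical stand-in for a step object (not mrjob's MRStep).

    Keyword arguments set to None are dropped, so passing reducer=None
    behaves exactly like leaving reducer out.
    """

    # Illustrative subset of step functions; MRStep accepts more than this.
    _ALLOWED = frozenset(["mapper", "combiner", "reducer"])

    def __init__(self, **kwargs):
        unknown = set(kwargs) - self._ALLOWED
        if unknown:
            raise TypeError(
                "unexpected keyword arguments: %s" % ", ".join(sorted(unknown)))
        # Drop None values so they are indistinguishable from omitted kwargs.
        self._steps = {k: v for k, v in kwargs.items() if v is not None}

    def has(self, name):
        return name in self._steps

    def __eq__(self, other):
        if not isinstance(other, StepSketch):
            return NotImplemented
        return self._steps == other._steps


def mapper(key, value):
    yield value, 1


print(StepSketch(mapper=mapper, reducer=None) == StepSketch(mapper=mapper))  # True
print(StepSketch(mapper=mapper, reducer=None).has("reducer"))  # False
```

Normalizing in the constructor means every later check (equality, "does this step have a reducer?") can ignore the difference between None and absent.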

0.4.2

* jobs:
  * can interpolate input and output path(s) into arguments of JarSteps, so they can be part of multi-step jobs (773)
    * see mrjob/examples/mr_jar_step_example.py
  * JarStep now takes keyword arguments only (769)
    * removed useless "name" field; "step_args" is now just "args"
  * MRJobStep (usually accessed via MRJob.mr()) is now MRStep
* runners:
  * All runners:
    * --setup is now fully functional (206)
      * --python-archive, --setup-cmd, and --setup-script are deprecated
    * --bootstrap option works and uses sh (206)
      * --bootstrap-cmd, --bootstrap-file, --bootstrap-python-package, and --bootstrap-script are deprecated
    * setup commands can no longer corrupt a task's input and output (803)
    * sh_bin is now "sh -e" by default so setup fails fast (810)
      * default is "/bin/sh -e" on EMR
  * EMR:
    * JarSteps work again (763)
    * auto-uploads jars for JarSteps (772)
    * JARs on the EMR instances can be accessed with file:/// URIs
    * ssh_cat() no longer raises an error when catting a file containing an error (807)
    * Fixed SignatureDoesNotMatchError that occurs with boto 2.10.0+ on Python versions prior to 2.7.5 (778)
  * Hadoop:
    * now handles JarSteps too (770)
* Fix to mrjob.parse.urlparse() that was breaking Python 2.5
* mrjob.util.buffer_iterator_to_line_iterator() is now more efficient and uses a bounded amount of memory
* bz2 decompression no longer discards data (817)
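
The note that buffer_iterator_to_line_iterator() uses a bounded amount of memory refers to splitting a stream of arbitrary chunks into lines without holding the whole stream in memory. A minimal sketch of that technique follows; iter_lines() is a hypothetical name, not mrjob's function. Like to_lines() after 0.5.0 (819), it does not add a newline to a final unterminated line. Memory use is bounded by roughly the longest line plus one chunk, not by the size of the input.

```python
def iter_lines(chunks):
    """Yield lines from an iterable of byte chunks (hypothetical sketch).

    Only the current incomplete line is buffered between chunks.
    """
    leftover = b""
    for chunk in chunks:
        if not chunk:
            continue
        parts = (leftover + chunk).split(b"\n")
        # Every part except the last was followed by a newline.
        for part in parts[:-1]:
            yield part + b"\n"
        # The last part is an incomplete line (possibly empty).
        leftover = parts[-1]
    if leftover:
        # A final line without a trailing newline is yielded unchanged.
        yield leftover


chunks = [b"first li", b"ne\nsecond", b" line\nno newline"]
for line in iter_lines(chunks):
    print(line)
# b'first line\n'
# b'second line\n'
# b'no newline'
```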

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.