Mrjob

Latest version: v0.7.4

0.6.5

* all runners:
* can turn off log parsing with --no-read-logs (1825)
* EMR Runner:
* transient API errors:
* EMR client won't retry faster than check_cluster_every option (1799)
* all AWS clients will retry on SSL timeouts (1827)
* RetryWrapper now passes through docstring of wrapped methods
* AMIs:
* default AMI is now 5.16.0 (1818)
* choose custom AMIs with image_id option (1805)
* find base AMIs with mrjob.ami.describe_base_emr_images() (1829)
* added make_ec2_client() method and ec2_endpoint option
* choose EBS root volume size with ebs_root_volume_gb option (1812)
* new clusters are tagged with __mrjob_label and __mrjob_owner (1828)
* idle self-termination script (max_hours_idle):
* retries if shutdown fails (1819)
* logs its output to mrjob-idle-termination.log
* cluster pooling recovery now works on single-node clusters (1822)
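
The retry items above (a minimum wait between retries, and wrappers that keep the wrapped method's docstring) can be illustrated with a small standalone sketch. This is not mrjob's RetryWrapper: retry_on, its parameters, and describe_cluster are hypothetical names, and the 30-second backoff only stands in for a check_cluster_every-style interval.

```python
import functools
import time


def retry_on(exceptions, backoff=0.0, max_tries=3, sleep=time.sleep):
    """Decorator factory: retry the wrapped callable on *exceptions*.

    Hypothetical helper for illustration only (not part of mrjob).
    """
    def decorator(func):
        @functools.wraps(func)  # copies __name__ and __doc__ onto the wrapper
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries:
                        raise  # out of tries: surface the original error
                    sleep(backoff)  # never retry faster than *backoff*
        return wrapper
    return decorator


calls = []
sleeps = []  # record requested waits instead of actually sleeping


@retry_on(TimeoutError, backoff=30, sleep=sleeps.append)
def describe_cluster():
    """Fetch cluster state (stand-in for a real API call)."""
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("transient failure")
    return "RUNNING"


state = describe_cluster()
```

Because the wrapper is built with functools.wraps, help(describe_cluster) still shows the original docstring, which is the kind of pass-through the RetryWrapper item describes.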

0.6.4

* drop support for Python 3.3
* unvendored google-cloud-dataproc, version 0.2.0+ required (1796)
* link static files to MRJobs with FILES, DIRS, ARCHIVES (1431)
* also added files(), dirs(), archives() methods
* termination protection doesn't make terminate-idle-clusters crash (1801)
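
A minimal sketch of how a job might use FILES, assuming (as the mrjob documentation describes) that each listed file is made available in the task's working directory under its base name. MRStopWordFilter and stop_words.txt are hypothetical; the import fallback is there only so the snippet loads without mrjob installed.

```python
try:
    from mrjob.job import MRJob
except ImportError:  # fallback so the sketch can be loaded without mrjob
    MRJob = object


class MRStopWordFilter(MRJob):
    """Count words, skipping any listed in a static side file."""

    # Hypothetical file name; assumed to sit next to this script.
    FILES = ['stop_words.txt']

    def mapper_init(self):
        # Assumption: the file appears in the task's working directory.
        with open('stop_words.txt') as f:
            self.stop_words = {line.strip() for line in f}

    def mapper(self, _, line):
        for word in line.split():
            word = word.lower()
            if word not in self.stop_words:
                yield word, 1

    def reducer(self, word, counts):
        yield word, sum(counts)
```

DIRS and ARCHIVES are listed alongside FILES in the same item; the files(), dirs(), and archives() methods are the dynamic counterparts for cases where the list has to be computed at run time.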

0.6.3

* jobs:
* use mapper_raw() to read entire file, in any format (754)
* log interpretation:
* handles "not a valid JAR" error from Hadoop (1771)
* fewer dependencies on Google libraries (1746):
* google-cloud-logging 1.5.0+
* google-cloud-storage 1.9.0+
* google-cloud-dataproc is vendored (future releases will require 0.11.0+)
* RetryWrapper now sets __name__ of wrapped functions (1790)
* runners:
* all runners:
* don't stream output if --output-dir is specified (1739)
* --no-output switch is now --no-cat-output
* added --cat-output switch
* cloud runners (Dataproc, EMR):
* renamed cloud_upload_part_size option to cloud_part_size_mb (1774)
* DataprocJobRunner:
* options now supported:
* cloud_part_size_mb: control chunked uploading (1404)
* {core,master,task}_instance_config (1681):
* set disk_config, is_preemptible, other instance options
* cluster_config: set properties in Hadoop config files (1680)
* hadoop_streaming_jar: specify custom Hadoop streaming JAR (1676)
* network/subnet: specify network/subnetwork (1683)
* service_account: specify custom IAM service account (1682)
* service_account_scopes: specify custom permissions for cluster (1682)
* ssh_tunnel/ssh_tunnel_is_open: access resource manager (1670)
* fs:
* cat() streams data rather than dumping to a temp file (1674)
* exists() no longer swallows exceptions (1675)
* full support for parsing probable cause of job failure (1672)
* full support for fetching and parsing counters (1703)
* job progress messages (1671)
* can now run JAR steps (1677)
* uses Dataproc's built-in idle timeout, not a script (1705)
* bootstrap script runs in temp dir, not / (1601)
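
The mapper_raw() item in the jobs list above can be sketched as follows. This assumes the signature mapper_raw(self, input_path, input_uri), where input_path is a local copy of one whole input file; check it against the mrjob documentation for your version. MRJsonKeyCount is a hypothetical job, and the import fallback is there only so the snippet loads without mrjob installed.

```python
import json

try:
    from mrjob.job import MRJob
except ImportError:  # fallback so the sketch can be loaded without mrjob
    MRJob = object


class MRJsonKeyCount(MRJob):
    """Count top-level keys across input files that each hold one JSON object."""

    def mapper_raw(self, input_path, input_uri):
        # The whole file is read at once, so multi-line or binary
        # formats work; here each file is a single JSON document.
        with open(input_path) as f:
            doc = json.load(f)
        for key in doc:
            yield key, 1

    def reducer(self, key, counts):
        yield key, sum(counts)
```

A line-oriented mapper() could not parse a pretty-printed JSON document, because it would see one line at a time; reading the file by path avoids that.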

0.6.2

* runners:
* local runners:
* added --num-cores option to control parallelism and file splits (1727)
* cloud runners (EMR and Dataproc):
* idle timeout script has 10-minute grace period (1694)
* Dataproc:
* replaced google-api-python-client with google-cloud-sdk (1730)
* works without gcloud util config installed (1742)
* credentials can be read from $GOOGLE_APPLICATION_CREDENTIALS
* or from gcloud util config (if installed)
* no longer required to set region or zone (1732)
* auto zone placement (just set region) is enabled
* defaults to auto zone placement in us-west1
* no longer reads zone or region from gcloud GCE configs
* Dataproc Quickstart is now up-to-date (1589)
* api_client attr has been replaced with cluster_client and job_client
* GCSFilesystem method changes:
* api_client attr has been replaced with client
* create_bucket() no longer takes a project ID
* delete_bucket() is disabled (use get_bucket(...).delete())
* get_bucket() returns a google.cloud.storage.bucket.Bucket
* list_buckets() is disabled (use get_all_bucket_names())
* EMR:
* much faster error log parsing (1706)
* may have to wait for logs to transfer to S3 on some AMIs
* tools:
* terminate-idle-job-flows is faster and uses fewer API calls

0.6.1

* fixed serious error log parsing issue (1708)
* added mrjob diagnose utility to find why previously run jobs failed (1707)
* exposed EMRJobRunner.get_job_steps() (1625)

0.6.0

* dropped support for Python 2.6
* use boto3 instead of boto (1304)
* job output is now byte-based, not line-based:
* runner.fs.cat() now yields chunks of bytes, not lines (1533)
* runner.cat_output() yields chunks of bytes (1604)
* runner.stream_output() is deprecated
* job.parse_output() translates chunks of bytes to records (1604)
* job.parse_output_line() is deprecated
* replaced optparse with argparse (1587)
* renamed attributes/methods (old name is a deprecated alias)
* add_file_option() -> add_file_arg()
* add_passthrough_option() -> add_passthru_arg()
* configure_options() -> configure_args()
* load_options() -> load_args()
* pass_through_option() -> pass_arg_through()
* self.args -> self.options.args
* duplicate file upload args are passed through to job
* runners:
* all runners:
* file_upload_args kwarg is deprecated (1656)
* pass path dicts to extra_args instead (see mrjob.setup)
* sim runners (inline and local):
* local mode runs one mapper/reducer per CPU (1083)
* step_output_dir works (1515)
* only sort by reducer key unless SORT_VALUES is set (660)
* files in working dir are marked user-executable (1619)
* don't crash if os.symlink doesn't work on Windows (1649)
* input passed to jobs as stdin, not as arguments (567)
* input decompressed by runner, not mrjob.cat utility
* cloud runners (Dataproc and EMR):
* bootstrap can take archives and dirs as well as files (1530)
* bootstrap files are now only made executable by current user (1602)
* extra_cluster_params for unsupported API params (1648)
* max_mins_idle option (1686)
* default is 10 minutes
* <10 minutes may result in premature cluster shutdown (see 1693)
* max_hours_idle is deprecated
* EMRJobRunner:
* persistent clusters will always idle-time-out
* default AMI is 5.8.0 (1594)
* instance_fleets option (1569)
* instance_groups option (use for EBS volumes) (1357)
* region names are now case-sensitive
* 'EU' alias for EMR region 'eu-west-1' no longer works (1538)
* __mrjob_pool_hash and __mrjob_pool_name EMR tags on cluster (1086)
* __mrjob_version tag on cluster (1600)
* jobs no longer add tags to cluster (1565)
* enable_emr_debugging now also works on AMI 4.x and later
* SSH filesystem no longer dumps file contents to memory (1544)
* bootstrapping errors no longer logged as JSON (1580)
* "latest" AMI alias no longer works (1595)
* implicitly dropped support for AMI 2.4.2 and earlier (no Python 2.7)
* deprecated visible_to_all_users option
* mins_to_end_of_hour no longer works (EMR now bills by the second)
* pooling changes:
* ensure that extra instance roles (e.g. "task") can run job (1630)
* only running instances are counted (1633)
* boto -> boto3 changes:
* added make_emr_client()
* added make_iam_client()
* removed make_*_conn() methods (see below)
* emr_api_params no longer works (1574) (use extra_cluster_params)
* boto3 reads $AWS_SESSION_TOKEN, not $AWS_SECURITY_TOKEN
* added fs methods:
* added get_all_bucket_names()
* added make_s3_client()
* added make_s3_resource()
* removed methods that return boto 2 objects (see below)
* create_bucket()'s second arg is now named region, not location
* mrjob tools:
* updated billing calculations in audit-emr-usage (1688)
* mrjob.util:
* deprecated read_file() (1605) (use mrjob.cat.decompress())
* deprecated read_input()
* removed code (mostly due to deprecation)
* (for runner changes, see mrjob.conf, mrjob.options, and mrjob.runner)
* mrjob.aws:
* removed emr_endpoint_for_region()
* removed emr_ssl_host_for_region()
* removed s3_endpoint_for_region()
* removed s3_location_constraint_for_region()
* mrjob.cat:
* this module is no longer executable
* mrjob.cmd:
* removed deprecated mrjob command aliases:
* create-job-flow
* terminate-idle-job-flows
* terminate-job-flow
* mrjob.conf:
* removed OptionStore class and subclasses (1615)
* mrjob.emr:
* removed s3_key_to_uri()
* EMRJobRunner:
* removed make_emr_conn() (use make_emr_client())
* removed make_iam_conn() (use make_iam_client())
* removed get_ami_version() (use get_image_version())
* removed get_emr_job_flow_id() (use get_cluster_id())
* removed make_persistent_job_flow() (use make_persistent_cluster())
* see also mrjob.fs.s3.S3Filesystem
* mrjob.fs.base:
* base filesystem methods:
* removed path_exists() (use exists())
* removed path_join() (use join())
* mrjob.fs.s3:
* removed wrap_aws_conn()
* S3Filesystem:
* removed make_s3_conn() (use make_s3_client() or make_s3_resource())
* removed get_all_buckets() (use get_all_bucket_names())
* removed get_s3_key()
* removed get_s3_keys()
* removed make_s3_key()
* mrjob.fs.ssh:
* SSHFilesystem:
* removed ssh_slave_hosts()
* mrjob.job:
* MRJob:
* see mrjob.launch.MRJobLauncher
* removed loose protocols (1021)
* mrjob.launch:
* MRJobLauncher:
* removed generate_file_upload_args()
* removed generate_passthrough_arguments()
* removed is_mapper_or_reducer() (use is_task())
* removed mr() (use mrjob.step.MRStep)
* removed *job_runner_kwargs()
* removed --partitioner switch
* removed OptionGroups (1611):
* removed all_option_groups()
* removed *_opt_group attributes
* mrjob.options:
* removed deprecated options (1022):
* bootstrap_cmds
* bootstrap_files
* bootstrap_scripts
* hadoop_home
* hadoop_streaming_jar_on_emr
* num_ec2_instances
* python_archives
* setup_cmds
* setup_scripts
* strict_protocols
* removed deprecated option aliases:
* ami_version
* aws_availability_zone
* aws_region
* aws_security_token
* base_tmp_dir
* check_emr_status_every
* ec2_core_instance_bid_price
* ec2_core_instance_type
* ec2_instance_type
* ec2_master_instance_bid_price
* ec2_master_instance_type
* ec2_slave_instance_type
* ec2_task_instance_bid_price
* ec2_task_instance_type
* emr_job_flow_id
* emr_job_flow_pool_name
* emr_tags
* hdfs_scratch_dir
* num_ec2_core_instances
* num_ec2_task_instances
* pool_emr_job_flows
* s3_log_uri
* s3_scratch_uri
* s3_sync_wait_time
* s3_tmp_dir
* s3_upload_part_size
* ssh_tunnel_to_job_tracker
* mrjob.parse:
* removed is_windows_path()
* removed iso8601_to_timestamp()
* removed iso8601_to_datetime()
* removed parse_key_value_list()
* removed parse_port_range_list()
* mrjob.retry:
* removed RetryGoRound
* mrjob.runner:
* MRJobRunner:
* removed get_job_name()
* removed OPTION_STORE_CLASS attribute
* removed deprecated passthrough to runner.fs
* removed deprecated JOB_FLOW and *SCRATCH cleanup types
* mrjob.setup:
* removed BootstrapWorkingDirManager
* mrjob.step:
* removed INPUT, OUTPUT attributes from JarStep
* mrjob.ssh (removed entirely)
* mrjob.util:
* removed args_for_opt_dest_subset()
* removed bash_wrap()
* removed buffer_iterator_to_line_iterator()
* removed bunzip2_stream() (now in mrjob.cat)
* removed gunzip_stream() (now in mrjob.cat)
* removed populate_option_groups_with_options()
* removed scrape_options_and_index_by_dest()
* removed scrape_options_into_new_groups()
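
The switch to byte-based job output at the top of the 0.6.0 notes means a consumer can no longer assume that each item it receives is one complete line. The sketch below shows the idea in plain Python, assuming tab-separated JSON key/value lines as the output format; it is not mrjob's implementation, and lines_from_chunks and records_from_chunks are hypothetical helpers.

```python
import json


def lines_from_chunks(chunks):
    """Reassemble arbitrary byte chunks into complete lines (no newline)."""
    leftover = b''
    for chunk in chunks:
        leftover += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, leftover = leftover.split(b'\n')
        for line in complete:
            yield line
    if leftover:  # final line without a trailing newline
        yield leftover


def records_from_chunks(chunks):
    """Decode each line as JSON key, tab, JSON value."""
    for line in lines_from_chunks(chunks):
        key, value = line.split(b'\t', 1)
        yield json.loads(key.decode('utf-8')), json.loads(value.decode('utf-8'))


# Chunk boundaries deliberately fall in the middle of a record.
chunks = [b'"a"\t1\n"b', b'"\t2\n', b'"c"\t3']
records = list(records_from_chunks(chunks))
```

In mrjob itself this pairing is what the notes describe as job.parse_output() consuming the chunks from runner.cat_output(), so code written against stream_output() and parse_output_line() would typically become a single loop over job.parse_output(runner.cat_output()).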
