Streamsx-hdfs

Latest version: v0.1.0

2.1alpha

This alpha release adds the ability to read text and sequence files in parallel via the HadoopReader operator. If you are looking for a production release, the latest one is [v2.0.0](https://github.com/IBMStreams/streamsx.hdfs/releases/tag/v2.0.0).

Unlike the HDFS2FileSource, which reads either lines or binary blobs, the HadoopReader reads key-value pairs. (For text files, the key is the position in the file.)
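
As a rough illustration of the key-value model for text files, the following Python sketch pairs each line with its position in the file. It is not the operator's code: the function name `read_text_pairs` is hypothetical, and treating "position" as the byte offset of the start of the line is an assumption.

```python
import io


def read_text_pairs(data: bytes):
    """Yield (key, value) pairs for a text file held in memory.

    Assumption: the key is the byte offset at which the line starts,
    and the value is the line without its trailing newline.
    """
    offset = 0
    for raw_line in io.BytesIO(data):
        yield offset, raw_line.rstrip(b"\n").decode("utf-8")
        offset += len(raw_line)  # advance by the full line, newline included


pairs = list(read_text_pairs(b"alpha\nbeta\ngamma\n"))
for key, value in pairs:
    print(key, value)
```

A line-oriented reader would hand back only the values; the key-value form also tells a downstream consumer where each record came from.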

When used in a parallel region, the HadoopReader reads the portion of the file determined by its channel. Files compressed with an unsplittable codec cannot be read in parallel; in that case, only channel 0 produces tuples. Sequence files, text files, and text files compressed with a splittable codec (e.g., bz2) are read in parallel.
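
To make the per-channel behavior concrete, here is a minimal Python sketch of one way a text file could be divided among channels so that every line is read exactly once. The helper names (`split_range`, `read_split`) are hypothetical. The boundary rule (a channel skips a partial first line and reads through the line that crosses the end of its range) is a common convention for splitting text input and is assumed here, not confirmed as the operator's behavior.

```python
def split_range(file_size: int, channel: int, max_channels: int):
    """Return the [start, end] byte range assigned to one channel (assumed even split)."""
    chunk = file_size // max_channels
    start = channel * chunk
    # The last channel absorbs any remainder left by integer division.
    end = file_size if channel == max_channels - 1 else start + chunk
    return start, end


def read_split(data: bytes, channel: int, max_channels: int):
    """Return the (byte_offset, line) pairs that belong to one channel."""
    start, end = split_range(len(data), channel, max_channels)
    pos = start
    if start != 0:
        # Discard everything up to the first newline: that line is
        # read by the channel that owns the preceding range.
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1
    pairs = []
    # Read every line that starts at or before `end`, even if it runs past it.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        pairs.append((pos, data[pos:stop].decode("utf-8")))
        pos = stop + 1
    return pairs


data = b"alpha\nbeta\ngamma\ndelta\n"
for channel in range(2):
    print(channel, read_split(data, channel, 2))
```

A range-based split like this needs to seek to an arbitrary offset and resynchronize on a record boundary. A stream compressed with an unsplittable codec has to be decompressed from the beginning, which is why only one channel can do useful work on such a file.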

Some limitations of the operator are given [here](https://github.com/IBMStreams/streamsx.hdfs/wiki/Extensions-to-HadoopReader).

The `demos/WordCount` directory gives an example of using this operator to do word count.

Note that this is a pre-release. The operator interface (and even the name) may change, and there is no guarantee that this operator will be in the official HDFS toolkit v2.1.0, the next product version, or any future version. The code is in the SequenceFile branch, not the master branch.

2.0.0

This is an official release of the HDFS v2.0 toolkit to support InfoSphere Streams v4.0.
Highlights of the release include:
- Updates to all operators to support Application Bundle
- Support for consistent region
- Support for InfoSphere BigInsights v3.0.0.2
- Support for Cloudera CDH 5
- Support for Hortonworks HDP 2.2.0
- Support for reading and writing data into HDFS in binary format
- Support for dynamic filenames in the HDFS2FileSink operator

1.2.0.20141219

In this release, we have the following changes:
- HDFS operators are renamed back to HDFS2 operators
- Fix for https://github.com/IBMStreams/streamsx.hdfs/issues/31

1.0.0.20140807

This is a prerelease of the HDFS toolkit. This release contains:

1) a snapshot of the HDFS2\* operators from the Streams 3.2.1 release
2) a fix for issue 15 (HDFSFile Sink does not flush buffer on job cancel)
