Imessage-conversation-analyzer

Latest version: v2.3.0

3.39.0

Beyond that, there is a wealth of other small improvements to refactor and polish up the codebase.

2.3.0

- Added a count for audio messages to the `attachment_totals` analyzer
- The exposed `attachments` dataframe has been updated to include columns for:
  - The filename of the attachment, if applicable
  - The ID of the associated message
- The `messages` dataframe has been updated to include a column for the ID of the message

2.2.0

- Rewrote the `most_frequent_emojis` analyzer to be substantially faster and more accurate
  - The time complexity of the algorithm has been reduced from O(n^2) to O(n), resulting in significant speedups (e.g. 10s to 3s, or 4s to 2s)
  - The new algorithm also handles combined emojis correctly (e.g. 👨‍💻, which is a combination of 👨 and 💻, is now counted correctly)
- Small refactoring improvements to clean up the codebase
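
To illustrate the kind of single-pass counting described for the emoji analyzer rewrite, here is a minimal sketch. It is not ICA's actual implementation: the tiny `EMOJIS` list, the `count_emojis` function, and the longest-match-first regex approach are all assumptions made for this example.

```python
import re
from collections import Counter

# Building blocks for a combined emoji (hypothetical sample data).
MAN = "\U0001F468"        # man
LAPTOP = "\U0001F4BB"     # laptop
ZWJ = "\u200d"            # zero-width joiner, glues emojis together
TECHNOLOGIST = MAN + ZWJ + LAPTOP  # man technologist (a combined emoji)
GRIN = "\U0001F600"       # grinning face

# Tiny illustrative emoji set; a real analyzer would use a far larger one.
EMOJIS = [MAN, LAPTOP, TECHNOLOGIST, GRIN]

# Sort longest-first so a combined emoji is tried before its components.
EMOJI_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOJIS, key=len, reverse=True))
)


def count_emojis(messages):
    """Count emoji occurrences by scanning each message exactly once."""
    counts = Counter()
    for text in messages:
        counts.update(EMOJI_PATTERN.findall(text))
    return counts


counts = count_emojis(["shipping it " + TECHNOLOGIST + " " + GRIN, LAPTOP + LAPTOP])
print(counts[TECHNOLOGIST], counts[MAN], counts[LAPTOP])
```

Because the combined emoji is matched as one unit, its components are not also counted separately, and each message is scanned once rather than once per emoji.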

2.1.0

- Fixed a bug where ICA could not infer the format from an `*.md` file extension when passing a Markdown file as an output path
- A `FormatNotSupportedError` has been added, and is now raised if the specified format is unsupported (either on the CLI via `-f`/`--format`, or when calling `ica.output_results` with the `format` parameter)
- Refactored `ica.output_results` tests to be much more robust
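
The behavior described in this release (inferring a format from the output path's extension, and rejecting unsupported formats) could look roughly like the sketch below. Everything here is an assumption for illustration: the `resolve_format` helper, the extension mapping, and the locally defined `FormatNotSupportedError` stand-in are not taken from ICA's source.

```python
from pathlib import Path
from typing import Optional


class FormatNotSupportedError(Exception):
    """Local stand-in for the error named in the release notes."""


# Assumed mapping of file extensions to format names (illustrative only).
EXTENSION_TO_FORMAT = {".csv": "csv", ".md": "markdown", ".xlsx": "excel"}
SUPPORTED_FORMATS = set(EXTENSION_TO_FORMAT.values())


def resolve_format(fmt: Optional[str] = None, output: Optional[str] = None) -> Optional[str]:
    """Return the explicit format if valid, else infer one from the output path.

    Returns None when nothing can be inferred, which this sketch treats as
    "use the default format".
    """
    if fmt is not None:
        if fmt not in SUPPORTED_FORMATS:
            raise FormatNotSupportedError(f"Format '{fmt}' is not supported")
        return fmt
    if output is not None:
        # Path.suffix includes the leading dot, e.g. ".md"
        return EXTENSION_TO_FORMAT.get(Path(output).suffix.lower())
    return None


print(resolve_format(output="./report.md"))
```

An explicit format takes precedence over the file extension in this sketch; whether ICA resolves conflicts the same way is not stated in these notes.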

2.0.0

ICA v2 is the next major release of the library, and represents as significant a milestone as the initial v1 release!
https://pypi.org/project/imessage-conversation-analyzer/

TL;DR

1. In addition to the CLI, a comprehensive Python API has been added so that you can write custom programs to integrate with the library more easily
2. It adds support for many more emoji
3. It fixes some major bugs and makes the tool more intuitive to use
4. It adds support for writing to Excel files
5. It adds support for non-US phone numbers
6. It adds timezone support to eliminate any potential for date/time ambiguity

Python API

Most notable is the addition of a fully typed Python API, which allows you to write custom analyzers that integrate with ICA with greater power and flexibility.

v1 had a concept of "metric files", which were rather limited in capability because they could only be called via the CLI and did not allow for post-processing.

In v2, these "metric files" have been re-dubbed "analyzers" for better clarity, and the new Python API allows for importing of the `ica` package in your module.

This new API was designed to be adaptable to different kinds of needs. That is, the processing of the message data provided by the library can be as simple or as sophisticated as you'd like. For example, you can either choose to integrate with the built-in CLI, or you can write your own processing logic.

We encourage you to look at the built-in analyzer modules as examples of how to use this new API.

Improved Emoji Support

Previously, ICA only supported a small subset of emoji for the "Most Frequent" analyzer. ICA v2 adds support for over 1,800 of the emoji supported by the Unicode standard. This should cover the majority of emojis that people use in their message conversations.

Parsing of Typedstream-Encoded Message Data

Certain messages in the macOS message database are encoded using Apple's binary typedstream format in a special `attributedBody` column. In ICA v1, these types of messages could not be parsed and therefore were excluded from the dataset and from certain analytics (like emoji counts).

In ICA v2, new logic has been added to decode these typedstream-encoded messages and merge them into the main dataset, with help from the [pytypedstream](https://pypi.org/project/pytypedstream/) package. This means that you can be confident that ICA will analyze the entirety of your message data for a conversation, not merely a subset of it.

Excel Support

The CLI and the Python API now support outputting your analyzer dataframe to Excel. This is achieved by specifying the new `-o`/`--output` flag on the CLI with a file path ending in `.xlsx`. You can also pass `--format=xlsx` if you want to capture or redirect the binary output for your own purposes.

For the Python API, you can pass the `output` parameter to `ica.output_results()` with an `.xlsx` file path. Alternatively, you can pass `format='excel'`, with `output` as a `BytesIO` object.

```sh
ica transcript -c 'Thomas Riverstone' -o ./my_transcript.xlsx
```

```python
ica.output_results(my_df, output='./my_transcript.xlsx')
```

Timezone Support

Previously, all dates/times in ICA v1 would assume the local system timezone of the user running the CLI. In v2, this is still the default behavior, but a new `-t`/`--timezone` option (or `timezone` parameter for `ica.get_dataframes`) has been added. This new parameter accepts any IANA timezone name (e.g. `America/New_York` or `UTC`).

```sh
ica message_totals -c 'John Doe' -t UTC
```

```python
dfs = ica.get_dataframes(contact_name=my_contact_name, timezone='UTC')
```
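
To see why an explicit timezone removes ambiguity, consider how the same instant can fall on different calendar days. The following is a minimal standalone sketch using only Python's standard library; the `localize` helper and the sample timestamp are hypothetical and are not part of ICA.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def localize(utc_dt: datetime, tz_name: str) -> datetime:
    """Convert a timezone-aware datetime to the given IANA timezone."""
    return utc_dt.astimezone(ZoneInfo(tz_name))


# A hypothetical message sent at 03:30 UTC on January 27
sent_at = datetime(2024, 1, 27, 3, 30, tzinfo=timezone.utc)

print(localize(sent_at, "UTC").date())               # 2024-01-27
print(localize(sent_at, "America/New_York").date())  # 2024-01-26
```

A per-day total would therefore attribute this message to a different date depending on the timezone in effect, which is the kind of ambiguity the timezone option lets you control.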

Default Format Changes

The default format (i.e. when you omit the `--format`/`-f`/`format` option) has changed slightly from using the `tabulate` package to using [pandas.DataFrame.to_string](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_string.html). This improves the consistency of the API to allow for writing data in the default format to a buffer or file (like other formats).

**Before:**

```
Date                   Total
-------------------  -------
2024-01-26 00:00:00       12
2024-01-27 00:00:00       45
2024-01-28 00:00:00       56
```

**After:**

```
      Date  Total
2024-01-26     12
2024-01-27     45
2024-01-28     56
```

Support for Non-US Phone Numbers

ICA v2 now integrates with the [phonenumbers](https://pypi.org/project/phonenumbers/) package to standardize the parsing of phone numbers when looking up the conversation for a particular contact. A benefit of this integration is that non-US phone numbers are supported.

Dependency Upgrades and Changes

All project dependencies have been updated to their latest versions:

Upgraded (Existing) Dependencies

- [pandas](https://pypi.org/project/pandas/2.2.0/) has been upgraded to v2.2.0
- [tabulate](https://pypi.org/project/tabulate/) has been upgraded to v0.9.0

New Dependencies

- [openpyxl](https://pypi.org/project/openpyxl/3.1.2/) (for reading and writing Excel files)
- [pyarrow](https://pypi.org/project/pyarrow/15.0.0/) (per the recommendation of pandas v2)
- [phonenumbers](https://pypi.org/project/phonenumbers/8.13.29/) (to standardize the parsing of contact phone numbers)
- [tzlocal](https://pypi.org/project/tzlocal/5.2/) (for determining the local timezone of the user's system)

Full Test Suite

ICA v2 adds a full test suite, boasting 96% code coverage across the entire codebase. This includes tests for the core `ica` package and all built-in analyzers, for both the Python API and the CLI utility. With this, you may have greater confidence that the package will behave correctly in all the relevant cases.

CLI Changes

You may have noticed with the above examples that the Command Line API has also changed slightly. The `-m` parameter has been dropped in favor of specifying the analyzer name as a single positional parameter.

**Before:**

```sh
ica -c 'John Doe' -m ica/metrics/message_totals.py -f csv
```

**After:**

```sh
ica message_totals -c 'John Doe' -f csv
```
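
For readers curious how a command line with a single positional analyzer name can be modeled, here is a rough `argparse` sketch. It is not ICA's actual parser: the `build_parser` function and the `--contact` long option name are assumptions, while the short flags mirror the ones shown in these notes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser shaped like the v2 command line."""
    parser = argparse.ArgumentParser(prog="ica")
    # The analyzer is now a single positional argument instead of `-m <path>`
    parser.add_argument("analyzer", help="name of the analyzer to run")
    # `--contact` is an assumed long name; the notes only show `-c`
    parser.add_argument("-c", "--contact", required=True)
    parser.add_argument("-f", "--format")
    parser.add_argument("-t", "--timezone")
    parser.add_argument("-o", "--output")
    return parser


# Equivalent to: ica message_totals -c 'John Doe' -f csv
args = build_parser().parse_args(["message_totals", "-c", "John Doe", "-f", "csv"])
print(args.analyzer, args.contact, args.format)
```

Making the analyzer positional means it is always required and can appear before or after the optional flags.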


Bug Fixes

1. Emojis with a count of zero are now excluded from the "Most Frequent Emojis" data
2. Dates with no messages sent are now excluded from the "Totals by Day" analyzer
3. Fixed "Days Missed" and "Days with No Reply" calculation for the "Message Totals" analyzer

2.0.0beta.1

The notes for this beta release are identical to those for 2.0.0 above.

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.