A notebook demonstrating some of the features of MSTICPy 2.0
is available at [What's new in MSTICPy 2.0](https://github.com/microsoft/msticpy/blob/main/docs/notebooks/What's%20New%20in%20MSTICPy%202.0.ipynb).
If you are new to MSTICPy, or just want to catch up with a quick
overview, check out our new [MSTICPy Quickstart Guide](https://msticpy.readthedocs.io/en/latest/getting_started/QuickStart.html).
Contents
* Dropping Python 3.6 support
* Package re-organization and module search
* Simplifying imports in MSTICPy
* Folium map update - single function, layers, custom icons
* Threat Intelligence providers - async support
* Time Series simplified - analysis and plotting
* DataFrame to graph/network visualization
* Pivots - easy initialization/dynamic data pivots
* Consolidating Pandas accessors
* MS Sentinel workspace configuration
* MS Defender queries available in MS Sentinel QueryProvider
* Microsoft Sentinel QueryProvider
* New queries
* Documentation Additions and Improvements
* Miscellaneous improvements
* Previous feature changes since MSTICPy 1.0
---
Dropping Python 3.6 support
As of this release we only officially support Python 3.8 and above.
We will try to fix issues affecting Python 3.6 if the required fixes are
small and contained, but we make no guarantees that everything will work
on Python versions earlier than 3.8.
---
Package re-organization and module search
One of our main goals for V2.0.0 was to re-organize MSTICPy to be more logical and easier to
use and maintain. Several years of organic growth had seen modules created in places that
seemed like a good idea at the time but did not age well.
The discussion about the V2 structure can be found in #320.
**Due to the re-organization, many features are no longer in places
where they used to be imported from!**
We have tried to maintain compatibility with old locations by adding "glue" modules.
These allow import of many modules from their previous locations but will issue a
*Deprecation* warning if loaded from the old location.
The warning will contain the new location of the module -
so you should update your code to point to this new location.
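For example, an import from a pre-2.0 location should still resolve via the
glue modules but will warn (the `foliummap` module is used here purely to
illustrate the pattern - the warning text gives the authoritative new path):

```python
# pre-2.0 location - assumed to be covered by a compatibility "glue" module;
# importing from here works but emits a DeprecationWarning pointing to the
# new location (msticpy.vis.foliummap in this example)
from msticpy.nbtools.foliummap import FoliumMap

# preferred V2.0 location
# from msticpy.vis.foliummap import FoliumMap
```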
This table gives a quick overview of the V2.0 structure
| folder | description |
|-----------|----------------------------------------------------------------------------------|
| analysis | data analysis functions - timeseries, anomalies, clustering |
| auth | authentication and secrets management |
| common    | commonly used utilities and definitions (e.g. exceptions)                         |
| config    | configuration and settings UI                                                      |
| context   | enrichment modules: geoip, ip_utils, domaintools, tiproviders, vtlookup            |
| data      | data acquisition/queries/storage/uploaders                                         |
| datamodel | entities, SOC objects                                                              |
| init      | package loading and initialization - nbinit, pivot modules                         |
| nbwidgets | notebook widget modules                                                            |
| transform | simple data processing - decoding, reformatting, schema change, process tree |
| vis | visualization modules including browsers |
Notable things that have moved:
* most things from the `sectools` folder have migrated to `context`, `transform` or `analysis`
* most things from the `nbtools` folder have migrated to:
* `msticpy.init` - (not to be confused with `__init__`) - package initialization
* `msticpy.vis` - visualization modules
* pivot functionality has moved to `msticpy.init`
Module Search
If you are having trouble finding a module, we have added a simple search function:
```python
import msticpy as mp
mp.search("riskiq")
```
Matches will be returned in a table with links to the module documentation
<div style="border: solid; padding: 5pt">
<h4>Modules matching 'riskiq'</h4>
<table class='table_mod'>
<tr class='cell_mod'><th>Module</th><th>Help</th></tr>
<tr class='cell_mod'><td>msticpy.context.tiproviders.riskiq</td><td>
<a href='https://msticpy.readthedocs.io/en/latest/api/msticpy.context.tiproviders.riskiq.html' target='_blank'>msticpy.context.tiproviders.riskiq</a></td>
</tr>
</table>
</div>
---
Simplifying imports in MSTICPy
The root module in MSTICPy now has several modules and
classes that can be directly accessed from it (rather than
having to import them individually).
We've also decided to adopt a new "house style" of importing
`msticpy` as the alias `mp`. Slavishly copying the idea from
some of the admired packages that we use (pandas -> `pd`,
numpy -> `np`, networkx -> `nx`), we thought it would save
a bit of typing. You are free to adopt or ignore this style -
it obviously has no impact on the functionality.
```python
import msticpy as mp
mp.init_notebook()
qry_prov = mp.QueryProvider("MDE")
ti = mp.TILookup()
```
Many commonly used classes and functions are exposed as
attributes of `msticpy` (or `mp`).
A number of commonly used classes are also imported by default
by `init_notebook`, notably all of the entity classes.
This makes it easier to use pivot functions without any initialization
or import steps.
```python
import msticpy as mp
mp.init_notebook()

# IpAddress can be used without having to import it
IpAddress.whois("123.45.6.78")
```
`init_notebook` improvements
* You no longer need to supply the `namespace=globals()` parameter when
calling from a notebook. `init_notebook` will automatically obtain the
notebook global namespace and populate imports into it.
* The default verbosity of `init_notebook` is now 0, which produces
minimal output - use `verbosity=1` or `verbosity=2` to get more
detailed reporting.
* The Pivot subsystem is automatically initialized in `init_notebook`.
* All MSTICPy entities are imported automatically.
* All MSTICPy magics are initialized here.
* Most MSTICPy pandas accessors are initialized here (those that
require optional packages, such as the timeseries accessors, are
not initialized by default).
* `init_notebook` supports a `config` parameter - you can use this to
provide a custom path to a `msticpyconfig.yaml`, overriding the usual
defaults (see the sketch after this list).
* Searching for a `config.json` file is only enabled if you are running
MSTICPy in Azure Machine Learning.
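As an illustration, a startup call using the parameters described above
might look like this (the config path is just a placeholder for wherever
your file lives):

```python
import msticpy as mp

# verbosity=1 gives more detailed startup reporting than the new default of 0;
# config points to a custom msticpyconfig.yaml (the path shown is hypothetical)
mp.init_notebook(
    verbosity=1,
    config="./msticpyconfig.yaml",
)
```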
---
Folium map update - single function, layers, custom icons
The Folium module in MSTICPy has always been a bit complex to use
since it normally required that you convert IP addresses to MSTICPy
IpAddress entities before adding them to the map.
You can now
plot maps with a single function call from a DataFrame containing
IP addresses or location coordinates. You can group the data
into Folium layers, specify columns to populate popups and tooltips,
and customize the icons and coloring.
![folium_layers](https://user-images.githubusercontent.com/13070017/171067563-2009621c-cf2a-4a7a-8fcf-fd7983d1e9aa.png)
plot_map
A new `plot_map` function (in the `msticpy.vis.foliummap` module)
lets you plot mapping points directly from a DataFrame. You can
specify either an `ip_column` or coordinates columns (`lat_column` and
`long_column`). In the former case, the geo location of the IP address
is looked up using the MaxMind GeoLiteLookup data.
You can also control the icons used for each marker with the
`icon_column` parameter. If you happen to have a column in your
data that contains names of FontAwesome or GlyphIcons icons,
you can use that column directly.
More typically, you would combine the `icon_column` with the
`icon_map` parameter. You can specify either a dictionary or a
function. For a dictionary, the value of the row in `icon_column`
is used as a key - the value is a dictionary of icon parameters
passed to the Folium.Icon class. For a function, the `icon_column`
value is passed to the function as a single parameter and the return value
should be a dictionary of valid parameters for the `Icon` class.
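A minimal sketch of the dictionary form, assuming a DataFrame `df` with a
hypothetical `OsFamily` column whose values we map to Folium icon parameters:

```python
from msticpy.vis.foliummap import plot_map

# values of the (hypothetical) OsFamily column are used as keys; each value
# is a dict of keyword arguments passed to folium.Icon for that marker
icon_map = {
    "Windows": {"color": "blue", "icon": "desktop", "prefix": "fa"},
    "Linux": {"color": "green", "icon": "server", "prefix": "fa"},
}

plot_map(
    data=df,
    ip_column="IpAddress",
    icon_column="OsFamily",
    icon_map=icon_map,
)
```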
You can read the documentation for this function in the
[docs](https://msticpy.readthedocs.io/en/latest/api/msticpy.vis.foliummap.html)
plot_map pandas accessor
Plot maps from the comfort of your own DataFrame!
Using the msticpy `mp_plot` accessor you can plot maps directly
from a DataFrame containing IP or location information.
The `folium_map` function has the same syntax as `plot_map`
except that you omit the `data` parameter.
```python
df.mp_plot.folium_map(ip_column="ip", layer_column="CountryName")
```
![pd_accessors](https://user-images.githubusercontent.com/13070017/171068607-beaef572-5ee7-4d42-8f39-e2461f6b883a.png)
Layering, Tooltips and Clustering support
In `plot_map` and `.mp_plot.folium_map` you can specify
a `layer_column` parameter. This will group the data
by the values in that column and create an
individually selectable/displayable layer in Folium. For performance
and sanity reasons this should be a column with a relatively
small number of discrete values.
Clustering of markers in the same layer is also implemented by
default - this will collapse multiple closely located markers
into a cluster that you can expand by clicking or zooming.
You can also populate tooltips and popups with values
from one or more columns.
"Classic" interface
The original FoliumMap class is still there for more manual
control. This has also been
enhanced to support direct plotting from IP, coordinates or GeoHash
in addition to the existing IpAddress and GeoLocation entities.
It also supports layering and clustering.
---
Threat Intelligence providers - async support
When you have configured more than one TI provider, MSTICPy will
execute requests to each of them asynchronously. This will bring big
performance benefits when querying IoCs from multiple providers.
Note: requests to individual providers are still executed synchronously
since we want to avoid swamping provider services with multiple
simultaneous requests.
We've also implemented progress bar tracking for TILookups, giving a visual
indication of progress when querying multiple IoCs.
Combining the progress tracking with asynchronous operation means
that not only is performing lookups for lots of observables faster,
but you are also less likely to be left guessing whether or not your
kernel has hung.
*Note* that asynchronous execution only works with `lookup_iocs` and TI lookups
done via the pivot functions. `lookup_ioc` runs queries to multiple providers in
sequence, so it will usually be a lot slower than `lookup_iocs`.
```python
# don't do this
ti_lookup.lookup_ioc("12.34.56.78")

# do this (put a single IoC in a list)
ti_lookup.lookup_iocs(["12.34.56.78"])
```
TI providers are now also loaded on demand - i.e. only when you have
a configuration entry in your `msticpyconfig.yaml` for that provider.
This avoids loading code (and possible import errors) for providers
that you do not intend to use.
Finally, we've added functions to enable and disable providers
after loading TILookup (a short sketch follows this list):
* [ti_lookup.enable_provider](https://msticpy.readthedocs.io/en/latest/api/msticpy.context.tilookup.html#msticpy.context.tilookup.TILookup.enable_provider)
* [ti_lookup.disable_provider](https://msticpy.readthedocs.io/en/latest/api/msticpy.context.tilookup.html#msticpy.context.tilookup.TILookup.disable_provider)
* [ti_lookup.set_provider_state](https://msticpy.readthedocs.io/en/latest/api/msticpy.context.tilookup.html#msticpy.context.tilookup.TILookup.set_provider_state)
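For example, a minimal sketch of toggling a provider by name (assuming the
name matches the provider entry in your `msticpyconfig.yaml`):

```python
from msticpy.context import TILookup

ti_lookup = TILookup()
ti_lookup.disable_provider("OTX")   # skip this provider in subsequent lookups
ti_lookup.enable_provider("OTX")    # re-enable it
```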
```python
from msticpy.context import TILookup

ti_lookup = TILookup()
iocs = ['162.244.80.235', '185.141.63.120', '82.118.21.1', '85.93.88.165']
ti_lookup.lookup_iocs(iocs, providers=["OTX", "RiskIQ"])
```
![ti_providers_async](https://user-images.githubusercontent.com/13070017/171067531-9859b675-cedf-4310-a8ba-370b2649a385.png)
---
Time Series simplified - analysis and plotting
Although the Time Series functionality was relatively simple to
use, it previously required several disconnected steps to compute
the time series, plot the data, and extract the anomaly periods. Each of
these needed a separate function import. Now you can do all of these
from a DataFrame via pandas accessors.
(Currently there is a separate accessor, `df.mp_timeseries`, but we are
still working on consolidating our pandas accessors, so this may change
before the final release.)
Because you typically still need these separate outputs, the accessor
has multiple methods:
* `df.mp_timeseries.analyze` - takes a time-summarized DataFrame
and returns the results of a time-series decomposition
* `df.mp_timeseries.plot` - takes a decomposed time-series and
plots the anomalies
* `df.mp_timeseries.anomaly_periods` - extracts anomaly periods
as a list of time ranges
* `df.mp_timeseries.kql_periods` - extracts anomaly periods
as a list of KQL query clauses
* `df.mp_timeseries.apply_threshold` - applies a new anomaly
threshold score and returns the results (a sketch follows the examples below).
[See documentation](https://msticpy.readthedocs.io/en/latest/api/msticpy.analysis.timeseries.html#msticpy.analysis.timeseries.MsticpyTimeSeriesAccessor)
Analyze data to produce time series.
```python
df = qry_prov.get_networkbytes_per_hour(...)
ts_data = df.mp_timeseries.analyze()
```
Analyze and plot time series anomalies
```python
df = qry_prov.get_networkbytes_per_hour(...)
ts_data = df.mp_timeseries.analyze().mp_timeseries.plot()
```
![Time series plot](https://github.com/microsoft/msticpy/blob/main/docs/source/visualization/_static/TimeSeriesAnomalieswithRangeTool.png)
Analyze and retrieve anomaly time ranges
```python
df = qry_prov.get_networkbytes_per_hour(...)
df.mp_timeseries.analyze().mp_timeseries.anomaly_periods()
```
```
[TimeSpan(start=2019-05-13 16:00:00+00:00, end=2019-05-13 18:00:00+00:00, period=0 days 02:00:00),
 TimeSpan(start=2019-05-17 20:00:00+00:00, end=2019-05-17 22:00:00+00:00, period=0 days 02:00:00),
 TimeSpan(start=2019-05-26 04:00:00+00:00, end=2019-05-26 06:00:00+00:00, period=0 days 02:00:00)]
```
Analyze and retrieve anomaly periods as a KQL query clause
```python
df = qry_prov.get_networkbytes_per_hour(...)
df.mp_timeseries.analyze().mp_timeseries.kql_periods()
```
```
'| where TimeGenerated between (datetime(2019-05-13 16:00:00+00:00) .. datetime(2019-05-13 18:00:00+00:00)) or TimeGenerated between (datetime(2019-05-17 20:00:00+00:00) .. datetime(2019-05-17 22:00:00+00:00)) or TimeGenerated between (datetime(2019-05-26 04:00:00+00:00) .. datetime(2019-05-26 06:00:00+00:00))'
```
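The `apply_threshold` method is not shown above. A sketch of how it might be
used follows - the parameter name and threshold value are assumptions, so
check the accessor documentation linked above:

```python
# re-score the decomposed time series with a stricter anomaly threshold and
# re-plot; the parameter name and value here are illustrative assumptions
ts_data = df.mp_timeseries.analyze()
ts_data.mp_timeseries.apply_threshold(threshold=3).mp_timeseries.plot()
```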
---
DataFrame to graph/network visualization
You can convert a pandas DataFrame into a NetworkX graph or
plot directly as a graph using Bokeh interactive plotting.
You pass these functions the column names for the source and target
nodes to build a basic graph. You can also name other columns to be
node or edge attributes. When displayed, these attributes are visible
as popup details, courtesy of Bokeh's Hover tool.
```python
proc_df.head(100).mp_plot.network(
    source_col="SubjectUserName",
    target_col="Process",
    source_attrs=["SubjectDomainName", "SubjectLogonId"],
    target_attrs=["NewProcessName", "ParentProcessName", "CommandLine"],
    edge_attrs=["TimeGenerated"],
)
```
![Graph plot](https://github.com/microsoft/msticpy/blob/main/docs/source/visualization/_static/network-graph-wheelzoom.png)
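If you want the graph object itself rather than a plot, the `mp.to_graph`
accessor (listed in the accessor summary below) produces a NetworkX graph;
this sketch assumes it takes the same source/target column parameters as the
plot call above:

```python
# build a NetworkX graph from the same tabular data
# (column names are the same illustrative ones used above)
nxgraph = proc_df.head(100).mp.to_graph(
    source_col="SubjectUserName",
    target_col="Process",
)
print(nxgraph.number_of_nodes(), nxgraph.number_of_edges())
```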
---
Pivots - easy initialization/dynamic data pivots
The pivot functionality has been overhauled. It is now initialized
automatically in `init_notebook` - so you don't have to import
and create an instance of Pivot.
Better data provider support
Previously, queries from
data providers were added at initialization of the Pivot subsystem.
This meant that you had to:
* create your query providers before starting Pivot
* re-initialize Pivot every time you created a new QueryProvider
instance.
Data providers now dynamically add relevant queries as pivot
functions when you authenticate.
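A sketch of the flow, assuming a configured MS Sentinel workspace - the exact
pivot container and query names will depend on which providers and queries
you have loaded:

```python
import msticpy as mp

mp.init_notebook()                        # Pivot subsystem is initialized here
qry_prov = mp.QueryProvider("MSSentinel")
qry_prov.connect(workspace="Default")     # relevant queries are attached to
                                          # entities as pivot functions now
# e.g. host-related queries then appear under the Host entity, roughly:
# Host.MSSentinel.<query_name>(host_name="my_host")
```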
Multi-instance provider support
Some query providers (such as MS Sentinel) support multiple instances.
Previously this was not well supported in Pivot functions - the last
provider loaded would overwrite the queries from earlier providers. Pivot now
supports separate instance naming so that each Workspace has a
separate instance of a given pivot query.
Threat Intelligence pivot functions
The naming of the Threat Intelligence pivot functions has been
simplified considerably.
VirusTotal and RiskIQ relationship queries should now be available as
pivot functions (you need the VT 3 and PassiveTotal packages installed,
respectively, to enable this functionality).
More Defender query pivots
A number of MS Defender queries (using either the MDE or MSSentinel
QueryProviders) are exposed as Pivot functions.
---
Consolidating Pandas accessors
Pandas accessors let you extend a pandas DataFrame or Series with
custom functions. We use these in MSTICPy to let you call analysis or
visualization functions as methods of a DataFrame.
Most of the functions previously exposed as pandas accessors, plus
some new ones, have been consolidated into two main accessors.
* df.**mp** - contains all of the transformation functions like base64 decoding, ioc searching, etc.
* df.**mp_plot** - contains all of the visualization accessors (timeline, process tree, etc.)
`mp` accessor
* b64extract - base64/zip/gzip decoder
* build_process_tree - build process tree from events
* ioc_extract - extract observables by pattern such as IPs, URLs, etc.
* mask - obfuscate data to hide PII
* to_graph - transform to NetworkX graph
`mp_plot` accessor
* folium_map - plot a Folium map from IP or coordinates
* incident_graph - plot an incident graph
* matrix - plot the correlation/interaction between two sets of values
* network - plot graph/network from tabular data
* process_tree - plot process tree from process events
* timeline - plot a multi-grouped timeline of events
* timeline_duration - plot grouped start/end of event sequence
* timeline_values - plot a timeline of scalar values
Example usage (note: the required parameters, if any, are not shown)
```python
df.mp.ioc_extract(...)
df.mp.to_graph(...)
df.mp.mask(...)
df.mp_plot.timeline(...)
df.mp_plot.timeline_values(...)
df.mp_plot.process_tree(...)
df.mp_plot.network(...)
df.mp_plot.folium_map(...)
```
One of the benefits of using accessors is the ability to
chain them into a single pandas expression (mixing
with other pandas methods).
```python
(
    my_df
    .mp.ioc_extract(...)
    .groupby(["IoCType"])
    .count()
    .reset_index()
    .mp_plot.timeline(...)
)
```
---
MS Sentinel workspace configuration
From the MpConfigEdit settings editor you can more easily import and update
your Sentinel workspace configuration.
![Config Editor showing import URL and Resolve Settings buttons](https://github.com/microsoft/msticpy/blob/main/docs/source/getting_started/_static/mpconfig_edit_new_workspace.png)
Resolve Settings
If you have a minimal configuration (e.g. just the Workspace ID and Tenant ID)
you can retrieve other values, such as Subscription ID, Workspace Name
and Resource Group, and save them to your configuration using the **Resolve
Settings** button.
Import Settings from URL
You can copy the URL from the Sentinel portal and paste it into the
MpConfigEdit interface. It will extract and look up the full
details of the workspace to save to your settings.
Expanded Sentinel API support
The functions used to implement the above functionality are
also available standalone in the MicrosoftSentinel class.
```python
from msticpy.context.azure import MicrosoftSentinel

MicrosoftSentinel.get_workspace_details_from_url(url)
MicrosoftSentinel.get_workspace_name(ws_id)
MicrosoftSentinel.get_workspace_settings(resource_id)
MicrosoftSentinel.get_workspace_settings_by_name(ws_name, sub_id, res_group)
MicrosoftSentinel.get_workspace_id(ws_name, sub_id, res_group)
```
---
MS Defender queries available in MS Sentinel QueryProvider
Since Sentinel now has the ability to import Microsoft Defender data, we've
made the Defender queries usable from the MS Sentinel provider.
```python
qry_prov = mp.QueryProvider("MSSentinel")
qry_prov.MDE.list_host_processes(host_name="my_host")
```
This is part of a more general capability that allows us to share
compatible queries between different QueryProviders.
Many of the MS Defender queries are also now available as Pivot functions.
---
Microsoft Sentinel QueryProvider
* The MS Sentinel provider now supports a timeout parameter, allowing you
to lengthen or shorten the default query timeout.
```python
qry_prov.MDE.list_host_processes(
    host_name="myhost",
    timeout=600,
)
```
* You can set other options supported by Kqlmagic when initializing
the provider
```python
qry_prov = mp.QueryProvider("MSSentinel", cloud="government")
```
* You can specify a workspace name as a parameter when connecting
instead of creating a WorkspaceConfig instance or supplying
a connection string. To use the default workspace entry, supply "Default"
as the workspace name.
```python
qry_prov.connect(workspace="MyWorkspace")
```
---
New queries
Several new Sentinel and MS Defender queries have been added.
| QueryGroup | Query | Description |
|------------------|----------------------------------------|-----------------------------------------------------------|
| AzureNetwork | network_connections_to_url | List of network connections to a URL |
| LinuxSyslog | notable_events | Returns all syslog activity for a host |
| LinuxSyslog | summarize_events | Returns a summary of activity for a host |
| LinuxSyslog | sysmon_process_events | Get Process Events from a specified host |
| WindowsSecurity | account_change_events | Gets events related to account changes |
| WindowsSecurity | list_logon_attempts_by_ip | Retrieves the logon events for an IP Address |
| WindowsSecurity  | notable_events                          | Get notable Windows events not returned in other queries  |
| WindowsSecurity  | schdld_tasks_and_services               | Gets events related to scheduled tasks and services       |
| WindowsSecurity  | summarize_events                        | Summarizes the events on a host                           |
Over 30 MS Defender queries can now also be used in MS Sentinel workspaces if
MS Defender for Endpoint/MS Defender 365 data is connected to Sentinel.
Additional Azure resource graph queries
| QueryGroup | Query | Description |
|----------------|------------------------------------------|-----------------------------------------------------------------------------------------------------|
| Sentinel | get_sentinel_workspace_for_resource_id | Retrieves Sentinel/Azure monitor workspace details by resource ID |
| Sentinel | get_sentinel_workspace_for_workspace_id | Retrieves Sentinel/Azure monitor workspace details by workspace ID |
| Sentinel | list_sentinel_workspaces_for_name | Retrieves Sentinel/Azure monitor workspace(s) details by name and resource group or subscription_id |
See the updated [built-in query list](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataQueries.html)
---
Documentation Additions and Improvements
The documentation for V2.0 is available at <https://msticpy.readthedocs.io>
(Previous versions are still online and can be accessed through
the ReadTheDocs interface).
New and updated documents
* New [MSTICPy Quickstart Guide](https://msticpy.readthedocs.io/en/latest/getting_started/QuickStart.html)
* Updated [Installing guide](https://msticpy.readthedocs.io/en/latest/getting_started/Installing.html)
* Updated [MSTICPy Package Configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)
* Updated [Threat Intel Lookup documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html)
* Updated [Time Series analysis documentation](https://msticpy.readthedocs.io/en/latest/visualization/TimeSeriesAnomalies.html)
* New [Plot Network Graph from DataFrame](https://msticpy.readthedocs.io/en/latest/visualization/NetworkGraph.html)
* Updated [Plotting Folium maps](https://msticpy.readthedocs.io/en/latest/visualization/FoliumMap.html)
* Updated [Pivot functions](https://msticpy.readthedocs.io/en/latest/data_analysis/PivotFunctions.html)
* Updated [Jupyter and Sentinel](https://msticpy.readthedocs.io/en/latest/getting_started/JupyterAndAzureSentinel.html)
API documentation
As well as including all of the new APIs, the API documentation has
been split into a module-per-page to make it easier to read and navigate.
InterSphinx
The API docs also now support "InterSphinx".
This means that MSTICPy references to objects in other packages (e.g. Python
standard library, pandas, Bokeh) have active links that will take you
to the native documentation for that item.
Sample notebooks
The sample notebooks for most of these features have been updated
to reflect these changes. See [MSTICPy Sample notebooks](https://github.com/microsoft/msticpy/tree/main/docs/notebooks)
There are three new notebooks:
* ContiLeaksAnalysis
* Network Graph from DataFrame
* What's new in MSTICPy 2.0
ContiLeaks notebook added to MSTICPy Repo
We are privileged to host Thomas's awesome ContiLeaks notebook, which
covers investigating attacker forum chats and includes
a very cool illustration of using natural language translation
in a notebook.
Thanks fr0gger!
---
Miscellaneous improvements
* MSTICPy network requests use a custom User Agent header so that you
can identify or track requests from MSTICPy/Notebooks.
* GeoLiteLookup and the TOR and OpenPageRank providers no longer try
to download data files at initialization - only on first use.
* GeoLiteLookup re-uses a single instance if it is initialized with
the same parameters.
* Warnings cleanup - we've done a lot of work to clean up warnings -
especially deprecation warnings.
* Moved some remaining Python unittest tests to pytest
---
Feedback
Please reach out to us on GitHub - file an issue or start a discussion at
https://github.com/microsoft/msticpy - or email msticpy@microsoft.com
---