The three big changes in this release are:
* Executing MS Sentinel and Kusto queries in parallel across multiple instances
* Threaded (parallel) execution of time-split queries
* Addition of data provider to query local (exported) Velociraptor logs
Many thanks to d3vzer0 for inspiration and early work on the threaded query feature.
Many thanks to juju4 for inspiration and work on the Velociraptor support.
Support for running a query across multiple connections (with optional threaded operation)
It is common for data services to be spread across multiple tenants or workloads, for example, multiple Sentinel workspaces,
Microsoft Defender subscriptions or Splunk instances. You can use the MSTICPy `QueryProvider` to run a query across multiple connections and return the results in a single DataFrame.
To create a multi-instance provider:
* Create an instance of a QueryProvider for your data source and execute the `connect()` method to connect to the first instance of your data service.
* Then use the `add_connection()` method to add further instance connections. It takes the same parameters as the `connect()` method (these parameters vary by data provider).
`add_connection()` also supports an `alias` parameter to allow you to refer to the connection by a friendly name.
python3
from msticpy.data import QueryProvider
qry_prov = QueryProvider("MSSentinel")
qry_prov.connect(workspace="Workspace1")
# Add a second workspace under a friendly alias
qry_prov.add_connection(workspace="Workspace2", alias="Workspace2")
qry_prov.list_connections()
When you now run a query with this provider, it runs on all of the connections and the results are returned as a single DataFrame.
python3
test_query = '''
SecurityAlert
| take 5
'''
query_test = qry_prov.exec_query(query=test_query)
query_test.head()
Some of the MSTICPy drivers support asynchronous execution of queries against multiple instances, which greatly reduces the time taken compared with running the queries sequentially. Drivers that support asynchronous queries use this automatically. The initial set of multi-threaded drivers is:
- MSSentinel_New (the new version of the MSSentinel driver)
- Kusto_New (the new version of the Kusto/Azure Data Explorer driver)
By default, the queries will use at most 4 concurrent threads. You can override this by initializing the QueryProvider with the
`max_threads` parameter set to the number of threads you want. Be cautious, however,
about using too many simultaneous connections because of the potential impact on cluster performance.
python3
qry_prov = QueryProvider("MSSentinel", max_threads=10)
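To illustrate the general idea of bounded parallel execution, here is a minimal standard-library sketch. It is not MSTICPy's implementation: `run_query` is a hypothetical stand-in for a driver call, and the connection names are made up. It sends one query to several connections through a capped thread pool and combines the rows into a single result.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(connection, query):
    """Hypothetical stand-in for a driver call; tags each row with its source."""
    return [{"connection": connection, "query": query, "row": i} for i in range(2)]


def run_on_all(connections, query, max_threads=4):
    """Run the same query on every connection using at most max_threads workers."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map preserves input order, so results line up with connections
        per_connection = pool.map(lambda conn: run_query(conn, query), connections)
        # Flatten the per-connection row lists into one combined result
        return [row for rows in per_connection for row in rows]


combined = run_on_all(
    ["Workspace1", "Workspace2"], "SecurityAlert | take 5", max_threads=2
)
```

Capping the pool size is what keeps the number of simultaneous connections to the backing service under control, which is the same trade-off the `max_threads` parameter exposes.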
Multi-threaded support for split/sharded queries
MSTICPy has supported splitting large queries by time-slice for a while. This is documented in [Splitting a Query into time chunks](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#splitting-query-execution-into-chunks). With this release, we've added asynchronous support for this (if the driver supports threaded/async operation) so that multiple chunks of the query will run in parallel.
python3
qry_prov.SecurityAlert.list_alerts(start=start, end=end, split_by="1d")
Use the parameter `split_query_by` or `split_by` to specify the time range for each chunk. The time unit uses the same syntax as pandas time intervals (e.g. "1D", "4h"); see the pandas documentation for more details.
In this release sharding is also supported for ad hoc queries as long as you add "start" and "end" parameters to the query (this is still experimental, so let us know if you have issues with this).
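As a rough illustration of time-slicing, the sketch below divides a start/end range into consecutive chunks that could each be queried independently. The function name `split_time_range` is hypothetical, and MSTICPy's own chunking may treat boundaries differently.

```python
from datetime import datetime, timedelta


def split_time_range(start, end, delta):
    """Return consecutive (chunk_start, chunk_end) pairs covering start..end."""
    chunks = []
    chunk_start = start
    while chunk_start < end:
        # The final chunk is shortened so it never runs past `end`
        chunk_end = min(chunk_start + delta, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


chunks = split_time_range(
    datetime(2023, 7, 1), datetime(2023, 7, 3, 12), timedelta(days=1)
)
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "->", chunk_end.isoformat())
```

Because the chunks do not overlap and each has its own start and end, they can be handed to separate worker threads and the partial results concatenated afterwards.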
Velociraptor Local Data Provider
The `Velociraptor` data provider can read Velociraptor log files and provide convenient query functions for each data set in the output logs.
The provider can read files from one or more hosts, stored in separate folders. The files are read, converted to pandas DataFrames and grouped by table/event. Multiple log files of the same type (when reading in data from multiple hosts) are concatenated into a single DataFrame.
To use the Velociraptor provider, you need to create a `QueryProvider` instance, passing the string "Velociraptor" (or "VelociraptorLogs") as the `data_environment` parameter. You also need to add the `data_paths` parameter to specify the folders that you want to search for log files (although you can set these paths in msticpyconfig.yaml, if you do this frequently).
You can specify multiple folders to include the logs from different hosts.
python3
import msticpy as mp
qry_prov = mp.QueryProvider("VelociraptorLogs", data_paths=["~/my_logs"])
Calling the `connect` method triggers the provider to read the locations of the
log files (although the contents are not read until a query function is run).
python3
qry_prov.connect()
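The pattern described above (index the file locations on connect, read contents only when a table is queried, and concatenate files of the same type from several hosts) can be sketched with the standard library. This is an illustration only: the `LazyLogTables` class is hypothetical, and the sketch assumes one JSON-lines file per table named `<table>.json`, which may not match how the real provider locates or parses files.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


class LazyLogTables:
    """Hypothetical helper: index log files by table name, read them on demand."""

    def __init__(self, data_paths):
        self.data_paths = [Path(p).expanduser() for p in data_paths]
        self.files_by_table = defaultdict(list)

    def connect(self):
        """Record file locations only; no file contents are read here."""
        for folder in self.data_paths:
            for path in sorted(folder.rglob("*.json")):
                self.files_by_table[path.stem].append(path)

    def read_table(self, table):
        """Read and concatenate every file recorded for one table."""
        rows = []
        for path in self.files_by_table[table]:
            with path.open(encoding="utf-8") as handle:
                rows.extend(json.loads(line) for line in handle if line.strip())
        return rows


# Demo with two made-up host folders under a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for host, pid in [("host1", 804), ("host2", 848)]:
        host_dir = Path(tmp) / host
        host_dir.mkdir()
        (host_dir / "Windows_Forensics_ProcessInfo.json").write_text(
            json.dumps({"Host": host, "Pid": pid}) + "\n", encoding="utf-8"
        )
    logs = LazyLogTables([tmp])
    logs.connect()
    tables = sorted(logs.files_by_table)
    process_rows = logs.read_table("Windows_Forensics_ProcessInfo")
```

Deferring the reads keeps `connect` fast even when the folders hold many large log files, since only the tables you actually query are loaded.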
Listing Velociraptor tables
python3
qry_prov.list_queries()
['velociraptor.Custom_Windows_NetBIOS',
'velociraptor.Custom_Windows_Patches',
'velociraptor.Custom_Windows_Sysinternals_PSInfo',
'velociraptor.Custom_Windows_Sysinternals_PSLoggedOn',
....
Each query returns the data for that table, retrieved from the logs, as a pandas DataFrame.
python3
qry_prov.velociraptor.Windows_Forensics_ProcessInfo()
| Name | PebBaseAddress | Pid | ImagePathName | CommandLine | CurrentDirectory | Env |
| :------ | :--------------- | ----: | :----------- | :---------------- | :----------------- | :---- |
| LogonUI.exe | 0x95bd3d2000 | 804 | C:\Windows\system32\LogonUI.exe | "LogonUI.exe" /flags:0x2 /state0:0xa3b92855 /state1:0x41c64e6d | C:\Windows\system32\ | {'ALLUSERSP |
| dwm.exe | 0x6cf4351000 | 848 | C:\Windows\system32\dwm.exe | "dwm.exe" | C:\Windows\system32\ | {'ALLUSERSP |
| svchost.exe | 0x6cd64d000 | 872 | C:\Windows\System32\svchost.exe | C:\Windows\System32\svchost.exe -k termsvcs | C:\Windows\system32\ | {'ALLUSERSP |
| svchost.exe | 0x7d18e99000 | 912 | C:\Windows\System32\svchost.exe | C:\Windows\System32\svchost.exe -k LocalServiceNetworkRestricted | C:\Windows\system32\ | {'ALLUSERSP |
| svchost.exe | 0x5c762eb000 | 920 | C:\Windows\system32\svchost.exe | C:\Windows\system32\svchost.exe -k LocalService | C:\Windows\system32\ | {'ALLUSERSP |
What's Changed
* Ianhelle/velociraptor provider 2023 05 19 by ianhelle in https://github.com/microsoft/msticpy/pull/668
* Updating github checkout and upload-artifact to v3 by ianhelle in https://github.com/microsoft/msticpy/pull/669
* Added multithreading support for additional connections (+fixes) by d3vzer0 in https://github.com/microsoft/msticpy/pull/645
* Bump readthedocs-sphinx-ext from 2.2.0 to 2.2.2 by dependabot in https://github.com/microsoft/msticpy/pull/679
* Bump sphinx-rtd-theme from 1.2.0 to 1.2.2 by dependabot in https://github.com/microsoft/msticpy/pull/675
* Bump httpx from 0.24.0 to 0.24.1 by dependabot in https://github.com/microsoft/msticpy/pull/666
* Ianhelle/fix func query names 2023 06 30 by ianhelle in https://github.com/microsoft/msticpy/pull/680
**Full Changelog**: https://github.com/microsoft/msticpy/compare/v2.5.3...v2.6.0