# Highlights
This release of robomimic brings integrated support for mobile manipulation datasets from the recent [MOMART](https://sites.google.com/view/il-for-mm/home) paper, and adds modular features for easily modifying and adding custom observation modalities and corresponding encoding networks.
# MOMART Datasets
We have added integrated support for the MOMART [datasets](https://sites.google.com/view/il-for-mm/datasets), a large-scale collection of multi-stage, long-horizon mobile manipulation demonstrations collected in a simulated kitchen environment in iGibson.
## Using MOMART Datasets
Datasets can be easily downloaded using [download_momart_datasets.py](https://github.com/ARISE-Initiative/robomimic/tree/master/robomimic/scripts/download_momart_datasets.py).
For step-by-step instructions on setting up your machine to visualize and train with the MOMART datasets, please visit the [Getting Started](https://sites.google.com/view/il-for-mm/datasets#h.whukwluu16gm) page.
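After downloading, it can be useful to quickly inspect a file before training. The snippet below is a minimal sketch, assuming the datasets follow the standard robomimic hdf5 layout with a top-level `data` group of demonstrations; the file path is hypothetical.

```python
import h5py

# Hypothetical path to a downloaded MOMART dataset
dataset_path = "datasets/momart/table_setup_from_dishwasher/expert.hdf5"

with h5py.File(dataset_path, "r") as f:
    demos = list(f["data"].keys())                    # demonstration groups, e.g. "demo_0", "demo_1", ...
    print("number of demos: {}".format(len(demos)))
    # print the observation keys and shapes stored for the first demo
    first_demo_obs = f["data"][demos[0]]["obs"]
    for obs_key in first_demo_obs:
        print(obs_key, first_demo_obs[obs_key].shape)
```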
# Modular Observation Modalities
We also introduce modular features for easily modifying and adding custom observation modalities and corresponding encoding networks. A **modality** corresponds to a group of specific observations that should be encoded the same way.
## Default Modalities
robomimic natively supports the following modalities (the expected size of each raw dataset entry is shown, excluding the optional leading batch dimension); an illustrative example follows the list:
- `rgb` (H, W, 3): Standard 3-channel color frames with values in range `[0, 255]`
- `depth` (H, W, 1): 1-channel frame with normalized values in range `[0, 1]`
- `low_dim` (N): low-dimensional observations, e.g. proprioception or object states
- `scan` (1, N): 1-channel, single-dimension data from a laser range scanner
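For reference, a raw observation dictionary matching these expected shapes might look like the following; the key names and sizes here are illustrative only.

```python
import numpy as np

# Illustrative raw observations, one entry per modality (no leading batch dimension)
obs = {
    "cam1": np.zeros((84, 84, 3), dtype=np.uint8),           # rgb: (H, W, 3), values in [0, 255]
    "cam1_depth": np.zeros((84, 84, 1), dtype=np.float32),   # depth: (H, W, 1), values in [0, 1]
    "proprio": np.zeros(14, dtype=np.float32),               # low_dim: (N,)
    "scan": np.zeros((1, 228), dtype=np.float32),            # scan: (1, N)
}
```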
We provide default encoder networks, which can be configured or modified by setting the relevant parameters in your config, e.g.:
```python
# These keys should exist in your dataset
config.observation.modalities.obs.rgb = ["cam1", "cam2", "cam3"]      # Add camera observations to the rgb modality
config.observation.modalities.obs.low_dim = ["proprio", "object"]     # Add proprioception and object states to the low_dim modality
...
# Now let's modify the default rgb encoder network and set the feature dimension size
config.observation.encoder.rgb.core_kwargs.feature_dimension = 128
...
```
To see the structure of the observation modalities and encoder parameters, please see the [base config](https://github.com/ARISE-Initiative/robomimic/blob/master/robomimic/config/base_config.py#L195) module.
## Custom Modalities
You can also easily add your own modality and corresponding custom encoding network! Please see our example [add_new_modality.py](https://github.com/ARISE-Initiative/robomimic/tree/master/examples/add_new_modality.py).
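As a rough config-level sketch of the idea (the modality name, observation key, and encoder class below are hypothetical, and we assume the custom encoder is selected through the same per-modality encoder entries used by the built-in modalities; the linked example script is the authoritative reference):

```python
# Hypothetical custom modality named "audio" with a user-defined encoder network.
# The modality and encoder class also need to be registered in code --
# see examples/add_new_modality.py for the full recipe.
config.observation.modalities.obs.audio = ["microphone"]             # observation keys grouped under the new modality
config.observation.encoder.audio.core_class = "MyAudioCore"          # assumed: encoder network selected by class name
config.observation.encoder.audio.core_kwargs.feature_dimension = 64  # kwargs forwarded to the encoder network
```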
# Refactored Config Structure
With the introduction of modular modalities, our `Config` class structure has been modified slightly; this will likely break any configs you created with version 0.1.0. Below, we describe the exact changes needed to bring a config up to date with the current structure:
## Observation Modalities
The `image` modality has been renamed to `rgb`. Thus, you will need to update your config anywhere it references the `image` modality, e.g.:
```python
# Old format
config.observation.modalities.image.<etc>

# New format
config.observation.modalities.rgb.<etc>
```
The `low_dim` modality remains unchanged. Note, however, that we have additionally added integrated support for the `depth` and `scan` modalities, which can be referenced in the same way, e.g.:
```python
config.observation.modalities.depth.<etc>
config.observation.modalities.scan.<etc>
```
## Observation Encoders / Randomizer Networks
We have modularized the encoder / randomizer arguments so that they are general, with a separate set of arguments for each type of observation modality. All of the original arguments from v0.1.0 have been preserved, but are now formatted as follows:
```python
# OLD
# Previously, a single set of arguments was specified, and was hardcoded to process image (rgb) observations
# Assumes that you're using the VisualCore class -- not general!
config.observation.encoder.visual_feature_dimension = 64
config.observation.encoder.visual_core = 'ResNet18Conv'
config.observation.encoder.visual_core_kwargs.pretrained = False
config.observation.encoder.visual_core_kwargs.input_coord_conv = False

# Pooling was hardcoded to either use spatial softmax or not -- not general!
config.observation.encoder.use_spatial_softmax = True

# kwargs for the spatial softmax layer
config.observation.encoder.spatial_softmax_kwargs.num_kp = 32
config.observation.encoder.spatial_softmax_kwargs.learnable_temperature = False
```