Changelog:
1. **Encapsulation**:
- Enclosed both `MaxViT` and `RT2` functionalities within a single class named `RT2` to streamline the initialization and usage process.
2. **Default Parameters**:
- Set default values for various parameters in the `RT2` class. This allows users to instantiate the class without having to provide every single parameter, unless they need a non-default configuration.
3. **Training and Evaluation Modes**:
- Introduced `train()` and `eval()` methods to easily toggle between training and evaluation modes for the `RT2` model, reflecting standard practice in PyTorch.
4. **Unified Forward Method**:
- Created a `__call__` method that wraps the forward method of the `RT2` model. This provides an intuitive way to process videos and instructions by directly invoking an instance of the `RT2` class.
5. **Conditional Execution**:
- Modified the forward process (via the `__call__` method) to pass the `cond_scale` argument through only when it is provided, so that it is applied only during evaluation, as hinted in the provided code.
6. **Example Usage**:
- Added an example at the end to demonstrate how to use the new `RT2` class for training and evaluation.
Overall, these changes make the code more modular and easier to use: setup details are hidden behind a single class, and users get a more Pythonic interface.
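
The wrapper pattern described in items 1 to 5 can be sketched roughly as follows. This is a dependency-free illustration, not the actual implementation: `StubModel` and `RT2Sketch` are hypothetical names, the stub stands in for the real network, and silently ignoring `cond_scale` in training mode is one assumed reading of "used only during evaluation".

```python
class StubModel:
    """Hypothetical stand-in for the wrapped network (not the real model)."""

    def __init__(self):
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def forward(self, video, instructions, cond_scale=None):
        # Echo the inputs so the wrapper's behaviour is easy to inspect.
        return {
            "frames": len(video),
            "instructions": list(instructions),
            "cond_scale": cond_scale,
        }


class RT2Sketch:
    """Illustrative wrapper: default construction, mode toggles, __call__."""

    def __init__(self, model=None):
        # Default parameter: build the inner model unless one is supplied.
        self.model = model if model is not None else StubModel()

    def train(self):
        self.model.train()
        return self

    def eval(self):
        self.model.eval()
        return self

    def __call__(self, video, instructions, cond_scale=None):
        # Assumption: cond_scale is forwarded only when it is provided
        # and the model is in evaluation mode.
        if cond_scale is not None and not self.model.training:
            return self.model.forward(video, instructions, cond_scale=cond_scale)
        return self.model.forward(video, instructions)


if __name__ == "__main__":
    model = RT2Sketch()          # no arguments needed thanks to defaults
    video = [0, 0, 0]            # placeholder for a sequence of frames
    instructions = ["pick up the block"]

    model.train()
    print(model(video, instructions, cond_scale=3.0))

    model.eval()
    print(model(video, instructions, cond_scale=3.0))
```

Returning `self` from `train()` and `eval()` is a design choice in this sketch that allows chaining, for example `RT2Sketch().eval()(video, instructions)`.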