**browsergym-core**
- New features
- :tada: Set-of-Marks :tada: A new utility function, `overlay_som`, overlays element boxes and `bid` attributes on top of the screenshot, following ideas from [WebVoyager](https://arxiv.org/html/2401.13919v3) and [OSWorld](https://arxiv.org/abs/2404.07972)
```python
from browsergym.utils.obs import overlay_som

...
obs, info = env.reset()
screenshot_with_som = overlay_som(
    obs["screenshot"], obs["extra_element_properties"], fontsize=12, linewidth=2, tag_margin=2
)
```
![screenshot_som](https://github.com/ServiceNow/BrowserGym/assets/1726818/4b971dc6-ce53-45cb-b90d-30ad4c36b191)
- new high-level actions `upload_file` and `mouse_upload_file`
- new field `"extra_element_properties"` in each observation. It contains a dict keyed by `bid`, giving the extra properties computed by browsergym for every element with a bid on the current page. Example:
```python
{
    "23": {
        "visibility": 0.6,  # float between 0 and 1
        "bbox": [56, 345, 12, 20],  # [x, y, width, height]
        "clickable": True,  # boolean
        "set_of_marks": False,  # boolean
    }
}
```
- new `set_of_marks` property (computed with the JS tag `browsergym_set_of_marks`), following the [WebVoyager](https://github.com/MinorJerry/WebVoyager/blob/main/utils.py) implementation (boolean 0 or 1, whether the element should be part of the set-of-marks overlay)
- new `clickable` property, extracted from Chrome's DOMSnapshot's `isClickable`
- new info fields `"action_exec_start"`, `"action_exec_timeout"` and `"action_exec_stop"` after each `env.step()` call, useful for video editing
- new `resizeable_window` parameter in `BrowserEnv` to choose whether the viewport size is set via Chrome (previous behavior: resizeable window and viewport) or via Playwright (new default behavior: viewport is not resizeable)
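
As a rough illustration of how the `"extra_element_properties"` field described above might be consumed, the sketch below picks out the bids that are clickable, belong to the set of marks, and are sufficiently visible. The helper name `select_marked_elements`, the `min_visibility` threshold and the sample data are hypothetical; only the property keys (`visibility`, `bbox`, `clickable`, `set_of_marks`) come from these notes.

```python
def select_marked_elements(extra_element_properties, min_visibility=0.5):
    """Return the bids of clickable set-of-marks elements that are visible enough.

    Hypothetical helper, not part of browsergym. `extra_element_properties` is
    assumed to have the shape shown in the release notes: a dict keyed by bid.
    """
    selected = []
    for bid, props in extra_element_properties.items():
        if (
            props["clickable"]
            and props["set_of_marks"]
            and props["visibility"] >= min_visibility
        ):
            selected.append(bid)
    return selected


# Made-up sample data in the documented shape (in practice, taken from
# obs["extra_element_properties"]).
example_properties = {
    "23": {"visibility": 0.6, "bbox": [56, 345, 12, 20], "clickable": True, "set_of_marks": False},
    "a53": {"visibility": 1.0, "bbox": [10, 20, 100, 30], "clickable": True, "set_of_marks": True},
    "7": {"visibility": 0.0, "bbox": [0, 900, 50, 50], "clickable": True, "set_of_marks": True},
}

print(select_marked_elements(example_properties))  # ['a53']
```

Code that previously relied on a boolean in-viewport flag could apply a similar threshold to `visibility` (for example `> 0`), though the appropriate cutoff depends on the use case.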
- Breaking changes
- changed visibility tag in JS from `browsergym_is_in_viewport` (boolean 0 or 1) to `browsergym_visibility_ratio` (value between 0.0 and 1.0), extracted as the `visibility` extra property (see new features)
- `BrowserEnv` parameters `viewport` (viewport size), `slow_mo` (pause between playwright calls) and `timeout` (default playwright timeout) are now provided by the task. They can still be set in the environment's constructor to override the values provided by the task; doing so displays a warning.
- each task inheriting from `AbstractBrowserTask` must now take a seed at instantiation (in its constructor), instead of via the `task.setup()` method. The constructor is also where each task should choose its browser settings, by setting the attributes `task.viewport`, `task.slow_mo` and `task.timeout` (see point above)
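
The task-related breaking changes above can be sketched as follows. To stay runnable without browsergym installed, the example uses a hypothetical stand-in base class rather than the real `AbstractBrowserTask`. The default values, the `{"width": ..., "height": ...}` viewport shape, the millisecond units and the simplified `setup()` signature are all assumptions, not taken from these notes.

```python
import random


class AbstractBrowserTaskStandIn:
    """Hypothetical stand-in for browsergym's AbstractBrowserTask (not the real class)."""

    def __init__(self, seed: int) -> None:
        # The seed is received at instantiation, no longer in setup().
        self.random = random.Random(seed)
        # Placeholder defaults (assumed values and units).
        self.viewport = {"width": 1280, "height": 720}
        self.slow_mo = 1000  # assumed milliseconds between playwright calls
        self.timeout = 5000  # assumed milliseconds


class MyTask(AbstractBrowserTaskStandIn):
    """Illustrative task that picks its own browser settings in the constructor."""

    def __init__(self, seed: int) -> None:
        super().__init__(seed)
        # Task-level browser settings, set as attributes at instantiation.
        self.viewport = {"width": 800, "height": 600}
        self.slow_mo = 100
        self.timeout = 10_000
        # Seed-dependent task content is also decided here.
        self.target = self.random.randint(0, 9)

    def setup(self, page=None):
        # setup() no longer receives a seed; simplified signature for this sketch.
        return f"Click on item {self.target}", {}


task = MyTask(seed=42)
goal, info = task.setup()
print(task.viewport, task.slow_mo, task.timeout)
```

Instantiating the task twice with the same seed yields the same `target`, which is the point of moving the seed into the constructor in this sketch.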
- Refactors
- bid-based high-level actions fail faster (500 ms)
- shorter nested bids with alphabetical bids for iframes (`21-53` -> `a53`)
- fix mouse display position in demo mode (`absolute` -> `fixed`)
- modern chat theme
- refactored coordinate computation using Chrome's DOMSnapshot instead of JS, should be more robust to edge cases
- refactored visibility computation using the `IntersectionObserver` API, should be more robust to edge cases
- more robust frame marking, supports edge cases such as sandboxed iframes, and pdf viewers in `<embed>` tags
**browsergym-miniwob**
- fixed goal conversion to text in task `browsergym/miniwob/click-menu-2`