Create/parse [arcp](https://tools.ietf.org/html/draft-soilandreyes-arcp-03) (Archive and Package) URIs.
Introduction
------------
`arcp` provides functions for creating [arcp](https://tools.ietf.org/html/draft-soilandreyes-arcp-03) URIs, which can be used for identifying or parsing hypermedia files packaged in an archive or package, like a ZIP file.
arcp URIs can be used to consume or reference hypermedia resources bundled inside a file archive or an application package, as well as to resolve URIs for archive resources within a programmatic framework.
This URI scheme provides mechanisms to generate a unique base URI to represent the root of the archive, so that relative URI references in a bundled resource can be resolved within the archive without having to extract the archive content on the local file system.
An arcp URI can be used for purposes of isolation (e.g. when consuming multiple archives), security constraints (avoiding “climb out” from the archive), or for externally identiyfing sub-resources referenced by hypermedia formats.
Examples:
- `arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/doc.html`
- `arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/`
- `arcp://ni,sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0/`
- `arcp://name,gallery.example.org/`
The different forms of URI [authority](https://tools.ietf.org/id/draft-soilandreyes-arcp-03.htmlrfc.section.4.1) in arcp URIs can be used depending on which uniqueness constraints to apply when addressing an archive. See the [arcp](https://tools.ietf.org/html/draft-soilandreyes-arcp-03) specification (*draft-soilandreyes-arcp*) for details.
Note that this library only provides mechanisms to *generate* and *parse* arcp URIs, and do *not* integrate with any particular archive or URL handling modules like `zipfile` or `urllib.request`.
Installing
----------
You will need Python 2.7, Python 3.4 or later (Recommended: 3.6).
If you have [pip](https://docs.python.org/3/installing/), then the easiest is normally to install from <https://pypi.org/project/arcp/> using:
pip install arcp
If you want to install manually from this code base, then try:
python setup.py install
Usage
-----
For full documentation, see <http://arcp.readthedocs.io/> or use `help(arcp)`
This module provides functions for creating [arcp](https://tools.ietf.org/html/draft-soilandreyes-arcp-03) URIs, which can be used for identifying or parsing hypermedia files packaged in an archive or package, like a ZIP file:
>>> from arcp import *
>>> arcp_random()
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/'
>>> arcp_random("/foaf.ttl", fragment="me")
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/foaf.ttlme'
>>> arcp_hash(b"Hello World!", "/folder/")
'arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/'
>>> arcp_location("http://example.com/data.zip", "/file.txt")
'arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt'
arcp URLs can be used with `urllib.parse`, for instance using `urljoin` to resolve relative references::
>>> css = arcp.arcp_name("app.example.com", "css/style.css")
>>> urllib.parse.urljoin(css, "../fonts/foo.woff")
'arcp://name,app.example.com/fonts/foo.woff'
In addition this module provides functions that can be used to parse arcp URIs into its constituent fields:
>>> is_arcp_uri("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
True
>>> is_arcp_uri("http://example.com/t")
False
>>> u = parse_arcp("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
ARCPSplitResult(scheme='arcp',prefix='uuid',name='b7749d0b-0e47-5fc4-999d-f154abe68065',
uuid='b7749d0b-0e47-5fc4-999d-f154abe68065',path='/file.txt',query='',fragment='')
>>> u.path
'/file.txt'
>>> u.prefix
'uuid'
>>> u.uuid
UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')
>>> u.uuid.version
5
>>> parse_arcp("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/").hash
('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069')
The object returned from `parse_arcp` is similar to `ParseResult` from `urlparse`, but contains additional properties `prefix`, `uuid`, `ni`, `hash` and `name`, some of which will be `None` depending on the arcp prefix.
The function `arcp.parse.urlparse` can be imported as an alternative to `urllib.parse.urlparse`. If the scheme is `arcp` then the extra arcp fields like `prefix`, `uuid`, `hash` and `name` are available as from `parse_arcp`, otherwise the output is the same as from regular `urlparse`:
>>> from arcp.parse import urlparse
>>> urlparse("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/soup;sads")
ARCPParseResult(scheme='arcp',prefix='ni',
name='sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk',
ni='sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk',
hash=('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069',
path='/folder/soup;sads',query='',fragment='')
>>> urlparse("http://example.com/help?q=a")
ParseResult(scheme='http', netloc='example.com', path='/help', params='',
query='q=a', fragment='')
>>> from arcp.parse import urlparse
>>> urlparse("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/soup;sads")
ARCPParseResult(scheme='arcp',prefix='ni',
name='sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk',
ni='sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk',
hash=('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069',
path='/folder/soup;sads',query='',fragment='')
>>> urlparse("http://example.com/help?q=a")
ParseResult(scheme='http', netloc='example.com', path='/help', params='',
query='q=a', fragment='')