The following improvements have been made to Ricgraph.
Ricgraph explorer:
- If you have harvested from more than one source, a record harvested from _system 2_ (say, _ORCID 1234_ for Alice) may already be in Ricgraph because it was also present in _system 1_. This is expected behaviour. The _ORCID_ record will not be inserted twice; the only modification is that _system 2_ is added to the __source_ list of Alice's _ORCID_ record. In this release, Ricgraph explorer has an option to create tables that show the overlap in harvests from different source systems. You run a query in Ricgraph (e.g. show all _ORCID_ nodes), and then choose to show a table that summarizes how many _ORCID_ nodes were found in only one source and how many were found in multiple sources. A second table gives a detailed overview of how many nodes originate from which source systems. You can then drill down by clicking a number in either table to find the nodes corresponding to that number. For Alice's example _ORCID_ node, the first table will tell you that 1 node was found in multiple sources, and the second table will show a "1" in the row and column representing _system 1_ and _system 2_. Another use of these tables: find Alice's node, and the tables will show which of the nodes connected to Alice (e.g. her journal articles or software packages) are unique to one source system and which originate from multiple sources.
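
The two overlap tables can be illustrated with a small sketch. This is not Ricgraph explorer's actual code: the node dictionaries, the function names, and the sample data are hypothetical, and the choice to count single-source nodes on the diagonal of the second table is an assumption made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical query result: every node carries the list of source systems
# it was harvested from (the _source list described above).
nodes = [
    {"name": "ORCID", "value": "1234", "_source": ["system 1", "system 2"]},
    {"name": "ORCID", "value": "5678", "_source": ["system 1"]},
    {"name": "ORCID", "value": "9012", "_source": ["system 2"]},
]


def summarize_overlap(nodes):
    """First table: count nodes found in one source vs. multiple sources."""
    summary = Counter()
    for node in nodes:
        sources = set(node["_source"])
        if not sources:
            continue  # skip nodes without any recorded source
        key = "multiple sources" if len(sources) > 1 else "one source"
        summary[key] += 1
    return summary


def overlap_matrix(nodes):
    """Second table: count nodes per (row source, column source) pair.

    Assumption for this sketch: a node found in only one source is counted
    in the diagonal cell (source, source).
    """
    matrix = Counter()
    for node in nodes:
        sources = sorted(set(node["_source"]))
        if not sources:
            continue
        if len(sources) == 1:
            matrix[(sources[0], sources[0])] += 1
        else:
            for a, b in combinations(sources, 2):
                matrix[(a, b)] += 1
                matrix[(b, a)] += 1  # keep the table symmetric
    return matrix


summary = summarize_overlap(nodes)
matrix = overlap_matrix(nodes)
```

With the sample data, Alice's _ORCID 1234_ is the single node counted under "multiple sources", and it is the node behind the count in the cell for _system 1_ and _system 2_.
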
Ricgraph:
- Global research output type names are now defined in _ricgraph.py_, such as _journal article_ or _software_. Each harvest script has a mapping table that translates the name used in that source (e.g. _article_) to the name used in Ricgraph (_journal article_). The advantage is that all research output type names are defined in one place, and that each type shows up in Ricgraph under exactly one name.
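
A mapping table of this kind might look like the sketch below. The constant names, the source-specific keys other than _article_, and the fallback to an "other" type are assumptions for illustration; they are not taken from _ricgraph.py_.

```python
# Hypothetical global research output type names (in Ricgraph these are
# defined in ricgraph.py; the constant names here are invented).
TYPE_JOURNAL_ARTICLE = "journal article"
TYPE_SOFTWARE = "software"
TYPE_OTHER = "other"  # assumed fallback for unmapped types

# Hypothetical mapping table for one source system: the name used in that
# source on the left, the name used in Ricgraph on the right.
EXAMPLE_SOURCE_MAPPING = {
    "article": TYPE_JOURNAL_ARTICLE,
    "computer program": TYPE_SOFTWARE,
}


def to_ricgraph_type(source_type, mapping):
    """Translate a source-specific type name to the global name."""
    return mapping.get(source_type.strip().lower(), TYPE_OTHER)
```

A harvest script would call something like `to_ricgraph_type("Article", EXAMPLE_SOURCE_MAPPING)` for each harvested record, so that only the global names end up in the graph.
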
Harvest scripts:
- Modifications for the mapping table for research output type names.
- For organizations, their names are now used for the _value_ field, so you can search for an organization name. Previously, Ricgraph used identifiers, such as local Pure UUIDs or RORs. Although the latter might be preferred, I changed this because there do not yet seem to be generally used organization identifiers for sub organizations (e.g. faculties or departments of a university).
- Pure harvesting: suppose Alice works for University X, Faculty Y, and Department Z. Previously this hierarchy was represented in Ricgraph. Now, each of the (sub) organizations a person works for is directly connected to the _person-root_ node of this person. In the example with Alice, the node for University X is connected to the _person-root_ node of Alice, as are the nodes for Faculty Y and Department Z. This has the advantage that you can select e.g. University X and find the persons working for this university, or select e.g. Faculty Y and find the persons involved with that faculty, etc.
- Pure harvesting: if a person works for multiple (sub) organizations, this person will be connected to all of these (sub) organizations. Previously a person was only connected to one (sub) organization.
- The batch harvesting script _batch_harvest.py_ now has some error checking. Also, you can have this script modify your graph specifically for your organization. For example, for Utrecht University, the Pure harvest yields the organization name _University: Universiteit Utrecht_, while the same organization is called _Utrecht University_ in OpenAlex. The batch script changes the name from the former to the latter, so that records harvested from OpenAlex are mapped to the same organization as records harvested from Pure. This avoids duplicate organization nodes and keeps the graph more concise.
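
The organization rename in the last item can be sketched as follows. This is not the code in _batch_harvest.py_: the function, the dictionary-based node representation, and the node name `ORGANIZATION_NAME` are assumptions for this example. Only the two organization names come from the text above.

```python
# Rename table: name as harvested from Pure -> name as used in OpenAlex.
ORGANIZATION_RENAMES = {
    "University: Universiteit Utrecht": "Utrecht University",
}


def normalize_organization_names(nodes, renames):
    """Return copies of the nodes with organization values renamed.

    Assumption: organization nodes are marked with the hypothetical
    node name "ORGANIZATION_NAME"; other nodes are passed through as-is.
    """
    result = []
    for node in nodes:
        if node.get("name") == "ORGANIZATION_NAME" and node.get("value") in renames:
            node = {**node, "value": renames[node["value"]]}
        result.append(node)
    return result


# Hypothetical harvested nodes from two sources.
harvested = [
    {"name": "ORGANIZATION_NAME", "value": "University: Universiteit Utrecht"},
    {"name": "ORGANIZATION_NAME", "value": "Utrecht University"},
    {"name": "ORCID", "value": "1234"},
]
normalized = normalize_organization_names(harvested, ORGANIZATION_RENAMES)
```

After normalization both organization entries carry the value _Utrecht University_, so they would resolve to the same node instead of two.
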