Archive

Posts Tagged ‘OAI-PMH’

OAI harvest and PORSCHE: Testing intraLibrary 3.5

December 15, 2011 1 comment

Quick recap: a requirement of PORSCHE was to transfer metadata between Jorum and the NHS eLearning repository:

The primary aim of the PORSCHE project is to provide “seamless access to academic and clinical learning resources to healthcare students”… “establishing the basis for a long term national partnership between the NHS and academia by sharing of appropriately licensed content between Jorum and the NeLR”. To achieve this, PORSCHE has been exploring metadata transfer between the two repositories using OAI-PMH. Jorum is currently running on a modified DSpace (v1.5.2) and the NeLR on intraLibrary (3.1), both of which will of course expose an OAI-PMH feed. However, each repository would also need to ingest the other’s feed, which is functionality not currently supported by either platform. It is supported by the NDLR, which runs a later version of DSpace (v1.6), and PORSCHE are currently liaising with Intrallect to develop similar functionality in intraLibrary.

https://acerep.wordpress.com/2011/06/21/oer-infrastructure-discussions-with-mimas-and-ndlr/

Along with the PORSCHE team I’ve recently been testing a pre-release intraLibrary 3.5 which now includes an OAI-PMH harvest facility so that the metadata from external repositories can appear as a “collection” in intraLibrary. In the context of our own ACErep project, for example, this would also allow us to harvest the ALPS collections from the repositories at the University of Leeds and York St John, thereby offering a form of federated searching as collections are all brought under the standard intraLibrary search mechanism.

harvest admin

Harvest admin allows multiple harvests to be set up, each with the relevant OAI URL; the user then selects the desired collection (OAI set) from the drop-down.

Having selected the OER collection from the Leeds Met repository (which corresponds to OAI set=23), I am able to choose the harvested metadata format (dc or lom), the type of harvest (All Records, Since Last Success, From Date) and the time to run, and to select the intraLibrary Group/Process and Collection:

Bingo!

…and then, crucially for ACErep, I’m able to search my harvested collection (called mrnick) by SRU:

http://harvest.intralibrary.com/IntraLibrary-SRU?recordSchema=lom&operation=searchRetrieve&version=1.1&maximumRecords=10&startRecord=1&query=rec.collectionIdentifier=mrnick
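As a rough sketch of how an SRU search URL like the one above might be assembled programmatically: the parameter names and values are taken from that URL, while the function name `build_sru_search_url` and its defaults are purely illustrative and not part of intraLibrary.

```python
from urllib.parse import urlencode


def build_sru_search_url(base_url, collection_id, record_schema="lom",
                         maximum_records=10, start_record=1):
    """Build an SRU 1.1 searchRetrieve URL restricted to one collection."""
    params = {
        "recordSchema": record_schema,
        "operation": "searchRetrieve",
        "version": "1.1",
        "maximumRecords": maximum_records,
        "startRecord": start_record,
        # Query on the collection identifier index used in the example URL
        "query": f"rec.collectionIdentifier={collection_id}",
    }
    # Leave "=" unescaped inside the query so the URL keeps the form shown in the post
    return f"{base_url}?{urlencode(params, safe='=')}"


url = build_sru_search_url(
    "http://harvest.intralibrary.com/IntraLibrary-SRU", "mrnick"
)
print(url)
```

Changing `collection_id` would, in principle, point the same search at a different harvested collection, which is the behaviour an ALPS branded search interface would rely on.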

Not sure yet when we’ll be able to get intraLibrary 3.5 but it should then be straightforward to harvest ALPS collections from the University of Leeds and York St John and search all ALPS resources across the three institutions from an ALPS branded interface.

In the context of PORSCHE, when the NHS is running 3.5 they will be able to harvest relevant collections from Jorum (e.g. HE – Medicine and Dentistry) into the NeLR.

<set>
<setSpec>hdl_123456789_39</setSpec>
<setName>HE – Medicine and Dentistry</setName>
</set>

http://dspace.jorum.ac.uk/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=hdl_123456789_39
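To illustrate how a harvester might get from a set name to a request like the one above, here is a minimal sketch. It looks up the setSpec in a trimmed ListSets response (the `<set>` entry is the one quoted above, wrapped in a standard OAI-PMH envelope) and then builds the ListRecords URL. The function names are hypothetical, and the optional `from_date` argument is only my assumption about how a "From Date" or "Since Last Success" harvest might translate to the protocol's `from` argument.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed ListSets response containing only the set quoted in the post
SAMPLE_LIST_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>hdl_123456789_39</setSpec>
      <setName>HE – Medicine and Dentistry</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def find_set_spec(list_sets_xml, set_name):
    """Return the setSpec whose setName matches exactly, or None."""
    root = ET.fromstring(list_sets_xml)
    for set_el in root.iterfind(".//oai:set", OAI_NS):
        if set_el.findtext("oai:setName", namespaces=OAI_NS) == set_name:
            return set_el.findtext("oai:setSpec", namespaces=OAI_NS)
    return None


def build_list_records_url(base_url, set_spec, metadata_prefix="oai_dc",
                           from_date=None):
    """Build a selective-harvest ListRecords URL for one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix,
              "set": set_spec}
    if from_date:
        # Optional lower date bound for incremental harvests (YYYY-MM-DD)
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


set_spec = find_set_spec(SAMPLE_LIST_SETS, "HE – Medicine and Dentistry")
harvest_url = build_list_records_url(
    "http://dspace.jorum.ac.uk/oai/request", set_spec
)
print(harvest_url)
```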

Work to achieve similar from Jorum is ongoing via liaison with PORSCHE – see also my post on the UKCoRR blog at http://ukcorr.blogspot.com/2011/12/jorum-steering-group.html and this recent post on the Jorum blog on Coming improvements to the Jorum UserExperience

ACErep update for the PORSCHE Executive Board meeting on Wednesday 10th August 2011

August 12, 2011 Leave a comment

This was the final meeting of the PORSCHE Executive Board – I attended by telephone and submitted this written update from ACErep:

After initial testing of the EasyDeposit SWORD client with the Jorum development server (see April 6th update) we have now configured the client with a SWORD endpoint for each of the several repository platforms we hope to deposit into. These include a demo install of DSpace – http://demo.dspace.org/ – the NDLR (in lieu of Jorum), the WRRO EPrints repository and our own intraLibrary repository.

We have been able to successfully deposit into http://demo.dspace.org/ and into the NDLR (also DSpace) but the client fails with EPrints with a generic error message. I have not yet established what the problem might be and queries have been raised with ‘sword-app-tech@lists.sourceforge.net’ and ‘eprints-tech@ecs.soton.ac.uk’.

SWORD has now also been implemented at York St John (Archivalware) and is currently being tested – I have not yet been able to test with EasyDeposit.

DSpace, EPrints and Archivalware should all accept METS packages, but in order to deposit into intraLibrary it is necessary to extend EasyDeposit to package as IMS. Negotiation is currently underway to carry out this work in partnership with Intrallect, hopefully in time for the ALPS Showcase on 16th September* (there will also still be the requirement to standardise Application Profiles across the repositories).

In terms of repository cross-search, it was suggested that we may be able to explore doing this via the DSpace (v1.6) API (the version run by the NDLR), though I have not yet seen any documentation. Our content has, however, now been successfully harvested into the NDLR (test install) via OAI-PMH.

* This work has now been scheduled for w/c 29th August

OER infrastructure – discussions with Mimas and NDLR

June 21, 2011 2 comments

As I posted recently over on repositorynews and as ukoer folk will still be aware, from 1st August, Jorum, historically overseen jointly by the national data-centres in Edinburgh (EDINA) and Manchester (Mimas), will be managed exclusively by Mimas and will liaise more closely with the NDLR in Ireland.

Yesterday I took part in a telephone conference organised by the PORSCHE project team with representatives of Mimas, the NDLR and their technical partners Enovation – this post is to summarise the discussion and describe the processes of data transfer required by PORSCHE and ACErep.

Just to emphasise at the outset that Mimas are extremely positive about moving forward with a community driven process for the development of Jorum – informed by the ukoer programme and projects like PORSCHE and ACErep – but it is still very early days and we should be careful not to raise expectations unrealistically. The service does not pass to Mimas until August, and they are still in the process of appointing a Jorum Technical Coordinator – a role that is obviously central to these discussions.

The primary aim of the PORSCHE project is to provide “seamless access to academic and clinical learning resources to healthcare students”… “establishing the basis for a long term national partnership between the NHS and academia by sharing of appropriately licensed content between Jorum and the NeLR”. To achieve this, PORSCHE has been exploring metadata transfer between the two repositories using OAI-PMH. Jorum is currently running on a modified DSpace (v1.5.2) and the NeLR on intraLibrary (3.1), both of which will of course expose an OAI-PMH feed. However, each repository would also need to ingest the other’s feed, which is functionality not currently supported by either platform. It is supported by the NDLR, which runs a later version of DSpace (v1.6), and PORSCHE are currently liaising with Intrallect to develop similar functionality in intraLibrary.

N.B. There may be an additional requirement for PORSCHE to transfer files as well as metadata (i.e. full content packages) between the two systems and it was suggested that OAI-ORE (Open Archives Initiative Object Reuse and Exchange) might meet this requirement – I don’t know enough about this standard to comment.  (In intraLibrary I think it is possible to use a custom metadata template to expose the URI for the full content package in the OAI-PMH feed – might this be a possibility for file transfer?)

In lieu of a Jorum software upgrade to match the specification of their Irish counterpart, the NDLR have agreed to test the intraLibrary OAI-PMH feed (both from the NeLR and the Leeds Met repository) into their test-server, which should be relatively straightforward (some metadata mapping notwithstanding). Moreover, DSpace (v1.6) includes an API that supports search and retrieve, and v1.8, due for release in October of this year, includes an improved (RESTful) API which may preclude the need for further development of the Jorum API – identified as a pre-requisite for both ACErep and PORSCHE.

In summary, a proof-of-concept metadata exchange will be developed between the NHS repository and the NDLR; future work by Jorum will hopefully mean that this concept can then be replicated:

OAI-PMH transfer between DSpace and intraLibrary

Just a note that the NDLR is already ingesting OAI-PMH from Jorum but I’ve spotted an issue in that the resource URL isn’t necessarily being included in the simple record, appearing only in the full record (unlinked) in a dc:identifier field – see https://dspace.ndlr.ie/jspui/simple-search?query=ukoer for examples including this one from Leeds Met https://dspace.ndlr.ie/jspui/handle/10633/29188

N.B. It looks to me like this has arisen in instances where there are multiple dc:identifier fields where the first isn’t a URL – ours often have an intraLibrary ID in the first field (see below). I don’t expect this will be a major issue to resolve?
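If the cause is as guessed above, one possible workaround on the ingesting side would be to scan all of a record's dc:identifier values and use the first that looks like a URL, rather than always taking the first value. The sketch below is only an illustration of that idea: the function name and the sample identifier values are made up, and I have no knowledge of how the NDLR ingest actually selects the link.

```python
from urllib.parse import urlsplit


def pick_resource_url(identifiers):
    """Return the first identifier that looks like an http(s) URL, else None."""
    for value in identifiers:
        candidate = value.strip()
        parts = urlsplit(candidate)
        # Require both a web scheme and a host so bare IDs are skipped
        if parts.scheme in ("http", "https") and parts.netloc:
            return candidate
    return None


# Hypothetical record: a repository-internal ID first, the resource URL second
sample_identifiers = ["intralibrary-id-0001", "http://example.org/resource/1"]
print(pick_resource_url(sample_identifiers))
```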

Unlinked Jorum handle in full metadata record

Project update

March 4, 2011 1 comment

One of the most challenging aspects of ACErep continues to be working across multiple institutions and organisations and recently progress has stalled somewhat as we liaise with our partners and wait for development work essential to the overall infrastructure.

In November last year we were able to build a prototype using Xpert which harvests OAI-PMH from our repository at Leeds Met and which we are then able to selectively search by keyword. We had initially hoped to work with Jorum but at that time there was no Open API, nor were there any plans to harvest by OAI-PMH, so we liaised instead with Pat Lockley of Xpert who was able to help with both requirements.

A lot has happened since November however, both at Xpert and Jorum. Pat is moving on to another (OER related) post, which raises questions for us around the sustainability of that service now it has lost its key developer. Meanwhile I have been working with the PORSCHE project at Newcastle University, which aims to provide seamless access to academic and clinical learning resources for healthcare students, primarily from the respective collections in Jorum and the NHS National eLearning Repository (NeLR). As such, PORSCHE has also requested that Jorum harvest by OAI-PMH and provide an open API; the project team includes the Jorum service manager Hiten Vaghmaria as well as Kate Lomax of the NeLR (which, like Leeds Met’s repository, is also based on intraLibrary). Jorum have now released a first iteration of an open API and hopefully both projects, and the UKOER community at large, can now work with the national OER service to develop that API to meet our requirements – PORSCHE’s Suzanne Hardy has set up the Jorum API discussion space (wiki) at http://jorumapi.pbworks.com/w/page/35601929/Jorum-API-discussion-space. I am, in fact, yet to contribute to the discussion myself, partly due to a lack of knowledge around API development – I’m far from clear, for example, how we may most effectively use an API to return just a subset of the resources in Jorum (i.e. ALPS/medical resources). In the Xpert prototype we just filtered on keyword but this feels a little unsatisfactory and Pat suggested a custom URL search would be better…

The other pieces of the jigsaw are repository-shaped and variously lacking the dove-tailing standards required to make the model workable. At Leeds University the learning and teaching repository is ExLibris’ DigiTool while York St John have a system called ArchivalWare; both systems are OAI-PMH compliant but do not support SWORD. Development work is underway to rectify this at York St John, while Jodie has set up an out-of-the-box test install of EPrints to investigate OER object management.

Mike has also made some progress with EasyDeposit, which is now working on a Leeds Met server and, after a few teething problems, will now successfully deposit a METS package to Jorum (DSpace). We will still need to test with EPrints (METS should be OK I think) and with ArchivalWare when SWORD is integrated with that platform…probably METS again. There is an issue with intraLibrary, however, in that it only accepts IMSCP by SWORD, not METS, so we will also need to write an IMS content packager for EasyDeposit.

We are very grateful to Tamsin Treasure-Jones of ALPS for her continued support with this challenging project, and I’m hopeful that all of these pieces can be put together over the Spring and Summer as we move towards a local, regional and national infrastructure for sharing medical teaching, learning and assessment material. This approach will have the benefit of digital assets being preserved in one location (an institutional repository) while providing several points of access, as well as allowing the ALPS branded web-site and the institutional repositories to “piggyback” on Jorum’s Google pagerank, improving discoverability.

Working search prototype and a SWORD fight

November 23, 2010 3 comments

Since the last meeting, Mike has done a great job in putting together a working (search) prototype that uses the Xpert API to search http://www.nottingham.ac.uk/xpert/ for just those resources tagged “alpsportal” i.e. ALPS resources that I have added to our repository and that have been harvested by Xpert. Currently Xpert is only harvesting Leeds Met but it should now be fairly straightforward to also harvest YSJ and Leeds such that any resources appropriately tagged will be returned from the portal. It’s on a restricted test server at the moment so this is what it looks like (with a few explanatory slides):

And the SWORD fight? There are 3 repositories (possibly 4 if we count JORUM) that we need to be able to deposit into – this is a problem as currently only one of them – intraLibrary at Leeds Met – has fully functioning SWORD.  This did lead us to consider a hybrid deposit process allowing for manual package forwarding but given limited technical resources (this approach would have its own, not inconsiderable, overheads) and the clear advantages of utilising SWORD, it probably makes sense to focus on a prototype that can utilise the protocol and that can, hopefully, plug into the other repositories in the future.

One of the issues we face is that intraLibrary is based on IEEE LOM and only accepts IMS Content Packages by SWORD whereas the majority of (open source) repositories are based on Dublin Core metadata and only accept METS by SWORD; I have been working with Jorum to test their SWORD deposit – as a customised DSpace repository it accepts METS (not IMS) so we have had to map IMSCP for one of our resources -> METS in order for it to be accepted.  Currently I have only been able to achieve this with a very simple package comprising just title, description and author (I’ve also tried to add keyword and rights but these don’t seem to be picked up by Jorum – I’m not sure if this is a problem with Jorum or with my XML – probably the latter!)

It is still a moot point whether we will need METS or IMSCP for YSJ/Leeds until various questions around their specific repository implementations have been answered but we probably need to progress on the basis that we should support both. We will need to get Mike involved on the technical side and I’m hoping that he can start developing a web-based SWORD deposit client that, eventually, will be able to post to any SWORD service URL whether at Leeds Met, Leeds, YSJ or Jorum; as the only viable target repository at this stage is Leeds Met we will want to package as IMSCP in the first instance, ultimately with a view to also packaging as METS depending on the requirements of the destination repository.

MT – I may be asking for the moon on a stick here but I know you’ll put me straight and tell me what we can actually achieve and in what time-scale…just to summarise, below, what (I think) I understand about SWORD so far and what I anticipate might be required of this type of implementation:

SWORD works by means of a service URL that a package is posted to via the ATOM publishing protocol – this service URL allows a “service document” to be retrieved from the target repository which itemises available collections for SWORD deposit. The desktop client I have been using with our repository and with Jorum is available from Sourceforge – http://sourceforge.net/projects/sword-app/.  It enables me to post a .zip containing a resource + its metadata in an imsmanifest.xml to intraLibrary (or resource + mets.xml to Jorum)
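To make the service document idea a little more concrete, here is a minimal sketch that parses an AtomPub-style service document and lists the collections it advertises. The sample document is invented for illustration (the repository URL, titles and accepted type are not from intraLibrary or Jorum), and a real SWORD service document will normally carry additional SWORD-specific elements that are ignored here.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
}

# Invented example of a service document with a single collection
SAMPLE_SERVICE_DOC = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Example repository</atom:title>
    <collection href="http://repository.example.org/sword/deposit/collection-1">
      <atom:title>Example collection</atom:title>
      <accept>application/zip</accept>
    </collection>
  </workspace>
</service>"""


def list_collections(service_doc_xml):
    """Return title, deposit URL and accepted types for each collection."""
    root = ET.fromstring(service_doc_xml)
    collections = []
    for col in root.iterfind(".//app:collection", NS):
        collections.append({
            "title": col.findtext("atom:title", default="", namespaces=NS),
            "href": col.get("href"),
            "accepts": [a.text for a in col.iterfind("app:accept", NS)],
        })
    return collections


collections = list_collections(SAMPLE_SERVICE_DOC)
for col in collections:
    print(col["title"], col["href"], col["accepts"])
```

The `href` of the chosen collection is the address a client would then post the package to.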

Screenshots below:

Authenticate to repository / retrieve service doc (note SWORD service URL)

Service doc successfully retrieved – choose collection to deposit into:

Confirm collection details for post operation:

Browse to zip on hard drive (already containing imsmanifest.xml) and click button to post to repository:

Resource and full metadata record successfully posted to repository:

I imagine we would need a web-form that allows similar metadata to be captured and transformed into an imsmanifest.xml, then allows file upload and zips resource and manifest into an IMSCP that is then posted to a specified collection in the service doc at http://repository-intralibrary.leedsmet.ac.uk/IntraLibrary-Deposit/service.

Erm, sounds a bit tricky to me, Mike?
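For what it's worth, the packaging half of the web-form idea described above can be sketched quite briefly. The example below builds a deliberately skeletal imsmanifest.xml and zips it together with an uploaded file, entirely in memory. It is an assumption-laden outline rather than a working packager: the function names and identifiers are hypothetical, the LOM metadata block that the form would populate is left out altogether, and the manifest has not been checked against what intraLibrary actually requires. The SWORD post itself is also omitted.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

IMSCP_NS = "http://www.imsglobal.org/xsd/imscp_v1p1"


def build_manifest(identifier, resource_filename):
    """Return a skeletal IMS Content Package manifest as UTF-8 bytes."""
    ET.register_namespace("", IMSCP_NS)  # serialise with a default namespace
    manifest = ET.Element(f"{{{IMSCP_NS}}}manifest", {"identifier": identifier})
    # A real packager would add a <metadata> element with the LOM record here
    ET.SubElement(manifest, f"{{{IMSCP_NS}}}organizations")
    resources = ET.SubElement(manifest, f"{{{IMSCP_NS}}}resources")
    resource = ET.SubElement(
        resources, f"{{{IMSCP_NS}}}resource",
        {"identifier": "RES-1", "type": "webcontent", "href": resource_filename},
    )
    ET.SubElement(resource, f"{{{IMSCP_NS}}}file", {"href": resource_filename})
    return ET.tostring(manifest, encoding="utf-8")


def build_package(resource_filename, resource_bytes, identifier="MANIFEST-1"):
    """Zip the manifest and the resource into one package, in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("imsmanifest.xml", build_manifest(identifier, resource_filename))
        zf.writestr(resource_filename, resource_bytes)
    return buffer.getvalue()


package = build_package("example.txt", b"Example resource content")
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    print(zf.namelist())
```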

It might also be worth looking again at Stuart Lewis’ EasyDeposit (blogged about at http://repositorynews.wordpress.com/2010/06/04/easydeposit-the-sword-client-creation-toolkit/) which packages as METS for DSpace/EPrints etc.

A potential infrastructure (2)

June 24, 2010 1 comment

Thanks to Gareth Waller from Jorum who has been in touch to say that there are currently no public APIs available for JorumOpen and no immediate plans to release one.  The only public interfaces at present are:

  • An OAI-PMH target for harvesting metadata records i.e. pulling metadata from JO
  • A SWORD target for depositing METS packages, i.e. pushing content into JO

There is an SRU add on for DSpace but, as Gareth has indicated in his comment on the previous post, it is not enabled due to a bug; it may be enabled in the future if the bug can be fixed.

Rather than SRU, a better way to search JorumOpen will be the new search tool to be launched at the end of the month that will provide a web application to search records from both JorumOpen (DSpace) and JorumUK (intraLibrary) and also provide features such as RSS feeds on search results (I presume this is the tool based on OAI-PMH?)

Wherever possible, Jorum are keen to include the actual resource rather than just metadata record/link to resource in JorumOpen – to allow other users to search on the content of that resource e.g. text in PDF  – which is the approach we have taken with the Unicycle project via bulk upload of IMS.  Whether it will always be appropriate to duplicate resources in this way, however, is still something of a moot point, especially in the context of medical resources where an audit trail/version control might be more of a priority than for Unicycle as flagged up by the MEDEV OOER project.

With this in mind, Gareth has suggested that a better way to integrate with Jorum would be to deposit directly into JorumOpen via SWORD; we could then use the new search tool for our bespoke portal:

Gareth's suggestion for a modified infrastructure