
Using Jorum (and/or Xpert?) for ACErep

Thanks to Gareth and Hiten at EDINA, who spared the time to speak to Peter and me yesterday and answer our questions about Jorum, helping us work out how we might integrate it with ACErep.

In a nutshell, we wanted to know whether we could deposit into Jorum via SWORD (in addition to our respective institutional repositories), with a view to using the national repository to search across all ALPS resources from the three partner institutions. Ideally we would run a search from our own ALPS portal and display/format the results in our own environment. However, it seems there is no “open” search facility that can query Jorum and return data in a format we could process ourselves (i.e. XML). While we may be able to launch a search from the portal, our only option would then be to “jump off” to the results in Jorum itself.

Given our limited resources, this may be the route we take, but we will need to check with our stakeholders first that it is acceptable to them. It’s obviously not ideal, and the additional functionality we could add to the portal would be limited in this scenario (e.g. comments/discussion on ALPS resources to bridge the theory/practice gap in health education, as suggested by our stakeholders).

The approach that would give us the greatest flexibility, of course, would be to harvest, index and search our three repositories ourselves, and Peter will do some research into whether this is worth pursuing. Realistically, however, we may lack the resources (mostly time!) for this to be viable, and there is another service that may be worth looking at first:
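If we did harvest the three repositories ourselves, the core loop would look something like the minimal OAI-PMH sketch below. It assumes each repository exposes a standard OAI-PMH endpoint serving `oai_dc`; the base URL and the injectable `fetch` function are illustrative only, and a real harvester would also need error handling, incremental (`from=`) harvesting and an index to search.

```python
"""Minimal OAI-PMH ListRecords harvester (sketch).

Assumption: each institutional repository exposes a standard OAI-PMH
endpoint; any base URL passed in is illustrative.
"""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.find(OAI + "header").findtext(OAI + "identifier")
        titles = [t.text for t in rec.iter(DC + "title") if t.text]
        records.append({"identifier": identifier, "titles": titles})
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


def _http_fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url, fetch=_http_fetch):
    """Follow resumption tokens until the repository has no more records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    all_records = []
    while True:
        records, token = parse_list_records(
            fetch(base_url + "?" + urllib.parse.urlencode(params)))
        all_records.extend(records)
        if not token:
            return all_records
        # Per OAI-PMH, a resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```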

Xpert, as I blogged recently, is a service at the University of Nottingham that harvests RSS and OAI-PMH feeds from learning object/OER repositories, including Leeds Met (and, as of last week, Jorum itself), to create a “distributed e-learning repository”.

So…our question for @xpert_project is: are we able to query Xpert with an appropriate level of sophistication (tbc!) and return XML that we can process, format and display ourselves?

Peter and I had a look at the APIs recently released as Xpert Labs, which include base URLs that return a variety of data formats, including XML. I don’t know enough about querying databases and data transfer to say whether this in itself is sufficient to solve our problem, and Peter suggested that the XML returned by this service is called up by the user’s browser rather than being in a format we could process further (?), but we would be very interested to speak with Xpert to see if there is any mileage in these ideas. If we were able to use Xpert in some way, a further caveat is that there would be a delay between deposit and discovery to allow for harvesting. I think Xpert harvests every night, so we would be looking at an overnight delay; is this acceptable? Any harvesting solution would, of course, also involve a delay, whereas SWORD deposit to Jorum should mean resources are discoverable immediately.
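To make the question for Xpert concrete: what we want is for our own server-side code, not the user’s browser, to fetch the XML and turn it into portal HTML. The sketch below is illustrative only; the search URL, the `q` parameter and the RSS-like `<item>`/`<title>`/`<link>`/`<description>` layout are hypothetical placeholders, not taken from the Xpert Labs documentation.

```python
"""Sketch: query an XML search API and render results in our own portal.

ASSUMPTIONS: XPERT_SEARCH_URL, the "q" parameter and the <item> result
layout are hypothetical placeholders, not the documented Xpert Labs API.
"""
import html
import urllib.parse
import xml.etree.ElementTree as ET

XPERT_SEARCH_URL = "http://example.org/xpert/search.xml"  # hypothetical


def build_query_url(terms, base=XPERT_SEARCH_URL):
    """Encode the user's search terms into the (assumed) query parameter."""
    return base + "?" + urllib.parse.urlencode({"q": terms})


def parse_results(xml_text):
    """Turn each <item> into a plain dict our portal can work with."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def render_html(results):
    """Format results ourselves, escaping everything taken from the feed."""
    rows = [
        '<li><a href="{}">{}</a><p>{}</p></li>'.format(
            html.escape(r["link"], quote=True),
            html.escape(r["title"]),
            html.escape(r["description"]),
        )
        for r in results
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Because the parsed results are ordinary data on our side, features such as comments/discussion could be attached to each record before rendering.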

In the short term, Peter and I intend to pursue both the Jorum and Xpert routes. So far I’ve only tried to sketch out the broad picture; as always, the devil is in the detail, and in this case that devil, in one way or another, is likely to reside in metadata Hell…or Hades at least.

We have had only the briefest discussion about an Application Profile (AP) for ALPS, but it is probably desirable to adopt a lightweight AP based on UKOER (see previous post), with the only additional requirement being that resources are presented using a bespoke taxonomy to accommodate “specific learning/assessment outcomes” (tbc).

While it should not be too difficult to map between our disparate systems’ metadata standards to arrive at an AP based on UKOER (Title, Description, Keyword, Classification, Contributor etc.), there is a potential issue: Jorum classifies by JACS, and Gareth could not say, without some experimentation, whether non-JACS classification data is indexed and hence searchable. We intend to submit some test METS packages to Jorum by SWORD to find out.

There may also be issues around managing Content Packages, especially if we want to deposit them by SWORD (but also have them harvested by Xpert). Our original intention was to deposit everything into Jorum via SWORD (which would require authentication with a UKFed user account; the best option is probably to set up a dedicated ALPS account with a UKFed institutional email?), and this might still be the best option. Even if we do go down the Xpert route, would we want resources to be harvested from the institutional repositories or from Jorum? Jorum only accepts METS by SWORD, so IMS Content Packages (IMSCP) could not be deposited via the standard; we would need some sort of contingency process whereby Content Packages are transferred to Jorum by an alternative mechanism, which would also introduce a (further) delay. Such a process is already in place for Unicycle resources.
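As a rough sketch of the SWORD step, assuming SWORD 1.3 with HTTP Basic authentication and a hypothetical Jorum collection URI: a DSpace METS package is a zip with `mets.xml` at its root, so a simple check can send those via SWORD and set aside IMS Content Packages (identified here by `imsmanifest.xml`) for the contingency route. How UKFed credentials map onto the SWORD request is an open question for Jorum.

```python
"""Sketch: build a SWORD (v1.3) deposit of a METS package, routing IMS
Content Packages to a manual contingency instead.

ASSUMPTIONS: JORUM_COLLECTION_URI and the use of HTTP Basic auth are
hypothetical; the X-Packaging value is the SWORD 1.3 identifier for a
DSpace METS SIP.
"""
import base64
import io
import urllib.request
import zipfile

JORUM_COLLECTION_URI = "https://example.org/sword/deposit/123456789/1"  # hypothetical
METS_DSPACE_SIP = "http://purl.org/net/sword-types/METSDSpaceSIP"


def package_type(zip_bytes):
    """Classify a zip by its root manifest file."""
    names = set(zipfile.ZipFile(io.BytesIO(zip_bytes)).namelist())
    if "mets.xml" in names:
        return "METS"
    if "imsmanifest.xml" in names:
        return "IMSCP"
    return "UNKNOWN"


def build_deposit(zip_bytes, filename, username, password,
                  collection_uri=JORUM_COLLECTION_URI):
    """Return a ready-to-send Request, or None if the package needs the
    contingency route (Jorum only accepts METS by SWORD)."""
    if package_type(zip_bytes) != "METS":
        return None
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        collection_uri,
        data=zip_bytes,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            "X-Packaging": METS_DSPACE_SIP,
            "Content-Disposition": f"filename={filename}",
            "Authorization": f"Basic {token}",
        },
    )
```

Sending it would then be `urllib.request.urlopen(req)`; a successful SWORD 1.3 deposit should come back as `201 Created` with an Atom entry describing the new item.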

N.B. In practice, I suspect we are unlikely to get large numbers of IMSCPs, but the contingency needs considering nevertheless!

At this stage, of course, it is still a moot point whether we can query Xpert at all and return data in an appropriate format, but we would also want to be sure that we could query a bespoke taxonomy… There may also be issues with harvesting Content Packages (see the discussion on this post).

(Attempting) to summarise:

Option 1: Submit to Jorum by SWORD; search Jorum from the portal but jump off to Jorum itself for results. In addition, submit to one of the three institutional repositories (depending on user affiliation).

Pro: Integration with national OER infrastructure / deposited resources available immediately / (relatively) low developmental overheads

Con: User taken out of portal to results in Jorum / limits development of additional functionality

Option 2: Submit to one of three institutional repositories (depending on user affiliation); harvest/index/search the metadata ourselves.

Pro: Maximum control

Con: Resources / timescale / delay associated with harvest / developmental overheads are potentially prohibitive

Option 3: Submit to one of three institutional repositories (depending on user affiliation); ensure all three repositories are harvested by Xpert – utilise API to search Xpert and return XML that we can display/format ourselves.

Pro: (relatively) low developmental overheads (compared to option 2)

Con: Unknown issues (is it feasible?) / delay associated with harvest / does Xpert have resources to help us in ACErep project timescale?

Option 4: Submit to Jorum by SWORD AND one of three institutional repositories (depending on user affiliation); records harvested from Jorum by Xpert – utilise API to search Xpert and return XML that we can display/format ourselves.

Pros/cons: As above (Options 1 & 3)

  1. Pat
    October 12, 2010 at 12:01 pm

    Hello,

    Xpert Labs also has a file you can download to reformat XML or JSON: http://www.nottingham.ac.uk/xpert/labs/xpertapi.zip

    Within it is HTML you can use effectively as a search engine, so you could set up a Leeds Met search portal using that code if you wanted to.

    I’d also be interested in looking at alternative ways of harvesting – for example – posting XML at Xpert, or tying completed SWORD uploads into Xpert. Can you code either of these?

    Thanks for the blog

    Pat

  2. Nick
    October 12, 2010 at 12:36 pm

    Much of this will need input from my colleague Mike. He’s been head down on another project, but I’m hoping he will be able to cast his eye over this soon, as I am now officially 43% beyond my technical ken. I’m sure he could look at posting XML/tying completed SWORD uploads into Xpert…I’ll ask him.

  3. Pat
    October 12, 2010 at 12:41 pm

    OK, I could make a page you could paste XML into? Would you be able to use that?

    • Nick
      October 12, 2010 at 12:48 pm

      So I’d just cut and paste XML from, say, an SRU query into the page?

      • Pat
        October 12, 2010 at 12:58 pm

        Yep, I’d make a page where you could request a key, and an importer page more secure than the harvester. But yes, you could either paste it in or upload a downloaded file.

  4. Nick
    October 12, 2010 at 1:01 pm

    Might that also be quicker than harvesting?

    • Pat
      October 12, 2010 at 1:03 pm

      It’d be roughly the same code to process, but obviously we’d not have to come to you to download the XML.

      But you’d be in the database in under a minute or so.

  5. Gareth
    October 12, 2010 at 3:27 pm

    Hi Nick,

    Thanks for the post – it is a good summary of our meeting yesterday.

    The Jorum repository may be able to provide an alternative that could be of use to you. I didn’t want to raise this yesterday as it isn’t *exactly* what you want (i.e. it isn’t an API call), but it would allow you to obtain an XML document listing the items which matched a search in the repository.

    As you know, JorumOpen uses a modified version of DSpace, and in particular we use the Manakin (XML) interface to DSpace as opposed to the JSP interface. The upside of this for you is that, for any rendered page returned to a user’s browser, it is possible to obtain an XML document which is used internally to “build” the content you see in the browser.

    In DSpace terms, what I am describing is the “DRI” (Digital Repository Interface):

    http://dspace.org/1_5_2Documentation/ch13.html

    As always, the best way to describe how this might be useful is via an example:

    Use the following link to obtain an XML document showing the first page of DSpace handles that match the search term “leeds”:

    http://open.jorum.ac.uk/xmlui/search?query=leeds&rpp=10&sort_by=0&order=DESC&XML

    The important bit to notice is the inclusion of “XML” in the query string portion of the URL i.e. the bit after the “?”.

    Including “XML” in the query string instructs DSpace (or, more precisely, Cocoon) to return the DRI document to the browser.
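The URL pattern above can be generated for any search term. This sketch reproduces Gareth’s example URL exactly; the `page` parameter for later result pages is our assumption about the XMLUI and should be checked against JorumOpen.

```python
import urllib.parse

JORUM_SEARCH = "http://open.jorum.ac.uk/xmlui/search"


def dri_search_url(query, rpp=10, page=1, base=JORUM_SEARCH):
    """Build a JorumOpen search URL that returns the DRI document."""
    params = {"query": query, "rpp": rpp, "sort_by": 0, "order": "DESC"}
    if page > 1:
        params["page"] = page  # assumption: the XMLUI pages results via "page"
    # The bare "XML" flag asks Cocoon for the DRI document, not the HTML page.
    return base + "?" + urllib.parse.urlencode(params) + "&XML"
```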

    As you will see from the XML, there are a number of “reference” elements which point to “mets.xml” files.

    Those references point to XML documents containing the metadata for each search result, e.g.:

    http://open.jorum.ac.uk/xmlui/metadata/handle/123456789/1496/mets.xml

    Requesting the above returns the “DIM” (DSpace Intermediate Metadata) file for the item. This is an internal document representing the item’s metadata; as you can see, for JorumOpen it contains qualified Dublin Core metadata.
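Once a `mets.xml` has been fetched, the qualified Dublin Core can be read out of its embedded DIM record. This would also be a quick way to see whether non-JACS classification values survive deposit. The sketch assumes DIM `field` elements in the `http://www.dspace.org/xmlns/dspace/dim` namespace with `mdschema`/`element`/`qualifier` attributes; verify this against a real JorumOpen response.

```python
"""Sketch: pull qualified Dublin Core out of a JorumOpen item's mets.xml.

ASSUMPTION: the DIM record is embedded as <dim:dim> with
<dim:field mdschema="dc" element=".." qualifier=".."> children.
"""
import xml.etree.ElementTree as ET

DIM = "{http://www.dspace.org/xmlns/dspace/dim}"


def dim_fields(mets_xml):
    """Map 'schema.element[.qualifier]' to the list of values found."""
    root = ET.fromstring(mets_xml)
    fields = {}
    for field in root.iter(DIM + "field"):
        key = f"{field.get('mdschema')}.{field.get('element')}"
        if field.get("qualifier"):
            key += "." + field.get("qualifier")
        fields.setdefault(key, []).append((field.text or "").strip())
    return fields
```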

    In summary, it is possible to obtain XML from a search of JorumOpen directly via DSpace but it would be a multi-stage process for you to obtain all the information you need:

    1. Request the DRI for the first page of search results
    2. This will show the number of search results, from which you can work out how many pages of results there are
    3. For each search result on that page, request the DIM
    4. Repeat for all pages in the search results
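The four steps could be strung together roughly as follows. This is a sketch under assumptions Gareth did not confirm: that the results `div` carries an `itemsTotal` attribute, that each hit is a DRI `reference` element whose `url` ends in `mets.xml` and is relative to `/xmlui/`, and that the DRI namespace is `http://di.tamu.edu/DRI/1.0/`. The caller supplies `search_url` (page number to DRI URL) and `fetch` (URL to response text).

```python
"""Sketch: page through JorumOpen DRI search results and fetch each item's
mets.xml (the four-step process).

ASSUMPTIONS: DRI namespace, the "itemsTotal" attribute and reference URLs
relative to /xmlui/ are unverified guesses about JorumOpen's responses.
"""
import urllib.parse
import xml.etree.ElementTree as ET

DRI = "{http://di.tamu.edu/DRI/1.0/}"
BASE = "http://open.jorum.ac.uk/xmlui/"


def parse_dri_page(dri_xml):
    """Return (items_total, [absolute mets.xml URLs]) for one DRI page."""
    root = ET.fromstring(dri_xml)
    items_total = 0
    for div in root.iter(DRI + "div"):
        if div.get("itemsTotal"):
            items_total = int(div.get("itemsTotal"))
            break
    urls = [
        urllib.parse.urljoin(BASE, ref.get("url").lstrip("/"))
        for ref in root.iter(DRI + "reference")
        if ref.get("url", "").endswith("mets.xml")
    ]
    return items_total, urls


def collect_items(search_url, fetch, rpp=10):
    """Steps 1-4: first page, page count, every page, then each item's DIM."""
    items_total, urls = parse_dri_page(fetch(search_url(1)))
    pages_total = max(1, -(-items_total // rpp))  # ceiling division
    for page in range(2, pages_total + 1):
        urls.extend(parse_dri_page(fetch(search_url(page)))[1])
    return {url: fetch(url) for url in urls}
```

Fetching every item’s DIM per search is a lot of requests, so in practice the results would probably need caching on our side.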

    Hope that helps,

    If you have any questions, let me know.

    Gareth
    Jorum Technical Manager

    • Nick
      October 13, 2010 at 8:01 am

      Thanks Gareth

      Yes, I think that could be really useful. I’ll run all these ideas past our web developer Mike Taylor to see what he thinks.

      Thanks again for your help.

