Technical themes from OR2010
July 20, 2010
Open Repositories 2010 wrapped up on 9th July. This post summarises some of the main technical themes from the conference.
Search and Discovery
The single main technical theme that struck me at the conference was search and discovery. This seemed to be driven partly by dissatisfaction with the search functionality of existing software and partly by a recognition, among those developing bespoke systems, of how important this aspect of a repository is.
Almost universally, the speakers highlighting search and discovery are building on Apache Solr as a platform for faceted search. Of particular interest in this area is the Blacklight project, which can provide an interface on top of any Solr index. Tom Cramer and Matt Zumwalt described this tool and its use with a Fedora repository, but it opens up the possibility of separating the search/browse interface from the administration interface for any repository that can generate a Solr index.
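To make the faceted search idea a little more concrete, here is a minimal sketch in Python of what a faceted query against a Solr index might look like. It only builds the request URL and tidies up the facet counts from a response; it does not talk to a server. The core URL, the `author` and `year` field names and the two helper functions are my own assumptions for illustration, not anything shown at the conference.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields, rows=10):
    """Return a Solr select URL that asks for facet counts on the given fields."""
    params = [
        ("q", query),
        ("rows", rows),
        ("wt", "json"),      # ask Solr for a JSON response
        ("facet", "true"),   # switch faceting on
    ]
    # One facet.field parameter per field to be faceted.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


def parse_facet_counts(response):
    """Turn Solr's flat [value, count, value, count, ...] lists into dicts."""
    fields = response["facet_counts"]["facet_fields"]
    return {
        name: dict(zip(flat[0::2], flat[1::2]))
        for name, flat in fields.items()
    }


# Hypothetical core URL and field names, for illustration only.
url = build_facet_query(
    "http://localhost:8983/solr/repository", "open access", ["author", "year"]
)
print(url)

# Hand-written sample in the shape of Solr's default JSON facet output.
sample = {"facet_counts": {"facet_fields": {"year": ["2010", 12, "2009", 7]}}}
print(parse_facet_counts(sample))
```

Because the interface only needs the index, something like Blacklight can sit on top of queries of this kind without knowing which repository produced the documents.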
Author Identification and Disambiguation
Another recurrent theme: the absence of a clear international standard for author identification is still causing problems for repositories worldwide. While ORCID seems to be gaining ground in this area, individual projects are also variously using other services such as AuthorClaim or rolling their own regional or national solutions.
The SWORD API for deposit continues to see widespread adoption. Discussion was not limited to the excellent introductory workshop on Friday afternoon, with its showcase of the EasyDeposit SWORD client builder. Examples of diverse applications of the API were the Author add-in for Microsoft Office, integration with CRIS systems such as Symplectic, publication of packaged items from the filesystem and proposed SWORD-based export functionality for upcoming releases of DSpace.
DuraSpace were on hand to provide details of their DuraCloud product, which they see being used as an extra storage/backup layer for repositories. They also claimed that 50% of surveyed institutions anticipate using cloud services in the next 12 months. EPrints also highlighted their storage manager’s ability to leverage cloud storage. Samuele Kaplun mentioned CERN’s intention to look into cloud processing for resource-intensive tasks in their Invenio system.
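For anyone who hasn't looked at SWORD, a deposit is essentially an HTTP POST of a package to a collection URI, with a few extra headers describing the package. The sketch below, in Python, assembles those headers as I understand them from the SWORD 1.3 profile; the `build_deposit_headers` helper and the filename are hypothetical, and a real client should check the target repository's service document for the packaging formats it accepts.

```python
def build_deposit_headers(filename, packaging, no_op=False):
    """Assemble the HTTP headers for a SWORD 1.x package deposit (a sketch)."""
    headers = {
        "Content-Type": "application/zip",              # the package is a zip file
        "Content-Disposition": f"filename={filename}",  # name of the deposited file
        "X-Packaging": packaging,                       # URI identifying the package format
    }
    if no_op:
        # Ask the server for a dry run: validate the deposit but do not store it.
        headers["X-No-Op"] = "true"
    return headers


# Packaging URI for METS-wrapped DSpace submission packages.
METS_DSPACE_SIP = "http://purl.org/net/sword-types/METSDSpaceSIP"

# Hypothetical filename, for illustration only.
headers = build_deposit_headers("article.zip", METS_DSPACE_SIP, no_op=True)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The zip file's bytes would then be POSTed, with these headers and suitable authentication, to a collection URI taken from the repository's service document.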
Linked Data and the Semantic Web
David De Roure’s keynote address, with its description of Linked Data compliance in myExperiment, set the scene for elements of semantic web functionality bubbling under many of the discussions at the conference. This hasn’t hit the mainstream yet, and there is little support for Linked Data in most of the main repository applications, but I think it could be something to watch out for in the future.
A number of speakers discussed the gathering of usage statistics. While there is a question of how important these really are for repositories of research outputs (reading isn’t the same as citation), it’s always nice with any web application to get some accurate usage data. As Graham Triggs found at the start of his presentation on the Google Analytics and Visualization APIs, a large number of repositories are already using GA, so the interest is definitely there. Judging by the attendance at the stats sessions, there is also clearly a desire to see better and more standardised statistics, and Daniel Metje and Peter Shepherd presented some moves in this direction.
Some other themes that I spotted as worth a mention included social networking functionality and integration, visualization and workflows/pipelines.