ADMIRAL project announcements: progressPosts


Thursday, 31 March 2011

ADMIRAL: the final push

Our most recent development review is at http://imageweb.zoo.ox.ac.uk/wiki/index.php/20110331_Quick_Review.

Over the past few months of the ADMIRAL project, we've been moving from a mode focused mainly on feature development to one of stabilization and maintenance, so that we can ramp up user engagement activities. As part of this, our project management style has evolved towards shorter, less elaborately planned sprints, summaries of which can be seen on the project plan page. Each sprint essentially consists of a combined review and planning session, held at roughly weekly intervals and driven by requirements recorded in the project issues list.

We intend to complete all items recorded as high and medium priority in the issues list, except where they are blocked by issues that we cannot resolve with the available resources. This amounts to a feature freeze for the ADMIRAL data store, with enhancements focused on stabilization and manageability of the system. Once ADMIRAL user features are stable and a stable deployment of Databank is in place, we will update all of the deployed systems and encourage researchers to deposit real research datasets from ADMIRAL to Databank for preservation and publication. Getting real research data deposited and published with DataCite DOIs is the main project goal that we now want to see realized before the project ends.

Specifically, over the next three months, we aim to:

Test and integrate the remaining Databank features (see issue list items tagged "Databank")
Issue 9: displaying content tree of dataset prior to confirmation of submission
Issue 42: a web interface for user administration
Issue 45: basic Debian packaging for ADMIRAL (which we expect to allow us to deploy easily on more recent versions of Ubuntu)
If and when time permits, pick up and progress some of the lower-priority technical debt issues

while also dealing with any other critical issues that may arise.

In parallel, we will engage with the various research groups to learn more about how and to what extent they are using ADMIRAL, and encourage them to start submitting datasets to the Databank service.

Friday, 4 March 2011

ADMIRAL Sprint 17

We have recently completed our review of Sprint 17.

Sprint 17 planning: http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_17
Sprint 17 review: http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_17

This review was somewhat overdue, as we've been very busy following up ADMIRAL deployments with two additional research groups (making three deployments in total). The main achievements over the past month have been:

Two new ADMIRAL deployments; Silk group storage upgraded to use departmental iSCSI facility
Resolved awkward technical Apache+LDAP issue
Started construction of stand-alone demonstration environment
Deployment and management improvements
Bug-fixing and usability improvements
Documentation of technical problem areas
Benefits case study write-up
ADMIRAL packaging adopted for the first prototype of the Wf4Ever project

One of the recent lessons is that the general level of requirement for data storage has increased dramatically since our initial user surveys. Where most groups were originally content with 200-400 GB of storage, they are now asking for terabytes (due to increased use of high-definition video for observations). So the ability to connect to a departmental iSCSI network storage facility has turned out to be a crucial development for us, especially for new research proposals, which are required to include data management and sharing plans.

Resolving the Apache+LDAP problems has been a most satisfying advance: the awkwardness of the Apache web server configuration had been a long-standing difficulty, and we will now be able to simplify the overall ADMIRAL configuration and monitoring.

Looking forward, as we enter the final stages of this project, we intend to change our approach to sprint planning. Rather than preparing a separate plan, we will respond more reactively to issues in the project issue list (http://code.google.com/p/admiral-jiscmrd/issues/list), as these most closely reflect user feedback and other issues that need to be addressed. We will still undertake periodic reviews to help ensure that our efforts are sensibly focused. In addition to dealing with the issue list, two other developments are planned:

Web interface for user management
Investigation of Debian installation package for ADMIRAL deployment

The rationale for choosing these is that they appear to be key features for facilitating continued management and new deployment of ADMIRAL systems within the department.

Tuesday, 22 February 2011

AJAX content negotiation browser differences

We have been experiencing hard-to-explain problems with the behaviour of ADMIRAL web pages in different browsers. They worked fine in Firefox, but not in IE, Safari or Google Chrome.

Javascript (using jQuery) is used at the client end to retrieve RDF/XML information from the ADMIRAL server using AJAX calls. The server is capable of returning different formats for the requested data, basing its decision on HTTP Accept headers.

The calling Javascript code looks like this:

jQuery.ajax({
    type:         "GET",
    url:          "...",
    username:     "...",
    password:     "...",
    dataType:     "text",
    beforeSend:   function (xhr)
    {
        xhr.setRequestHeader("Accept", "application/rdf+xml");
    },
    success:      function (data, status, xhr)
    {
        ...
    },
    error:        function (xhr, status)
    {
        ...
    },
    cache:        false
});

Using Wireshark to observe the HTTP traffic, we find that Firefox sends the following header:

Accept: application/rdf+xml

But when using Safari we see:

Accept: text/plain,*/*,application/rdf+xml

IE and Chrome also send something different from Firefox, but at the time of writing we've lost the exact trace.

The effect of this has been that, even when we write an Ajax call to accept just RDF/XML, the server sees the additional Accept header options and, in some cases, chooses the wrong response format in browsers other than Firefox.

We have not yet found a simple work-around that works in all situations. But, generally, servers need to be aware that browsers sometimes add commonly required options to the HTTP Accept header. Prioritizing matches for uncommon content options might go some way to ensuring consistent behaviour across browsers. E.g. in the case illustrated here, servers should favour the less common option application/rdf+xml over the more common text/plain content type. Also favouring non-wildcard matches that appear later in the Accept header may help in some cases.
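As a rough illustration of the server-side heuristics suggested above, here is a minimal sketch of Accept header negotiation in plain JavaScript. It is not the ADMIRAL server's code: the function names and the COMMON_TYPES list are hypothetical, and the ranking (q-value first, then non-wildcard matches, then less common types, then later position in the header) is just one way of encoding the suggestions in this post.

```javascript
// Sketch of Accept header negotiation that tolerates browsers appending
// generic types (e.g. "text/plain,*/*") to a script-supplied Accept header.
// COMMON_TYPES is a hypothetical, illustrative list, not taken from ADMIRAL.
const COMMON_TYPES = new Set(["text/plain", "text/html", "application/xml", "text/xml"]);

// Split an Accept header into { range, q, index } entries, dropping q=0 entries.
function parseAccept(header) {
  return header
    .split(",")
    .map((part, index) => {
      const [range, ...params] = part.split(";").map((s) => s.trim());
      let q = 1;
      for (const param of params) {
        const [key, value] = param.split("=").map((s) => s.trim());
        if (key === "q") q = parseFloat(value);
      }
      return { range: range.toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.range.length > 0 && entry.q > 0);
}

// True if a media range such as "*/*", "text/*" or "text/plain" covers the type.
function rangeMatches(range, type) {
  if (range === "*/*") return true;
  if (range.endsWith("/*")) return type.startsWith(range.slice(0, -1));
  return range === type;
}

// Compare two score arrays lexicographically (earlier elements dominate).
function compareScores(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Pick the available type with the best score; null means nothing is
// acceptable (a server might then answer 406 Not Acceptable).
function chooseContentType(acceptHeader, available) {
  const entries = parseAccept(acceptHeader || "*/*");
  let best = null;
  for (const type of available) {
    for (const entry of entries) {
      if (!rangeMatches(entry.range, type)) continue;
      const score = [
        entry.q,                           // 1. explicit q-values always win
        entry.range.includes("*") ? 0 : 1, // 2. prefer non-wildcard matches
        COMMON_TYPES.has(type) ? 0 : 1,    // 3. prefer less common types
        entry.index,                       // 4. prefer entries later in the header
      ];
      if (best === null || compareScores(score, best.score) > 0) {
        best = { type, score };
      }
    }
  }
  return best === null ? null : best.type;
}
```

With the Safari-style header shown earlier, `chooseContentType("text/plain,*/*,application/rdf+xml", ["text/plain", "application/rdf+xml"])` returns "application/rdf+xml", whereas simply taking the first acceptable entry would give text/plain. Because scores are compared q-value first, any explicit q-values sent by a client still take precedence over the heuristics.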

Tuesday, 8 February 2011

Reading RDF/XML in Internet Explorer with rdfQuery

We've just spent the better part of two days tracking down a stupid bug in Internet Explorer.

Under the guise of providing better security, Internet Explorer will not recognize as XML any MIME type other than text/xml or application/xml, and then only when the URI (or Content-Disposition header filename) ends with .xml [1]. (I say "guise" of better security because a server or interceptor that is determined to mislabel XML data can do so anyway: refusing to believe the server's content type when the data properly conforms to that type does not help. Perhaps what they are really protecting against is Windows' flawed model of using the filename pattern to determine how to open a file.)

In our case, we use jQuery to request XML data, and pass the resulting jQuery XML object to rdfQuery to build a local RDF "databank" from which metadata can be extracted. On Firefox and Safari, this works just fine. But on Internet Explorer it fails with a "parseerror", which is generated by jQuery.ajax when the retrieved data does not match the requested xml type.

Fortunately, rdfQuery databank.load is also capable of parsing RDF from plain text as well as from a parsed XML document structure. So the fix is simple, albeit not immediately obvious: when performing the jQuery.ajax operation, request text rather than XML data. For example:

jQuery.ajax({
    type: "GET",
    url: "/admiral-test/datasets/"+datasetName,
    username: "...",
    password: "...",
    dataType: "text", // To work on IE, NOT "xml"!
    cache: false,
    beforeSend: function (xhr)
    {
        xhr.setRequestHeader("Accept", "application/rdf+xml");
    },
    success: function (data, status, xhr)
    {
        // rdfQuery's databank.load also accepts RDF/XML as a plain text string
        var databank = jQuery.rdf.databank();
        databank.load(data);
        ...
    },
    error: function (xhr, status)
    {
        ...
    }   // no trailing comma: older versions of IE treat it as a syntax error
});

Sigh!

[1] http://technet.microsoft.com/en-us/library/cc787872(WS.10).aspx

Monday, 24 January 2011

Data Storage Costs

The ADMIRAL project is creating a local data management facility and testing its use with researchers in the Zoology Department at Oxford University. The facility is built using a standard Linux operating system running in a VMware hosting environment installed within the department. The environment has a limited amount of local fast SCSI disk storage, but is also capable of using network storage via the iSCSI protocol.

One of the research groups for whom we deployed an ADMIRAL system started using it so enthusiastically that they rapidly filled up their allocated space, at which point they stopped using the system without telling us.

With opportune timing, the Zoology Department has recently installed an iSCSI network storage array facility, to which disk storage capacity can be added on an as-needed basis. This facility is to be paid for on a cost recovery basis by projects that use it. We have performed tests to prove that we can use this facility within the ADMIRAL data management framework, and are now waiting for the Silk Group and IT Support to order and install the additional storage capacity so we can deploy an enlarged ADMIRAL server to meet the Silk Group's storage requirements.

We have also been in discussion with other research groups not currently involved with ADMIRAL about storage provision to meet funding body requirements for data sharing. Recently released BBSRC data sharing requirements (http://www.bbsrc.ac.uk/web/FILES/Policies/data-sharing-policy.pdf) stipulate that research data should be made available for sharing for at least 10 years beyond the life of a research project (i.e. a total of 12-14 years for a typical BBSRC-funded project), and claim to allow the resulting costs to be included in a research grant proposal. See the section "BBSRC Data Sharing Policy Statement" in their data sharing policy document. This does not require that data be kept online for this period, but considering the cost, attrition rate and time to obsolescence of alternative solutions, some kind of online facility would appear to be as cost-effective as any.

The cost of providing the departmental network storage facility has been estimated at about £400 per terabyte over a 5-year period. This is estimated on a hardware cost recovery basis, allowing for a normal rate of disk failures over the 5-year period, but not including departmental IT support personnel costs. In discussions with our IT support team, we estimate that the required availability period of project + 10 years will approximately double the hardware-only cost of meeting such a requirement. Based on these discussions, my current recommendation to Zoology Department researchers bidding to meet these requirements would be to cost data preservation and sharing to meet BBSRC requirements at £1000/TB, assuming that departmental IT support remains committed to operational management of the storage server. I would also recommend that they allow 1 person-day for each month of the project, at the appropriate FEC cost, to cover data management support, especially if the research team itself does not have IT server systems expertise: I estimate this would reasonably cover ongoing support of an ADMIRAL-like system for a project.
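The costing rule of thumb above can be written down as a small calculation. This is a minimal sketch that only restates the figures quoted in this post (about £400/TB hardware cost over 5 years, roughly doubled for a project + 10 year availability period, a recommended £1000/TB, and 1 person-day of support per project month). The function name is hypothetical, and the daily support rate is a hypothetical input that would have to come from the department's own FEC rates.

```javascript
// Figures quoted in the post (GBP). supportDayRate is NOT from the post:
// it is a hypothetical input to be supplied from local FEC rates.
const HARDWARE_COST_PER_TB_5YR = 400;     // departmental iSCSI, hardware only
const LONG_TERM_FACTOR = 2;               // project + 10 years roughly doubles it
const RECOMMENDED_PER_TB = 1000;          // figure recommended to bidders
const SUPPORT_DAYS_PER_PROJECT_MONTH = 1; // data management support allowance

// Hypothetical helper: estimate storage and support costs for a grant bid.
function estimateDataManagementCost({ terabytes, projectMonths, supportDayRate }) {
  const hardwareOnly = terabytes * HARDWARE_COST_PER_TB_5YR * LONG_TERM_FACTOR;
  const storage = terabytes * RECOMMENDED_PER_TB;
  const supportDays = projectMonths * SUPPORT_DAYS_PER_PROJECT_MONTH;
  const support = supportDays * supportDayRate;
  return {
    retentionYears: projectMonths / 12 + 10, // availability required by BBSRC
    hardwareOnly,                            // hardware-only estimate, for comparison
    storage,                                 // costed at the recommended rate
    supportDays,
    support,
    total: storage + support,                // excludes departmental IT staff costs
  };
}
```

For example, a hypothetical 36-month project needing 2 TB, with an assumed day rate of £300, would require availability for 13 years and be costed at £2000 for storage plus 36 support days (£10800), against a hardware-only storage estimate of £1600.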

For comparison, Oxford University Computing Service offers a 5 year data archive facility with multiple offsite tape backups for a Full Economic Cost of about £4000/Terabyte. They do not offer an archive service of longer duration. (It is important to distinguish here between a backup service and an archive service: the same OUCS facility provides a daily backup service, but said backups are destroyed after just a few months of no update or access.)

The university library service has tentatively indicated a slightly higher cost per terabyte for perpetual storage. The meaning of "perpetual" here is open to debate, but the intent is to maintain copies of the data in storage for at least several decades, much as historical books and papers are held by the Bodleian Library for the long term.

Friday, 21 January 2011

ADMIRAL Sprint 16

We have just completed our review of Sprint 16:

Sprint 16 planning: http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_16
Sprint 16 review: http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_16

This was the first sprint in Phase 2 of the project. In Phase 2, our focus will be to consolidate and extend the ADMIRAL deployments with research groups, and to ensure that we can continue to support them through other projects over the coming years, thereby gaining a greater understanding of local-level data management concerns.

Much of our effort in this sprint has been focused on quality improvements to the deployed system that allow us to roll out further ADMIRAL deployments with confidence, and in particular on separating configuration data from the deployed system software so that we can update the software without too much disruption to ADMIRAL users.

Specific enhancements completed in this sprint include:

Dataset repository submission tool usability enhancements
Improved deployment scripts to facilitate software updates
Improved test suite and generally improved system robustness
Tested ADMIRAL with a new departmental storage server

With these quality improvements mostly completed, our next immediate goal will be to extend the use of ADMIRAL by our research users.

Friday, 19 November 2010

ADMIRAL Sprint 15 Plan

Notes from the planning meeting for sprint 15 can be found at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_15.

The primary goal of this sprint is to complete the work for Phase 1 of the ADMIRAL project.

As such, rather than set a time-frame for the sprint and then pick activities and tasks to fill it, we have listed the work needed to wrap up this phase. The main goal is to close the loop: researchers' data submitted via ADMIRAL to the Library Databank service, and viewed via the web by the original researchers.

This will form a basis for iterative, researcher-led improvements in phase 2 of the project.

Friday, 12 November 2010

ADMIRAL Sprint 14 review

A review of sprint 14 can be seen at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_14.

Progress has been quite satisfying, with several scheduled activities completed, and more progress than planned on the ADMIRAL-to-Databank dataset submission tool, all in spite of experiencing a disk failure on one of our servers that took about 1.5 days effort to recover. This is balanced by less-than-planned progress on deploying ADMIRAL for the Development Group, and not yet having researcher feedback on the dataset submission interface.

We are still awaiting a finalized Databank API test suite.

A new ADMIRAL instance for the Evolutionary Development group has been built and deployed, but has not yet been finally configured and handed over for their use.

We have a functional dataset submission tool and web interface, but there are some significant usability problems that need to be addressed before we want to consider showing it to researchers. This remaining work is mainly in the area of dataset selection; the required server components have been developed and tested, and it just remains to update the web page to use the new capabilities.

As intended, we completed work to revise the ADMIRAL local store test suite, before scheduling the sprint reviewed here.

Tuesday, 26 October 2010

ADMIRAL Sprint 14 Plan

Notes from a planning meeting for sprint 14 can be found at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_14.

The planned focus for this sprint is:

Deploy ADMIRAL system for Development group
Complete repository submission testing
Implement a repository submission tool for researchers to use

Monday, 25 October 2010

ADMIRAL Sprint 13 review

A review of sprint 13 can be seen at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_13.

In summary, progress has been quite satisfying, with most scheduled activities completed, and also some unscheduled progress. Our new recruit is coming up to speed well with the project development and testing activities.

The initial dataset selection and display functions that will allow researchers to view their submitted datasets are functionally complete (apart from finalizing their deployment via the RDF Databank).

We have made substantial progress on updating the local file store system build scripts and test suite, with a view to building a system for use by the Evolutionary Development group.

Some progress has been made on updating and finalizing the Library Services RDF Databank service, but we are still awaiting a version that we can test and use as a target for dataset submission scripts.

For ongoing work, our immediate priorities are to (a) finish updating the local file store test suite, and (b) configure and deploy a system for the Development group. Beyond that, we need to focus on finalizing the RDF Databank API test suite, and building tools to allow researchers to select and submit datasets to the Databank.

ADMIRAL Sprint 13 Plan

Notes from a planning meeting for sprint 13 can be found at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_13.

Due to an unplanned staff departure, sprint 12 was abandoned part way through. Sprint 13 was deferred until a replacement could be engaged. (The intervening period was used to wrap up some development work on the MILARQ project.)

(Further, although the sprint was planned per schedule, writing up the meeting got overlooked until the date of the sprint review, hence this rather overdue post.)

Monday, 6 September 2010

ADMIRAL Sprint 12 plan

Sprint 12 has been planned to run over the next 3 weeks.

The Sprint 12 planning meeting is reported at:
http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_12.

The three main activities scheduled are:

Deploy ADMIRAL system for Development group
Repository submission testing (mainly coordinating with AR)
Implement repository content display

Friday, 3 September 2010

ADMIRAL Sprint 11 review

This sprint has been substantially taken up with introducing a new developer to the various ADMIRAL technologies. Some progress, though not enough, has been made in improving and debugging the ADMIRAL system generation scripts, but the resulting system is not passing its tests. Some progress has also been made towards a utility for displaying dataset content in a user-friendly fashion.

Review of sprint 11: http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_11

Wednesday, 25 August 2010

ADMIRAL Sprint 10 review and Sprint 11 plan

A new (half-time) developer started on the ADMIRAL project yesterday. After the usual administrative details and setting up a development environment, we did a mini sprint plan for the next 2 weeks of the ADMIRAL project. I say a mini sprint plan because we didn't do the full activity/user story selection, task breakdown and scope bartering, but rather reviewed the remnants of the most recent active sprint plan and identified key unfinished tasks to be tackled.

The next goal for the project is to complete the functionality covered by phase 1 of the project plan by the end of October, with front-to-back submission of research datasets to the Library Services Databank repository service, and providing visible web-based feedback to our research partners of the submitted datasets. This we intend to use as the basis for iterative improvements and enhancements in phase 2 of the project, with the researchers guiding us concerning what constitutes useful metadata to capture and expose with the submitted datasets.

The sprint plan for the period to 7 September aims to:

review, debug and update documentation for the ADMIRAL system scripted creation procedure
create a new ADMIRAL file sharing deployment for the evolutionary development group
file store bug fixes (password over unencrypted HTTP channel; HTTPS access reporting server configuration error)
progress work on Shuffl RDF support

Review of sprint 10:
http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_10

Plan for sprint 11:
http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintPlan_11

We've also had a technical meeting with the Library Services developer of RDFDatabank (aka Databank), the data repository system:
http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_20100825_technical_meeting_with_library_services

Saturday, 24 July 2010

JSON for linked data - migrating to RDF

See:
http://shuffl-announce.blogspot.com/2010/07/json-for-linked-data-migrating-to-rdf.html

Project partners meeting - 17-Jun-2010 (delayed post)

This entry was mistakenly posted to the wrong blog, so I'm belatedly moving it to its rightful home, for the record.

We held a project partners meeting on 17 June, notes from which can be found at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_20100617_project_partners_meeting

The main focus for this meeting was discussion of our initial experiences with the OULS data repository, and much of the meeting was taken up with quite arcane technical discussion. Overall, it's looking very promising: we have been able to create test submissions through the provided web interface, and the API looks very straightforward for us to use by other means.

Monday, 19 July 2010

ADMIRAL Sprint 9 review and Sprint 10 planning

The review of sprint 9 has been published at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_9. The main achievements for this sprint were:

a very useful project partner meeting
substantially completed the Databank test suite, except for tests dealing with metadata merging
started work on RDF serialization for Shuffl, via JRON
initial page to display RDFDatabank contents - using jQuery and rdfQuery to create a user display from RDF data
brief project description published in D-Lib

We start sprint 10 with the unfortunate news that our second developer is resigning with immediate effect to attend to family commitments. This will certainly impact achievements in the coming weeks, though we hope to maintain forward progress while we investigate and put in place alternative arrangements. This will surely test our risk assessment claim that an agile approach to development helps to mitigate staffing-related risks!

The planning notes and plan for sprint 10 are at:

Our focus for the next sprint will be:

RDF Databank test cases for metadata merging
finish RDF Databank web page to display dataset details
progress deployment of ADMIRAL data store with other research groups
progress RDF/XML serialization for Shuffl data

Wednesday, 7 July 2010

ADMIRAL Sprint 9 plan

The ADMIRAL sprint 9 plan has been posted at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintPlan_9.

This was a very simplified planning meeting for a short sprint, with a focus on scheduling priority tasks from those already planned.

For the next sprint, we aim to:

Complete test suite for accessing ADMIRAL databank
Progress RDF creation using Shuffl
Arrange meetings with Development and Elephant groups

With a view to working towards a front-to-back, researcher-to-repository, test deployment, including:

capturing dataset descriptions as RDF
packaging data+description and submitting to the RDFDatabank test system

Tuesday, 6 July 2010

ADMIRAL Sprint 8 Review

The review for sprint 8 is posted at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_8.

Project activity has been impacted by unplanned staff absences, but useful progress has been made on a number of technical fronts. Interaction with Library Services to develop the data repository service has been going well, and we look forward to rapid progress in this area. Work to add RDF serialization to Shuffl is under way. Rolling out ADMIRAL data stores to additional research groups has been stalled.

This review is being posted nearly a week later than planned, so includes a period not covered by the original sprint plan.

Summary of achievements this sprint:

project partner and OULS meetings
attendance at JISC TransferSummit meeting
brief project description for D-Lib magazine
Shuffl WebDAV file browsing implementation complete
RDFDatabank access
RDFDatabank initial test suite
negotiating revisions to RDFDatabank API in light of experience
started on RDF serialization for Shuffl

SWORD white paper: relevant to ADMIRAL?

I've just read through a white paper about directions for the SWORD deposit protocol: http://sword2depositlifecycle.jiscpress.org/

I'm recognizing many of the discussion points we've been having about the submission API for ADMIRAL to the library service RDF Databank appearing here:

submitting datasets as packages of files
selective updating within dataset packages
accessing manifest and content
accessing metadata about the package
etc.

I'm not advocating at this stage that we should be trying to track the SWORD work, but I do think we should try to ensure that nothing prevents us from creating a full SWORD interface to RDF Databank at some stage in the future.