Monday, 24 January 2011

Data Storage Costs

The ADMIRAL project is creating a local data management facility and testing its use with researchers in the Zoology Department at Oxford University. The facility is built on a standard Linux operating system running in a VMware hosting environment installed within the department. The environment has a limited amount of fast local SCSI disk storage, but can also use network storage via the iSCSI protocol.

One of the research groups for whom we deployed an ADMIRAL system used it so enthusiastically that they rapidly filled their allocated space, at which point they stopped using the system without telling us.

Opportunely, the Zoology Department has recently installed an iSCSI network storage array, to which disk capacity can be added as needed. This facility is to be paid for on a cost-recovery basis by the projects that use it. We have run tests confirming that we can use this facility within the ADMIRAL data management framework, and are now waiting for the Silk Group and IT Support to order and install the additional storage capacity so that we can deploy an enlarged ADMIRAL server to meet the Silk Group's storage requirements.

We have also been in discussion with other research groups not currently involved with ADMIRAL about storage provision to meet funding body requirements for data sharing. Recently released BBSRC data sharing requirements (http://www.bbsrc.ac.uk/web/FILES/Policies/data-sharing-policy.pdf) stipulate that research data should be made available for sharing for at least 10 years beyond the life of a research project (i.e. a total of 12-14 years for a typical BBSRC-funded project), and claim to allow the associated costs to be included in a research grant proposal. See the section "BBSRC Data Sharing Policy Statement" in their data sharing policy document. This does not require that data be kept online for this period, but considering the cost, attrition rate and time to obsolescence of alternative solutions, some kind of online facility would appear to be as cost-effective as any.

The cost of providing the departmental network storage facility has been estimated at about £400 per terabyte over a 5-year period. This estimate is on a hardware cost-recovery basis, allowing for a normal rate of disk failures over the 5-year period, but not including departmental IT support personnel costs. In discussions with our IT support team, we estimate that the required availability period of the project plus 10 years will approximately double the hardware-only cost of meeting such a requirement. Based on these discussions, my current recommendation to Zoology Department researchers bidding to meet these requirements would be to cost data preservation and sharing to meet BBSRC requirements at £1000/TB, assuming that departmental IT support remains committed to operational management of the storage server. I would also recommend that they allow 1 person-day for each month of the project, at the appropriate FEC cost, to cover data management support, especially if the research team itself lacks IT server systems expertise: I estimate this would reasonably cover ongoing support of an ADMIRAL-like system for a project.
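As a rough illustration of how these figures combine in a grant budget, here is a minimal Python sketch. The £400/TB per 5-year figure, the approximate doubling for project-plus-10-year availability, the £1000/TB recommendation and the 1 person-day per project month come from the discussion above; the function names, the 5 TB dataset size, the 36-month project length and the £300 daily FEC rate are hypothetical values chosen only for the example.

```python
# Figures from the costing discussion; everything else here is illustrative.
HARDWARE_COST_PER_TB_5YR = 400      # GBP per TB over 5 years (hardware cost recovery only)
RETENTION_MULTIPLIER = 2.0          # approximate: project + 10 years roughly doubles hardware cost
RECOMMENDED_COST_PER_TB = 1000      # GBP per TB, the rounded-up figure recommended for bids
SUPPORT_DAYS_PER_PROJECT_MONTH = 1  # person-days of data management support per project month


def storage_cost(terabytes: float) -> tuple[float, float]:
    """Return (estimated hardware-only cost, recommended budget line) in GBP."""
    estimate = terabytes * HARDWARE_COST_PER_TB_5YR * RETENTION_MULTIPLIER
    recommended = terabytes * RECOMMENDED_COST_PER_TB
    return estimate, recommended


def support_cost(project_months: int, fec_day_rate: float) -> float:
    """Return data management support cost in GBP.

    fec_day_rate is a hypothetical Full Economic Cost daily rate, not a figure
    from this post; substitute the rate that applies to your own project.
    """
    return project_months * SUPPORT_DAYS_PER_PROJECT_MONTH * fec_day_rate


if __name__ == "__main__":
    # Hypothetical project: 5 TB of data, 36 months, £300/day FEC rate.
    estimate, recommended = storage_cost(5.0)
    print(estimate, recommended)     # 4000.0 5000.0
    print(support_cost(36, 300.0))   # 10800.0
```

On these assumptions the doubled hardware estimate comes to about £800/TB, so the £1000/TB budget line leaves some headroom over it.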

For comparison, Oxford University Computing Services (OUCS) offers a 5-year data archive facility with multiple offsite tape backups for a Full Economic Cost of about £4000 per terabyte. They do not offer an archive service of longer duration. (It is important here to distinguish between a backup service and an archive service: the same OUCS facility provides a daily backup service, but those backups are destroyed after just a few months without update or access.)

The university library service has tentatively indicated a slightly higher cost per terabyte for perpetual storage. The meaning of "perpetual" here is open to debate, but the intent is to maintain copies of the data in storage for at least several decades, much as historical books and papers are held by the Bodleian Library for the long term.

Friday, 21 January 2011

ADMIRAL Sprint 16

We have just completed our review of Sprint 16:
This was the first sprint in Phase 2 of the project. For Phase 2, our focus will be to consolidate and extend the ADMIRAL deployments with research groups, and to ensure that we can continue to support them and thereby gain a greater understanding of local-level data management concerns through other projects to be conducted over the coming years.

Much of our effort in this sprint has been focused on deployed system quality improvements that allow us to roll out further ADMIRAL deployments with confidence, and in particular on separating configuration data from the deployed system software, so that we can update the software without too much disruption to ADMIRAL users.

Specific enhancements completed in this sprint include: 
  • Dataset repository submission tool usability enhancements
  • Improved deployment scripts to facilitate software updates
  • Improved test suite and generally improved system robustness
  • Tested ADMIRAL with a new departmental storage server
With these quality improvements mostly completed, our next immediate goal will be to extend the use of ADMIRAL by our research users.

Friday, 19 November 2010

ADMIRAL Sprint 15 Plan

Notes from the planning meeting for sprint 15 can be found at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_15.

The primary goal of this sprint is to complete the work for Phase 1 of the ADMIRAL project.

As such, rather than setting a time-frame for the sprint and then picking activities and tasks to fill it, we have listed the work needed to wrap up this phase. Its main goal is closing the loop: researchers' data is submitted via ADMIRAL to the Library Databank service, and then viewed via the web by the original researchers.

This will form a basis for iterative, researcher-led improvements in Phase 2 of the project.

Friday, 12 November 2010

ADMIRAL Sprint 14 review

A review of sprint 14 can be seen at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_14.

Progress has been quite satisfying, with several scheduled activities completed and more progress than planned on the ADMIRAL-to-Databank dataset submission tool, all in spite of a disk failure on one of our servers that took about 1.5 days' effort to recover from. This is balanced by less progress than planned on deploying ADMIRAL for the Development Group, and by not yet having researcher feedback on the dataset submission interface.

We are still awaiting a finalized Databank API test suite.

A new ADMIRAL instance for the Evolutionary Development group has been built and deployed, but has not yet been fully configured and handed over to them.

We have a functional dataset submission tool and web interface, but there are some significant usability problems that need to be addressed before we consider showing it to researchers. This remaining work is mainly in the area of dataset selection; the required server components have been developed and tested, and it just remains to update the web page to use the new capabilities.

As intended, we completed work to revise the ADMIRAL local store test suite before scheduling the sprint reviewed here.

Tuesday, 26 October 2010

ADMIRAL Sprint 14 Plan

Notes from a planning meeting for sprint 14 can be found at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_14.

The planned focus for this sprint is:
  • Deploy ADMIRAL system for Development group
  • Complete repository submission testing
  • Implement a repository submission tool for researchers to use

Monday, 25 October 2010

ADMIRAL Sprint 13 review

A review of sprint 13 can be seen at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_13.

In summary, progress has been quite satisfying, with most scheduled activities completed and some unscheduled progress as well. Our new recruit is coming up to speed well with the project development and testing activities.

The initial dataset selection and display functions that will allow researchers to view their submitted datasets are functionally complete (apart from finalizing their deployment via the RDF Databank).

We have made substantial progress on updating the local file store system build scripts and test suite, with a view to building a system for use by the Evolutionary Development group.

Some progress has been made on updating and finalizing the Library Services RDF Databank service, but we are still awaiting a version that we can test and use as a target for dataset submission scripts.

For ongoing work, our immediate priorities are to (a) finish updating the local file store test suite, and (b) configure and deploy a system for the Development group. Beyond that, we need to focus on finalizing the RDF Databank API test suite, and building tools to allow researchers to select and submit datasets to the Databank.

ADMIRAL Sprint 13 Plan

Notes from a planning meeting for sprint 13 can be found at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_13.

Due to an unplanned staff departure, sprint 12 was abandoned part way through. Sprint 13 was deferred until a replacement could be engaged. (The intervening period was used to wrap up some development work on the MILARQ project.)

(Also, although the sprint was planned on schedule, writing up the meeting was overlooked until the date of the sprint review; hence this rather overdue post.)