
Tuesday, 30 March 2010

First attempt at a security model for ADMIRAL

As we are offering to safeguard real users' research data, I thought we should exercise "due diligence" to ensure that we are doing so in a reasonable fashion.  To this end, I have been working on a security model for the ADMIRAL data stores.

This is new territory for me, and I'm fairly sure there are many things that I have overlooked, failed to properly think through, or just got plain wrong.  But, like all the ADMIRAL working documents, it's public and open to review, which in this case I would eagerly welcome.

The security model is documented at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_LSDS_security_model.

I fully expect to have to revisit this over the course of the project, as requirements are developed and concerns are identified.  I hope that by starting now with an imperfect model, we'll have plenty of time to clean it up and make it fit for purpose.

Thursday, 25 March 2010

Reviewing new survey returns - "steady as she goes"

We held a brief meeting to review some additional data usage survey returns from the Behaviour group. Notes are at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_20100325_Data_Surveys_Review.

There is nothing here to suggest any change in our main priorities, but some interest was noted for automatic versioning of data and data visualization.

Meanwhile, we're making progress on some tricky access control configuration issues arising from specific requirements of the Silk group, and are learning to use Linux ACLs to meet them.  Some notes about how this is done are at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_ACL_file_access_control, but until we have a full test suite in place, this remains work in progress.
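For a flavour of what's involved, here is a minimal sketch (in Python, driving the standard setfacl and getfacl tools) that sets a default ACL on a shared directory so that files created there later are group-readable. The directory and group names are hypothetical placeholders, not our actual configuration.

    # Sketch: make a shared research directory group-readable, including
    # files created in it later. Names below are hypothetical examples.
    import subprocess

    SHARED_DIR = "/data/silk/shared"   # hypothetical mount point
    RESEARCH_GROUP = "silk"            # hypothetical Unix group

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Grant the group read/execute on the directory itself...
    run(["setfacl", "-m", f"group:{RESEARCH_GROUP}:rx", SHARED_DIR])
    # ...and set a *default* ACL so that newly created files inherit
    # group read access automatically.
    run(["setfacl", "-d", "-m", f"group:{RESEARCH_GROUP}:rx", SHARED_DIR])
    # Inspect the resulting ACL.
    run(["getfacl", SHARED_DIR])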

Friday, 19 March 2010

The role of workflows

These are some notes from a discussion with my colleague, Jun Zhao, who has been asked about using our research partners as use-case studies for workflow sharing.

Our immediate response to this request was skepticism, based on our belief that none of our research partners would be willing to try workflow-based tools: we couldn't see that they would gain sufficient benefit to justify the "activation energy" of deploying and learning to use such tools. In the past, our partners have been dismissive of using even very simple systems for which they could not perceive immediate benefits.

This was a somewhat surprising conclusion, given the enthusiasm for workflow sharing among other bioinformatics researchers and among researchers in other disciplines, and we wondered why this might be.

We considered each of our research group partners, covering Drosophila genomics, evolutionary development, animal behaviour, the mechanical properties and evolutionary factors affecting silk, and elephant conservation in Africa. We noticed that:

  • each research group used quite manually intensive experimental procedures and data analysis, of which the mechanized data analysis portions formed only a small proportion,
  • the nature of the procedures and analysis techniques used in the different groups was very diverse, with very little opportunity for sharing between them.

This seems to stand in contrast to genetic studies that screen large numbers of samples for varying levels of gene products, or high-throughput sequencing looking for significant similarities or differences in the gene sequences of different sample populations. The closest our research partners come to this is the evolutionary development group, who use shotgun gene sequencing approaches to look for interesting gene products, but even here the particular workflows used appear to be highly dependent on the experimental hypothesis being tested.

What conclusions can we draw from this? Mainly, we think, that it would be a mistake for computer scientists and software tool developers to assume that a set of tools that has been found useful by one group of researchers is useful to all groups studying similar natural phenomena. Details of experiment design would appear to be a dominant indicator for the suitability of a particular type of tool.

Meeting with Silk Group researcher

Held a meeting today with CH of the Silk Group. This was partly a follow-up from the data surveys, and partly to prepare for our first live LSDS deployment. There were (mercifully) few surprises, the main points noted being:

  • Expectations for the usability of the access control interface are set by the LaCie NAS box that the group currently uses. For the time being we'll configure the users manually, and later we'll look into a UI for creating and modifying LDAP entries (see the sketch at the end of this post).
  • File sharing with automatic backup is an important advance in functionality over bare NAS.
  • Our discussions raised the desirability of automatic harvesting into the LSDS; we will raise the priority of investigating solutions for this. (We've already tried to deploy the Fascinator "Watcher", but that didn't work for us. Another promising option is iFolder. A minimal sketch of such a watcher follows this list.)
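To show roughly what such a watcher involves, here is a minimal sketch using the Python watchdog library to copy newly created files into a drop directory. The paths are hypothetical, and a real harvester would also need to handle renames, deletions, and half-written files.

    # Sketch: watch a researcher's data directory and copy new files
    # into the data store. Paths are hypothetical placeholders.
    import shutil
    import time
    from pathlib import Path

    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    WATCHED = Path("/home/researcher/data")   # hypothetical source directory
    STORE = Path("/srv/admiral/incoming")     # hypothetical LSDS drop point

    class HarvestHandler(FileSystemEventHandler):
        def on_created(self, event):
            if not event.is_directory:
                src = Path(event.src_path)
                shutil.copy2(src, STORE / src.name)
                print(f"harvested {src}")

    observer = Observer()
    observer.schedule(HarvestHandler(), str(WATCHED), recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()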

Meeting notes are at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_20100319_meeting_CH
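On the access control point above: pending a proper UI, creating LDAP accounts could be scripted along the following lines. This sketch uses the Python ldap3 library; the server address, base DN, and attribute values are all hypothetical placeholders, not our actual directory layout.

    # Sketch: add a posixAccount entry for a new research group member.
    # Server, credentials, DNs and attribute values are hypothetical.
    from ldap3 import ALL, Connection, Server

    server = Server("ldap://localhost", get_info=ALL)
    conn = Connection(server, user="cn=admin,dc=example,dc=org",
                      password="secret", auto_bind=True)

    conn.add(
        "uid=newuser,ou=people,dc=example,dc=org",
        ["inetOrgPerson", "posixAccount"],
        {
            "cn": "New User",
            "sn": "User",
            "uid": "newuser",
            "uidNumber": "10001",
            "gidNumber": "10001",
            "homeDirectory": "/home/newuser",
        },
    )
    print(conn.result)   # check the add succeeded
    conn.unbind()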

Friday, 19 February 2010

First review of data usage surveys

An initial meeting has been held to review the first returns from the data usage survey, from the Silk and Development research groups.

Meeting notes are at
http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_20100219_meeting_Data_Surveys_Review.

The main outcome is that our current development priorities appear to match what the researchers would like to see us provide.

Thursday, 4 February 2010

Research data management support issues

We had a useful meeting today with our departmental IT support group to discuss ADMIRAL and its ongoing support beyond the life of the ADMIRAL project.  I believe this meeting brought into focus some policy management issues that will apply across a range of systems that attempt to support data management for individuals and small research groups that don't have resources to do their own IT support.  Notes from the meeting are at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_20100204_meeting_-_IT_Support.

The issue raised is that the IT support team are (understandably) hesitant to take on support for any particular system - even when these are just assemblies of off-the-shelf components, such as the base ADMIRAL data sharing platform.  Yet, if we are to succeed in providing data management services to researchers, some framework for ongoing support must be worked out. The range of options is additionally constrained by the fact that commercial cloud-hosted services bring problems of responsibility for and jurisdiction over the data. In Oxford, the legal team have expressed concern about having University-owned content hosted on systems that may be anywhere in the world.

In our department, IT support do provide some help and support for staff desktop and personal machines, though not (generally) for particular applications running on them.

Assuming we have effective systems for research data management that researchers are actually using, what options do we have, and what resourcing is needed, to ensure that such systems are adequately supported?  One key element of support will be the application of security fixes.

These considerations give rise to some additional requirements:

  • The basic research data management system should be simple, generic and built from standard software elements which are individually supported.
  • The basic system, incorporating essential data storage and backup features, should be generic and usable by a sufficient number of researchers that a standard base configuration can be deployed and supported for diverse research groups.
  • Essential security fixes should generally be automatically applied, with a minimum of user intervention.  This means, for example, that product versions that are part of a standard operating system distribution should be used in preference to newer versions.  (Occasional manual restarts will inevitably be required.)
  • The integration of basic system components should, as far as possible, be a loose coupling via lightweight, standard protocols, so that the system as a whole is not unduly dependent on a particular version of any software component (see the sketch after this list).
  • Any specially-written software should be incorporated in such a way that its failure does not jeopardize essential data security or accessibility.  The outputs of any such software (e.g. annotation tools) should be simple, standard formats that can easily be recovered using widely available off-the-shelf applications.
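To illustrate the loose-coupling point, here is a sketch of a client depositing a file in the data store over plain HTTP (a WebDAV-style PUT, written here with the Python requests library). Because the interface is just a standard protocol, either side can be upgraded or replaced independently. The endpoint URL and credentials are hypothetical placeholders.

    # Sketch: deposit a data file in the store over plain HTTP.
    # The URL and credentials are hypothetical placeholders.
    import requests

    STORE_URL = "http://lsds.example.org/data/silk/"  # hypothetical endpoint

    def deposit(filename):
        with open(filename, "rb") as f:
            resp = requests.put(STORE_URL + filename, data=f,
                                auth=("researcher", "secret"))
        resp.raise_for_status()
        print(f"stored {filename}: HTTP {resp.status_code}")

    deposit("observations.csv")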

I currently perceive three possible (and not mutually exclusive) modes of support:

  1. Universities and departments recognize the value to their research goals of taking on the burden of supporting some additional systems (e.g. as Southampton University have done).
  2. If a basic system can serve the needs of researchers across several institutions, then maybe an open-source style of support model can be used, in which the users as a community, backed up by a handful of technical staff, provide mutual support.
  3. If a sufficiently large community use the system, maybe there is an opportunity for a small business to provide support on a commercial basis, e.g. funded by a fixed fee paid from research grants where data preservation and publication is a key requirement.

These are just some initial thoughts - I think there is plenty of scope for further discussion, to involve researchers, developers, funding agencies and policy makers.