Monday 18 January 2010

Selecting a platform for ADMIRAL

Part of the past week or so has been spent coming to a (tentative) decision on the basic platform for ADMIRAL data sharing. The requirements are summarized at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_LSDS_requirements_and_survey.

Reviewing the requirements, none of the more exotic options seemed to adequately address all the points. We've also been giving heightened consideration to (a) ongoing supportability of the platform by departmental IT, and (b) allowing users to use their normal SSO credentials for accessing the shared data - this is turning out to be a more important feature for user acceptance than we originally anticipated. All this directs us towards a platform that consists primarily of very common components:
  • Ubuntu-based Linux server (JeOS)
  • CIFS for local file sharing (mainly because reliable clients are standard in all common desktop operating systems)
  • Apache2+WebDAV for remote file sharing (we might have tried to use WebDAV for all file sharing, but have concerns that it could be awkward to set up in some environments)
  • Apache2+WebDAV to provide the basis for web application access to data, to provide additional services, such as annotation and visualization
  • For remote field workers, we plan to experiment with Dropbox to synchronize with the shared file service as and when Internet connections allow.
  • Mercurial will be trialled as an option for versioning research data.  Its main advantage over Subversion here is that it keeps its metadata in a single .hg directory at the root of the repository, rather than leaving hidden directories throughout the working tree.  (For example, I have used Subversion to version-manage Linux configuration files such as those in /etc/..., and occasionally find that the hidden .svn directories cause problems with some system management tools.)  Mercurial is also a distributed version management system, unlike Subversion, and might be trialled as an alternative to Dropbox for synchronizing with remote workers.
  • SSH/SFTP as a fallback for access to files and other facilities.  SSH is a protocol that often succeeds where others fail, and can be used to tunnel arbitrary service protocols if necessary.  There are reasonably easy-to-use (though not seamless) SFTP clients for Windows (e.g. WinSCP), Mac OS X (e.g. Cyberduck) and Linux (e.g. FileZilla).
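
As an illustration of the SSH fallback mentioned in the last item, here is a minimal sketch of how a client-side script might assemble an SSH port-forwarding command so that another service can be reached through the SSH connection. The host name, user name and port numbers are purely hypothetical placeholders, not part of any ADMIRAL configuration; the only things relied on are the standard OpenSSH options -N (do not run a remote command) and -L (forward a local port).

```python
import shlex


def ssh_tunnel_command(gateway, local_port, target_host, target_port, user=None):
    """Build an argument list for `ssh` that forwards a local port via a gateway.

    All names passed in are supplied by the caller; nothing here is specific
    to any real ADMIRAL server.
    """
    for port in (local_port, target_port):
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
    destination = f"{user}@{gateway}" if user else gateway
    # Connections to local_port on this machine are carried over SSH and
    # delivered to target_host:target_port as seen from the gateway.
    forward = f"{local_port}:{target_host}:{target_port}"
    # -N: no remote command, just forwarding; -L: local port forwarding.
    return ["ssh", "-N", "-L", forward, destination]


if __name__ == "__main__":
    # Hypothetical example: expose a web service on the server at local port 8080.
    cmd = ssh_tunnel_command("fileserver.example.org", 8080, "localhost", 80,
                             user="researcher")
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(cmd))
```

The function only builds the command; it could be passed to subprocess.run once the details have been checked against the actual server setup.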
For deployment, I'm currently planning to use Ubuntu-hosted KVM virtualization.  The other obvious choice would be VMware, as that is widely used, but I have found that remote access to a VMware server or similar hosting environment from non-Windows clients can be problematic.  Also, it appears that KVM is well-integrated with Ubuntu's cloud computing infrastructure (UEC/Eucalyptus), which is itself API-compatible with Amazon EC2.  This seems to give us a range of deployment options.

For using single-sign-on (SSO) credentials, the Oxford University SSO mechanisms are underpinned by Kerberos.  It seems that all of the key features proposed for use (CIFS, HTTP and SSH) can be configured to use Kerberos authentication, so we should be able to use standard SSO credentials for accessing all the main services.

Daily automatic backup will be provided by installing a standard Tivoli client on the system, which will perform scheduled backup to the University Hierarchical File Storage (HFS) service.  Alternative backup mechanisms could easily be configured in different deployments.

This combination of well-tried software seems to be able to meet all of our initial requirements, and provide a basis for additional features as requirements are identified.

Our aim is to create a system that will continue to be used after the ADMIRAL project completes, so it must be something that our departmental IT support can maintain.  To this end, the various choices are subject to review once I have had a proper discussion with our IT support, who will have more experience of likely operational problems.

It is worth noting that there are two other data management projects in Oxford with some similar requirements (NeuroHub and EIDCSR); we have arranged to keep in touch, pool resources and adopt a common solution to the extent that makes sense for each project.  The choices indicated here remain subject to review in discussion with these other projects.
