ADMIRAL project announcements: productivity

Showing posts with label productivity. Show all posts

Tuesday, 22 February 2011

AJAX content negotiation browser differences

We have been experiencing hard-to-explain problems with the behaviour of ADMIRAL web pages in different browsers. They would work fine for Firefox, but not with IE, Safari or Google Chrome.

Javascript (using jQuery) is used at the client end to retrieve RDF/XML information from the ADMIRAL server using AJAX calls. The server is capable of returning different formats for the requested data, basing its decision on HTTP Accept headers .

The calling Javascript code looks like this:

jQuery.ajax({
type:         "GET",
url:          "...",
username:     "...",
password:     "...",
dataType:     "text",
beforeSend:   function (xhr)
{
xhr.setRequestHeader("Accept", "application/rdf+xml");
},
success:      function (data, status, xhr)
{
...
},
error:        function (xhr, status)
{
...
},
cache:        false
});

Using Wireshark to observe the HTTP traffic, we find that Firefox sends the following header:

Accept: application/rdf+xml

But when using Safari we see:

Accept: text/plain,*/*,application/rdf+xml

IE and Chrome also send something different from Firefox, but at the time of writing we've lost the exact trace.

The effect of this has been that even when we write an Ajax call to accept just RDF/XML, the server is seeing the additional Accept header options and in some cases is choosing the wrong response format when using browsers other than Firefox.

We have not yet found a simple work-around that works in all situations. But, generally, servers need to be aware that browsers sometimes add commonly required options to the HTTP Accept header. Prioritizing matches for uncommon content options might go some way to ensuring consistent behaviour across browsers. E.g. in the case illustrated here, servers should favour the less common option application/rdf+xml over the more common text/plain content type. Also favouring non-wildcard matches that appear later in the Accept header may help in some cases.

Tuesday, 8 February 2011

Reading RDF/XML in Internet Explorer with rdfQuery

We've just spent the better part of two days tracking down a stupid bug in Internet Explorer.

Under the guise of providing better security, Internet Explorer will not recognize as XML any MIME type other than text/xml or application/xml, and then only when the URI (or Content-disposition header filename) ends with .xml [1]. (I say guise of better security, because a server or intercept that is determined to falsely label XML data can do so in any case: refusing to believe the server's content-type when the data properly conforms to that type does not help; maybe what they are really protecting against is Windows' flawed model of using the filename pattern to determine how to open a file.)

In our case, we use jQuery to request XML data, and pass the resulting jQuery XML object to rdfQuery to build a local RDF "databank" from which metadata can be extracted. On Firefox and Safari, this works just fine. But on Internet Explorer it fails with a "parseerror", which is generated by jQuery.ajax when the retrieved data does not match the requested xml type.

Fortunately, rdfQuery databank.load is also capable of parsing RDF from plain text as well as from a parsed XML document structure. So the fix is simple, albeit not immediately obvious: when performing the jQuery.ajax operation, request text rather than XML data. For example:

jQuery.ajax({
   type: "GET",
   url: "/admiral-test/datasets/"+datasetName,
   username: "...",
   password: "...",
   dataType: "text", // To work on IE, NOT "xml"!
   cache: false
   beforeSend: function (xhr)
   {
   xhr.setRequestHeader("Accept", "application/rdf+xml");
   },
   success: function (data, status, xhr)
   {
   var databank = jQuery.rdf.databank();
   databank.load(data);
   ...
   },
   error: function (xhr, status)
   {
   ...
   },
   });

Sigh!

[1] http://technet.microsoft.com/en-us/library/cc787872(WS.10).aspx

Tuesday, 27 April 2010

WebDAV and Javascript same-origin violations

We've noticed some strange problems using WebDAV to access a server running on the local development machine (i.e. "localhost"). We're using Ajax code running in Firefox to issue the WebDAV HTTP requests, and an APache 2.2 server running mod_dav, etc., to service them. We're using a combination of FireBug and Wireshark to monitor HTTP traffic.

The immediate symptom we see is is that the HTTP requests using methods other than GET or POST (specifically DELETE and MKCOL) are being rejected with access-denied errors without being logged by FireBug. But looking at a Wireshark trace, what we see is an HTTP OPTION request and response, with no further HTTP exchange for the operation requested.

What appears to be happenning is that the HTTP OPTION request is being used to perform "pre-flight" checking of a cross-origin resource sharing request, per http://www.w3.org/TR/access-control/, which is being refused by the server's response.

This was puzzling for us in two ways:

That a request to localhost was being rejected in this way, and
The use of the cross-origin sharing protocol, which is performed "under the hood" by recent versions of Firefox.

The rejection of localhost requests is not consistent: on a MacBook, the request is allowed (still using Firefox and Apache), but on some Ubuntu systems it is refused. When the request is refused, the workaround is to send it explicitly to the fully qualified domain name rather that just "localhost". (This is a bit of a pain as it means our test cases are not fully portable, and I'm hoping we can later find an Apache configuration option to allow this level of resource sharing.)

UPDATE

It turns out that the above "observation" is a complete red herring. We had failed to notice that the web page was being loaded using the full domain name of the host, rather than http://localhost/.... When a URI of the form http://localhost/... is used to load the HTML page that invokes the Javascript code, then WebDAV access to localhost works just as expected.

Wireshark

We've been using Wireshark to help debug and understand protocol flows. I've used Wireshark before, and its predecessor Ethereal, but I've been very impressed at how easy recent versions are to install and use (on Linux and MacOS, at least) for high-level software debugging.

The HTTP protocol decode is really useful, and it handles messy details like re-assembling TCP packets so that protocol units are clearly displayed.

Also, it works very well with a local loopback interface, so it's not necessary to fiogure out arcane filters to exclude background network traffic when debugging a local client/server interaction.

Under Linux, remember that the Pcap library is also needed - the Ubuntu package name is libcap2-bin. Under recent versions of Ubuntu, it is also necessary to set appropriate privileges: see http://wiki.wireshark.org/CaptureSetup/CapturePrivileges.

Monday, 18 January 2010

Selecting a platform for ADMIRAL

Part of the past week or so has been spent coming to a (tentative) decision on the basic platform for ADMIRAL data sharing. The requirements are summarized at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_LSDS_requirements_and_survey.

Reviewing the requirements, none of the more exotic options seemed to adequately address all the points. We've also been giving heightened consideration to (a) ongoing supportability of the platform by departmental IT, and (b) allowing users to use their normal SSO credentials for accessing the shared data - this is turning out to be a more important feature for user acceptance than originally allowed. All this directs us towards a platform that consists primarily of very common components:

Ubuntu-based Linux server (JeOS)
CIFS for local file sharing (mainly because reliable clients are standard in all common desktop operating systems)
Apache2+WebDAV for remote file sharing (we might have tried to use WebDAV for all file sharing, but have concerns that it could be awkward to set up in some environments)
Apache2+WebDAV to provide the basis for web application access to data, to provide additional services, such as annotation and visualization
For remote field workers, we plan to experiment with Dropbox to synchronize with the shared file service as and when Internet connections allow.
Mercurial will be trialled as an option for providing versioning of research data. The main advantage of Mercurial over Subversion for this is that it doesn't leave any hidden files in the working directory (e.g., I have used Subversion to version-manage Linux configuration files (such as those in /etc/...), and occasionally find that the hidden .svn directories can cause problems with some system management tools). Mercurial is also a distributed version management system (unlike Subversion), and might be trialled as an alternative to Dropbox for synchronizing with remote workers.
SSH/SFTP as a fallback for access to files and other facilities. SSH is a protocol that often succeeds where others fail, and can be used to tunnel arbitrary service protocols if necessary. There are quite easy-to-use (though not seamless) SFTP clients for Windows (e.g. WinSCP), MacOS (e.g. CyberDuck) and Linux (e.g. FileZilla?).

For deployment, I'm currently planning to use Ubuntu-hosted KVM virtualization. The other obvious choice would be VMWare, asd that is widely used, but I have found that remote access to a VMware server or similar hosting environment from non-Windows clients can be problematic. Also, it appears that KVM is well-integrated with Ubuntu's cloud computing infrastructure (UEC/Eucalyptus), which is itself API-compatible with Amazon EC2. This seems to give us a range of deployment options.

For using single-sign-on (SSO) credentials, the Oxford University SSO mechanisms are underpinned by Kerberos. It seems that all of the key features proposed for use (CIFS, HTTP and SSH) can be configured to use Kerberos authentication, so we should be able to use standard SSO credentials for accessing all the main services.

Daily automatic backup will be provided by installing a standard Tivoli client on the system, which will perform scheduled backup to the University Hierarchical File Storage (HFS) service. Alternative backup mechanisms could easily be configured in different deployments.

This combination of well-tried software seems to be able to meet all of our initial requirements, and provide a basis for additional features as requirements are identified.

Our aim is to create a system that will be used after the ADMIRAL project completes, so it is important that it must be something that is supportable by our departmental IT support. To this end, the various choices are subject to review when I can have a proper discussion with our IT support, who will have more experience of likely operational problems.

It is worth noting that there are two other data management projects in Oxford with some similar requirements (NeuroHub and EIDCSR); we have arranged to keep in touch, pool resources and adopt a common solution to the extent that makes sense for each project. The choices indicated here remain subject to review in discussion with these other projects.

Wednesday, 2 December 2009

Initial project planning and reporting framework

The day-to-day ADMIRAL project planning and reporting framework is being set up along the lines used for the Shuffl JISC Rapid Innovation project. A wiki will be used for the outline project plan and sprint schedule, with separate links to individual sprint plans. This is available at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_planning. Other ongoing progress reports and commentary will be provided via this blog.

Posting tags will be used to allow aggregation of reports by interested parties, based on the tags allocated for the JISC rapid innovation projects, but using JISCMRD in place of JISCRI.

ADMIRAL project announcements