JJ Kavelaars (CADC (HIA/NRC)), Pat Dowler (CADC - NRC/HIA), Brian Major (CADC - NRC/HIA) and Dustin Jenkins (CADC - HIA/NRC)
Abstract
The CADC now makes extensive use of the VOSpace protocol for user storage. The VOSpace standard allows a diverse set of rich data services to be delivered to users through a simple protocol. Within the CADC Cloud Computing project (CANFAR) we use VOSpace to retrieve processing inputs and to store processing products. Abstracting storage away from compute is an important component of Cloud Computing, and the heavy use of our VOSpace service reflects this.
As part of its VOSpace deployment, the CADC provides a Java-based command-line tool (VOSpaceClient) and a web-browser-based interface (vosui). The command-line client uses X.509 certificates (managed by the CADC) for access control, while the browser interface allows username/password HTTP Basic Authentication. Both access modes expose the full capabilities of the CADC VOSpace implementation. In particular, vosui lets users manipulate the group management services (cadcGMS) so that file access among research groups can be maintained. vosui also provides recursive download and public data access. Together these tools provide a reasonably complete tool set for accessing the contents of a user’s VOSpace and sharing those contents with team members. Interaction through the command-line tool VOSpaceClient is, however, atomic and presents a clear boundary between filesystem-like interaction and VOSpace.
Many astronomers are familiar with the concept of network storage volumes; indeed, most institutions now use network storage for home directories. Those astronomers also have good experiences with data sharing services such as DropBox. Having to access VOSpace from the Unix shell through specialized command-line tools limits the utility of VOSpace, making alternatives such as the DropBox filesystem more appealing as a network storage solution.
cadcVOFS provides a filesystem layer on top of VOSpace so that standard Unix tools (such as find, emacs, awk, etc.) can be used directly on the data objects stored in VOSpace. Once mounted, the VOSpace appears to the operating system as a network storage volume.
cadcVOFS uses the FUSE (Filesystem in Userspace) package as the layer between the operating system's filesystem calls and the VOSpace service calls. The FUSE kernel module and the FUSE library communicate via a special file descriptor obtained by opening /dev/fuse. This file can be opened multiple times, and the resulting file descriptor is passed to the mount syscall to match the descriptor with the mounted filesystem. FUSE is a robust package used in a wide variety of filesystem-emulation settings (such as sshfs and sqlfs); it is now part of many standard Linux installations and is available as MacFUSE for OS X. Providing a VOSpace filesystem (cadcVOFS) therefore involves supplying methods that translate standard Unix file operations (open, read, write, flush, chmod, chgrp, etc.) into the equivalent actions on the VOSpace web service.
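The translation between Unix file operations and VOSpace actions can be sketched as follows. This is a minimal illustration, not the cadcVOFS code: `FakeVOSpace` is a hypothetical in-memory stand-in for the VOSpace web service, and `VOFSOperations` only mirrors the names of a few FUSE callbacks (getattr, readdir, read, write, flush, unlink) with simplified signatures rather than binding to a particular Python FUSE library. Writes are buffered locally and uploaded only on flush, on the assumption that each VOSpace transfer moves a whole file.

```python
import errno
import stat
import time


class FakeVOSpace:
    """Hypothetical in-memory stand-in for the VOSpace web service.

    Keys are node paths; values are bytes for DataNodes and None for
    ContainerNodes. A real client would make HTTP calls instead.
    """

    def __init__(self):
        self.nodes = {"/": None}

    def get_node(self, path):
        return self.nodes[path]  # raises KeyError if the node does not exist

    def list_children(self, path):
        prefix = path.rstrip("/") + "/"
        return [p[len(prefix):] for p in self.nodes
                if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]]

    def put_data(self, path, data):
        self.nodes[path] = bytes(data)

    def delete_node(self, path):
        del self.nodes[path]


class VOFSOperations:
    """Maps FUSE-style file operations onto VOSpace node operations."""

    def __init__(self, vospace):
        self.vospace = vospace
        self.dirty = {}  # path -> bytearray written locally but not yet uploaded

    def _node(self, path):
        try:
            return self.vospace.get_node(path)
        except KeyError:
            raise OSError(errno.ENOENT, "No such node", path) from None

    def getattr(self, path):
        data = self._node(path)
        now = time.time()  # placeholder: a real version would use the node's date
        if data is None:  # ContainerNode -> directory
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2,
                    "st_size": 0, "st_mtime": now}
        return {"st_mode": stat.S_IFREG | 0o644, "st_nlink": 1,
                "st_size": len(data), "st_mtime": now}

    def readdir(self, path):
        return [".", ".."] + sorted(self.vospace.list_children(path))

    def read(self, path, size, offset):
        return self._node(path)[offset:offset + size]

    def write(self, path, buf, offset):
        if path not in self.dirty:  # start from the current remote contents, if any
            try:
                self.dirty[path] = bytearray(self.vospace.get_node(path) or b"")
            except KeyError:
                self.dirty[path] = bytearray()
        data = self.dirty[path]
        if offset > len(data):  # pad a sparse write with zero bytes
            data.extend(b"\0" * (offset - len(data)))
        data[offset:offset + len(buf)] = buf
        return len(buf)

    def flush(self, path):
        # Only at flush time does the whole file cross the network to VOSpace
        if path in self.dirty:
            self.vospace.put_data(path, self.dirty.pop(path))

    def unlink(self, path):
        self._node(path)  # raise ENOENT for a missing node
        self.vospace.delete_node(path)


ops = VOFSOperations(FakeVOSpace())
ops.write("/calib.fits", b"SIMPLE", 0)
ops.flush("/calib.fits")
print(ops.readdir("/"))               # ['.', '..', 'calib.fits']
print(ops.read("/calib.fits", 3, 0))  # b'SIM'
```

Buffering until flush keeps many small write() calls from each turning into a separate network transfer; the trade-off is that data is not on the service until the file is flushed or closed.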
cadcVOFS is implemented in Python 2.6 and is intended as an example of the kind of ‘user land’ package that can be developed with VOSpace as a backend. At present cadcVOFS is composed of three components:
mountvofs is the main command-line tool; it performs the mount and provides the interpretation layer between python-fuse calls and the classes in vos.py.
vos.py provides a small number of classes: ‘Connection’ provides an x509 connection object to the VOSpace service; ‘Client’ pulls data from and pushes data to the VOSpace service; ‘Node’ provides the methods that set and return information on a particular VOSpace node; and ‘VOFile’ provides standard file-type methods (open, read, write, etc.) for acting on the node contents.
fuse.py is currently distributed with cadcVOFS and provides a thin layer between the FUSE C library and Python.
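To illustrate the kind of work a class like ‘Node’ has to do, the sketch below parses a VOSpace node description and turns it into the size and permission bits a filesystem needs. It is a hypothetical sketch, not vos.py itself: the XML namespace and the property URIs (`core#length`, `core#groupread`, `core#groupwrite`, `core#ispublic`) are assumed from IVOA VOSpace 2.x conventions, and the node URI and group name in the example are invented. The `mode()` method also shows why sharing is awkward to express in Unix terms: `groupread` and `groupwrite` each carry a group URI and may name different groups, but a Unix file has a single group, so this mapping keeps only whether each property is set.

```python
import stat
import xml.etree.ElementTree as ET

# Assumed IVOA VOSpace 2.x namespaces and core property URIs; verify against the service.
VOS_NS = "http://www.ivoa.net/xml/VOSpace/v2.0"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
CORE = "ivo://ivoa.net/vospace/core#"


class Node:
    """Hypothetical, minimal wrapper around a VOSpace node's XML description."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        self.uri = root.get("uri")
        self.type = root.get("{%s}type" % XSI_NS)  # e.g. "vos:DataNode"
        # Collect every <vos:property uri="...">value</vos:property> into a dict
        self.props = {p.get("uri"): (p.text or "").strip()
                      for p in root.iter("{%s}property" % VOS_NS)}

    def is_dir(self):
        return bool(self.type) and self.type.endswith("ContainerNode")

    def size(self):
        return int(self.props.get(CORE + "length") or 0)

    def mode(self):
        """Approximate Unix st_mode from the node type and sharing properties."""
        bits = 0o600                              # owner: read and write
        if self.props.get(CORE + "groupread"):    # value is a group URI; only presence survives
            bits |= 0o040
        if self.props.get(CORE + "groupwrite"):
            bits |= 0o020
        if self.props.get(CORE + "ispublic") == "true":
            bits |= 0o004
        if self.is_dir():
            bits |= (bits & 0o444) >> 2           # add search (x) wherever read is granted
            return stat.S_IFDIR | bits
        return stat.S_IFREG | bits


# Example node description; the URI and group name are made up for illustration.
SAMPLE = """<vos:node xmlns:vos="http://www.ivoa.net/xml/VOSpace/v2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    uri="vos://example.org!vospace/someuser/calib.fits" xsi:type="vos:DataNode">
  <vos:properties>
    <vos:property uri="ivo://ivoa.net/vospace/core#length">1024</vos:property>
    <vos:property uri="ivo://ivoa.net/vospace/core#groupread">ivo://example.org/gms#team</vos:property>
  </vos:properties>
</vos:node>"""

node = Node(SAMPLE)
print(node.size(), oct(stat.S_IMODE(node.mode())))  # 1024 0o640
```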
In addition to providing a layer between the filesystem and VOSpace, cadcVOFS also implements file caching, an important feature for VM/Cloud computing setups. Users' VMs mount their required VOSpace(s) and access calibration input data as needed; no specialized VOSpace calls are required because the contents of the space appear as a filesystem. When a job completes, the VOSpace is ‘unmounted’ but the cached files can be left behind. Subsequent jobs executing on the same VM can have their cadcVOFS mount point reuse the previous task's file cache. This improves data throughput when the same large calibration files are used repeatedly by multiple tasks running in sequence on the same VM.
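Cache reuse across mounts can be sketched as a cache directory that lives outside the mount point and is keyed by VOSpace URI. This is a hypothetical illustration, not the cadcVOFS cache: `FileCache`, `fake_fetch`, and the example URI are invented, and the sketch assumes the service exposes an MD5 checksum for each node so that a stale local copy can be detected before it is reused.

```python
import hashlib
import os
import tempfile


class FileCache:
    """Hypothetical on-disk cache of VOSpace file contents that outlives a mount.

    Each file is stored under cache_dir, named by a hash of its VOSpace URI,
    so a later mount pointed at the same cache_dir can find it again.
    """

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _local_path(self, uri):
        name = hashlib.sha1(uri.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name)

    @staticmethod
    def _md5(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def get(self, uri, remote_md5, fetch):
        """Return a local path for uri, calling fetch(uri) only on a miss.

        remote_md5 is the checksum the service reports for the node (assumed
        to be available); a mismatch means the cached copy is stale.
        """
        path = self._local_path(uri)
        if os.path.exists(path) and self._md5(path) == remote_md5:
            return path                   # hit: no network transfer needed
        tmp = path + ".part"
        with open(tmp, "wb") as f:
            f.write(fetch(uri))           # miss or stale: pull the whole file
        os.replace(tmp, path)             # atomic rename, so readers never see a partial file
        return path


# Two "jobs" on the same VM share one cache directory; fake_fetch stands in for
# a real VOSpace download and records how often it is called.
calls = []


def fake_fetch(uri):
    calls.append(uri)
    return b"bias frame"


uri = "vos://example.org!vospace/someuser/calib/bias.fits"
md5 = hashlib.md5(b"bias frame").hexdigest()
cache_dir = tempfile.mkdtemp()
FileCache(cache_dir).get(uri, md5, fake_fetch)   # job 1: downloads
FileCache(cache_dir).get(uri, md5, fake_fetch)   # job 2: served from the cache
print(len(calls))  # 1
```

Keying the cache by a hash of the URI, rather than by the mount path, is what lets a new mount on the same VM pick up files left by the previous task; the checksum comparison trades a local hash computation for a potentially large network transfer.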
We will describe the implementation and use of cadcVOFS with FUSE, including both the positive and negative experiences associated with this approach. In particular, the flexible group read/write structure allowed by VOSpace is difficult to implement as a logical filesystem mapping, but the flexibility of interaction via filesystem …
Paper ID: P066