Motivation
Motivation for NGAS:
Handle huge amounts of streamed data in real time.
Reduce operational costs (man-power).
Decrease expenses in general.
Provide online and offline processing capabilities.
Ease integration of archive facility with external clients/applications.
Provide a common concept for the online archive and the long-term storage facilities (NGAS ≈ OLAS + ASTO + Jukebox SW + more). Note, no plan to replace OLAS for now.
Simplify and unify the overall infrastructure of the archive system.
Increase data security.

Main Objectives
Main Objectives of NGAS:
Provide an archive facility with services for handling all stages in the life-time of data files:
- Archiving files (+ on-the-fly checking and processing).
- Retrieving & on-the-fly processing of files.
- Ensuring data consistency.
- Providing services for managing data.
- (Executing complex, parallel data processing - TBD)
In addition, to provide a system:
- Which is adaptable to specific contexts.
- With high performance and scalability.

NGAS: History
History of NGAS:
April 2001: Project started.
Mid June 2001: First operational prototype.
June 2001: Review + approval of design/concept.
Beginning July 2001: Installation/commissioning at La Silla (2.2m/WFI).
Mid July 2001: Entered operation at La Silla.
August 2001: Started operation of Garching NGAS Cluster.
February 2001: Upgrade from Suse to RedHat Linux.
August 2003: Installation/commissioning at Paranal (VLTI).
January 2004: Installation of second archive system for 3.6m/LS.
March 2004: First integration of NGAS on new HW (SATA).
September 2004: First tests using NGAS together with RAID5 Arrays.
September 2004: Archiving of HARPS pipeline products.
December 2004: Archiving of WFCAM frames from Cambridge/UK.

NGAS: Components
Main Components of the NGAS Project:
1. NGAS SW – NG/AMS (Next Generation Archive Management System).
2. NGAS WEB Interfaces.
3. HW – (low cost) PCs with removable ATA disks.
4. NGAS OS (Linux).
5. NGAS Utilities.
6. NGAS Installation and Configuration Tools.

NG/AMS: Basic Concepts
Basic Concepts of the NGAS SW (NG/AMS):
NG/AMS is a platform/framework providing basic services.
No information is hard-coded to support specific types of data – NG/AMS ‘does not know’ what e.g. a FITS file is.
No information is hard-coded to support specific HW configurations.
The specific behavior and the specific knowledge have to be added to the NGAS system – customizable.
Based on standard protocols and formats wherever possible – can be used as a building block.
Simple: advanced features can be added in front-end applications that give clients a different view of the data and provide specific services.

NG/AMS: Main Features/1
Main Features of NG/AMS (1):
Multi-threaded server.
Standard communication protocol (HTTP) + HTTP Authentication.
Data file archiving via Push and Pull Techniques.
Subscription Service including filter mechanism.
DB synchronization (DB Snapshot Feature).
Easy adaptation to different kinds of DBMSs (ANSI SQL Engine/DB Driver).
Flexible/adaptable due to usage of 10 different kinds of plug-ins.
Many configurable parameters.
XML information exchange.
Email Notification Service.

NG/AMS: Main Features/2
Main Features of NG/AMS (2):
Advanced logging service (Verbose, Local Log File, Syslog).
Background Data Consistency Checking.
Operation in Cluster Mode.
Transparent data retrieval & on-the-fly processing.
APIs in ANSI-C and Python + two client applications based on these.
Archive Client for secure and simple, remote data file archiving.
Many commands to interact with and control the system.
Portable.
Unit/Functional Tests.

NG/AMS: Server

NG/AMS: Storage Media Infrastructure
Basic Infrastructure of Storage Media:

NG/AMS: XML Information Exchange
Interprocess Data Exchange:
- Most information exchanged between NG/AMS Servers, and between the NG/AMS Server and clients, is based on XML.
- Example, NgasDiskInfo Document (NG/AMS Status XML Document):

NG/AMS: HTTP Command Interface

NG/AMS: DB Synchronization
DB Synchronization:
NGAS DBs replicated from Paranal/La Silla to Garching (Unidirectional).
Synchronization between DBs of the various NGAS sites also carried out by NGAS.
NG/AMS maintains a snapshot (DBM) on each disk with info about the files stored on it.
Local DB synchronized with this info when the disk reappears on a site.
DB Snapshot can be used as a table of contents for the disk.
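The snapshot idea can be illustrated with a minimal sketch using Python's standard dbm module. The record layout (one JSON document per file ID), the function names and the use of a plain dict as the 'local DB' are assumptions made for illustration only; they are not the actual NG/AMS snapshot format.

```python
import dbm
import json


def write_snapshot(snapshot_path, files):
    """Store one JSON record per file ID in a DBM file kept on the disk.

    The record layout is hypothetical, chosen only for this sketch.
    """
    with dbm.open(snapshot_path, "c") as snap:
        for file_id, info in files.items():
            snap[file_id] = json.dumps(info)


def sync_from_snapshot(snapshot_path, local_db):
    """Copy snapshot entries missing from the local DB (a plain dict here).

    Returns the sorted list of file IDs that were added.
    """
    added = []
    with dbm.open(snapshot_path, "r") as snap:
        for key in snap.keys():
            file_id = key.decode("utf-8")  # DBM keys come back as bytes
            if file_id not in local_db:
                local_db[file_id] = json.loads(snap[key])
                added.append(file_id)
    return sorted(added)
```

When a disk arrives at a site, calling sync_from_snapshot() on its snapshot brings the local DB up to date; listing the snapshot keys alone gives the table of contents of the disk.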

NG/AMS: Plug-Ins
NG/AMS Plug-Ins:
Ten different kinds of plug-ins provided. These make it possible to adapt the system to different kinds of hardware and different types of data – nothing is hard-coded:
1. Online Plug-In.
2. Offline Plug-In.
3. Data Archiving Plug-In.
4. Checksum Plug-In.
5. Data Processing Plug-In.
6. Registration Plug-In.
7. Label Printer Plug-In.
8. Filter Plug-In.
9. Suspension Plug-In.
10. Wake-Up Plug-In.
Standard plug-ins delivered with the system. Possible to replace these or add new plug-ins when needed.
The plug-ins delivered with a distribution of NGAS should be viewed as belonging to the core of the system when it comes to testing.
A normal user does not need to know about the plug-ins used.
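How a plug-in mechanism keeps data-type knowledge out of the core can be sketched as below. This is an illustration only: the registry, the plug-in signature and the returned fields are hypothetical and do not reproduce the real NG/AMS plug-in interface.

```python
import os
from typing import Callable, Dict

# Hypothetical signature: a Data Archiving Plug-In receives the path of the
# staged file and returns a dict with the information the core should store.
ArchivingPlugIn = Callable[[str], dict]

_ARCHIVING_PLUGINS: Dict[str, ArchivingPlugIn] = {}


def register_plugin(mime_type: str, plugin: ArchivingPlugIn) -> None:
    """Associate a plug-in with a mime-type (normally done via configuration)."""
    _ARCHIVING_PLUGINS[mime_type] = plugin


def handle_archive_request(path: str, mime_type: str) -> dict:
    """Core logic: look up the plug-in; no data-type knowledge lives here."""
    try:
        plugin = _ARCHIVING_PLUGINS[mime_type]
    except KeyError:
        raise ValueError(f"no Data Archiving Plug-In for {mime_type!r}") from None
    return plugin(path)


def example_fits_plugin(path: str) -> dict:
    """Example plug-in: derives a file ID from the file name (illustrative)."""
    file_id = os.path.splitext(os.path.basename(path))[0]
    return {"file_id": file_id, "format": "image/x-fits"}


register_plugin("image/x-fits", example_fits_plugin)
```

Supporting a new type of data then amounts to registering another function; handle_archive_request() itself stays unchanged.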

NG/AMS: Plug-Ins
Data Archiving Plug-In – Basic Functioning:

NG/AMS: XML Configuration
NG/AMS Configuration (1):
About 110 different configurable parameters.
Configuration can be loaded from an XML document or from the DB or a combination of these.
Possible to re-use DB based parameters to compose specific configurations (easier to handle many, slightly different installations).
Main groups of configurable parameters (1):
Basic Parameters: Port number, simulation mode, proxy mode, root mount point, …
Plug-Ins: The various plug-ins the system should use e.g. to handle data of a specific type.
DB Connection: The DB connection parameters.
Permissions: Archive, Retrieve, Processing, Remove Requests allowed.
Archive Handling Parameters: Parameters for handling Archive Requests.
Accepted Data Types: Types of data (mime-types) the system can handle.
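Composing a configuration from DB-based parameters plus an XML document might look like the sketch below. The element and attribute names, the flattened 'Group.Attribute' keys and the rule that XML values override DB values are all assumptions for illustration; the actual NG/AMS configuration schema and precedence rules are not given here.

```python
import xml.etree.ElementTree as ET


def params_from_xml(xml_text):
    """Flatten <Group Attr="value"/> elements into 'Group.Attr' keys."""
    root = ET.fromstring(xml_text)
    params = {}
    for group in root:
        for name, value in group.attrib.items():
            params[f"{group.tag}.{name}"] = value
    return params


def compose_config(db_params, xml_params):
    """Use DB-based parameters as shared defaults; XML values override them.

    The precedence chosen here is an assumption made for this sketch.
    """
    merged = dict(db_params)
    merged.update(xml_params)
    return merged


# Hypothetical site-specific document overriding one shared DB parameter.
EXAMPLE_XML = '<ExampleCfg><Server PortNo="7777" Simulation="0"/></ExampleCfg>'
EXAMPLE_DB = {"Server.PortNo": "8000", "Log.Level": "3"}
EXAMPLE_CFG = compose_config(EXAMPLE_DB, params_from_xml(EXAMPLE_XML))
```

With many, slightly different installations, only the few parameters that differ need to be kept per site; the rest is shared.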

NG/AMS: XML Configuration
NG/AMS Configuration (2):
Main groups of configurable parameters (2):
Storage Sets: The disk configuration.
Streams: Defines how the different kinds of data should be streamed onto the Storage Sets.
Available Processing Capabilities: Defines the types of data that can be processed and which Data Processing Plug-Ins to use.
Data Check/Janitor Thread Configuration: Parameters to tune the Data Checking and Janitor Threads.
Logging Parameters: E.g. name of log files + intensity to apply when logging.
Email Notification Parameters: Recipients of the various types of Email Notification Messages.
Host Suspension Parameters: Parameters for suspending a host + for waking up suspended hosts.
Subscription Parameters: Parameters to define if a server should subscribe for data.
Authorization Parameters: Defines the known users and their access code.

NG/AMS: Data Consistency Checking
Data Consistency Checking:
It is necessary to monitor the condition of the data in the archive constantly.
Data Consistency Checking – Thread running in background.
Possible to tune the amount of resources occupied by the service.
A check run can be scheduled to run periodically via the configuration.
Checks carried out: checksum check, file availability, unregistered files on storage media.
A check sub-thread is started per disk (max. number configurable).
Info about files on the system dumped once in a DBM, retrieved file by file during checking.
Possible to resume a check from where the previous one was interrupted.
Email Notification sent to subscribers in case problems are found, e.g.:
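A minimal sketch of the three checks named on this slide (checksum, file availability, unregistered files), including a simple way of resuming an interrupted run. CRC32 is used only as a stand-in, since the real checksum is computed by the configured Checksum Plug-In; the function names, the flat directory layout and the resume marker are assumptions.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Compute the CRC32 of a file in chunks to keep memory use low."""
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def check_disk(mount_point, registered, resume_after=None):
    """Check one disk; 'registered' maps file name -> expected CRC32.

    Files sorting at or before 'resume_after' are skipped, so a run can
    continue where a previous one stopped. Returns (name, problem) tuples.
    """
    problems = []
    for name in sorted(registered):
        if resume_after is not None and name <= resume_after:
            continue
        path = os.path.join(mount_point, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
        elif crc32_of_file(path) != registered[name]:
            problems.append((name, "checksum mismatch"))
    # Files present on the medium but unknown to the DB.
    for name in sorted(set(os.listdir(mount_point)) - set(registered)):
        problems.append((name, "unregistered"))
    return problems
```

One such call per disk could be run in its own sub-thread, with the resulting problem list forming the body of the notification email.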

NG/AMS: Operation in Cluster Mode/1
Example:

NG/AMS: Operation in Cluster Mode/2
Example:

Garching NGAS Cluster
NGAS Cluster

NG/AMS: Data Processing
Data Processing at Retrieval:
Simple processing supported when retrieving files.
Possible to request the system to apply a Processing Plug-In on the data and to send back the result of the plug-in rather than the data itself.
Processing performed on the sub-node hosting the data.
Possible for clients to use the NGAS Cluster as a ‘number cruncher’ to carry out parallel data processing in a simple manner.
Reduces the amount of data to be transferred to the client. E.g., a single floating point number may be returned rather than the entire data file.
Can be extended by providing new Data Processing Plug-Ins for specific contexts.
Could be used to integrate NGAS with the AVO or other archive services.
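The effect of processing at retrieval can be sketched as follows: the client names a Processing Plug-In, and the node holding the file returns only the plug-in result. The in-memory 'archive', the plug-in name and the text-based file format are hypothetical, chosen only to keep the example self-contained.

```python
def mean_plugin(data: bytes) -> bytes:
    """Hypothetical Data Processing Plug-In: mean of whitespace-separated numbers."""
    values = [float(token) for token in data.decode("ascii").split()]
    return repr(sum(values) / len(values)).encode("ascii")


# Hypothetical mapping from plug-in name to plug-in function.
PROCESSING_PLUGINS = {"mean": mean_plugin}


def retrieve(archive, file_id, processing=None):
    """Return the file itself, or the plug-in result if processing is requested."""
    data = archive[file_id]
    if processing is None:
        return data
    return PROCESSING_PLUGINS[processing](data)


# The 'archive' is a dict standing in for the files hosted on one sub-node.
EXAMPLE_ARCHIVE = {"F1": b"1.0 2.0 3.0"}
```

Here retrieve(EXAMPLE_ARCHIVE, "F1", processing="mean") sends back a few bytes instead of the whole file, which is the saving the slide refers to.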

NG/AMS: APIs
NG/AMS APIs + Clients:
Two APIs are provided, implemented in C (C library) and Python (class).
Facilitates implementation of client applications communicating with NGAS, e.g. to retrieve data files.
Two command line utilities are provided, based on the C and Python API, which can be used to interact with an NG/AMS Server.
A standalone Archive Client is provided, based on the C-API:
Independent of any DBMS.
Can be used to archive files from any remote host which can access the NGAS Archive via HTTP.
Attempts to archive a file are retried until success is returned or the file is classified as bad by the remote NGAS system.
Files are not cleaned up locally until a cross-check confirms that they are really in the remote NGAS Archive (CHECKFILE Command).
First applications: Archiving of HARPS pipeline products and WFCAM files from Cambridge/UK.
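The retry and clean-up behavior of the Archive Client can be sketched as below. The real client is built on the C-API; this Python sketch only mirrors the logic. The status strings, the callables passed in and the upper limit on attempts are assumptions added to keep the example small and terminating.

```python
import time


def archive_with_retry(path, send_archive, check_file, remove_local,
                       max_attempts=5, delay_s=0.0):
    """Archive one file, deleting the local copy only after verification.

    send_archive(path) -> "SUCCESS", "FAILURE" or "BAD_FILE" (hypothetical)
    check_file(path)   -> True if the remote archive confirms the file
    remove_local(path) -> deletes the local copy
    """
    for _ in range(max_attempts):
        status = send_archive(path)
        if status == "BAD_FILE":
            return "rejected"  # remote system classified the file as bad
        if status == "SUCCESS" and check_file(path):
            remove_local(path)  # safe: the file is confirmed in the archive
            return "archived"
        time.sleep(delay_s)  # wait before the next attempt
    return "pending"  # local copy kept; try again later
```

Passing the network operations in as callables keeps the sketch independent of any particular API and makes the logic easy to test with stand-ins.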

NG/AMS Client Applications
NG/AMS Archive Client

NG/AMS: Server Commands
NG/AMS Server Commands (HTTP Protocol):
Commands issued as URLs: http://<Host>:<Port>/<Command>[?<Par=Val>[&<Par=Val>]]
Commands:
ARCHIVE: Archive data with Archive Push or Archive Pull Technique.
CHECKFILE: Execute an explicit file check of the given file.
CLONE: Clone an entire disk or individual files.
CONFIG: Configure an online system.
DISCARD: Force removal of file from disk and/or DB independent of number of copies.
EXIT: Make the NG/AMS Server exit.
INIT: Re-initialize the NG/AMS Server.
LABEL: Print out disk labels.
OFFLINE: Bring server to Offline State.
ONLINE: Bring server Online.
REGISTER: Register a file or a set of files already stored on an ‘NGAS Disk’.
REMDISK: Remove a disk from the archive (only allowed if at least 3 copies of each file are available).
REMFILE: Remove a file from the archive.
RETRIEVE: Retrieve a file, transparently, from the archive.
STATUS: Query status about the server or another component in the NGAS system/cluster.
SUBSCRIBE: Subscribe to new data or a set of data.
UNSUBSCRIBE: Unsubscribe a previously created subscription.
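Since commands are plain URLs, a request can be composed with a few lines of code. The sketch below follows the URL pattern given on this slide; the host name, port number and parameter names used in the example are placeholders, not documented NG/AMS values.

```python
from urllib.parse import urlencode


def build_command_url(host, port, command, **params):
    """Compose http://<Host>:<Port>/<Command>[?<Par=Val>[&<Par=Val>]]."""
    url = f"http://{host}:{port}/{command}"
    if params:
        # urlencode also escapes characters that are not URL-safe
        url += "?" + urlencode(params)
    return url


# Placeholder host, port and parameter name, for illustration only.
STATUS_URL = build_command_url("ngas.example.org", 7777, "STATUS")
RETRIEVE_URL = build_command_url("ngas.example.org", 7777, "RETRIEVE",
                                 file_id="EXAMPLE.FILE.ID")
```

The resulting string can be handed to any HTTP client, e.g. urllib.request.urlopen() from the Python standard library.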

Unit/Functional Tests - Features
Unit/Functional Tests:
Extensive set of automatic tests provided, consisting of:
30 Test Suites.
~130 Test Cases.
Tests portable (platform/HW independent).
Testing the business logic of the system and correct functioning (simulation mode).
Need to add more Test Cases for testing correct and consistent behavior under abnormal conditions, as well as stress tests.
Needs to be enhanced with ~200 Test Cases before next release.
Possible to generate Test Plan from test code (next slide - overhaul ongoing).

Unit/Functional Tests - Test Plan
Example:

NGAS WEB Interfaces
NGAS WEB Interfaces:
WEB Interfaces provided to assist operators in querying the status of the system and to search for various components (data files, disks, machines).
Used at all sites by the operators (Garching, Paranal, La Silla).
Based on Zope, a WEB management system providing editing via WEB browser (http://www.zope.org).
Local Zope WEB Servers available on each site.
Tools provided to list disks, find specific files, and get an overview of the nodes and their status.
Also the so-called Operator’s Log Book is provided. The operators use this to log all actions carried out.
Used by the operators at Paranal/La Silla to monitor the online archiving activities.
Services missing for interacting with the system. Only possible to control the disk label printing for now.
An enhancement is planned in the near future.

NGAS System/OS
NGAS OS Distribution:
Started on a Suse Linux distribution and migrated to RedHat Linux (ESO standardization).
OS distribution prepared/managed by OTS-SOS.
Support for single-processor and multi-processor configurations.
Support for old HW (PATA) and new HW (SATA).
Limited installation, many packages removed to reduce the size of system.
Special packages needed by NGAS: Python, Sybase interface, Zope, … - installed by the NGAS Installation Tool.
Special driver SW needed for the 3ware controller.
Zope WEB server running on some nodes (optional).
3ware disk controller WEB server running on every host.
Possibility to back-up/restore complete system by means of the Mondo/Mindi tool kit (from a single CDROM) in 10 minutes.
From July 2004 NGAS OS platform installed with kickstart installation script.

NGAS HW
NGAS HW (1):
Started with 8 slots parallel ATA systems.
8 x 80 GB storage capacity per node (640 GB/node, ~1.2 TB compressed).
Since March 2004 a 24 slot serial ATA system in operation (up to 24 x 400 GB = 9.6 TB/node, 19.2 TB compressed).
Reduces price per GB.
More robust HW, among other things due to serial ATA (cleaner cabling).
Disk handling easier, more robust disk frames.
Overall HW stability (hopefully) better and less intervention needed (TBC).
Amount of data/CPU should be balanced to be able to process the data in a limited time.
TBD when to use new HW in operation at observatory sites.
Investigating usage of RAID5 rather than JBOD disks.

NGAS HW
NGAS HW (2):

NGAS Utilities
NGAS Operator’s Utilities/Installation Utilities:
Small module provided (NGAS Utilities) with utilities for the daily work of the operators:
Limited time invested in this so far; however, the essential tools for operation are provided (e.g. Clone Verification Tool, Check File List Tool, Clone File List Tool, …).
The function of many of these tools should be taken over by the NGAS WEB Interfaces when these have been enhanced.
The module NGAS Installation Tools provides some utilities to install and check the system:
Tool provided to build ‘NGAS layer’ on top of the ‘basic’ NGAS Linux distribution.
Functionality still to be implemented.

NGAS Infrastructure
Present ESO NGAS Infrastructure:

NGAS: Future Plans
(Near) Future Plans for NGAS:
Received detailed requirements from archive operations.
Enhance NGAS WEB Management Interfaces.
Enhancement of services for operation in cluster (extended proxy mode).
Enhancement of installation utilities.
Enhancement of unit tests (simulation of archive cluster operation).
Implement load balancing/archive cluster operation for high availability/high data rates (VST/ΩCam: up to 300 GB/night, VISTA/VistaCAM up to 1 TB/night - TBC).
Support for advanced data processing, utilizing an NGAS Cluster as a parallel processing engine (specify complex recipes that execute parallel data processing) – will be analyzed in the near future.
Support for the Astrophysical Virtual Observatory/GRID?

Status - December 2004
Status of NGAS Project December 2004:
In operation since July 2001.
Used heavily on a daily basis by archive operators in Garching.
Data archived daily at La Silla, Paranal and at ESO HQ.
Data archived directly into NGAS Archive in Garching from Paranal and Cambridge/WFCAM.
Some statistics:
Total number of nodes: ~25.
Total number of disks in use: ~260.
Total number of files in NGAS Archive: ~1,500,000.
Amount of compressed data in NGAS Archive: ~27 TB.
Amount of uncompressed data in NGAS Archive: ~45 TB.
Maximum throughput per node (archiving): ~400 GB/24 hours (including compression).
Major Issues to Address:
Need to invest more resources in implementing automatic tests in particular for testing robustness and handling of abnormal conditions.
Need to invest resources in implementing an enhanced user interface - not very user-friendly at the moment.
Need to update the design document to reflect present status of system (not updated since it was written in spring 2001).
Should investigate improved ways of ensuring data consistency and means for recovering lost data.