Motivation
|
|
|
Motivation for NGAS: |
|
|
|
Handle huge amounts of streaming data in
real time. |
|
Reduce operational costs (man-power). |
|
Decrease expenses in general. |
|
Provide online and offline processing
capabilities. |
|
Ease integration of archive facility
with external clients/applications. |
|
Provide a common concept for the online
archive and the long-term storage facilities (NGAS ≈ OLAS + ASTO +
Jukebox SW + more). Note, no plan to replace OLAS for now. |
|
Simplify and unify the overall
infrastructure of the archive system. |
|
Increase data security. |
|
|
Main Objectives
|
|
|
Main Objectives of NGAS: |
|
|
|
Provide an archive facility with
services for handling all stages in the life-time of data files: |
|
- Archiving files (+ on-the-fly
checking and processing). |
|
- Retrieving & on-the-fly
processing of files. |
|
- Ensuring data consistency. |
|
- Providing services for managing
data. |
|
- (Executing complex, parallel data
processing - TBD) |
|
|
|
In addition, to provide a system: |
|
- Which is adaptable to specific
contexts. |
|
- With high performance and scalability. |
NGAS: History
|
|
|
History of NGAS: |
|
|
|
April 2001: Project started. |
|
Mid June 2001: First operational
prototype. |
|
June 2001: Review + approval of
design/concept. |
|
Beginning July 2001:
Installation/commissioning at La Silla (2.2m/WFI). |
|
Mid July 2001: Entered operation at La
Silla. |
|
August 2001: Started operation of
Garching NGAS Cluster. |
|
February 2001: Upgrade from Suse to
RedHat Linux. |
|
August 2003:
Installation/commissioning at Paranal (VLTI). |
|
January 2004: Installation of second
archive system for 3.6m/LS. |
|
March 2004: First integration of NGAS
on new HW (SATA). |
|
September 2004: First tests using NGAS
together with RAID5 Arrays. |
|
September 2004: Archiving of HARPS
pipeline products. |
|
December 2004: Archiving of WFCAM
frames from Cambridge/UK. |
NGAS: Components
|
|
|
Main Components of the NGAS Project: |
|
|
|
1. NGAS SW – NG/AMS (Next Generation
Archive Management System). |
|
2. NGAS WEB Interfaces. |
|
3. HW – (low cost) PCs with removable
ATA disks. |
|
4. NGAS OS (Linux). |
|
5. NGAS Utilities. |
|
6. NGAS Installation and Configuration
Tools. |
|
|
NG/AMS: Basic Concepts
|
|
|
|
Basic Concepts of the NGAS SW (NG/AMS): |
|
NG/AMS is a platform/framework
providing basic services. |
|
No information is hard-coded to support
specific types of data – NG/AMS ‘does not know’ what e.g. a FITS file is. |
|
No information is hard-coded to support
specific HW configurations. |
|
The specific behavior and the specific
knowledge have to be added to the NGAS system – customizable. |
|
Based on standard protocols and formats
wherever possible – can be used as a building block. |
|
Simple - advanced features can be added
in front-end applications that give clients a different view of the data and
provide specific services. |
NG/AMS: Main Features/1
|
|
|
|
Main Features of NG/AMS (1): |
|
Multi-threaded server. |
|
Standard communication protocol (HTTP)
+ HTTP Authentication. |
|
Data file archiving via Push and Pull
Techniques. |
|
Subscription Service including filter
mechanism. |
|
DB synchronization (DB Snapshot
Feature). |
|
Easy adaptation to different kinds of
DBMSs (ANSI SQL Engine/DB Driver). |
|
Flexible/adaptable due to usage of 10
different kinds of plug-ins. |
|
Many configurable parameters. |
|
XML information exchange. |
|
Email Notification Service. |
NG/AMS: Main Features/2
|
|
|
|
Main Features of NG/AMS (2): |
|
Advanced logging service (Verbose,
Local Log File, Syslog). |
|
Background Data Consistency Checking. |
|
Operation in Cluster Mode. |
|
Transparent data retrieval &
on-the-fly processing. |
|
APIs in ANSI-C and Python + two client
applications based on these. |
|
Archive Client for secure and simple,
remote data file archiving. |
|
Many commands to interact with and
control the system. |
|
Portable. |
|
Unit/Functional Tests. |
|
|
NG/AMS: Server
NG/AMS: Storage Media
Infrastructure
|
|
|
Basic Infrastructure of Storage Media: |
NG/AMS: XML Information
Exchange
|
|
|
Interprocess Data Exchange: |
|
- Most information exchanged between
NG/AMS Servers, and between the NG/AMS Server and clients, is based on XML. |
|
- Example, NgasDiskInfo Document
(NG/AMS Status XML Document): |
NG/AMS: HTTP Command
Interface
NG/AMS: DB
Synchronization
|
|
|
|
DB Synchronization: |
|
NGAS DBs replicated from Paranal/La
Silla to Garching (Unidirectional). |
|
Synchronization between DBs of the
various NGAS sites also carried out by NGAS. |
|
NG/AMS maintains snapshot (DBM) on the
disks with info about the files stored on it. |
|
Local DB synchronized with this info
when the disk reappears on a site. |
|
DB Snapshot can be used as a table of
contents for the disk. |
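The snapshot mechanism described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard-library dbm.dumb module stands in for whatever DBM format NG/AMS actually uses, a plain dictionary stands in for the site DB, and the function names and the record layout are hypothetical.

```python
import dbm.dumb
import json


def write_snapshot(snapshot_path, files):
    """Store one record per file in a DBM kept on the disk itself.

    files: {file_id: info_dict}. The record layout is an assumption.
    """
    with dbm.dumb.open(snapshot_path, "c") as db:
        for file_id, info in files.items():
            db[file_id] = json.dumps(info)


def sync_from_snapshot(snapshot_path, local_db):
    """Add every file found in the snapshot but unknown to the local DB.

    local_db is a dict standing in for the site DB. Returns the sorted
    list of file IDs that were added.
    """
    added = []
    with dbm.dumb.open(snapshot_path, "r") as db:
        for key in db.keys():
            file_id = key.decode("utf-8")
            if file_id not in local_db:
                local_db[file_id] = json.loads(db[key])
                added.append(file_id)
    return sorted(added)
```

Because the snapshot travels with the disk, listing its keys also gives the table of contents mentioned above without consulting any DB.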
|
|
NG/AMS: Plug-Ins
|
|
|
NG/AMS Plug-Ins: |
|
Ten different kinds of plug-ins
provided. These make it possible to adapt the system to different kinds of
hardware and different types of data – nothing is hard-coded: |
|
1. Online Plug-In. |
|
2. Offline Plug-In. |
|
3. Data Archiving Plug-In. |
|
4. Checksum Plug-In. |
|
5. Data Processing Plug-In. |
|
6. Registration Plug-In. |
|
7. Label Printer Plug-In. |
|
8. Filter Plug-In. |
|
9. Suspension Plug-In. |
|
10. Wake-Up Plug-In. |
|
Standard plug-ins delivered with the
system. Possible to replace these or add new plug-ins when needed. |
|
The plug-ins delivered with a
distribution of NGAS should be viewed as belonging to the core of the system
when it comes to testing. |
|
Normal user does not need to know about
the plug-ins used. |
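One way to keep plug-ins out of the core code is to name them in the configuration and resolve them at run time. The sketch below assumes a hypothetical "module:callable" naming scheme and uses zlib.crc32 as a stand-in Checksum Plug-In only because it ships with Python; it does not show how NG/AMS itself declares or loads its plug-ins.

```python
import importlib

# Hypothetical configuration: plug-in kind -> "module:callable".
PLUGIN_CONFIG = {"checksum": "zlib:crc32"}


def load_plugin(spec):
    """Resolve a 'module:callable' string to the callable it names."""
    module_name, _, attr_name = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def run_checksum(data, config=PLUGIN_CONFIG):
    """Compute a checksum with whichever plug-in the configuration names."""
    plugin = load_plugin(config["checksum"])
    return plugin(data)
```

Replacing a standard plug-in then only means changing the configured name; the calling code stays untouched.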
NG/AMS: Plug-Ins
|
|
|
Data Archiving Plug-In – Basic
Functioning: |
NG/AMS: XML Configuration
|
|
|
|
|
NG/AMS Configuration (1): |
|
About 110 different configurable
parameters. |
|
Configuration can be loaded from an XML
document or from the DB or a combination of these. |
|
Possible to re-use DB based parameters
to compose specific configurations (easier to handle many, slightly different
installations). |
|
Main groups of configurable parameters
(1): |
|
Basic Parameters: Port number,
simulation mode, proxy mode, root mount point, … |
|
Plug-Ins: The various plug-ins the
system should use e.g. to handle data of a specific type. |
|
DB Connection: The DB connection
parameters. |
|
Permissions: Archive, Retrieve,
Processing, Remove Requests allowed. |
|
Archive Handling Parameters: Parameters
for handling Archive Requests. |
|
Accepted Data Types: Types of data
(mime-types) the system can handle. |
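Combining an XML document with DB-based parameters could work roughly as sketched below. The XML layout, the "Group.parameter" key format and the rule that DB values override XML values are all assumptions made for illustration; the actual NG/AMS configuration schema and precedence are not given on this slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one element per parameter group, parameters as attributes.
EXAMPLE_XML = """
<Config>
  <Basic port="7777" simulation="0"/>
  <Permissions archive="1" remove="0"/>
</Config>
"""


def load_config(xml_text, db_params=None):
    """Flatten the XML into {'Group.parameter': value} and overlay DB values.

    db_params: optional dict of parameters read from the DB. In this sketch
    they take precedence over the XML document (an assumption).
    """
    root = ET.fromstring(xml_text.strip())
    config = {}
    for group in root:
        for name, value in group.attrib.items():
            config[f"{group.tag}.{name}"] = value
    config.update(db_params or {})
    return config
```

For example, load_config(EXAMPLE_XML, {"Basic.port": "8001"}) keeps the permissions from the XML document and takes the port from the DB, which is the kind of reuse that helps with many slightly different installations.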
NG/AMS: XML Configuration
|
|
|
|
|
NG/AMS Configuration (2): |
|
Main groups of configurable parameters
(2): |
|
Storage Sets: The disk configuration. |
|
Streams: Defines how the different kinds
of data should be streamed onto the Storage Sets. |
|
Available Processing Capabilities:
Defines the types of data that can be processed and which Data Processing
Plug-Ins to use. |
|
Data Check/Janitor Thread
Configuration: Parameters to tune the Data Checking and Janitor Threads. |
|
Logging Parameters: E.g. name of log
files + intensity to apply when logging. |
|
Email Notification Parameters:
Recipients of the various types of Email Notification Messages. |
|
Host Suspension Parameters: Parameters
for suspending a host + for waking up suspended hosts. |
|
Subscription Parameters: Parameters to
define if a server should subscribe for data. |
|
Authorization Parameters: Defines the
known users and their access code. |
NG/AMS: Data Consistency
Checking
|
|
|
Data Consistency Checking: |
|
Necessary to constantly monitor the
condition of the data in the archive. |
|
Data Consistency Checking – Thread
running in background. |
|
Possible to tune the amount of
resources occupied by the service. |
|
A check run can be scheduled to run
periodically via the configuration. |
|
Checksum check, file availability,
unregistered files on storage media. |
|
A check sub-thread is started per disk
(max. number configurable). |
|
Info about files on the system dumped
once in a DBM, retrieved file by file during checking. |
|
Possible to resume a check from
where the previous one was interrupted. |
|
Email Notification sent to subscribers
if problems are found, e.g.: |
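The three checks named on this slide (checksum, file availability, unregistered files), run with one task per disk and a configurable upper limit, can be sketched in Python as follows. CRC-32 as the checksum, a dictionary as the list of registered files, and all function names are assumptions; the real service reads its file list from a DBM and is tuned via the configuration.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def crc32_of(path, chunk_size=65536):
    """Compute the CRC-32 of a file without loading it into memory at once."""
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def check_disk(mount_point, registered):
    """Check one disk. registered: {relative_path: expected_crc32}."""
    problems = []
    for rel_path, expected in registered.items():
        full_path = os.path.join(mount_point, rel_path)
        if not os.path.isfile(full_path):
            problems.append(("MISSING", rel_path))
        elif crc32_of(full_path) != expected:
            problems.append(("BAD_CHECKSUM", rel_path))
    for root, _dirs, names in os.walk(mount_point):
        for name in names:
            rel_path = os.path.relpath(os.path.join(root, name), mount_point)
            if rel_path not in registered:
                problems.append(("UNREGISTERED", rel_path))
    return sorted(problems)


def check_all_disks(disks, max_threads=4):
    """disks: {mount_point: registered}. One task per disk, bounded pool."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = pool.map(lambda item: check_disk(*item), disks.items())
        return dict(zip(disks, results))
```

A non-empty problem list for any disk is what would trigger the notification to subscribers.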
NG/AMS: Operation in
Cluster Mode/1
NG/AMS: Operation in
Cluster Mode/2
Garching NGAS Cluster
NG/AMS: Data Processing
|
|
|
|
Data Processing at Retrieval: |
|
|
|
Simple processing supported when
retrieving files. |
|
Possible to request the system to apply
a Processing Plug-In on the data and to send back the result of the plug-in
rather than the data itself. |
|
Processing performed on the sub-node
hosting the data. |
|
Possible for clients to use the NGAS
Cluster as a ‘number cruncher’ to carry out parallel data processing in a
simple manner. |
|
Reduces the amount of data to be
transferred to the client. E.g., a single floating point number may be returned
rather than the entire data file. |
|
Can be extended by providing new Data
Processing Plug-Ins for specific contexts. |
|
Could be used to integrate NGAS with
the AVO or other archive services. |
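The idea of returning the result of a plug-in instead of the file can be illustrated with a small Python sketch. The file format (raw little-endian 64-bit floats), the plug-in and the retrieve function are hypothetical and chosen only to keep the example self-contained; they are not the NG/AMS interfaces.

```python
import struct


def mean_of_float64(data):
    """Hypothetical Data Processing Plug-In: mean of little-endian float64s."""
    count = len(data) // 8
    if count == 0:
        raise ValueError("no complete float64 value in data")
    values = struct.unpack(f"<{count}d", data[:count * 8])
    return sum(values) / count


def retrieve(path, processing_plugin=None):
    """Return the file contents, or the plug-in result if one is requested."""
    with open(path, "rb") as fh:
        data = fh.read()
    if processing_plugin is None:
        return data
    return processing_plugin(data)
```

With the plug-in applied on the node hosting the file, only one number travels to the client, however large the file is.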
|
|
NG/AMS: APIs
|
|
|
|
|
NG/AMS APIs + Clients: |
|
Two APIs implemented in C (C library)
and Python (class) provided. |
|
Facilitates implementation of client
applications communicating with NGAS, e.g. to retrieve data files. |
|
Two command line utilities are
provided, based on the C and Python APIs, which can be used to interact with
an NG/AMS Server. |
|
A standalone Archive Client is
provided, based on the C-API: |
|
Independent of any DBMS. |
|
Can be used to archive files from any
remote host which can access the NGAS Archive via HTTP. |
|
Attempts to archive a file are retried
until success is returned or the file is classified as bad by the remote NGAS
system. |
|
Files are not cleaned up until it has been
cross-checked that they are really in the remote NGAS Archive (CHECKFILE
Command). |
|
First applications: Archiving of HARPS
pipeline products and WFCAM files from Cambridge/UK. |
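The retry and clean-up behavior of the Archive Client described on this slide can be sketched as a small loop. The real client is based on the C-API; this Python version only illustrates the logic, and the status strings, the callback parameters and the function name are hypothetical.

```python
import itertools
import time


def archive_with_retry(path, send_archive, check_file, remove_local,
                       retry_delay=0.0, max_attempts=None):
    """Retry archiving until it succeeds or the file is rejected as bad.

    send_archive(path) -> "SUCCESS", "BAD_FILE" or any other status (retry).
    check_file(path)   -> True if the remote archive confirms the file.
    remove_local(path) -> called only after a successful cross-check.
    max_attempts=None means retry indefinitely.
    """
    if max_attempts is None:
        attempts = itertools.count(1)
    else:
        attempts = range(1, max_attempts + 1)
    for _attempt in attempts:
        status = send_archive(path)
        if status == "BAD_FILE":
            return "REJECTED"  # The local copy is deliberately left in place.
        if status == "SUCCESS" and check_file(path):
            remove_local(path)
            return "ARCHIVED"
        time.sleep(retry_delay)
    return "GAVE_UP"
```

Passing the network operations in as callbacks keeps the loop independent of any DBMS or transport, and makes it easy to exercise with stand-in functions.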
NG/AMS Client
Applications
NG/AMS: Server Commands
|
|
|
|
NG/AMS Server Commands (HTTP Protocol): |
|
Commands issued as URLs: http://<Host>:<Port>/<Command>[?<Par=Val>[&<Par=Val>]] |
|
Commands: |
|
ARCHIVE: Archive data with Archive
Push or Archive Pull Technique. |
|
CHECKFILE: Execute an explicit file
check of the given file. |
|
CLONE: Clone an entire disk or
individual files. |
|
CONFIG: Configure an online system. |
|
DISCARD: Force removal of file from
disk and/or DB independent of number of copies. |
|
EXIT: Make the NG/AMS Server exit. |
|
INIT: Re-initialize the NG/AMS Server. |
|
LABEL: Print out disk labels. |
|
OFFLINE: Bring server to Offline State. |
|
ONLINE: Bring server Online. |
|
REGISTER: Register a file or a set of
files already stored on an ‘NGAS Disk’. |
|
REMDISK: Remove a disk from the archive
(only allowed if at least 3 copies of each file are available). |
|
REMFILE: Remove a file from the
archive. |
|
RETRIEVE: Retrieve a file,
transparently, from the archive. |
|
STATUS: Query status about the server
or another component in the NGAS system/cluster. |
|
SUBSCRIBE: Subscribe to new data or a
set of data. |
|
UNSUBSCRIBE: Unsubscribe a previously
created subscription. |
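Since every command is an ordinary URL of the form shown above, a client only needs to compose a string and issue an HTTP request. The Python sketch below builds such URLs; the host name, the port number and the parameter names (file_id, file_version) are placeholders chosen for illustration, not values taken from this presentation.

```python
from urllib.parse import urlencode


def build_command_url(host, port, command, **params):
    """Compose http://<Host>:<Port>/<Command>[?<Par=Val>[&<Par=Val>]]."""
    url = f"http://{host}:{port}/{command}"
    if params:
        # urlencode escapes values and joins the pairs with '&'.
        url += "?" + urlencode(params)
    return url


# Placeholder host, port and parameter names.
status_url = build_command_url("ngas.example.org", 7777, "STATUS")
retrieve_url = build_command_url("ngas.example.org", 7777, "RETRIEVE",
                                 file_id="X.fits", file_version=1)
```

The resulting strings can be passed to any HTTP client, for instance urllib.request.urlopen, or pasted into a WEB browser.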
Unit/Functional Tests -
Features
|
|
|
|
Unit/Functional Tests: |
|
Extensive set of automatic tests
provided, consisting of: |
|
30 Test Suites. |
|
~130 Test Cases. |
|
Tests portable (platform/HW
independent). |
|
Testing the business logic of the
system and correct functioning (simulation mode). |
|
Need to add more Test Cases for testing
correct and consistent behavior under abnormal conditions and stress tests. |
|
Needs to be enhanced with ~200 Test
Cases before next release. |
|
Possible to generate Test Plan from
test code (next slide - overhaul ongoing). |
Unit/Functional Tests -
Test Plan
NGAS WEB Interfaces
|
|
|
|
NGAS WEB Interfaces: |
|
WEB Interfaces provided to assist
operators in querying the status of the system and to search for various
components (data files, disks, machines). |
|
Used at all sites by the operators
(Garching, Paranal, La Silla). |
|
Based on Zope. WEB management system
providing editing via WEB browser (http://www.zope.org). |
|
Local Zope WEB Servers available on
each site. |
|
Tools provided to list disks, find
specific files, and get an overview of the nodes and their status. |
|
Also the so-called Operator’s Log Book
is provided. The operators use this to log all actions carried out. |
|
Used by the operators at Paranal/La
Silla to monitor the online archiving activities. |
|
Services missing for interacting with
the system. Only possible to control the disk label printing for now. |
|
An enhancement is planned in the near
future. |
NGAS System/OS
|
|
|
NGAS OS Distribution: |
|
Started on a Suse Linux distribution
and migrated to RedHat Linux (ESO standardization). |
|
OS distribution prepared/managed by
OTS-SOS. |
|
Support for single-processor and
multi-processor configurations. |
|
Support for old HW (PATA) and new HW
(SATA). |
|
Limited installation, many packages
removed to reduce the size of the system. |
|
Special packages needed by NGAS:
Python, Sybase interface, Zope, … - installed by the NGAS Installation Tool. |
|
Special driver SW needed for the 3ware
controller. |
|
Zope WEB server running on some nodes
(optional). |
|
3ware disk controller WEB server
running on every host. |
|
Possibility to back-up/restore complete
system by means of the Mondo/Mindi tool kit (from a single CDROM) in 10
minutes. |
|
From July 2004 NGAS OS platform
installed with kickstart installation script. |
NGAS HW
|
|
|
NGAS HW (1): |
|
Started with 8-slot parallel ATA
systems. |
|
8 x 80 GB storage capacity per node
(640 GB/node, ~1.2 TB compressed). |
|
Since March 2004 a 24 slot serial ATA
system in operation (up to 24 * 400 GB = 9.6 TB/node, 19.2 TB compressed). |
|
Reduces price per GB. |
|
More robust HW, among other things due to
serial ATA (cleaner cabling). |
|
Disk handling easier, more robust disk
frames. |
|
Overall HW stability (hopefully) better
and less intervention needed (TBC). |
|
Amount of data/CPU should be balanced
to be able to process the data in a limited time. |
|
TBD when to use new HW in operation at
observatory sites. |
|
Investigating usage of RAID5 rather
than JBOD disks. |
|
|
NGAS HW
NGAS Utilities
|
|
|
|
NGAS Operator’s Utilities/Installation
Utilities: |
|
Small module provided (NGAS Utilities)
with utilities for the daily work of the operators: |
|
Limited time invested in this so far,
however essential tools for the operation provided (e.g. Clone Verification
Tool, Check File List Tool, Clone File List Tool, …). |
|
The function of many of these tools
should be taken over by the NGAS WEB Interfaces when these have been
enhanced. |
|
The module NGAS Installation Tools
provides some utilities to install and check the system: |
|
Tool provided to build ‘NGAS layer’ on
top of the ‘basic’ NGAS Linux distribution. |
|
Functionality still to be implemented. |
NGAS Infrastructure
|
|
|
Present ESO NGAS Infrastructure: |
NGAS: Future Plans
|
|
|
|
(Near) Future Plans for NGAS: |
|
Received detailed requirements from
archive operations. |
|
Enhance NGAS WEB Management Interfaces. |
|
Enhancement of services for operation
in cluster (extended proxy mode). |
|
Enhancement of installation utilities. |
|
Enhancement of unit tests (simulation
of archive cluster operation). |
|
Implement load balancing/archive
cluster operation for high availability/high data rates (VST/ΩCam: up
to 300 GB/night, VISTA/VistaCAM up to 1 TB/night - TBC). |
|
|
|
Support for advanced data processing,
utilizing an NGAS Cluster as a parallel processing engine (specify complex
recipes that execute parallel data processing) – will be analyzed in
the near future. |
|
Support for the Astrophysical Virtual
Observatory/GRID? |
Status - December 2004
|
|
|
|
Status of NGAS Project December 2004: |
|
In operation since July 2001. |
|
Used heavily on a daily basis by
archive operators in Garching. |
|
Data archived daily at La Silla,
Paranal and at ESO HQ. |
|
Data archived directly into NGAS
Archive in Garching from Paranal and Cambridge/WFCAM. |
|
Some statistics: |
|
Total number of nodes: ~25. |
|
Total number of disks in use: ~260. |
|
Total number of files in NGAS Archive:
~1,500,000. |
|
Amount of compressed data in NGAS
Archive: ~27 TB. |
|
Amount of uncompressed data in NGAS
Archive: ~45 TB. |
|
Maximum throughput per node
(archiving): ~400 GB/24 hours (including compression). |
|
Major Issues to Address: |
|
Need to invest more resources in
implementing automatic tests in particular for testing robustness and
handling of abnormal conditions. |
|
Need to invest resources in
implementing an enhanced user interface - not very user-friendly at the moment. |
|
Need to update the design document to
reflect present status of system (not updated since it was written in spring
2001). |
|
Should investigate improved ways of
ensuring data consistency and means for recovering lost data. |