1
|
|
2
|
- Motivation for NGAS:
- Handle huge amounts of streaming data in real time.
- Reduce operational costs (man-power).
- Decrease expenses in general.
- Provide online and offline processing capabilities.
- Ease integration of archive facility with external clients/applications.
- Provide a common concept for the online archive and the long-term
storage facilities (NGAS ≈ OLAS + ASTO + Jukebox SW + more). Note,
no plan to replace OLAS for now.
- Simplify and unify the overall infrastructure of the archive system.
- Increase data security.
|
3
|
- Main Objectives of NGAS:
- Provide an archive facility with services for handling all stages in the
life-time of data files:
- - Archiving files (+ on-the-fly checking and processing).
- - Retrieving & on-the-fly processing of files.
- - Ensuring data consistency.
- - Providing services for managing data.
- - (Executing complex, parallel data processing - TBD)
- In addition, to provide a system:
- - Which is adaptable to specific contexts.
- - With a high performance + scalable.
|
4
|
- History of NGAS:
- April 2001: Project started.
- Mid June 2001: First operational prototype.
- June 2001: Review + approval of design/concept.
- Beginning July 2001: Installation/commissioning at La Silla (2.2m/WFI).
- Mid July 2001: Entered operation at La Silla.
- August 2001: Started operation of Garching NGAS Cluster.
- February 2001: Upgrade from Suse to RedHat Linux.
- August 2003: Installation/commissioning at Paranal (VLTI).
- January 2004: Installation of second archive system for 3.6m/LS.
- March 2004: First integration of NGAS on new HW (SATA).
- September 2004: First tests using NGAS together with RAID5 Arrays.
- September 2004: Archiving of HARPS pipeline products.
- December 2004: Archiving of WFCAM frames from Cambridge/UK.
|
5
|
- Main Components of the NGAS Project:
- 1. NGAS SW – NG/AMS (Next Generation Archive Management System).
- 2. NGAS WEB Interfaces.
- 3. HW – (low cost) PCs with removable ATA disks.
- 4. NGAS OS (Linux).
- 5. NGAS Utilities.
- 6. NGAS Installation and Configuration Tools.
|
6
|
- Basic Concepts of the NGAS SW (NG/AMS):
- NG/AMS is a platform/framework providing basic services.
- No information is hard-coded to support specific types of data – NG/AMS
‘does not know’ what e.g. a FITS file is.
- No information is hard-coded to support specific HW configurations.
- The specific behavior and the specific knowledge has to be added to the
NGAS system – customizable.
- Based on standard protocols and formats wherever possible – can be used
as a building block.
- Simple - advanced features can be added in front-end applications
giving clients a different view of the data + provide specific
services.
|
7
|
- Main Features of NG/AMS (1):
- Multi-threaded server.
- Standard communication protocol (HTTP) + HTTP Authentication.
- Data file archiving via Push and Pull Techniques.
- Subscription Service including filter mechanism.
- DB synchronization (DB Snapshot Feature).
- Easy adaptation to different kinds of DBMSs (ANSI SQL Engine/DB
Driver).
- Flexible/adaptable due to usage of 10 different kinds of plug-ins.
- Many configurable parameters.
- XML information exchange.
- Email Notification Service.
|
8
|
- Main Features of NG/AMS (2):
- Advanced logging service (Verbose, Local Log File, Syslog).
- Background Data Consistency Checking.
- Operation in Cluster Mode.
- Transparent data retrieval & on-the-fly processing.
- APIs in ANSI-C and Python + two client applications based on these.
- Archive Client for secure and simple, remote data file archiving.
- Many commands to interact with and control the system.
- Portable.
- Unit/Functional Tests.
|
9
|
|
10
|
- Basic Infrastructure of Storage Media:
|
11
|
- Interprocess Data Exchange:
- - Most information exchanged between NG/AMS Servers, and between the
NG/AMS Server and clients, is based on XML.
- - Example, NgasDiskInfo Document (NG/AMS Status XML Document):
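A hypothetical sketch of consuming such a status document with Python's standard library. The element and attribute names below are invented for illustration; they are not the actual NG/AMS NgasDiskInfo schema.

```python
# Parse a status XML document such as NgasDiskInfo.
# NOTE: the element/attribute names here are assumptions, not the
# real NG/AMS schema.
import xml.etree.ElementTree as ET

SAMPLE = """<NgasDiskInfo>
  <Status Date="2004-12-01T12:00:00" Version="1.0"/>
  <DiskStatus DiskId="disk-0001" Archive="ESO-ARCHIVE"
              AvailableMb="81920" NumberOfFiles="1234"/>
</NgasDiskInfo>"""

def disk_summary(xml_doc: str) -> dict:
    """Extract a few disk attributes from the status document."""
    disk = ET.fromstring(xml_doc).find("DiskStatus")
    return {
        "disk_id": disk.get("DiskId"),
        "available_mb": int(disk.get("AvailableMb")),
        "files": int(disk.get("NumberOfFiles")),
    }

print(disk_summary(SAMPLE))
```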
|
12
|
|
13
|
- DB Synchronization:
- NGAS DBs replicated from Paranal/La Silla to Garching (Unidirectional).
- Synchronization between DBs of the various NGAS sites also carried out
by NGAS.
- NG/AMS maintains snapshot (DBM) on the disks with info about the files
stored on it.
- Local DB synchronized with this info when the disk reappears on a site.
- DB Snapshot can be used as a table of contents for the disk.
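A minimal sketch of the DB-snapshot idea: each disk carries a small DBM file with one record per archived file, which can later be replayed into the site DB when the disk reappears. The file layout, key and record formats below are invented for illustration.

```python
# Write a per-disk snapshot (DBM) while archiving, then replay it into
# the local DB. Key and record formats are assumptions.
import dbm.dumb
import json
import os
import tempfile

snapshot_path = os.path.join(tempfile.mkdtemp(), "NgasDbSnapshot")

# Writing the snapshot while files are archived onto the disk.
snap = dbm.dumb.open(snapshot_path, "c")
snap["file-0001"] = json.dumps({"path": "saf/file-0001.fits", "size": 2880})
snap["file-0002"] = json.dumps({"path": "saf/file-0002.fits", "size": 5760})
snap.close()

# When the disk reappears on a site, replay the snapshot into the local
# DB (a plain dict stands in for it here).
local_db = {}
snap = dbm.dumb.open(snapshot_path, "r")
for key in snap.keys():
    local_db[key.decode()] = json.loads(snap[key].decode())
snap.close()

print(sorted(local_db))  # ['file-0001', 'file-0002']
```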
|
14
|
- NG/AMS Plug-Ins:
- Ten different kinds of plug-ins provided. These make it possible to
adapt the system to different kinds of hardware and different types of
data – nothing is hard-coded:
- 1. Online Plug-In.
- 2. Offline Plug-In.
- 3. Data Archiving Plug-In.
- 4. Checksum Plug-In.
- 5. Data Processing Plug-In.
- 6. Registration Plug-In.
- 7. Label Printer Plug-In.
- 8. Filter Plug-In.
- 9. Suspension Plug-In.
- 10. Wake-Up Plug-In.
- Standard plug-ins delivered with the system. Possible to replace these
or add new plug-ins when needed.
- The plug-ins delivered with a distribution of NGAS should be viewed as
belonging to the core of the system when it comes to testing.
- Normal user does not need to know about the plug-ins used.
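A hypothetical sketch of the plug-in idea: behavior is looked up by name at run time instead of being hard-coded into the server. The registry, the decorator and the example Filter Plug-In below are invented for illustration; they are not the actual NG/AMS plug-in API.

```python
# Invented plug-in registry: the server resolves behavior by
# (kind, name) at run time; nothing is hard-coded.
PLUGIN_REGISTRY = {}

def register_plugin(kind: str, name: str):
    """Decorator that registers a plug-in under (kind, name)."""
    def decorator(func):
        PLUGIN_REGISTRY[(kind, name)] = func
        return func
    return decorator

@register_plugin("filter", "mime_type_filter")
def mime_type_filter(file_info: dict, accepted=("image/x-fits",)) -> bool:
    """Example Filter Plug-In: accept a file only if its mime-type matches."""
    return file_info.get("mime_type") in accepted

def run_plugin(kind: str, name: str, *args, **kwargs):
    return PLUGIN_REGISTRY[(kind, name)](*args, **kwargs)

print(run_plugin("filter", "mime_type_filter", {"mime_type": "image/x-fits"}))
```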
|
15
|
- Data Archiving Plug-In – Basic Functioning:
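The figure on this slide is not reproduced here. As a rough sketch of the basic functioning of a Data Archiving Plug-In (derive metadata and a target name for an incoming file), with an invented return structure:

```python
# Invented example of a Data Archiving Plug-In's job: map a staged
# file to the metadata the server needs. The return structure is an
# assumption, not the real NG/AMS interface.
import os

def example_archive_plugin(staging_path: str, mime_type: str) -> dict:
    base = os.path.basename(staging_path)
    return {
        "file_id": os.path.splitext(base)[0],    # derive a File ID
        "mime_type": mime_type,
        "target_name": base,                     # name under the mount point
        "compress": mime_type == "image/x-fits", # whether to compress on disk
    }

print(example_archive_plugin("/staging/WFI.2004-12-01.fits", "image/x-fits"))
```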
|
16
|
- NG/AMS Configuration (1):
- About 110 different configurable parameters.
- Configuration can be loaded from an XML document or from the DB or a
combination of these.
- Possible to re-use DB based parameters to compose specific
configurations (easier to handle many, slightly different
installations).
- Main groups of configurable parameters (1):
- Basic Parameters: Port number, simulation mode, proxy mode, root mount
point, …
- Plug-Ins: The various plug-ins the system should use e.g. to handle
data of a specific type.
- DB Connection: The DB connection parameters.
- Permissions: Archive, Retrieve, Processing, Remove Requests allowed.
- Archive Handling Parameters: Parameters for handling Archive Requests.
- Accepted Data Types: Types of data (mime-types) the system can
handle.
|
17
|
- NG/AMS Configuration (2):
- Main groups of configurable parameters (2):
- Storage Sets: The disk configuration.
- Streams: Defines how the different kind of data should be streamed
onto the Storage Sets.
- Available Processing Capabilities: Defines the types of data that can
be processed and which Data Processing Plug-Ins to use.
- Data Check/Janitor Thread Configuration: Parameters to tune the Data
Checking and Janitor Threads.
- Logging Parameters: E.g. name of log files + intensity to apply when
logging.
- Email Notification Parameters: Recipients of the various types of
Email Notification Messages.
- Host Suspension Parameters: Parameters for suspending a host + for
waking up suspended hosts.
- Subscription Parameters: Parameters to define if a server should
subscribe for data.
- Authorization Parameters: Defines the known users and their access
code.
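To give a feel for how some of these groups might be combined, a hypothetical XML fragment (element and attribute names are invented; they are not the actual NG/AMS configuration schema):

```xml
<!-- Illustrative only: element/attribute names are assumptions. -->
<NgamsCfg>
  <Server PortNo="7777" ProxyMode="0" RootDirectory="/NGAS"/>
  <Db Server="ESOECF" Name="ngas" User="ngas_op"/>
  <Permissions AllowArchiveReq="1" AllowRetrieveReq="1"
               AllowProcessingReq="1" AllowRemoveReq="0"/>
  <MimeTypes>
    <MimeTypeMap MimeType="image/x-fits" Extension="fits"/>
  </MimeTypes>
</NgamsCfg>
```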
|
18
|
- Data Consistency Checking:
- It is necessary to monitor the condition of the data in the archive
constantly.
- Data Consistency Checking – Thread running in background.
- Possible to tune the amount of resources occupied by the service.
- A check run can be scheduled to run periodically via the configuration.
- Checksum check, file availability, unregistered files on storage media.
- A check sub-thread is started per disk (max. number configurable).
- Info about files on the system dumped once in a DBM, retrieved file by
file during checking.
- Possible to resume a check from where the previous one was interrupted.
- Email Notification sent to subscribers in case problems are found, e.g.:
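A sketch of the core of such a check: recompute each file's checksum and compare it with the value registered at archiving time. The use of MD5 here is an assumption for illustration; NG/AMS delegates the actual algorithm to a Checksum Plug-In.

```python
# Recompute checksums and report missing or corrupted files.
# MD5 is an assumption; the real algorithm comes from a Checksum Plug-In.
import hashlib
import os
import tempfile

def checksum(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def check_files(registered: dict) -> list:
    """Return the files that are missing or whose checksum no longer matches."""
    return [path for path, expected in registered.items()
            if not os.path.exists(path) or checksum(path) != expected]

def write(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)

# Tiny demonstration: one intact file, one corrupted after registration.
d = tempfile.mkdtemp()
ok, bad = os.path.join(d, "ok.dat"), os.path.join(d, "bad.dat")
write(ok, b"original data")
write(bad, b"original data")
registered = {ok: checksum(ok), bad: checksum(bad)}
write(bad, b"silent corruption")
print(check_files(registered))  # only the corrupted file is reported
```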
|
19
|
|
20
|
|
21
|
|
22
|
- Data Processing at Retrieval:
- Simple processing supported when retrieving files.
- Possible to request the system to apply a Processing Plug-In on the
data and to send back the result of the plug-in rather than the data
itself.
- Processing performed on the sub-node hosting the data.
- Possible for clients to use the NGAS Cluster as a ‘number cruncher’ to
carry out parallel data processing in a simple manner.
- Reduces the amount of data to be transferred to the client. E.g., a
floating-point number may be returned rather than the entire data file.
- Can be extended by providing new Data Processing Plug-Ins for specific
contexts.
- Could be used to integrate NGAS with the AVO or other archive services.
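A sketch of requesting on-the-fly processing at retrieval time. The overall URL shape follows the HTTP command syntax used by NG/AMS; the parameter names "file_id" and "processing", and the plug-in name, are assumptions for illustration.

```python
# Build a RETRIEVE URL that asks the server to run a Processing Plug-In
# and return its result instead of the raw file. Parameter and plug-in
# names are invented for illustration.
from urllib.parse import urlencode

def build_retrieve_url(host: str, port: int, file_id: str,
                       plugin: str = None) -> str:
    params = {"file_id": file_id}
    if plugin:
        params["processing"] = plugin  # result of the plug-in is returned
    return f"http://{host}:{port}/RETRIEVE?{urlencode(params)}"

url = build_retrieve_url("ngas01", 7777, "WFI.2004-12-01.fits",
                         plugin="ngamsExtractStatPlugIn")
print(url)
```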
|
23
|
- NG/AMS APIs + Clients:
- Two APIs implemented in C (C library) and Python (class) provided.
- Facilitates implementation of client applications communicating with
NGAS, e.g. to retrieve data files.
- Two command line utilities are provided, based on the C and Python API,
which can be used to interact with an NG/AMS Server.
- A standalone Archive Client is provided, based on the C-API:
- Independent of any DBMS.
- Can be used to archive files from any remote host which can access the
NGAS Archive via HTTP.
- Attempts to archive a file are retried until success is returned or
the file is classified as bad by the remote NGAS system.
- Files are not cleaned up before cross-checking that they are really in
the remote NGAS Archive (CHECKFILE Command).
- First applications: Archiving of HARPS pipeline products and WFCAM
files from Cambridge/UK.
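A sketch of the retry behavior described above: an archive attempt is repeated until success is returned or the file is classified as bad by the remote system. The status strings and the submit callback are invented for illustration; the real Archive Client is based on the C API.

```python
# Retry an archive request until it succeeds or the remote system
# classifies the file as bad. Status strings are assumptions.
import time

def archive_with_retry(submit, path, retry_wait=0.0, max_attempts=5):
    """submit(path) must return 'SUCCESS', 'BAD_FILE' or 'RETRY'."""
    for _ in range(max_attempts):
        status = submit(path)
        if status in ("SUCCESS", "BAD_FILE"):
            return status
        time.sleep(retry_wait)  # back off before the next attempt
    return "GIVE_UP"

# Simulated server that fails twice, then accepts the file.
responses = iter(["RETRY", "RETRY", "SUCCESS"])
result = archive_with_retry(lambda p: next(responses), "/data/file.fits")
print(result)  # SUCCESS
```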
|
24
|
|
25
|
- NG/AMS Server Commands (HTTP Protocol):
- Commands issued as URLs: http://<Host>:<Port>/<Command>[?<Par=Val>[&<Par=Val>]]
- Commands:
- ARCHIVE: Archive data with Archive Push or Archive Pull Technique.
- CHECKFILE: Execute an explicit file check of the given file.
- CLONE: Clone an entire disk or individual files.
- CONFIG: Configure an online system.
- DISCARD: Force removal of file from disk and/or DB independent of
number of copies.
- EXIT: Make the NG/AMS Server exit.
- INIT: Re-initialize the NG/AMS Server.
- LABEL: Print out disk labels.
- OFFLINE: Bring server to Offline State.
- ONLINE: Bring server Online.
- REGISTER: Register a file or a set of files already stored on an ‘NGAS
Disk’.
- REMDISK: Remove a disk from the archive (only allowed if at least 3
copies of each file are available).
- REMFILE: Remove a file from the archive.
- RETRIEVE: Retrieve a file, transparently, from the archive.
- STATUS: Query status about the server or another component in the NGAS
system/cluster.
- SUBSCRIBE: Subscribe to new data or a set of data.
- UNSUBSCRIBE: Unsubscribe a previously created subscription.
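The command syntax above maps directly onto plain HTTP. A minimal sketch of composing such a request with the standard library (host and port are placeholders, parameter values are assumed to be URL-safe, and no server is actually contacted here):

```python
# Compose an NG/AMS command URL of the form
# http://<Host>:<Port>/<Command>?<Par=Val>&... and wrap it in a Request.
from urllib.request import Request

def command_request(host: str, port: int, command: str, **params) -> Request:
    query = "&".join(f"{k}={v}" for k, v in params.items())
    url = f"http://{host}:{port}/{command}" + (f"?{query}" if query else "")
    return Request(url)

req = command_request("ngas01", 7777, "STATUS")
print(req.full_url)  # http://ngas01:7777/STATUS
```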
|
26
|
- Unit/Functional Tests:
- Extensive set of automatic tests provided, consisting of:
- 30 Test Suites.
- ~130 Test Cases.
- Tests portable (platform/HW independent).
- Testing the business logic of the system and correct functioning
(simulation mode).
- Need to add more Test Cases for testing correct and consistent behavior
under abnormal conditions and stress tests.
- Needs to be enhanced with ~200 Test Cases before next release.
- Possible to generate Test Plan from test code (next slide - overhaul
ongoing).
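A flavor of such a functional test, sketched with the standard unittest module. The server stub below is invented; the real Test Suites drive an NG/AMS Server running in simulation mode.

```python
# Invented stand-in for an NG/AMS Server, exercised by a unittest
# Test Case the way the real Test Suites exercise the server.
import unittest

class StubServer:
    """Stand-in for an NG/AMS Server in simulation mode."""
    def __init__(self):
        self.state = "OFFLINE"

    def command(self, cmd: str) -> str:
        if cmd in ("ONLINE", "OFFLINE"):
            self.state = cmd
        return self.state

class OnlineOfflineTest(unittest.TestCase):
    def test_online_offline_cycle(self):
        srv = StubServer()
        self.assertEqual(srv.command("ONLINE"), "ONLINE")
        self.assertEqual(srv.command("OFFLINE"), "OFFLINE")

suite = unittest.TestLoader().loadTestsFromTestCase(OnlineOfflineTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```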
|
27
|
|
28
|
- NGAS WEB Interfaces:
- WEB Interfaces provided to assist operators in querying the status of
the system and to search for various components (data files, disks,
machines).
- Used at all sites by the operators (Garching, Paranal, La Silla).
- Based on Zope. WEB management system providing editing via WEB browser
(http://www.zope.org).
- Local Zope WEB Servers available on each site.
- Tools provided to list disks, find specific files, and get an overview
of the nodes and their status.
- Also the so-called Operator’s Log Book is provided. The operators use
this to log all actions carried out.
- Used by the operators at Paranal/La Silla to monitor the online
archiving activities.
- Services missing for interacting with the system. Only possible to
control the disk label printing for now.
- An enhancement is planned in the near future.
|
29
|
- NGAS OS Distribution:
- Started on a Suse Linux distribution and migrated to RedHat Linux (ESO
standardization).
- OS distribution prepared/managed by OTS-SOS.
- Support for single-processor and multi-processor configurations.
- Support for old HW (PATA) and new HW (SATA).
- Limited installation, many packages removed to reduce the size of the
system.
- Special packages needed by NGAS: Python, Sybase interface, Zope, … -
installed by the NGAS Installation Tool.
- Special driver SW needed for the 3ware controller.
- Zope WEB server running on some nodes (optional).
- 3ware disk controller WEB server running on every host.
- Possibility to back-up/restore complete system by means of the
Mondo/Mindi tool kit (from a single CDROM) in 10 minutes.
- From July 2004 the NGAS OS platform is installed with a kickstart
installation script.
|
30
|
- NGAS HW (1):
- Started with 8-slot parallel ATA systems.
- 8 x 80 GB storage capacity per node (640 GB/node, ~1.2 TB compressed).
- Since March 2004 a 24-slot serial ATA system in operation (up to 24 ×
400 GB = 9.6 TB/node, 19.2 TB compressed).
- Reduces price per GB.
- More robust HW, amongst other things due to serial ATA (cleaner cabling).
- Disk handling easier, more robust disk frames.
- Overall HW stability (hopefully) better and less intervention needed
(TBC).
- Amount of data/CPU should be balanced to be able to process the data in
a limited time.
- TBD when to use new HW in operation at observatory sites.
- Investigating usage of RAID5 rather than JBOD disks.
|
31
|
|
32
|
- NGAS Operator’s Utilities/Installation Utilities:
- Small module provided (NGAS Utilities) with utilities for the daily work
of the operators:
- Limited time invested in this so far; however, essential tools for the
operation are provided (e.g. Clone Verification Tool, Check File List
Tool, Clone File List Tool, …).
- The function of many of these tools should be taken over by the NGAS
WEB Interfaces when these have been enhanced.
- The module NGAS Installation Tools provides some utilities to install
and check the system:
- Tool provided to build ‘NGAS layer’ on top of the ‘basic’ NGAS Linux
distribution.
- Functionality still to be implemented.
|
33
|
- Present ESO NGAS Infrastructure:
|
34
|
- (Near) Future Plans for NGAS:
- Received detailed requirements from archive operations.
- Enhance NGAS WEB Management Interfaces.
- Enhancement of services for operation in cluster (extended proxy mode).
- Enhancement of installation utilities.
- Enhancement of unit tests (simulation of archive cluster operation).
- Implement load balancing/archive cluster operation for high
availability/high data rates (VST/ΩCam: up to 300 GB/night,
VISTA/VistaCAM up to 1 TB/night - TBC).
- Support for advanced data processing, utilizing an NGAS Cluster as a
parallel processing engine (specify complex recipes, which are
executing parallel data processing) – will be analyzed in the near
future.
- Support for the Astrophysical Virtual Observatory/GRID?
|
35
|
- Status of NGAS Project December 2004:
- In operation since July 2001.
- Used heavily on a daily basis by archive operators in Garching.
- Data archived daily at La Silla, Paranal and at ESO HQ.
- Data archived directly into NGAS Archive in Garching from Paranal and
Cambridge/WFCAM.
- Some statistics:
- Total number of nodes: ~25.
- Total number of disks in use: ~260.
- Total number of files in NGAS Archive: ~1,500,000.
- Amount of compressed data in NGAS Archive: ~27 TB.
- Amount of uncompressed data in NGAS Archive: ~45 TB.
- Maximum throughput per node (archiving): ~400 GB/24 hours (including
compression).
- Major Issues to Address:
- Need to invest more resources in implementing automatic tests, in
particular for testing robustness and handling of abnormal conditions.
- Need to invest resources in implementing an enhanced user interface -
not very user-friendly at the moment.
- Need to update the design document to reflect the present status of
the system (not updated since it was written in spring 2001).
- Should investigate improved ways of ensuring data consistency and means
for recovering lost data.
|