Archiving and Data Management
in HST's Second Decade
Modified document following the third meeting of the Second Decade
Committee on 13/14 April 1999
Summary
The proposals presented in this document fall into four broad categories:
-
Strengthening the links with other archive centres, WWW catalog sites and
abstract services. This broadens and enriches the archive by allowing exploitation
of a multi-wavelength parameter space and keeping track of a relevant subset
of literature which bears directly on future use of the data.
-
Technical developments such as improvements of network access and effective
data transmission speeds, including the increased use of data compression,
the monitoring of market trends and the adoption of new, high capacity,
storage media.
-
Adding to the scientific utility of the archive by adopting several strategies
covering improvements in the quality of calibration and the addition of
higher level data products. The guiding principle here is the need to harvest
instrumental and data analysis expertise before it disperses and becomes
effectively lost. The result is an archive which contains a higher proportion
of science-ready data. This is particularly relevant to the treatment
of the homogeneous data sets expected from the major and parallel programs.
-
Taking those steps which are necessary to enable the archive to be used
in qualitatively new ways. Commonly termed ``data mining'', these developments
require the generation of a more comprehensive description of the data
than is currently available. While the activities in the third point above are
a necessary prerequisite for this to happen, the extra data processing
envisaged for this step goes beyond basic data calibration and combination
and probably requires the preparation of catalogs of objects and measurements
of their properties. In practice, it requires specific scientific choices
to be made during the processing steps and is likely to be quite labour
intensive.
In considering these issues, the committee is aware of the context within
which these developments will occur. The data rate from HST will soon be
dwarfed by that from ground-based optical/IR observatories - both from
their 8-10m telescopes and, especially, from the dedicated wide-field survey
facilities. The NGST archive will have to be seamlessly incorporated into
the scheme and, with a large component of multi-object or integral field
spectroscopy, will present its own special demands. The developments in
archive technology and network connectivity will be driven by requirements
other than astronomy but should, nonetheless, be closely monitored and
exploited. The interest in Data Mining techniques is very widespread and
is becoming a major topic within computer science. There will be many developments
which are exploitable by astronomers but there are also opportunities to
squander effort at a time when the requirement is to reduce operational
costs. The proposed close cooperation between the archive groups at STScI,
ST-ECF, CADC and NAOJ is strongly encouraged in order to enable these
new developments to be carried out efficiently while spreading the cost
and effort.
The Committee sees these potential developments in terms of the opportunity
to multiply the scientific and, coupled with the public outreach effort,
wider cultural value of the HST Observatory. Given the necessity to reduce
operational costs, careful choices will have to be made in selecting those
areas of development which contribute most effectively to this goal. We
believe that the highest priority goal should be to maximise the quality
of data in the archive by encouraging the recalibration efforts currently
funded for post-operational instruments as part of the ESA/NASA agreement.
The mechanisms for incorporation of high-level data products should be
developed and applied initially to the larger, homogeneous subsets of data
such as the HDFs, the Key Programs, parts of the parallel data stream and,
in the future, the Major Program products. If significant, labour-intensive
processing efforts are foreseen to facilitate Data Mining programs, care
should be taken to ensure that the efforts are driven by a clear scientific
goal.
1. Introduction
The archive of observations from the Hubble Space Telescope
is undoubtedly the largest and most heavily used collection of pointed
observations in astronomy today. The archive comprises over 6 Terabytes
(280,000 observations) of imaging, spectral, time series, polarimetry,
and engineering data covering bandpasses from the near UV through near
IR, and continues to grow at an average rate of over 100 Gigabytes per
month. The archive includes the deepest exposures of the universe ever
made - the Hubble Deep Fields (North and South) - and, reflecting the diversity
of the HST observing program, encompasses all aspects of modern astronomy:
planetary science, stars and stellar evolution, the interstellar medium,
galactic structure, normal and active galaxies, clusters of galaxies, quasars,
and cosmology. Data retrieval rates from the archive at STScI exceed the
data ingest rate, and additional retrievals are supported by the archive
sites at the ST-ECF and CADC. The HST archive enables research beyond the
scope of the original GO proposals, satisfying NASA-wide goals to maximize
the scientific return from its missions.
2. Background and Current Status
Work on the HST Archive began in 1984. A key decision made at that time
was to vest total responsibility for the archive in the STScI, rather than
making use of common archival facilities at NASA's National Space Science
Data Center. In retrospect, this was a pivotal decision which has led to
the development of a distributed data management architecture within astrophysics,
planetary science, and space physics. This architecture assures that data
sets are curated by organizations with maximum expertise in the data and
a vested scientific interest in maintaining their integrity.
The terms of the MOU between NASA and ESA required that a full copy
of the Hubble archive be established at the ST-ECF to support data distribution
to European astronomers. Limited international network connectivity led
Canada to establish the Canadian Astronomy Data Centre to host HST and
other archival data sets of interest to Canadian scientists. ST-ECF and
CADC participated in the design of the HST archive prototype, the Data
Management Facility (DMF), from the outset. DMF was later superseded by
the Data Archive and Distribution System (DADS), with data being stored
on 12-inch WORM optical disks. Procedures are established between STScI,
ST-ECF, and CADC to provide the latter sites with copies of HST science
data. ST-ECF and CADC migrated to CDROM data storage and, in order to further
economize on storage costs, developed an on-the-fly calibration facility
so that only uncalibrated data need be archived (uncalibrated data compresses
more efficiently than calibrated data, further reducing archive media costs).
STScI, ST-ECF, and CADC continued collaborative and complementary efforts
on the HST archive in several areas:
-
CADC developed facilities for automatically generating preview images for
data once the proprietary period has elapsed. These are made available
to all of the archive sites.
-
ST-ECF provided the first full-function interface to the archive, STARCAT.
-
STScI developed the X-Windows-based Starview interface, the first major
software package to be developed using object-oriented methodologies and
compilers. Starview could be used both locally and distributed to users
for installation at their respective institutions.
-
ST-ECF and CADC were innovative in developing World Wide Web (WWW) based
interfaces to the archive and complementary catalogs (e.g., WDB, the Web-to-DB
query interface developed at ST-ECF). Web-browsers came along several years
after Starview was initially developed, and for many users the web provides
a sufficient interface to the archive.
-
ST-ECF undertook projects to create jitter files (a detailed record
of HST pointing during an observation derived from the FGS) for early HST
observations and to generalize the ``associations'' concept originally
adopted for STIS and NICMOS data to WFPC2 data, enabling automated alignment
and coaddition of images for cosmic ray removal.
-
ESO/ST-ECF and CADC developed the SkyCat data visualization tool, which
includes automated network access to distributed catalogs.
-
STScI undertook the Hubble Archive Re-engineering Project (HARP) to reduce
archive operations expenses and extend the useful lifetime of the DADS
optical disk-based archive system. Key elements of HARP were data segregation
(moving engineering and other less frequently used data to separate media)
and data compression.
-
Based on the popularity of the CADC/ST-ECF on-the-fly calibration (OTFC)
system, and its potential for reducing archive operations costs substantially,
STScI implemented a similar facility but with enhanced features for accommodating
changes in data formats and calibration algorithms. These augmentations
are essential if we are to store only uncalibrated data in the archive
(the sketch following this list illustrates the basic retrieve-then-calibrate flow).
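A minimal sketch of that flow follows; the in-memory archive, the reference-file table and the bias/flat correction are invented stand-ins for illustration, not the actual DADS/OTFC implementation, whose value lies precisely in being able to update reference files and calibration software without touching the stored raw data.

import numpy as np

# Stand-in archive: raw pixels plus the metadata needed to select reference files.
RAW_ARCHIVE = {
    "u1234567r": {"instrument": "WFPC2", "date": "1999-04-01",
                  "pixels": np.full((800, 800), 120.0)},
}

# Stand-in reference-file table; updating an entry improves every future retrieval.
REFERENCE_FILES = {
    "WFPC2": {"bias": 100.0, "flat": np.ones((800, 800))},
}

def retrieve(dataset_id, calibrate=True):
    """Return the raw exposure, or calibrate it on the fly with the current
    best reference files."""
    raw = RAW_ARCHIVE[dataset_id]
    if not calibrate:
        return raw["pixels"]
    refs = REFERENCE_FILES[raw["instrument"]]
    return (raw["pixels"] - refs["bias"]) / refs["flat"]   # simple bias/flat correction

print(retrieve("u1234567r").mean())   # 20.0 with these made-up numbers

Because calibration happens at retrieval time, a user always receives data processed with the most recent reference files and software, which is the property the enhanced STScI facility preserves across changes in formats and algorithms.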
Throughout the past 15 years STScI, ST-ECF, and CADC have held archive
coordination meetings to share experiences and set goals. STScI has shouldered
the bulk of the day-to-day operational responsibilities and ST-ECF and
CADC have explored alternative and innovative data access and delivery
mechanisms. At this time, ST-ECF is evaluating DVD as a new archival medium
and STScI is developing a successor to Starview, Starview-II, which will
be implemented in Java and will remove the need to distribute software
to remote sites. Starview-II will permit the design of sophisticated query
screens, as in Starview, that are not possible with web-based forms, and
will enable new levels of interactivity with the archive and the associated
catalogs. ST-ECF is contributing to Starview-II by providing Java preview
display modules. STScI had also planned to migrate to DVD as a storage
medium in the expectation that DVD would quickly supersede CDROM, but the
industry has yet to settle on a standard and STScI is concerned that selection
of one DVD format over another is too risky at this time. STScI plans to
migrate to magneto-optical storage, which is a mature yet growing technology
with a large installed base, is comparable in cost to current generation
DVD, has proven long-term stability, and has higher I/O performance than
other optical media.
In the past year a fourth HST archive site has been established at the
National Astronomical Observatory of Japan (NAOJ). NAOJ is using CDROMs
for data storage, and STScI is using its bulk CDROM production system to
back-populate the NAOJ archive. NAOJ will host only non-proprietary data.
STScI, CADC, and ST-ECF each support archives beyond HST. CADC also
hosts data from the Canada-France-Hawaii Telescope, the James Clerk Maxwell
Telescope, and a copy of STScI's Digitized Sky Survey, and provides access
points to a number of other astronomical archives. ST-ECF's archiving responsibilities
are closely coupled with ESO, whose Science Archive Facility includes data
from the NTT and VLT and will shortly be extended to the VST - a dedicated,
wide-field survey telescope on Paranal. STScI recently took on responsibilities
as NASA's UV/optical/near-IR archive center and established the Multimission
Archive at Space Telescope (MAST). MAST includes data from the IUE, Astro
(HUT, UIT, WUPPE), and Copernicus missions, provides direct access to EUVE
data, and will also include data from the FUSE mission. MAST also supports
the Digitized Sky Survey and the VLA Faint Images of the Radio Sky at Twenty
centimeters (FIRST) survey. STScI has also entered into an agreement with
NOAO to provide archive support for the Mosaic Imager. Thus, all three
sites support both space- and ground-based data archives with a very large
integrated capacity.
STScI works closely with other astrophysics data centers and services.
STScI is a member of the Astrophysics Data Centers Coordinating Council
(ADCCC), which includes the NASA-sponsored High Energy Astrophysics Science
Archive Research Center (HEASARC, at GSFC), the Infrared Science Archive (IRSA, at Caltech/IPAC),
AXAF (now Chandra) Science Center at SAO, the NSSDC (GSFC), the Astronomical
Data Center and Astrophysics Data Facility (GSFC), and Astrophysics Data
System (SAO). The goal of the ADCCC is to increase interoperability among
archive centers and services, ultimately enabling transparent access to
these distributed data holdings. STScI, CADC, and ESO/ST-ECF all have close
ties with the catalog and bibliographic services provided by the Centre de
Données astronomiques de Strasbourg (CDS) and the NASA/IPAC Extragalactic
Database (NED). STScI and HEASARC have led development of AstroBrowse,
a cross-archive data search and discovery tool which utilizes CDS's ``GLU''
system to maintain a distributed database of astronomical data resources.
The ADCCC has partnered with planetary science and space physics data providers
to develop a successor to AstroBrowse, called ISAIA (Interoperable Systems
for Archival Information Access). ISAIA will not only locate data of potential
interest to the user, it will integrate the query results from multiple
data providers and allow users to get a single view of all relevant information
from multiple sites and services. Both AstroBrowse and ISAIA incorporate
resources for data acquired on the ground and from space. STScI also participates
in NASA's Space Science Data System, which aims at interoperability across
all space science disciplines.
3. New Data, New Technology, New Science
The HST archival facilities are now stable and mature. Work has started
to further exploit the rich data holdings and enable multi-wavelength,
multi-mission correlative science. For example, cross-correlation facilities
at MAST, using WWW interfaces, already allow the user to search for data
from various instruments/missions for a given astronomical source and even
to look for multi-frequency data for classes of objects belonging to some
astronomical catalogs. Further work is required to provide cross-correlations
between the HST observation catalog and arbitrary object catalogs. NASA's
Astronomical Data Center (ADC) provides a generic interface to its catalog
collection which can be used to implement such cross-correlations. Using
such public access points, the HST archive centers need to provide much
more direct access to complementary data holdings to enable comparison
of data taken in different spectral regions. As a convenience to users,
alternative means of data delivery also need to be studied and developed (more
efficient network access, physical distribution on CDROM, DVD, or other
high-density media, etc.).
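As an illustration of the positional cross-correlation that underlies such services, the sketch below matches a user-supplied source list against a toy table of archive pointings; the coordinates, search radius and catalog are invented, and a production service would query the observation catalogs (and instrument footprints) directly rather than a small array.

import numpy as np

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    h = (np.sin((dec2 - dec1) / 2.0) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0) ** 2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(h)))

def cross_correlate(sources, pointings, radius_deg):
    """For each (RA, Dec) source, return indices of pointings within radius_deg."""
    matches = []
    for ra, dec in sources:
        sep = angular_separation(ra, dec, pointings[:, 0], pointings[:, 1])
        matches.append(np.where(sep <= radius_deg)[0])
    return matches

# Toy example: two target positions matched against three archive pointings (degrees).
sources = np.array([[189.20, 62.21], [53.16, -27.79]])
pointings = np.array([[189.21, 62.20], [150.00, 2.20], [53.15, -27.80]])
print(cross_correlate(sources, pointings, radius_deg=0.05))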
HST is already in its second generation of instruments, and will see
a third generation (ACS, COS, WFC3) in its second decade of operations.
The calibration of earlier generation instruments will, in time, cease
to be improved aside from possible changes in fundamental reference data.
The long-term cost of maintaining calibration software will eventually
exceed the cost of archiving a final calibrated data product. ST-ECF staff
have already been working on strategies for final recalibrations of FOS
and GHRS data, and plans must be made for final calibration and rearchiving
of data from the other instruments.
In its second decade the HST archive can serve as a testbed for new
developments in scientific utility and efficiency, with a focus on preparing
for data from NGST. NGST is likely to have more homogeneous observations
than HST and will be more conducive to automated object detection and classification,
generating a data archive that comprises both pointed observations and
a derived source catalog. The requisite tools and technology can be developed
with STScI, ESO/ST-ECF, and CADC collaboration, drawing also upon the expertise
of our colleagues with experience in large scale surveys (GSC, GSC II,
Sloan Digital Sky Survey (SDSS), etc.).
The emerging field of ``data mining'' combined with newly commissioned
surveys (SDSS, 2MASS, etc.) is likely to revolutionize astronomy in the
coming decade. Data mining allows users to ask new and unanticipated questions
of an archive (``archive'' here implies a distributed resource with multiple
sources of data). For the HST archive to be conducive to data mining it
will be necessary to provide some characterization of the objects in HST
images and spectra, e.g., the results from the analyses of the
Hubble Deep Fields (HDFs) or Medium Deep Survey. Developing a pipeline
that extracts meaningful and useful object attributes from the highly heterogeneous
collection of HST data will be a substantial challenge, but the potential
benefits of such a facility are enormous. Users could pose queries directly
in scientific terms (e.g., ``are there clusters of galaxies in HST WFPC2
images at the positions of steep spectrum radio sources?'').
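A sketch of how such object characterization would let this query be posed as a catalog operation follows: catalogued objects are counted around each steep-spectrum radio position and overdense fields are flagged as cluster candidates. The object table, radio positions and overdensity threshold are all invented for illustration.

import numpy as np

# Hypothetical products of an object-extraction pipeline: RA, Dec (degrees).
objects = np.array([[189.201, 62.211], [189.203, 62.209], [189.199, 62.212],
                    [150.100, 2.210]])
radio_sources = np.array([[189.20, 62.21], [10.68, 41.27]])   # steep-spectrum positions

def objects_near(position, catalog, radius_deg=0.02):
    """Count catalog entries within radius_deg (flat-sky approximation)."""
    dra = (catalog[:, 0] - position[0]) * np.cos(np.radians(position[1]))
    ddec = catalog[:, 1] - position[1]
    return int(np.sum(np.hypot(dra, ddec) <= radius_deg))

for ra, dec in radio_sources:
    n = objects_near((ra, dec), objects)
    if n >= 3:   # illustrative overdensity threshold
        print(f"candidate cluster field near RA={ra}, Dec={dec} ({n} objects)")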
The ST-ECF, in conjunction with ESO and the CADC, is now planning a
pilot project to evaluate the efficacy of data mining using WFPC2 associations.
The plan is to create a database for each association containing an object
list (positions, magnitudes, object shape parameters), statistics on the
object list (number of each type of object, magnitude distributions, etc.),
the limiting magnitude for the association, background characteristics,
lists of objects in the field of view from GSC I and II and from other
HST observations, and associated PSFs. The database is not considered an
end product in itself, but rather an additional resource for identifying
observations of interest to a user's scientific goals.
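One possible way of organizing such a per-association record is sketched below; the field names and types are illustrative guesses at how the quantities listed above might be structured, not a fixed schema.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DetectedObject:
    ra: float                  # position (degrees)
    dec: float
    magnitude: float
    object_class: str          # e.g. "star", "galaxy", "artifact"
    shape: Dict[str, float]    # e.g. ellipticity, half-light radius

@dataclass
class AssociationRecord:
    association_id: str                                           # WFPC2 association name
    objects: List[DetectedObject] = field(default_factory=list)   # the object list
    object_counts: Dict[str, int] = field(default_factory=dict)   # per-class totals
    magnitude_histogram: Dict[str, List[int]] = field(default_factory=dict)
    limiting_magnitude: float = 0.0
    background: Dict[str, float] = field(default_factory=dict)    # level, rms, gradient
    gsc_matches: List[str] = field(default_factory=list)          # GSC I/II identifiers
    overlapping_hst_datasets: List[str] = field(default_factory=list)
    psf_references: List[str] = field(default_factory=list)       # pointers to PSF files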
The ST-ECF is also considering developing associations for spectral
data. A spectral association would comprise all spectra for a given object,
grouped as a single data set, with metadata to describe the spectral resolution,
wavelength coverage, and signal-to-noise ratio. The archive of FOS spectra
will be used as a test case.
The Appendix gives three examples of major scientific investigations
which exploit access to multiple archival datasets.
4. Initiatives for the Second Decade
Specific initiatives which would expand the scientific utility of the archive,
especially given its planned growth with the acquisition of new missions
and new HST instruments, include:
-
Establishing closer ties and coordination with other archive centers, to
fully exploit the multiwavelength parameter space.
-
Establishing closer links with catalog WWW sites. The user could select
a list of sources, based on some parameters, from one of the hundreds of
astronomical catalogs available, for example, at the ADC, and then cross-correlate
that with the HST and MAST archives via a simple WWW interface.
-
Establishing closer links with abstract services (e.g., the Astrophysics
Data System [ADS]) to provide a connection between astronomical papers
and data.
-
Inclusion of objective target classifications and of important parameters
(e.g., magnitude, redshift, etc.) in the archives, to facilitate searches.
-
Data characterization and catalogs of selected data sets, providing the
astronomical community with science-ready products, which would be
extremely useful, as demonstrated by the HDFs and Medium Deep Survey. HST
catalogs would also enable, for example, the identification of the optical
counterparts of deep surveys at various wavelengths.
-
Supporting large survey programs with HST. It is important to recognize
that supporting more large programs and/or surveys with HST will have implications
for the data archive. The ultimate scientific value and utility of survey
or key project data is often directly related to how accessible the ``science-ready''
data products are. At present, the calibrated data provided to GOs usually
require additional processing before final scientific conclusions can be
reached. If large, homogeneous survey programs become more popular in HST's
second decade, then we may wish to consider providing a more science-ready
data product to maximize the utility of the program. The GO team can be
encouraged (or even required?) to provide their final data products to
the archive for subsequent community distribution. Even if large key programs
are not adopted, the archives could work more assertively with GOs to obtain
final data products from them, which often have more value to the archival
researcher than the basic calibrated data currently provided. This work
could be in conjunction with the third item above, in which the HST archives serve
as repositories for the large processed data sets described in the literature.
-
Optimization of the archive interfaces with the Internet. The outbound
bandwidth from STScI, for example, is quite high (at least 20 Megabytes/sec)
but is constricted prior to its junction with the public Internet.
With bandwidths approaching 100 Megabytes/sec, electronic transmission
of ACS data (with typical GO programs generating ~3-9
Gigabytes) becomes feasible, with whole programs transferred in minutes
rather than hours.
-
Transmission of compressed data (and possibly even lossy compressed data).
Lossy compression can reduce data volumes by roughly a factor of ten with
negligible information loss (for certain scientific applications). Providing
the user with the option to receive highly compressed data should be explored
(a toy illustration of the trade-off follows this list).
-
Exploring alternatives for GO media, including DVD, DLT, and AIT. The time
required to write ACS GO data to Exabyte tape is expected to increase by a
factor of 2.5-7.5 over current mean tape generation times. Many new high-density
storage options are now available and are well matched to the high data volumes
expected from HST. Eliminating GO media altogether in favor of high-bandwidth
network connections for data retrieval would produce dramatic savings in
operations costs, but does require widespread GO access to high-bandwidth
internet service providers.
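The toy fragment below illustrates the lossy-compression trade-off mentioned in the list above: pixel values are quantized to a fraction of the noise and then compressed losslessly, so precision is traded for a smaller product. The synthetic image, quantization step and resulting sizes are illustrative only; dedicated astronomical schemes such as H-compress exist for this purpose.

import zlib
import numpy as np

rng = np.random.default_rng(1)
image = rng.normal(loc=100.0, scale=5.0, size=(512, 512)).astype(np.float32)

def compressed_size(arr):
    """Size in bytes after lossless (zlib) compression of the raw buffer."""
    return len(zlib.compress(arr.tobytes(), 9))

lossless = compressed_size(image)                       # compress the float pixels as-is

step = 5.0 / 4.0                                        # keep values to a quarter of sigma
quantized = np.round(image / step).astype(np.int32)     # lossy: coarse quantization
lossy = compressed_size(quantized)

print(f"original : {image.nbytes} bytes")
print(f"lossless : {lossless} bytes")
print(f"lossy    : {lossy} bytes")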
5. Conclusions
In HST's second decade of operation, the archive will become an increasingly
important scientific resource. Broader spatial, spectral, and time coverage
will allow for analysis of more complete object samples and will make cross-correlation
with other ground- and space-based archives and catalogs increasingly fruitful.
The integration and interoperability of astrophysics data sets, both within
the context of the sites hosting the HST archive and with other astrophysics
and space science resources, will allow HST data to be used more broadly
and to answer questions that are yet to be formulated. Emerging tools for distributed
data mining will be especially important, providing scientists with new
means of inquiry and of discovering unanticipated relationships. In the
coming decade the tools available to the research astronomer for archival
data access will be remarkably more sophisticated than they are today,
and they will provide seamless access to data and catalogs that are physically
located at many different sites.
The organizations supporting the HST archive should work together toward
a common long-term vision of distributed archival services, and draw upon
the strengths of each organization and also other major players in scientific
archiving and computer science (both in terms of scientific oversight and
technical know-how) to contribute to the overall goals. These goals should
encompass an archiving strategy for NGST that builds upon the strength
and maturity of current systems, yet opens new horizons.
Contributors:
- Bob Hanisch, Paolo Padovani, Megan Donahue, Marc Postman (STScI)
- Piero Benvenuti, Benoit Pirenne, Rudi Albrecht (ESO/ST-ECF)
- Daniel Durand, David Schade (CADC)
Appendix
This gives three examples of major investigations based on the use of multiple
archival datasets.
A.1 Cosmology Studies with Archival Surveys
One outstanding demonstration of the leverage available to archival surveys
of serendipitously-observed objects is that of constraining cosmological
parameters, such as the density of the universe, with clusters of galaxies.
The key to this exercise is finding the rarest, most massive, distant clusters
of galaxies because those are the clusters that are expected to evolve
most significantly. However, the all-sky X-ray surveys are not sensitive
enough to detect the distant clusters, and the pointed deep X-ray surveys
do not have sufficient sky coverage to detect the most massive clusters,
which are the rarest. The optimal survey for this purpose, thus, is the
archival survey, which utilizes deep, pointed observations and identifies
the other serendipitously-observed objects that happened to fall into the
field of view. Such surveys are moderately deep, but have significant sky
coverage. Examples of the cluster surveys are the Extended Medium Sensitivity
Survey (EMSS) from Einstein IPC pointed observations (Henry et al. 1992,
ApJ 386, 408) and similar surveys based on ROSAT PSPC data (Rosati et al.
1998, ApJ 492, 21; Jones et al. 1998, ApJ 495, 100). If only clusters
hotter than 8 keV and at redshifts greater than 0.5 are counted, not even
the ROSAT All-Sky Survey would be expected to see any such clusters. Even
the ROSAT serendipitous surveys, despite their more sensitive flux limits,
did not detect as many high-redshift clusters as did the EMSS because the
EMSS has significantly larger coverage in sky area than the ROSAT-based
surveys.
Analogously, deep pointed observations with HST such as the Hubble Deep
Field are useful in detecting and characterizing faint but rather common
objects. But random discovery of rare, relatively bright objects, such
as quasars, blazars or clusters of galaxies, is nearly impossible in single
pointed fields. On the other end of the sky coverage scale, the Digitized
Sky Survey is useful in locating very bright objects, but is not sensitive
enough to detect distant blazars or clusters of galaxies with redshift
greater than about 0.5. It is possible to achieve intermediate sky coverage
at moderate sensitivity by exploiting HST archival data, including the
enormous number of parallel and snapshot observations. One example of this
is the Medium Deep Survey discovery of z > 0.4
clusters of galaxies (Ostrander et al. 1998, AJ 116, 2644).
Another powerful advantage of the HST archive or the archives of other
space missions is access to an enormous amount of data acquired uniformly
under extremely reliable conditions. One program can easily benefit from
the data of one or more other programs. For example, the study of a complete
sample of the morphologies of 341 distant (z = 0.3-0.9) galaxies drawn
from two redshift surveys (Brinchmann et al. 1998, ApJ 499, 112) was based
on observations of two independent HST programs which were added to the
HST observations of the Groth strip (Groth et al. 1994, BAAS 185, 5309).
The data could be consistently calibrated and intermixed, which is essential
for building large uniform samples. The systematics of various methods
of classifying galaxies could be tested and quantified, a feature lacking
in studies where the data and the methods are unavailable outside the authors'
domain.
A.2 Future Deep Surveys
Large, homogeneous datasets form the foundation for astronomy. Discoveries
are most efficiently made by specific observing programs which explore
a hitherto uncharted region of parameter space. Surveys at various wavelengths
have had a predominant role in this process. The next few years will see
a dramatic change in the way we approach surveys, with archival research
assuming a fundamental role. A huge amount of data will be produced by
new, large-area sky surveys in different bands: FIRST (radio), 2MASS and
DENIS (infrared), GSC II and SDSS (optical), GALEX (ultraviolet), ABRIXAS
and XMM (X-ray), and AGILE and GLAST (gamma-ray).
These, together with the surveys already available, which include, for
example, the NRAO VLA Sky Survey (NVSS), PMN, GB6 (radio), and the ROSAT
All Sky Survey (RASSBSC; X-ray), will challenge the ``classical'' approach
to surveys.
The way most surveys have been carried out so far, in fact, has required
optical identification. That is, optical spectra of all sources had to
be taken to classify the object. Take for example the EMSS, which includes
835 sources over 780 deg^2 down to an X-ray flux of ~10^-13
erg cm^-2 s^-1 and has provided the deepest view of
the X-ray sky over a relatively large area for quite a few years. All sources
in the large X-ray error boxes had to be observed to identify the most
likely X-ray source (Stocke et al. 1991, ApJS 76, 813). The whole process
took about 10 years to complete. This strategy works for small-area, deep
surveys or large-area, shallow surveys which include a manageable number
of sources (say up to a thousand or so). Dedicated instruments or projects,
like the Two degree Field (2dF) and the Sloan Digital Sky Survey (SDSS),
can actually adopt the classical approach for a much larger number of sources
(of the order of 250,000 for 2dF and a million for SDSS). This however
requires populations with relatively large surface density (2dF) and large
investments (SDSS). In both cases the optical limit is relatively bright
(~20-21 magnitude).
The majority of currently available surveys are so large that the classical
approach cannot work any longer. Consider for example that the RASSBSC
includes 18,811 sources over 92% of the sky, while the White, Giommi, and
Angelini (WGA) catalog of ROSAT Point Sources includes about 70,000 sources
over ~10% of the sky. The NVSS includes almost
2 million radio sources north of a declination of -40°. It is clear
that a spectroscopic identification of all the sources in these surveys
is not possible in a reasonable amount of time and given standard
resources.
The situation will get worse with the forthcoming large-area sky surveys
in different bands. Alternative methods for survey identifications have
to be applied. One such method relies on the cross-correlation of catalogs
in different bands to pre-select the candidates for identification. One
still needs to optically identify the selected sub-samples but the selection
efficiency increases by large amounts and the number of sources is manageable.
This works very well for relatively rare populations, as the initial pool
of candidates can be large but the class of interest is selected out based
on its spectral energy distribution. The multifrequency information can
include not only fluxes in different bands but also optical colors, radio
and X-ray spectra, source sizes, etc.
One powerful application of this kind of approach, which used four WFPC2
filters (and therefore a relatively small wavelength range), has been the
selection of high-redshift galaxies from the UV ``dropouts'' in the Hubble
Deep Field. Another example is the Deep X-ray/Radio Blazar Survey (Perlman,
Padovani et al. 1998, AJ 115, 1253). Starting from the WGA catalog
(~70,000 sources) and the GB6 and PMN catalogs
(~120,000 sources), DXRBS selects ~1,600
X-ray/radio sources. A further selection on radio spectral index reduces
the sample to ~300 objects, ~95%
of which turn out to be blazars, the class of interest in this case. The
statistical method can be carried one step further to avoid optical identification
altogether. In this case the pre-selection is done in such a way as to
select objects with nearly unique spectral energy distributions. One might
ultimately still want to identify the selected sub-samples but redshift-independent
results (like number counts) can be obtained relatively quickly and the
selection efficiency increases by orders of magnitude (see Giommi, Menna,
& Padovani 1999, MNRAS, in press, for one such example).
As surveys get deeper and deeper, statistical methods for identification,
with the consequent need for easy access to data at various wavelengths,
will need to become commonplace. The reason is simple: a 4m class telescope
can identify a source with relatively strong features (such as a quasar)
in a 1-hour exposure down to 22-23 magnitude, while a 10m class telescope
can reach 25-26 magnitude in about the same time. This is inadequate to
spectroscopically identify the faintest objects found, for example, in
the HDFs and, more generally, will be a problem for deep surveys at other
frequencies. For example, a typical XMM exposure will reach X-ray fluxes
f_x ~ 10^-15 erg cm^-2 s^-1. At these levels, using the appropriate
X-ray:optical flux ratios, basically all radio-loud AGN will be fainter
than the 4m limit for spectroscopic identification and most of them will
also be below the 10m limit. At the AXAF limit (f_x ~ 10^-16
erg cm^-2 s^-1) even most radio-quiet AGN will be so
faint as to require exceedingly long integration times for optical identification
with a 10m class telescope. Normal galaxies, having larger optical:X-ray
flux ratios, will be brighter but in that case their lack of relatively
strong features will also make spectroscopic identification problematic
at these X-ray fluxes. The same applies to the radio band: at the 1 mJy
limit of the FIRST survey, most radio-loud sources will have optical
counterparts of ~24 magnitude, and a 4m telescope will not be sufficient
for spectroscopic identification.
Cross-identification of sources at different wavelengths is already vital
to making progress at fainter fluxes, and will become ever more so. Statistical
ways of pre-selecting or even identifying specific classes of sources work
very well (especially for rare populations). Given the large number of
upcoming surveys at various frequencies, these methods will have to become
commonplace in the near future. The depth of the optical, X-ray, and radio
surveys, will also mean that spectroscopic identification for many sources
will be extremely hard or outright impossible, leaving the statistical
approach as the only viable option. All this will require not only a large
amount of coordination between the archive centers but also the extraction
of catalogs from the raw data. This is especially challenging for the HST
archive.
A.3 Variability Studies
One of the benefits of an archive derives from having a large and comprehensive
database represented by observations accumulated over a period of time.
The STScI archive also has the added benefit of being NASA's UV/optical/near-IR
archive center and therefore providing access to a variety of data through
MAST.
For example, for nearly twenty years now, and for some time yet to come,
the IUE data archive (included in MAST) has served as a treasure trove
for variable phenomena in bright UV sources and at the same time provided
a link between the UV response to these phenomena and other wavelength
regions. Consider that 50% of all IUE science images were obtained of objects
the satellite observed 10 times or more. Not surprisingly, many of
these multiple observations were made of bright hot stars, generally for
tightly focused purposes defined in the observing proposals, leaving other
forms of variability undiscovered.
Archival data have been particularly useful for studies of repetitive
stellar processes such as pulsations, rotational modulation, or binarity.
In the latter area, for example, archival data have enabled discoveries
such as:
-
The suggested 5.52 year period of Eta Car. Possibly the most massive star
in the Galaxy, this star may be a double (or triple) system in which X-ray
transients are observed with regularity. There are 543 IUE spectra in the
IUE archive extending over 15 years, almost three 5.5 yr cycles. Cross-correlation
of these spectra could yield a radial velocity curve which would demonstrate
its orbital properties (a toy sketch of such a cross-correlation follows this
list). Moreover, it can be anticipated that the strong
UV lines from different ions will respond to X-ray transients in a variety
of ways when the star passes through its periastron.
-
X Persei. This prototypical Be X-ray binary shows X-ray pulses from a neutron
star, but until recently radial velocity variations had not been detected.
An analysis of IUE spectra spanning 16 years suggests that the star
is in an eccentric orbit with a period of ~25
years. Newly archived Orfeus-2 spectra confirm the change in radial velocity
found from the more recent IUE data. Knowing what to look for, optical
observers can refine the period, leading ultimately to an ephemeris of
separation distances and relative velocities which can be used to test
the theory of X-ray emission from wind accretion.
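A toy sketch of the spectral cross-correlation suggested above for the Eta Car series follows; the synthetic Gaussian line and the grid sizes are invented, and real IUE spectra would need continuum normalization, masking of artefacts and a proper error analysis.

import numpy as np

C_KM_S = 299792.458   # speed of light, km/s

def velocity_shift(wave, flux_template, flux_observed, n=8192):
    """Velocity of the observed spectrum relative to the template (km/s),
    from the peak of their cross-correlation on a log-wavelength grid."""
    loglam = np.linspace(np.log(wave.min()), np.log(wave.max()), n)
    step = loglam[1] - loglam[0]                     # constant d(ln lambda) per pixel
    ft = np.interp(np.exp(loglam), wave, flux_template)
    fo = np.interp(np.exp(loglam), wave, flux_observed)
    ft -= ft.mean()
    fo -= fo.mean()
    lag = np.argmax(np.correlate(fo, ft, mode="full")) - (n - 1)
    return lag * step * C_KM_S                       # one pixel is c * d(ln lambda)

# Synthetic check: one absorption line, observed redshifted by ~150 km/s.
wave = np.linspace(1200.0, 1300.0, 2000)             # wavelength grid (Angstroms)

def spectrum(center):
    return 1.0 - 0.5 * np.exp(-0.5 * ((wave - center) / 0.5) ** 2)

template = spectrum(1250.0)
observed = spectrum(1250.0 * (1.0 + 150.0 / C_KM_S))
print(f"recovered shift: {velocity_shift(wave, template, observed):.0f} km/s")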
Many extragalactic sources have also been observed many times by different
instruments over a long time baseline. For example, about 30 UV HST spectra
of NGC 4151, the prototypical Seyfert 1 galaxy, have been taken by the FOC,
FOS, and GHRS over a period of about 8 years. As HST enters its second
decade, long-term UV spectral variability studies will become possible
for relatively faint astronomical sources as well.