Small presentation by Heiko Sommer for the software department
meeting on April 18, 2002.
This document is available in html format at www.eso.org/~hsommer/XmlJava20020418.
Object-oriented language with extensive standard libraries and tools
Overview and history: java.sun.com/java2/whatis.
Getting started: developer.java.sun.com/developer/onlineTraining/new2java
Libraries API: java.sun.com/j2se/1.4/docs/api
Best practices book: Effective Java by J. Bloch.
Language specification (very detailed and yet very good): java.sun.com/docs/books/jls/second_edition/html/jTOC.doc.html
Operating system independence - Interpreted language - Virtual Machine
Java is not a scripting language; source code must be compiled to (OS-independent)
bytecode which is directly executable on specialized Java-CPUs. Since no common
CPU supports this, we need a Java Virtual Machine (JVM) to interpret Java bytecode.
The JVMs are specific to the OSs. Sun provides them for Windows, Linux, and
Solaris; other OSs need to provide their own according to the JVM
specification.
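A minimal sketch of what this means in practice: the class below is compiled once with `javac PlatformInfo.java`, and the resulting `PlatformInfo.class` should run unchanged under any JVM, which then reports its own identity and the host OS. The class name `PlatformInfo` and the method `describe()` are made up for this illustration; the system property names are standard ones.

```java
// PlatformInfo.java: compile once, run the same bytecode on any JVM.
class PlatformInfo {

    /** Builds a one-line description of the running JVM and the host OS. */
    static String describe() {
        // Standard system properties that every JVM is required to provide.
        String vm = System.getProperty("java.vm.name");
        String os = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        return "Running on " + vm + " / " + os + " (" + arch + ")";
    }

    public static void main(String[] args) {
        // The text printed here differs per machine; the bytecode does not.
        System.out.println(describe());
    }
}
```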
To run other languages on the JVM, see flp.cs.tu-berlin.de/~tolk/vmlanguages.html.
Java is more than a programming language; it is usually called a "platform".
A large book that touches on everything is "Professional Java Server Programming
J2EE Edition", available in my office or at Amazon.
Free and Open
Owned by Sun Microsystems (java.sun.com).
Open standardised evolution through Java Community Process (jcp.org).
Freely available from Sun: Standard/Enterprise/Micro Edition.
Easy, Fast, and Safe
Garbage collection, interface mechanism, dynamic class loading (no more worrying
about DLLs!), enforced exception handling, standard libraries, ...
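To illustrate two of these points, here is a small sketch of the interface mechanism combined with enforced (checked) exception handling. All names (`SafetyDemo`, `DataSource`, `BrokenSource`, `readOrDefault`) are hypothetical; the lambda in `main` assumes Java 8 or later.

```java
import java.io.IOException;

class SafetyDemo {

    /** An interface separates what a data source does from how it does it. */
    interface DataSource {
        // "throws IOException" is part of the contract: callers must catch or declare it.
        String read() throws IOException;
    }

    /** Hypothetical implementation that always fails, to exercise the error path. */
    static class BrokenSource implements DataSource {
        @Override
        public String read() throws IOException {
            throw new IOException("device not reachable");
        }
    }

    /** Reads from the source and falls back to a default value on failure. */
    static String readOrDefault(DataSource source, String fallback) {
        try {
            return source.read();
        } catch (IOException e) {
            // Leaving out this catch block (or a throws clause) is a compile-time error.
            return fallback;
        }
    }

    public static void main(String[] args) {
        DataSource working = () -> "42";
        System.out.println(readOrDefault(working, "n/a"));
        System.out.println(readOrDefault(new BrokenSource(), "n/a"));
    }
}
```

The compiler, not a coding convention, forces the caller of `read()` to deal with the failure case.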
Main usage: server-side large business applications
Initial idea: small devices in a network (like smart
toasters), but never used much for this.
Later: GUIs (AWT, Swing) serving as front-end to non-Java servers, or funny
applets in a webbrowser.
This has completely changed. Java is now most popular for "serious"
server applications with a web GUI, although the rich Java GUI might be revived
thanks to automated deployment with Java Web Start.
Performance
Using just-in-time compilation of Java bytecode to native machine code, modern
JVMs match the runtime performance of code produced by non-optimizing C++ compilers,
although startup is somewhat slower. Benchmark comparisons for typical scientific
computations are given in a paper presented at the Java Grande - ISCOPE 2001
Conference, suggesting an average performance loss of about 25% compared to C
and Fortran compilers.
For many non-realtime situations, what matters more than execution speed in
one thread/process is good scalability, one of Java's strong points.
The site www.javagrande.org has information
on Java for high-performance computing.
EXtensible Markup Language, universal format for structured
documents and data. Along with XML, there comes a whole family of related standards
and tools.
See www.w3.org/XML, e.g. the XML-in-10-points
overview or the spec.
Another good source is xml.com.
An example of some xml data:
<?xml version="1.0" encoding="UTF-8"?>
<ObsProposal>
  <ObsProposalEntity entityId="4711" entityVersion="1"/>
  <ObsProjectRef entityId="999999999" entityTypeName="ObsProject"/>
  <SciJustification>basic research can hardly ever be justified.</SciJustification>
  <PerformanceGoals>give me five black holes for one crab nebula.</PerformanceGoals>
</ObsProposal>
XML is meant for hierarchical data (tree), although there is a mechanism to
specify cross-links through identifiers (graph).
It is text-based with variable record lengths, which means it must be parsed
sequentially from the beginning up to the point of interest.
Definition of your own XML language (structure, content and semantics of XML
documents that your application recognizes).
The most widely used schema languages are DTD and XML Schema.
See www.brics.dk/~amoeller/XML/schemas/.
DTD (Document Type Definition)
Built-in schema language, see www.w3.org/TR/2000/REC-xml-20001006#elemdecls.
Subset of SGML-DTD.
Rather particular syntax, like
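The original example is not reproduced here. As a stand-in, the following sketch embeds a small, hypothetical DTD for a simplified `ObsProposal` (two text elements and one required attribute) as an internal subset and lets the JDK's standard parser validate against it. The class `DtdDemo`, its method `isValid`, and the DTD itself are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdDemo {

    // Hypothetical DTD (between "[" and "]>"), followed by a conforming document.
    static final String DOC =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE ObsProposal [\n"
      + "  <!ELEMENT ObsProposal (SciJustification, PerformanceGoals)>\n"
      + "  <!ELEMENT SciJustification (#PCDATA)>\n"
      + "  <!ELEMENT PerformanceGoals (#PCDATA)>\n"
      + "  <!ATTLIST ObsProposal entityId CDATA #REQUIRED>\n"
      + "]>\n"
      + "<ObsProposal entityId=\"4711\">\n"
      + "  <SciJustification>basic research</SciJustification>\n"
      + "  <PerformanceGoals>five black holes</PerformanceGoals>\n"
      + "</ObsProposal>\n";

    /** Returns true if the document is well-formed and valid against its own DTD. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            final boolean[] valid = { true };
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations arrive as recoverable errors.
                public void error(SAXParseException e) { valid[0] = false; }
                // Well-formedness violations are fatal.
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return valid[0];
        } catch (SAXException e) {
            return false; // not even well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DOC));
        // Dropping a required child element violates the content model.
        String incomplete =
            DOC.replace("  <PerformanceGoals>five black holes</PerformanceGoals>\n", "");
        System.out.println(isValid(incomplete));
    }
}
```

Note how the declarations (`<!ELEMENT ...>`, `<!ATTLIST ...>`) use a syntax of their own rather than XML element syntax.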
XML Schema
For a good introduction, see the W3C schema primer.
Best practices are discussed at www.xfront.com/BestPracticesHomepage.html.
Advantages over DTD: Schema is written in xml (easy to parse!), data is strongly
typed, inheritance possible, null representation, ... (see www-106.ibm.com/developerworks/library/x-sbsch.html).
<?xml version = "1.0" encoding = "UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" version = "1">
  <xsd:simpleType name = "EntityIdT">
    <xsd:restriction base = "xsd:integer">
      <xsd:totalDigits value = "9"/>
      <xsd:fractionDigits value = "0"/>
      <xsd:whiteSpace value = "collapse"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name = "ObsProposal">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref = "ObsProposalEntity"/>
        <xsd:element name = "ObsProjectRef" type = "ObsProjectRefT"/>
        <xsd:element name = "SciJustification" type = "xsd:string"/>
        <xsd:element name = "PerformanceGoals" type = "xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Since xml data should be processed by computers rather than humans equipped
with emacs, it's important to have powerful and standardized parsing mechanisms.
Depending on required speed, resource consumption, and generality of the application,
there are different parsing models in use.
Parsers can validate a document against a schema, or merely check that it is
well-formed, i.e. that it conforms to the XML grammar itself.
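The difference between the two levels of checking can be sketched as follows. This example assumes a recent JDK, whose `javax.xml.validation` package did not yet exist when this presentation was given, and it uses a simplified, hypothetical schema with just two string elements. The names `ValidationDemo`, `isWellFormed`, and `isValid` are invented for the sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ValidationDemo {

    // Simplified, hypothetical schema: an ObsProposal holding two strings.
    static final String XSD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xsd:element name=\"ObsProposal\"><xsd:complexType><xsd:sequence>"
      + "<xsd:element name=\"SciJustification\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"PerformanceGoals\" type=\"xsd:string\"/>"
      + "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>";

    /** Checks only that the text obeys the XML grammar. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks that the document also matches the structure required by XSD. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // Without a custom error handler, the validator throws on the first violation.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<ObsProposal><SciJustification>a</SciJustification>"
            + "<PerformanceGoals>b</PerformanceGoals></ObsProposal>";
        String wrongStructure = "<ObsProposal><Unexpected/></ObsProposal>";
        System.out.println(isWellFormed(good) + " " + isValid(good));
        System.out.println(isWellFormed(wrongStructure) + " " + isValid(wrongStructure));
    }
}
```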
DOM (Document Object Model)
W3C standard API for a generic object model that can represent any xml document
as an in-memory tree which the application can navigate and manipulate.
Language-neutral by design and therefore somewhat awkward in Java; see JDOM for
a Java-specific alternative.
Resource intensive, since the parser instantiates the entire document at once.
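A minimal sketch of DOM usage with the parser bundled in the JDK, applied to a shortened version of the ObsProposal example. The class `DomDemo` and its helper methods are hypothetical; `getTextContent` and `setTextContent` assume DOM Level 3, i.e. a JDK newer than the one current at the time of this presentation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomDemo {

    static final String XML = "<ObsProposal>"
        + "<ObsProposalEntity entityId=\"4711\" entityVersion=\"1\"/>"
        + "<SciJustification>basic research can hardly ever be justified.</SciJustification>"
        + "</ObsProposal>";

    /** Parses the whole document into an in-memory tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    /** Navigates to the first ObsProposalEntity element and reads an attribute. */
    static String entityId(Document doc) {
        Element entity = (Element) doc.getElementsByTagName("ObsProposalEntity").item(0);
        return entity.getAttribute("entityId");
    }

    /** Reads the text inside the first SciJustification element. */
    static String justification(Document doc) {
        return doc.getElementsByTagName("SciJustification").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(entityId(doc));
        // The tree is mutable: change a node in place, then read it back.
        doc.getElementsByTagName("SciJustification").item(0).setTextContent("revised");
        System.out.println(justification(doc));
    }
}
```

The generic `NodeList` and the cast to `Element` hint at why the API feels un-Java-like.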
SAX (Simple API for XML):
API for callback parsers, implemented by most xml parsers. Fast and frugal.
Callback handlers get rather complex though when many levels of hierarchy are
involved.
Not a W3C standard, yet the de facto standard; see www.saxproject.org.
For comparison with DOM, see www.w3.org/DOM/faq#SAXandDOM.
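The callback style can be sketched with the SAX parser shipped in the JDK. The handler below (hypothetical names `SaxDemo`, `PathHandler`, `elementPaths`) records the path of every element it is told about. Note that it has to track the current position in the hierarchy itself, which is where the complexity mentioned above comes from.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {

    /** Records the path of every element, e.g. "/ObsProposal/SciJustification". */
    static class PathHandler extends DefaultHandler {
        final List<String> paths = new ArrayList<>();
        // SAX keeps no tree, so the handler must remember where it is.
        private final StringBuilder current = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            current.append('/').append(qName);
            paths.add(current.toString());
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Cut off the last path segment when the element closes.
            current.setLength(current.lastIndexOf("/"));
        }
    }

    /** Runs the parser over the text; the parser calls back into the handler. */
    static List<String> elementPaths(String xml) {
        try {
            PathHandler handler = new PathHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.paths;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ObsProposal><SciJustification>x</SciJustification>"
            + "<PerformanceGoals>y</PerformanceGoals></ObsProposal>";
        for (String path : elementPaths(xml)) {
            System.out.println(path);
        }
    }
}
```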
Pull-APIs
Supposedly easier to use than event-driven callback parsers: see e.g. xmlpull.org
(API), Kxml.org (parser implementation).
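As an illustration of the pull style, the sketch below uses StAX (`javax.xml.stream`), the pull API that ships with current JDKs, rather than the xmlpull.org API named above, whose details are not shown here. The class `PullDemo` and the method `firstText` are invented for the example. The application drives the loop and can simply stop once it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullDemo {

    /** Returns the text of the first element with the given name, or null if absent. */
    static String firstText(String xml, String elementName) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application asks for the next event; nothing is pushed at it.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(elementName)) {
                        // Stop early: the rest of the document is never read.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ObsProposal><SciJustification>basic research</SciJustification>"
            + "<PerformanceGoals>five black holes</PerformanceGoals></ObsProposal>";
        System.out.println(firstText(xml, "SciJustification"));
    }
}
```

Compared with a callback handler, no state has to be carried between events here; the control flow of the loop expresses it.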
XML binding
Specialized classes are generated from a DTD or schema. They parse/validate
and write xml data that conforms to that schema.
See www.rpbourret.com/xml/XMLDataBinding.htm
and the section on Castor below.
Free integrated development environment (IDE) from IBM (see www.eclipse.org
and their FAQ).
Available for Windows and Linux (mostly written in Java, but with OS-dependent
SWT GUI).
IBM will use Eclipse as the basis for the new "WebSphere Studio Application
Developer", which is the successor
of Visual Age.
Actually, Eclipse is not really an IDE, but an open platform for building IDEs,
which happens to come with plug-ins for Java and C++.
Anybody may contribute extensions, which a number of big companies plan to do.
Eclipse keeps an internal tree representation of any Java files in the project, which allows for the very powerful smart search and code generation features.
Since the download is large (49 MB), I put the latest stable Windows build here (2002-4-16). Have fun playing with it!
Ambitious open-source project, provides XML-Java binding (www.castor.org/xml-framework.html) among other things.
Use in ALMA
Data gets passed among subsystems and to the database in XML format. Since many different groups have to agree on the data definition, we happily make use of the advanced capabilities of XML Schema. Each so-called "entity object" is defined in its own schema file.
Castor reads in the schema files and generates Java classes with accessor/manipulator
methods for all data fields. The instances can be conveniently worked on without
knowing much about XML. Data is validated and transformed to its XML representation
by the marshal() method; likewise, unmarshal() ingests textual XML into the Java
object.
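To give an idea of what working with such a binding class feels like, here is a hand-written stand-in that uses only the JDK. It is not Castor's generated code and does not use Castor's API; the class `ObsProposalSketch`, its two fields, and the signatures of `marshal()` and `unmarshal()` are assumptions chosen to mirror the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ObsProposalSketch {
    private String sciJustification;
    private String performanceGoals;

    // Accessor/manipulator methods, one pair per data field.
    String getSciJustification() { return sciJustification; }
    void setSciJustification(String value) { sciJustification = value; }
    String getPerformanceGoals() { return performanceGoals; }
    void setPerformanceGoals(String value) { performanceGoals = value; }

    /** Validates the fields, then writes them out as XML text. */
    String marshal() {
        if (sciJustification == null || performanceGoals == null) {
            throw new IllegalStateException("all fields must be set before marshalling");
        }
        return "<ObsProposal>"
            + "<SciJustification>" + escape(sciJustification) + "</SciJustification>"
            + "<PerformanceGoals>" + escape(performanceGoals) + "</PerformanceGoals>"
            + "</ObsProposal>";
    }

    /** Reads XML text and fills a new object from it. */
    static ObsProposalSketch unmarshal(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            ObsProposalSketch proposal = new ObsProposalSketch();
            proposal.setSciJustification(text(doc, "SciJustification"));
            proposal.setPerformanceGoals(text(doc, "PerformanceGoals"));
            return proposal;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("not a readable ObsProposal", e);
        }
    }

    private static String text(Document doc, String tag) {
        Node node = doc.getElementsByTagName(tag).item(0);
        if (node == null) {
            throw new IllegalArgumentException("missing element: " + tag);
        }
        return node.getTextContent();
    }

    // Escape the characters that would otherwise be read as markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        ObsProposalSketch proposal = new ObsProposalSketch();
        proposal.setSciJustification("basic research");
        proposal.setPerformanceGoals("signal & noise < 1");
        String xml = proposal.marshal();
        System.out.println(xml);
        System.out.println(unmarshal(xml).getPerformanceGoals());
    }
}
```

The application code in `main` never touches an XML API; with a real binding tool, the class itself would be generated from the schema instead of being written by hand.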