Creating a good input/output (I/O)
system is one of the more difficult tasks for the language
designer.
This is evidenced by the number of
different approaches. The challenge seems to be in covering all eventualities.
Not only are there different sources and sinks of I/O that you want to
communicate with (files, the console, network connections), but you need to talk
to them in a wide variety of ways (sequential, random-access, buffered, binary,
character, by lines, by words, etc.).
The Java library designers attacked this
problem by creating lots of classes. In fact, there are so many classes for
Java’s I/O system that it can be intimidating at first (ironically, the
Java I/O design actually prevents an explosion of classes). There was also a
significant change in the I/O library after Java 1.0,
when the original byte-oriented library was supplemented with
char-oriented, Unicode-based I/O classes. As a result there are a fair
number of classes to learn before you understand enough of Java’s I/O
picture that you can use it properly. In addition, it’s rather important
to understand the evolution history of the I/O library, even if your first
reaction is “don’t bother me with history, just show me how to use
it!” The problem is that without the historical perspective you will
rapidly become confused with some of the classes and when you should and
shouldn’t use them.
This chapter will give you an
introduction to the variety of I/O classes in the standard Java library and how
to use them.
Before getting into the classes that
actually read and write data to streams, we’ll look at a utility provided
with the library to assist you in handling file directory
issues.
The File class has a deceiving
name—you might think it refers to a file, but it doesn’t. It can
represent either the name of a particular file or the names of a
set of files in a directory. If it’s a set of files, you can ask for the
set with the list( ) method, and this returns an array of
String. It makes sense to return an array rather than one of the flexible
container classes because the number of elements is fixed, and if you want a
different directory listing you just create a different File object. In
fact, “FilePath” would have been a better name for the class. This
section shows an example of the use of this class, including the associated
FilenameFilter
interface.
Suppose you’d like to see a
directory listing. The File object can be listed in two ways. If you call
list( ) with no arguments, you’ll get the full list that the
File object contains. However, if you want a restricted list—for
example, if you want all of the files with an extension of
.java—then you use a “directory filter,” which is a
class that tells how to select the File objects for
display.
Here’s the code for the example.
Note that the result has been effortlessly sorted (alphabetically) using the
java.util.Arrays.sort( ) method and the AlphabeticComparator
defined in Chapter 9:
//: c11:DirList.java
// Displays directory listing.
import java.io.*;
import java.util.*;
import com.bruceeckel.util.*;

public class DirList {
  public static void main(String[] args) {
    File path = new File(".");
    String[] list;
    if(args.length == 0)
      list = path.list();
    else
      list = path.list(new DirFilter(args[0]));
    Arrays.sort(list, new AlphabeticComparator());
    for(int i = 0; i < list.length; i++)
      System.out.println(list[i]);
  }
}

class DirFilter implements FilenameFilter {
  String afn;
  DirFilter(String afn) { this.afn = afn; }
  public boolean accept(File dir, String name) {
    // Strip path information:
    String f = new File(name).getName();
    return f.indexOf(afn) != -1;
  }
} ///:~
The DirFilter class
“implements” the interface FilenameFilter. It’s
useful to see how simple the FilenameFilter interface
is:
public interface FilenameFilter {
  boolean accept(File dir, String name);
}
It says all that this type of object does
is provide a method called accept( ). The whole reason behind the
creation of this class is to provide the accept( ) method to the
list( ) method so that list( ) can “call
back” accept( ) to determine which file names should be
included in the list. Thus, this technique is often referred to as a
callback or sometimes a
functor (that is, DirFilter is a functor
because its only job is to hold a method) or the
Command Pattern.
Because list( ) takes a FilenameFilter object as its
argument, it means that you can pass an object of any class that implements
FilenameFilter to choose (even at run-time) how the list( )
method will behave. The purpose of a callback is to provide flexibility in the
behavior of code.
DirFilter shows that just because
an interface contains only a set of methods, you’re not restricted
to writing only those methods. (You must at least provide definitions for all
the methods in an interface, however.) In this case, the DirFilter
constructor is also created.
The accept( ) method must
accept a File object representing the directory that a particular file is
found in, and a String containing the name of that file. You might choose
to use or ignore either of these arguments, but you will probably at least use
the file name. Remember that the list( ) method calls
accept( ) for each of the file names in the directory object to see
which ones should be included—this is indicated by the boolean
result returned by accept( ).
To make sure the element you’re
working with is only the file name and contains no path information, all you
have to do is take the String object and create a File object out
of it, then call getName( ), which strips away all the path
information (in a platform-independent way). Then accept( ) uses the
String class
indexOf( ) method to see if the search string afn appears
anywhere in the name of the file. If afn is found within the string, the
return value is the starting index of afn, but if it’s not found
the return value is -1. Keep in mind that this is a simple string search and
does not have “glob” expression wildcard matching—such as
“fo?.b?r*”—which is much more difficult to
implement.
The list( ) method returns an
array. You can query this array for its length and then move through it
selecting the array elements. This ability to easily pass an array in and out of
a method is a tremendous improvement over the behavior of C and
C++.
This example is ideal for rewriting using
an
anonymous
inner class (described in Chapter 8). As a first cut, a method filter( )
is created that returns a reference to a
FilenameFilter:
//: c11:DirList2.java
// Uses anonymous inner classes.
import java.io.*;
import java.util.*;
import com.bruceeckel.util.*;

public class DirList2 {
  public static FilenameFilter filter(final String afn) {
    // Creation of anonymous inner class:
    return new FilenameFilter() {
      String fn = afn;
      public boolean accept(File dir, String n) {
        // Strip path information:
        String f = new File(n).getName();
        return f.indexOf(fn) != -1;
      }
    }; // End of anonymous inner class
  }
  public static void main(String[] args) {
    File path = new File(".");
    String[] list;
    if(args.length == 0)
      list = path.list();
    else
      list = path.list(filter(args[0]));
    Arrays.sort(list, new AlphabeticComparator());
    for(int i = 0; i < list.length; i++)
      System.out.println(list[i]);
  }
} ///:~
Note that the argument to
filter( ) must be
final. This is required
by the anonymous inner class so that it can use an object from outside its
scope.
This design is an improvement because the
FilenameFilter class is now tightly bound to DirList2. However,
you can take this approach one step further and define the anonymous inner class
as an argument to list( ), in which case it’s even
smaller:
//: c11:DirList3.java
// Building the anonymous inner class "in-place."
import java.io.*;
import java.util.*;
import com.bruceeckel.util.*;

public class DirList3 {
  public static void main(final String[] args) {
    File path = new File(".");
    String[] list;
    if(args.length == 0)
      list = path.list();
    else
      list = path.list(new FilenameFilter() {
        public boolean accept(File dir, String n) {
          String f = new File(n).getName();
          return f.indexOf(args[0]) != -1;
        }
      });
    Arrays.sort(list, new AlphabeticComparator());
    for(int i = 0; i < list.length; i++)
      System.out.println(list[i]);
  }
} ///:~
The argument to main( ) is
now final, since the anonymous inner class uses args[0]
directly.
This shows you how anonymous inner
classes allow the creation of quick-and-dirty classes to solve problems. Since
everything in Java revolves around classes, this can be a useful coding
technique. One benefit is that it keeps the code that solves a particular
problem isolated together in one spot. On the other hand, it is not always as
easy to read, so you must use it
judiciously.
The File class is more than just a
representation for an existing file or directory. You can also use a File
object to create a new directory
or an entire directory path if it doesn’t exist. You can also look at the
characteristics of files (size,
last modification date, read/write), see whether a File object represents
a file or a directory, and delete a file. This program shows some of the other
methods available with the File class (see the HTML documentation from
java.sun.com for the full set):
//: c11:MakeDirectories.java
// Demonstrates the use of the File class to
// create directories and manipulate files.
import java.io.*;

public class MakeDirectories {
  private final static String usage =
    "Usage:MakeDirectories path1 ...\n" +
    "Creates each path\n" +
    "Usage:MakeDirectories -d path1 ...\n" +
    "Deletes each path\n" +
    "Usage:MakeDirectories -r path1 path2\n" +
    "Renames from path1 to path2\n";
  private static void usage() {
    System.err.println(usage);
    System.exit(1);
  }
  private static void fileData(File f) {
    System.out.println(
      "Absolute path: " + f.getAbsolutePath() +
      "\n Can read: " + f.canRead() +
      "\n Can write: " + f.canWrite() +
      "\n getName: " + f.getName() +
      "\n getParent: " + f.getParent() +
      "\n getPath: " + f.getPath() +
      "\n length: " + f.length() +
      "\n lastModified: " + f.lastModified());
    if(f.isFile())
      System.out.println("it's a file");
    else if(f.isDirectory())
      System.out.println("it's a directory");
  }
  public static void main(String[] args) {
    if(args.length < 1) usage();
    if(args[0].equals("-r")) {
      if(args.length != 3) usage();
      File
        old = new File(args[1]),
        rname = new File(args[2]);
      old.renameTo(rname);
      fileData(old);
      fileData(rname);
      return; // Exit main
    }
    int count = 0;
    boolean del = false;
    if(args[0].equals("-d")) {
      count++;
      del = true;
    }
    for( ; count < args.length; count++) {
      File f = new File(args[count]);
      if(f.exists()) {
        System.out.println(f + " exists");
        if(del) {
          System.out.println("deleting..." + f);
          f.delete();
        }
      }
      else { // Doesn't exist
        if(!del) {
          f.mkdirs();
          System.out.println("created " + f);
        }
      }
      fileData(f);
    }
  }
} ///:~
In fileData( ) you can see
various file investigation methods used to display information about the file or
directory path.
The first method that’s exercised
by main( ) is
renameTo( ), which
allows you to rename (or move) a file to an entirely new path represented by the
argument, which is another File object. This also works with directories
of any length.
If you experiment with the above program,
you’ll find that you can make a directory path of any complexity because
mkdirs( ) will do
all the work for you.
I/O libraries often use the abstraction
of a stream, which represents any data source or sink as an object
capable of producing or receiving pieces of data. The stream hides the details
of what happens to the data inside the actual I/O device.
The Java library classes for I/O are
divided by input and output, as you can see by looking at the online Java class
hierarchy with your Web browser. By inheritance, everything derived from the
InputStream or Reader classes has a basic method called
read( ) for reading a single byte or an array of bytes. Likewise,
everything derived from the OutputStream or Writer classes has a basic
method called write( ) for writing a single byte or an array of bytes.
However, you won’t generally use these methods; they exist so that other
classes can use them—these other classes provide a more useful interface.
Thus, you’ll rarely create your stream object by using a single class, but
instead will layer multiple objects together to provide your desired
functionality. The fact that you create more than one object to create a single
resulting stream is the primary reason that Java’s stream library is
confusing.
It’s helpful to categorize the
classes by their functionality. In Java 1.0, the library designers started by
deciding that all classes that had anything to do with input would be inherited
from InputStream and all classes that were associated with output would
be inherited from OutputStream.
InputStream’s job is to
represent classes that produce input from different sources. These sources can
be an array of bytes, a String object, a file, a “pipe” (data going in one
end and coming out the other), or a sequence of other streams collected into a
single stream.
Each of these has an
associated subclass of InputStream. In addition, the
FilterInputStream is also a type of InputStream, to provide a base
class for "decorator" classes that attach attributes or useful interfaces to
input streams. This is discussed later.
This category includes the classes that
decide where your output will go: an array of bytes (no String, however;
presumably you can create one using the array of bytes), a file, or a
“pipe.”
In addition, the
FilterOutputStream provides a base class for "decorator" classes that
attach attributes or useful interfaces to output streams. This is discussed
later.
The use of layered objects to dynamically
and transparently add responsibilities to individual objects is referred to as
the Decorator pattern.
(Patterns[57] are
the subject of Thinking in Patterns with Java, downloadable at
www.BruceEckel.com.) The decorator pattern specifies that all objects
that wrap around your initial object have the same interface. This makes the
basic use of the decorators transparent—you send the same message to an
object whether it’s been decorated or not. This is the reason for the
existence of the “filter” classes in the Java I/O library: the
abstract “filter” class is the base class for all the decorators. (A
decorator must have the same interface as the object it decorates, but the
decorator can also extend the interface, which occurs in several of the
“filter” classes).
Decorators are often used when simple
subclassing results in a large number of subclasses in order to satisfy every
possible combination that is needed—so many subclasses that it becomes
impractical. The Java I/O library requires many different combinations of
features, which is why the decorator pattern is used. There is a drawback to the
decorator pattern, however. Decorators give you much more flexibility while
you’re writing a program (since you can easily mix and match attributes),
but they add complexity to your code. The reason that the Java I/O library is
awkward to use is that you must create many classes—the “core”
I/O type plus all the decorators—in order to get the single I/O object
that you want.
The classes that provide the decorator
interface to control a particular InputStream or OutputStream are
the FilterInputStream and FilterOutputStream—which
don’t have very intuitive names. FilterInputStream and
FilterOutputStream are abstract classes that are derived from the base
classes of the I/O library, InputStream and OutputStream, which is
the key requirement of the decorator (so that it provides the common interface
to all the objects that are being
decorated).
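For instance, here is a minimal sketch of the layering just described (the file name is purely illustrative, not taken from the examples in this chapter): a “core” FileInputStream decorated with buffering and with primitive-reading capability, assuming the file begins with an int written by a DataOutputStream.

// A sketch of decorator layering: each wrapper adds a capability
// but presents the same basic stream interface.
import java.io.*;

public class LayeredStream {
  public static void main(String[] args) throws IOException {
    DataInputStream in = new DataInputStream(   // adds readInt(), readDouble(), ...
      new BufferedInputStream(                  // adds buffering
        new FileInputStream("data.bin")));      // the "core" byte source (illustrative name)
    try {
      int first = in.readInt(); // read a primitive through the layered stream
      System.out.println("First int: " + first);
    } finally {
      in.close(); // closing the outermost stream closes the whole chain
    }
  }
}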
The FilterInputStream classes
accomplish two significantly different things. DataInputStream allows you
to read different types of primitive data as well as String objects. (All
the methods start with “read,” such as readByte( ),
readFloat( ), etc.) This, along with its companion
DataOutputStream, allows you to move primitive data from one place to
another via a stream. These “places” are determined by the classes
in Table 11-1.
The remaining classes modify the way an
InputStream behaves internally: whether it’s buffered or
unbuffered, if it keeps track of the lines it’s reading (allowing you to
ask for line numbers or set the line number), and whether you can push back a
single character. The last two classes look a lot like support for building a
compiler (that is, they were added to support the construction of the Java
compiler), so you probably won’t use them in general programming.
You’ll probably need to buffer your
input almost every time, regardless of the I/O device you’re connecting
to, so it would have made more sense for the I/O library to make a special case
(or simply a method call) for unbuffered input rather than buffered
input.
The complement to DataInputStream
is DataOutputStream, which formats each of the primitive types and
String objects onto a stream in such a way that any
DataInputStream, on any machine, can read them. All the methods start
with “write,” such as writeByte( ),
writeFloat( ), etc.
The original intent of PrintStream
was to print all of the primitive data types and String objects in a
viewable format. This is different from DataOutputStream, whose goal is
to put data elements on a stream in a way that DataInputStream can
portably reconstruct them.
The two important methods in
PrintStream are print( ) and println( ), which
are overloaded to print all the various types. The difference between
print( ) and println( ) is that the latter adds a
newline when it’s done.
PrintStream can be problematic
because it traps all IOExceptions (you must explicitly test the error
status with checkError( ), which returns true if an error has
occurred). Also, PrintStream doesn’t internationalize properly and
doesn’t handle line breaks in a platform-independent way (these problems
are solved with PrintWriter).
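Since PrintStream swallows IOExceptions, the only way to notice a failure is to poll checkError( ) yourself. A minimal sketch (the file name is purely illustrative):

import java.io.*;

public class CheckErrorDemo {
  public static void main(String[] args) throws IOException {
    PrintStream ps = new PrintStream(
      new FileOutputStream("out.txt")); // illustrative file name
    ps.println("some output");
    // PrintStream never throws IOException; poll for trouble instead:
    if(ps.checkError())
      System.err.println("An error occurred while printing");
    ps.close();
  }
}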
BufferedOutputStream is a modifier
and tells the stream to use buffering so you don’t get a physical write
every time you write to the stream. You’ll probably always want to use
this with files, and possibly console I/O.
Java 1.1 made some significant
modifications to the fundamental I/O stream library (Java 2, however, did not
make fundamental modifications). When you see the
Reader and
Writer classes your first
thought (like mine) might be that these were meant to replace the
InputStream and OutputStream classes. But that’s not the
case. Although some aspects of the original streams library are deprecated (if
you use them you will receive a warning from the compiler), the
InputStream and OutputStream classes still provide valuable
functionality in the form of byte-oriented I/O, while the Reader
and Writer classes provide Unicode-compliant, character-based I/O. In
addition:
The most
important reason for the Reader and Writer hierarchies is for
internationalization. The old
I/O stream hierarchy supports only 8-bit byte streams and doesn’t handle
the 16-bit Unicode characters well. Since Unicode is used for
internationalization (and Java’s native char is 16-bit
Unicode), the Reader and
Writer hierarchies were added to support Unicode in all I/O operations.
In addition, the new libraries are designed for faster operations than the
old.
As is the practice in this book, I will
attempt to provide an overview of the classes, but assume that you will use
online documentation to determine all the details, such as the exhaustive list
of methods.
Almost all of the original Java I/O
stream classes have corresponding Reader and Writer classes to
provide native Unicode manipulation. However, there are some places where the
byte-oriented InputStreams and OutputStreams are the
correct solution; in particular, the java.util.zip libraries are
byte-oriented rather than char-oriented. So the most sensible
approach to take is to try to use the Reader and Writer
classes whenever you can, and you’ll discover the situations when you have
to use the byte-oriented libraries because your code won’t
compile.
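As a sketch of the bridging you will then need (the file names are illustrative), InputStreamReader and OutputStreamWriter adapt byte-oriented streams to the character-oriented world:

import java.io.*;

public class BridgeDemo {
  public static void main(String[] args) throws IOException {
    // A byte-oriented stream (e.g., from the zip library) can be
    // adapted to the Reader/Writer world:
    InputStream byteIn = new FileInputStream("test.txt");   // byte source
    BufferedReader charIn = new BufferedReader(
      new InputStreamReader(byteIn));                       // bytes -> chars
    OutputStream byteOut = new FileOutputStream("copy.txt");
    Writer charOut = new OutputStreamWriter(byteOut);       // chars -> bytes
    String line;
    while((line = charIn.readLine()) != null)
      charOut.write(line + "\n");
    charIn.close();
    charOut.close();
  }
}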
Here is a table that shows the
correspondence between the sources and sinks of information (that is, where the
data physically comes from or goes to) in the two hierarchies.
| Sources & Sinks: | Corresponding Java 1.1 class |
|---|---|
| InputStream | Reader |
| OutputStream | Writer |
| FileInputStream | FileReader |
| FileOutputStream | FileWriter |
| StringBufferInputStream | StringReader |
| (no corresponding class) | StringWriter |
| ByteArrayInputStream | CharArrayReader |
| ByteArrayOutputStream | CharArrayWriter |
| PipedInputStream | PipedReader |
| PipedOutputStream | PipedWriter |
In general, you’ll find that the
interfaces for the two different hierarchies are similar if not
identical.
For InputStreams and
OutputStreams, streams were adapted for particular needs using
“decorator” subclasses of FilterInputStream and
FilterOutputStream. The Reader and Writer class hierarchies
continue the use of this idea—but not exactly.
In the following table, the
correspondence is a rougher approximation than in the previous table. The
difference is because of the class organization: while
BufferedOutputStream is a subclass of FilterOutputStream,
BufferedWriter is not a subclass of FilterWriter (which,
even though it is abstract, has no subclasses and so appears to have been
put in either as a placeholder or simply so you wouldn’t wonder where it
was). However, the interfaces to the classes are quite a close match.
| Filters: | Corresponding Java 1.1 class |
|---|---|
| FilterInputStream | FilterReader |
| FilterOutputStream | FilterWriter (abstract class with no subclasses) |
| BufferedInputStream | BufferedReader |
| BufferedOutputStream | BufferedWriter |
| DataInputStream | Use DataInputStream |
| PrintStream | PrintWriter |
| LineNumberInputStream | LineNumberReader |
| StreamTokenizer | StreamTokenizer |
| PushbackInputStream | PushbackReader |
There’s one direction that’s
quite clear: Whenever you want to use readLine( ), you
shouldn’t do it with a DataInputStream any more (this is met with a
deprecation message at compile-time), but instead use a BufferedReader.
Other than this, DataInputStream is still a “preferred”
member of the I/O library.
To make the transition to using a
PrintWriter easier, it has constructors that take any OutputStream
object, as well as Writer objects. However, PrintWriter has no
more support for formatting than PrintStream does; the interfaces are
virtually the same.
The PrintWriter constructor also
has an option to perform automatic flushing, which happens after every
println( ) if the constructor flag is
set.
| Java 1.0 classes without corresponding Java 1.1 classes |
|---|
| DataOutputStream |
| File |
| RandomAccessFile |
| SequenceInputStream |
DataOutputStream, in particular,
is used without change, so for storing and retrieving data in a transportable
format you use the InputStream and OutputStream
hierarchies.
RandomAccessFile is used for files
containing records of known size so that you can move from one record to another
using seek( ), then
read or change the records. The records don’t have to be the same size;
you just have to be able to determine how big they are and where they are placed
in the file.
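For example, if each record were a fixed 8 bytes (a single double, purely for illustration), you could jump straight to record n by seeking to n * 8. A minimal sketch, with an illustrative file name:

import java.io.*;

public class RecordSeek {
  static final int RECORD_SIZE = 8; // one double per record, for illustration
  public static void main(String[] args) throws IOException {
    RandomAccessFile rf = new RandomAccessFile("records.dat", "rw");
    for(int i = 0; i < 10; i++)
      rf.writeDouble(i);           // write ten fixed-size records
    rf.seek(3 * RECORD_SIZE);      // jump directly to record 3
    System.out.println("Record 3: " + rf.readDouble());
    rf.close();
  }
}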
At first it’s a little bit hard to
believe that RandomAccessFile is not part of the InputStream or
OutputStream hierarchy. However, it has no association with those
hierarchies other than that it happens to implement the
DataInput and
DataOutput interfaces
(which are also implemented by DataInputStream and
DataOutputStream). It doesn’t even use any of the functionality of
the existing InputStream or OutputStream classes—it’s
a completely separate class, written from scratch, with all of its own (mostly
native) methods. The reason for this may be that RandomAccessFile has
essentially different behavior than the other I/O types, since you can move
forward and backward within a file. In any event, it stands alone, as a direct
descendant of Object.
Essentially, a RandomAccessFile
works like a DataInputStream pasted together with a
DataOutputStream, along with the methods getFilePointer( ) to
find out where you are in the file, seek( ) to move to a new point
in the file, and length( ) to determine the maximum size of the
file. In addition, the constructors require a second argument (identical to
fopen( ) in C) indicating whether you are just randomly reading
(“r”) or reading and writing (“rw”).
There’s no support for write-only files, which could suggest that
RandomAccessFile might have worked well if it were inherited from
DataInputStream.
The seeking methods are available only in
RandomAccessFile, which works for files only. BufferedInputStream
does allow you to
mark( ) a position
(whose value is held in a single internal variable) and
reset( ) to that
position, but this is limited and not very
useful.
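Here is a small sketch of mark( ) and reset( ) on a BufferedInputStream (reading its own source file, purely as an illustration):

import java.io.*;

public class MarkReset {
  public static void main(String[] args) throws IOException {
    BufferedInputStream in = new BufferedInputStream(
      new FileInputStream("MarkReset.java")); // illustrative file name
    in.mark(32);                 // remember this position (read-ahead limit of 32 bytes)
    System.out.println("First: " + (char)in.read());
    System.out.println("Second: " + (char)in.read());
    in.reset();                  // back to the marked position
    System.out.println("Re-reading: " + (char)in.read());
    in.close();
  }
}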
Although you can combine the I/O stream
classes in many different ways, you’ll probably just use a few
combinations. The following example can be used as a basic reference; it shows
the creation and use of typical I/O configurations. Note
that each configuration begins with a commented number and title that
corresponds to the heading for the appropriate explanation that follows in the
text.
//: c11:IOStreamDemo.java
// Typical I/O stream configurations.
import java.io.*;

public class IOStreamDemo {
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    // 1. Reading input by lines:
    BufferedReader in =
      new BufferedReader(
        new FileReader("IOStreamDemo.java"));
    String s, s2 = new String();
    while((s = in.readLine()) != null)
      s2 += s + "\n";
    in.close();
    // 1b. Reading standard input:
    BufferedReader stdin =
      new BufferedReader(
        new InputStreamReader(System.in));
    System.out.print("Enter a line:");
    System.out.println(stdin.readLine());
    // 2. Input from memory
    StringReader in2 = new StringReader(s2);
    int c;
    while((c = in2.read()) != -1)
      System.out.print((char)c);
    // 3. Formatted memory input
    try {
      DataInputStream in3 =
        new DataInputStream(
          new ByteArrayInputStream(s2.getBytes()));
      while(true)
        System.out.print((char)in3.readByte());
    } catch(EOFException e) {
      System.err.println("End of stream");
    }
    // 4. File output
    try {
      BufferedReader in4 =
        new BufferedReader(
          new StringReader(s2));
      PrintWriter out1 =
        new PrintWriter(
          new BufferedWriter(
            new FileWriter("IODemo.out")));
      int lineCount = 1;
      while((s = in4.readLine()) != null )
        out1.println(lineCount++ + ": " + s);
      out1.close();
    } catch(EOFException e) {
      System.err.println("End of stream");
    }
    // 5. Storing & recovering data
    try {
      DataOutputStream out2 =
        new DataOutputStream(
          new BufferedOutputStream(
            new FileOutputStream("Data.txt")));
      out2.writeDouble(3.14159);
      out2.writeChars("That was pi\n");
      out2.writeBytes("That was pi\n");
      out2.close();
      DataInputStream in5 =
        new DataInputStream(
          new BufferedInputStream(
            new FileInputStream("Data.txt")));
      BufferedReader in5br =
        new BufferedReader(
          new InputStreamReader(in5));
      // Must use DataInputStream for data:
      System.out.println(in5.readDouble());
      // Can now use the "proper" readLine():
      System.out.println(in5br.readLine());
      // But the line comes out funny.
      // The one created with writeBytes is OK:
      System.out.println(in5br.readLine());
    } catch(EOFException e) {
      System.err.println("End of stream");
    }
    // 6. Reading/writing random access files
    RandomAccessFile rf =
      new RandomAccessFile("rtest.dat", "rw");
    for(int i = 0; i < 10; i++)
      rf.writeDouble(i*1.414);
    rf.close();
    rf = new RandomAccessFile("rtest.dat", "rw");
    rf.seek(5*8);
    rf.writeDouble(47.0001);
    rf.close();
    rf = new RandomAccessFile("rtest.dat", "r");
    for(int i = 0; i < 10; i++)
      System.out.println(
        "Value " + i + ": " + rf.readDouble());
    rf.close();
  }
} ///:~
Parts 1 through 4 demonstrate the
creation and use of input streams. Part 4 also shows the simple use of an output
stream.
To open a file for character input, you
use a FileReader
with a String or a File object as the file name. For speed,
you’ll want that file to be buffered so you give the resulting reference
to the constructor for a
BufferedReader. Since
BufferedReader also provides the readLine( ) method, this is
your final object and the interface you read from. When you reach the end of the
file, readLine( ) returns null so that is used to break out
of the while loop.
The String s2 is used to
accumulate the entire contents of the file (including newlines that must be
added since readLine( ) strips them off). s2 is then used in
the later portions of this program. Finally, close( ) is called to
close the file. Technically, close( ) will be called when
finalize( ) runs, and this is supposed to happen (whether or not
garbage collection occurs) as the program exits. However, this has been
inconsistently implemented, so the only safe approach is to explicitly call
close( ) for
files.
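A common idiom for doing that reliably (a sketch, not part of the example above) is to put the close( ) in a finally clause, so the file is released even if an exception occurs partway through:

import java.io.*;

public class SafeClose {
  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(
      new FileReader("SafeClose.java")); // illustrative file name
    try {
      String line;
      while((line = in.readLine()) != null)
        System.out.println(line);
    } finally {
      in.close(); // runs whether or not an exception was thrown
    }
  }
}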
Section 1b shows how you can wrap
System.in for reading
console
input. System.in is an InputStream, but BufferedReader
needs a Reader argument, so InputStreamReader is brought in to
perform the translation.
This section takes the String s2
that now contains the entire contents of the file and uses it to create a
StringReader. Then
read( ) is used to read each character one at a time and send it out
to the console. Note that read( ) returns the next character as an
int and thus it must be cast to a char to print
properly.
To read “formatted” data, you
use a DataInputStream,
which is a byte-oriented I/O class (rather than char oriented).
Thus you must use all InputStream classes rather than Reader
classes. Of course, you can read anything (such as a file) as bytes using
InputStream classes, but here a String is used. To convert the
String to an array of bytes, which is what is appropriate for a
ByteArrayInputStream, String has a
getBytes( ) method to do the job. At that
point, you have an appropriate InputStream to hand to
DataInputStream.
If you read the characters from a
DataInputStream one byte at a time using readByte( ), any
byte value is a legitimate result so the return value cannot be used to detect
the end of input. Instead, you can use the
available( ) method
to find out how many more characters are available. Here’s an example that
shows how to read a file one byte at a time:
//: c11:TestEOF.java
// Testing for the end of file
// while reading a byte at a time.
import java.io.*;

public class TestEOF {
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    DataInputStream in =
      new DataInputStream(
        new BufferedInputStream(
          new FileInputStream("TestEOF.java")));
    while(in.available() != 0)
      System.out.print((char)in.readByte());
  }
} ///:~
Note that available( ) works
differently depending on what sort of medium you’re reading from;
it’s literally “the number of bytes that can be read without
blocking.” With a file
this means the whole file, but with a different kind of stream this might not be
true, so use it thoughtfully.
You could also detect the end of input in
cases like these by catching an exception. However, the use of exceptions for
control flow is considered a misuse of that feature.
This example also shows how to write data
to a file. First, a
FileWriter is created to
connect to the file. You’ll virtually always want to buffer the output by
wrapping it in a
BufferedWriter (try
removing this wrapping to see the impact on the performance—buffering
tends to dramatically increase performance of I/O operations). Then for the
formatting it’s turned into a
PrintWriter. The data
file created this way is readable as an ordinary text file.
As the lines are written to the file,
line numbers are added. Note that LineNumberInputStream is not
used, because it’s a silly class and you don’t need it. As shown
here, it’s trivial to keep track of your own line
numbers.
When
the input stream is exhausted,
readLine( ) returns
null. You’ll see an explicit close( ) for out1,
because if you don’t call close( ) for all your output files,
you might discover that the buffers don’t get flushed so they’re
incomplete.
The two primary kinds of output streams
are separated by the way they write data: one writes it for human consumption,
and the other writes it to be reacquired by a
DataInputStream. The
RandomAccessFile stands
alone, although its data format is compatible with the DataInputStream
and
DataOutputStream.
A PrintWriter formats data so
it’s readable by a human. However, to output data so that it can be
recovered by another stream, you use a DataOutputStream to write the data
and a DataInputStream to recover the data. Of course, these streams could
be anything, but here a file is used, buffered for both reading and writing.
DataOutputStream and DataInputStream are byte-oriented and
thus require the InputStreams and OutputStreams.
If you use a DataOutputStream to
write the data, then Java guarantees that you can accurately recover the data
using a DataInputStream—regardless of what different platforms
write and read the data. This is incredibly valuable, as anyone knows who has
spent time worrying about platform-specific data issues. That problem vanishes
if you have Java on both
platforms[58].
Note
that the character string is written using both writeChars( ) and
writeBytes( ). When you run the program, you’ll discover that
writeChars( ) outputs 16-bit Unicode characters. When you read the
line using readLine( ), you’ll see that there is a space
between each character, because of the extra byte inserted by Unicode. Since
there is no complementary “readChars” method in
DataInputStream, you’re stuck pulling these characters off one at a
time with
readChar( ). So for
ASCII, it’s easier to write the characters as bytes followed by a newline;
then use readLine( )
to read back the bytes as a regular ASCII line.
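Here is a small sketch of that recovery, assuming (for illustration) that you know how many characters were written; otherwise you would read until an EOFException. The file name is illustrative:

import java.io.*;

public class ReadChars {
  public static void main(String[] args) throws IOException {
    DataOutputStream out = new DataOutputStream(
      new FileOutputStream("chars.dat"));
    String msg = "That was pi";
    out.writeChars(msg);                // 16-bit Unicode characters
    out.close();
    DataInputStream in = new DataInputStream(
      new FileInputStream("chars.dat"));
    StringBuffer sb = new StringBuffer();
    for(int i = 0; i < msg.length(); i++)
      sb.append(in.readChar());         // pull them back one at a time
    in.close();
    System.out.println(sb);
  }
}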
The
writeDouble( )
stores the double number to the stream and the complementary
readDouble( )
recovers it (there are similar methods for reading and writing the other types).
But for any of the reading methods to work correctly, you must know the exact
placement of the data item in the stream, since it would be equally possible to
read the stored double as a simple sequence of bytes, or as a
char, etc. So you must either have a fixed format for the data in the
file or extra information must be stored in the file that you parse to determine
where the data is located.
As previously noted, the
RandomAccessFile is almost totally isolated from the rest of the I/O
hierarchy, save for the fact that it implements the DataInput and
DataOutput interfaces. So you cannot combine it with any of the aspects
of the InputStream and OutputStream subclasses. Even though it
might make sense to treat a ByteArrayInputStream as a random-access
element, you can use RandomAccessFile only to open a file. You must
assume a RandomAccessFile is properly buffered, since you cannot add
that capability yourself.
The one option you have is in the second
constructor argument: you can open a RandomAccessFile to read
(“r”) or read and write
(“rw”).
Using a RandomAccessFile is like
using a combined DataInputStream and DataOutputStream (because it
implements the equivalent interfaces). In addition, you can see that
seek( ) is used to
move about in the file and change one of the
values.
If you look at section 5, you’ll
see that the data is written before the text. That’s because a
problem was introduced in Java 1.1 (and persists in Java 2) that sure seems like
a bug to me, but I reported it and the bug people at JavaSoft said that this is
the way it is supposed to work (however, the problem did not occur in
Java 1.0, which makes me suspicious). The problem is shown in the following
code:
//: c11:IOProblem.java
// Java 1.1 and higher I/O Problem.
import java.io.*;

public class IOProblem {
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    DataOutputStream out =
      new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream("Data.txt")));
    out.writeDouble(3.14159);
    out.writeBytes("That was the value of pi\n");
    out.writeBytes("This is pi/2:\n");
    out.writeDouble(3.14159/2);
    out.close();
    DataInputStream in =
      new DataInputStream(
        new BufferedInputStream(
          new FileInputStream("Data.txt")));
    BufferedReader inbr =
      new BufferedReader(
        new InputStreamReader(in));
    // The doubles written BEFORE the line of text
    // read back correctly:
    System.out.println(in.readDouble());
    // Read the lines of text:
    System.out.println(inbr.readLine());
    System.out.println(inbr.readLine());
    // Trying to read the doubles after the line
    // produces an end-of-file exception:
    System.out.println(in.readDouble());
  }
} ///:~
It appears that anything you write after
a call to writeBytes( ) is not recoverable. The answer is apparently
the same as the answer to the old vaudeville joke: “Doc, it hurts when I
do this!” “Don’t do
that!”
The PipedInputStream,
PipedOutputStream, PipedReader and PipedWriter have been
mentioned only briefly in this chapter. This is not to suggest that they
aren’t useful, but their value is not apparent until you begin to
understand multithreading, since the piped streams are used to communicate
between threads. This is covered along with an example in Chapter
14.
The term standard I/O refers to
the Unix concept (which is reproduced in some form in Windows and many other
operating systems) of a single stream of information that is used by a program.
All the program’s input can come from standard input, all its
output can go to standard output, and all of its error messages can be
sent to standard error. The value of standard I/O is that programs can
easily be chained together and one program’s standard output can become
the standard input for another program. This is a powerful
tool.
Following the standard I/O model, Java
has System.in, System.out, and System.err. Throughout this
book you’ve seen how to write to standard output using System.out,
which is already prewrapped as a PrintStream object. System.err is
likewise a PrintStream, but System.in is a raw InputStream,
with no wrapping. This means that while you can use System.out and
System.err right away, System.in must be wrapped before you can
read from it.
Typically, you’ll want to read
input a line at a time using readLine( ), so you’ll want to
wrap System.in in a BufferedReader. To do this, you must convert
System.in to a Reader using InputStreamReader. Here’s
an example that simply echoes each line that you type in:
//: c11:Echo.java
// How to read from standard input.
import java.io.*;

public class Echo {
  public static void main(String[] args)
  throws IOException {
    BufferedReader in =
      new BufferedReader(
        new InputStreamReader(System.in));
    String s;
    while((s = in.readLine()).length() != 0)
      System.out.println(s);
    // An empty line terminates the program
  }
} ///:~
The reason for the exception
specification is that
readLine( ) can
throw an IOException. Note that System.in should usually be
buffered, as with most streams.
System.out is a
PrintStream, which is an OutputStream. PrintWriter has a
constructor that takes an OutputStream as an argument. Thus, if you want,
you can convert System.out into a PrintWriter using that
constructor:
//: c11:ChangeSystemOut.java
// Turn System.out into a PrintWriter.
import java.io.*;

public class ChangeSystemOut {
  public static void main(String[] args) {
    PrintWriter out =
      new PrintWriter(System.out, true);
    out.println("Hello, world");
  }
} ///:~
It’s important to use the
two-argument version of the PrintWriter constructor and to set the second
argument to true in order to enable automatic flushing, otherwise you may
not see the output.
The Java System class allows you
to redirect the standard input, output, and error I/O streams using simple
static method calls: setIn(InputStream), setOut(PrintStream), and
setErr(PrintStream).
Redirecting output is especially useful
if you suddenly start creating a large amount of output on your screen and
it’s scrolling past faster than you can read
it.[59] Redirecting
input is valuable for a command-line program in which you want to test a
particular user-input sequence repeatedly. Here’s a simple example that
shows the use of these methods:
//: c11:Redirecting.java
// Demonstrates standard I/O redirection.
import java.io.*;

class Redirecting {
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    BufferedInputStream in =
      new BufferedInputStream(
        new FileInputStream(
          "Redirecting.java"));
    PrintStream out =
      new PrintStream(
        new BufferedOutputStream(
          new FileOutputStream("test.out")));
    System.setIn(in);
    System.setOut(out);
    System.setErr(out);
    BufferedReader br =
      new BufferedReader(
        new InputStreamReader(System.in));
    String s;
    while((s = br.readLine()) != null)
      System.out.println(s);
    out.close(); // Remember this!
  }
} ///:~
This program attaches standard input to a
file, and redirects standard output and standard error to another
file.
I/O redirection manipulates streams of
bytes, not streams of characters, thus InputStreams and
OutputStreams are used rather than Readers and
Writers.
The Java I/O library contains classes to
support reading and writing streams in a compressed format. These are wrapped
around existing I/O classes to provide compression
functionality.
These classes are not derived from the
Reader and Writer classes, but instead are part of the
InputStream and OutputStream hierarchies. This is because the
compression library works with bytes, not characters. However, you might
sometimes be forced to mix the two types of streams. (Remember that you can use
InputStreamReader and OutputStreamWriter to provide easy
conversion between one type and another.)
| Compression class | Function |
|---|---|
| CheckedInputStream | getChecksum( ) produces a checksum for any InputStream (not just decompression). |
| CheckedOutputStream | getChecksum( ) produces a checksum for any OutputStream (not just compression). |
| DeflaterOutputStream | Base class for compression classes. |
| ZipOutputStream | A DeflaterOutputStream that compresses data into the Zip file format. |
| GZIPOutputStream | A DeflaterOutputStream that compresses data into the GZIP file format. |
| InflaterInputStream | Base class for decompression classes. |
| ZipInputStream | An InflaterInputStream that decompresses data that has been stored in the Zip file format. |
| GZIPInputStream | An InflaterInputStream that decompresses data that has been stored in the GZIP file format. |
Although there are many compression
algorithms, Zip and GZIP are possibly the most commonly used. Thus you can
easily manipulate your compressed data with the many tools available for reading
and writing these formats.
The GZIP interface is simple and thus is
probably more appropriate when you have a single stream of data that you want to
compress (rather than a container of dissimilar pieces of data). Here’s an
example that compresses a single file:
//: c11:GZIPcompress.java
// Uses GZIP compression to compress a file
// whose name is passed on the command line.
import java.io.*;
import java.util.zip.*;

public class GZIPcompress {
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    BufferedReader in =
      new BufferedReader(
        new FileReader(args[0]));
    BufferedOutputStream out =
      new BufferedOutputStream(
        new GZIPOutputStream(
          new FileOutputStream("test.gz")));
    System.out.println("Writing file");
    int c;
    while((c = in.read()) != -1)
      out.write(c);
    in.close();
    out.close();
    System.out.println("Reading file");
    BufferedReader in2 =
      new BufferedReader(
        new InputStreamReader(
          new GZIPInputStream(
            new FileInputStream("test.gz"))));
    String s;
    while((s = in2.readLine()) != null)
      System.out.println(s);
  }
} ///:~
The use of the compression classes is
straightforward—you simply wrap your output stream in a
GZIPOutputStream or ZipOutputStream and your input stream in a
GZIPInputStream or ZipInputStream. All else is ordinary I/O
reading and writing. This is an example of mixing the char-oriented
streams with the byte-oriented streams: in uses the Reader
classes, whereas GZIPOutputStream’s constructor can accept only an
OutputStream object, not a Writer object. When the file is opened,
the GZIPInputStream is wrapped in an InputStreamReader to convert it to a
Reader.
The library that supports the Zip format
is much more extensive. With it you can easily store multiple files, and
there’s even a separate class to make the process of reading a Zip file
easy. The library uses the standard Zip format so that it works seamlessly with
all the tools currently downloadable on the Internet. The following example has
the same form as the previous example, but it handles as many command-line
arguments as you want. In addition, it shows the use of the
Checksum classes to calculate and verify the
checksum for the file. There are two Checksum types:
Adler32 (which is faster) and
CRC32 (which is slower but slightly more
accurate).
//: c11:ZipCompress.java
// Uses Zip compression to compress any
// number of files given on the command line.
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ZipCompress {
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    FileOutputStream f =
      new FileOutputStream("test.zip");
    CheckedOutputStream csum =
      new CheckedOutputStream(
        f, new Adler32());
    ZipOutputStream out =
      new ZipOutputStream(
        new BufferedOutputStream(csum));
    out.setComment("A test of Java Zipping");
    // No corresponding getComment(), though.
    for(int i = 0; i < args.length; i++) {
      System.out.println(
        "Writing file " + args[i]);
      BufferedReader in =
        new BufferedReader(
          new FileReader(args[i]));
      out.putNextEntry(new ZipEntry(args[i]));
      int c;
      while((c = in.read()) != -1)
        out.write(c);
      in.close();
    }
    out.close();
    // Checksum valid only after the file
    // has been closed!
    System.out.println("Checksum: " +
      csum.getChecksum().getValue());
    // Now extract the files:
    System.out.println("Reading file");
    FileInputStream fi =
      new FileInputStream("test.zip");
    CheckedInputStream csumi =
      new CheckedInputStream(
        fi, new Adler32());
    ZipInputStream in2 =
      new ZipInputStream(
        new BufferedInputStream(csumi));
    ZipEntry ze;
    while((ze = in2.getNextEntry()) != null) {
      System.out.println("Reading file " + ze);
      int x;
      while((x = in2.read()) != -1)
        System.out.write(x);
    }
    System.out.println("Checksum: " +
      csumi.getChecksum().getValue());
    in2.close();
    // Alternative way to open and read
    // zip files:
    ZipFile zf = new ZipFile("test.zip");
    Enumeration e = zf.entries();
    while(e.hasMoreElements()) {
      ZipEntry ze2 = (ZipEntry)e.nextElement();
      System.out.println("File: " + ze2);
      // ... and extract the data as before
    }
  }
} ///:~
For each file to add to the archive, you
must call putNextEntry( ) and pass it a
ZipEntry object. The
ZipEntry object contains an extensive interface that allows you to get
and set all the data available on that particular entry in your Zip file: name,
compressed and uncompressed sizes, date, CRC checksum, extra field data,
comment, compression method, and whether it’s a directory entry. However,
even though the Zip format has a way to set a password, this is not supported in
Java’s Zip library. And although CheckedInputStream and
CheckedOutputStream support both Adler32 and CRC32
checksums, the ZipEntry class supports only an interface for CRC. This is
a restriction of the underlying Zip format, but it might limit you from using
the faster Adler32.
To extract files, ZipInputStream
has a getNextEntry( ) method that returns the next ZipEntry
if there is one. As a more succinct alternative, you can read the file using a
ZipFile object, which has a method entries( ) to return an
Enumeration to the ZipEntries.
In order to read the checksum you must
somehow have access to the associated Checksum object. Here, a reference
to the CheckedOutputStream and CheckedInputStream objects is
retained, but you could also just hold onto a reference to the Checksum
object.
A baffling method in Zip streams is
setComment( ). As shown above, you can set a comment when
you’re writing a file, but there’s no way to recover the comment in
the ZipInputStream. Comments appear to be supported fully on an
entry-by-entry basis only via ZipEntry.
Of course, you are not limited to files
when using the GZIP or Zip libraries—you can compress
anything, including data to be sent through a network
connection.
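For example, since GZIPOutputStream decorates any OutputStream, you could wrap a socket’s stream the same way you wrapped a FileOutputStream above. This is only a sketch; the host, port, and message are placeholders, and a real receiver would wrap the corresponding input stream in a GZIPInputStream:

import java.io.*;
import java.net.*;
import java.util.zip.*;

public class CompressedSend {
  public static void main(String[] args) throws IOException {
    // Host and port are placeholders; the point is that the GZIP
    // decorator wraps any OutputStream, not just a FileOutputStream.
    Socket socket = new Socket("localhost", 8080);
    BufferedOutputStream out = new BufferedOutputStream(
      new GZIPOutputStream(socket.getOutputStream()));
    out.write("compressed greetings".getBytes());
    out.close();   // closing the outer stream also finishes the GZIP stream
    socket.close();
  }
}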
The Zip format is also used in the
JAR (Java ARchive) file format,
which is a way to collect a group of files into a single compressed file, just
like Zip. However, like everything else in Java, JAR files are cross-platform so
you don’t need to worry about platform issues. You can also include audio
and image files as well as class files.
JAR files are particularly helpful when
you deal with the Internet. Before JAR files, your Web browser would have to
make repeated requests of a Web server in order to download all of the files
that make up an applet. In addition, each of these files was uncompressed. By
combining all of the files for a particular applet into a single JAR file, only
one server request is necessary and the transfer is faster because of
compression. And each entry in a JAR file can be digitally signed for security
(refer to the Java documentation for details).
A JAR file consists of a single file
containing a collection of zipped files along with a
“manifest” that describes them. (You can
create your own manifest file; otherwise the jar program will do it for
you.) You can find out more about JAR manifests in the JDK HTML
documentation.
The jar utility that comes with
Sun’s JDK automatically compresses the files of your choice. You invoke it
on the command line:
jar [options] destination [manifest] inputfile(s)
The options are simply a collection of
letters (no hyphen or any other indicator is necessary). Unix/Linux users will
note the similarity to the tar options. These are:
| Option | Description |
|---|---|
| c | Creates a new or empty archive. |
| t | Lists the table of contents. |
| x | Extracts all files. |
| x file | Extracts the named file. |
| f | Says: “I’m going to give you the name of the file.” If you don’t use this, jar assumes that its input will come from standard input, or, if it is creating a file, its output will go to standard output. |
| m | Says that the first argument will be the name of the user-created manifest file. |
| v | Generates verbose output describing what jar is doing. |
| 0 | Only stores the files; doesn’t compress the files (use to create a JAR file that you can put in your classpath). |
| M | Don’t automatically create a manifest file. |
If a subdirectory is included in the
files to be put into the JAR file, that subdirectory is automatically added,
including all of its subdirectories, etc. Path information is also
preserved.
Here are some typical ways to invoke
jar:
jar cf myJarFile.jar *.class
This creates a JAR file called
myJarFile.jar that contains all of the class files in the current
directory, along with an automatically generated manifest file.
jar cmf myJarFile.jar myManifestFile.mf *.class
Like the previous example, but adding a
user-created manifest file called myManifestFile.mf.
jar tf myJarFile.jar
Produces a table of contents of the files
in myJarFile.jar.
jar tvf myJarFile.jar
Adds the “verbose” flag to
give more detailed information about the files in
myJarFile.jar.
jar cvf myApp.jar audio classes image
Assuming audio, classes,
and image are subdirectories, this combines all of the subdirectories
into the file myApp.jar. The “verbose” flag is also included
to give extra feedback while the jar program is working.
If you create a JAR file using the
0 option, that file can be placed in your CLASSPATH:
CLASSPATH="lib1.jar;lib2.jar;"
Then Java can search lib1.jar and
lib2.jar for class files.
The jar tool isn’t as useful
as a zip utility. For example, you can’t add or update files to an
existing JAR file; you can create JAR files only from scratch. Also, you
can’t move files into a JAR file, erasing them as they are moved. However,
a JAR file created on one platform will be transparently readable by the
jar tool on any other platform (a problem that sometimes plagues
zip utilities).
Java’s object serialization
allows you to take any object that implements the Serializable interface
and turn it into a sequence of bytes that can later be fully restored to
regenerate the original object. This is even true across a network, which means
that the serialization mechanism automatically compensates for differences in
operating systems. That is, you can create an object on a Windows machine,
serialize it, and send it across the network to a Unix machine where it will be
correctly reconstructed. You don’t have to worry about the data
representations on the different machines, the byte ordering, or any other
details.
By itself, object serialization is
interesting because it allows you to implement lightweight persistence.
Remember that persistence means an object’s lifetime is not determined by
whether a program is executing—the object lives in between
invocations of the program. By taking a serializable object and writing it to
disk, then restoring that object when the program is reinvoked, you’re
able to produce the effect of persistence. The reason it’s called
“lightweight” is that you can’t simply define an object using
some kind of “persistent” keyword and let the system take care of
the details (although this might happen in the future). Instead, you must
explicitly serialize and deserialize the objects in your
program.
Object serialization was added to the
language to support two major features. Java’s remote method
invocation (RMI) allows objects that live on other machines to behave as if
they live on your machine. When sending messages to remote objects, object
serialization is necessary to transport the arguments and return values. RMI is
discussed in Chapter 15.
Object serialization is also necessary
for JavaBeans, described in Chapter 13. When a Bean is used, its state
information is generally configured at design-time. This state information must
be stored and later recovered when the program is started; object serialization
performs this task.
Serializing an object is quite simple, as
long as the object implements the Serializable interface (this interface
is just a flag and has no methods). When serialization was added to the
language, many standard library classes were changed to make them serializable,
including all of the wrappers for the primitive types, all of the container
classes, and many others. Even Class objects can be serialized. (See
Chapter 12 for the implications of this.)
To serialize an object, you create some
sort of OutputStream object and then wrap it inside an
ObjectOutputStream
object. At this point you need only call
writeObject( ) and
your object is serialized and sent to the OutputStream. To reverse the
process, you wrap an InputStream inside an ObjectInputStream and
call readObject( ).
What comes back is, as usual, a reference to an upcast Object, so you
must downcast to set things straight.
A particularly clever aspect of object
serialization is that it not only saves an image of your object but it also
follows all the references contained in your object and saves those
objects, and follows all the references in each of those objects, etc. This
is sometimes referred to as the
“web of objects”
that a single object can be connected to, and it includes arrays of references
to objects as well as member objects. If you had to maintain your own object
serialization scheme, maintaining the code to follow all these links would be a
bit mind-boggling. However, Java object serialization seems to pull it off
flawlessly, no doubt using an optimized algorithm that traverses the web of
objects. The following example tests the serialization mechanism by making a
“worm” of linked objects, each of which has a link to the next
segment in the worm as well as an array of references to objects of a different
class, Data:
//: c11:Worm.java
// Demonstrates object serialization.
import java.io.*;

class Data implements Serializable {
  private int i;
  Data(int x) { i = x; }
  public String toString() {
    return Integer.toString(i);
  }
}

public class Worm implements Serializable {
  // Generate a random int value:
  private static int r() {
    return (int)(Math.random() * 10);
  }
  private Data[] d = {
    new Data(r()), new Data(r()), new Data(r())
  };
  private Worm next;
  private char c;
  // Value of i == number of segments
  Worm(int i, char x) {
    System.out.println(" Worm constructor: " + i);
    c = x;
    if(--i > 0)
      next = new Worm(i, (char)(x + 1));
  }
  Worm() {
    System.out.println("Default constructor");
  }
  public String toString() {
    String s = ":" + c + "(";
    for(int i = 0; i < d.length; i++)
      s += d[i].toString();
    s += ")";
    if(next != null)
      s += next.toString();
    return s;
  }
  // Throw exceptions to console:
  public static void main(String[] args)
  throws ClassNotFoundException, IOException {
    Worm w = new Worm(6, 'a');
    System.out.println("w = " + w);
    ObjectOutputStream out =
      new ObjectOutputStream(
        new FileOutputStream("worm.out"));
    out.writeObject("Worm storage");
    out.writeObject(w);
    out.close(); // Also flushes output
    ObjectInputStream in =
      new ObjectInputStream(
        new FileInputStream("worm.out"));
    String s = (String)in.readObject();
    Worm w2 = (Worm)in.readObject();
    System.out.println(s + ", w2 = " + w2);
    ByteArrayOutputStream bout =
      new ByteArrayOutputStream();
    ObjectOutputStream out2 =
      new ObjectOutputStream(bout);
    out2.writeObject("Worm storage");
    out2.writeObject(w);
    out2.flush();
    ObjectInputStream in2 =
      new ObjectInputStream(
        new ByteArrayInputStream(
          bout.toByteArray()));
    s = (String)in2.readObject();
    Worm w3 = (Worm)in2.readObject();
    System.out.println(s + ", w3 = " + w3);
  }
} ///:~
To make things interesting, the array of
Data objects inside Worm are initialized with random numbers.
(This way you don’t suspect the compiler of keeping some kind of
meta-information.) Each Worm segment is labeled with a char
that’s automatically generated in the process of recursively generating
the linked list of Worms. When you create a Worm, you tell the
constructor how long you want it to be. To make the next reference it
calls the Worm constructor with a length of one less, etc. The final
next reference is left as null, indicating the end of the
Worm.
The point of all this was to make
something reasonably complex that couldn’t easily be serialized. The act
of serializing, however, is quite simple. Once the ObjectOutputStream is
created from some other stream, writeObject( ) serializes the
object. Notice the call to writeObject( ) for a String, as
well. You can also write all the primitive data types using the same methods as
DataOutputStream (they share the same interface).
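As a small sketch of mixing the two (the class name and file name here are invented), primitives and objects can share one stream as long as they are read back in the same order they were written:
import java.io.*;

public class MixedWrite {
  public static void main(String[] args)
      throws IOException, ClassNotFoundException {
    ObjectOutputStream out = new ObjectOutputStream(
      new FileOutputStream("mixed.out"));
    out.writeInt(47);            // DataOutput-style calls...
    out.writeDouble(3.14159);
    out.writeObject("a string"); // ...mixed with object serialization
    out.close();
    ObjectInputStream in = new ObjectInputStream(
      new FileInputStream("mixed.out"));
    // Read everything back in the order it was written:
    System.out.println(in.readInt());
    System.out.println(in.readDouble());
    System.out.println((String)in.readObject());
    in.close();
  }
}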
There are two separate code sections that look similar. The first writes and reads a
file; the second, for variety, writes and reads a byte array. You can read and write
an object using serialization to any InputStream or OutputStream,
including, as you will see in Chapter 15, a network. The output from one run
was:
Worm constructor: 6
Worm constructor: 5
Worm constructor: 4
Worm constructor: 3
Worm constructor: 2
Worm constructor: 1
w = :a(262):b(100):c(396):d(480):e(316):f(398)
Worm storage, w2 = :a(262):b(100):c(396):d(480):e(316):f(398)
Worm storage, w3 = :a(262):b(100):c(396):d(480):e(316):f(398)
You can see that the deserialized object
really does contain all of the links that were in the original
object.
Note that no constructor, not even the
default constructor, is called in the process of deserializing a
Serializable object. The entire object is restored by recovering data
from the InputStream.
You might wonder what’s necessary
for an object to be recovered from its serialized state. For example, suppose
you serialize an object and send it as a file or through a network to another
machine. Could a program on the other machine reconstruct the object using only
the contents of the file?
The best way to answer this question is
(as usual) by performing an experiment. The following file goes in the
subdirectory for this chapter:
//: c11:Alien.java
// A serializable class.
import java.io.*;

public class Alien implements Serializable {
} ///:~
The file that creates and serializes an
Alien object goes in the same directory:
//: c11:FreezeAlien.java
// Create a serialized output file.
import java.io.*;

public class FreezeAlien {
  // Throw exceptions to console:
  public static void main(String[] args)
      throws IOException {
    ObjectOutput out = new ObjectOutputStream(
      new FileOutputStream("X.file"));
    Alien zorcon = new Alien();
    out.writeObject(zorcon);
  }
} ///:~
Rather than catching and handling
exceptions, this program takes the quick and dirty approach of passing the
exceptions out of main( ), so they’ll be reported on the
command line.
Once the program is compiled and run,
copy the resulting X.file to a subdirectory called xfiles, where
the following code goes:
//: c11:xfiles:ThawAlien.java
// Try to recover a serialized file without the
// class of object that's stored in that file.
import java.io.*;

public class ThawAlien {
  public static void main(String[] args)
      throws IOException, ClassNotFoundException {
    ObjectInputStream in = new ObjectInputStream(
      new FileInputStream("X.file"));
    Object mystery = in.readObject();
    System.out.println(mystery.getClass());
  }
} ///:~
This program opens the file and reads in
the object mystery successfully. However, as soon as you try to find out
anything about the object—which requires the Class object for
Alien—the Java Virtual Machine (JVM) cannot find Alien.class
(unless it happens to be in the Classpath, which it shouldn’t be in this
example). You’ll get a ClassNotFoundException. (Once again, all
evidence of alien life vanishes before proof of its existence can be
verified!)
If you expect to do much after
you’ve recovered an object that has been serialized, you must make sure
that the JVM can find the associated .class file either in the local
class path or somewhere on the
Internet.
As you can see, the default serialization
mechanism is trivial to use. But what if you have special needs? Perhaps you
have special security issues and you don’t want to serialize portions of
your object, or perhaps it just doesn’t make sense for one subobject to be
serialized if that part needs to be created anew when the object is
recovered.
You can control the process of
serialization by implementing the
Externalizable interface
instead of the
Serializable interface.
The Externalizable interface extends the Serializable interface
and adds two methods,
writeExternal( ) and
readExternal( ),
that are automatically called for your object during serialization and
deserialization so that you can perform your special
operations.
The following example shows simple
implementations of the Externalizable interface methods. Note that
Blip1 and Blip2 are nearly identical except for a subtle
difference (see if you can discover it by looking at the code):
//: c11:Blips.java // Simple use of Externalizable & a pitfall. import java.io.*; import java.util.*; class Blip1 implements Externalizable { public Blip1() { System.out.println("Blip1 Constructor"); } public void writeExternal(ObjectOutput out) throws IOException { System.out.println("Blip1.writeExternal"); } public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException { System.out.println("Blip1.readExternal"); } } class Blip2 implements Externalizable { Blip2() { System.out.println("Blip2 Constructor"); } public void writeExternal(ObjectOutput out) throws IOException { System.out.println("Blip2.writeExternal"); } public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException { System.out.println("Blip2.readExternal"); } } public class Blips { // Throw exceptions to console: public static void main(String[] args) throws IOException, ClassNotFoundException { System.out.println("Constructing objects:"); Blip1 b1 = new Blip1(); Blip2 b2 = new Blip2(); ObjectOutputStream o = new ObjectOutputStream( new FileOutputStream("Blips.out")); System.out.println("Saving objects:"); o.writeObject(b1); o.writeObject(b2); o.close(); // Now get them back: ObjectInputStream in = new ObjectInputStream( new FileInputStream("Blips.out")); System.out.println("Recovering b1:"); b1 = (Blip1)in.readObject(); // OOPS! Throws an exception: //! System.out.println("Recovering b2:"); //! b2 = (Blip2)in.readObject(); } } ///:~
The output for this program
is:
Constructing objects:
Blip1 Constructor
Blip2 Constructor
Saving objects:
Blip1.writeExternal
Blip2.writeExternal
Recovering b1:
Blip1 Constructor
Blip1.readExternal
The reason that the Blip2 object
is not recovered is that trying to do so causes an exception. Can you see the
difference between Blip1 and Blip2? The constructor for
Blip1 is public, while the constructor for Blip2 is not,
and that causes the exception upon recovery. Try making Blip2’s
constructor public and removing the //! comments to see the
correct results.
When b1 is recovered, the
Blip1 default constructor is called. This is different from recovering a
Serializable object, in which the object is constructed entirely from its
stored bits, with no constructor calls. With an Externalizable object,
all the normal default construction behavior occurs (including the
initializations at the point of field definition), and then
readExternal( ) is called. You need to be aware of this—in
particular, the fact that all the default construction always takes
place—to produce the correct behavior in your Externalizable
objects.
Here’s an example that shows what
you must do to fully store and retrieve an Externalizable
object:
//: c11:Blip3.java // Reconstructing an externalizable object. import java.io.*; import java.util.*; class Blip3 implements Externalizable { int i; String s; // No initialization public Blip3() { System.out.println("Blip3 Constructor"); // s, i not initialized } public Blip3(String x, int a) { System.out.println("Blip3(String x, int a)"); s = x; i = a; // s & i initialized only in nondefault // constructor. } public String toString() { return s + i; } public void writeExternal(ObjectOutput out) throws IOException { System.out.println("Blip3.writeExternal"); // You must do this: out.writeObject(s); out.writeInt(i); } public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException { System.out.println("Blip3.readExternal"); // You must do this: s = (String)in.readObject(); i =in.readInt(); } public static void main(String[] args) throws IOException, ClassNotFoundException { System.out.println("Constructing objects:"); Blip3 b3 = new Blip3("A String ", 47); System.out.println(b3); ObjectOutputStream o = new ObjectOutputStream( new FileOutputStream("Blip3.out")); System.out.println("Saving object:"); o.writeObject(b3); o.close(); // Now get it back: ObjectInputStream in = new ObjectInputStream( new FileInputStream("Blip3.out")); System.out.println("Recovering b3:"); b3 = (Blip3)in.readObject(); System.out.println(b3); } } ///:~
The fields s and i are
initialized only in the second constructor, not in the default constructor.
This means that if you don't initialize s and i in
readExternal( ), s will be null and i will be zero (since the storage for the
object gets wiped to zero in the first step of object creation). If you comment
out the two lines of code following the phrases “You must do this”
and run the program, you'll see that when the object is recovered,
s is null and i is zero.
If you are inheriting from an
Externalizable object, you’ll typically call the base-class
versions of writeExternal( ) and readExternal( ) to
provide proper storage and retrieval of the base-class
components.
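Here's a compile-only sketch of that pattern, with invented class names; each derived-class method delegates to the base-class version before handling its own fields:
import java.io.*;

class Base implements Externalizable {
  int x;
  public Base() {} // Externalizable needs a public default constructor
  public void writeExternal(ObjectOutput out)
      throws IOException {
    out.writeInt(x);
  }
  public void readExternal(ObjectInput in)
      throws IOException, ClassNotFoundException {
    x = in.readInt();
  }
}

class Derived extends Base {
  String s = "";
  public Derived() {}
  public void writeExternal(ObjectOutput out)
      throws IOException {
    super.writeExternal(out); // Store the base-class portion first
    out.writeObject(s);
  }
  public void readExternal(ObjectInput in)
      throws IOException, ClassNotFoundException {
    super.readExternal(in);   // Recover the base-class portion first
    s = (String)in.readObject();
  }
}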
So to make things work correctly you must
not only write the important data from the object during the
writeExternal( ) method (there is no default behavior that writes
any of the member objects for an Externalizable object), but you must
also recover that data in the readExternal( ) method. This can be a
bit confusing at first because the default construction behavior for an
Externalizable object can make it seem like some kind of storage and
retrieval takes place automatically. It does not.
When you’re controlling
serialization, there might be a particular subobject that you don’t want
Java’s serialization mechanism to automatically save and restore. This is
commonly the case if that subobject represents sensitive information that you
don’t want to serialize, such as a password. Even if that information is
private in the object, once it’s serialized it’s possible for
someone to access it by reading a file or intercepting a network
transmission.
One way to prevent sensitive parts of
your object from being serialized is to implement your class as
Externalizable, as shown previously. Then nothing is automatically
serialized and you can explicitly serialize only the necessary parts inside
writeExternal( ).
If you’re working with a
Serializable object, however, all serialization happens automatically. To
control this, you can turn off serialization on a field-by-field basis using the
transient
keyword, which says “Don’t bother saving or restoring
this—I’ll take care of it.”
For example, consider a Login
object that keeps information about a particular login session. Suppose
that, once you verify the login, you want to store the data, but without the
password. The easiest way to do this is by implementing
Serializable and marking the password
field as transient. Here’s what it looks like:
//: c11:Logon.java // Demonstrates the "transient" keyword. import java.io.*; import java.util.*; class Logon implements Serializable { private Date date = new Date(); private String username; private transient String password; Logon(String name, String pwd) { username = name; password = pwd; } public String toString() { String pwd = (password == null) ? "(n/a)" : password; return "logon info: \n " + "username: " + username + "\n date: " + date + "\n password: " + pwd; } public static void main(String[] args) throws IOException, ClassNotFoundException { Logon a = new Logon("Hulk", "myLittlePony"); System.out.println( "logon a = " + a); ObjectOutputStream o = new ObjectOutputStream( new FileOutputStream("Logon.out")); o.writeObject(a); o.close(); // Delay: int seconds = 5; long t = System.currentTimeMillis() + seconds * 1000; while(System.currentTimeMillis() < t) ; // Now get them back: ObjectInputStream in = new ObjectInputStream( new FileInputStream("Logon.out")); System.out.println( "Recovering object at " + new Date()); a = (Logon)in.readObject(); System.out.println( "logon a = " + a); } } ///:~
You can see that the date and
username fields are ordinary (not transient), and thus are
automatically serialized. However, the password is transient, and
so is not stored to disk; also the serialization mechanism makes no attempt to
recover it. The output is:
logon a = logon info:
   username: Hulk
   date: Sun Mar 23 18:25:53 PST 1997
   password: myLittlePony
Recovering object at Sun Mar 23 18:25:59 PST 1997
logon a = logon info:
   username: Hulk
   date: Sun Mar 23 18:25:53 PST 1997
   password: (n/a)
When the object is recovered, the
password field is null. Note that toString( ) checks for a
null password so that it can print “(n/a)” instead; if the
overloaded ‘+’ operator encountered the null reference directly, it would
simply splice the text “null” into the resulting String rather than the
placeholder.
You can also see that the date
field is stored to and recovered from disk and not generated
anew.
Since Externalizable objects do
not store any of their fields by default, the transient keyword is for
use with Serializable objects only.
If you’re not keen on implementing
the Externalizable interface, there’s
another approach. You can implement the Serializable interface and add
(notice I say “add” and not “override” or
“implement”) methods called
writeObject( ) and
readObject( ) that
will automatically be called when the object is serialized and deserialized,
respectively. That is, if you provide these two methods they will be used
instead of the default serialization.
The methods must have these exact
signatures:
private void writeObject(ObjectOutputStream stream)
  throws IOException;
private void readObject(ObjectInputStream stream)
  throws IOException, ClassNotFoundException
From a design standpoint, things get
really weird here. First of all, you might think that because these methods are
not part of a base class or the Serializable interface, they ought to be
defined in their own interface(s). But notice that they are defined as
private, which means they are to be called only by other members of this
class. However, you don’t actually call them from other members of this
class, but instead the writeObject( ) and readObject( )
methods of the ObjectOutputStream and ObjectInputStream objects
call your object’s writeObject( ) and
readObject( ) methods. (Notice my tremendous restraint in not
launching into a long diatribe about using the same method names here. In a
word: confusing.) You might wonder how the ObjectOutputStream and
ObjectInputStream objects have access to private methods of your
class. We can only assume that this is part of the serialization
magic.
In any event, anything defined in an
interface is automatically public so if writeObject( )
and readObject( ) must be private, then they can’t be
part of an interface. Since you must follow the signatures exactly, the
effect is the same as if you’re implementing an
interface.
It would appear that when you call
ObjectOutputStream.writeObject( ), the Serializable object
that you pass it to is interrogated (using reflection, no doubt) to see if it
implements its own writeObject( ). If so, the normal serialization
process is skipped and the writeObject( ) is called. The same sort
of situation exists for readObject( ).
There’s one other twist. Inside
your writeObject( ), you can choose to perform the default
writeObject( ) action by calling defaultWriteObject( ).
Likewise, inside readObject( ) you can call
defaultReadObject( ). Here is a simple example that demonstrates how
you can control the storage and retrieval of a Serializable
object:
//: c11:SerialCtl.java // Controlling serialization by adding your own // writeObject() and readObject() methods. import java.io.*; public class SerialCtl implements Serializable { String a; transient String b; public SerialCtl(String aa, String bb) { a = "Not Transient: " + aa; b = "Transient: " + bb; } public String toString() { return a + "\n" + b; } private void writeObject(ObjectOutputStream stream) throws IOException { stream.defaultWriteObject(); stream.writeObject(b); } private void readObject(ObjectInputStream stream) throws IOException, ClassNotFoundException { stream.defaultReadObject(); b = (String)stream.readObject(); } public static void main(String[] args) throws IOException, ClassNotFoundException { SerialCtl sc = new SerialCtl("Test1", "Test2"); System.out.println("Before:\n" + sc); ByteArrayOutputStream buf = new ByteArrayOutputStream(); ObjectOutputStream o = new ObjectOutputStream(buf); o.writeObject(sc); // Now get it back: ObjectInputStream in = new ObjectInputStream( new ByteArrayInputStream( buf.toByteArray())); SerialCtl sc2 = (SerialCtl)in.readObject(); System.out.println("After:\n" + sc2); } } ///:~
In this example, one String field
is ordinary and the other is transient, to prove that the
non-transient field is saved by the
defaultWriteObject( )
method and the transient field is saved and restored explicitly. The
fields are initialized inside the constructor rather than at the point of
definition to prove that they are not being initialized by some automatic
mechanism during deserialization.
If you are going to use the default
mechanism to write the non-transient parts of your object, you must call
defaultWriteObject( ) as the first operation in
writeObject( ) and
defaultReadObject( )
as the first operation in readObject( ). These are strange method
calls. It would appear, for example, that you are calling
defaultWriteObject( ) for an ObjectOutputStream and passing
it no arguments, and yet it somehow turns around and knows the reference to your
object and how to write all the non-transient parts.
Spooky.
The storage and retrieval of the
transient objects uses more familiar code. And yet, think about what
happens here. In main( ), a SerialCtl object is created, and
then it’s serialized to an ObjectOutputStream. (Notice in this case
that a buffer is used instead of a file—it’s all the same to the
ObjectOutputStream.) The serialization occurs in the
line:
o.writeObject(sc);
The writeObject( ) method
must be examining sc to see if it has its own writeObject( )
method. (Not by checking the interface—there isn’t one—or the
class type, but by actually hunting for the method using reflection.) If it
does, it uses that. A similar approach holds true for readObject( ).
Perhaps this was the only practical way that they could solve the problem, but
it’s certainly strange.
It’s possible that you might want
to change the version of a serializable class (objects of the original class
might be stored in a database, for example). This is supported but you’ll
probably do it only in special cases, and it requires an extra depth of
understanding that we will not attempt to achieve here. The JDK HTML documents
downloadable from java.sun.com cover this topic quite
thoroughly.
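The heart of that mechanism is the serialVersionUID field. As a minimal sketch (the class is invented), declaring the version explicitly means that compatible changes, such as adding a field, won't invalidate previously stored objects:
import java.io.*;

class Contact implements Serializable {
  // An explicit version number; as long as changes to the
  // class stay compatible (such as adding a field), objects
  // stored under the old definition can still be read back:
  static final long serialVersionUID = 1L;
  String name;
  String email; // Added in a later revision of the class
}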
You will also notice in the JDK HTML
documentation many comments that begin with:
Warning: Serialized
objects of this class will not be compatible with future Swing releases. The
current serialization support is appropriate for short term storage or RMI
between applications. ...
This is because the versioning mechanism
is too simple to work reliably in all situations, especially with JavaBeans.
They’re working on a correction for the design, and that’s what the
warning is about.
It’s quite appealing to use
serialization technology to store some of the state of
your program so that you can easily restore the program to the current state
later. But before you can do this, some questions must be answered. What happens
if you serialize two objects that both have a reference to a third object? When
you restore those two objects from their serialized state, do you get only one
occurrence of the third object? What if you serialize your two objects to
separate files and deserialize them in different parts of your
code?
Here’s an example that shows the
problem:
//: c11:MyWorld.java import java.io.*; import java.util.*; class House implements Serializable {} class Animal implements Serializable { String name; House preferredHouse; Animal(String nm, House h) { name = nm; preferredHouse = h; } public String toString() { return name + "[" + super.toString() + "], " + preferredHouse + "\n"; } } public class MyWorld { public static void main(String[] args) throws IOException, ClassNotFoundException { House house = new House(); ArrayList animals = new ArrayList(); animals.add( new Animal("Bosco the dog", house)); animals.add( new Animal("Ralph the hamster", house)); animals.add( new Animal("Fronk the cat", house)); System.out.println("animals: " + animals); ByteArrayOutputStream buf1 = new ByteArrayOutputStream(); ObjectOutputStream o1 = new ObjectOutputStream(buf1); o1.writeObject(animals); o1.writeObject(animals); // Write a 2nd set // Write to a different stream: ByteArrayOutputStream buf2 = new ByteArrayOutputStream(); ObjectOutputStream o2 = new ObjectOutputStream(buf2); o2.writeObject(animals); // Now get them back: ObjectInputStream in1 = new ObjectInputStream( new ByteArrayInputStream( buf1.toByteArray())); ObjectInputStream in2 = new ObjectInputStream( new ByteArrayInputStream( buf2.toByteArray())); ArrayList animals1 = (ArrayList)in1.readObject(); ArrayList animals2 = (ArrayList)in1.readObject(); ArrayList animals3 = (ArrayList)in2.readObject(); System.out.println("animals1: " + animals1); System.out.println("animals2: " + animals2); System.out.println("animals3: " + animals3); } } ///:~
One thing that’s interesting here
is that it’s possible to use object serialization to and from a byte array
as a way of doing a “deep copy” of any object that’s
Serializable. (A deep copy means that you’re duplicating the entire
web of objects, rather than just the basic object and its references.) Copying
is covered in depth in Appendix A.
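Here's a sketch of such a deep-copy utility (the class and method names are invented); it simply round-trips the object through a byte array:
import java.io.*;

public class DeepCopy {
  // Round-trip a Serializable object through a byte array to
  // produce a completely separate web of objects:
  public static Object copy(Serializable obj)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(buf);
    out.writeObject(obj);
    out.close();
    ObjectInputStream in = new ObjectInputStream(
      new ByteArrayInputStream(buf.toByteArray()));
    return in.readObject();
  }
}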
Animal objects contain fields of
type House. In main( ), an ArrayList of these
Animals is created and it is serialized twice to one stream and then
again to a separate stream. When these are deserialized and printed, you see the
following results for one run (the objects will be in different memory locations
each run):
animals: [Bosco the dog[Animal@1cc76c], House@1cc769
, Ralph the hamster[Animal@1cc76d], House@1cc769
, Fronk the cat[Animal@1cc76e], House@1cc769
]
animals1: [Bosco the dog[Animal@1cca0c], House@1cca16
, Ralph the hamster[Animal@1cca17], House@1cca16
, Fronk the cat[Animal@1cca1b], House@1cca16
]
animals2: [Bosco the dog[Animal@1cca0c], House@1cca16
, Ralph the hamster[Animal@1cca17], House@1cca16
, Fronk the cat[Animal@1cca1b], House@1cca16
]
animals3: [Bosco the dog[Animal@1cca52], House@1cca5c
, Ralph the hamster[Animal@1cca5d], House@1cca5c
, Fronk the cat[Animal@1cca61], House@1cca5c
]
Of course you expect that the
deserialized objects have different addresses from their originals. But notice
that in animals1 and animals2 the same addresses appear, including
the references to the House object that both share. On the other hand,
when animals3 is recovered the system has no way of knowing that the
objects in this other stream are aliases of the objects in the first stream, so
it makes a completely different web of objects.
As long as you’re serializing
everything to a single stream, you’ll be able to recover the same web of
objects that you wrote, with no accidental duplication of objects. Of course,
you can change the state of your objects in between the time you write the first
and the last, but that’s your responsibility—the objects will be
written in whatever state they are in (and with whatever connections they have
to other objects) at the time you serialize them.
The safest thing to do if you want to
save the state of a system is to serialize as an “atomic” operation.
If you serialize some things, do some other work, and serialize some more, etc.,
then you will not be storing the system safely. Instead, put all the objects
that comprise the state of your system in a single container and simply write
that container out in one operation. Then you can restore it with a single
method call as well.
The following example is an imaginary
computer-aided design (CAD) system that demonstrates the approach. In addition,
it throws in the issue of static fields—if you look at the
documentation you’ll see that Class is Serializable, so it
should be easy to store the static fields by simply serializing the
Class object. That seems
like a sensible approach, anyway.
//: c11:CADState.java // Saving and restoring the state of a // pretend CAD system. import java.io.*; import java.util.*; abstract class Shape implements Serializable { public static final int RED = 1, BLUE = 2, GREEN = 3; private int xPos, yPos, dimension; private static Random r = new Random(); private static int counter = 0; abstract public void setColor(int newColor); abstract public int getColor(); public Shape(int xVal, int yVal, int dim) { xPos = xVal; yPos = yVal; dimension = dim; } public String toString() { return getClass() + " color[" + getColor() + "] xPos[" + xPos + "] yPos[" + yPos + "] dim[" + dimension + "]\n"; } public static Shape randomFactory() { int xVal = r.nextInt() % 100; int yVal = r.nextInt() % 100; int dim = r.nextInt() % 100; switch(counter++ % 3) { default: case 0: return new Circle(xVal, yVal, dim); case 1: return new Square(xVal, yVal, dim); case 2: return new Line(xVal, yVal, dim); } } } class Circle extends Shape { private static int color = RED; public Circle(int xVal, int yVal, int dim) { super(xVal, yVal, dim); } public void setColor(int newColor) { color = newColor; } public int getColor() { return color; } } class Square extends Shape { private static int color; public Square(int xVal, int yVal, int dim) { super(xVal, yVal, dim); color = RED; } public void setColor(int newColor) { color = newColor; } public int getColor() { return color; } } class Line extends Shape { private static int color = RED; public static void serializeStaticState(ObjectOutputStream os) throws IOException { os.writeInt(color); } public static void deserializeStaticState(ObjectInputStream os) throws IOException { color = os.readInt(); } public Line(int xVal, int yVal, int dim) { super(xVal, yVal, dim); } public void setColor(int newColor) { color = newColor; } public int getColor() { return color; } } public class CADState { public static void main(String[] args) throws Exception { ArrayList shapeTypes, shapes; if(args.length == 0) { shapeTypes = new ArrayList(); shapes = new ArrayList(); // Add references to the class objects: shapeTypes.add(Circle.class); shapeTypes.add(Square.class); shapeTypes.add(Line.class); // Make some shapes: for(int i = 0; i < 10; i++) shapes.add(Shape.randomFactory()); // Set all the static colors to GREEN: for(int i = 0; i < 10; i++) ((Shape)shapes.get(i)) .setColor(Shape.GREEN); // Save the state vector: ObjectOutputStream out = new ObjectOutputStream( new FileOutputStream("CADState.out")); out.writeObject(shapeTypes); Line.serializeStaticState(out); out.writeObject(shapes); } else { // There's a command-line argument ObjectInputStream in = new ObjectInputStream( new FileInputStream(args[0])); // Read in the same order they were written: shapeTypes = (ArrayList)in.readObject(); Line.deserializeStaticState(in); shapes = (ArrayList)in.readObject(); } // Display the shapes: System.out.println(shapes); } } ///:~
The Shape class implements
Serializable, so anything that is inherited from
Shape is automatically Serializable as well. Each Shape
contains data, and each derived Shape class contains a static
field that determines the color of all of those types of Shapes. (Placing
a static field in the base class would result in only one field, since
static fields are not duplicated in derived classes.) Methods in the base
class can be overridden to set the color for the various types (static
methods are not dynamically bound, so these are normal methods). The
randomFactory( ) method creates a different Shape each time
you call it, using random values for the Shape data.
Circle and Square are
straightforward extensions of Shape; the only difference is that
Circle initializes color at the point of definition and
Square initializes it in the constructor. We’ll leave the
discussion of Line for later.
In main( ), one
ArrayList is used to hold the Class objects and the other to hold
the shapes. If you don’t provide a command line argument the
shapeTypes ArrayList is created and the Class objects are
added, and then the shapes ArrayList is created and Shape
objects are added. Next, all the static color values are set to
GREEN, and everything is serialized to the file
CADState.out.
If you provide a command line argument
(presumably CADState.out), that file is opened and used to restore the
state of the program. In both situations, the resulting ArrayList of
Shapes is printed. The results from one run are:
>java CADState
[class Circle color[3] xPos[-51] yPos[-99] dim[38]
, class Square color[3] xPos[2] yPos[61] dim[-46]
, class Line color[3] xPos[51] yPos[73] dim[64]
, class Circle color[3] xPos[-70] yPos[1] dim[16]
, class Square color[3] xPos[3] yPos[94] dim[-36]
, class Line color[3] xPos[-84] yPos[-21] dim[-35]
, class Circle color[3] xPos[-75] yPos[-43] dim[22]
, class Square color[3] xPos[81] yPos[30] dim[-45]
, class Line color[3] xPos[-29] yPos[92] dim[17]
, class Circle color[3] xPos[17] yPos[90] dim[-76]
]
>java CADState CADState.out
[class Circle color[1] xPos[-51] yPos[-99] dim[38]
, class Square color[0] xPos[2] yPos[61] dim[-46]
, class Line color[3] xPos[51] yPos[73] dim[64]
, class Circle color[1] xPos[-70] yPos[1] dim[16]
, class Square color[0] xPos[3] yPos[94] dim[-36]
, class Line color[3] xPos[-84] yPos[-21] dim[-35]
, class Circle color[1] xPos[-75] yPos[-43] dim[22]
, class Square color[0] xPos[81] yPos[30] dim[-45]
, class Line color[3] xPos[-29] yPos[92] dim[17]
, class Circle color[1] xPos[17] yPos[90] dim[-76]
]
You can see that the values of
xPos, yPos, and dim were all stored and recovered
successfully, but there’s something wrong with the retrieval of the
static information. It’s all “3” going in, but it
doesn’t come out that way. Circles have a value of 1 (RED,
which is the definition), and Squares have a value of 0 (remember, they
are initialized in the constructor). It’s as if the statics
didn’t get serialized at all! That’s right—even though class
Class is Serializable, it doesn’t do what you expect. So if
you want to serialize statics, you must do it yourself.
This is what the
serializeStaticState( ) and deserializeStaticState( )
static methods in Line are for. You can see that they are
explicitly called as part of the storage and retrieval process. (Note that the
order of writing to the serialized file and reading back from it must be
maintained.) Thus, to make CADState.java run correctly, you must explicitly
serialize and deserialize the static state yourself, as Line does, keeping the
reads in the same order as the writes.
Another issue you
might have to think about is security, since serialization also saves
private data. If you have a security issue, those fields should be marked
as transient. But then you have to design a secure way to store that
information so that when you do a restore you can reset those private
variables.
Tokenizing is the process of
breaking a sequence of characters into a sequence of “tokens,” which
are bits of text delimited by whatever you choose. For example, your tokens
could be words, and then they would be delimited by white space and punctuation.
There are two classes provided in the standard Java library that can be used for
tokenization: StreamTokenizer and
StringTokenizer.
Although StreamTokenizer is not
derived from InputStream or OutputStream, it works only with
InputStream and Reader objects, so it rightfully belongs in the I/O portion of
the library.
Consider a program to count the
occurrence of words in a text file:
//: c11:WordCount.java // Counts words from a file, outputs // results in sorted form. import java.io.*; import java.util.*; class Counter { private int i = 1; int read() { return i; } void increment() { i++; } } public class WordCount { private FileReader file; private StreamTokenizer st; // A TreeMap keeps keys in sorted order: private TreeMap counts = new TreeMap(); WordCount(String filename) throws FileNotFoundException { try { file = new FileReader(filename); st = new StreamTokenizer( new BufferedReader(file)); st.ordinaryChar('.'); st.ordinaryChar('-'); } catch(FileNotFoundException e) { System.err.println( "Could not open " + filename); throw e; } } void cleanup() { try { file.close(); } catch(IOException e) { System.err.println( "file.close() unsuccessful"); } } void countWords() { try { while(st.nextToken() != StreamTokenizer.TT_EOF) { String s; switch(st.ttype) { case StreamTokenizer.TT_EOL: s = new String("EOL"); break; case StreamTokenizer.TT_NUMBER: s = Double.toString(st.nval); break; case StreamTokenizer.TT_WORD: s = st.sval; // Already a String break; default: // single character in ttype s = String.valueOf((char)st.ttype); } if(counts.containsKey(s)) ((Counter)counts.get(s)).increment(); else counts.put(s, new Counter()); } } catch(IOException e) { System.err.println( "st.nextToken() unsuccessful"); } } Collection values() { return counts.values(); } Set keySet() { return counts.keySet(); } Counter getCounter(String s) { return (Counter)counts.get(s); } public static void main(String[] args) throws FileNotFoundException { WordCount wc = new WordCount(args[0]); wc.countWords(); Iterator keys = wc.keySet().iterator(); while(keys.hasNext()) { String key = (String)keys.next(); System.out.println(key + ": " + wc.getCounter(key).read()); } wc.cleanup(); } } ///:~
Presenting the words in sorted form is
easy to do by storing the data in a TreeMap, which automatically
organizes its keys in sorted order (see Chapter 9). When you get a set of
keys using keySet( ), they will also be in sorted
order.
To open the file, a FileReader is
used, and to turn the file into words a StreamTokenizer is created from
the FileReader wrapped in a BufferedReader. In
StreamTokenizer, there is a default list of separators, and you can add
more with a set of methods. Here, ordinaryChar( ) is used to say
“This character has no significance that I’m interested in,”
so the parser doesn’t include it as part of any of the words that it
creates. For example, saying st.ordinaryChar('.') means that periods will
not be included as parts of the words that are parsed. You can find more
information in the JDK HTML documentation from
java.sun.com.
In countWords( ), the tokens
are pulled one at a time from the stream, and the ttype information is
used to determine what to do with each token, since a token can be an
end-of-line, a number, a string, or a single character.
Once a token is found, the
TreeMap counts is queried to see if it already
contains the token as a key. If it does, the corresponding Counter object
is incremented to indicate that another instance of this word has been found. If
not, a new Counter is created—since the Counter constructor
initializes its value to one, this also acts to count the word.
WordCount is not a type of
TreeMap, so inheritance wasn't used. It performs a specific kind of
functionality, so even though the keySet( ) and values( )
methods must be re-exposed, that still doesn't mean that
inheritance should be used, since
a number of TreeMap methods are inappropriate here. In addition, other
methods such as getCounter( ), which produces the Counter for a
particular String, finish the change in the shape of WordCount's
interface.
In main( ) you can see the
use of a WordCount to open and count the words in a file—it just
takes two lines of code. Then an Iterator over the sorted set of keys (words) is
extracted, and this is used to pull out each key and its associated Counter.
The call to cleanup( ) is necessary to ensure that the file is
closed.
Although it isn’t part of the I/O
library, the StringTokenizer has sufficiently similar functionality to
StreamTokenizer that it will be described here.
The
StringTokenizer returns the tokens within a
string one at a time. These tokens are consecutive characters delimited by tabs,
spaces, and newlines. Thus, the tokens of the string “Where is my
cat?” are “Where”, “is”, “my”, and
“cat?” Like the StreamTokenizer, you can tell the
StringTokenizer to break up the input in any way that you want, but with
StringTokenizer you do this by passing a second argument to the
constructor, which is a String of the delimiters you wish to use. In
general, if you need more sophistication, use a
StreamTokenizer.
You ask a StringTokenizer object
for the next token in the string using the nextToken( ) method.
Check hasMoreTokens( ) first, because nextToken( ) throws a
NoSuchElementException when no tokens remain. (The next( ) helper in the
following example wraps this check and returns an empty string when the tokens
run out.)
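A quick sketch of both forms: the default whitespace delimiters, and a custom delimiter string passed as the second constructor argument (the class name is invented):
import java.util.*;

public class TokenizeSketch {
  public static void main(String[] args) {
    // Default delimiters: spaces, tabs, newlines
    StringTokenizer st =
      new StringTokenizer("Where is my cat?");
    while(st.hasMoreTokens())
      System.out.println(st.nextToken()); // Where, is, my, cat?
    // Custom delimiters: comma and space
    StringTokenizer st2 =
      new StringTokenizer("a, b, c", ", ");
    while(st2.hasMoreTokens())
      System.out.println(st2.nextToken()); // a, b, c
  }
}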
As an example, the following program
performs a limited analysis of a sentence, looking for key phrase sequences to
indicate whether happiness or sadness is implied.
//: c11:AnalyzeSentence.java // Look for particular sequences in sentences. import java.util.*; public class AnalyzeSentence { public static void main(String[] args) { analyze("I am happy about this"); analyze("I am not happy about this"); analyze("I am not! I am happy"); analyze("I am sad about this"); analyze("I am not sad about this"); analyze("I am not! I am sad"); analyze("Are you happy about this?"); analyze("Are you sad about this?"); analyze("It's you! I am happy"); analyze("It's you! I am sad"); } static StringTokenizer st; static void analyze(String s) { prt("\nnew sentence >> " + s); boolean sad = false; st = new StringTokenizer(s); while (st.hasMoreTokens()) { String token = next(); // Look until you find one of the // two starting tokens: if(!token.equals("I") && !token.equals("Are")) continue; // Top of while loop if(token.equals("I")) { String tk2 = next(); if(!tk2.equals("am")) // Must be after I break; // Out of while loop else { String tk3 = next(); if(tk3.equals("sad")) { sad = true; break; // Out of while loop } if (tk3.equals("not")) { String tk4 = next(); if(tk4.equals("sad")) break; // Leave sad false if(tk4.equals("happy")) { sad = true; break; } } } } if(token.equals("Are")) { String tk2 = next(); if(!tk2.equals("you")) break; // Must be after Are String tk3 = next(); if(tk3.equals("sad")) sad = true; break; // Out of while loop } } if(sad) prt("Sad detected"); } static String next() { if(st.hasMoreTokens()) { String s = st.nextToken(); prt(s); return s; } else return ""; } static void prt(String s) { System.out.println(s); } } ///:~
For each string being analyzed, a
while loop is entered and tokens are pulled off the string. Notice the
first if statement, which says to continue (go back to the
beginning of the loop and start again) if the token is neither an
“I” nor an “Are.” This means
that it will get tokens until an “I” or an “Are” is
found. You might think to use == instead of the
equals( ) method,
but that won't work correctly, since == compares reference values
while equals( ) compares contents.
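A tiny sketch of the difference (the class name is invented):
public class EqualsDemo {
  public static void main(String[] args) {
    String token = new String("I");
    System.out.println(token == "I");      // false: compares references
    System.out.println(token.equals("I")); // true: compares contents
  }
}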
The logic of the rest of the
analyze( ) method is that the pattern that’s being searched
for is “I am sad,” “I am not happy,” or “Are you
sad?” Without the break statement, the code for this would be even
messier than it is. You should be aware that a typical parser (this is a
primitive example of one) normally has a table of these tokens and a piece of
code that moves through the states in the table as new tokens are
read.
You should think of the
StringTokenizer only as shorthand for a simple and specific kind of
StreamTokenizer. However, if you have a String that you want to
tokenize and StringTokenizer is too limited, all you have to do is turn
it into a stream with StringBufferInputStream and then use that to create
a much more powerful
StreamTokenizer.
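Here's a sketch of that idea. Note that StringBufferInputStream is deprecated in later JDKs, so the usual approach is to hand the StreamTokenizer a StringReader, since StreamTokenizer accepts any Reader (the class name is invented):
import java.io.*;

public class TokenizeString {
  public static void main(String[] args) throws IOException {
    // StreamTokenizer accepts any Reader:
    StreamTokenizer st = new StreamTokenizer(
      new StringReader("if(x == 4) return;"));
    while(st.nextToken() != StreamTokenizer.TT_EOF) {
      if(st.ttype == StreamTokenizer.TT_WORD)
        System.out.println("word: " + st.sval);
      else if(st.ttype == StreamTokenizer.TT_NUMBER)
        System.out.println("number: " + st.nval);
      else
        System.out.println("char: " + (char)st.ttype);
    }
  }
}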
In this section we’ll look at a
more complete example of the use of Java I/O, which also uses tokenization.
This project is directly useful
because it performs a style check to make sure that your capitalization conforms
to the Java style as found at java.sun.com/docs/codeconv/index.html. It
opens each .java file in the current directory and extracts all the class
names and identifiers, then shows you if any of them don’t meet the Java
style.
For the program to operate correctly, you
must first build a class name repository to hold all the class names in the
standard Java library. You do this by moving into all the source code
subdirectories for the standard Java library and running ClassScanner in
each subdirectory. Provide as arguments the name of the repository file (using
the same path and name each time) and the -a command-line option to
indicate that the class names should be added to the
repository.
To use the program to check your code,
hand it the path and name of the repository to use. It will check all the
classes and identifiers in the current directory and tell you which ones
don’t follow the typical Java capitalization style.
You should be aware that the program
isn’t perfect; there are a few times when it will point out what it thinks
is a problem but on looking at the code you’ll see that nothing needs to
be changed. This is a little annoying, but it’s still much easier than
trying to find all these cases by staring at your code.
//: c11:ClassScanner.java // Scans all files in directory for classes // and identifiers, to check capitalization. // Assumes properly compiling code listings. // Doesn't do everything right, but is a // useful aid. import java.io.*; import java.util.*; class MultiStringMap extends HashMap { public void add(String key, String value) { if(!containsKey(key)) put(key, new ArrayList()); ((ArrayList)get(key)).add(value); } public ArrayList getArrayList(String key) { if(!containsKey(key)) { System.err.println( "ERROR: can't find key: " + key); System.exit(1); } return (ArrayList)get(key); } public void printValues(PrintStream p) { Iterator k = keySet().iterator(); while(k.hasNext()) { String oneKey = (String)k.next(); ArrayList val = getArrayList(oneKey); for(int i = 0; i < val.size(); i++) p.println((String)val.get(i)); } } } public class ClassScanner { private File path; private String[] fileList; private Properties classes = new Properties(); private MultiStringMap classMap = new MultiStringMap(), identMap = new MultiStringMap(); private StreamTokenizer in; public ClassScanner() throws IOException { path = new File("."); fileList = path.list(new JavaFilter()); for(int i = 0; i < fileList.length; i++) { System.out.println(fileList[i]); try { scanListing(fileList[i]); } catch(FileNotFoundException e) { System.err.println("Could not open " + fileList[i]); } } } void scanListing(String fname) throws IOException { in = new StreamTokenizer( new BufferedReader( new FileReader(fname))); // Doesn't seem to work: // in.slashStarComments(true); // in.slashSlashComments(true); in.ordinaryChar('/'); in.ordinaryChar('.'); in.wordChars('_', '_'); in.eolIsSignificant(true); while(in.nextToken() != StreamTokenizer.TT_EOF) { if(in.ttype == '/') eatComments(); else if(in.ttype == StreamTokenizer.TT_WORD) { if(in.sval.equals("class") || in.sval.equals("interface")) { // Get class name: while(in.nextToken() != StreamTokenizer.TT_EOF && in.ttype != StreamTokenizer.TT_WORD) ; classes.put(in.sval, in.sval); classMap.add(fname, in.sval); } if(in.sval.equals("import") || in.sval.equals("package")) discardLine(); else // It's an identifier or keyword identMap.add(fname, in.sval); } } } void discardLine() throws IOException { while(in.nextToken() != StreamTokenizer.TT_EOF && in.ttype != StreamTokenizer.TT_EOL) ; // Throw away tokens to end of line } // StreamTokenizer's comment removal seemed // to be broken. 
This extracts them: void eatComments() throws IOException { if(in.nextToken() != StreamTokenizer.TT_EOF) { if(in.ttype == '/') discardLine(); else if(in.ttype != '*') in.pushBack(); else while(true) { if(in.nextToken() == StreamTokenizer.TT_EOF) break; if(in.ttype == '*') if(in.nextToken() != StreamTokenizer.TT_EOF && in.ttype == '/') break; } } } public String[] classNames() { String[] result = new String[classes.size()]; Iterator e = classes.keySet().iterator(); int i = 0; while(e.hasNext()) result[i++] = (String)e.next(); return result; } public void checkClassNames() { Iterator files = classMap.keySet().iterator(); while(files.hasNext()) { String file = (String)files.next(); ArrayList cls = classMap.getArrayList(file); for(int i = 0; i < cls.size(); i++) { String className = (String)cls.get(i); if(Character.isLowerCase( className.charAt(0))) System.out.println( "class capitalization error, file: " + file + ", class: " + className); } } } public void checkIdentNames() { Iterator files = identMap.keySet().iterator(); ArrayList reportSet = new ArrayList(); while(files.hasNext()) { String file = (String)files.next(); ArrayList ids = identMap.getArrayList(file); for(int i = 0; i < ids.size(); i++) { String id = (String)ids.get(i); if(!classes.contains(id)) { // Ignore identifiers of length 3 or // longer that are all uppercase // (probably static final values): if(id.length() >= 3 && id.equals( id.toUpperCase())) continue; // Check to see if first char is upper: if(Character.isUpperCase(id.charAt(0))){ if(reportSet.indexOf(file + id) == -1){ // Not reported yet reportSet.add(file + id); System.out.println( "Ident capitalization error in:" + file + ", ident: " + id); } } } } } } static final String usage = "Usage: \n" + "ClassScanner classnames -a\n" + "\tAdds all the class names in this \n" + "\tdirectory to the repository file \n" + "\tcalled 'classnames'\n" + "ClassScanner classnames\n" + "\tChecks all the java files in this \n" + "\tdirectory for capitalization errors, \n" + "\tusing the repository file 'classnames'"; private static void usage() { System.err.println(usage); System.exit(1); } public static void main(String[] args) throws IOException { if(args.length < 1 || args.length > 2) usage(); ClassScanner c = new ClassScanner(); File old = new File(args[0]); if(old.exists()) { try { // Try to open an existing // properties file: InputStream oldlist = new BufferedInputStream( new FileInputStream(old)); c.classes.load(oldlist); oldlist.close(); } catch(IOException e) { System.err.println("Could not open " + old + " for reading"); System.exit(1); } } if(args.length == 1) { c.checkClassNames(); c.checkIdentNames(); } // Write the class names to a repository: if(args.length == 2) { if(!args[1].equals("-a")) usage(); try { BufferedOutputStream out = new BufferedOutputStream( new FileOutputStream(args[0])); c.classes.store(out, "Classes found by ClassScanner.java"); out.close(); } catch(IOException e) { System.err.println( "Could not write " + args[0]); System.exit(1); } } } } class JavaFilter implements FilenameFilter { public boolean accept(File dir, String name) { // Strip path information: String f = new File(name).getName(); return f.trim().endsWith(".java"); } } ///:~
The
class MultiStringMap is a tool that allows you to map a group of strings
onto each key entry. It uses a HashMap (this time with inheritance) with
the key as the single string that’s mapped onto the ArrayList
value. The add( ) method simply checks to see if there’s a key
already in the HashMap, and if not it puts one there. The
getArrayList( ) method produces an ArrayList for a particular
key, and printValues( ), which is primarily useful for debugging,
prints out all the values ArrayList by ArrayList.
To keep life simple, the class names from
the standard Java libraries are all put into a
Properties object (from the standard Java
library). Remember that a Properties object is a Hashtable that
holds only String objects for both the key and value entries. However, it
can be saved to disk and restored from disk in one method call, so it's
ideal for the repository of names. Actually, we need only a list of names, and a
Hashtable can't accept null for either its key or its value
entry. So the same object will be used for both the key and the
value.
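Here's a sketch of that save-and-restore round trip (the class and file names are invented); as in the repository, the same string serves as both key and value:
import java.io.*;
import java.util.*;

public class NameRepository {
  public static void main(String[] args) throws IOException {
    Properties classes = new Properties();
    classes.put("ArrayList", "ArrayList"); // Same string as key and value
    classes.put("TreeMap", "TreeMap");
    // Save to disk in one call:
    OutputStream out = new BufferedOutputStream(
      new FileOutputStream("names.txt"));
    classes.store(out, "Class name repository");
    out.close();
    // Restore from disk in one call:
    Properties restored = new Properties();
    InputStream in = new BufferedInputStream(
      new FileInputStream("names.txt"));
    restored.load(in);
    in.close();
    System.out.println(restored.containsKey("TreeMap")); // true
  }
}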
For the classes and identifiers that are
discovered for the files in a particular directory, two MultiStringMaps
are used: classMap and identMap. Also, when the program starts up
it loads the standard class name repository into the Properties object
called classes, and when a new class name is found in the local directory
that is also added to classes as well as to classMap. This way,
classMap can be used to step through all the classes in the local
directory, and classes can be used to see if the current token is a class
name (which indicates a definition of an object or method is beginning, so grab
the next tokens—until a semicolon—and put them into
identMap).
The default constructor for
ClassScanner creates a list of file names, using the JavaFilter
implementation of
FilenameFilter, shown at
the end of the file. Then it calls scanListing( ) for each file
name.
Inside scanListing( ) the
source code file is opened and turned into a
StreamTokenizer. In the
documentation, passing true to slashStarComments( ) and
slashSlashComments( ) is supposed to strip those comments out, but
this seems to be a bit flawed, as it doesn’t quite work. Instead, those
lines are commented out and the comments are extracted by another method. To do
this, the “/” must be captured as an ordinary character
rather than letting the StreamTokenizer absorb it as part of a comment,
and the ordinaryChar( ) method tells the StreamTokenizer
to do this. This is also true for dots (“.”),
since we want to have the method calls pulled apart into individual identifiers.
However, the underscore, which is ordinarily treated by StreamTokenizer
as an individual character, should be left as part of identifiers since it
appears in such static final values as TT_EOF, etc., used
in this very program. The wordChars( ) method takes a range of
characters you want to add to those that are left inside a token that is being
parsed as a word. Finally, when parsing for one-line comments or discarding a
line we need to know when an end-of-line occurs, so by calling
eolIsSignificant(true) the EOL will show up rather than being absorbed by
the StreamTokenizer.
The rest of scanListing( )
reads and reacts to tokens until the end of the file, signified when
nextToken( ) returns the final static value
StreamTokenizer.TT_EOF.
If the token is a “/” it is
potentially a comment, so eatComments( ) is called to deal with it.
The only other situation we’re interested in here is if it’s a word,
of which there are some special cases.
If the word is class or
interface then the next token represents a class or interface name, and
it is put into classes and classMap. If the word is import
or package, then we don’t want the rest of the line. Anything else
must be an identifier (which we’re interested in) or a keyword (which
we’re not, but they’re all lowercase anyway so it won’t spoil
things to put those in). These are added to identMap.
The discardLine( ) method is
a simple tool that looks for the end of a line. Note that any time you get a new
token, you must check for the end of the file.
The eatComments( ) method is
called whenever a forward slash is encountered in the main parsing loop.
However, that doesn’t necessarily mean a comment has been found, so the
next token must be extracted to see if it’s another forward slash (in
which case the line is discarded) or an asterisk. But if it’s neither of
those, it means the token you’ve just pulled out is needed back in the
main parsing loop! Fortunately, the
pushBack( ) method
allows you to “push back” the current token onto the input stream so
that when the main parsing loop calls
nextToken( ) it will
get the one you just pushed back.
For convenience, the
classNames( ) method produces an array of all the names in the
classes container. This method is not used in the program but is helpful
for debugging.
The next two methods are the ones in
which the actual checking takes place. In checkClassNames( ), the
class names are extracted from the classMap (which, remember, contains
only the names in this directory, organized by file name so the file name can be
printed along with the errant class name). This is accomplished by pulling each
associated ArrayList and stepping through that, looking to see if the
first character is lowercase. If so, the appropriate error message is
printed.
In checkIdentNames( ), a
similar approach is taken: each identifier name is extracted from
identMap. If the name is not in the classes list, it’s
assumed to be an identifier or keyword. A special case is checked: if the
identifier length is three or more and all the characters are uppercase,
this identifier is ignored because it’s probably a static
final value such as TT_EOF. Of course, this is not a perfect
algorithm, but it assumes that you’ll eventually notice any all-uppercase
identifiers that are out of place.
Instead of reporting every identifier
that starts with an uppercase character, this method keeps track of which ones
have already been reported in an ArrayList called
reportSet. This treats the ArrayList as a
“set” that tells you whether an item is already in the set. The item
is produced by concatenating the file name and identifier. If the element
isn't in the set, it's added and then the report is
made.
The rest of the listing is comprised of
main( ), which busies itself by handling the command line arguments
and figuring out whether you’re building a repository of class names from
the standard Java library or checking the validity of code you’ve written.
In both cases it makes a ClassScanner object.
Whether you’re building a
repository or using one, you must try to open the existing repository. By making
a File object and testing
for existence, you can decide whether to open the file and load( )
the Properties list classes inside ClassScanner. (The
classes from the repository add to, rather than overwrite, the classes found by
the ClassScanner constructor.) If you provide only one command-line
argument it means that you want to perform a check of the class names and
identifier names, but if you provide two arguments (the second being
“-a”) you’re building a class name repository.
In this case, an output file is opened and the
Properties.store( ) method is used to write the list into a file, along with
a string that provides header information for the
file.
The Java I/O stream library does satisfy
the basic requirements: you can perform reading and writing with the console, a
file, a block of memory, or even across the Internet (as you will see in Chapter
15). With inheritance, you can create new types of input and output objects. And
you can even add a simple extensibility to the kinds of objects a stream will
accept by redefining the toString( ) method that’s
automatically called when you pass an object to a method that’s expecting
a String (Java’s limited “automatic type conversion”).
There are questions left unanswered by
the documentation and design of the I/O stream library. For example, it would
have been nice if you could say that you want an exception thrown if you try to
overwrite a file when opening it for output—some programming systems allow
you to specify that you want to open an output file, but only if it
doesn’t already exist. In Java, it appears that you are supposed to use a
File object to determine whether a file exists, because if you open it as
a FileOutputStream or FileWriter it will always get
overwritten.
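Here's a sketch of that idiom (the class and file names are invented); the File object is used to test for existence before anything is overwritten, and createNewFile( ) performs the create-only-if-absent step atomically:
import java.io.*;

public class NoClobber {
  public static void main(String[] args) throws IOException {
    File f = new File("results.txt");
    if(f.exists()) {
      System.err.println(f + " already exists; not overwriting");
      return;
    }
    // createNewFile( ) creates the file only if it's absent,
    // and does so atomically:
    if(f.createNewFile()) {
      PrintWriter out = new PrintWriter(new FileWriter(f));
      out.println("safe to write");
      out.close();
    }
  }
}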
The I/O stream library brings up mixed
feelings; it does much of the job and it’s portable. But if you
don’t already understand the decorator pattern, the design is
nonintuitive, so there’s extra overhead in learning and teaching it.
It’s also incomplete: there’s no support for the kind of output
formatting that almost every other language’s I/O package supports.
However, once you do understand
the decorator pattern and begin using the library in situations that require its
flexibility, you can begin to benefit from this design, at which point its cost
in extra lines of code may not bother you as much.
If you do not find what you’re
looking for in this chapter (which has only been an introduction, and is not
meant to be comprehensive), you can find in-depth coverage in Java I/O,
by Elliotte Rusty Harold (O’Reilly,
1999).
Solutions to selected exercises
can be found in the electronic document The Thinking in Java Annotated
Solution Guide, available for a small fee from
www.BruceEckel.com.
[57]
Design Patterns, Erich Gamma et al., Addison-Wesley
1995.
[58]
XML is another way to solve the problem of moving data across different
computing platforms, and does not depend on having Java on all platforms.
However, Java tools exist that support XML.
[59]
Chapter 13 shows an even more convenient solution for this: a GUI program with a
scrolling text area.