The IO class is the basis for all input and output in Ruby. Objects from this class represent connections to various different input and output devices such as hard drives, keyboards, and screens. All Ruby programs have three standard I/O streams:
- the input stream, known as STDIN or $stdin, is set to capture data from the keyboard;
- the output stream, named STDOUT or $stdout, is set to output data to a terminal screen;
- the error stream, called STDERR or $stderr, also outputs to the terminal screen.
Whenever unadorned IO methods are called (for example puts, print, and gets), they are routed to the standard output stream or from the standard input stream. To send output to the error stream, STDERR must be specified explicitly, for example STDERR.puts "text".
To change the routing of any of the standard streams, you can reassign the global variable associated with it ($stdin, $stdout, or $stderr). It is recommended that you leave the constants (STDIN, STDOUT, STDERR) untouched, so that you can still access the default input and output devices.
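As a rough illustration of rerouting a standard stream, the sketch below points $stdout at an in-memory StringIO buffer and then restores it. The buffer and the variable names are assumptions made for this example; any object that behaves like an IO stream could stand in for the buffer.

```ruby
require "stringio"

captured = StringIO.new          # in-memory stand-in for a real output device
original_stdout = $stdout

begin
  $stdout = captured             # reroute the standard output stream
  puts "routed to the buffer"    # bare puts follows $stdout
  STDOUT.puts "the constant still reaches the terminal"
ensure
  $stdout = original_stdout      # always restore the previous stream
end

captured_text = captured.string  # "routed to the buffer\n"
```

Restoring the stream in an ensure clause keeps later output from silently disappearing into the buffer if the code in between raises an exception.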
Now let’s discuss how IO objects read data streams. IO objects use iterators to read data from IO streams. Iterations are delineated by the global input record separator, $/. The default global input record separator is a newline character, "\n", which is why Ruby usually processes data one line at a time. By changing the global input record separator you can change how Ruby iterates through input streams.
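Here is a small sketch of how the record separator shapes iteration. Rather than reassigning $/ for the whole program, it passes the separator as an argument to each_line, which is a swapped-in but more contained way to get the same effect. The sample string is made up for this example.

```ruby
require "stringio"

input = StringIO.new("alpha,beta,gamma")

# With the default separator ("\n") the whole string is a single record.
whole = input.gets               # "alpha,beta,gamma"
input.rewind

# With "," as the separator, each record ends at a comma.
records = input.each_line(",").to_a   # ["alpha,", "beta,", "gamma"]
```

Note that the separator stays attached to each record, just as "\n" stays attached to each line when reading with the default setting.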
Before covering how to read and write data to IO objects, let’s take a look at the most common IO object in Ruby: The File object. As the name suggests, the File object is used to represent files within Ruby. This object provides functionality that enables files to be opened up, read from, written to, and closed.
The most common ways to create a File object are the File.open and File.new methods. These methods require one parameter and accept several optional ones. File.open has one advantage over File.new: it supports an optional code block. A closely related method, Kernel#open, can be called without being preceded by the class name File.
The first parameter is the only one that is mandatory. It accepts a string object that holds the location of the file. The location can be specified using an absolute or relative path. The other parameters can be used to define several options, though the only option I will cover here is the file mode. The file mode determines how a file can be used. The most common file modes are:
- Read mode is identified by “r”. This mode only supports reading from a file. This is the default mode. If the file does not exist then the method will raise an exception.
- Append mode is identified by “a”. This mode supports writing to a file by appending new data to any existing content. If a file does not exist then it will be created.
- Write mode is identified by “w”. This mode supports writing to a file by overwriting any existing content: an existing file is truncated to zero length as soon as it is opened. If the file does not exist then it will be created.
The File.new method and the File.open method called without a code block function in the same way: both return a reference to a file object. When using either of these approaches to create a file object it is important to remember to call the close method when you are done.
If the optional code block is supplied, it will be passed a reference to the file object as an argument. The file object will automatically be closed when the block terminates. In this instance, File.open returns the value of the block.
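The following sketch walks through both styles and the three file modes. It works inside a temporary directory so that nothing is left behind; the file name example.txt and the variable names are assumptions made for this example.

```ruby
require "tmpdir"

contents = nil
closed_after_block = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "example.txt")   # hypothetical file name

  # Without a block: File.new returns the file object and we must close it.
  file = File.new(path, "w")             # "w" creates (or truncates) the file
  file.puts "first line"
  file.close

  # With a block: the file is closed for us when the block finishes.
  File.open(path, "a") { |f| f.puts "second line" }   # "a" appends

  # File.open returns the block's value, here the text that was read.
  handle = nil
  contents = File.open(path, "r") do |f|
    handle = f
    f.read
  end
  closed_after_block = handle.closed?    # true
end
```

The block form is usually the safer choice, because the file is closed even if the code inside the block raises an exception.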
The open-uri library makes it easy to access remote files over the HTTP and HTTPS protocols. After loading this library with the require method, you can open remote files with URI.open as though they were local files (before Ruby 3.0 the bare open method also accepted URLs). Small responses are returned as StringIO objects, while larger ones are returned as Tempfile objects. StringIO objects enable strings to behave like an IO stream, and both kinds of object can be read using the standard IO stream input methods described below.
In Ruby, IO objects feature a suite of standard input and output methods. We’ll take a look at the input methods first. All of these methods can be used with different objects to read input from various sources such as the keyboard, files from a hard-drive, or from a local or remote server.
First let’s take a look at the methods that read input one character at a time. There are four such methods: getc, getbyte, readchar, and readbyte. These can be divided in two different ways based on how they work. First we can group these methods by how they deal with being called to read data after reaching the end of a file: getc and getbyte return nil, while readchar and readbyte raise an EOFError.
The other, more significant difference relates to the data that is actually returned by these methods. The getc and readchar methods return characters as one-character strings, whereas the getbyte and readbyte methods return individual bytes as integers between 0 and 255. Each call returns a single byte, so a character that is encoded in more than one byte (in UTF-8, any non-ASCII character) takes several calls to read in full.
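To make the difference concrete, here is a minimal sketch that reads the same two-character input first by character and then by byte. It uses a StringIO object in place of a file or the keyboard, which is an assumption made so the example is self-contained.

```ruby
require "stringio"

io = StringIO.new("\u00E9!")     # "é" takes two bytes in UTF-8, "!" takes one

chars = []
while (c = io.getc)              # getc returns nil at the end of the input
  chars << c
end

io.rewind
bytes = []
while (b = io.getbyte)           # getbyte also returns nil at the end
  bytes << b
end

# readchar and readbyte raise EOFError instead of returning nil.
raised_eof = begin
  io.readchar
  false
rescue EOFError
  true
end

# chars → ["é", "!"]    bytes → [195, 169, 33]    raised_eof → true
```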
Here are links to two short scripts that let you see how these different methods read data from files and from the keyboard.
Now let’s move on to the methods that read data one line at a time. There are three such methods: gets, readline, and each. The first two function similarly to their character-based counterparts. They both read one line at a time, but gets returns nil when it reaches the end of a file, whereas readline raises an EOFError.
The third method functions a bit differently. If you recall, each is a standard iterator method. Therefore, it iterates through the entire file (as it would with any other collection), yielding each line to a code block where it can be processed. This approach is ideal if you plan to process all the lines from a file at once. On the other hand, it does not allow you to walk through a file with the same level of control provided by gets and readline.
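The sketch below contrasts the three line-based methods on the same input. As before, a StringIO object stands in for a file, and the variable names are only illustrative.

```ruby
require "stringio"

io = StringIO.new("one\ntwo\nthree\n")

first = io.gets                  # "one\n": read one line, then decide what next

rest = []
io.each { |line| rest << line.chomp }   # continues from the current position

after_eof = io.gets              # nil: gets signals the end quietly

raised_eof = begin
  io.readline                    # readline signals the end with an exception
  false
rescue EOFError
  true
end

# rest → ["two", "three"]
```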
Here are links to two short scripts that let you see how these different methods read data from files and from the keyboard.
Lastly, let’s take a look at the two methods that read entire files: read, which returns the contents as a single string, and readlines, which returns an array of lines. Both keep reading until the end of the stream, so they are best suited to files and are not appropriate for getting input from a keyboard. Usually, these methods are only used to read small files. When reading large files it is best to process them iteratively, as this is a more efficient use of memory.
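Here is a short sketch comparing read, readlines, and the iterative approach on a small scratch file. The file is created with Tempfile purely for the sake of the example, and its contents are made up.

```ruby
require "tempfile"

whole = nil
lines = nil
line_count = 0

Tempfile.create("sample") do |tmp|      # hypothetical scratch file
  tmp.write("a\nb\nc\n")
  tmp.flush

  File.open(tmp.path) { |f| whole = f.read }        # one String
  File.open(tmp.path) { |f| lines = f.readlines }   # Array of lines

  # Iterative alternative: only one line is held in memory at a time.
  File.open(tmp.path) { |f| f.each { line_count += 1 } }
end

# whole → "a\nb\nc\n"    lines → ["a\n", "b\n", "c\n"]    line_count → 3
```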
Before we cover how to output data, let’s briefly review the methods for navigating within a file. First off, the rewind method enables you to jump back to the beginning of a file. The pos accessor attribute provides getter and setter methods that enable you to check your current location and to move to a new absolute position within the file.
The seek method can also be used to change your current position within a file. It enables you to move by specifying a position that is relative to your current position, to the start of the file, or to the end of the file. The first argument is an integer that specifies the distance to be moved, while the second argument is a constant that specifies the point of origin for this movement.
Here are the constants that can be specified for the second argument:
- IO::SEEK_SET is the default setting and sets the pointer position from the beginning of the file;
- IO::SEEK_CUR sets the pointer position from the current location;
- IO::SEEK_END sets the pointer position from the end of the file.
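The navigation methods can be tried out with a sketch like the one below. It runs against a StringIO object holding the digits 0 to 9 (an assumption made for this example), so each position lines up with the character stored there.

```ruby
require "stringio"

io = StringIO.new("0123456789")

io.read(3)                       # consume "012"
pos_after_read = io.pos          # 3

io.pos = 5                       # absolute move using the setter
from_pos = io.read(2)            # "56"

io.seek(-3, IO::SEEK_END)        # three bytes before the end
tail = io.read                   # "789"

io.seek(2, IO::SEEK_SET)         # two bytes from the start
io.seek(2, IO::SEEK_CUR)         # two more from the current position
from_cur = io.read(1)            # "4"

io.rewind                        # back to the beginning
start = io.read(1)               # "0"
```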
Here is a link to a short script that illustrates how to read and navigate through a file.
Now let’s take a look at the standard output methods: puts, print, printf, and p. For any of these methods to work the output stream on which they are called must be opened for output. If these methods are called without specifying an object recipient then they will default to the standard output stream, $stdout.
The puts(obj, …) method accepts multiple objects as arguments and writes them to an IO stream. Any objects that are not strings are converted to strings using their to_s method. A newline character is appended to each object before it is written to an IO stream (unless the original object already ended with a newline). If the method is called with an array argument it writes each element on a new line. If called without any arguments it outputs a newline character. This method always returns nil.
The print(obj, …) method accepts multiple objects as arguments and writes them to an IO stream. Any objects that are not strings are converted to strings using their to_s method. If multiple arguments are provided they are joined using the output field separator, saved in the global variable $,. By default this global variable is nil, which means that strings are written back to back. If this method is called without any arguments then it prints the last line of input that was read in your program, saved in the global variable $_. This method always returns nil.
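To compare puts and print side by side, the sketch below writes to StringIO buffers so the exact output can be inspected afterwards. It assumes that the $, and $\ separators are left at their default value of nil.

```ruby
require "stringio"

puts_out = StringIO.new
puts_out.puts "a", "b\n", 3      # a newline is added only where one is missing
puts_out.puts ["x", "y"]         # array elements go on separate lines
puts_return = puts_out.puts      # no arguments: a lone newline; returns nil
puts_text = puts_out.string      # "a\nb\n3\nx\ny\n\n"

print_out = StringIO.new
print_return = print_out.print("a", "b", 3)   # nothing is added between items
print_text = print_out.string    # "ab3"
```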
The printf(format_string, obj, …) method accepts a format string followed by multiple objects. It uses the format string to determine how to integrate the data from the objects into the output string that is sent to the IO stream. The objects passed into the method must be consistent with the format string: supplying fewer objects than the format string calls for, or an object that cannot be converted to the requested type, raises an exception (typically an ArgumentError).
The IO capabilities used by this method are based on the print method covered above, which is why this method also returns nil. The formatting capabilities embedded in this method are based on Ruby’s format method, which accepts the same parameters and returns a formatted string. For more details on creating format strings check out the documentation here.
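As a brief sketch of how printf and format relate, the example below sends the same format string through both. The format string and the values are made up for illustration.

```ruby
require "stringio"

out = StringIO.new
printf_return = out.printf("%-5s|%03d|%.2f\n", "ab", 7, 3.14159)
printed = out.string             # "ab   |007|3.14\n"

# format builds the same string but returns it instead of writing it.
formatted = format("%-5s|%03d|%.2f", "ab", 7, 3.14159)   # "ab   |007|3.14"

# Too few arguments for the format string raises ArgumentError.
too_few_raises = begin
  format("%d and %d", 1)
  false
rescue ArgumentError
  true
end
```

Using format directly is handy when the formatted string needs to be stored or passed along rather than written out straight away.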
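The p(obj, …) method accepts multiple objects as arguments and outputs the return value from each object’s inspect method, followed by a newline character. Unlike the other output methods that we just reviewed, this one does not return nil: it returns the object it was given (or an array of the objects when several are passed). It is also defined only on Kernel, so it always writes to the standard output stream and cannot be called on another IO object.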
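The behaviour of p can be checked with a sketch like this one. It temporarily points $stdout at a StringIO buffer so that the output can be examined; the buffer and the variable names are assumptions made for this example.

```ruby
require "stringio"

original_stdout = $stdout
buffer = StringIO.new

begin
  $stdout = buffer
  single = p("text")             # writes "text" with its quotes; returns the string
  several = p(1, :two)           # one line per object; returns an array
ensure
  $stdout = original_stdout
end

shown = buffer.string            # "\"text\"\n1\n:two\n"
```

Because p hands back its argument, it can be dropped into the middle of an expression while debugging without changing the result.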
When you are working with files many of the exceptions you will encounter are system errors. In these cases, Ruby is just a messenger that is informing you about errors that happened at the operating system level. Several classes have been created to wrap these system errors so that Ruby can provide intelligible exception messages. These exception classes are part of the Errno namespace, which is why all of these errors are labelled Errno::ERRORNAME.
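Here is a minimal sketch of catching one of these system errors. It tries to open a file that is known not to exist, because it sits inside a freshly created temporary directory; the file name is hypothetical.

```ruby
require "tmpdir"

error = nil

Dir.mktmpdir do |dir|
  missing = File.join(dir, "does_not_exist.txt")   # hypothetical path
  begin
    File.open(missing, "r")
  rescue SystemCallError => e    # parent class of every Errno:: exception
    error = e
  end
end

error_name = error.class.name    # "Errno::ENOENT"
errno_code = error.errno         # the numeric code reported by the OS
```

Rescuing SystemCallError catches any of the Errno exceptions; rescuing a specific class such as Errno::ENOENT is the better choice when only one failure is expected.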