Archive for the ‘Ruby’ Category

Learning Ruby [IO system]

Sunday, October 9th, 2011

The IO class is the basis for all input and output in Ruby. Objects from this class represent connections to various different input and output devices such as hard drives, keyboards, and screens. All Ruby programs have three standard I/O streams:

  • the input stream, known as STDIN or $stdin, is set to capture data from the keyboard;
  • the output stream, named STDOUT or $stdout, is set to output data to a terminal screen;
  • the error stream, called STDERR or $stderr, also outputs to the terminal screen.

Whenever unadorned IO methods are called (for example puts, print, and gets), they are routed to and from the standard output and input streams. To send output to the error stream, STDERR must be specified explicitly, for example STDERR.puts "text".

To change the routing of any of the standard streams, you can reassign the global variable associated with it ($stdin, $stdout, or $stderr). It is recommended that you leave the constants (STDIN, STDOUT, STDERR) untouched, so that you can still access the default input and output devices.
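As a minimal sketch of this rerouting, assuming an in-memory StringIO buffer as the stand-in destination (the buffer and variable names are my own), the default output can be captured and then restored through the untouched constant:

```ruby
require "stringio"

# Temporarily route the default output stream into an in-memory buffer.
buffer = StringIO.new
$stdout = buffer
puts "captured"            # goes to the buffer, not the terminal
$stdout = STDOUT           # restore the default device via the constant

STDERR.puts "errors must name the error stream explicitly"
captured = buffer.string   # "captured\n"
```

Because only the global variable was reassigned, STDOUT still points at the terminal and can be used to undo the change.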

Now let’s discuss how IO objects read data streams. IO objects use iterators to read data from IO streams one record at a time. Records are delineated by the global input record separator, $/. The default separator is a newline character, "\n", which is why Ruby usually processes data one line at a time. By changing the separator you can change how Ruby iterates through an input stream.
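Here is a small sketch of the effect of the record separator. Rather than reassigning $/ globally, it passes a separator to each_line for a single call, which changes the record boundaries in the same way; the sample string is an assumption of mine.

```ruby
require "stringio"

data = StringIO.new("alpha,beta,gamma")

default_sep = $/                       # "\n" unless it has been changed
by_line  = data.each_line.to_a         # no newline present: one single record
data.rewind
by_comma = data.each_line(",").to_a    # records now end at each comma
```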

Before covering how to read and write data to IO objects, let’s take a look at the most common IO object in Ruby: the File object. As the name suggests, a File object represents a file within Ruby. It provides functionality that enables files to be opened, read from, written to, and closed.

The most common ways to create a file object are the File.open and File.new methods. These methods require one parameter and accept several optional ones. The File.open method has two advantages: it supports an optional code block, and a closely related method, Kernel#open, can be called as plain open without the class name File.

The first parameter is the only one that is mandatory. It accepts a string object that holds the location of the file. The location can be specified using an absolute or relative path. The other parameters can be used to define several options, though the only option I will cover here is the file mode. The file mode determines how a file can be used. The most common file modes are:

  • Read mode is identified by “r”. This mode only supports reading from a file. This is the default mode. If the file does not exist then the method raises an exception (Errno::ENOENT).
  • Append mode is identified by “a”. This mode supports writing to a file by appending new data to any existing content. If the file does not exist then it will be created.
  • Write mode is identified by “w”. This mode supports writing to a file by overwriting any existing content. If the file does not exist then it will be created.

The File.new method, and the File.open method called without a code block, function in the same way: they return a reference to a file object. When using either of these approaches it is important to remember to call the close method when you are done.

If the optional code block is supplied, it will be passed a reference to the file object as an argument. The file object will automatically be closed when the block terminates. In this instance, File.open returns the value of the block.
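The modes and the two calling styles can be sketched as follows. This is only an illustration: it works in a temporary directory, and the file name notes.txt is hypothetical.

```ruby
require "tmpdir"

lines = Dir.mktmpdir do |dir|
  path = File.join(dir, "notes.txt")    # hypothetical file name

  # Block form: the file is closed automatically when the block ends.
  File.open(path, "w") { |file| file.puts "first line" }    # creates the file
  File.open(path, "a") { |file| file.puts "second line" }   # appends to it

  # Without a block, File.open (like File.new) returns the file object,
  # and closing it is our responsibility.
  file = File.open(path, "r")
  content = file.readlines
  file.close
  content
end

p lines
```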

The open-uri library makes it easy to access remote files over the HTTP and HTTPS protocols. After loading this library with the require method you can open remote files using the open method as though they were local files (in Ruby 3.0 and later, call URI.open instead, because the plain open method no longer accepts URLs). Small files are downloaded and returned as StringIO objects, while larger ones are returned as Tempfile objects. StringIO objects enable strings to behave like an IO stream, which means that they can be read using the standard IO stream input methods described below.

In Ruby, IO objects feature a suite of standard input and output methods. We’ll take a look at the input methods first. All of these methods can be used with different objects to read input from various sources such as the keyboard, files from a hard-drive, or from a local or remote server.

First let’s take a look at the methods that read input one character or byte at a time. There are four such methods: getc, getbyte, readchar, and readbyte. They can be divided in two different ways based on how they work. First, we can group them by what happens when they are called after the end of a file has been reached: getc and getbyte return nil, while readchar and readbyte raise an EOFError exception.

The other, more significant difference relates to the data that is actually returned by these methods. The getc and readchar methods return characters, whereas the getbyte and readbyte methods return individual bytes. Each call to a byte method returns a single integer between 0 and 255. An ASCII character occupies one byte, but a multibyte character (for example an accented letter in UTF-8) spans several bytes, so it takes several calls to read it.
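The following sketch contrasts the four methods, using a StringIO object as a stand-in for a file. The sample text is my own; "\u00E9" is the letter é, which takes two bytes in UTF-8.

```ruby
require "stringio"

io = StringIO.new("h\u00E9")

char_a = io.getc            # "h"
char_b = io.getc            # "é", returned as one character
at_end = io.getc            # nil at the end of the stream

io.rewind
bytes = []
while (byte = io.getbyte)   # one integer per call, nil at the end
  bytes << byte
end

eof_error = begin
  io.readchar               # raises instead of returning nil
rescue EOFError => e
  e.class
end
```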

Here are links to two short scripts that let you see how these different methods read data from files and from the keyboard.

Now let’s move on to the methods that read data one line at a time. There are three such methods: gets, readline, and each. The first two function similarly to their single-character counterparts. They both read one line at a time, but gets returns nil when it reaches the end of a file, whereas readline raises an EOFError exception.

The third method functions a bit differently. If you recall, each is a standard iterator method. Therefore, it iterates through the entire file (as it would with any other collection), yielding each line to a code block where it can be processed. This approach is ideal if you plan to process all the lines from a file at once. On the other hand, it does not allow you to walk through a file with the same level of control provided by gets and readline.
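A short sketch of the three line-oriented methods, again with a StringIO object standing in for a file (the two-line sample text is an assumption):

```ruby
require "stringio"

io = StringIO.new("one\ntwo\n")

first  = io.gets        # "one\n"
second = io.readline    # "two\n"
third  = io.gets        # nil: gets signals the end quietly

raised = begin
  io.readline           # readline raises at the end of the stream
  false
rescue EOFError
  true
end

io.rewind
upcased = []
io.each { |line| upcased << line.chomp.upcase }   # every line, in one pass
```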

Here are links to two short scripts that let you see how these different methods read data from files and from the keyboard.

Lastly, let’s take a look at the two methods that read entire files: read and readlines. These methods are designed to read from files only and are not appropriate for getting input from a keyboard. Usually, these methods are only used to read small files. When reading large files it is best to process them iteratively, as this is a more efficient use of memory and processing power.

Before we cover how to output data, let’s briefly review the methods for navigating within a file. First off, the rewind method enables you to jump back to the beginning of a file. The pos accessor attribute provides getter and setter methods that enable you to check your current location and to move to a new absolute position within the file.

The seek method can also be used to change your current position within a file. It enables you to move by specifying a position that is relative to your current position, to the start of the file, or to the end of the file. The first argument is an integer that specifies the distance to be moved, while the second argument is a constant that specifies the point of origin for this movement.

Here are the constants that can be specified for the second argument: IO::SEEK_SET is the default and measures the position from the beginning of the file; IO::SEEK_CUR measures it from the current location; IO::SEEK_END measures it from the end of the file.
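The navigation methods can be tried out on a StringIO object, which supports the same calls as a file. This is a sketch with made-up content, where each character is one byte so positions are easy to follow:

```ruby
require "stringio"

io = StringIO.new("0123456789")

io.read(3)                    # consume "012"
after_read = io.pos           # 3

io.pos = 5                    # absolute move via the setter
from_five = io.read(2)        # "56"

io.seek(-2, IO::SEEK_CUR)     # two bytes back from the current position
again = io.read(2)            # "56"

io.seek(-3, IO::SEEK_END)     # three bytes before the end
tail = io.read                # "789"

io.rewind
whole = io.read               # the entire content, read in one call
```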

Here is a link to a short script that illustrates how to read and navigate through a file.

Now let’s take a look at the standard output methods: puts, print, printf, and p. For any of these methods to work, the stream on which they are called must be open for output. If these methods are called without an explicit receiver then they default to the standard output stream, $stdout.

The puts(obj, …) method accepts multiple objects as arguments and writes them to an IO stream. Any objects that are not strings are converted to strings using their to_s method. A newline character is appended to each object before it is written to the IO stream (unless the object already ends with a newline). If the method is called with an array argument it writes each element on its own line. If called without any arguments it outputs a newline character. This method always returns nil.

The print(obj, …) method accepts multiple objects as arguments and writes them to an IO stream. Any objects that are not strings are converted to strings using their to_s method. If multiple arguments are provided they are joined using the output field separator, saved in the global variable $,. By default this global variable is nil, which means that strings are written back to back. If this method is called without any arguments then it prints the last line of input that was read in your program, saved in the global variable $_. This method always returns nil.

The printf(format, obj, …) method accepts a format string followed by multiple objects. It uses the format string to determine how to integrate the data from the objects into the output string that is sent to the IO stream. The objects passed into the method must be consistent with the fields identified in the format string: if too few objects are supplied, or an object cannot be converted to the requested type, an exception is raised.

The IO capabilities used by this method are based on the print method covered above, which is why this method also returns nil. The formatting capabilities embedded in this method are based on Ruby’s format method, which accepts the same parameters and returns a formatted string. For more details on creating format strings check out the documentation here.

The p(obj, …) method accepts multiple objects as arguments and outputs the return value from each object’s inspect method, followed by a newline character. Unlike the other output methods that we just reviewed, this one does not return nil: it returns the object that was passed in (or an array of the objects when several are passed).
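To compare the four output methods side by side, the sketch below captures what they write by temporarily pointing $stdout at a StringIO buffer. The captured text and variable names are my own choices.

```ruby
require "stringio"

original = $stdout
$stdout = StringIO.new                        # capture everything written below

puts_result   = puts(["a", "b"])              # one array element per line
print_result  = print("c", "d")               # back to back, no newline added
printf_result = printf("%05.1f\n", 3.14159)   # zero-padded to width 5, one decimal
p_result      = p("e")                        # writes the inspect form, with quotes

output = $stdout.string
$stdout = original                            # restore the original stream
```

After this runs, output holds the text written by all four calls, the first three result variables are nil, and p_result is the string that was passed to p.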

When you are working with files, many of the exceptions you will encounter are system errors. In these cases, Ruby is just a messenger that is informing you about errors that happened at the operating system level. Several classes have been created to wrap these system errors so that Ruby can provide intelligible exception messages. These exception classes are part of the Errno namespace, which is why all of these errors are labelled Errno::ERRORNAME.
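For instance, opening a file that does not exist raises Errno::ENOENT. A minimal sketch, assuming the hypothetical path below really is missing:

```ruby
missing = "no_such_file_for_this_example.txt"   # hypothetical path

message = begin
  File.open(missing, "r") { |file| file.read }
rescue Errno::ENOENT => e
  # Ruby is relaying the operating system's "no such file" error.
  "could not open: #{e.message}"
end

puts message
```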


Learning Ruby [enumerator, enumerable objects]

Wednesday, October 5th, 2011

I did not provide a good description for iterators earlier, so let me do that now before jumping into the concept of enumerators. Iterators are methods that have the ability to iterate through a collection such as an array or hash table, and that yield one or more values to a code block.

Enumerators are objects that are created for doing iterations. These objects contain the standard suite of iterator methods available for all classes that mix-in the Enumerable module. To create an enumerator the logic for the each method needs to be defined. If you recall, the each logic is used to determine how all other iterator methods work in any class with the Enumerable module mixed-in.

Since enumerators are objects they are able to hold information about the current state of their iteration. That means you can step through an iteration little by little. Iterator methods are atomic; once the method call is completed they are not able to save information related to their iteration.

Enumerators can also be used to protect information. By handing out an enumerator instead of the collection itself, you let other code iterate through the data without exposing the collection’s methods for adding or removing elements.

There are three ways to create enumerators. The most common approach is to use an iterator method from an existing object to define the each logic for the newly minted Enumerator object. Most iterator methods return an enumerator object if they are called without a code block. As you can see in the example below, the each method, when called without a code block, returns an enumerator, which I have assigned to a variable.
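A minimal sketch of this approach, assuming a small ingredients array as sample data. It also shows the enumerator stepping through the iteration one value at a time with next:

```ruby
ingredients = ["flour", "sugar", "eggs"]   # hypothetical sample data

enum = ingredients.each      # no block given, so an Enumerator comes back

first  = enum.next           # "flour"
second = enum.next           # "sugar"; the enumerator remembers its position
third  = enum.next           # "eggs"

finished = begin
  enum.next                  # nothing is left to yield
  false
rescue StopIteration
  true
end
```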

You can also call the enum_for method on a collection to create an enumerator. The first argument specifies the iterator method that will be used as the basis of the each logic for the new enumerator; any further arguments are optional and are passed along to that method, which is only useful for methods that take arguments, such as inject.

In the example below, the select iterator from the ingredients collection is used as the basis of the each logic on this new enumerator object. This makes the each method on the new object function like the select method from the ingredients collection.
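As a sketch of enum_for, assuming the same hypothetical ingredients array; the second enumerator illustrates an extra argument being passed through to the underlying method:

```ruby
ingredients = ["flour", "sugar", "eggs"]   # hypothetical sample data

enum = ingredients.enum_for(:select)

# each on the enumerator now behaves like select on the array.
short_names = enum.each { |item| item.length <= 4 }

# Extra arguments are handed to the underlying method (here each_slice).
pairs = ingredients.enum_for(:each_slice, 2).to_a
```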

You can also use the Enumerator.new method to create an enumerator object. In this case you need to define the each logic for this object inside a block of code. Here you need to define the elements that this object will iterate through and the way in which this iteration will happen. This includes identifying the value that should be yielded by the iterator methods from this object.

In the example code below I have created an array of strings, called ingredients. It contains the elements that my enumerator object will iterate through. Then I use the each method from this array to define the way in which the iteration will happen, which is based on each element’s index in the array. Lastly, I use the expression yielder << ingredient to identify the value that should be yielded from this object on each iteration.
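A sketch along the lines of that description, with assumed contents for the ingredients array:

```ruby
ingredients = ["flour", "sugar", "eggs"]   # hypothetical sample data

enum = Enumerator.new do |yielder|
  ingredients.each do |ingredient|
    yielder << ingredient     # the value handed to the caller's block
  end
end

upcased = enum.map(&:upcase)  # Enumerable methods build on this each logic
```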

This last example is rather silly. I am sharing it to illustrate the point that you can create a new enumerator object without using an existing iterator as the basis for your new object. The each logic defined below iterates through three different values, yielding each one in turn.
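A sketch of such an enumerator, where the three yielded values are arbitrary choices of mine:

```ruby
enum = Enumerator.new do |yielder|
  yielder << "one"
  yielder << 2
  yielder << :three
end

values = enum.to_a      # collects every yielded value in order
first  = enum.next      # stepping manually starts from the first value
```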


Learning Ruby [the enumerable module]

Wednesday, October 5th, 2011

In order to mix in the functionality from the Enumerable module into a Ruby class, the class needs to be a collection and the each method needs to be defined. The each method is responsible for holding the iteration logic for a class. This method defines what value(s) are yielded to the code block during each iteration. The logic encapsulated in this method serves as the basis for all other Enumerable module methods, which are briefly described below.

First let’s look at the conditional methods that return boolean values. All of these methods, except include?, will evaluate the return values from the code block to determine whether to return a true or false. The names of these methods make their function pretty self explanatory.
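The method names are not listed above; the sketch below assumes the usual ones from Enumerable (include?, all?, any?, none?, and one?) and a made-up array of numbers:

```ruby
numbers = [1, 2, 3, 4]   # hypothetical sample data

has_three = numbers.include?(3)            # true; no block involved
all_pos   = numbers.all?  { |n| n > 0 }    # true: every element passes
all_even  = numbers.all?  { |n| n.even? }  # false: 1 and 3 fail
any_big   = numbers.any?  { |n| n > 3 }    # true: 4 passes
none_neg  = numbers.none? { |n| n < 0 }    # true: nothing passes
one_four  = numbers.one?  { |n| n == 4 }   # true: exactly one element passes
```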

Next let’s look at the methods that return filtered content from a collection. Once again the code blocks play a key role in how these methods work. All of these methods, except grep, use the return value from the code block to select which elements should be returned. The find method returns only the first matching element from the original collection, while the find_all and select methods return all matching elements. The reject method returns all non-matching elements.
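A quick sketch of these four methods applied with the same block to a made-up array:

```ruby
numbers = [1, 2, 3, 4, 5, 6]   # hypothetical sample data

first_even = numbers.find     { |n| n.even? }   # 2
evens      = numbers.select   { |n| n.even? }   # [2, 4, 6]
same       = numbers.find_all { |n| n.even? }   # [2, 4, 6]
odds       = numbers.reject   { |n| n.even? }   # [1, 3, 5]
```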

The grep method works a little bit differently. The filter logic is defined by a pattern, most often a regular expression, that is passed into the method as an argument. The code block is used to process the matching elements before they are added to the array that is returned. In the example below, the results are converted to uppercase.
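A sketch of grep with an assumed word list. The last line shows that the pattern does not have to be a regular expression, since grep matches with the === operator:

```ruby
words = ["apple", "banana", "avocado", "cherry"]   # hypothetical sample data

a_words  = words.grep(/\Aa/) { |word| word.upcase }   # ["APPLE", "AVOCADO"]
integers = [1, "two", 3.0, 4].grep(Integer)           # [1, 4]
```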

The group_by and partition methods filter elements by arranging them into separate groups. The group_by method returns a hash table with the elements from the original collection sorted into different groups based on an expression defined in the code block. In the example below the words are grouped based on the length.

The partition method returns two arrays holding the elements from the original collection, sorted into two groups based on a true/false condition that is applied to each element. In the example below the expression checks whether each word has more than 5 letters.
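Both grouping methods can be sketched with one assumed word list:

```ruby
words = ["tree", "forest", "leaf", "mountain", "river"]   # hypothetical sample data

by_length = words.group_by { |word| word.length }
# by_length[4] is ["tree", "leaf"], by_length[6] is ["forest"], and so on

long, short = words.partition { |word| word.length > 5 }
# long holds the matches, short holds everything else
```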

There are several interesting variations of the each method that are implemented for all classes mixed in with the Enumerable module. These are methods designed to provide access to all elements from a collection via the code block. They differ from the examples we covered because they do not provide a useful return value; depending on the method and the Ruby version they return the original collection or nil.

The each_with_index method iterates through a collection and yields the current element and its index to the code block. The cycle(n) method iterates through a collection n times, yielding the current element from the collection to the code block.

The each_slice(size) method iterates through a collection a few elements at a time. It yields to the code block an array that contains a non-overlapping slice of the elements from the collection. The each_cons(size) method also iterates through a collection by looking at several elements at a time. However, the arrays that it yields to the code block are overlapping.
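Because these methods are used for what they yield rather than what they return, the sketch below records the yielded values in arrays. The letters array is an assumption.

```ruby
letters = ["a", "b", "c", "d"]   # hypothetical sample data

indexed = []
letters.each_with_index { |letter, i| indexed << "#{i}:#{letter}" }

cycled = []
letters.first(2).cycle(2) { |letter| cycled << letter }   # two passes over "a", "b"

slices = []
letters.each_slice(3) { |slice| slices << slice }         # non-overlapping groups

windows = []
letters.each_cons(3) { |window| windows << window }       # overlapping groups
```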

Lastly, let’s look at methods that return content generated during the iteration process. These methods return single objects, or arrays, that have been created through the iteration process.

The inject(init) method loops through a collection and yields to the code block the current element along with an accumulator. The accumulator can be initialized via an argument; otherwise, it is initialized with the first element from the collection. Once it has finished iterating, the method returns the value of the last expression evaluated in the code block during the final iteration, which is usually the accumulator. It is important to remember that the accumulator value passed from one iteration to the next is always the value of the last expression in the code block.

The map method is one of the most powerful methods from the each family. It iterates through a collection and generates a new array that is populated with the return values from the code block. This method returns a new array of the same size as the original collection.
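A short sketch of inject (with and without an initial value) and map, using a made-up array:

```ruby
numbers = [1, 2, 3, 4]   # hypothetical sample data

sum     = numbers.inject(0) { |total, n| total + n }   # accumulator starts at 0
product = numbers.inject { |total, n| total * n }      # accumulator starts at 1, the first element
squares = numbers.map { |n| n * n }                    # new array, same size as numbers
```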

In order to be able to sort custom-created objects in a collection you can take one of three approaches. All approaches require the definition of the logic which governs how a custom-created object should be compared with other objects – from a greater than, equal to, and less than perspective.

The simplest approach is to define a <=> method, also known as the spaceship method. This method accepts an object as an argument. It needs to be defined so that it returns -1 if the self object is less than the argument, 1 if the self object is greater, and 0 if both objects are equal. Once this method is defined for a given class, Ruby will know how to sort a collection of objects from that class.

The next approach is to use a code block that tells the array how to sort the objects. This approach can be very useful when you have an array of objects that needs to be sorted in many different ways. The sort method passes two different objects to the code block; by defining how to compare two objects the collection is able to sort the entire collection.

The last, and most robust, approach is to define the spaceship method and mix the Comparable module into the new class. Beyond array sorting, the Comparable module gives your objects the ability to support comparison operations such as <, >, == and so on.
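The three approaches can be sketched together. The Recipe class and its attributes are hypothetical, invented only for this illustration:

```ruby
# Hypothetical class used only for this illustration.
class Recipe
  include Comparable
  attr_reader :name, :minutes

  def initialize(name, minutes)
    @name = name
    @minutes = minutes
  end

  # Order recipes by preparation time; Integer#<=> returns -1, 0, or 1.
  def <=>(other)
    minutes <=> other.minutes
  end
end

soup  = Recipe.new("soup", 40)
salad = Recipe.new("salad", 10)
bread = Recipe.new("bread", 90)
all   = [soup, salad, bread]

by_time = all.sort.map(&:name)                             # relies on <=>
by_name = all.sort { |a, b| a.name <=> b.name }.map(&:name) # block overrides it
quicker = salad < soup                                     # operator supplied by Comparable
```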


Learning Ruby [sets]

Monday, October 3rd, 2011

Sets are a special type of collection that contain elements with unique content. A set can contain any type of object. However, it makes sure that there is only one copy of each object in a given set. Sets are created using the Set.new(collection) method. This method accepts a collection as an argument. It also supports the definition of a code block to process the contents from the collection being passed as an argument.

A new element can be added to a set using the << operator, or the add(item) and add?(item) methods. These approaches are identical for the most part, except that add? returns nil if the item already exists in the set. The merge(collection) method merges the elements of another collection into the set; passing a collection to one of the add methods instead adds the entire collection as a single item.

As expected, the delete(item) method deletes objects from a set.

Using the operators + and | (union), - (difference), and & (intersection) you can combine set objects into new sets. You can also use the superset?(set) and subset?(set) methods to compare two sets to determine if they relate to one another in either of these ways.
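Here is a sketch of the set operations described in this post, using small made-up sets of integers:

```ruby
require "set"

a = Set.new([1, 2, 2, 3])                  # duplicates collapse to 1, 2, 3
doubled = Set.new([1, 2]) { |n| n * 2 }    # the block transforms each item first

a << 4
added_again = a.add?(4)                    # nil, because 4 is already present
a.merge([5, 6])                            # adds 5 and 6 as separate items
a.delete(6)

b = Set[3, 4, 7]
union        = a | b                       # same result as a + b
difference   = a - b
intersection = a & b
is_subset    = Set[1, 2].subset?(a)
is_superset  = a.superset?(Set[1, 2])
```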


Learning Ruby [hash tables]

Monday, October 3rd, 2011

Hashes are collections of objects that are organized as key/value pairs. The keys must be unique and they can be any type of object. Most often, strings or symbols are used as keys. Like arrays, hashes feature all of the collection-related functionality provided by the Enumerable module.

In Ruby, hashes are considered ordered because they maintain information about the order in which objects were added. Hashes are said to have a meta-index; each element in a hash features a key and a value, both of which can be any type of object, and a meta-index, which is always an integer.

There are three ways to create hash tables in Ruby. The literal approach is the most common. It uses curly brackets that contain key/value pairs linked by the => operator, and separated from one another by commas. Below is an example where the keys are symbols and the values are strings (hashes support any types of objects for both keys and values).
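A sketch of the literal approach only, with invented data; it also shows that insertion order is preserved:

```ruby
capitals = { :france => "Paris", :japan => "Tokyo", :peru => "Lima" }   # hypothetical data

tokyo = capitals[:japan]          # look up a value by its key
capitals[:kenya] = "Nairobi"      # new pairs are added at the end
order = capitals.keys             # keys come back in insertion order
```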


Learning Ruby [arrays]

Monday, October 3rd, 2011

Arrays in Ruby are similar to arrays in other programming languages. They are collections of objects that are ordered based on a numerical index. This class is mixed with the Enumerable module, which provides a lot of collection-related functionality.

There are several ways to initialize an array: the literal approach; the Array.new() method; the Array method; and the %w[] (or %W[]) syntax. The literal approach, which is illustrated below, is the most common. In this approach a bare list in square brackets is used to initialize an array. The square brackets can also be empty, to create an empty array.
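The four approaches can be sketched as follows; all of the values are placeholders of my own:

```ruby
literal = [1, "two", :three]    # bare list in square brackets
empty   = []                    # empty literal
filled  = Array.new(3, 0)       # three elements, each set to 0
wrapped = Array("a".."c")       # the Array method converts its argument to an array
words   = %w[salt pepper]       # array of strings without quotes or commas
spice   = "cumin"
interp  = %W[salt #{spice}]     # %W additionally allows interpolation
```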


Learning Ruby [symbols]

Monday, October 3rd, 2011

A symbol is a special type of core object that is available in Ruby. In many ways symbols are similar to strings, but they also have a lot in common with integers: they are immutable and unique. Since I did not cover strings and integers in any detail in my notes, below I’ve described what it means to be immutable and unique. This is rather confusing, I know. This is one of those Ruby quirks.

You can identify symbols by the syntax used to declare them:

:symbol
:"symbol with spaces"

Symbols are immutable. You can’t change a symbol once it has been brought into existence. Strings are very different in this regard; they can be appended, sliced, and changed in a variety of other ways. This is one of the features that make symbols similar to integers. An operation on a symbol or an integer can produce another object of the same type, but the existing symbol is never changed into the new one, whereas the contents of a string can be changed in place.

Symbols are also unique. No two symbols can be alike. For example you can only have one symbol called :symbol in your program. All references to :symbol are links to the same object. You can confirm this by checking the object_id of a symbol. This is another important distinction between symbols and strings. If you think back to how strings work, you can have multiple different string objects with the exact same content.

There are no constructors for symbols because they are unique: there is no Symbol.new, and you can only refer to a symbol. Ruby uses symbols to keep track of the names for all variables, methods and constants. All symbols are stored in the symbol table, which contains thousands of entries. This makes it easier for Ruby to locate these resources when running your program. Symbols are most often used as arguments for methods and as hash keys.
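The object_id check can be sketched like this. The strings are built with String.new so that the comparison does not depend on how a particular Ruby version treats string literals; that choice is mine.

```ruby
same_symbol = :symbol.object_id == :symbol.object_id   # true: one shared object

a = String.new("text")
b = String.new("text")
same_content = a == b            # true: the contents match
same_object  = a.equal?(b)       # false: two separate string objects

from_string = "symbol with spaces".to_sym.equal?(:"symbol with spaces")   # true
immutable   = :symbol.frozen?    # true: symbols cannot be modified
```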

To view the contents of this table you can use the following expression:

Symbol.all_symbols.sort


Learning Ruby [booleans: true and false]

Monday, October 3rd, 2011

I found the way that Ruby handles true and false to be a little confusing at first. Here I am going to do my best to explain how these states (and objects) work, based on the limited knowledge that I’ve acquired thus far.

In Ruby, true and false refer to states as well as objects (remember, everything in Ruby is an object). Every object and expression in Ruby can be evaluated to a boolean value. This is different than most other programming languages, where conditional expressions are the only expressions that evaluate to true or false.

Let’s dig a bit deeper into what this means. Every object in Ruby, such as an integer, an array, or a custom-created class or object, resolves to either true or false. The good news is that all objects resolve to true, except for the objects false and nil. You can test this by running some if statements, like the ones below, in irb.
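A sketch of such a test, wrapped in a hypothetical helper method (truthy?) so that the same if statement can be applied to several values:

```ruby
# Hypothetical helper: reports how Ruby treats a value inside an if statement.
def truthy?(value)
  if value
    true
  else
    false
  end
end

zero_result   = truthy?(0)          # true, unlike in C
empty_string  = truthy?("")         # true
empty_array   = truthy?([])         # true
nil_result    = truthy?(nil)        # false
false_result  = truthy?(false)      # false
class_result  = truthy?(NilClass)   # true: the class is an ordinary object
```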

All expressions in Ruby have true or false values. It’s easy to understand how conditional expressions resolve to true or false; for example 10 > 5 is obviously true. The true or false value of other expressions is determined by the object that those expressions evaluate to. The expression 10 * 1 evaluates to 10, and since 10 is an integer it evaluates to true.

When you call a method, its truth value depends on the value that it returns. Instances of classes are objects and they always evaluate to true. A class definition evaluates to the last expression that it contains. In Ruby 1.9 a method definition evaluates to nil, and therefore to false; since Ruby 2.1 it evaluates to the method name as a symbol, which is true.


Learning Ruby [syntactic sugar]

Monday, October 3rd, 2011

While learning Ruby I heard the term syntactic sugar for the first time that I can remember. It refers to features provided by a language, such as Ruby, that help make code look cleaner.

I’ll illustrate what this means by examining how the expression x = 1 + 2 works. This expression resolves to an integer object, in this case 1, calling the method +(), and providing 2 as an argument; here is the full expression: x = 1.+(2). Ruby makes it possible to call the method int.+(num) using the syntax int + num. Syntactic sugar is the term used for this kind of practice.

Since the use of the “+” operator ultimately resolves to calling a method, we can make it possible for any two objects to be able to leverage the syntax obj_1 + obj_2 just by redefining the +(obj) method.
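As a sketch of this idea, the hypothetical Money class below defines + so that its instances can use the sugared syntax:

```ruby
# Hypothetical class used only for this illustration.
class Money
  attr_reader :cents

  def initialize(cents)
    @cents = cents
  end

  # Defining + lets two Money objects be written as a + b.
  def +(other)
    Money.new(cents + other.cents)
  end
end

sugared  = Money.new(150) + Money.new(250)
explicit = Money.new(150).+(Money.new(250))   # the same call, without the sugar
three    = 1.+(2)                             # integers work the same way
```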

There are numerous methods in Ruby that have sugared notation. These range from arithmetic operator methods (+, -, *, /, %), to get, set, and append data methods used by arrays ([], []=, and <<), to comparison operators (==, <, >, <=, >=), to case equality operators (===), unary operators (+@, -@, !), and bit-wise operators (|, &, ^).


Learning Ruby [error handling control structures]

Monday, October 3rd, 2011

Ruby has three important constructs that are used for handling errors: the raise method and the rescue and ensure keywords. Here is a brief overview of how each one of these works. As in other programming languages, error handling in Ruby is often referred to as exception handling.

There are several different exception types built into Ruby. As you would expect, exceptions in Ruby are objects. The highest-level exception class is Exception; every other exception class descends from it. Most of the errors that a program is expected to handle descend from StandardError, which can be thought of as the parent of the everyday exceptions. The default exception that Ruby raises, when a more specific one is not given, is RuntimeError.

Ruby will raise exceptions on its own for a whole host of reasons. For example, if you try to add a string to a number you will get a TypeError exception. However, you can also tell Ruby to raise an exception (a built-in one or one that you created) by using the raise method. The expression below raises the exception ArgumentError with a message string if variable x is less than 10. If the raise method is used without an exception type, then it will raise the RuntimeError exception.
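A sketch of raise, wrapped in a hypothetical check method. The exceptions are rescued immediately so that the script keeps running and the results can be inspected:

```ruby
# Hypothetical helper: rejects values below 10.
def check(x)
  raise ArgumentError, "x must be at least 10" if x < 10
  x
end

error = begin
  check(5)
rescue ArgumentError => e
  e
end

default_class = begin
  raise "no class given"        # a bare message raises RuntimeError
rescue => e
  e.class
end

mixed_types = begin
  1 + "1"                       # Ruby raises this one on its own
rescue TypeError => e
  e.class
end
```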

When an exception is raised, Ruby will terminate the current program unless the exception is rescued. Rescuing an exception enables you to customize how Ruby handles an error. This can be really helpful because you often don’t want your application to stop running just because a problem was encountered.

The rescue keyword functions similarly to try and catch in Java (with rescue being the catch). When using rescue you can use a begin/end code block to define the code that should be rescued in case of an exception. Below is an example of how the syntax for rescue works; I’ve just updated the code samples from above. Note that if the exception type is not specified then the rescue clause applies to StandardError and all of its descendants, which covers most everyday errors.
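A sketch of the begin/rescue syntax, built around the same hypothetical check method (repeated here so the example stands alone):

```ruby
# Hypothetical helper: rejects values below 10.
def check(x)
  raise ArgumentError, "x must be at least 10" if x < 10
  x
end

# Hypothetical wrapper that turns failures into readable messages.
def report(x)
  begin
    "ok: #{check(x)}"
  rescue ArgumentError => e
    "argument problem: #{e.message}"
  rescue => e                      # no class listed: StandardError and descendants
    "other problem: #{e.class}"
  end
end

puts report(12)
puts report(5)
puts report(nil)    # nil has no < method, so a NoMethodError is rescued
```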

The ensure keyword enables you to make sure that something happens whether or not an exception is raised and/or rescued. This can be very handy when you are working with a file, port or some other resource that you want to make sure is closed before moving on. Here is how to incorporate ensure into the rescue example from above.
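A standalone sketch of ensure, in which a StringIO object stands in for a file or port and the failure is simulated:

```ruby
require "stringio"

log = []
io = StringIO.new("data")     # stands in for a file or other resource

begin
  log << io.read
  raise IOError, "simulated failure"
rescue IOError => e
  log << "rescued: #{e.message}"
ensure
  io.close                    # runs whether or not an exception was raised
  log << "closed"
end
```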

In Ruby it’s easy to create your own exceptions. It is also good practice. Having very specific exceptions can be very helpful when debugging your application. This is one of the reasons why Ruby has so many standard exceptions. To create your own exception all you need to do is define a class that inherits from StandardError.
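A minimal sketch of a custom exception; the class name RecipeError and its default message are invented for this illustration:

```ruby
# Hypothetical application-specific exception.
class RecipeError < StandardError
  def initialize(message = "the recipe is invalid")
    super
  end
end

caught = begin
  raise RecipeError
rescue StandardError => e     # custom errors are caught by ordinary rescue clauses
  e
end

custom = begin
  raise RecipeError, "missing flour"
rescue RecipeError => e
  e.message
end
```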