Arrays can be searched in two ways: with the BinarySearch method, which works on sorted arrays and is extremely fast, and with the IndexOf (and LastIndexOf) methods, which work regardless of the order of the elements. All three methods search for an instance of an item and return its index, and they’re all reference methods. The IndexOf and LastIndexOf methods are similar to the methods by the same name of the String class. They return the index of the first (or last) instance of an object in the array, or the value −1 if the object isn’t found in the array. Both methods are overloaded, and the simplest form of the IndexOf method is the following, where arrayName is the name of the array to be searched and object is the item you’re searching for:
itemIndex = System.Array.IndexOf(arrayName, object)
The LastIndexOf method’s syntax is identical, but the LastIndexOf method starts searching from the end of the array. If the item you’re searching for is unique in the array, both methods will return the same index.
Another form of the IndexOf and LastIndexOf methods allows you to begin the search at a specific index:
itemIndex = System.Array.IndexOf(arrayName, object, startIndex)
This form of the method starts searching in the segment of the array from startIndex to the end of the array. Finally, you can specify a range of indices in which the search will take place by using the following form of the method:
itemIndex = System.Array.IndexOf(arrayName, object, startIndex, endIndex)
You can search large arrays more efficiently with the BinarySearch method if the array is sorted. The simplest form of the BinarySearch method is the following:
The BinarySearch method returns an integer value, which is the index of the object you’re searching for in the array. If the object argument is not found, the method returns a negative value, which is the negative of the index of the next larger item minus one. This transformation, the negative of a number minus one, is called the one’s complement, and other languages provide an operator for it: the tilde ( ∼ ). The one’s complement of 10 is −11, and the one’s complement of −3 is 2.
Why all this complexity? Zero is a valid index, so only a negative value could indicate a failure in the search operation. A value of −1 would indicate that the operation failed, but the BinarySearch method does something better. If it can’t locate the item, it returns the index of the item immediately after the desired item (the first item in the array that exceeds the item you’re searching for). This is a near match, and the BinarySearch method returns a negative value to indicate near matches. A near match is usually the same string with different character casing, or a slightly different spelling. It may also be a string that’s totally irrelevant to the one you’re searching for. Notice that there will always be a near match unless you’re searching for a value larger than the last value in the array. In this case, the BinarySearch method will return the one’s complement of the array’s upper bound (−100 for an array of 100 elements, if you consider that the index of the last element is 99).
Arrays Perform Case-Sensitive Searches
The BinarySearch, IndexOf, and LastIndexOf methods perform case-sensitive searches. However, because the BinarySearch method reports near matches, it appears as if it performs caseinsensitive searches. If the array contains the element Charles and you search for charles, the IndexOf method will not find the string and will report a no-match, whereas the BinarySearch method will find the element Charles and report it as a near match. My recommendation is to standardize the case of the data and the search argument when you plan to perform searches (such as uppercase for titles, camel case for names, and so on). To perform case-insensitive searches, you must implement your own custom comparer, a process that’s described later in this chapter. Also the Option Compare statement has no effect on the comparisons performed by either the BinarySearch or the IndexOf/LastIndexOf methods.
The ArraySearch Eample
The ArraySearch example (download here), shown in Figure 10.2, demonstrates how to handle exact and near matches reported by the BinarySearch method. The Populate Array button populates an array with 10,000 random strings. The same strings are also displayed in a sorted ListBox control, so you can view them. The elements have the same order in both the array and the ListBox, so we can use the index reported by the BinarySearch method to locate and select instantly the same item in the ListBox.
Each of the 10,000 random strings has a random length of 3 to 15 characters. When you run the application, message boxes will pop up, displaying the time it took for each operation: how long it took to populate the array, how long it took to sort it, and how long it took to populate the ListBox. You might want to experiment with large arrays (100,000 elements or more) to get an idea of how efficiently VB 2008 handles arrays.
The Search Array button prompts the user for a string via the InputBox() function and then locates the string in the array by calling the BinarySearch method in the array. The result is either an exact or a near match, and it’s displayed in a message box. At the same time, the item reported by the BinarySearch method is also selected in the ListBox control.
Run the application, populate the ListBox control, and then click the Search Array button. Enter an existing string (you can use lowercase or uppercase characters; it doesn’t make a difference) and verify that the application reports an exact match and locates the item in the ListBox. The program appears to perform case-insensitive searches because all the strings stored in the array are in uppercase, and the search argument is also converted to uppercase before the BinarySearch method is called. Then enter a string that doesn’t exist in the list (or the beginning of an existing string) and see how the BinarySearch handles near matches.
The code behind the Search Array button calls the BinarySearch method and stores the integer returned by the method to the wordIndex variable. Then it examines the value of this variable. If wordIndex is positive, there was an exact match, and it’s reported. If wordIndex is negative, the program calculates the one’s complement of this value, which is the index of the nearest match. The element at this index is reported as a near match. Finally, regardless of the type of the match, the code selects the same item in the ListBox and makes it visible. Listing 10.2 is the code behind the Search Array button.
Listing 10.2: Locating Exact and Near Matches with BinarySearch
Private Sub bttnSearch_Click(ByVal sender As System.Object, _
ByVal e As System.EventArgs) Handles bttnSearch.Click
Dim srchWord As String ' the word to search for
Dim wordIndex As Integer ' the index of the word in the list (if found)
srchWord = InputBox("Enter word to search for").ToUpper
' perform a binary search to locate the word, because the list is sorted
wordIndex = System.Array.BinarySearch(words, srchWord)
If wordIndex >= 0 Then ' Zero or positive value indicates an exact match
ListBox1.TopIndex = wordIndex
ListBox1.SelectedIndex = wordIndex
MsgBox("An exact match was found for the word [" _
& words(wordIndex) & "] at index " & _
wordIndex.ToString, , "EXACT MATCH")
' a negative value indicates a near match (the first item after the location
' of the item we search for, if it were in the list
ListBox1.TopIndex = -wordIndex - 1
ListBox1.SelectedIndex = -wordIndex - 1
MsgBox("The nearest match is the word [" & _
words(-wordIndex - 1) & "] at index " & _
(-wordIndex - 1).ToString, , "NEAR MATCH")
Notice that all methods for sorting and searching arrays work with the base data types only. If the array contains custom data types, you must supply your own functions for comparing elements of this type, a process described in detail in the section ‘‘Custom Sorting,’’ later in this chapter.
The Binary Search Algorithm
The BinarySearch method uses a powerful search algorithm, the binary search algorithm, but it requires that the array be sorted. You need not care about the technical details of the implementation of a method, but in the case of the binary search algorithm, a basic understanding of how it works will help you understand how it performs near matches.
To locate an item in a sorted array, the BinarySearch method compares the search string to the array’s middle element. If the search string is smaller, we know that the element is in the first half of the array and we can safely ignore the second half. The same process is repeated with the remaining half of the elements. The search string is compared with the middle element of the reduced array, and after the comparison, we can ignore one-half of the reduced array. At each step, the binary search algorithm rejects one-half of the items left until it reduces the list to a single item. This is the item we’re searching for. If not, the item is not in the list. To search a list of 1,024 items, the binary search algorithm makes 10 comparisons. At the first step, it rejects 512 elements, then 256, then 128, and so on, until it reaches a single element. For an array of 1,024 ×1,024 (that’s a little more than a million) items, the algorithm makes 20 comparisons to locate the desired item.
If you apply the BinarySearch method to an array that hasn’t been sorted, the method will carry out all the steps and report that the item wasn’t found, even if the item belongs to the array. The algorithm doesn’t check the order of the elements; it just assumes that they’re sorted. The binary search algorithm always halves the number of elements in which it attempts to locate the search argument. That’s why you should never apply the BinarySearch method to an array that hasn’t been sorted yet.
To see what happens when you apply the BinarySearch method to an array that hasn’t been sorted, remove the statement that calls the Sort method in the ArraySearch sample application. The application will keep reporting near matches, even if the string you’re searching is present in the array. Of course, the near match reported by the BinarySearch method in an unsorted array isn’t close to the element you’re searching for — it’s just an element that happens to be there when the algorithm finishes.