In this section, you’ll look at the code for persisting a HashTable to disk. The code belongs to the WordFrequencies project, which was presented in the chapter “Storing Data in Collections.” The WordFrequencies application calculates word frequencies, stores the results in a HashTable, and persists them to a file between sessions. Figure 12.2 shows the interface of the WordFrequencies sample project.
This approach lets us process one document at a time yet accumulate the results over many documents. Each unique word is a key in the HashTable, and the word’s count is the corresponding item’s value. The application’s main menu, the Frequency Table menu, contains four commands, which save the HashTable to, and load it from, either a binary file or an XML text file. Table 12.1 shows the four commands of the menu.
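To make the key/value layout concrete, here is a minimal sketch of how a word count could be accumulated in a HashTable across several documents. The CountWords helper and the separator list are hypothetical and are not taken from the WordFrequencies project. Words are converted to uppercase only because the sample output later in this section shows uppercase keys.

```vbnet
Imports System.Collections

Module WordCountSketch
    ' Hypothetical helper: adds the words of one document to an existing table.
    Sub CountWords(ByVal text As String, ByVal frequencies As Hashtable)
        Dim separators() As Char = {" "c, ","c, "."c, ";"c, "!"c, "?"c, _
                                    ControlChars.Cr, ControlChars.Lf, ControlChars.Tab}
        Dim words() As String = text.ToUpper().Split(separators, StringSplitOptions.RemoveEmptyEntries)
        For Each word As String In words
            If frequencies.ContainsKey(word) Then
                ' The word is already a key: increment its count.
                frequencies(word) = CInt(frequencies(word)) + 1
            Else
                ' First occurrence: add the word with a count of 1.
                frequencies.Add(word, 1)
            End If
        Next
    End Sub

    Sub Main()
        Dim frequencies As New Hashtable()
        ' Two "documents" processed one at a time into the same table.
        CountWords("the cat and the hat", frequencies)
        CountWords("The end.", frequencies)
        Console.WriteLine(frequencies("THE"))   ' prints 3
    End Sub
End Module
```

Because the same table is passed to every call, the counts keep growing as more documents are processed, which is why the table is worth persisting between sessions.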
By the way, the Save XML and Load XML commands use a SoapFormatter, but the files they produce (or consume) are XML files, so I’ve chosen not to use the SOAP term in the menu commands. The code behind the Save Binary command is shown in Listing 12.4. The code is quite simple: It creates an instance of the BinaryFormatter class (variable Formatter) and uses its Serialize method to persist the entire HashTable with a single statement.
Listing 12.4: Persisting the HashTable to a Binary File
Private Sub SaveBin(...) Handles SaveBinary.Click
    Dim saveFile As FileStream
    SaveFileDialog1.DefaultExt = "BIN"
    If SaveFileDialog1.ShowDialog() = DialogResult.OK Then
        ' File.Create replaces an existing file. Appending to it would leave
        ' the old table at the start, where Deserialize reads it first.
        saveFile = File.Create(SaveFileDialog1.FileName)
        Dim Formatter As BinaryFormatter = New BinaryFormatter()
        ' Persist the entire HashTable with a single call
        Formatter.Serialize(saveFile, WordFrequencies)
        saveFile.Close()
    End If
End Sub
Table 12.1 – The Four Commands of the Frequency Table Menu
Command | Effect |
---|---|
Save Binary | Saves the HashTable to a binary file with the default extension BIN |
Load Binary | Loads the HashTable with data from a binary file |
Save XML | Saves the HashTable to a text file with the default extension XML |
Load XML | Loads the HashTable with data from a text file |
The equivalent Load Binary command is just as simple. It sets up a BinaryFormatter object and calls its Deserialize method to read the data. The code of the Save XML command (Listing 12.5) sets up a SoapFormatter object and uses its Serialize method to persist the HashTable. The code that reads the data from the file and populates the HashTable is equally simple, and it’s shown in Listing 12.6.
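The Load Binary code is not listed in this section. As a rough sketch of what it might look like, the following hypothetical LoadTable helper opens the file, deserializes it with a BinaryFormatter, and casts the result back to a HashTable. It assumes the .NET Framework, where BinaryFormatter is available; newer .NET versions mark the class as obsolete.

```vbnet
Imports System.Collections
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

Module LoadBinarySketch
    ' Hypothetical helper: reads a Hashtable that was written earlier
    ' with BinaryFormatter.Serialize.
    Function LoadTable(ByVal fileName As String) As Hashtable
        Dim readFile As FileStream = File.OpenRead(fileName)
        Try
            Dim formatter As New BinaryFormatter()
            ' Deserialize returns Object, so cast it to the expected type.
            Return CType(formatter.Deserialize(readFile), Hashtable)
        Finally
            ' Close the file even if deserialization fails.
            readFile.Close()
        End Try
    End Function
End Module
```

In the form, the handler of the Load Binary command could assign the result of such a helper to WordFrequencies once OpenFileDialog1.ShowDialog returns DialogResult.OK, mirroring Listing 12.6.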
Listing 12.5: Persisting the HashTable to a Text File
Private Sub SaveText(...) Handles SaveText.Click
    Dim saveFile As FileStream
    SaveFileDialog1.DefaultExt = "XML"
    If SaveFileDialog1.ShowDialog() = DialogResult.OK Then
        ' File.Create replaces an existing file. Appending to it would leave
        ' the old table at the start, where Deserialize reads it first.
        saveFile = File.Create(SaveFileDialog1.FileName)
        Dim Formatter As Soap.SoapFormatter = New Soap.SoapFormatter()
        ' Persist the entire HashTable as a SOAP (XML) document
        Formatter.Serialize(saveFile, WordFrequencies)
        saveFile.Close()
    End If
End Sub
Listing 12.6: Loading a HashTable from a Text File
Private Sub LoadText(...) Handles LoadText.Click
    Dim readFile As FileStream
    OpenFileDialog1.DefaultExt = "XML"
    If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
        readFile = File.OpenRead(OpenFileDialog1.FileName)
        Dim Formatter As Soap.SoapFormatter
        Formatter = New Soap.SoapFormatter()
        ' Deserialize returns Object, so cast it back to a HashTable
        WordFrequencies = CType(Formatter.Deserialize(readFile), HashTable)
        readFile.Close()
    End If
End Sub
You can open the binary file with a text editor, and you will see the words but not the numeric values, which are stored in binary format. If you open the text file, you will see a SOAP file with the words and their counts. The words are in the first half of the file, and their counts are in the second half. Here are the first few lines of this file (I omitted the headers):
<item id="ref-5" xsi:type="SOAP-ENC:string">A</item>
<item id="ref-6" xsi:type="SOAP-ENC:string">ABADDIRS</item>
<item id="ref-7" xsi:type="SOAP-ENC:string">ABANDON</item>
<item id="ref-8" xsi:type="SOAP-ENC:string">ABANDONED</item>
<item id="ref-9" xsi:type="SOAP-ENC:string">ABANDONING</item>
The corresponding counts are the following:
<item xsi:type="xsd:int">2064</item>
<item xsi:type="xsd:int">1</item>
<item xsi:type="xsd:int">5</item>
<item xsi:type="xsd:int">10</item>
<item xsi:type="xsd:int">2</item>
Most of us shouldn’t really care how the Serialize method stores the data in the file (SOAP or binary), as long as the Deserialize method can read the data back and load it into an object, so that we don’t have to write code to parse the file ourselves.
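The point can be checked without touching the disk. The following minimal sketch serializes a small HashTable to a MemoryStream and reads it straight back. The module name and the in-memory stream are assumptions made for illustration, and the two entries are borrowed from the sample output above. As with the listings, it assumes the .NET Framework, where BinaryFormatter is available.

```vbnet
Imports System.Collections
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

Module RoundTripSketch
    Sub Main()
        ' A small table standing in for WordFrequencies.
        Dim original As New Hashtable()
        original.Add("ABANDON", 5)
        original.Add("ABANDONED", 10)

        Dim formatter As New BinaryFormatter()
        Using memStream As New MemoryStream()
            ' Write the whole table to the stream in one call.
            formatter.Serialize(memStream, original)

            ' Rewind to the start; Deserialize reads from the current position.
            memStream.Position = 0

            ' Read the table back without parsing anything by hand.
            Dim restored As Hashtable = CType(formatter.Deserialize(memStream), Hashtable)
            Console.WriteLine(restored.Count)           ' prints 2
            Console.WriteLine(restored("ABANDONED"))    ' prints 10
        End Using
    End Sub
End Module
```

The same two calls work unchanged with a FileStream, which is all that the Save and Load commands of the sample application do.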