reading-notes

Hash Tables

Readings

Notes

A hash table is a data structure that allows for efficient insertion, deletion, and lookup of elements. In Python, the built-in dict type is implemented as a hash table. The keys in a Python dict are hashed, and the corresponding values are stored in an array. The process of hashing is used to map the keys to an index in the array, allowing for constant-time O(1) access of the values.

In python hash table is implemented as dictionary and it uses Hashmap concept internally.

Hash tables are useful for many tasks, such as caching, indexing, and counting. For example, you can use a hash table to keep track of the frequency of words in a text file, or to implement a simple cache that stores the results of expensive computations.

In python, the built-in hash() function is used to generate a hash value for an object. However, user-defined objects need to implement the __hash__() method, which returns an integer representing the object’s hash value.

It’s important to note that hash tables have the potential to cause collisions when the same hash value is generated for different keys. To handle such collisions, python uses open addressing method called Linear Probing.

Hash Values

A hash value is a unique, fixed-size integer or binary value that is generated by a hash function. It is associated with a key and is used to quickly locate the corresponding value in a hash table.

A good hash function should have the following properties:

It should be easy to compute the hash value for any given key.
It should generate a unique hash value for each unique key.
It should generate the same hash value for the same key every time it is called.
It should distribute the keys evenly across the entire range of possible hash values.

However, due to the nature of hash functions, it is possible for two different keys to have the same hash value, known as a collision. This is why, in practice, there is always some probability of collisions occurring.

To handle collisions python uses open addressing method called Linear Probing. In this method, if a collision occurs, the next cell in the array is checked and if it is empty, the key-value pair is stored there. If not, the next cell is checked, and so on, until an empty cell is found. This method is efficient and simple, but it can lead to clustering, which causes some cells to be occupied while others remain empty.

Linear Probing

Linear Probing is a collision resolution technique used in hash tables. It is a simple and efficient method for handling collisions, where multiple keys are mapped to the same index in the hash table.

When a new key-value pair is added to the hash table, the hash function is used to determine the index in the array where the value should be stored. If the cell at that index is already occupied, the algorithm looks at the next cell in the array and checks if it is empty. If it is, the key-value pair is stored there. If not, the algorithm looks at the next cell, and so on, until an empty cell is found. This process is repeated until a spot is found for the new key-value pair.

When searching for a value in the hash table, the algorithm starts at the index generated by the hash function and looks for the key. If the key is not found, it looks at the next cell, and so on, until the key is found or an empty cell is encountered.

Linear Probing has the advantage of being easy to implement, and it does not require any additional memory to store pointers or other data structures. However, it can lead to clustering, which occurs when multiple keys are inserted into the same cluster of cells, leaving other cells empty. This can cause the search process to be slower, as the algorithm must look at more cells to find the key. To avoid this, it’s good to use a hash function that distributes the keys evenly across the range of possible hash values.

Another drawback of linear probing is that it’s not a good fit for real-time applications, since the searching process takes a variable amount of time, depending on how many keys are in the same cluster.

In summary, Linear Probing is a simple and efficient collision resolution technique that is easy to implement, but it can lead to clustering, which can slow down the search process and it’s not suitable for real-time applications.

Code Example

# Creating an empty hash table (dictionary)
my_dict = {}

# Adding key-value pairs to the dictionary
my_dict['key1'] = 'value1'
my_dict['key2'] = 'value2'

# Creating a dictionary with initial key-value pairs
my_dict = dict([('key1', 'value1'), ('key2', 'value2')])

# Creating a dictionary with a dictionary comprehension
my_dict = {'key1': 'value1', 'key2': 'value2'}

# Accessing the values stored in a dictionary
print(my_dict['key1']) # Output: 'value1'

# Checking if a key is in the dictionary
print('key1' in my_dict) # Output: True

# Deleting a key-value pair from the dictionary
del my_dict['key1']

# Using the built-in hash() function to generate a hash value for a key
key = 'some_string'
hash_value = hash(key)

# Using the user-defined object as a key
class MyKey:
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return hash(self.value)

key = MyKey('some_string')
hash_value = hash(key)

Linear Probing Example

class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [None] * self.size

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)
        if self.table[index] is None:
            self.table[index] = (key, value)
        else:
            # Linear probing
            i = 1
            while True:
                new_index = (index + i) % self.size
                if self.table[new_index] is None:
                    self.table[new_index] = (key, value)
                    break
                i += 1

    def search(self, key):
        index = self.hash_function(key)
        if self.table[index] is None:
            return None
        else:
            if self.table[index][0] == key:
                return self.table[index][1]
            else:
                i = 1
                while True:
                    new_index = (index + i) % self.size
                    if self.table[new_index] is None:
                        return None
                    if self.table[new_index][0] == key:
                        return self.table[new_index][1]
                    i += 1

Chaining Example

class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(self.size)]

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)
        self.table[index].append((key, value))

    def search(self, key):
        index = self.hash_function(key)
        for k, v in self.table[index]:
            if k == key:
                return v
        return None