COMPUTER SCIENCE

Data Structures

Sep 14, 2023


Data structures are a means of organizing and storing information so that data in a computer can be accessed and used efficiently. They serve as the building blocks of many algorithms and as the framework for software development. The study of data structures covers the many types of structures available, their characteristics, and the computational problems they can be used to solve. Designing and implementing data structures requires a solid grasp of algorithms, computational complexity, and the trade-offs between time and space efficiency. Because data structures are central to most software systems, a thorough understanding of them is essential for computer scientists and software engineers.

➡️ Arrays

Arrays are a fundamental data structure, widely used in computer science and software engineering. An array is a collection of elements, each identified by an index or a key. Arrays provide fast access to individual elements and are useful for solving a wide range of computational problems.

Basic Properties of Arrays
  1. Indexing: Arrays are indexed, which means that elements can be accessed using a numerical index. The first element is usually indexed at position 0.

  2. Fixed Size: The size of a classic (static) array is fixed when it is created and cannot be changed; growable variants are discussed under dynamic arrays below.

  3. Contiguity: All the elements in an array are stored in contiguous memory locations, allowing for fast access times.

  4. Homogeneous: Arrays are homogeneous, which means that all elements must be of the same data type.

  5. Sequential Access: Arrays provide sequential access to elements, meaning that elements can be easily traversed in order.

Array Operations
  1. Accessing elements: Retrieving the value stored at a specific index.

  2. Searching elements: Finding the index of an element with a specific value.

  3. Insertion: Adding a new element to the array.

  4. Deletion: Removing an element from the array.

  5. Sorting: Arranging the elements of the array in a specific order.

  6. Merging: Combining two arrays into one.

  7. Reversal: Reversing the order of elements in the array.

  8. Resizing: Increasing or decreasing the size of the array.

  9. Copying: Making a copy of the array.
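In Python, the built-in list type is a dynamic array, and most of the operations above map directly onto it. A minimal sketch (the variable names are just illustrative):

```python
nums = [3, 1, 4, 1, 5]

value = nums[2]           # accessing: element at index 2
idx = nums.index(5)       # searching: first index holding the value 5
nums.append(9)            # insertion at the end
nums.remove(1)            # deletion: removes the first occurrence of 1
nums.sort()               # sorting in ascending order
merged = nums + [2, 6]    # merging two arrays into one
nums.reverse()            # reversal in place
copy_of_nums = nums[:]    # copying via a full slice
```

Because Python lists are dynamic arrays, append is amortized O(1), while insertion or deletion in the middle is O(n) since later elements must shift.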

What else to review?
  • Array traversal: Understanding how to traverse arrays, including sequential and random access, and the time and space complexity of these operations.

  • Array-based algorithms: Studying common algorithms that use arrays, such as sorting algorithms and search algorithms, and the time and space complexity of these algorithms.

  • Dynamic arrays: Understanding dynamic arrays, how they are implemented, and how they are used in real-world applications.

  • Multi-dimensional arrays: Studying multi-dimensional arrays, including two-dimensional arrays, and their applications in computer graphics and scientific computing.

  • Array memory management: Understanding the memory management aspects of arrays, such as how arrays are stored in memory, and how to avoid memory leaks and other memory-related issues.

These topics form the basic foundations of array data structures, and a good understanding of them will help you to design and implement efficient and effective algorithms that use arrays.

❓ Sample Problem:

Write a program which takes as input an array of digits encoding a nonnegative decimal integer D and updates the array to represent the integer D + 1. For example, if the input is (1,2,9) then you should update the array to (1,3,0). Your algorithm should work even if it is implemented in a language that has finite-precision arithmetic.

💡 Hint: Experiment with concrete examples.

🧩 Solution:

A brute-force approach might be to convert the array of digits to the equivalent integer, increment that, and then convert the resulting value back to an array of digits. For example, if the array is (1,2,9), we would derive the integer 129, add one to get 130, then extract its digits to form (1,3,0). When implemented in a language that imposes a limit on the range of values an integer type can take, this approach will fail on inputs that encode integers outside of that range.

We can avoid overflow issues by operating directly on the array of digits. Specifically, we mimic the grade-school algorithm for adding integers, which entails adding digits starting from the least significant digit and propagating carries. If the result has an additional digit, e.g., 99 + 1 = 100, there is not enough storage in the array for the result: we need three digits to represent 100, but the input has only two digits. For the given example, we would update 9 to 0 with a carry-out of 1, then update 2 to 3 (because of the carry-in). There is no carry-out, so we stop; the result is (1,3,0).
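The carry-propagation approach above can be sketched in a few lines of Python (the function name is ours, not from the original text):

```python
def plus_one(digits):
    """Increment the decimal number encoded by `digits`, in place."""
    digits[-1] += 1
    # Propagate carries from the least significant digit leftward.
    for i in reversed(range(1, len(digits))):
        if digits[i] != 10:
            break
        digits[i] = 0
        digits[i - 1] += 1
    if digits[0] == 10:
        # Carry out of the most significant digit, e.g. 99 + 1 = 100:
        # the result needs one more digit than the input had.
        digits[0] = 1
        digits.append(0)
    return digits
```

Each digit is touched at most once, so the time complexity is O(n) for n digits.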


🔗 Linked Lists

A linked list is a data structure consisting of a sequence of nodes, where each node contains a reference (i.e., a link) to an object and a reference to the next node in the sequence. The main advantage of linked lists over arrays is that elements can be efficiently inserted or removed without reallocating or reorganizing the entire structure, because a node can be added or removed in constant time. Since the elements are stored in nodes rather than in a contiguous block of memory as in arrays, this can also result in better use of memory. Linked lists can be used to implement various abstract data types, including stacks, queues, and associative arrays.
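A minimal node type makes the constant-time insertion claim concrete. A sketch, with illustrative names:

```python
class ListNode:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def insert_after(node, new_node):
    """Splice new_node in right after node: O(1), no shifting of elements."""
    new_node.next = node.next
    node.next = new_node

# Build the list 1 -> 3, then insert 2 between the two nodes.
head = ListNode(1, ListNode(3))
insert_after(head, ListNode(2))
```

Note that only two references change, regardless of how long the list is; an array would have to shift every later element.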

What else to review?
  • The structure of a linked list: including the concept of nodes and links, and how nodes are connected in a linked list.

  • The different types of linked lists: including singly linked lists, doubly linked lists, and circular linked lists.

  • The operations that can be performed on linked lists: including insertion, deletion, and traversal.

  • The advantages and disadvantages of linked lists: compared to arrays and other data structures, and how linked lists are used in various applications.

  • The implementation of linked lists in code: including how to create a linked list, how to insert and delete elements, and how to traverse a linked list.

  • The performance characteristics of linked lists: including time complexity, space complexity, and other factors that affect the efficiency of linked list operations.

  • Common variations of linked lists: such as skip lists and XOR linked lists.

It's also recommended to practice implementing linked lists in a programming language of your choice, and solving various problems that use linked lists.

❓ Sample Problem:

Write a program that takes two lists, assumed to be sorted, and returns their merge. The only field your program can change in a node is its next field. 

💡 Hint: Two sorted arrays can be merged using two indices. For lists, take care when one iterator reaches the end.

🧩 Solution:

A naive approach is to append the two lists together and sort the resulting list. The drawback of this approach is that it does not use the fact that the initial lists are sorted. The time complexity is that of sorting, which is O((n + m) log(n + m)), where n and m are the lengths of the two input lists.

A better approach, in terms of time complexity, is to traverse the two lists, always choosing the node containing the smaller key to continue traversing from.
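That traversal can be sketched as follows; the node class here is a minimal stand-in for whatever list type is actually in use:

```python
class ListNode:
    def __init__(self, data=0, next=None):
        self.data = data
        self.next = next

def merge_sorted_lists(l1, l2):
    # A dummy head avoids special-casing the first appended node.
    dummy = tail = ListNode()
    while l1 and l2:
        # Advance whichever list currently holds the smaller key.
        if l1.data <= l2.data:
            l1, tail.next = l1.next, l1
        else:
            l2, tail.next = l2.next, l2
        tail = tail.next
    # One list is exhausted; append the remainder of the other.
    tail.next = l1 or l2
    return dummy.next
```

Only next fields are changed, as the problem requires; no new nodes are allocated apart from the dummy head.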


The worst case, from a runtime perspective, occurs when the lists are of comparable length, so the time complexity is O(n + m). (In the best case, one list is much shorter than the other and all its entries appear at the beginning of the merged list.) Since we reuse the existing nodes, the space complexity is O(1).

🗄️ Stacks and Queues

A stack is a linear data structure that adheres to the Last In First Out (LIFO) principle: the last item added to the stack is the first one removed. It is frequently used to reverse the order of elements and to keep track of the order of function calls in a program. Stacks support simple operations such as push (adding an element to the top of the stack), pop (removing the element at the top of the stack), and peek (accessing the top element without removing it).

A queue is a linear data structure that follows the First In First Out (FIFO) principle, where the first item that was added to the queue is the first one to be removed. It is commonly used in applications such as scheduling tasks, handling incoming requests, or maintaining a list of items waiting to be processed. Queues have basic operations like enqueue (adding an element to the end of the queue), dequeue (removing an element from the front of the queue), and peek (accessing the front element without removing it).
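In Python, a plain list serves as a stack and collections.deque as a queue; a minimal sketch of the operations just described:

```python
from collections import deque

# Stack: push, pop, peek (LIFO).
stack = []
stack.append('a')            # push
stack.append('b')            # push
top = stack[-1]              # peek: top element, not removed
popped = stack.pop()         # pop: removes and returns the top

# Queue: enqueue, dequeue, peek (FIFO).
queue = deque()
queue.append('a')            # enqueue at the back
queue.append('b')            # enqueue at the back
front = queue[0]             # peek: front element, not removed
dequeued = queue.popleft()   # dequeue: removes and returns the front
```

deque is used for the queue because popping from the front of a plain list is O(n), while deque.popleft() is O(1).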

What else to review?
  • The Last-In-First-Out (LIFO) property of Stacks

  • The First-In-First-Out (FIFO) property of Queues

  • The basic operations of Stacks, such as push, pop, and peek

  • The basic operations of Queues, such as enqueue, dequeue, and peek

  • Use cases of Stacks and Queues, such as backtracking, function calls and history management

  • Implementing Stacks and Queues using linked lists, including the advantages and disadvantages of linked list implementation

❓ Sample Problem: 

Queue insertion and deletion follow first-in, first-out semantics; stack insertion and deletion follow last-in, first-out semantics.

How would you implement a queue given a library implementing stacks?

💡 Hint: It's impossible to solve this problem with a single stack.

🧩 Solution:

A straightforward implementation is to enqueue by pushing the element to be enqueued onto one stack. The element to be dequeued is then the element at the bottom of this stack, which can be achieved by first popping all its elements and pushing them to another stack, then popping the top of the second stack (which was the bottom-most element of the first stack), and finally popping the remaining elements back to the first stack.

The primary problem with this approach is that every dequeue takes two pushes and two pops of every element, i.e., dequeue has O(n) time complexity, where n is the number of stored elements. (Enqueue takes O(1) time.)

The intuition for improving the time complexity of dequeue is that after we move elements from the first stack to the second stack, any further dequeues are trivial, until the second stack is empty. This is true even if we need to enqueue, as long as we enqueue onto the first stack. When the second stack becomes empty, and we need to perform a dequeue, we simply repeat the process of transferring from the first stack to the second stack. In essence, we are using the first stack for enqueue and the second for dequeue.
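The two-stack scheme can be sketched as follows (the class and method names are ours). Amortized over a sequence of operations, each element is pushed and popped at most twice, so enqueue and dequeue are O(1) amortized:

```python
class QueueFromStacks:
    def __init__(self):
        # Two stacks, modeled as Python lists.
        self._enq, self._deq = [], []

    def enqueue(self, x):
        self._enq.append(x)

    def dequeue(self):
        if not self._deq:
            # Transferring reverses the order, exposing the oldest element.
            while self._enq:
                self._deq.append(self._enq.pop())
        return self._deq.pop()
```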



🌳 Trees

A tree is a data structure made up of a set of nodes connected by edges. There is one unique node, called the root, from which every other node descends, and each node may have zero or more children. Trees are frequently used in search and sorting algorithms, as well as to represent hierarchical relationships such as those found in computer file systems. Binary trees, balanced trees, and B-trees are a few of the many kinds of trees.

When reviewing trees, it is important to understand a tree's features, such as its height and depth, its main varieties, such as binary and balanced trees, and its traversal methods. It is also critical to understand how fundamental tree operations like insertion, deletion, and searching affect the structure of the tree. You should additionally be familiar with common tree-building and tree-manipulation techniques, such as heap construction and tree sort.

Things you need to know about Trees
  1. Definition: A tree is a data structure that consists of nodes connected by edges. It has a root node, leaf nodes, and parent-child relationships between nodes.

  2. Hierarchical structure: Trees are used to represent hierarchical relationships between elements, where the root node represents the top level and the leaf nodes represent the lowest level.

  3. Tree traversal: Trees can be traversed in different ways, including in-order, pre-order, and post-order. The method of traversal affects the order in which nodes are visited.

  4. Binary trees: A binary tree is a type of tree where each node has at most two children. Binary trees are widely used for various purposes, such as searching, sorting, and representation of hierarchical relationships.

  5. Balancing trees: To ensure efficient operations, trees may be balanced. There are different algorithms for balancing trees, including AVL trees and Red-Black trees.

  6. Applications: Trees are widely used in various applications, such as representation of hierarchical relationships in file systems, searching and sorting, and in database indexing.
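The three traversal orders mentioned above differ only in when a node is visited relative to its subtrees. A minimal sketch (node class and function names are illustrative):

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def inorder(node):
    # Left subtree, then the node itself, then the right subtree.
    return inorder(node.left) + [node.data] + inorder(node.right) if node else []

def preorder(node):
    # The node itself first, then the left and right subtrees.
    return [node.data] + preorder(node.left) + preorder(node.right) if node else []

def postorder(node):
    # Both subtrees first, then the node itself.
    return postorder(node.left) + postorder(node.right) + [node.data] if node else []
```

For a binary search tree, an in-order traversal visits the keys in sorted order.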

Types of Trees
  1. Binary Tree: a tree data structure in which each node can have up to two children

  2. Binary Search Tree: a special type of binary tree that enforces a particular ordering of its elements

  3. AVL Tree: a self-balancing binary search tree

  4. Trie: a tree-like data structure used to store a collection of strings

  5. Heap: a complete binary tree that satisfies the heap property, which specifies that the parent node is either greater than or equal to, or less than or equal to, its children

  6. B-Tree: a balanced tree data structure used in databases and file systems

  7. Red-Black Tree: a self-balancing binary search tree.

❓ Sample Problem: 

The exterior of a binary tree is the following sequence of nodes: the nodes from the root to the leftmost leaf, followed by the leaves in left-to-right order, followed by the nodes from the rightmost leaf to the root. (By leftmost (rightmost) leaf, we mean the leaf that appears first (last) in an in-order traversal.) For example, the exterior of the binary tree in the figure is (A, B, C, D, E, H, M, N, P, O, I).

Write a program that computes the exterior of a binary tree. 

💡 Hint: Handle the root's left child and right child in mirror fashion.

🧩 Solution: 

A brute-force approach is to use a case analysis. We need the nodes on the path from the root to the leftmost leaf, the nodes on the path from the root to the rightmost leaf, and the leaves in left-to-right order.

Compute the leaves in left-to-right order. The path from the root to the leftmost leaf is computed by going left if a left child exists, and otherwise going right. When we reach a leaf, it must be the leftmost leaf. A similar computation yields the nodes on the path from the root to the rightmost leaf. The time complexity is O(h + n + h) = O(n), where n and h are the number of nodes and the height of the tree, respectively. The implementation is a little tricky, because some nodes appear in multiple sequences. For example, in the figure, the path from the root to the leftmost leaf is (A,B,C,D), the leaves in left-to-right order are (D,E,H,M,N,P), and the path from the root to the rightmost leaf is (A,I,O,P). Note that the leftmost leaf, D, the rightmost leaf, P, and the root, A, each appear in two sequences.

We can simplify the above approach by computing the nodes on the path from the root to the leftmost leaf and the leaves in the left subtree in one traversal. After that, we find the leaves in the right subtree followed by the nodes from the rightmost leaf to the root with another traversal. For the tree in the figure, the first traversal returns (B,C,D,E,H), and the second traversal returns (M,N,P,O,I). We append the first and then the second sequences to (A).

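A Python sketch of this two-traversal approach (the node class and helper names are ours, not from the original program):

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def exterior_binary_tree(root):
    def is_leaf(node):
        return not node.left and not node.right

    def left_boundary_and_leaves(node, is_boundary):
        # Nodes on the path to the leftmost leaf, plus leaves of this subtree.
        if node is None:
            return []
        result = [node] if is_boundary or is_leaf(node) else []
        result += left_boundary_and_leaves(node.left, is_boundary)
        result += left_boundary_and_leaves(node.right,
                                           is_boundary and node.left is None)
        return result

    def right_boundary_and_leaves(node, is_boundary):
        # Mirror image: leaves first, then the path from the rightmost leaf up.
        if node is None:
            return []
        result = right_boundary_and_leaves(node.left,
                                           is_boundary and node.right is None)
        result += right_boundary_and_leaves(node.right, is_boundary)
        result += [node] if is_boundary or is_leaf(node) else []
        return result

    if root is None:
        return []
    return ([root] + left_boundary_and_leaves(root.left, True)
            + right_boundary_and_leaves(root.right, True))
```

Each node is visited once, so the time complexity is O(n).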
📉 Graphs

Graphs are a data structure that represent a set of objects and their relationships. A graph consists of vertices (also known as nodes) and edges, which connect the vertices. The edges can be either directed or undirected, depending on whether the relationship between the two vertices is one-way or two-way. Graphs are commonly used to model and analyze complex systems, such as social networks, transportation systems, and electrical circuits. Different types of graphs include directed acyclic graphs (DAGs), trees, bipartite graphs, and weighted graphs, among others.
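A common in-memory representation is the adjacency list, which maps each vertex to the vertices it has edges to. A minimal sketch for a small directed graph (the labels are illustrative):

```python
# Directed graph as an adjacency list: vertex -> list of neighbors.
graph = {
    'a': ['b', 'c'],
    'b': ['c'],
    'c': [],
}

def edges(g):
    """Enumerate all directed edges (u, v) of the graph."""
    return [(u, v) for u, neighbors in g.items() for v in neighbors]
```

An adjacency list uses O(V + E) space, which suits sparse graphs; an adjacency matrix uses O(V^2) space but answers "is there an edge (u, v)?" in O(1).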

Types of Graphs

In computer science, there are several types of graphs, including:

  1. Undirected Graph: In an undirected graph, edges have no direction and can be traversed in both directions.

  2. Directed Graph: In a directed graph, edges have a specific direction and can only be traversed in the direction specified.

  3. Weighted Graph: In a weighted graph, each edge is assigned a weight or cost that represents the cost of traversing that edge.

  4. Cyclic Graph: In a cyclic graph, there is at least one cycle, which is a closed path that starts and ends at the same vertex.

  5. Acyclic Graph: In an acyclic graph, there are no cycles, meaning that it is not possible to traverse a path that starts and ends at the same vertex.

  6. Simple Graph: In a simple graph, there are no self-loops and no multiple edges between two vertices.

  7. Complete Graph: In a complete graph, there is an edge between every pair of vertices.

  8. Bipartite Graph: In a bipartite graph, vertices can be divided into two disjoint sets, and there are no edges between vertices in the same set.

What else to review?
  • Graph representations - adjacency matrix and adjacency list

  • Graph traversal algorithms - breadth-first search (BFS) and depth-first search (DFS)

  • Shortest path algorithms - Dijkstra's algorithm and Bellman-Ford algorithm

  • Maximum flow algorithms - Ford-Fulkerson algorithm and Edmonds-Karp algorithm

  • Minimum Spanning Tree (MST) algorithms - Prim's algorithm and Kruskal's algorithm

  • Applications of graphs - Network design, Graph theory in computer science, Image processing

❓ Sample Problem:

Consider a vertex type for a directed graph in which there are two fields: an integer label and a list of references to other vertices. Design an algorithm that takes a reference to a vertex u, and creates a copy of the graph on the vertices reachable from u. Return the copy of u.

💡 Hint: Maintain a map from vertices in the original graph to their counterparts in the clone.

🧩 Solution:

We traverse the graph starting from u. Each time we encounter a vertex or an edge that is not yet in the clone, we add it to the clone. We recognize new vertices by maintaining a hash table mapping vertices in the original graph to their counterparts in the new graph. Any standard graph traversal algorithm works; breadth-first search is a natural choice.
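A breadth-first-search sketch of this idea in Python (the vertex class is a minimal stand-in for the vertex type described in the problem):

```python
from collections import deque

class GraphVertex:
    def __init__(self, label):
        self.label = label
        self.edges = []

def clone_graph(graph):
    if graph is None:
        return None
    # Map each original vertex to its counterpart in the clone.
    vertex_map = {graph: GraphVertex(graph.label)}
    queue = deque([graph])
    while queue:
        v = queue.popleft()
        for e in v.edges:
            if e not in vertex_map:
                # First time we see this vertex: clone it and schedule it.
                vertex_map[e] = GraphVertex(e.label)
                queue.append(e)
            # Copy the edge v -> e into the clone.
            vertex_map[v].edges.append(vertex_map[e])
    return vertex_map[graph]
```

The map guarantees each reachable vertex is cloned exactly once, so cycles and shared vertices are handled correctly; the time and space complexity are O(V + E) over the reachable subgraph.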

📅 Hash Tables

Hash tables are a data structure that stores key-value pairs, where each key maps to a specific value. They use a hash function to calculate an index into an array, where the desired value can be found or stored. Hash tables are commonly used for quick access, insertion and deletion operations, making them well suited for implementing dictionaries, sets, and many other data structures. They have an average time complexity of O(1) for these operations, making them efficient in many cases. However, they can also be subject to hash collisions, where multiple keys map to the same array index, which can affect the performance of the hash table.
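A toy hash table with separate chaining makes the mechanics concrete: the hash function picks a bucket, and colliding keys simply share that bucket's chain. This is a sketch with a fixed number of buckets and no resizing, not production code:

```python
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self._buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function maps a key to a bucket index.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collisions just extend the chain

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```

With a good hash function and a bounded load factor, the chains stay short and put/get run in O(1) on average; in the worst case (all keys in one bucket) they degrade to O(n).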

What else to Review?

When reviewing hash tables, you should consider the following topics:

  • Hash function: How the keys are hashed into indices in the table and how collisions are handled.

  • Load factor: The relationship between the number of elements in the hash table and the size of the table.

  • Resizing: How the size of the hash table is adjusted as the number of elements changes.

  • Time complexity: The average and worst-case time complexity of hash table operations such as insertion, deletion, and search.

  • Space complexity: The amount of memory used by the hash table, including the memory for storing keys and values and any data structures used for collision handling.

  • Collision resolution: Techniques for handling collisions, including open addressing and chaining, and their trade-offs.

  • Applications: Examples of real-world scenarios where hash tables are used, such as caching, symbol tables, and indexing.

❓ Sample Problem: 

Write a program which takes as input a set of integers represented by an array, and returns the size of a largest subset of integers in the array having the property that if two integers are in the subset, then so are all integers between them. For example, if the input is (3,-2,7,9,8,1,2,0,-1,5,8), the largest such subset is {-2,-1,0,1,2,3}, so you should return 6.

💡 Hint: Do you really need a total ordering on the input? 

🧩 Solution:

The brute-force algorithm is to sort the array and then iterate through it, recording for each entry the largest subset with the desired property ending at that entry.

On closer inspection we see that sorting is not essential to the functioning of the algorithm. We do not need the total ordering: all we care about is whether the integers adjacent to a given value are present. This suggests using a hash table to store the entries. Now we iterate over the entries in the array. If an entry e is present in the hash table, we compute the largest interval including e such that all values in the interval are contained in the hash table. We do this by iteratively searching the hash table for entries of the form e+1, e+2, ..., and e-1, e-2, .... When we are done, to avoid duplicated computation we remove all the entries in the computed interval from the hash table, since all these entries are in the same largest contained interval.

As a concrete example, consider A = (10,5,3,11,6,100,4). We initialize the hash table to {6,10,3,11,5,100,4}. The first entry in A is 10, and we find the largest interval contained in A including 10 by expanding from 10 in each direction by doing lookups in the hash table. The largest set is {10,11} and is of size 2. This computation updates the hash table to {6,3,5,100,4}. The next entry in A is 5. Since it is contained in the hash table, we know that the largest interval contained in A including 5 has not been computed yet. Expanding from 5, we see that 3, 4, and 6 are all in the hash table, while 2 and 7 are not, so the largest set containing 5 is {3,4,5,6}, which is of size 4. We update the hash table to {100}. The three entries after 5, namely 3, 11, and 6, are not present in the hash table, so we know we have already computed the longest intervals in A containing each of these. Then we get to 100, which cannot be extended, so the largest set containing it is {100}, which is of size 1. We update the hash table to {}. Since 4 is not in the hash table, we can skip it. The largest of the three sets is {3,4,5,6}, i.e., the size of the largest contained interval is 4.

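A Python sketch of this approach (the function name is ours); a set serves as the hash table, and popping an arbitrary element drives the outer loop:

```python
def longest_contained_range(A):
    unprocessed = set(A)
    max_size = 0
    while unprocessed:
        a = unprocessed.pop()
        # Expand downward and upward from a, removing each found
        # entry so no interval is ever computed twice.
        lower = a - 1
        while lower in unprocessed:
            unprocessed.remove(lower)
            lower -= 1
        upper = a + 1
        while upper in unprocessed:
            unprocessed.remove(upper)
            upper += 1
        max_size = max(max_size, upper - lower - 1)
    return max_size
```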

The time complexity of this approach is O(n), where n is the array length, since we add and remove array elements in the hash table no more than once.

Reference: "Elements of Programming Interviews in Python" by A. Aziz, T.-H. Lee, and A. Prakash

Video: Data Structures: Crash Course Computer Science #14

In order to make data in a computer easier to access and utilize, data structures are a means to organize and store information. They serve as the fundamental units of many computer algorithms and as the framework for software creation. Understanding the numerous types of data structures that are accessible, their characteristics, and the various computational challenges they may be used to address are all part of the study of data structures. Data structures must be designed and implemented with a thorough understanding of algorithms, computational complexity, and the trade-offs between time and space efficiency. Computer scientists and software engineers must have a thorough understanding of data structures because they are crucial to the development of many software systems.

➡️ Arrays

Arrays are a fundamental data structure that is widely used in computer science and software engineering. They are a collection of elements, each identified by an index or a key. Arrays provide fast access to individual elements, and are useful for solving a wide range of computational problems.

Basic Properties of Arrays
  1. Indexing: Arrays are indexed, which means that elements can be accessed using a numerical index. The first element is usually indexed at position 0.

  2. Fixed Size: The size of an array is fixed and cannot be changed dynamically.

  3. Contiguity: All the elements in an array are stored in contiguous memory locations, allowing for fast access times.

  4. Homogeneous: Arrays are homogeneous, which means that all elements must be of the same data type.

  5. Sequential Access: Arrays provide sequential access to elements, meaning that elements can be easily traversed in order.

Arrays Operations
  1. Accessing elements: Retrieving the value stored at a specific index.

  2. Searching elements: Finding the index of an element with a specific value.

  3. Insertion: Adding a new element to the array.

  4. Deletion: Removing an element from the array.

  5. Sorting: Arranging the elements of the array in a specific order.

  6. Merging: Combining two arrays into one.

  7. Reversal: Reversing the order of elements in the array.

  8. Resizing: Increasing or decreasing the size of the array.

  9. Copying: Making a copy of the array.

What else to review?
  • Array traversal: Understanding how to traverse arrays, including sequential and random access, and the time and space complexity of these operations.

  • Array-based algorithms: Studying common algorithms that use arrays, such as sorting algorithms and search algorithms, and the time and space complexity of these algorithms.

  • Dynamic arrays: Understanding dynamic arrays, how they are implemented, and how they are used in real-world applications.

  • Multi-dimensional arrays: Studying multi-dimensional arrays, including two-dimensional arrays, and their applications in computer graphics and scientific computing.

  • Array memory management: Understanding the memory management aspects of arrays, such as how arrays are stored in memory, and how to avoid memory leaks and other memory-related issues.

These topics form the basic foundations of array data structures, and a good understanding of them will help you to design and implement efficient and effective algorithms that use arrays.

❓ Sample Problem:

Write a program which takes as input an array of digits encoding a nonnegative decimal integer D and updates the array to represent the integer D + 1. For example, if the input is (7,2,9) then you should update the array to (1,3,0). Your algorithm should work even if it is implemented in a language that has finite-precision arithmetic.

💡 Hint: Experiment with concrete examples.

🧩 Solution:

A brute-force approach might be to convert the array of digits to the equivalent integer, increment that, and then convert the resulting value back to an array of digits. For example, if the array is <1.,2,9>, we would derive the integer 129, add one to get 130, then extract its digits to form (1,3,0). When implemented in a language that imposes a limit on the range of values an integer type can take, this approach will fail on inputs that encode integers outside of that range.

We can avoid overflow issues by operating directly on the array of digits. Specifically, we mimic the grade-school algorithm for adding integers, which entails adding digits starting from the least significant digit, and propagate carries. If the result has an additional digit, e.g., 99 + 1. = 100, there is not enough storage in the array for the result-we need three digits to represent 100, but the input has only two digits. For the given example, we would update 9 to 0 with a carry-out of 1. We update 2 to 3 (because of the carry-in). There is no carry-out, so we stop-the result is (1,3,0).


🔗 Linked Lists

A linked list is a data structure consisting of a sequence of nodes, where each node contains a reference (i.e., a link) to an object and a reference to the next node in the sequence. The main advantage of linked lists over arrays is that the elements can be efficiently inserted or removed without reallocation or reorganization of the entire structure because a node can be added or removed in constant time. The elements are stored in nodes rather than in a contiguous block of memory like arrays, which can result in better use of memory. Linked lists can be used to implement various abstract data types, including stacks, queues, and associative arrays.

What else to review?
  • The structure of a linked list: including the concept of nodes and links, and how nodes are connected in a linked list.

  • The different types of linked lists: including singly linked lists, doubly linked lists, and circular linked lists.

  • The operations that can be performed on linked lists: including insertion, deletion, and traversal.

  • The advantages and disadvantages of linked lists: compared to arrays and other data structures, and how linked lists are used in various applications.

  • The implementation of linked lists in code: including how to create a linked list, how to insert and delete elements, and how to traverse a linked list.

  • The performance characteristics of linked lists: including time complexity, space complexity, and other factors that affect the efficiency of linked list operations.

  • Common variations of linked lists: such as skip lists and XOR linked lists.

It's also recommended to practice implementing linked lists in a programming language of your choice, and solving various problems that use linked lists.

❓ Sample Problem:

Write a program that takes two lists, assumed to be sorted, and returns their merge. The only field your program can change in a node is its next field. 

💡 Hint: Two sorted arrays can be merged using two indices. For lists, take care when one iterator reaches the end.

🧩 Solution:

A naive approach is to append the two lists together and sort the resulting list. The drawback of this approach is that it does not use the fact that the initial lists are sorted. The time complexity is that of sorting, which is O((n + m)log(n + rz)), where n and m are the lengths of each of the two input lists.

A better approach, in terms of time complexity, is to traverse the two lists, always choosing the node containing the smaller key to continue traversing from.


The worst-case, from a runtime perspective, corresponds to the case when the lists are of comparable length, so the time complexity is O(n + m). (In the best-case, one list is much shorter than the other and all its entries appear at the beginning of the merged list.) Since we reuse the existing nodes, the space complexity is O(1).

🗄️ Stacks and Queues

The last item added to the stack is the first thing removed in a stack, which is a linear data structure that adheres to the Last In First Out (LIFO) concept. It is frequently used to reverse the order of the elements in a program and to maintain track of the order of function calls. Stacks can be operated on using simple operations like push, pop, and peek. Push adds an element to the top of the stack, pop removes an element from the top of the stack (accessing the top element without removing it).

A queue is a linear data structure that follows the First In First Out (FIFO) principle, where the first item that was added to the queue is the first one to be removed. It is commonly used in applications such as scheduling tasks, handling incoming requests, or maintaining a list of items waiting to be processed. Queues have basic operations like enqueue (adding an element to the end of the queue), dequeue (removing an element from the front of the queue), and peek (accessing the front element without removing it).

What else to review?
  • The Last-In-First-Out (LIFO) property of Stacks

  • The First-In-First-Out (FIFO) property of Queues

  • The basic operations of Stacks, such as push, pop, and peek

  • The basic operations of Queues, such as enqueue, dequeue, and peek

  • Use cases of Stacks and Queues, such as backtracking, function calls and history management

  • Implementing Stacks and Queues using linked lists, including the advantages and disadvantages of linked list implementation

❓ Sample Problem: 

Queue insertion and deletion follow first-in, first-out semantics; stack insertion and deletion are last-in, first-out.

How would you implement a queue given a library implementing stacks?

💡 Hint: It's impossible to solve this problem with a single stack.

🧩 Solution:

A straightforward implementation is to enqueue by pushing the element to be enqueued onto one stack. The element to be dequeued is then the element at the bottom of this stack, which can be achieved by first popping all its elements and pushing them to another stack, then popping the top of the second stack (which was the bottom-most element of the first stack), and finally popping the remaining elements back to the first stack.

The primary problem with this approach is that every dequeue takes two pushes and two pops of every element, i.e., dequeue has O(n) time complexity, where n is the number of stored elements. (Enqueue takes O(1) time.)

The intuition for improving the time complexity of dequeue is that after we move elements from the first stack to the second stack, any further dequeues are trivial, until the second stack is empty. This is true even if we need to enqueue, as long as we enqueue onto the first stack. When the second stack becomes empty, and we need to perform a dequeue, we simply repeat the process of transferring from the first stack to the second stack. In essence, we are using the first stack for enqueue and the second for dequeue.
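A sketch of this two-stack queue in Python (the class and method names are our own, and a plain list models each stack):

```python
class QueueFromStacks:
    """Queue built from two stacks: one receives enqueues, the other
    serves dequeues and is refilled only when it runs empty."""

    def __init__(self):
        self._enq = []   # stack for enqueue
        self._deq = []   # stack for dequeue

    def enqueue(self, x):
        self._enq.append(x)

    def dequeue(self):
        if not self._deq:
            # Transferring reverses the order, so the oldest element
            # ends up on top of the dequeue stack.
            while self._enq:
                self._deq.append(self._enq.pop())
        return self._deq.pop()   # raises IndexError if the queue is empty
```

Each element is pushed and popped at most twice over its lifetime, so dequeue is O(1) amortized even though an individual call may take O(n).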



🌳 Trees

A tree is a data structure made up of a set of nodes connected by edges. There is one unique node, called the root, that is an ancestor of all other nodes, and each node may have zero or more children. Trees are frequently used in search and sorting algorithms, as well as to represent hierarchical relationships such as those found in computer file systems. Binary trees, balanced trees, and B-trees are a few of the many kinds of trees.

When reviewing trees, it is crucial to understand a tree's properties, such as its height and depth, its many varieties, such as balanced and binary trees, and its various traversal methods. Additionally, it's important to understand how fundamental tree operations like insertion, deletion, and searching affect the structure of the tree. You should also be familiar with tree-building and tree-manipulation techniques, such as heap construction and tree sort.

Things you need to know about Trees
  1. Definition: A tree is a data structure that consists of nodes connected by edges. It has a root node, leaf nodes, and parent-child relationships between nodes.

  2. Hierarchical structure: Trees are used to represent hierarchical relationships between elements, where the root node represents the top level and the leaf nodes represent the lowest level.

  3. Tree traversal: Trees can be traversed in different ways, including in-order, pre-order, and post-order. The method of traversal affects the order in which nodes are visited.

  4. Binary trees: A binary tree is a type of tree where each node has at most two children. Binary trees are widely used for various purposes, such as searching, sorting, and representation of hierarchical relationships.

  5. Balancing trees: To ensure efficient operations, trees may be balanced. There are different algorithms for balancing trees, including AVL trees and Red-Black trees.

  6. Applications: Trees are widely used in various applications, such as representation of hierarchical relationships in file systems, searching and sorting, and in database indexing.
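The three traversal orders from point 3 can be sketched recursively in Python (the Node class here is a minimal assumption, not a particular library's type):

```python
class Node:
    """Minimal binary tree node for illustration."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def inorder(node):
    # left subtree, node, right subtree
    return inorder(node.left) + [node.data] + inorder(node.right) if node else []

def preorder(node):
    # node, left subtree, right subtree
    return [node.data] + preorder(node.left) + preorder(node.right) if node else []

def postorder(node):
    # left subtree, right subtree, node
    return postorder(node.left) + postorder(node.right) + [node.data] if node else []
```

For a binary search tree, an in-order traversal visits the keys in sorted order, which is one reason the choice of traversal matters.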

Types of Trees
  1. Binary Tree: a tree data structure in which each node can have up to two children

  2. Binary Search Tree: a special type of binary tree that enforces a particular ordering of its elements

  3. AVL Tree: a self-balancing binary search tree

  4. Trie: a tree-like data structure used to store a collection of strings

  5. Heap: a complete binary tree that satisfies the heap property, which specifies that the parent node is either greater than or equal to, or less than or equal to, its children

  6. B-Tree: a balanced tree data structure used in databases and file systems

  7. Red-Black Tree: a self-balancing binary search tree.

❓ Sample Problem: 

The exterior of a binary tree is the following sequence of nodes: the nodes from the root to the leftmost leaf, followed by the leaves in left-to-right order, followed by the nodes from the rightmost leaf to the root. (By leftmost (rightmost) leaf, we mean the leaf that appears first (last) in an inorder traversal.) For example, the exterior of the binary tree in the figure below is <A, B, C, D, E, H, M, N, P, O, I>.

Write a program that computes the exterior of a binary tree. 

💡 Hint: Handle the root's left child and right child in mirror fashion.

🧩 Solution: 

A brute-force approach is to use a case analysis. We need the nodes on the path from the root to the leftmost leaf, the nodes on the path from the root to the rightmost leaf, and the leaves in left-to-right order.

Compute the leaves in left-to-right order. The path from the root to the leftmost leaf is computed by going left if a left child exists, and otherwise going right. When we reach a leaf, it must be the leftmost leaf. A similar computation yields the nodes on the path from the root to the rightmost leaf. The time complexity is O(h + n + h) = O(n), where n and h are the number of nodes and the height of the tree, respectively. The implementation is a little tricky, because some nodes appear in multiple sequences. For example, in the figure above, the path from the root to the leftmost leaf is <A,B,C,D>, the leaves in left-to-right order are <D,E,H,M,N,P>, and the path from the root to the rightmost leaf is <A,I,O,P>. Note that the leftmost leaf, D, the rightmost leaf, P, and the root, A, each appear in two sequences.

We can simplify the above approach by computing the nodes on the path from the root to the leftmost leaf and the leaves in the left subtree in one traversal. After that, we find the leaves in the right subtree followed by the nodes from the rightmost leaf to the root with another traversal. This is the program shown below. For the tree in the figure above, the first traversal returns <B,C,D,E,H>, and the second traversal returns <M,N,P,O,I>. We append the first and then the second sequence to <A>.

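A Python sketch of this two-traversal program (the TreeNode class is a minimal assumption about the node type):

```python
class TreeNode:
    """Minimal binary tree node for illustration."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def exterior_binary_tree(tree):
    def is_leaf(node):
        return not node.left and not node.right

    # Left boundary down to the leftmost leaf, plus the left subtree's leaves.
    def left_boundary_and_leaves(subtree, is_boundary):
        result = []
        if subtree:
            if is_boundary or is_leaf(subtree):
                result.append(subtree)
            result += left_boundary_and_leaves(subtree.left, is_boundary)
            result += left_boundary_and_leaves(
                subtree.right, is_boundary and not subtree.left)
        return result

    # Mirror image: right subtree's leaves, then the right boundary bottom-up.
    def right_boundary_and_leaves(subtree, is_boundary):
        result = []
        if subtree:
            result += right_boundary_and_leaves(
                subtree.left, is_boundary and not subtree.right)
            result += right_boundary_and_leaves(subtree.right, is_boundary)
            if is_boundary or is_leaf(subtree):
                result.append(subtree)
        return result

    return ([tree] + left_boundary_and_leaves(tree.left, True) +
            right_boundary_and_leaves(tree.right, True)) if tree else []
```

The mirror symmetry between the two helpers is exactly the hint: the left helper emits a node before recursing, the right helper after, which reverses the order of the right boundary.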
📉 Graphs

Graphs are a data structure that represent a set of objects and their relationships. A graph consists of vertices (also known as nodes) and edges, which connect the vertices. The edges can be either directed or undirected, depending on whether the relationship between the two vertices is one-way or two-way. Graphs are commonly used to model and analyze complex systems, such as social networks, transportation systems, and electrical circuits. Different types of graphs include directed acyclic graphs (DAGs), trees, bipartite graphs, and weighted graphs, among others.

Types of Graphs

In computer science, there are several types of graphs, including:

  1. Undirected Graph: In an undirected graph, edges have no direction and can be traversed in both directions.

  2. Directed Graph: In a directed graph, edges have a specific direction and can only be traversed in the direction specified.

  3. Weighted Graph: In a weighted graph, each edge is assigned a weight or cost that represents the cost of traversing that edge.

  4. Cyclic Graph: In a cyclic graph, there is at least one cycle, which is a closed path that starts and ends at the same vertex.

  5. Acyclic Graph: In an acyclic graph, there are no cycles, meaning that it is not possible to traverse a path that starts and ends at the same vertex.

  6. Simple Graph: In a simple graph, there are no self-loops and no multiple edges between two vertices.

  7. Complete Graph: In a complete graph, there is an edge between every pair of vertices.

  8. Bipartite Graph: In a bipartite graph, vertices can be divided into two disjoint sets, and there are no edges between vertices in the same set.

What else to review?
  • Graph representations - adjacency matrix and adjacency list

  • Graph traversal algorithms - breadth-first search (BFS) and depth-first search (DFS)

  • Shortest path algorithms - Dijkstra's algorithm and Bellman-Ford algorithm

  • Maximum flow algorithms - Ford-Fulkerson algorithm and Edmonds-Karp algorithm

  • Minimum Spanning Tree (MST) algorithms - Prim's algorithm and Kruskal's algorithm

  • Applications of graphs - Network design, Graph theory in computer science, Image processing
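To make the first review point concrete, here is a small Python sketch of both representations for a hypothetical four-vertex directed graph:

```python
# Directed graph on vertices 0..3 with edges (0,1), (0,2), (1,2), (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: O(V^2) space, O(1) edge lookup.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1

# Adjacency list: O(V + E) space, efficient iteration over neighbors.
adj = {u: [] for u in range(n)}
for u, v in edges:
    adj[u].append(v)
```

The usual trade-off: matrices suit dense graphs and constant-time edge queries; lists suit sparse graphs and traversal algorithms such as BFS and DFS.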

❓ Sample Problem:

Consider a vertex type for a directed graph in which there are two fields: an integer label and a list of references to other vertices. Design an algorithm that takes a reference to a vertex u, and creates a copy of the graph on the vertices reachable from u. Return the copy of u.

💡 Hint: Maintain a map from vertices in the original graph to their counterparts in the clone.

🧩 Solution:

We traverse the graph starting from u. Each time we encounter a vertex or an edge that is not yet in the clone, we add it to the clone. We recognize new vertices by maintaining a hash table mapping vertices in the original graph to their counterparts in the new graph. Any standard graph traversal algorithm works; the code below uses breadth-first search.

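A Python sketch of this BFS-based clone (GraphVertex mirrors the vertex type described in the problem; the names are our own):

```python
from collections import deque

class GraphVertex:
    """Vertex with an integer label and a list of outgoing references."""
    def __init__(self, label):
        self.label = label
        self.edges = []

def clone_graph(graph):
    if not graph:
        return None
    # Maps each original vertex to its copy; doubles as the visited set.
    vertex_map = {graph: GraphVertex(graph.label)}
    q = deque([graph])
    while q:
        v = q.popleft()
        for e in v.edges:
            if e not in vertex_map:
                # First time we see this vertex: copy it and schedule a visit.
                vertex_map[e] = GraphVertex(e.label)
                q.append(e)
            # Copy the edge v -> e into the clone.
            vertex_map[v].edges.append(vertex_map[e])
    return vertex_map[graph]
```

The map is what makes cycles safe: a vertex reached along two different paths is copied exactly once, and both cloned edges point at that single copy.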
📅 Hash Tables

Hash tables are a data structure that stores key-value pairs, where each key maps to a specific value. They use a hash function to calculate an index into an array, where the desired value can be found or stored. Hash tables are commonly used for quick access, insertion and deletion operations, making them well suited for implementing dictionaries, sets, and many other data structures. They have an average time complexity of O(1) for these operations, making them efficient in many cases. However, they can also be subject to hash collisions, where multiple keys map to the same array index, which can affect the performance of the hash table.

What else to Review?

When reviewing hash tables, you should consider the following topics:

  • Hash function: How the keys are hashed into indices in the table and how collisions are handled.

  • Load factor: The relationship between the number of elements in the hash table and the size of the table.

  • Resizing: How the size of the hash table is adjusted as the number of elements changes.

  • Time complexity: The average and worst-case time complexity of hash table operations such as insertion, deletion, and search.

  • Space complexity: The amount of memory used by the hash table, including the memory for storing keys and values and any data structures used for collision handling.

  • Collision resolution: Techniques for handling collisions, including open addressing and chaining, and their trade-offs.

  • Applications: Examples of real-world scenarios where hash tables are used, such as caching, symbol tables, and indexing.

❓ Sample Problem: 

Write a program which takes as input a set of integers represented by an array, and returns the size of a largest subset of integers in the array having the property that if two integers are in the subset, then so are all integers between them. For example, if the input is <3,-2,7,9,8,1,2,0,-1,5,8>, the largest such subset is {-2,-1,0,1,2,3}, so you should return 6.

💡 Hint: Do you really need a total ordering on the input? 

🧩 Solution:

The brute-force algorithm is to sort the array and then iterate through it, recording for each entry the largest subset with the desired property ending at that entry.

On closer inspection we see that sorting is not essential to the functioning of the algorithm. We do not need the total ordering; all we care about is whether the integers adjacent to a given value are present. This suggests using a hash table to store the entries. Now we iterate over the entries in the array. If an entry e is present in the hash table, we compute the largest interval including e such that all values in the interval are contained in the hash table. We do this by iteratively searching the hash table for entries of the form e+1, e+2, ..., and e-1, e-2, .... When we are done, to avoid duplicated computation we remove all the entries in the computed interval from the hash table, since all these entries are in the same largest contained interval.

As a concrete example, consider A = <10,5,3,11,6,100,4>. We initialize the hash table to {6,10,3,11,5,100,4}. The first entry in A is 10, and we find the largest interval contained in A including 10 by expanding from 10 in each direction by doing lookups in the hash table. The largest set is {10,11} and is of size 2. This computation updates the hash table to {6,3,5,100,4}. The next entry in A is 5. Since it is contained in the hash table, we know that the largest interval contained in A including 5 has not been computed yet. Expanding from 5, we see that 3, 4, and 6 are all in the hash table, while 2 and 7 are not, so the largest set containing 5 is {3,4,5,6}, which is of size 4. We update the hash table to {100}. The three entries after 5, namely 3, 11, and 6, are not present in the hash table, so we know we have already computed the longest intervals in A containing each of them. Then we get to 100, which cannot be extended, so the largest set containing it is {100}, which is of size 1. We update the hash table to {}. Since 4 is not in the hash table, we can skip it. The largest of these sets is {3,4,5,6}, i.e., the size of the largest contained interval is 4.

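A Python sketch of this approach, with a built-in set playing the role of the hash table:

```python
def longest_contained_range(A):
    """Size of the largest set of consecutive integers contained in A."""
    unprocessed = set(A)
    max_size = 0
    while unprocessed:
        a = unprocessed.pop()
        # Expand downward from a, removing each found entry so it is
        # never the seed of a second, duplicated expansion.
        lower = a - 1
        while lower in unprocessed:
            unprocessed.remove(lower)
            lower -= 1
        # Expand upward from a in the same way.
        upper = a + 1
        while upper in unprocessed:
            unprocessed.remove(upper)
            upper += 1
        # Interval is (lower, upper) exclusive, hence upper - lower - 1 values.
        max_size = max(max_size, upper - lower - 1)
    return max_size
```

Each array element is inserted and removed from the set at most once, which is where the O(n) bound in the next paragraph comes from.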

The time complexity of this approach is O(n), where n is the array length, since we add and remove array elements in the hash table no more than once.

Reference: "Elements of Programming Interviews in Python" by A. Aziz, T.-H. Lee, and A. Prakash

Video: Data Structures: Crash Course Computer Science #14

In order to make data in a computer easier to access and utilize, data structures are a means to organize and store information. They serve as the fundamental units of many computer algorithms and as the framework for software creation. Understanding the numerous types of data structures that are accessible, their characteristics, and the various computational challenges they may be used to address are all part of the study of data structures. Data structures must be designed and implemented with a thorough understanding of algorithms, computational complexity, and the trade-offs between time and space efficiency. Computer scientists and software engineers must have a thorough understanding of data structures because they are crucial to the development of many software systems.

➡️ Arrays

Arrays are a fundamental data structure that is widely used in computer science and software engineering. They are a collection of elements, each identified by an index or a key. Arrays provide fast access to individual elements, and are useful for solving a wide range of computational problems.

Basic Properties of Arrays
  1. Indexing: Arrays are indexed, which means that elements can be accessed using a numerical index. The first element is usually indexed at position 0.

  2. Fixed Size: The size of an array is fixed and cannot be changed dynamically.

  3. Contiguity: All the elements in an array are stored in contiguous memory locations, allowing for fast access times.

  4. Homogeneous: Arrays are homogeneous, which means that all elements must be of the same data type.

  5. Sequential Access: Arrays provide sequential access to elements, meaning that elements can be easily traversed in order.

Arrays Operations
  1. Accessing elements: Retrieving the value stored at a specific index.

  2. Searching elements: Finding the index of an element with a specific value.

  3. Insertion: Adding a new element to the array.

  4. Deletion: Removing an element from the array.

  5. Sorting: Arranging the elements of the array in a specific order.

  6. Merging: Combining two arrays into one.

  7. Reversal: Reversing the order of elements in the array.

  8. Resizing: Increasing or decreasing the size of the array.

  9. Copying: Making a copy of the array.

What else to review?
  • Array traversal: Understanding how to traverse arrays, including sequential and random access, and the time and space complexity of these operations.

  • Array-based algorithms: Studying common algorithms that use arrays, such as sorting algorithms and search algorithms, and the time and space complexity of these algorithms.

  • Dynamic arrays: Understanding dynamic arrays, how they are implemented, and how they are used in real-world applications.

  • Multi-dimensional arrays: Studying multi-dimensional arrays, including two-dimensional arrays, and their applications in computer graphics and scientific computing.

  • Array memory management: Understanding the memory management aspects of arrays, such as how arrays are stored in memory, and how to avoid memory leaks and other memory-related issues.

These topics form the basic foundations of array data structures, and a good understanding of them will help you to design and implement efficient and effective algorithms that use arrays.

❓ Sample Problem:

Write a program which takes as input an array of digits encoding a nonnegative decimal integer D and updates the array to represent the integer D + 1. For example, if the input is (7,2,9) then you should update the array to (1,3,0). Your algorithm should work even if it is implemented in a language that has finite-precision arithmetic.

💡 Hint: Experiment with concrete examples.

🧩 Solution:

A brute-force approach might be to convert the array of digits to the equivalent integer, increment that, and then convert the resulting value back to an array of digits. For example, if the array is <1.,2,9>, we would derive the integer 129, add one to get 130, then extract its digits to form (1,3,0). When implemented in a language that imposes a limit on the range of values an integer type can take, this approach will fail on inputs that encode integers outside of that range.

We can avoid overflow issues by operating directly on the array of digits. Specifically, we mimic the grade-school algorithm for adding integers, which entails adding digits starting from the least significant digit, and propagate carries. If the result has an additional digit, e.g., 99 + 1. = 100, there is not enough storage in the array for the result-we need three digits to represent 100, but the input has only two digits. For the given example, we would update 9 to 0 with a carry-out of 1. We update 2 to 3 (because of the carry-in). There is no carry-out, so we stop-the result is (1,3,0).


🔗 Linked Lists

A linked list is a data structure consisting of a sequence of nodes, where each node contains a reference (i.e., a link) to an object and a reference to the next node in the sequence. The main advantage of linked lists over arrays is that the elements can be efficiently inserted or removed without reallocation or reorganization of the entire structure because a node can be added or removed in constant time. The elements are stored in nodes rather than in a contiguous block of memory like arrays, which can result in better use of memory. Linked lists can be used to implement various abstract data types, including stacks, queues, and associative arrays.

What else to review?
  • The structure of a linked list: including the concept of nodes and links, and how nodes are connected in a linked list.

  • The different types of linked lists: including singly linked lists, doubly linked lists, and circular linked lists.

  • The operations that can be performed on linked lists: including insertion, deletion, and traversal.

  • The advantages and disadvantages of linked lists: compared to arrays and other data structures, and how linked lists are used in various applications.

  • The implementation of linked lists in code: including how to create a linked list, how to insert and delete elements, and how to traverse a linked list.

  • The performance characteristics of linked lists: including time complexity, space complexity, and other factors that affect the efficiency of linked list operations.

  • Common variations of linked lists: such as skip lists and XOR linked lists.

It's also recommended to practice implementing linked lists in a programming language of your choice, and solving various problems that use linked lists.

❓ Sample Problem:

Write a program that takes two lists, assumed to be sorted, and returns their merge. The only field your program can change in a node is its next field. 

💡 Hint: Two sorted arrays can be merged using two indices. For lists, take care when one iterator reaches the end.

🧩 Solution:

A naive approach is to append the two lists together and sort the resulting list. The drawback of this approach is that it does not use the fact that the initial lists are sorted. The time complexity is that of sorting, which is O((n + m)log(n + rz)), where n and m are the lengths of each of the two input lists.

A better approach, in terms of time complexity, is to traverse the two lists, always choosing the node containing the smaller key to continue traversing from.


The worst-case, from a runtime perspective, corresponds to the case when the lists are of comparable length, so the time complexity is O(n + m). (In the best-case, one list is much shorter than the other and all its entries appear at the beginning of the merged list.) Since we reuse the existing nodes, the space complexity is O(1).

🗄️ Stacks and Queues

The last item added to the stack is the first thing removed in a stack, which is a linear data structure that adheres to the Last In First Out (LIFO) concept. It is frequently used to reverse the order of the elements in a program and to maintain track of the order of function calls. Stacks can be operated on using simple operations like push, pop, and peek. Push adds an element to the top of the stack, pop removes an element from the top of the stack (accessing the top element without removing it).

A queue is a linear data structure that follows the First In First Out (FIFO) principle, where the first item that was added to the queue is the first one to be removed. It is commonly used in applications such as scheduling tasks, handling incoming requests, or maintaining a list of items waiting to be processed. Queues have basic operations like enqueue (adding an element to the end of the queue), dequeue (removing an element from the front of the queue), and peek (accessing the front element without removing it).

What else to review?
  • The Last-In-First-Out (LIFO) property of Stacks

  • The First-In-First-Out (FIFO) property of Queues

  • The basic operations of Stacks, such as push, pop, and peek

  • The basic operations of Queues, such as enqueue, dequeue, and peek

  • Use cases of Stacks and Queues, such as backtracking, function calls and history management

  • Implementing Stacks and Queues using linked lists, including the advantages and disadvantages of linked list implementation

❓ Sample Problem: 

Queue insertion and deletion follows first-in, first-out semantics; stack insertion and deletion is last-in, first-out.

How would you implement a queue given a library implementing stacks?

💡 Hint: lt's impossible to solve this problem with a single stack.

🧩 Solution:

A straightforward implementation is to enqueue by pushing the element to be enqueued onto one stack. The element to be dequeued is then the element at the bottom of this stack, which can be achieved by first popping all its elements and pushing them to another stack, then popping the top of the second stack (which was the bottom-most element of the first stack), and finally popping the remaining elements back to the first stack.

The primary problem with this approach is that every dequeue takes two pushes and two pops of every element, i.e., dequeue has O(n) time complexity, where r is the number of stored elements. (Enqueue takes O(1) time.)

The intuition for improving the time complexity of dequeue is that after we move elements from the first stack to the second stack, any further dequeues are trivial, until the second stack is empty. This is true even if we need to enqueue, as long as we enqueue onto the first stack. When the second stack becomes empty, and we need to perform a dequeue, we simply repeat the process of transferring from the first stack to the second stack. ln essence, we are using the first stack for enqueue and the second for dequeue.



🌳 Trees

A set of nodes joined together by edges form a tree, which is a type of data structure. There is one unique node called the root node that is the parent of all other nodes, and each node might have zero or more offspring. Trees are frequently used in search and sorting algorithms as well as to depict hierarchical relationships, such as those found in computer file systems. Binary trees, balanced trees, and B-trees are a few of the several kinds of trees.

Understanding a tree's features, such as its height and depth, as well as its many varieties, such as balanced and binary trees, and its numerous traversal methods, is crucial while reviewing one. Additionally, it's critical to understand how fundamental tree operations like insertion, deletion, and searching impact the structure of the tree. You should also be familiar with the various tree-building and tree-manipulating techniques, such as heap construction and tree sorting.

Things you need to know about Trees
  1. Definition: A tree is a data structure that consists of nodes connected by edges. It has a root node, leaf nodes, and parent-child relationships between nodes.

  2. Hierarchical structure: Trees are used to represent hierarchical relationships between elements, where the root node represents the top level and the leaf nodes represent the lowest level.

  3. Tree traversal: Trees can be traversed in different ways, including in-order, pre-order, and post-order. The method of traversal affects the order in which nodes are visited.

  4. Binary trees: A binary tree is a type of tree where each node has at most two children. Binary trees are widely used for various purposes, such as searching, sorting, and representation of hierarchical relationships.

  5. Balancing trees: To ensure efficient operations, trees may be balanced. There are different algorithms for balancing trees, including AVL trees and Red-Black trees.

  6. Applications: Trees are widely used in various applications, such as representation of hierarchical relationships in file systems, searching and sorting, and in database indexing.

Types of Trees
  1. Binary Tree: a tree data structure in which each node can have up to two children

  2. Binary Search Tree: a special type of binary tree that enforces a particular ordering of its elements

  3. AVL Tree: a self-balancing binary search tree

  4. Trie: a tree-like data structure used to store a collection of strings

  5. Heap: a complete binary tree that satisfies the heap property, which specifies that the parent node is either greater than or equal to, or less than or equal to, its children

  6. B-Tree: a balanced tree data structure used in databases and file systems

  7. Red-Black Tree: a self-balancing binary search tree.

❓ Sample Problem: 

The exterior of a binary tree is the following sequence of nodes: the nodes from the root to the leftmost leaf, followed by the leaves in left-to-right order, followed by the nodes from the rightmost leaf to the root. (By leftmost (rightmost) leaf, we mean the leaf that appears first (last) in an in order traversal.) For example, the exterior of the binary tree in below is <A, B, C, D, E, H, M,N, P, O, I>.

Write a program that computes the exterior of a binary tree. 

💡 Hint: Handle the root's left child and right child in mirror fashion.

🧩 Solution: 

A brute-force approach is to use a case analysis. We need the nodes on the path from the root to the leftmost leaf, the nodes on the path from the root to the rightmost leaf, and the leaves in left-to-right order.

Compute the leaves in left-to-right order. The path from root to leftmost leaf is computed by going left if a left child exists, and otherwise going right. When we reach a leaf, it must be the leftmost leaf. A similar computation yields the nodes on the path from the root to the rightmost leaf. The time complexity is O(h + n + h) = O(n), where n and h are the number of nodes and the height of the tree, respectively. The implementation is a little tricky, because some nodes appear in multiple sequences. For example,Figure above, the path from the root to the leftmost leaf is (A,B,C,D), the leaves in left-to-right order are <D,E,H,M,N,P>, and the path from the root to the rightmost leaf is (A,I,O,P). Note the leftmost leaf , D, the rightmost leaf, P, and the root, A, appear in two sequences.

We can simplify the above approach by computing the nodes on the path from the root to the leftmost leaf and the leaves in the left subtree in one traversal. After that, we find the leaves in the right subtree followed by the nodes from the rightmost leaf to the root with another traversal. This is the program shown below. For the tree in the figure above, the first traversal returns <B,C,D,E,H), and the second traversal returns <M,N,P,O,I>. We append the first and then the second sequences to <A>.

tree p-626oj
📉 Graphs

Graphs are a data structure that represent a set of objects and their relationships. A graph consists of vertices (also known as nodes) and edges, which connect the vertices. The edges can be either directed or undirected, depending on whether the relationship between the two vertices is one-way or two-way. Graphs are commonly used to model and analyze complex systems, such as social networks, transportation systems, and electrical circuits. Different types of graphs include directed acyclic graphs (DAGs), trees, bipartite graphs, and weighted graphs, among others.

Types of Graphs

In computer science, there are several types of graphs, including:

  1. Undirected Graph: In an undirected graph, edges have no direction and can be traversed in both directions.

  2. Directed Graph: In a directed graph, edges have a specific direction and can only be traversed in the direction specified.

  3. Weighted Graph: In a weighted graph, each edge is assigned a weight or cost that represents the cost of traversing that edge.

  4. Cyclic Graph: In a cyclic graph, there is at least one cycle, which is a closed path that starts and ends at the same vertex.

  5. Acyclic Graph: In an acyclic graph, there are no cycles, meaning that it is not possible to traverse a path that starts and ends at the same vertex.

  6. Simple Graph: In a simple graph, there are no self-loops and no multiple edges between two vertices.

  7. Complete Graph: In a complete graph, there is an edge between every pair of vertices.

  8. Bipartite Graph: In a bipartite graph, vertices can be divided into two disjoint sets, and there are no edges between vertices in the same set.

What else to review?
  • Graph representations - adjacency matrix and adjacency list

  • Graph traversal algorithms - breadth-first search (BFS) and depth-first search (DFS)

  • Shortest path algorithms - Dijkstra's algorithm and Bellman-Ford algorithm

  • Maximum flow algorithms - Ford-Fulkerson algorithm and Edmonds-Karp algorithm

  • Minimum Spanning Tree (MST) algorithms - Prim's algorithm and Kruskal's algorithm

  • Applications of graphs - Network design, Graph theory in computer science, Image processing
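To make the first two review items concrete, here is a minimal sketch of both graph representations and a depth-first traversal. The vertex names and the example graph are arbitrary illustrations, not part of any standard API:

```python
# Adjacency list: each vertex maps to the vertices it has an edge to.
adj_list = {
    'A': ['B', 'C'],
    'B': ['C'],
    'C': [],
}

# Equivalent adjacency matrix: rows and columns follow the order A, B, C;
# entry [i][j] is 1 when there is an edge from vertex i to vertex j.
vertices = ['A', 'B', 'C']
adj_matrix = [
    [0, 1, 1],  # A -> B, A -> C
    [0, 0, 1],  # B -> C
    [0, 0, 0],  # C has no outgoing edges
]

def dfs(graph, start, visited=None):
    """Depth-first traversal over an adjacency list; returns the visit order."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)
    return visited
```

The adjacency list is usually preferred for sparse graphs, since it stores only the edges that exist; the matrix gives O(1) edge lookups at the cost of O(V²) space.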

❓ Sample Problem:

Consider a vertex type for a directed graph in which there are two fields: an integer label and a list of references to other vertices. Design an algorithm that takes a reference to a vertex u, and creates a copy of the graph on the vertices reachable from u. Return the copy of u.

💡 Hint: Maintain a map from vertices in the original graph to their counterparts in the clone.

🧩 Solution:

We traverse the graph starting from u. Each time we encounter a vertex or an edge that is not yet in the clone, we add it to the clone. We recognize new vertices by maintaining a hash table mapping vertices in the original graph to their counterparts in the new graph. Any standard graph traversal algorithm works; the code below uses breadth-first search.
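A sketch of this approach in Python; the `Vertex` class here is an assumed stand-in for the vertex type described in the problem (an integer `label` and a list `edges` of references to other vertices):

```python
from collections import deque

class Vertex:
    def __init__(self, label):
        self.label = label
        self.edges = []  # references to adjacent vertices

def clone_graph(u):
    """Return a copy of the graph reachable from u, using BFS."""
    if u is None:
        return None
    # Map from vertices in the original graph to their counterparts in the clone.
    clone_map = {u: Vertex(u.label)}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for neighbor in v.edges:
            if neighbor not in clone_map:  # first time we reach this vertex
                clone_map[neighbor] = Vertex(neighbor.label)
                queue.append(neighbor)
            # Add the corresponding edge in the clone.
            clone_map[v].edges.append(clone_map[neighbor])
    return clone_map[u]
```

The time and space complexity are O(V + E) on the reachable subgraph, since every vertex is enqueued at most once and every edge is copied exactly once.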

📅 Hash Tables

Hash tables are a data structure that stores key-value pairs, where each key maps to a specific value. They use a hash function to calculate an index into an array, where the desired value can be found or stored. Hash tables are commonly used for quick access, insertion and deletion operations, making them well suited for implementing dictionaries, sets, and many other data structures. They have an average time complexity of O(1) for these operations, making them efficient in many cases. However, they can also be subject to hash collisions, where multiple keys map to the same array index, which can affect the performance of the hash table.

What else to Review?

When reviewing hash tables, you should consider the following topics:

  • Hash function: How the keys are hashed into indices in the table and how collisions are handled.

  • Load factor: The relationship between the number of elements in the hash table and the size of the table.

  • Resizing: How the size of the hash table is adjusted as the number of elements changes.

  • Time complexity: The average and worst-case time complexity of hash table operations such as insertion, deletion, and search.

  • Space complexity: The amount of memory used by the hash table, including the memory for storing keys and values and any data structures used for collision handling.

  • Collision resolution: Techniques for handling collisions, including open addressing and chaining, and their trade-offs.

  • Applications: Examples of real-world scenarios where hash tables are used, such as caching, symbol tables, and indexing.
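To make the collision-resolution topic above concrete, here is a minimal sketch of a hash table that resolves collisions by chaining. The fixed bucket count and the use of Python's built-in `hash` are simplifications:

```python
class ChainedHashTable:
    """Minimal hash table using separate chaining for collision resolution."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Hash the key into a bucket index; colliding keys share a bucket.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key (or collision): chain it

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default
```

A production implementation would also track the load factor and resize (rehash into a larger bucket array) as the table grows, to keep the chains short and the average operation cost O(1).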

❓ Sample Problem: 

Write a program which takes as input a set of integers represented by an array, and returns the size of a largest subset of integers in the array having the property that if two integers are in the subset, then so are all integers between them. For example, if the input is (3,-2,7,9,8,1,2,0,-1,5,8), the largest such subset is {-2,-1,0,1,2,3}, so you should return 6.

💡 Hint: Do you really need a total ordering on the input? 

🧩 Solution:

The brute-force algorithm is to sort the array and then iterate through it, recording for each entry the largest subset with the desired property ending at that entry.

On closer inspection we see that sorting is not essential to the functioning of the algorithm. We do not need the total ordering: all we care about is whether the integers adjacent to a given value are present. This suggests using a hash table to store the entries. Now we iterate over the entries in the array. If an entry e is present in the hash table, we compute the largest interval including e such that all values in the interval are contained in the hash table. We do this by iteratively searching the hash table for entries of the form e+1, e+2, ..., and e-1, e-2, .... When we are done, to avoid duplicated computation we remove all the entries in the computed interval from the hash table, since all these entries belong to the same largest contained interval.

As a concrete example, consider A = <10,5,3,11,6,100,4>. We initialize the hash table to {6,10,3,11,5,100,4}. The first entry in A is 10, and we find the largest interval contained in A including 10 by expanding from 10 in each direction by doing lookups in the hash table. The largest such set is {10,11}, which is of size 2. This computation updates the hash table to {6,3,5,100,4}. The next entry in A is 5. Since it is contained in the hash table, we know that the largest interval contained in A including 5 has not been computed yet. Expanding from 5, we see that 3, 4, and 6 are all in the hash table, while 2 and 7 are not, so the largest set containing 5 is {3,4,5,6}, which is of size 4. We update the hash table to {100}. The next three entries in A, namely 3, 11, and 6, are not present in the hash table, so we know we have already computed the largest intervals in A containing each of them. Then we get to 100, which cannot be extended, so the largest set containing it is {100}, which is of size 1. We update the hash table to {}. Since 4 is not in the hash table, we skip it. The largest of the three sets is {3,4,5,6}, i.e., the size of the largest contained interval is 4.
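The algorithm just described might be sketched in Python as follows, using a set as the hash table of entries:

```python
def longest_contained_range(A):
    """Size of the largest subset of A whose elements are consecutive integers."""
    entries = set(A)  # hash table of the values in A
    max_size = 0
    for e in A:
        if e not in entries:
            continue  # e's interval was already computed and removed
        entries.remove(e)
        # Expand downward while predecessors are present.
        lower = e - 1
        while lower in entries:
            entries.remove(lower)
            lower -= 1
        # Expand upward while successors are present.
        upper = e + 1
        while upper in entries:
            entries.remove(upper)
            upper += 1
        # Interval found is (lower, upper) exclusive, so its size is:
        max_size = max(max_size, upper - lower - 1)
    return max_size
```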


The time complexity of this approach is O(n), where n is the array length, since we add and remove array elements in the hash table no more than once.

Reference: "Elements of Programming Interviews in Python" by A. Aziz, T.-H. Lee, and A. Prakash

Video: Data Structures: Crash Course Computer Science #14

