<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://imabhay.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://imabhay.com/" rel="alternate" type="text/html" /><updated>2025-01-18T02:43:59+00:00</updated><id>https://imabhay.com/feed.xml</id><title type="html">Abhay Singh</title><subtitle>My personal technical blog to track my progress and create knowledge base for computer science.</subtitle><author><name>Abhay Singh</name></author><entry><title type="html">Semester 4 - Distributed System</title><link href="https://imabhay.com/posts/bca/s4-disys-main/" rel="alternate" type="text/html" title="Semester 4 - Distributed System" /><published>2023-06-09T00:00:00+00:00</published><updated>2023-06-09T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s4-disys-main</id><content type="html" xml:base="https://imabhay.com/posts/bca/s4-disys-main/"><![CDATA[<p><br /></p>

<p><strong>Course Name</strong>: Distributed System</p>

<h2 id="course-outcomes">Course Outcomes</h2>

<table>
  <thead>
    <tr>
      <th style="text-align: center"> </th>
      <th>Course Outcomes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">1</td>
      <td>To understand the data mining process and the resulting patterns, types of data, attributes, and the knowledge discovery process</td>
    </tr>
    <tr>
      <td style="text-align: center">2</td>
      <td>To study the different data preprocessing techniques before applying the data mining process</td>
    </tr>
    <tr>
      <td style="text-align: center">3</td>
      <td>To characterize the kinds of patterns that can be discovered by association rule mining</td>
    </tr>
    <tr>
      <td style="text-align: center">4</td>
      <td>To learn the different predictions, classification and clustering algorithms</td>
    </tr>
    <tr>
      <td style="text-align: center">5</td>
      <td>To categorize and carefully differentiate between situations for applying different data-mining techniques for different applications.</td>
    </tr>
  </tbody>
</table>

<h2 id="syllabus">Syllabus</h2>

<p><a href="https://docs.google.com/document/d/1JmONZXa7zAxOQDMTXDR5i3-1hYrVPmHTeVX7B4JLDZM/edit?usp=sharing" target="_blank" class="btn btn--primary">Syllabus</a></p>

<h2 id="internals-and-attendance-criteria">Internals and Attendance Criteria</h2>

<table>
  <thead>
    <tr>
      <th>Code</th>
      <th>Course Name</th>
      <th>Attendance Eligibility Criteria</th>
      <th>Internals Criteria (Best is considered)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>21CSA333A</td>
      <td>Data Mining</td>
      <td>7 Quizzes (LMS)</td>
      <td>7 Quizzes &amp; First 2 Assignments</td>
    </tr>
  </tbody>
</table>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="distributed systems" /><category term="distributed" /><summary type="html"><![CDATA[Collection of lecture notes, flashcards, tools, tutorials, and other references.]]></summary></entry><entry><title type="html">Semester 3 - Data Mining - Flashcards</title><link href="https://imabhay.com/posts/bca/s3-mining-flashcard/" rel="alternate" type="text/html" title="Semester 3 - Data Mining - Flashcards" /><published>2023-05-26T00:00:00+00:00</published><updated>2023-05-26T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-mining-flashcard</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-mining-flashcard/"><![CDATA[<h2 id="flashcards">Flashcards</h2>

<ul>
  <li><a href="https://quizlet.com/_db2wff?x=1jqt&amp;i=44s59k" target="_blank">Data Mining - All Quizzes Combined</a></li>
</ul>

<p><br /></p>

<p class="notice--info">All flashcards are made and hosted on Quizlet.
<br />For questions that depend on an image, flip the flashcard to see the image and then answer it.
Quizlet only allows images on the back of a card, not the front.</p>

<p class="notice--warning">Report via telegram or <a href="mailto:abhaysinghit84@gmail.com; imabhaysec@gmail.com">email</a> if you want to make any corrections.</p>]]></content><author><name>Abhay Singh</name></author><category term="bca" /><category term="bca" /><category term="mining" /><category term="data mining" /><category term="data warehouse" /><summary type="html"><![CDATA[Flashcards for Data Mining]]></summary></entry><entry><title type="html">Semester 3 - Data Mining</title><link href="https://imabhay.com/posts/bca/s3-mining-main/" rel="alternate" type="text/html" title="Semester 3 - Data Mining" /><published>2023-05-26T00:00:00+00:00</published><updated>2023-05-26T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-mining-main</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-mining-main/"><![CDATA[<p><br /></p>

<p class="notice--info"><strong>Official Information</strong> <br />
<a href="https://docs.google.com/document/d/1nz3oqO1qjzbaSfY9vX0Yq3j1en6h_qcC_5s5kC_BbTQ/edit?usp=sharing" target="_blank">Important Topics</a></p>

<p><strong>Course Name</strong>: Data Mining</p>

<h2 id="course-outcomes">Course Outcomes</h2>

<table>
  <thead>
    <tr>
      <th style="text-align: center"> </th>
      <th>Course Outcomes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">1</td>
      <td>To understand the data mining process and the resulting patterns, types of data, attributes, and the knowledge discovery process</td>
    </tr>
    <tr>
      <td style="text-align: center">2</td>
      <td>To study the different data preprocessing techniques before applying the data mining process</td>
    </tr>
    <tr>
      <td style="text-align: center">3</td>
      <td>To characterize the kinds of patterns that can be discovered by association rule mining</td>
    </tr>
    <tr>
      <td style="text-align: center">4</td>
      <td>To learn the different predictions, classification and clustering algorithms</td>
    </tr>
    <tr>
      <td style="text-align: center">5</td>
      <td>To categorize and carefully differentiate between situations for applying different data-mining techniques for different applications.</td>
    </tr>
  </tbody>
</table>

<h2 id="syllabus">Syllabus</h2>

<p><a href="https://docs.google.com/document/d/1JmONZXa7zAxOQDMTXDR5i3-1hYrVPmHTeVX7B4JLDZM/edit?usp=sharing" target="_blank" class="btn btn--primary">Syllabus</a></p>

<h2 id="internals-and-attendance-criteria">Internals and Attendance Criteria</h2>

<table>
  <thead>
    <tr>
      <th>Code</th>
      <th>Course Name</th>
      <th>Attendance Eligibility Criteria</th>
      <th>Internals Criteria (Best is considered)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>21CSA333A</td>
      <td>Data Mining</td>
      <td>7 Quizzes (LMS)</td>
      <td>7 Quizzes &amp; First 2 Assignments</td>
    </tr>
  </tbody>
</table>

<h2 id="mock-exam-screens">Mock Exam Screens</h2>

<p><a href="https://drive.google.com/open?id=15BbJ3vS-Ri3vz3_AXflJZBHn4yfSQb0G&amp;usp=drive_fs" target="_blank" class="btn btn--primary">Mock Theory - Screenshots</a></p>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="mining" /><category term="data mining" /><category term="data warehouse" /><summary type="html"><![CDATA[Collection of lecture notes, flashcards, tools, tutorials, and other references.]]></summary></entry><entry><title type="html">Semester 3 - Data Mining (Week 1-4)</title><link href="https://imabhay.com/posts/bca/s3-mining-w1-w4/" rel="alternate" type="text/html" title="Semester 3 - Data Mining (Week 1-4)" /><published>2023-05-26T00:00:00+00:00</published><updated>2023-05-26T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-mining-w1-w4</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-mining-w1-w4/"><![CDATA[<h3 id="data-mining">Data Mining</h3>

<ul>
  <li>
    <p>Data Mining, also known as Knowledge Discovery in Database (KDD), is a process used to extract valuable information from large sets of data.</p>
  </li>
  <li>It involves various aspects such as:
    <ul>
      <li>
        <p><strong>Data Types</strong>: Includes relational, transactional, data warehouse data, and complex data types like time-series, sequences, data streams, spatiotemporal data, multimedia data, text data, graphs, social networks, and Web data.</p>
      </li>
      <li>
        <p><strong>Knowledge Mined</strong>: Involves discovering patterns, associations, correlations, and causal structures.</p>
      </li>
      <li>
        <p><strong>Technologies Used</strong>: Incorporates machine learning, statistics, pattern recognition, neural networks, and visualization.</p>
      </li>
      <li>
        <p><strong>Applications</strong>: Extensively used in various fields such as business, science, engineering, and healthcare.</p>
      </li>
    </ul>
  </li>
  <li><strong>Major challenges</strong>
    <ul>
      <li>scalability,</li>
      <li>handling different types of attributes,</li>
      <li>dealing with noisy data, and</li>
      <li>developing incremental clustering algorithms.</li>
    </ul>
  </li>
</ul>

<h3 id="summarize-about-the-steps-in-knowledge-discovery-process">Summarize about the steps in Knowledge Discovery Process</h3>

<p><img src="/assets/images/bca/s3-mining/s3-mining-kdd.png" alt="image-center" class="align-center" width="500" /></p>

<p style="color:gray; font-size: 80%; text-align: center;"><em>Knowledge Discovery Process</em></p>

<ul>
  <li><u>Data Cleaning:</u> Remove noise and inconsistent data.</li>
  <li><u>Data Integration:</u> Combine multiple data sources.</li>
  <li><u>Data Selection:</u> Data relevant to the analysis task are retrieved from the database.</li>
  <li><u>Data Transformation:</u> Consolidate data into mining-friendly formats.</li>
  <li><u>Data Mining:</u> Apply intelligent methods to uncover patterns.</li>
  <li><u>Pattern Evaluation:</u> Identify valuable patterns via interestingness measures.</li>
  <li><u>Knowledge Presentation:</u> Visualization and knowledge representation techniques are used to present the mined knowledge.</li>
</ul>
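<p>The steps above can be sketched end to end on a toy record set (a minimal illustration in plain Python; the field names and records are made up):</p>

```python
from collections import Counter

# Minimal sketch of the KDD steps on toy records (field names are illustrative).
raw = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Pune"},   # noisy record with a missing value
    {"age": 32, "city": "Delhi"},
    {"age": 25, "city": "Pune"},
]

# Data cleaning: drop records with missing values.
cleaned = [r for r in raw if r["age"] is not None]

# Data selection: keep only the attribute relevant to the analysis task.
selected = [r["city"] for r in cleaned]

# Data mining (here, trivial pattern counting) and pattern evaluation
# (keep only patterns meeting a minimum frequency).
patterns = Counter(selected)
frequent = {city: n for city, n in patterns.items() if n >= 2}

print(frequent)  # {'Pune': 2}
```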

<h3 id="briefly-summarize-about-transactional-data">Briefly summarize about “Transactional Data”</h3>
<ul>
  <li>Refers to data that records an exchange, agreement, or transfer between entities.</li>
  <li>Captures the details of every event in the system.</li>
  <li>Examples: order delivery, purchase orders, invoices.</li>
</ul>

<h3 id="explain-different-data-mining-functionalities">Explain different Data Mining Functionalities</h3>
<ul>
  <li><u>Characterization/Discrimination:</u> Summarize and contrast data.</li>
  <li><u>Association/Correlation:</u> Find relationships in data.</li>
  <li><u>Classification/Regression:</u> Create data models and predict labels.</li>
  <li><u>Cluster Analysis:</u> Group data into clusters.</li>
  <li><u>Outlier Analysis:</u> Identify non-compliant data.</li>
  <li><u>Trend/Evolution:</u> Describe trends over time.</li>
</ul>

<h3 id="what-is-a-data-warehouse-explain-the-importance-of-dw-in-data-mining-field">What is a “Data Warehouse”? Explain the importance of DW in data mining field.</h3>
<p><br /></p>

<p><img src="/assets/images/bca/s3-mining/s3-mining-warehousing.png" alt="image-center" class="align-center" width="600" /></p>

<p style="color:gray; font-size: 80%; text-align: center;"><em>Three tier data warehousing architecture</em></p>

<p><strong>Data Warehouse</strong></p>
<ul>
  <li>Collects and manages data from various sources.</li>
  <li>Enables strategic data use through a mix of technologies and components.</li>
  <li>Offers consistent business view, irrespective of data source.</li>
  <li>Acts as electronic storage for large information volumes.</li>
  <li>Designed for query and analysis rather than transaction processing.</li>
  <li>Transforms data into information for user analysis.</li>
</ul>

<p><strong>Important reasons for using Data warehouse</strong></p>
<ul>
  <li>Integrates many data sources and reduces the load on production systems.</li>
  <li>Data is optimized for read access and sequential disk scans.</li>
  <li>Protects data from source-system upgrades.</li>
  <li>Allows users to perform master data management.</li>
  <li>Improves data quality in source systems.</li>
</ul>

<h3 id="what-are-the-issues-faced-in-data-mining">What are the issues faced in Data mining?</h3>
<ul>
  <li><u>Mining Methodology:</u> Managing diverse data types, noise, uncertainty, scalability.</li>
  <li><u>User Interaction:</u> Maintaining simplicity, transparency, and user engagement.</li>
  <li><u>Efficiency and Scalability:</u> Ensuring fast, scalable data processing.</li>
  <li><u>Diversity of Database Types:</u> Handling various data types and sources.</li>
  <li><u>Data Mining and Society:</u> Navigating information misuse, privacy, security.</li>
</ul>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="mining" /><category term="data mining" /><category term="data warehouse" /></entry><entry><title type="html">Semester 3 - Data Mining (Week 11-14)</title><link href="https://imabhay.com/posts/bca/s3-mining-w11-w14/" rel="alternate" type="text/html" title="Semester 3 - Data Mining (Week 11-14)" /><published>2023-05-26T00:00:00+00:00</published><updated>2023-05-26T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-mining-w11-w14</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-mining-w11-w14/"><![CDATA[<h3 id="classification-and-prediction">Classification and Prediction</h3>

<ul>
  <li><strong>Definition and Purpose</strong>
    <ul>
      <li>Classification: Technique for extracting models or “classifiers” from data.</li>
      <li>Goal: To predict categorical, unordered class labels.</li>
    </ul>
  </li>
  <li><strong>Methodology</strong>
    <ul>
      <li>Various techniques exist for classification, including decision tree, Bayesian, and rule-based classifiers.</li>
      <li>Scalable techniques for large, disk-resident data have been developed recently.</li>
    </ul>
  </li>
  <li><strong>Applications</strong>
    <ul>
      <li>Diverse fields such as fraud detection, target marketing, performance prediction, manufacturing, and medical diagnosis.</li>
    </ul>
  </li>
  <li><strong>Accuracy and Evaluation</strong>
    <ul>
      <li>Various measures evaluate classifiers’ accuracy.</li>
      <li>Techniques ensure reliable accuracy estimates.</li>
    </ul>
  </li>
  <li><strong>Dealing with Challenges</strong>
    <ul>
      <li>Methods exist for increasing accuracy, including strategies for handling imbalanced class data.</li>
    </ul>
  </li>
  <li><strong>Process</strong>
    <ul>
      <li>Classification typically involves a two-step process:
        <ul>
          <li>Building a classification model from previous data.</li>
          <li>Determining the model’s accuracy before deploying it for new data classification.</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<h3 id="decision-trees">Decision Trees</h3>

<ul>
  <li><strong>Definition and Purpose</strong>
    <ul>
      <li>Decision Trees: Supervised Machine Learning technique involving data splitting based on parameters.</li>
      <li>Utility: Suitable for both Classification and Regression problems.</li>
    </ul>
  </li>
  <li><strong>Features and Limitations</strong>
    <ul>
      <li>Transparent and easy to interpret.</li>
      <li>Prone to overfitting, particularly with deeper trees.</li>
    </ul>
  </li>
  <li><strong>Algorithms and Techniques</strong>
    <ul>
      <li>Algorithms include ID3, C4.5, CART, among others.</li>
      <li>Use different metrics for attribute split, such as information gain, gain ratio, Gini index.</li>
      <li>Pruning: Technique to reduce overfitting by removing less impactful branches.</li>
    </ul>
  </li>
  <li><strong>Data Compatibility</strong>
    <ul>
      <li>Handles both categorical and numerical data.</li>
      <li>Can handle missing values, for example by sending them down the most likely branch.</li>
    </ul>
  </li>
  <li><strong>Further Utilization</strong>
    <ul>
      <li>Serves as foundation for more powerful machine learning algorithms like Random Forest and Gradient Boosting algorithms.</li>
    </ul>
  </li>
</ul>
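<p>As a minimal sketch of how an attribute split is scored, the information-gain metric mentioned above can be computed directly (toy data; the attribute and label names are illustrative):</p>

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on attribute attr."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Toy data: does "outlook" help predict "play"?
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain",  "play": "yes"},
    {"outlook": "rain",  "play": "yes"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0 (a perfect split)
```

ID3 simply picks the attribute with the highest gain at each node; C4.5's gain ratio and CART's Gini index are variations on the same idea.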

<h3 id="bayesian-classification">Bayesian Classification:</h3>
<ul>
  <li>Bayesian classification is a form of statistical classification based on Bayes’ theorem.</li>
  <li>The naive Bayesian classifier assumes that attributes are conditionally independent of one another, given the class.</li>
  <li>This assumption simplifies the computation and, when it holds, makes naive Bayes competitive with more sophisticated classifiers.</li>
  <li>In practice, however, attributes often depend on one another.</li>
</ul>
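<p>A minimal sketch of a naive Bayesian classifier built on the conditional-independence assumption described above (toy data; no smoothing is applied, so an unseen feature value zeroes out a class score):</p>

```python
from collections import Counter

def naive_bayes_predict(train, x):
    """Pick the class maximizing P(class) * prod_i P(feature_i = x_i | class)."""
    classes = Counter(label for _, label in train)
    best, best_score = None, -1.0
    for cls, cls_count in classes.items():
        score = cls_count / len(train)          # prior P(class)
        for i, value in enumerate(x):
            matches = sum(1 for feats, label in train
                          if label == cls and feats[i] == value)
            score *= matches / cls_count        # conditional independence assumption
        if score > best_score:
            best, best_score = cls, score
    return best

# Toy data: (weather, temperature) -> play?
train = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"),
    (("rain", "cool"), "yes"),
]
print(naive_bayes_predict(train, ("rain", "mild")))  # yes
```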

<h4 id="bayesian-belief-networks">Bayesian Belief Networks</h4>
<ul>
  <li>These networks specify joint conditional probability distributions.</li>
  <li>They allow conditional independencies to be defined between subsets of variables.</li>
  <li>They offer a visual model of cause-effect links for learning.</li>
  <li>Once trained, they can classify data.</li>
</ul>

<h4 id="components-of-bayesian-belief-networks">Components of Bayesian Belief Networks</h4>
<ul>
  <li>A belief network is defined by a directed acyclic graph and a set of conditional probability tables.</li>
  <li>Each node in the graph represents a random variable.</li>
  <li>Variables can be discrete or continuous.</li>
  <li>They can represent actual data attributes or “hidden variables”.</li>
  <li>Each edge represents a probabilistic dependence: if an edge goes from Y to Z, then Y is a parent of Z and Z is a descendant of Y.</li>
</ul>

<h4 id="applications">Applications</h4>
<ul>
  <li>They can classify data and show dependencies among attribute groups.</li>
  <li>For instance, in medical data, a “hidden variable” could point to a syndrome. This syndrome indicates multiple symptoms that define a particular disease.</li>
</ul>

<h3 id="k-nearest-neighbors-knn-algorithm">K-Nearest Neighbors (KNN) Algorithm</h3>

<ul>
  <li><strong>What is KNN?</strong>
    <ul>
      <li>KNN is a simple machine learning algorithm.</li>
      <li>It classifies objects based on the majority vote of its neighbors.</li>
      <li>The object is assigned to the class most common among its ‘K’ closest neighbors.</li>
      <li>‘K’ is usually a small positive number. If K = 1, the object takes the class of its closest neighbor.</li>
    </ul>
  </li>
  <li><strong>Uses of KNN</strong>
    <ul>
      <li>KNN is used for both classification and regression problems.</li>
      <li>It’s mostly used for classification in the industry.</li>
      <li>It uses existing data points classified into groups to predict the classification of new points.</li>
    </ul>
  </li>
  <li><strong>Relation to Real Life</strong>
    <ul>
      <li>Like learning about a person by knowing their friends, KNN uses known data to classify unknown data.</li>
    </ul>
  </li>
  <li><strong>Things to consider before using KNN</strong>
    <ul>
      <li>KNN is computationally expensive, since classifying a point means comparing it with every stored training example.</li>
      <li>Variables should be normalized to avoid bias.</li>
      <li>Outlier and noise removal is important before using KNN.</li>
    </ul>
  </li>
</ul>
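<p>A minimal KNN sketch for 2-D points, assuming Euclidean distance and majority voting (the data points are made up):</p>

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote of its k nearest neighbors."""
    # Sort labeled points by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D points labeled into two groups.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2), k=3))  # A
```

Note that the features here are already on the same scale; as the list above warns, real data should be normalized first so no attribute dominates the distance.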

<h3 id="clustering">Clustering</h3>

<h4 id="what-is-clustering">What is Clustering?</h4>
<ul>
  <li>Clustering groups similar data objects together.</li>
  <li>Objects in the same cluster are more alike than those in different clusters.</li>
</ul>

<h4 id="challenges-in-clustering">Challenges in Clustering</h4>

<ol>
  <li><strong>Scalability</strong>:
    <ul>
      <li>Clustering algorithms should work well on large databases with millions or billions of objects.</li>
    </ul>
  </li>
  <li><strong>Different Types of Attributes</strong>:
    <ul>
      <li>Clustering should handle various types of data, such as numeric, binary, categorical, and ordinal.</li>
    </ul>
  </li>
  <li><strong>Clusters with Arbitrary Shape</strong>:
    <ul>
      <li>Clusters can have any shape, so algorithms should detect clusters of different shapes.</li>
    </ul>
  </li>
  <li><strong>Domain Knowledge</strong>:
    <ul>
      <li>Clustering algorithms may require user input, like the number of desired clusters. Finding optimal parameters can be challenging.</li>
    </ul>
  </li>
  <li><strong>Noisy Data</strong>:
    <ul>
      <li>Real-world data often has outliers, missing values, or errors. Robust clustering methods are needed to handle noise.</li>
    </ul>
  </li>
  <li><strong>Incremental Clustering and Insensitivity to Input Order</strong>:
    <ul>
      <li>Clustering algorithms should handle incremental updates and be insensitive to the order of data objects.</li>
    </ul>
  </li>
</ol>

<h4 id="types-of-clustering-methods">Types of clustering methods</h4>

<ol>
  <li><strong>Partitioning methods</strong>:
    <ul>
      <li>Create initial partitions and iteratively improve them.</li>
      <li>Examples: k-means, k-medoids, CLARANS.</li>
    </ul>
  </li>
  <li><strong>Hierarchical methods</strong>:
    <ul>
      <li>Create hierarchical decomposition of data objects.</li>
      <li>Agglomerative (bottom-up) or divisive (top-down) approaches.</li>
      <li>Examples: Chameleon, BIRCH.</li>
    </ul>
  </li>
  <li><strong>Density-based methods</strong>:
    <ul>
      <li>Cluster based on density of objects.</li>
      <li>Grow clusters using neighborhood density or density function.</li>
      <li>Examples: DBSCAN, DENCLUE, OPTICS.</li>
    </ul>
  </li>
  <li><strong>Grid-based methods</strong>:
    <ul>
      <li>Quantize object space into a grid structure.</li>
      <li>Perform clustering on the grid.</li>
      <li>Examples: STING, CLIQUE.</li>
    </ul>
  </li>
</ol>

<h3 id="k-means-algorithm">K-means Algorithm</h3>

<ol>
  <li>K-means algorithm partitions data into K non-overlapping clusters based on nearest mean.</li>
  <li>Randomly choose K cluster centers.</li>
  <li>Assign each data point to the closest cluster center.</li>
  <li>Update cluster centers as the mean of assigned points.</li>
  <li>Repeat steps 3 and 4 until cluster assignments stabilize or reach maximum iterations.</li>
  <li>The result is a set of clusters in which each point is closer to its own cluster center than to any other.</li>
  <li>Suitable for various data types and cluster shapes.</li>
  <li>Requires prior specification of K and sensitive to initial cluster center selection.</li>
  <li>Assumes spherical and evenly sized clusters, which may not always hold.</li>
  <li>Sensitive to outliers, may require data preprocessing.</li>
</ol>
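<p>Steps 2 to 5 above can be sketched in plain Python (a toy 2-D example; real implementations add smarter initialization and vectorized distance computation):</p>

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on 2-D points; returns final centers and clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # step 2: random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # step 3: assign each point to its closest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # step 4: recompute each center as the mean of its assigned points
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:             # step 5: stop when assignments stabilize
            break
        centers = new_centers
    return centers, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```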

<h3 id="hierarchical-clustering-algorithm">Hierarchical Clustering Algorithm</h3>

<ol>
  <li>Hierarchical clustering builds a hierarchy of clusters in which each node is a cluster formed from the clusters of its child nodes.</li>
  <li>Two types: agglomerative (bottom-up) and divisive (top-down).</li>
  <li>Results presented in a dendrogram.</li>
  <li>No pre-specified number of clusters required.</li>
  <li>Can be used for disjoint cluster partitioning.</li>
  <li>Sensitivity to distance metric and inter-cluster distance calculation methods.</li>
</ol>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="mining" /><category term="data mining" /><category term="data warehouse" /></entry><entry><title type="html">Semester 3 - Data Mining (Week 5-7)</title><link href="https://imabhay.com/posts/bca/s3-mining-w5-w7/" rel="alternate" type="text/html" title="Semester 3 - Data Mining (Week 5-7)" /><published>2023-05-26T00:00:00+00:00</published><updated>2023-05-26T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-mining-w5-w7</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-mining-w5-w7/"><![CDATA[<h3 id="explain-the-categorization-of-visualization-methods">Explain the categorization of visualization methods?</h3>

<p><strong>Data Visualization</strong></p>
<ul>
  <li>Graphically represents data for clear communication.</li>
  <li>Used for reporting, managing operations, tracking tasks, and discovering data relationships.</li>
</ul>

<p><strong>Visualization Techniques</strong></p>

<p><img src="/assets/images/bca/s3-mining/s3-mining-pixel.png" alt="image-center" class="align-center" width="400" /></p>

<p style="color:gray; font-size: 80%; text-align: center;"><em>A Pixel-oriented visualization</em></p>

<ul>
  <li><strong>Pixel-Oriented</strong>: Uses color-shaded pixels to reflect data values and analyze correlations.</li>
</ul>

<p><img src="/assets/images/bca/s3-mining/s3-mining-geometric.png" alt="image-center" class="align-center" width="400" /></p>

<p style="color:gray; font-size: 80%; text-align: center;"><em>A Geometric projection visualization</em></p>

<ul>
  <li><strong>Geometric Projection</strong>: Visualizes geometric transformations and projections of multidimensional data.</li>
</ul>

<p><img src="/assets/images/bca/s3-mining/s3-mining-chernoff.png" alt="image-center" class="align-center" width="400" /></p>

<p style="color:gray; font-size: 80%; text-align: center;"><em>A Chernoff faces (Icon-based) visualization</em></p>

<ul>
  <li><strong>Icon-Based</strong>: Uses icons, like Chernoff faces, to represent multidimensional data.
    <ul>
      <li><strong>Chernoff Faces <em>(Icon-Based)</em></strong>
        <ul>
          <li>Cartoon human faces representing up to 18 variables of multidimensional data.</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<p><img src="/assets/images/bca/s3-mining/s3-mining-hierarchial.jpg" alt="image-center" class="align-center" width="400" /></p>

<p style="color:gray; font-size: 80%; text-align: center;"><em>A tree map (hierarchical) visualization</em></p>

<ul>
  <li><strong>Hierarchical</strong>: Partitions dimensions into subsets that are visualized hierarchically. Methods include tree maps (a screen-filling technique) and InfoCube (a 3-D visualization).</li>
</ul>

<p><strong>Visualizing Complex Data and Relations</strong></p>
<ul>
  <li>Techniques now include non-numeric data like text and social networks.</li>
  <li>Tag clouds visualize statistics of user-generated tags.</li>
  <li>Methods exist for visualizing relationships like social networks.</li>
</ul>

<h3 id="what-is-data-quality">What is Data Quality?</h3>
<ul>
  <li>Defined in terms of accuracy, completeness, consistency, timeliness, believability, and interpretability.</li>
  <li>Quality is assessed based on the intended use of the data.</li>
</ul>

<h3 id="factors-affecting-data-quality">Factors Affecting Data Quality</h3>
<ul>
  <li>Poorly designed data entry forms with many optional fields.</li>
  <li>Human errors, both accidental and deliberate.</li>
  <li>Data decay and inconsistencies.</li>
  <li>Instrumentation and system errors.</li>
  <li>Inadequate data usage.</li>
</ul>

<h3 id="explain-about-data-preprocessing">Explain about Data Preprocessing</h3>
<ul>
  <li>Involves cleaning, integrating, reducing, and transforming data.</li>
  <li>Cleaning fills in missing values, smooths noise, identifies outliers, and corrects inconsistencies.</li>
  <li>Integration merges data from multiple sources.</li>
  <li>Reduction minimizes data size while preserving information.</li>
  <li>Transformation adjusts data for optimal mining.</li>
</ul>

<h3 id="summarize-data-cleaning">Summarize Data Cleaning</h3>
<ul>
  <li>Involves filling missing values, smoothing noise, identifying outliers, and correcting inconsistencies.</li>
  <li>Typically an iterative two-step process: discrepancy detection and data transformation.</li>
</ul>
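<p>Two of the steps above, filling in missing values and smoothing noise by bin means, can be sketched as follows (the numbers are illustrative):</p>

```python
# Fill missing values with the attribute mean (a common cleaning strategy).
ages = [23, None, 31, 45, None, 28, 60]
known = [a for a in ages if a is not None]
mean = sum(known) / len(known)
filled = [a if a is not None else round(mean, 1) for a in ages]
print(filled)  # [23, 37.4, 31, 45, 37.4, 28, 60]

# Smooth noise by equal-depth binning: replace each value with its bin mean.
data = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
bins = [data[i:i + 3] for i in range(0, len(data), 3)]
smoothed = [[round(sum(b) / len(b), 1)] * len(b) for b in bins]
print(smoothed)  # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
```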

<h3 id="what-is-integration-of-data">What is Integration of Data?</h3>
<ul>
  <li>Combines data from different sources into one place.</li>
  <li>Involves resolving semantic differences, managing metadata, checking correlations, and detecting and resolving duplicate data and value conflicts.</li>
</ul>

<h3 id="define-reduction-and-data-transformation">Define Reduction and Data Transformation</h3>
<ul>
  <li>Part of data mining process.</li>
  <li>Aims to decrease data volume while maintaining similar analytical outcomes.</li>
  <li>Simplifies data for easier understanding and interpretation.</li>
  <li>Techniques include reducing dimensions, reducing number of data points, and data compression.</li>
</ul>

<h3 id="what-is-euclidean-distance">What is Euclidean Distance?</h3>
<ul>
  <li>Measures straight line distance between two points in a space.</li>
  <li>Derived from Pythagoras’ theorem, used in data mining and machine learning.</li>
  <li>Calculated using square root of sum of squares of differences in each dimension.</li>
  <li>In 2D space, Euclidean distance between points (x1, y1) and (x2, y2) is sqrt((x2-x1)² + (y2-y1)²).</li>
  <li>In 3D space, it extends to sqrt((x2-x1)² + (y2-y1)² + (z2-z1)²).</li>
  <li>For higher dimensions, it’s sqrt(Σ(xi-yi)²), summing over all dimensions.</li>
</ul>

<h3 id="define-different-distance-measures">Define Different Distance Measures</h3>
<ul>
  <li><u>Euclidean Distance</u>: Straight line distance between two points.</li>
  <li><u>Manhattan Distance</u>: Distance between points along orthogonal axes (grid-based).</li>
  <li><u>Chebyshev Distance</u>: Maximum absolute difference along any single dimension.</li>
  <li><u>Minkowski Distance</u>: Generalized metric distance measure with power p.</li>
  <li><u>Hamming Distance</u>: Minimum substitutions to change one string into another.</li>
  <li><u>Mahalanobis Distance</u>: Distance between a point and a distribution.</li>
</ul>
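<p>Most of these measures can be computed in a line each; a small sketch for two 3-D points (plus two strings for Hamming distance, using a classic example pair):</p>

```python
import math

p, q = (1, 2, 3), (4, 6, 3)

euclidean = math.dist(p, q)                           # sqrt(9 + 16 + 0) = 5.0
manhattan = sum(abs(a - b) for a, b in zip(p, q))     # 3 + 4 + 0 = 7
chebyshev = max(abs(a - b) for a, b in zip(p, q))     # max(3, 4, 0) = 4
# Minkowski with power r = 3 (r = 1 gives Manhattan, r = 2 gives Euclidean).
minkowski = sum(abs(a - b) ** 3 for a, b in zip(p, q)) ** (1 / 3)

# Hamming distance: positions at which two equal-length strings differ.
hamming = sum(c1 != c2 for c1, c2 in zip("karolin", "kathrin"))  # 3

print(euclidean, manhattan, chebyshev, round(minkowski, 3), hamming)
```

Mahalanobis distance is omitted here because it needs a covariance matrix estimated from a whole distribution, not just two points.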

<h3 id="data-similarity-and-dissimilarity">Data Similarity and Dissimilarity</h3>
<ul>
  <li>Data similarity measures the likeness between two data objects.</li>
  <li>It is subjective and defined based on the context.</li>
  <li>Typically represented as a distance, with smaller distances indicating higher similarity.</li>
  <li>Dissimilarity refers to the unlikeness or differences between data objects.</li>
</ul>

<h3 id="define-data-visualization">Define Data Visualization</h3>
<ul>
  <li>Graphical representation of information and data.</li>
  <li>Utilizes various visual tools like charts, graphs, and infographics.</li>
  <li>Facilitates clear and efficient communication of information.</li>
  <li>Enables quick analysis and exploration of large data sets for decision-making.</li>
</ul>

<h3 id="data-discretization">Data Discretization</h3>
<ul>
  <li>Converts continuous data to discrete form.</li>
  <li>Improves data understandability and interpretability.</li>
  <li>Enhances machine learning model performance.</li>
  <li>Methods include binning, histogram analysis, decision trees, and clustering.</li>
</ul>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="mining" /><category term="data mining" /><category term="data warehouse" /></entry><entry><title type="html">Semester 3 - Data Mining (Week 8-10)</title><link href="https://imabhay.com/posts/bca/s3-mining-w8-w10/" rel="alternate" type="text/html" title="Semester 3 - Data Mining (Week 8-10)" /><published>2023-05-26T00:00:00+00:00</published><updated>2023-05-26T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-mining-w8-w10</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-mining-w8-w10/"><![CDATA[<h3 id="what-is-frequent-patterns">What is Frequent Patterns?</h3>

<ul>
  <li>Patterns that appear frequently in a dataset.</li>
  <li>Sets of items that commonly occur together in transactions.</li>
  <li>Example: {milk, bread} is a frequent itemset if customers often buy them together.</li>
</ul>

<h3 id="define-frequent-itemset-mining">Define Frequent Itemset Mining</h3>

<ul>
  <li>Identifies and lists itemsets that meet a minimum support threshold.</li>
  <li>Important for tasks like association rule mining and correlation mining.</li>
  <li>Aims to find relationships between items in large datasets.</li>
</ul>

<h3 id="association-rule">Association Rule</h3>

<ul>
  <li>Association rules find interesting relations between variables in large databases.</li>
  <li>They identify strong rules based on measures of interestingness.</li>
  <li>Example: “If a customer buys a coffee, they are 80% likely to also purchase a muffin.”</li>
</ul>
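<p>The sketch below shows how the support and confidence behind such rules can be computed from raw transactions. It is a minimal illustration over an invented toy basket data set, not a library implementation:</p>

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SupportConfidence {
    // A toy transaction database: each set is one customer's basket (invented data).
    static final List<Set<String>> SAMPLE = List.of(
        Set.of("coffee", "muffin"),
        Set.of("coffee", "muffin", "juice"),
        Set.of("coffee"),
        Set.of("muffin"),
        Set.of("coffee", "muffin"));

    // Number of transactions containing every item of the itemset.
    static long supportCount(List<Set<String>> tx, Set<String> itemset) {
        return tx.stream().filter(t -> t.containsAll(itemset)).count();
    }

    // support(X) = supportCount(X) / |DB|.
    static double support(List<Set<String>> tx, Set<String> itemset) {
        return (double) supportCount(tx, itemset) / tx.size();
    }

    // confidence(A -> B) = supportCount(A ∪ B) / supportCount(A).
    static double confidence(List<Set<String>> tx, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return (double) supportCount(tx, both) / supportCount(tx, a);
    }

    public static void main(String[] args) {
        // 4 of 5 baskets contain coffee; 3 of those 4 also contain a muffin.
        System.out.println(support(SAMPLE, Set.of("coffee")));                      // 0.8
        System.out.println(confidence(SAMPLE, Set.of("coffee"), Set.of("muffin"))); // 0.75
    }
}
```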

<h3 id="market-basket-analysis">Market Basket Analysis</h3>

<ul>
  <li>Modelling technique that explores item associations in shopping baskets.</li>
  <li>Used by retailers to uncover item relationships.</li>
  <li>Analyzes frequent co-occurrences of items in transactions.</li>
  <li>Example: Customers buying computers often purchase antivirus software, suggesting strategic product placement.</li>
</ul>

<h3 id="name-applications-of-frequent-pattern-mining">Name Applications of Frequent Pattern Mining</h3>

<ul>
  <li><u>Data Preprocessing and Noise Filtering</u>:
    <ul>
      <li>Helps filter out noise and clean data.</li>
      <li>Distinguishes frequent item co-occurrences from random noise.</li>
    </ul>
  </li>
  <li><u>Discovery of Inherent Structures and Clusters</u>:
    <ul>
      <li>Reveals hidden structures and clusters in data.</li>
<li>Example: identifies coauthor and conference clusters in the DBLP data set.</li>
    </ul>
  </li>
  <li><u>Pattern-based Classification</u>:
    <ul>
      <li>Constructs reliable classification models using frequent patterns.</li>
      <li>Provides more information gain compared to infrequent patterns.</li>
    </ul>
  </li>
  <li><u>Subspace Clustering in High-Dimensional Space</u>:
    <ul>
      <li>Supports effective clustering in high-dimensional space.</li>
      <li>Overcomes challenges of measuring distances in such spaces.</li>
    </ul>
  </li>
  <li><u>Microarray Data Analysis</u>:
    <ul>
      <li>Analyzes noisy microarray data with thousands of dimensions.</li>
      <li>Helps differentiate noise from meaningful patterns.</li>
    </ul>
  </li>
  <li><u>Semantic Annotation of Frequent Patterns</u>:
    <ul>
      <li>Enhances understanding of frequent patterns with annotations.</li>
      <li>Provides additional context and meaning to the patterns.</li>
    </ul>
  </li>
</ul>

<h3 id="applications-of-market-basket-analysis">Applications of Market Basket Analysis</h3>

<ul>
  <li><u>Marketing and Advertising Strategies</u>:
    <ul>
      <li>Tailoring marketing efforts based on frequently bought together products.</li>
      <li>Optimizing advertising strategies using market basket analysis insights.</li>
    </ul>
  </li>
  <li><u>Store Layout Design</u>:
    <ul>
      <li>Placing frequently purchased items together in store layout.</li>
      <li>Utilizing strategic placement to maximize sales and customer experience.</li>
    </ul>
  </li>
  <li><u>Sales Planning</u>:
    <ul>
      <li>Identifying items to put on sale based on frequent associations.</li>
      <li>Promoting complementary items to boost sales.</li>
    </ul>
  </li>
  <li><u>Catalog Design</u>:
    <ul>
      <li>Grouping frequently bought together items in catalogs.</li>
      <li>Enhancing catalog layout for increased sales potential.</li>
    </ul>
  </li>
  <li><u>Cross-Marketing</u>:
    <ul>
      <li>Utilizing correlations to make effective cross-marketing decisions.</li>
      <li>Understanding product associations to drive sales across different categories.</li>
    </ul>
  </li>
  <li><u>Customer Shopping Behavior Analysis</u>:
    <ul>
      <li>Analyzing buying habits and preferences of customers.</li>
      <li>Improving shopping experience and increasing sales based on insights.</li>
    </ul>
  </li>
</ul>

<h3 id="frequent-itemset-mining-methods">Frequent Itemset Mining Methods</h3>

<ul>
  <li><strong>Apriori Algorithm</strong>:
    <ul>
      <li>Mines frequent itemsets for Boolean association rules.</li>
      <li>Uses the Apriori property and iterative approach.</li>
      <li>Forms candidate itemsets based on frequent (k-1)-itemsets.</li>
      <li>Scans the database to find complete sets of frequent itemsets.</li>
      <li>Variations include hashing, transaction reduction, partitioning, and sampling for efficiency improvement.</li>
    </ul>
  </li>
  <li><strong>Frequent Pattern Growth (FP-growth)</strong>:
    <ul>
      <li>Mines frequent itemsets without candidate generation.</li>
      <li>Constructs a compact FP-tree data structure.</li>
      <li>Compresses the transaction database.</li>
      <li>Focuses on frequent pattern growth rather than generate-and-test.</li>
      <li>Offers improved efficiency compared to Apriori-like methods.</li>
    </ul>
  </li>
</ul>
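<p>The level-wise idea behind Apriori can be sketched as follows. This is a simplified illustration on invented data (the candidate subset-pruning refinement is omitted), not the full algorithm:</p>

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Apriori {
    // Returns all itemsets whose support count meets minSupport, level by level.
    static Set<Set<String>> frequentItemsets(List<Set<String>> tx, int minSupport) {
        Set<Set<String>> result = new HashSet<>();
        // Level 1: count single items with one scan of the database.
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> t : tx)
            for (String item : t)
                counts.merge(Set.of(item), 1, Integer::sum);
        Set<Set<String>> current = prune(counts, minSupport);
        while (!current.isEmpty()) {
            result.addAll(current);
            // Candidate generation: join frequent k-itemsets into (k+1)-itemsets.
            Set<Set<String>> candidates = new HashSet<>();
            for (Set<String> a : current)
                for (Set<String> b : current) {
                    Set<String> joined = new HashSet<>(a);
                    joined.addAll(b);
                    if (joined.size() == a.size() + 1) candidates.add(joined);
                }
            // Count candidates with another database scan, then keep frequent ones.
            counts = new HashMap<>();
            for (Set<String> t : tx)
                for (Set<String> c : candidates)
                    if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);
            current = prune(counts, minSupport);
        }
        return result;
    }

    // Apriori property in action: only itemsets meeting the threshold survive.
    static Set<Set<String>> prune(Map<Set<String>, Integer> counts, int minSupport) {
        Set<Set<String>> kept = new HashSet<>();
        counts.forEach((itemset, n) -> { if (n >= minSupport) kept.add(itemset); });
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> tx = List.of(
            Set.of("milk", "bread"), Set.of("milk", "bread", "eggs"),
            Set.of("bread"), Set.of("milk", "bread"));
        System.out.println(frequentItemsets(tx, 3)); // {milk}, {bread}, {milk, bread}
    }
}
```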

<h3 id="mining-association-rules">Mining Association Rules</h3>

<p>Mining association rules involves several steps:</p>

<ul>
  <li><strong>Finding Frequent Itemsets</strong>:
    <ul>
      <li>Identify itemsets (e.g., A and B) that meet a minimum support threshold.</li>
      <li>Support threshold ensures sufficient occurrence in the data.</li>
    </ul>
  </li>
  <li><strong>Generating Strong Association Rules</strong>:
    <ul>
      <li>Generate association rules (A -&gt; B) from frequent itemsets.</li>
      <li>Rules meet a minimum confidence threshold.</li>
      <li>Confidence indicates the probability of B given A.</li>
    </ul>
  </li>
  <li><strong>Uncovering Correlation Rules</strong>:
    <ul>
      <li>Analyze associations to uncover correlation rules.</li>
      <li>Reveal statistical correlations between itemsets A and B.</li>
    </ul>
  </li>
  <li><strong>Classifying Algorithms</strong>:
    <ul>
      <li>Efficient and scalable algorithms for mining frequent itemsets.</li>
      <li>Three categories: Apriori-like, FP-growth, and vertical data format.</li>
    </ul>
  </li>
  <li><strong>Evaluating Patterns</strong>:
    <ul>
      <li>Augment support-confidence framework with pattern evaluation.</li>
      <li>Evaluate interestingness of association rules.</li>
      <li>Consider null-invariant measures (unaffected by null-transactions).</li>
<li>Common measures: lift, χ<sup>2</sup>, all confidence, max confidence, Kulczynski, cosine.</li>
    </ul>
  </li>
</ul>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="mining" /><category term="data mining" /><category term="data warehouse" /></entry><entry><title type="html">Semester 3 - Data Structure &amp;amp; Algorithm</title><link href="https://imabhay.com/posts/bca/s3-dsa-imp2/" rel="alternate" type="text/html" title="Semester 3 - Data Structure &amp;amp; Algorithm" /><published>2023-05-19T00:00:00+00:00</published><updated>2023-05-19T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-dsa-imp2</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-dsa-imp2/"><![CDATA[<h2 id="trees">Trees</h2>
<ul>
  <li>Trees are hierarchical structures with a root node and connected nodes without cycles.</li>
  <li>Each node has a parent (except root) and children.</li>
  <li>Leaf nodes have no children.</li>
  <li>Depth: number of edges from root to node</li>
  <li>Height: number of edges from node to deepest leaf.</li>
  <li>Trees have various applications:
    <ul>
      <li>File system in OS</li>
      <li>B-Tree, B+-Tree for indexing in databases</li>
      <li>Syntax Tree in compilers</li>
      <li>Document Object Model (DOM)</li>
    </ul>
  </li>
</ul>

<h3 id="binary-trees">Binary Trees</h3>
<ul>
  <li>Binary trees are trees where each node has up to two children: left and right.</li>
  <li>Types include:
    <ul>
      <li>Full: Nodes have 0 or 2 children.</li>
<li>Perfect: All internal nodes have two children and all leaves are at the same level.</li>
      <li>Complete: All levels are fully filled except possibly the last.</li>
      <li>Balanced: Left and right subtrees’ heights differ by at most one.</li>
      <li>Degenerate: Each node has only one child, similar to a linked list.</li>
    </ul>
  </li>
</ul>

<p><img src="https://upload.wikimedia.org/wikipedia/commons/f/f6/Sorted_binary_tree.png" alt="image-center" class="align-center" width="400" /></p>

<p style="color:gray; font-size: 80%; text-align: center;">Binary Tree</p>

<h3 id="binary-search-trees-bst">Binary Search Trees (BST)</h3>
<ul>
<li>BST is a binary tree where every key in a node’s left subtree is less than the node’s key, and every key in its right subtree is greater.</li>
  <li>Operations:
    <ul>
      <li>Search: Traverse from root to left/right subtree based on comparison until value is found or subtree is null.</li>
      <li>Insertion: Like search, but create a new node at null subtree.</li>
      <li>Deletion: Remove node and maintain BST property. Consider cases: no child, one child, two children.</li>
    </ul>
  </li>
</ul>
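<p>The search and insertion operations above can be sketched as follows (a minimal illustration; deletion is omitted for brevity):</p>

```java
public class BST {
    static class Node {
        int key; Node left, right;
        Node(int key) { this.key = key; }
    }

    // Insert: descend left for smaller keys, right for larger, attach at a null link.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root; // duplicates are ignored
    }

    // Search: compare at each node and descend until found or the subtree is null.
    static boolean search(Node root, int key) {
        if (root == null) return false;
        if (key == root.key) return true;
        return key < root.key ? search(root.left, key) : search(root.right, key);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[]{8, 3, 10, 1, 6}) root = insert(root, k);
        System.out.println(search(root, 6)); // true
        System.out.println(search(root, 7)); // false
    }
}
```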

<p><img src="https://upload.wikimedia.org/wikipedia/commons/8/83/Binary-search-tree-insertion-animation.gif" alt="image-center" class="align-center" width="400" /></p>

<p style="color:gray; font-size: 80%; text-align: center;">BST - Insert Operation</p>

<h3 id="avl-trees">AVL Trees</h3>
<ul>
  <li>Named after inventors Adelson-Velsky and Landis</li>
  <li>Self-balancing binary search trees.</li>
  <li>Heights of two child subtrees of any node differ by at most one.</li>
  <li>Rebalancing is done through rotations if heights differ by more than one.</li>
  <li>Offer faster retrievals but slower insertions and deletions compared to some other trees.</li>
  <li>Used in applications where fast retrievals are crucial.</li>
</ul>

<p><strong>Balance Factor</strong></p>

<ul>
<li>Balance factor = height of left sub-tree - height of right sub-tree.</li>
  <li>Balance factor 1: left sub-tree is one level higher.</li>
  <li>Balance factor 0: both sub-trees are of equal height.</li>
  <li>Balance factor -1: right sub-tree is one level higher.</li>
  <li>An AVL tree has balance factors within the range -1 to +1.</li>
</ul>
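<p>A minimal sketch of computing heights and balance factors, assuming a bare tree node with only child links (heights are counted in edges, so an empty subtree has height -1):</p>

```java
public class BalanceFactor {
    static class Node {
        Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Height in edges; an empty subtree contributes -1 so a leaf has height 0.
    static int height(Node n) {
        if (n == null) return -1;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // Balance factor = height of left subtree - height of right subtree.
    static int balanceFactor(Node n) {
        return height(n.left) - height(n.right);
    }

    public static void main(String[] args) {
        // Root with a two-level left subtree and a single leaf on the right.
        Node root = new Node(new Node(new Node(null, null), null), new Node(null, null));
        System.out.println(balanceFactor(root)); // 1, within the AVL range [-1, +1]
    }
}
```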

<p><strong>AVL Trees Rotations</strong></p>
<ul>
  <li>AVL Tree Rotations: Performed to maintain balance during insertions/deletions.</li>
  <li>Four types: Left-Left, Right-Right, Left-Right, Right-Left.</li>
  <li>Left-Left (LL): Single right rotation.</li>
  <li>Right-Right (RR): Single left rotation.</li>
  <li>Left-Right (LR): Double rotation; first left, then right.</li>
  <li>Right-Left (RL): Double rotation; first right, then left.</li>
  <li>Rotations restore balance without affecting order properties.</li>
</ul>

<p><img src="https://upload.wikimedia.org/wikipedia/commons/f/fd/AVL_Tree_Example.gif" alt="image-center" class="align-center" width="400" /></p>

<p style="color:gray; font-size: 80%; text-align: center;">AVL Trees - Rotations Animation</p>

<p style="color:gray; font-size: 80%; text-align: center;"><img src="https://upload.wikimedia.org/wikipedia/commons/c/c4/Tree_Rebalancing.gif" alt="image-center" class="align-center" width="1000" />
AVL Trees - Rotations</p>

<h3 id="tree-traversal">Tree traversal</h3>
<ul>
  <li>Process of visiting each node in a tree once.</li>
  <li>Can be depth-first (In-order, Pre-order, Post-order) or breadth-first.</li>
  <li>There are three common ways to traverse a tree in depth-first order:
    <ul>
      <li>In-order (Left, Root, Right)</li>
      <li>Pre-order (Root, Left, Right)</li>
      <li>Post-order (Left, Right, Root)</li>
    </ul>
  </li>
</ul>
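<p>The three depth-first orders can be sketched as recursive routines (a minimal illustration, assuming a simple node type):</p>

```java
import java.util.ArrayList;
import java.util.List;

public class Traversal {
    static class Node {
        int key; Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static void inOrder(Node n, List<Integer> out) {   // Left, Root, Right
        if (n == null) return;
        inOrder(n.left, out); out.add(n.key); inOrder(n.right, out);
    }

    static void preOrder(Node n, List<Integer> out) {  // Root, Left, Right
        if (n == null) return;
        out.add(n.key); preOrder(n.left, out); preOrder(n.right, out);
    }

    static void postOrder(Node n, List<Integer> out) { // Left, Right, Root
        if (n == null) return;
        postOrder(n.left, out); postOrder(n.right, out); out.add(n.key);
    }

    public static void main(String[] args) {
        //      2
        //     / \
        //    1   3
        Node root = new Node(2, new Node(1, null, null), new Node(3, null, null));
        List<Integer> in = new ArrayList<>();
        inOrder(root, in);
        System.out.println(in); // [1, 2, 3] - sorted, since this is a BST
    }
}
```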

<p><img src="https://upload.wikimedia.org/wikipedia/commons/0/06/InorderTrav.gif" alt="image-center" class="align-center" width="300" /></p>

<p style="color:gray; font-size: 80%; text-align: center;">In-order Traversal</p>

<p><img src="https://upload.wikimedia.org/wikipedia/commons/1/19/PreOrderTrav.gif" alt="image-center" class="align-center" width="300" /></p>

<p style="color:gray; font-size: 80%; text-align: center;">Pre-order Traversal</p>

<p><img src="https://upload.wikimedia.org/wikipedia/commons/9/99/PostorderTrav.gif" alt="image-center" class="align-center" width="300" /></p>

<p style="color:gray; font-size: 80%; text-align: center;">Post-order Traversal</p>

<p><br /></p>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="dsa" /><category term="algorithm" /><category term="data structure" /><category term="java" /><summary type="html"><![CDATA[Collection of lecture notes, flashcards, tools, tutorials, and other references.]]></summary></entry><entry><title type="html">Semester 3 - DSA - Mock(Theory)</title><link href="https://imabhay.com/posts/bca/s3-dsa-mock-theory/" rel="alternate" type="text/html" title="Semester 3 - DSA - Mock(Theory)" /><published>2023-05-19T00:00:00+00:00</published><updated>2023-05-19T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-dsa-mock-theory</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-dsa-mock-theory/"><![CDATA[<p><br /></p>

<p><strong>Q1. The rate of growth or order of growth of a function f(n) means how fast the function’s value increases as the input size n increases. List and explain various notations used to express orders of growth.</strong></p>

<p><u>Answer:</u></p>

<ul>
  <li><strong>Big O Notation (O)</strong>: Describes the upper bound on the time complexity of an algorithm. Represents the maximum time taken for all input sizes.
    <ul>
      <li>O(1) is constant time complexity. (Best)</li>
      <li>O(log n) is logarithmic time complexity. (Good)</li>
      <li>O(n) is linear time complexity. (Fair)</li>
      <li>O(n log n) is linearithmic time complexity. (Not ideal)</li>
      <li>O(n^2) is quadratic time complexity. (Worse)</li>
      <li>O(n^3) is cubic time complexity. (Worse)</li>
      <li>O(2^n) is exponential time complexity. (Worst)</li>
    </ul>
  </li>
  <li>
    <p><strong>Big Omega Notation (Ω)</strong>: Provides the asymptotic lower bound, representing the minimum time taken by an algorithm for all input sizes.</p>
  </li>
  <li>
    <p><strong>Big Theta Notation (Θ)</strong>: Gives a tight bound on the time complexity, representing both the best-case and worst-case scenarios.</p>
  </li>
  <li>
    <p><strong>Little o Notation (o)</strong>: Provides an upper bound that is not tight, indicating that the actual time complexity is less than the one specified.</p>
  </li>
  <li><strong>Little Omega Notation (ω)</strong>: Provides a lower bound that is not tight, indicating that the actual time complexity is more than the one specified.</li>
</ul>

<p>These notations help analyze and compare the efficiency of algorithms based on their growth with input size.</p>

<p><strong>Q2. Recursive functions are executed in a Last In First Out-order. Name a suitable data structure for implementing recursive calls and mention any two operations of that data structure.</strong></p>

<p><u>Answer:</u></p>
<ul>
  <li>The suitable data structure for implementing recursive calls is the <strong>Stack</strong>.</li>
  <li>The two primary operations associated with a stack are:
    <ul>
      <li><strong>Push</strong>: This operation adds an element to the top of the stack.</li>
      <li><strong>Pop</strong>: This operation removes an element from the top of the stack.</li>
    </ul>
  </li>
</ul>

<p>Stacks follow a Last-In-First-Out (LIFO) principle, which aligns with the execution order of recursive functions. When a function calls itself recursively, the call information is “pushed” onto the stack, and when the function returns, the information is “popped” off. This ensures the correct order of execution for nested recursive calls.</p>
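<p>As an illustration, a recursive computation such as factorial can be rewritten with an explicit stack, mirroring how the call stack handles recursion (a minimal sketch, not how the JVM actually stores frames):</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeFactorial {
    // factorial(n) without recursion: push pending values, then pop in LIFO order.
    static long factorial(int n) {
        Deque<Integer> stack = new ArrayDeque<>();
        while (n > 1) {            // the "recursive calls": push each n on the way down
            stack.push(n);
            n--;
        }
        long result = 1;
        while (!stack.isEmpty()) { // the "returns": pop and combine in reverse order
            result *= stack.pop();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
    }
}
```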

<p><strong>Q3. Name any three algorithm design techniques</strong></p>

<p><u>Answer:</u></p>

<p><strong><u>Five common algorithm design techniques:</u></strong></p>

<ol>
  <li>
    <p><strong>Divide and Conquer</strong>: This technique involves breaking down a problem into smaller subproblems, solving each subproblem independently, and then combining the solutions to solve the original problem.</p>
  </li>
  <li>
    <p><strong>Dynamic Programming</strong>: This technique is used when the subproblems overlap. It involves solving each subproblem only once and then storing the results of each subproblem to avoid duplicate work.</p>
  </li>
  <li>
    <p><strong>Greedy Algorithms</strong>: This technique involves making the locally optimal choice at each stage with the hope that these local choices will lead to a global optimum.</p>
  </li>
  <li>
    <p><strong>Backtracking</strong>: This technique is used for solving problems where the solution requires the sequence of decisions. If the sequence of decisions made so far has not led to a solution, then the algorithm goes back and tries the next decision.</p>
  </li>
  <li>
    <p><strong>Brute Force</strong>: This technique involves trying all possible solutions until a satisfactory solution is found. It is simple to implement but may not be efficient for complex problems.</p>
  </li>
</ol>

<p><strong>Q4. Identify the correct way to declare a multidimensional array in Java.</strong></p>

<p><u>Answer:</u></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dataType[][] arrayName;
int[][] arr;
or
int arr[][];
</code></pre></div></div>
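<p>A small runnable illustration of declaring, allocating, and indexing a two-dimensional array (the class and method names here are invented for the example):</p>

```java
public class MultiDim {
    // Declare, allocate, and index a two-dimensional array.
    static int demo() {
        int[][] arr = new int[2][3]; // 2 rows, 3 columns, zero-initialized
        arr[1][2] = 7;               // row index first, then column index
        return arr[1][2];
    }

    public static void main(String[] args) {
        int arr2[][] = { {1, 2}, {3, 4, 5} }; // alternate syntax; rows may differ in length
        System.out.println(demo() + " " + arr2[1].length); // prints "7 3"
    }
}
```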

<p><strong>Q5. The Pre-order traversal in a tree can also be known as</strong></p>

<ul>
  <li>Depth first</li>
  <li>Breadth first</li>
  <li>Topological order</li>
  <li>Linear order</li>
</ul>

<p><u>Answer:</u></p>

<ul>
  <li>Depth first</li>
</ul>

<p>Pre-order traversal is a type of depth-first traversal. In a pre-order traversal of a tree, the process is as follows: visit the root node, traverse the left subtree, then traverse the right subtree.</p>

<p><strong>Q6. Which algorithm design techniques is applied in Merge sort?</strong></p>

<ul>
  <li>Divide and conquer</li>
<li>Greedy Approach</li>
  <li>Backtracking</li>
  <li>Dynamic Programming</li>
</ul>

<p><u>Answer:</u></p>
<ul>
  <li>Divide and conquer</li>
</ul>

<p>Merge sort is a classic example of the divide and conquer algorithm design technique. The algorithm recursively divides the array into two halves, sorts them, and then merges the sorted halves.</p>

<p><strong>Q7. What’s the most appropriate data structure to implement a priority queue?</strong></p>

<ul>
  <li>Heap</li>
  <li>Circular Array</li>
  <li>Linked List</li>
  <li>Binary Trees</li>
</ul>

<p><u>Answer:</u></p>

<ul>
  <li>Heap</li>
</ul>

<p>A heap is the most appropriate data structure to implement a priority queue. It allows for efficient insertion of new elements and removal of the element with the highest priority, both operations being O(log n) in complexity. This is more efficient than using a sorted array, unsorted array, or other data structures like linked lists or binary trees for this purpose.</p>

<p><strong>Q8. Which of the following data structure is linear type?</strong></p>

<ul>
  <li>String</li>
  <li>List</li>
  <li>Queue</li>
  <li>All of this</li>
</ul>

<p><u>Answer:</u></p>
<ul>
  <li>All of this</li>
<li>A String is a data type, but it can also be viewed as an array of characters, which is a linear structure</li>
</ul>

<p><br /></p>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="dsa" /><category term="algorithm" /><category term="data structure" /><category term="java" /></entry><entry><title type="html">Semester 3 - DSA - Sample Test</title><link href="https://imabhay.com/posts/bca/s3-dsa-sample-test-ans/" rel="alternate" type="text/html" title="Semester 3 - DSA - Sample Test" /><published>2023-05-19T00:00:00+00:00</published><updated>2023-05-19T00:00:00+00:00</updated><id>https://imabhay.com/posts/bca/s3-dsa-sample-test-ans</id><content type="html" xml:base="https://imabhay.com/posts/bca/s3-dsa-sample-test-ans/"><![CDATA[<p><br /></p>

<p>Q1. The postfix form of a given string S is ABC+-T*. Find the infix form of S.<br />
   A) (A-(B+C))*T<br />
   B) A+B-C*T<br />
   C) A*B-C+T<br />
   D) A*B+C-T<br />
   <code class="language-plaintext highlighter-rouge">Answer: A) (A-(B+C))*T</code></p>

<p><u>Explanation:</u></p>

<p><img src="/assets/images/bca/s3-dsa/s3-dsa-s1.jpg" alt="image-center" class="align-center" width="500" /></p>

<p>Q2. H is a graph with k vertices. H is connected and has exactly k-1 edges, then:<br />
   A) H is a tree<br />
   B) H contains no cycles<br />
   C) Every pair of vertices in H is connected by exactly one path<br />
   D) All of these  <br />
  <code class="language-plaintext highlighter-rouge">Answer: D) All of these</code></p>

<p><u>Explanation:</u> A connected graph with K vertices and exactly K-1 edges is a tree; every tree satisfies E = K-1, contains no cycles, and has exactly one path between any pair of vertices.</p>

<p>Q3. A queue initially has the configuration p, q, r, s, with ‘p’ at the front. How many deletions and additions are required to get the configuration s, r, q, p?<br />
   A) 4 deletions, 4 additions<br />
   B) 3 deletions, 3 additions<br />
   C) 5 deletions, 5 additions<br />
   D) 2 deletions, 2 additions <br />
   <code class="language-plaintext highlighter-rouge">Answer: B) 3 deletions, 3 additions</code></p>

<p><u>Explanation:</u> Delete p, q, r from the front (leaving s), then add r, q, p at the rear to obtain s, r, q, p.</p>

<p>Q4. The traversal technique which lists the nodes of a binary search tree in ascending order?<br />
   A) post-order<br />
   B) in-order<br />
   C) pre-order<br />
   D) linear order<br />
   <code class="language-plaintext highlighter-rouge">Answer: B) in-order</code></p>

<p><u>Explanation:</u></p>
<ul>
  <li>In-order (Left, Root, Right) —&gt; Ascending order (left is smaller than root, right is greater than root)</li>
  <li>Pre-order (Root, Left, Right)</li>
  <li>Post-order (Left, Right, Root)</li>
</ul>

<p>Q5. Given a binary tree whose in order and preorder traversal are given below:<br />
   Preorder: BCQIPDNHML<br />
   In order: QICPBNDMHL<br />
   The post order traversal of the above binary tree is:<br />
   A) QICPNHDMLB<br />
   B) QICPHNDMLB<br />
   C) QICPNHDMBL<br />
   D) IPQCNMLHDB  <br />
  <code class="language-plaintext highlighter-rouge">Answer: D) IPQCNMLHDB</code></p>

<p><u>Explanation:</u>  <a href="https://testbook.com/objective-questions/mcq-on-tree-traversal--5eea6a1139140f30f369eb98">See Question 2 Here</a></p>

<p>In our case, B is root, and graph would be</p>

<p><img src="/assets/images/bca/s3-dsa/s3-dsa-trav1.png" alt="image-center" class="align-center" width="500" /></p>

<p>Q6. The postfix expression AB+CD-* can be evaluated using a<br />
   A) Stack<br />
   B) Tree<br />
   C) Queue<br />
   D) Linked list  <br />
   <code class="language-plaintext highlighter-rouge">Answer: A) Stack</code></p>

<p><u>Explanation:</u></p>
<ul>
  <li>Push ‘A’, ‘B’ into stack</li>
  <li>Pop ‘A’, ‘B’ and perform ‘+’ operation</li>
  <li>Push ‘A+B’ into stack, Push ‘C’ &amp; ‘D’ into stack</li>
  <li>Pop ‘C’ &amp; ‘D’ and perform ‘-‘ operation</li>
  <li>Push ‘C-D’ into stack</li>
  <li>pop ‘A+B’ and ‘C-D’ from stack and perform ‘*’</li>
  <li>push ‘(A+B)*(C-D)’ into the stack</li>
</ul>
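<p>The steps above can be turned into a small evaluator. This sketch assumes single-digit operands (digits instead of the letters A-D) so the expression can be parsed character by character:</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    // Evaluates a postfix string of single-digit operands, e.g. "23+4*".
    static int eval(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (char c : postfix.toCharArray()) {
            if (Character.isDigit(c)) {
                stack.push(c - '0');     // operand: push onto the stack
            } else {
                int b = stack.pop();     // operator: pop two operands,
                int a = stack.pop();     // apply, and push the result back
                switch (c) {
                    case '+': stack.push(a + b); break;
                    case '-': stack.push(a - b); break;
                    case '*': stack.push(a * b); break;
                    case '/': stack.push(a / b); break;
                }
            }
        }
        return stack.pop();              // the final value left on the stack
    }

    public static void main(String[] args) {
        System.out.println(eval("23+4*")); // (2+3)*4 = 20
    }
}
```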

<p>Q7. A binary search tree is a binary tree:<br />
   A) All items in the left sub-tree are less than root<br />
   B) All items in the right sub-tree are greater than or equal to the root<br />
   C) Each sub-tree is itself a binary search tree<br />
   D) All of the above  <br />
   <code class="language-plaintext highlighter-rouge">Answer: D) All of the above</code></p>

<p><u>Explanation:</u> order is left side &lt; root &lt; right side</p>

<p>Q8. The In-order traversal of the tree will yield a sorted listing of elements of tree in<br />
   A) Binary tree<br />
   B) Binary search tree<br />
   C) Heaps<br />
   D) None of the above  <br />
   <code class="language-plaintext highlighter-rouge">Answer: B) Binary search tree</code></p>

<p><u>Explanation:</u> In a binary search tree, the left subtree of a node contains values smaller than the node, and the right subtree contains values greater than the node</p>

<p>Q9. How many binary trees can be formed with 5 nodes?<br />
   A) 22<br />
   B) 46<br />
   C) 120<br />
   D) 42  <br />
   <code class="language-plaintext highlighter-rouge">Answer: D) 42</code></p>

<p><u>Explanation:</u> Formula is <strong>(2n)! / ((n+1)! n!)</strong> (the n-th Catalan number).<br />
   The number of binary trees for 1 to 10 nodes is 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796</p>
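<p>Computing (2n)!/((n+1)! n!) via raw factorials overflows quickly, so a sketch can use the equivalent recurrence C(0) = 1, C(n+1) = C(n) * 2(2n+1)/(n+2), a standard Catalan-number identity:</p>

```java
public class Catalan {
    // n-th Catalan number = (2n)! / ((n+1)! * n!), computed iteratively.
    static long catalan(int n) {
        long c = 1; // C(0)
        for (int i = 0; i < n; i++) {
            c = c * 2 * (2 * i + 1) / (i + 2); // C(i+1) = C(i) * 2(2i+1) / (i+2)
        }
        return c;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) System.out.print(catalan(n) + " ");
        // 1 2 5 14 42 132 429 1430 4862 16796
    }
}
```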

<p>Q10. The following postfix expression is evaluated using a stack: 823^/52*+41*-. What are the top two elements of the stack after the first * is evaluated?<br />
    A) 8, 10<br />
    B) 4, 25<br />
    C) 2, 25<br />
    D) 1, 10 <br />
    <code class="language-plaintext highlighter-rouge">Answer: D) 1, 10</code><br />
    <u>Explanation:</u></p>

<p>Step 1: Push 8, 2, and 3 onto the stack.<br />
Stack: [8, 2, 3]</p>

<p>Step 2: Pop 3 and 2 from the stack, evaluate 2^3 = 8, and push the result (8) onto the stack.<br />
Stack: [8, 8]</p>

<p>Step 3: Pop 8 and 8 from the stack, evaluate 8 / 8 = 1, and push the result (1) onto the stack.<br />
Stack: [1]</p>

<p>Step 4: Encounter ‘5’, push it onto the stack.<br />
Stack: [1, 5]</p>

<p>Step 5: Encounter ‘2’, push it onto the stack.<br />
Stack: [1, 5, 2]</p>

<p>Step 6: Pop 2 and 5 from the stack, evaluate 5 * 2 = 10, and push the result (10) onto the stack.<br />
Stack: [1, 10]</p>

<p>Q11. For which operations is the worst-case time complexity of an AVL tree better than that of a binary search tree?<br />
    A) Search, Delete and Insert Operations<br />
    B) Search and Delete Operations<br />
    C) Insert and Delete Operations<br />
    D) Search, and Insert Operations  <br />
    <code class="language-plaintext highlighter-rouge">Answer: A) Search, Delete and Insert Operations</code></p>

<p>Q12. Suppose an empty stack performs operations in the given order: push(1), push(2), Pop, push(3), push(4), push(5), Pop. What is the top of the stack?<br />
    A) 1<br />
    B) 2<br />
    C) 3<br />
    D) 4  <br />
    <code class="language-plaintext highlighter-rouge">Answer: D) 4</code></p>

<p>Q13. What is the minimum number of nodes in a binary tree of depth k (root is at level 0).<br />
    A) 2<sup>k</sup> – 1<br />
    B) 2<sup>k+1</sup> – 1<br />
    C) k + 1<br />
    D) k  <br />
    <code class="language-plaintext highlighter-rouge">Answer: C) k + 1</code><br />
    <u>Explanation:</u> With one node at each level, the minimum number of nodes in a tree of depth k is <strong>k + 1</strong>,
     and the maximum is 1 + 2 + 4 + … + 2<sup>k</sup> = <strong>2<sup>k+1</sup> - 1</strong></p>

<p>Q14. Which is the most efficient data structure to insert or delete a number in a stored set of numbers?<br />
    A) Binary tree<br />
    B) Linked list<br />
    C) Doubly linked list<br />
    D) Queue  <br />
    <code class="language-plaintext highlighter-rouge">Answer: C) Doubly linked list</code></p>

<p>Q15. Suppose a queue is implemented with a linked list, keeping track of a front pointer and a rear pointer. Which of these pointers will change during an insertion into a non-empty queue?<br />
    A) Only rear pointer changes<br />
    B) Only front pointer changes<br />
    C) Neither of the pointers change<br />
    D) Both of the pointers changes  <br />
    <code class="language-plaintext highlighter-rouge">Answer: A) Only rear pointer changes</code></p>

<p>Q16. In linked list implementation of a queue, front and rear pointers are tracked. Which of these pointers will change during an insertion into an EMPTY queue?<br />
    a) Only front pointer<br />
    b) Only rear pointer<br />
    c) Both front and rear pointers<br />
    d) No pointer will be changed  <br />
    <code class="language-plaintext highlighter-rouge">Answer: c) Both front and rear pointers</code></p>

<p>Q17. What is the maximum number of parentheses that will appear on the stack at any one time for a parenthesis expression given by ( () (()) (()) ) ))<br />
    A) 2<br />
    B) 3<br />
    C) 4<br />
    D) 5  <br />
    <code class="language-plaintext highlighter-rouge">Answer: B) 3</code><br />
    <u>Explanation:</u></p>

<ul>
  <li>Push “(“ onto the stack: [ ( ]</li>
  <li>Push “(“ onto the stack: [ (, ( ]</li>
  <li>Match “)” with the top of the stack: [ ( ]</li>
  <li>Push “(“ onto the stack: [ (, ( ]</li>
  <li>Push “(“ onto the stack: [ (, (, ( ]</li>
  <li>Match “)” with the top of the stack: [ (, ( ]</li>
  <li>Match “)” with the top of the stack: [ ( ]</li>
  <li>Match “)” with the top of the stack: [ ]</li>
</ul>

<p>At any point during the evaluation, the maximum number of parentheses on the stack is 3.</p>
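<p>A sketch of this counting idea: since only open parentheses ever sit on the stack, tracking the current depth is equivalent to tracking the stack size (shown here on the balanced portion of the expression):</p>

```java
public class ParenDepth {
    // Tracks the stack size while scanning; the maximum is the deepest nesting reached.
    static int maxDepth(String s) {
        int depth = 0, max = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') { depth++; if (depth > max) max = depth; } // push
            else if (c == ')') depth--;                              // pop
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxDepth("(()(())(()))")); // 3
    }
}
```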

<p><br /></p>]]></content><author><name>Abhay Singh</name></author><category term="[&quot;bca&quot;]" /><category term="bca" /><category term="dsa" /><category term="algorithm" /><category term="data structure" /><category term="java" /></entry></feed>