DM QB

PESIT SOUTH CAMPUS

Question Bank

Faculty: D Annapurna

No of Hours: 52

Unit 1 and Unit 2

Questions

Marks

Why do many enterprises need a data warehouse?

4

What are OLTP and OLAP database systems?

4

What is an ODS and what is it used for?

4

Explain why ETL must deal with dirty data when extracting information from the source systems. List the major steps involved in the ETL process.

8

What is the need for a separate database for decision makers?

4

What is a data warehouse and how might it be defined?

4

What are the likely benefits of building an enterprise data warehouse?

6

6

What is the major difference between the star schema and the snowflake schema?

8

List some differences between an OLTP system and a data warehouse system.

7

Describe the features of a data warehouse.

6

What is an OLTP database system?

8

What is an ODS used for? How does it differ from an OLTP system?

7

Give the three most important guidelines for implementing a data warehouse for a large enterprise.

7

Give two major components of any data warehouse system.

8

What is ETL?

4

Give two reasons why dirty data is extracted from source systems.

7

List four steps of the ETL process.

8

Define the terms star schema and snowflake schema.

10

What types of queries do managers need to pose to the enterprise’s database systems?

8

Describe the type of metadata that is maintained in a data warehouse.

8

What are the major differences between OLTP and a data warehouse system?

10

Explain the star schema technique of modeling a data warehouse.

8

What types of metadata are maintained in a data warehouse?

8

What are dimensions, members, measures and fact tables?

7

What is OLAP?

4

List the characteristics of OLAP systems.

4

List some of the motivations for using OLAP.

6

Explain the multidimensional view and a data cube.

8

What are the different implementations of a data cube?

8

What are the differences between ROLAP and MOLAP?

10

Describe the operations roll-up, drill-down, slice and dice, and pivot.

10

List some guidelines for implementing OLAP.

8

What OLAP software is available in the market?

6

List four types of aggregate queries that are possible with two variables.

7

What are dimensions?

4

What is a measure?

4

What is a fact and what is a fact table?

6

Give a simple definition of OLAP.

7

List two major characteristics of OLAP.

5

Define data cube in your own words.

7

Show what a data cube of two dimensions looks like.

7

Give a simple data cube implementation.

8

Are all data cube entries non-zero? If not, why not?

8

What is the difference between roll-up and pivot?

10

B.E 7th Semester Information Science

What is the difference between drill-down and slicing?

10
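For the data cube questions in this unit, a minimal sketch of a two-dimensional cube with roll-up and slice may help when checking hand-worked answers. The fact records, dimension names and function names below are hypothetical and not part of the question bank.

```python
from collections import defaultdict

# Hypothetical fact records: (city, quarter, sales). All values are made up.
facts = [
    ("Bangalore", "Q1", 10),
    ("Bangalore", "Q2", 20),
    ("Mysore", "Q1", 5),
    ("Mysore", "Q2", 7),
]


def build_cube(records):
    """Build a two-dimensional cube: (city, quarter) -> total sales."""
    cube = defaultdict(int)
    for city, quarter, sales in records:
        cube[(city, quarter)] += sales
    return dict(cube)


def roll_up(cube, keep_axis):
    """Aggregate away the other dimension, keeping only keep_axis (0 or 1)."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[keep_axis]] += value
    return dict(totals)


def slice_cube(cube, axis, member):
    """Fix one dimension to a single member and return the matching cells."""
    return {key: value for key, value in cube.items() if key[axis] == member}


cube = build_cube(facts)
sales_by_city = roll_up(cube, 0)        # roll up over quarters
q1_slice = slice_cube(cube, 1, "Q1")    # slice on quarter = Q1
```

Drill-down is the reverse of roll-up: going from `sales_by_city` back to the per-quarter cells in `cube`.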

Unit 3:

Questions

Marks

What is data mining?

5

Mention the data mining functionalities: classification, prediction, clustering and evolution analysis.

What are the challenges in the methodology of data mining technology?

Discuss the issues to consider during data mining.

What defines a data mining task? Explain at least 5 primitives.

What is knowledge discovery?

Explain the motivating challenges in the development of data mining.

Explain the data mining tasks with examples.

What is data? What do you mean by quality of data?

What is a data set? Explain the various types of data sets.

What is data preprocessing? Explain the following, giving examples:
i. Aggregation
ii. Sampling
iii. Dimensionality reduction
iv. Feature subset selection
v. Feature creation
vi. Discretization and binarization
vii. Variable transformation

Explain the similarity and dissimilarity between 2 objects.

What is Euclidean distance?

Write the generalized Minkowski distance metric for various values of r.

Explain the properties of Euclidean distance.

What are the simple matching coefficient and the Jaccard coefficient? Explain with examples.

What is meant by cosine similarity? Explain with an example.

What is Bregman divergence?

What are the issues related to proximity measures?

Discuss the selection of the right proximity measure.

5 5 5 5 5 5 10 4 10 5 marks each

6 8 6 8 6 5 10 7
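For the proximity-measure questions in this unit, the standard textbook definitions can be sketched as follows. This is a minimal sketch for checking hand-worked answers; the function names are illustrative assumptions and the binary measures assume vectors of 0s and 1s.

```python
import math


def minkowski(x, y, r):
    """Minkowski distance; r=1 gives city-block, r=2 gives Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def smc_and_jaccard(x, y):
    """Simple matching coefficient and Jaccard coefficient for 0/1 vectors."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    f00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    smc = (f11 + f00) / (f11 + f00 + f10 + f01)
    # Jaccard ignores 0-0 matches; define it as 0 when no 1s are present.
    denom = f11 + f10 + f01
    jaccard = f11 / denom if denom else 0.0
    return smc, jaccard


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)
```

For instance, `minkowski((0, 0), (3, 4), 2)` is the Euclidean distance between the two points, and `minkowski((0, 0), (3, 4), 1)` is the city-block distance.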

Unit 4:

1. What is the Apriori algorithm? (5 marks)

2. Explain association rule mining. (5 marks)

3. What is a more efficient method for generalizing association rules? Explain. (5 marks)

4. Suppose that the following table is derived by attribute-oriented induction. (10 marks)

   Class        Birth_place   count
   Programmer   Canada        180
   Programmer   others        120
   DBA          Canada        20
   DBA          others        80

   a. Transform the table into a crosstab showing the associated t-weights and d-weights.
   b. Map the class Programmer into a (bidirectional) quantitative descriptive rule, for example, ∀X, Programmer(X) ⇔ (birth_place(X) = "Canada" ∧ …) [t: x%, d: y%] ∨ … ∨ (…) [t: w%, d: z%].

5. Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70. (10 marks)

   a. What is the mean of the data? What is the median?
   b. What is the mode of the data? Comment on the data's modality (i.e., bimodal, trimodal, etc.).
   c. What is the midrange of the data?
   d. Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
   e. Give the five-number summary of the data.
   f. Show a boxplot of the data.
   g. How is a quantile-quantile plot different from a quantile plot?

6. A database has four transactions. Let min_sup = 60% and min_conf = 80%. (10 marks)

   TID   date       items_bought
   100   10/15/99   {K, A, B, D}
   200   10/15/99   {D, A, C, E, B}
   300   10/19/99   {C, A, B, E}
   400   10/22/99   {B, A, D}

   a. Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the efficiency of the two mining processes.
   b. List all of the strong association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers and item_i denotes variables representing items (e.g., "A", "B", etc.): ∀X ∈ transactions, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]

7. Prove that each entry in the following table correctly characterizes its corresponding rule constraint for frequent itemset mining. (10 marks)

   Rule constraint       Antimonotone   Monotone      Succinct
   a. v ∈ S              No             Yes           Yes
   b. S ⊆ V              Yes            No            Yes
   c. min(S) ≤ v         No             Yes           Yes
   d. range(S) ≤ v       Yes            No            No
   e. variance(S) ≤ v    convertible    convertible   No
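For the four-transaction question in this unit, a level-wise search in the style of Apriori can be sketched as below. This is a minimal sketch for checking a hand-worked answer, not an optimized implementation; the function names are assumptions, and FP-growth is not shown.

```python
from itertools import combinations

# The four transactions from the question (TID 100 to 400).
transactions = [
    {"K", "A", "B", "D"},
    {"D", "A", "C", "E", "B"},
    {"C", "A", "B", "E"},
    {"B", "A", "D"},
]
MIN_SUP = 0.6  # min_sup = 60%


def support(itemset, db):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def apriori(db, min_sup):
    """Return {frequent itemset: support}, found level by level."""
    items = sorted(set().union(*db))
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), db) >= min_sup]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, db)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        known = set(current)
        candidates = [c for c in candidates
                      if all(frozenset(s) in known
                             for s in combinations(c, k - 1))]
        current = [c for c in candidates if support(c, db) >= min_sup]
    return frequent


frequent_itemsets = apriori(transactions, MIN_SUP)
```

With four transactions and min_sup = 60%, an itemset must appear in at least three transactions to be frequent, which is why the prune step discards any candidate containing C, E or K.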

Unit 5 and Unit 6

1. Define classification. Explain the purposes of using a classification model. (6 marks)

2. Explain the general approach for building a classification model. (10 marks)

3. What is a decision tree? How does a decision tree work? (10 marks)

4. Explain Hunt's algorithm for inducing decision trees. (10 marks)

5. What are the various methods for expressing attribute test conditions? Explain with examples. (12 marks)

6. Explain the measures that can be used to determine the best way to split the records. (12 marks)

7. Explain the decision tree induction algorithm. (10 marks)

8. What are the various characteristics of decision tree induction? (12 marks)

9. Explain the rule-based classifier with an example. (5 marks)

10. Explain how a rule-based classifier works with a suitable example. (6 marks)

11. Discuss the rule-based ordering scheme and the class-based ordering scheme. (10 marks)

12. Explain the direct methods of extracting classification rules. (8 marks)

13. Explain the indirect methods for rule extraction. (8 marks)

14. What are the characteristics of rule-based classifiers? (10 marks)

15. Explain the nearest-neighbor classifier. (6 marks)

16. Discuss the k-nearest neighbor classification algorithm. (8 marks)

17. Explain the characteristics of nearest-neighbor classifiers. (8 marks)

Unit 7

1. How do you compute dissimilarities in variables?

2. What is clustering? Briefly describe the following approaches to clustering: partitioning methods and model-based methods.

3. Why is outlier mining important?

4. Explain statistical-based, distance-based and deviation-based outlier detection.

5. Briefly outline how to compute the dissimilarity between objects described by the following types of variables:
   a. Asymmetric binary variables
   b. Nominal variables
   c. Ratio-scaled variables
   d. Numerical (interval-scaled) variables

6. Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28, standardize the variable by the following:
   a. Compute the mean absolute deviation for age.
   b. Compute the z-score for the first four measurements.

5 5 5 5
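For the age-standardization question in this unit, the arithmetic can be sketched as follows. This sketch assumes the z-score is taken relative to the mean absolute deviation, as the question's wording suggests; if the course defines the z-score with the standard deviation instead, swap that in. The function names are illustrative.

```python
# Age measurements from the question.
ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]


def mean(values):
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


def z_scores(values):
    """Standardize using the mean absolute deviation (assumed convention)."""
    m = mean(values)
    s = mean_absolute_deviation(values)
    return [(v - m) / s for v in values]


mad = mean_absolute_deviation(ages)
first_four = z_scores(ages)[:4]
```

Here the mean of the ten measurements is 33 and the absolute deviations sum to 88, so the mean absolute deviation is 8.8; the second measurement, 22, then standardizes to (22 - 33) / 8.8 = -1.25.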

Unit 8

1. What is spatial data mining? (5 marks)

2. What is multimedia data mining? (5 marks)

3. What is Web usage mining? (5 marks)

4. What are the differences between no coupling, loose coupling, semi-tight coupling and tight coupling? (5 marks)

5. What is the difference between row scalability and column scalability? (5 marks)

6. What is the difference between direct query answering and intelligent query answering? Explain with an example. (5 marks)

7. What are the trends in data mining? (5 marks)

8. Suppose that you are in the market to purchase a data mining system. (10 marks)
   a. Regarding the coupling of a data mining system with a database and/or data warehouse system, what are the differences between no coupling, loose coupling, semi-tight coupling and tight coupling?
   b. What is the difference between row scalability and column scalability?
   c. Which feature(s) from those listed above would you look for when scaling a data mining system?

9. General-purpose computers and domain-independent relational database systems have become a large market in the last several decades. However, many people feel that generic data mining systems will not prevail in the data mining market. What do you think? For data mining, should we focus our efforts on developing domain-independent data mining tools or on developing domain-specific data mining solutions? Present your reasoning. (10 marks)