
300+ TOP DATA MINING Multiple Choice Questions and Answers

DATA MINING Multiple Choice Questions:

1. The problem of finding hidden structure in unlabeled data is called
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
Ans: B

2. The task of inferring a model from labeled training data is called
A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
Ans: B

3. A telecommunication company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of
A. Supervised learning
B. Data extraction
C. Serration
D. Unsupervised learning
Ans: D
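
Customer segmentation of this kind is often approached with a clustering method. The following is a minimal k-means sketch on a single numeric feature, assuming made-up usage figures and hand-picked starting centers; the function name kmeans_1d and the data are illustrative only and are not taken from the question.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for one numeric feature; `centers` holds the initial guesses."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical monthly call minutes for six customers.
monthly_minutes = [30, 45, 50, 400, 420, 450]
centers, groups = kmeans_1d(monthly_minutes, centers=[0, 500])
print(groups)  # light users and heavy users end up in separate groups
```

No labels are supplied anywhere in the sketch, which is what makes the task unsupervised.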

4. Self-organizing maps are an example of
A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
D. Missing data imputation
Ans: A

5. You are given data about seismic activity in Japan and want to predict the magnitude of the next earthquake. This is an example of
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
Ans: A

6. Assume you want to perform supervised learning to predict the number of newborns from the size of the stork population (http://www.brixtonhealth.com/storksBabies.pdf). This is an example of
A. Classification
B. Regression
C. Clustering
D. Structural equation modeling
Ans: B
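
Predicting a numeric outcome from a numeric feature can be sketched with ordinary least squares. The example below assumes invented stork and birth counts (not the figures from the linked paper), and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: stork pairs (feature) and births in thousands (outcome).
storks = [100, 200, 300, 400]
births = [12, 22, 32, 42]

slope, intercept = fit_line(storks, births)
predicted = slope * 500 + intercept  # prediction for an unseen stork count
print(slope, intercept, predicted)
```

Because the predicted value is a number rather than a category, the task is regression, not classification.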

7. Discriminating between spam and ham e-mails is a classification task, true or false?
A. True
B. False
Ans: A

8. In the example of predicting the number of babies based on the size of the stork population, the number of babies is
A. outcome
B. feature
C. attribute
D. observation
Ans: A

9. It may be better to avoid the ROC curve as a metric because it can suffer from the accuracy paradox.
A. True
B. False
Ans: B
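
The accuracy paradox concerns plain accuracy on imbalanced data, not the ROC curve. A minimal sketch, assuming a made-up data set with 5 positives out of 100 and a model that never predicts the positive class; the helper names accuracy and roc_auc are illustrative, and the AUC is computed by direct pairwise comparison rather than with any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive is scored above a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100    # hard predictions of a useless model
constant_scores = [0.0] * 100  # the same model expressed as scores

acc = accuracy(y_true, always_negative)
auc = roc_auc(y_true, constant_scores)
print(acc, auc)
```

The useless model looks strong on accuracy yet scores no better than chance on AUC, which is why the statement in the question is false.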

10. Which of the following is not involved in data mining?
A. Knowledge extraction
B. Data archaeology
C. Data exploration
D. Data transformation
Ans: D

11. Which is the right approach to data mining?
A. Infrastructure, exploration, analysis, interpretation, exploitation
B. Infrastructure, exploration, analysis, exploitation, interpretation
C. Infrastructure, analysis, exploration, interpretation, exploitation
D. Infrastructure, analysis, exploration, exploitation, interpretation
Ans: A

12. Which of the following issues is considered before investing in data mining?
A. Functionality
B. Vendor consideration
C. Compatibility
D. All of the above
Ans: D

13. Adaptive system management is
A. It uses machine-learning techniques. Here, programs can learn from past experience and adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some value as output.
C. Science of making machines perform tasks that would require intelligence when performed by humans
D. none of these
Ans: A

14. A Bayesian classifier is
A. A class of learning algorithm that tries to find an optimum classification of a set of examples using probability theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Ans: A
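
One common member of this family is the naive Bayes classifier. The sketch below is a minimal word-count version with add-one smoothing, assuming a tiny invented spam/ham training set; the function names train and classify are hypothetical, and this is one possible formulation rather than a reference implementation.

```python
from collections import Counter, defaultdict
import math


def train(examples):
    """examples: list of (list_of_words, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(words, model):
    """Return the label with the highest posterior log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Log prior plus the sum of smoothed log likelihoods of each word.
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages.
training = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
model = train(training)
print(classify(["free", "money"], model))
print(classify(["lunch", "at", "noon"], model))
```

The "naive" part is the assumption that words are independent given the label, which lets the likelihood be a simple product over words.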

15. Algorithm is
A. It uses machine-learning techniques. Here, programs can learn from past experience and adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some value as output
C. Science of making machines perform tasks that would require intelligence when performed by humans
D. None of these
Ans: B

16. Bias is
A. A class of learning algorithm that tries to find an optimum classification of a set of examples using probability theory
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Ans: B

17. Background knowledge refers to
A. Additional acquaintance used by a learning algorithm to facilitate the learning process
B. A neural network that makes use of a hidden layer
C. It is a form of automatic learning.
D. None of these
Ans: A

18. Case-based learning is
A. A class of learning algorithm that tries to find an optimum classification of a set of examples using probability theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Ans: C

19. Classification is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these
Ans: A

20. Binary attributes are
A. Attributes that take only two values. In general, these values will be 0 and 1, and they can be coded as one bit
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
Ans: A

21. Classification accuracy is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these
Ans: B

22. A biotope is
A. This takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
Ans: B

23. Cluster is
A. Group of similar objects that differ significantly from other objects
B. Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
C. Symbolic representation of facts or ideas from which information can potentially be extracted
D. None of these
Ans: A

24. Black boxes are
A. This takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
Ans: C

25. A definition of a concept is ________ if it recognizes all the instances of that concept
A. Complete
B. Consistent
C. Constant
D. None of these
Ans: A

26. Data mining is
A. The actual discovery phase of a knowledge discovery process
B. The stage of selecting the right data for a KDD process
C. A subject-oriented integrated time variant non-volatile collection of data in support of management
D. None of these
Ans: A

27. A definition of a concept is ________ if it does not classify any negative examples as coming within the concept
A. Complete
B. Consistent
C. Constant
D. None of these
Ans: B
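
In the usual terminology, a definition is complete when it covers every positive example and consistent when it covers no negative example. A minimal sketch of both checks, assuming a concept definition is modelled as a Python predicate and using invented examples for the concept "even number":

```python
def is_complete(definition, positives):
    """True if the definition recognizes every positive example."""
    return all(definition(x) for x in positives)


def is_consistent(definition, negatives):
    """True if the definition accepts no negative example."""
    return not any(definition(x) for x in negatives)


# Hypothetical examples of the concept "even number".
positives = [2, 4, 6]
negatives = [1, 3, 9]

is_even = lambda n: n % 2 == 0         # complete and consistent
greater_than_one = lambda n: n > 1     # complete, but accepts 3 and 9
two_or_four = lambda n: n in (2, 4)    # consistent, but misses 6

for name, d in [("is_even", is_even),
                ("greater_than_one", greater_than_one),
                ("two_or_four", two_or_four)]:
    print(name, is_complete(d, positives), is_consistent(d, negatives))
```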

28. Data independence means
A. Data is defined separately and not included in programs
B. Programs are not dependent on the physical attributes of data.
C. Programs are not dependent on the logical attributes of data
D. Both (B) and (C).
Ans: D

29. Which symbol does the E-R model use to represent a weak entity set?
A. Dotted rectangle
B. Diamond
C. Doubly outlined rectangle
D. None of these
Ans: C

30. The SET concept is used in the
A. Network Model
B. Hierarchical Model
C. Relational Model
D. None of these
Ans: A

31. Relational Algebra is
A. Data Definition Language
B. Meta Language
C. Procedural query Language
D. None of the above
Ans: C

32. The key used to represent a relationship between tables is called a
A. Primary key
B. Secondary Key
C. Foreign Key
D. None of these
Ans: C

33. ________ produces the relation that has attributes of R1 and R2
A. Cartesian product
B. Difference
C. Intersection
D. Product
Ans: A
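
A Cartesian product pairs every tuple of one relation with every tuple of the other, so the result carries the attributes of both. A minimal sketch, assuming each relation is a list of dicts with disjoint attribute names; the sample relations and the helper name cartesian_product are made up for illustration.

```python
from itertools import product


def cartesian_product(r1, r2):
    """Combine every tuple of r1 with every tuple of r2.
    Attribute names are assumed not to clash between the two relations."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2)]


# Hypothetical relations with one attribute each.
r1 = [{"id": 1}, {"id": 2}]
r2 = [{"color": "red"}, {"color": "blue"}, {"color": "green"}]

result = cartesian_product(r1, r2)
print(len(result))  # number of tuples is len(r1) * len(r2)
print(result[0])
```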

34. Which of the following are the properties of entities?
A. Groups
B. Table
C. Attributes
D. Switchboards
Ans: C

35. In a relation
A. Ordering of rows is immaterial
B. No two rows are identical
C. (A) and (B) both are true
D. None of these
Ans: C
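
Both properties follow from treating a relation as a set of tuples. A small sketch using Python sets, with invented rows:

```python
# Hypothetical rows of a two-attribute relation, stored as a set of tuples.
rows_a = {(1, "Ann"), (2, "Bob")}

# The same rows listed in a different order, with one row repeated.
rows_b = {(2, "Bob"), (1, "Ann"), (1, "Ann")}

same_relation = rows_a == rows_b  # order does not matter for set equality
row_count = len(rows_b)           # the repeated row is stored only once
print(same_relation, row_count)
```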

36. Inductive logic programming is
A. A class of learning algorithms that try to derive a Prolog program from examples
B. A table with n independent attributes can be seen as an n-dimensional space
C. A prediction made using an extremely simple method, such as always predicting the same output
D. None of these
Ans: A

37. Machine learning is
A. An algorithm that can learn
B. A sub-discipline of computer science that deals with the design and implementation of learning algorithms
C. An approach that abstracts from the actual strategy of an individual algorithm and can therefore be applied to any other form of machine learning.
D. None of these
Ans: B

38. Projection pursuit is
A. The result of the application of a theory or a rule in a specific case
B. One of several possible keys within a database table that is chosen by the designer as the primary means of accessing the data in the table.
C. Discipline in statistics that studies ways to find the most interesting projections of multi-dimensional spaces
D. None of these
Ans: C

39. Node is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
Ans: A

40. Statistical significance is
A. The science of collecting, organizing, and applying numerical facts
B. Measure of the probability that a certain hypothesis is incorrect given certain observations.
C. One of the defining aspects of a data warehouse, which is specially built around all the existing applications of the operational data
D. None of these
Ans: B

41. Multi-dimensional knowledge is
A. A class of learning algorithms that try to derive a Prolog program from examples
B. A table with n independent attributes can be seen as an n-dimensional space
C. A prediction made using an extremely simple method, such as always predicting the same output.
D. None of these
Ans: B

42. Noise is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
Ans: B

43. Query tools are
A. A reference to the speed of an algorithm, which is quadratically dependent on the size of the data
B. Attributes of a database table that can take only numerical values.
C. Tools designed to query a database.
D. None of these
Ans: C

44. Operational database is
A. A measure of the desired maximal complexity of data mining algorithms
B. A database containing volatile data used for the daily operation of an organization
C. Relational database management system
D. None of these
Ans: B

45. Prediction is
A. The result of the application of a theory or a rule in a specific case
B. One of several possible keys within a database table that is chosen by the designer as the primary means of accessing the data in the table.
C. Discipline in statistics that studies ways to find the most interesting projections of multi-dimensional spaces.
D. None of these
Ans: A