Thursday, October 31, 2013
Tuesday, October 22, 2013
Saturday, October 19, 2013
Question papers
Please go through the following question papers for the external examinations.
Also go through the Apriori and decision tree induction algorithms.
Question Paper 1
Question Paper 2
Question Paper 3
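Since the note above asks you to go through the Apriori algorithm, here is a minimal Python sketch of its level-wise idea. It generates candidate k-itemsets from the frequent (k-1)-itemsets, prunes any candidate that has an infrequent subset, and then counts support with one scan of the transactions. The function name `apriori` and the use of an absolute support count are assumptions for illustration. The join step is simplified to a pairwise union, which yields the same candidates once the subset-pruning step is applied. This is not an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    min_support is an absolute count here (an assumption for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items to get the frequent 1-itemsets (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Join step (simplified): union pairs of frequent (k-1)-itemsets.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep a k-itemset only if all its (k-1)-subsets are frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One database scan counts the support of each candidate k-itemset.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, take the nine transactions {I1, I2, I5}, {I2, I4}, {I2, I3}, {I1, I2, I4}, {I1, I3}, {I2, I3}, {I1, I3}, {I1, I2, I3, I5}, and {I1, I2, I3}, with a minimum support count of 2. The sketch finds 13 frequent itemsets, and the only frequent 3-itemsets are {I1, I2, I3} and {I1, I2, I5}.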
Thursday, October 17, 2013
German Credit Data Set
1) Install WEKA.
2) Download the German credit data set, save the file in .arff format, and then perform the experiments as per your syllabus.
Data Mining Lab
Write the following index in your Data Mining Lab Record: DM Record index
Data Mining Lab Record (this is the sample record): Data Mining Lab Record
Saturday, October 5, 2013
Tuesday, September 24, 2013
Monday, September 23, 2013
Saturday, August 24, 2013
Data Warehousing & Data Mining bits
1. Which of the following is the most commonly available and richest information repository?
a. Temporal databases
b. Relational databases
c. Transactional databases
d. Spatial databases
2. Which of the following databases is used to store time-related data?
a. Spatial databases
b. Text databases
c. Multimedia databases
d. Temporal databases
3. From a DWH perspective, data mining can be viewed as an advanced stage of
a. On-Line Transaction Processing
b. On-Line Data Processing
c. On-Line Analytical Processing
d. On-Line Electronic Processing
4. A _ _ _ _ _ _ is a group of heterogeneous databases.
a. Time series databases
b. Object oriented databases
c. Legacy databases
d. Spatial databases
5. Spatial databases include
a. Legacy databases
b. Time series databases
c. Satellite image databases
d. Temporal databases
6. Many people treat data mining as a synonym for another popularly used term:
a. Knowledge discovery in databases
b. Knowledge inventory in databases
c. Knowledge acceptance in databases
d. Knowledge disposal in databases
7. A database is a collection of
a. Related data
b. Interrelated data
c. Irrelevant data
d. Distributed data
8. A Relational database is a collection of
a. tables
b. events
c. Attributes
d. Values
9. A _ _ _ _ _ _ _ is a repository of information collected from multiple sources, stored under a unified schema, and that usually resides at a single site.
a. Data mining
b. Database
c. Data warehouse
d. legacy databases
10. Which of the following databases is used to store image,
audio, and video data?
a. Heterogeneous databases
b. Temporal databases
c. Legacy databases
d. Multimedia databases
11. What is the single-dimensional association rule corresponding to the following rule in predicate notation?
contains(T, "computer") ⇒ contains(T, "software")
a. computer ⇒ software
b. software ⇒ computer
c. software ⇒ computer
d. computer ⇒ software
12. Which of the following analyses attempts to identify attributes that do not contribute to the classification or prediction process?
a. Cluster analysis
b. Outlier analysis
c. Relevance analysis
d. Evolution analysis
13. Which of the following is a summarization of the general
characteristics or features of a target class of data?
a. Data discrimination
b. Data characterization
c. Data compression
d. Meta data
14. _ _ _ _ _ _ _ is a comparison of the general features of
target class data objects with general features of objects from one or a set of
contrasting classes.
a. Data characterization
b. Data summarization
c. Data discrimination
d. Meta data
15. _ _ _ _ _ _ _ interestingness measures are based on user
beliefs in the data.
a. Objective
b. Descriptive
c. Collective
d. Subjective
16. _ _ _ _ _ _ mining tasks characterize the general properties
of the data in the databases.
a. Descriptive
b. Predictive
c. Metadata
d. Data
17. _ _ _ _ _ mining tasks perform inference on the current data
in order to make predictions.
a. Descriptive
b. Predictive
c. Data
d. Metadata
18. The derived model may be represented in the form of
a. ER model
b. Flow chart
c. Decision trees
d. DFD
19. Which of the following is the classification of data mining
systems?
a. Summarization
b. Visualization
c. Discrimination
d. Characterization
20. _ _ _ _ _ _ _ analysis describes and models regularities or
trends for objects whose behavior changes over time.
a. Data evolution
b. Cluster
c. Outlier
d. Summarization
21. Which of the following is an issue relating to the diversity of database types?
a. Handling noisy or incomplete data
b. Incorporation of background knowledge
c. Handling of relational and complex types of data
d. Efficiency and scalability of data mining algorithms
22. Which of the following is not a major issue in data mining?
a. Mining methodology and user interaction issues
b. Performance issues
c. Issues relating to the diversity of database types
d. Issues relating to the Measurement
23. Processing _ _ _ _ _ queries in operational databases would
substantially degrade the performance of operational tasks.
a. On-Line Transaction Processing
b. On-Line Electronic Processing
c. On-Line Data Processing
d. On-Line Analytical Processing
24. An _ _ _ _ _ _ system typically adopts either a star or snowflake model and a subject-oriented database design.
a. On-Line Transaction Processing
b. On-Line Electronic Processing
c. On-Line Analytical Processing
d. On-Line Data Processing
25. The access patterns of an _ _ _ _ system consist mainly of
short, atomic transactions.
a. On-Line Analytical Processing
b. On-Line Transaction Processing
c. On-Line Electronic Processing
d. On-Line Data Processing
26. Which of the following approaches requires complex information filtering and integration processes and competes for resources with processing at local sources?
a. Update-driven approach
b. Integrate-driven approach
c. Query-driven approach
d. Data-driven approach
27. Mining different kinds of knowledge in databases is an issue in
a. Performance issue
b. Mining methodology and user interaction issues
c. Diversity of database types issues
d. Time complexity
28. Pattern evolution is an issue related to
a. Mining methodology and user interaction issues
b. Performance issues
c. Issues relating to the diversity of database types
d. Issues relating to the measurement
29. A DWH is a subject-oriented, integrated, time-variant, and _ _ _ _ _ _ collection of data in support of management's decision-making process.
a. Nonvolatile
b. Volatile
c. Disintegrated
d. Object- oriented
30. An _ _ _ system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations.
a. On-Line Analytical Processing
b. On-Line Data Processing
c. On-Line Electronic Processing
d. On-Line Transaction Processing
31. The basic characteristic of On-line Analytical Processing is
a. Informational processing
b. Operational processing
c. Data processing
d. Data cleaning
32. Which of the following cuboids holds the highest level of summarization?
a. Cuboid
b. Base cuboid
c. Non-base cuboid
d. Apex cuboid
33. _ _ _ _ _ _ _ _ _ _ is a visualization operation that rotates
the data axes in view in order to provide an alternative presentation of the
data
a. Rollup
b. Drill down
c. Pivot
d. Slice & dice
34. _ _ _ _ _ _ tables can be specified by users or experts, or
automatically generated and adjusted based on data distributions.
a. Fact
b. Summarized
c. Dimension
d. Relational
35. _ _ _ _ _ _ _ executes queries involving more than one fact table.
a. Drill-through
b. Drill-across
c. Drill-down
d. Rotate
36. A _ _ _ _ _ allows data to be modeled and viewed in multiple
dimensions.
a. Meta data
b. Data cube
c. Database
d. Fact table
37. The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model are kept in _ _ _ _ form.
a. Standard
b. De-normalized
c. Normalized
d. Multi dimensional
38. Which of the following is not a category of measure based on the kind of aggregate function used?
a. Cumulative
b. Distributive
c. Algebraic
d. Holistic
39. A concept hierarchy that is a total or partial order among
attributes in database schema is called a _ _ _ _ _ _ _ _ _ _ _ hierarchy.
a. Set-grouping
b. Grouping
c. Decision
d. Schema
40. Which of the following focuses on socioeconomic applications?
a. Statistical database systems
b. Online Analytical Processing systems
c. Spatial database systems
d. Temporal database systems
41. A _ _ _ _ _ _ _ _ _ model consists of radial lines emanating
from a central point, where each line represents a concept hierarchy for a dimension
a. Cube net
b. Triangle net
c. Square net
d. Star net
42. Which of the following is constructed when the enterprise warehouse is the sole custodian of all warehouse data, which is then distributed to the various dependent data marts?
a. Enterprise DWH
b. Two- tier DWH
c. Multi-tier DWH
d. Virtual warehouse
43. Which of the following is a Multidimensional Online Analytical Processing (MOLAP) server?
a. Essbase
b. Database
c. Swiss base
d. Red Brick
44. The _ _ _ _ _ _ view includes fact tables and dimension tables.
a. DWH
b. Top-down
c. Data source
d. Business Query
45. Which of the following is a Hybrid OLAP server?
a. MS SQL server 1.0
b. MS SQL 5.0
c. MS SQL server 7.0
d. MS SQL server 3.0
46. ETL stands for
a. Evaluate, Transport and Link
b. Extract, Transform and Load
c. Error, Tracking and Load
d. Extract, Transient and Load
47. To architect the DWH, the major driving factor to support is
a. An inability to cope with requirements evolution
b. Not populating the warehouse
c. Day- to- day management of the warehouse
d. Supporting Online Transaction processing
48. A _ _ _ _ _ _ _ contains a subset of corporate-wide data that
is of value to a specific group of users.
a. Enterprise warehouse
b. Virtual warehouse
c. Data warehouse
d. Data mart
49. A _ _ _ _ _ _ _ is a set of views over operational databases
a. Enterprise warehouse
b. Virtual warehouse
c. Data warehouse
d. Data mart
50. What kind of intermediate servers stand between a relational back-end server and client front-end tools?
a. Hybrid OLAP servers
b. Multidimensional OLAP servers
c. Relational OLAP servers
d. Specialized SQL servers
51. Choose the _ _ _ _ _ _ _ _ _ that will populate each fact table record.
a. Measures
b. Dimensions
c. Grain
d. Business Process
53. A meta data repository contains
a. Operational meta data
b. Data irrelevant to system performance
c. The mapping from the DWH to the operational environment
d. Summarized data
54. Which of the following supports bitmap indices?
a. Sybase IQ
b. Oracle 7
c. COBOL
d. SQL
55. _ _ _ _ _ _ _ are created for the data names and definitions
of the given warehouse
a. Data cube
b. Summarized data
c. Meta data
d. Detailed Information
56. Because the chunking technique involves "overlapping" some of the aggregation computations, it is referred to as _ _ _ _ _ aggregation in data cube computation.
a. Two-way array
b. Three-way array
c. Multiway array
d. Sparse array
57. The _ _ _ _ _ _ _ operator computes aggregates over all
subsets of the dimensions specified in the operation.
a. Database
b. Compute cube
c. Define cube
d. Group by
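To see what "computing aggregates over all subsets of the dimensions" means, the sketch below enumerates all 2^n group-bys for n dimensions and sums one measure in each. The function `compute_cube` and the sample sales rows are hypothetical. This naive loop rescans the data once per cuboid. Real systems use smarter methods, such as chunk-based array aggregation.

```python
from itertools import combinations


def compute_cube(rows, dims, measure):
    """Aggregate `measure` with SUM for every subset of `dims` (2**n cuboids).

    Returns {group_by_dims: {key_tuple: total}}. The empty tuple () is the
    apex cuboid and the full `dims` tuple is the base cuboid.
    """
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            cuboid = {}
            for row in rows:
                key = tuple(row[d] for d in group)
                cuboid[key] = cuboid.get(key, 0) + row[measure]
            cube[group] = cuboid
    return cube


# Hypothetical sales rows for illustration.
rows = [
    {"city": "Vancouver", "item": "TV", "year": 2013, "sales": 10},
    {"city": "Vancouver", "item": "PC", "year": 2013, "sales": 5},
    {"city": "Toronto", "item": "TV", "year": 2012, "sales": 7},
]
cube = compute_cube(rows, ("city", "item", "year"), "sales")
```

With three dimensions (city, item, year), the sketch produces 8 cuboids. These range from the base cuboid, keyed by all three dimensions, to the apex cuboid `()`, which holds the grand total of 22.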
58. Which of the following is a subcube that is small enough to fit into the memory available for cube computation?
a. Bulk
b. Array
c. Structure
d. Chunk
59. The bit mapped join indices method is an integrated form of
a. Composite join indexing and bitmap indexing
b. Join indexing and composite join indexing
c. Join indexing and bitmap indexing
d. Bitmap indexing and outer join indexing
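A bitmap index keeps a bit vector for each distinct value of an attribute, with one bit per row. Selections then become fast bitwise AND/OR operations. The sketch below uses Python integers as bit vectors. The helper names `build_bitmap_index` and `matching_rows` and the sample columns are hypothetical. A bitmap join index applies the same idea to fact-table rows, keyed by values of a joined dimension table.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int bit vector: bit i is 1 if row i has that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def matching_rows(bits):
    """Decode a bit vector back into the list of row ids whose bit is set."""
    rows, row_id = [], 0
    while bits:
        if bits & 1:
            rows.append(row_id)
        bits >>= 1
        row_id += 1
    return rows


# Hypothetical columns of a small table (row ids 0..3).
city = ["Vancouver", "Toronto", "Vancouver", "Chicago"]
item = ["TV", "TV", "PC", "TV"]
city_idx = build_bitmap_index(city)
item_idx = build_bitmap_index(item)

# city = 'Vancouver' AND item = 'TV' is a single bitwise AND.
vancouver_tv = matching_rows(city_idx["Vancouver"] & item_idx["TV"])
```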
60. A set of attributes in a relation schema that forms a primary
key for another relation schema is called a _ _ _ __ _ _
a. Primary key
b. Foreign key
c. Secondary key
d. Composite key
61. Which of the following typically gathers data from multiple,
heterogeneous, and external sources?
a. Data cleaning
b. Load
c. Refresh
d. Data extraction
62. OLAM is particularly important for which of the following reasons?
a. High quality of data in DWH
b. Data processing
c. OLTP-based exploratory data analysis
d. Online selection of data mining functions
63. Which of the following sets a good example for interactive
data analysis and provides the necessary preparations for exploratory data
mining?
a. OLP
b. OLAP
c. OLTP
d. OLDP
64. Which of the following is not an exception indicator?
a. Out Exp
b. Self Exp
c. In Exp
d. Path Exp
65. _ _ _ _ _ _ _ _ _ can help business managers find and reach
more suitable customers, as well as gain critical business insights that may
help to drive market share and raise profits.
a. Data warehouse
b. Data mining
c. Data summarization
d. Data processing
66. _ _ _ _ _ _ _ _ _ _ _ is an alternative approach in which
pre-computed measures indicating data exceptions are used to guide the user in
the data analysis process at all levels of aggregation.
a. Hypothesis-driven exploration
b. Inventory-driven exploration
c. Discovery-driven exploration
d. Exception-driven exploration
67. Which of the following is an exception indicator that indicates the degree of surprise of the cell value, relative to other cells at the same level of aggregation?
a. Out Exp
b. In Exp
c. Path Exp
d. Self Exp
68. _ _ _ _ _ is a powerful paradigm that integrates OLAP with
data mining technology.
a. Online Analytical Modeling
b. Online Analytical Machine
c. Online Analytical Mining
d. Online Analytical Monitoring
69. Data warehouse application is _ _ _ _ _ _ _ _ _
a. Data Processing
b. Transaction Processing
c. Data cube
d. Data mining
70. _ _ _ _ _ _ _ _ _ cubes compute complex queries involving multiple dependent aggregates at multiple granularities.
a. Multi feature
b. Data
c. Meta
d. Solid
71. Which of the following performs a linear transformation on the
original data?
a. Z-score normalization
b. Normalization by decimal scaling
c. Zero-standard deviation
d. Min-max normalization
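The three normalization methods named in the options can be sketched as follows:

- Min-max normalization maps v to (v - min) / (max - min) * (new_max - new_min) + new_min.
- Z-score normalization maps v to (v - mean) / std.
- Decimal scaling divides by 10^j, using the smallest j that makes every |v'| < 1.

The function names are hypothetical, and the standard deviation here is the population one (an assumption). The sketch does not guard against a constant column (max = min or std = 0), which would cause a division by zero.

```python
import math


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Zero-mean normalization using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer making max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]
```

For instance, take income values of 12,000, 73,600, and 98,000. Min-max normalization to [0, 1] maps 73,600 to 61,600 / 86,000 ≈ 0.716. Decimal scaling maps -986 and 917 to -0.986 and 0.917 (j = 3).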
72. Which of the following is the best method for filling in missing values in data cleaning?
a. Fill in the missing value manually
b. Use the most probable value to fill in the missing value
c. Use the attribute mean to fill the missing value
d. Use a global constant to fill in the missing value
73. The minimum and maximum values in a given bin are identified
as the
a. Bin means
b. Bin average
c. Bin medians
d. Bin boundaries
74. Which of the following is a data transformation operation?
a. Normalization
b. Regression
c. Clustering
d. Binning
76. _ _ _ _ _ methods smooth a sorted data value by consulting its "neighborhood," that is, the values around it.
a. Clustering
b. Binning
c. Regression
d. Data reduction
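Smoothing by binning can be sketched as follows. First, sort the values and split them into equal-frequency bins. Then replace each value either by its bin mean or by the nearer bin boundary (the bin's minimum or maximum). The function names and the sample price list are hypothetical. Ties between the two boundaries are sent to the lower one by choice.

```python
def equal_frequency_bins(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` values each."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (ties go to min)."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins come from sorted data, so the ends are min and max
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
```

For the sorted prices 4, 8, 15, 21, 21, 24, 25, 28, 34, with bins of depth 3:

- Smoothing by bin means gives 9, 9, 9 | 22, 22, 22 | 29, 29, 29.
- Smoothing by bin boundaries gives 4, 4, 15 | 21, 21, 24 | 25, 25, 34.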
77. Z-score normalization is also called
a. Min-max normalization
b. Zero-standard deviation normalization
c. Zero-mean normalization
d. Normalization by decimal scaling
78. _ _ _ _ _ _ is a random error or variance in a measured
variable.
a. Bin
c. Noise
d. Regression
79. The consolidation of data into forms appropriate for mining is called
a. Data reduction
b. Data redundancy
c. Data cleaning
d. Data transformation
80. Which of the following is a decision tree algorithm?
a. C3.2
b. ID3
c. PP2
d. DIM
81. If the tuples in D are grouped into M mutually disjoint clusters, then a simple random sample of m clusters can be obtained, where m < M. Which of the following does this describe?
a. Stratified sample
b. SRS without replacement
c. Cluster sample
d. SRS with replacement
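The four sampling schemes in the options can be sketched with Python's standard `random` module. The function names are hypothetical. `random.Random(seed)` is used only so that runs are repeatable. The stratified version draws roughly the same fraction from every stratum (rounded, with at least one tuple per stratum); this is one common choice, not the only one.

```python
import random


def srswor(data, n, rng):
    """Simple random sample without replacement: n distinct tuples."""
    return rng.sample(data, n)


def srswr(data, n, rng):
    """Simple random sample with replacement: a tuple may be drawn again."""
    return [rng.choice(data) for _ in range(n)]


def cluster_sample(clusters, m, rng):
    """Pick m of the M disjoint clusters by SRSWOR and keep every tuple in them."""
    chosen = rng.sample(clusters, m)
    return [t for cluster in chosen for t in cluster]


def stratified_sample(strata, fraction, rng):
    """Draw an SRSWOR of about `fraction` of each stratum (at least one tuple)."""
    sample = []
    for stratum in strata.values():
        k = max(1, round(len(stratum) * fraction))
        sample.extend(rng.sample(stratum, k))
    return sample


rng = random.Random(42)
data = list(range(100))
clusters = [list(range(i * 10, (i + 1) * 10)) for i in range(10)]  # M = 10 clusters
strata = {"young": list(range(50)), "senior": list(range(50, 60))}
```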
82. Multidimensional index trees include
a. A-trees
b. T-trees
c. P-trees
d. R-trees
83. In which of the following data reduction strategies are irrelevant, weakly relevant, or redundant attributes detected and removed?
a. Data cube aggregation
b. Dimension reduction
c. Data compression
d. Numerosity reduction
84. In database systems, _ _ _ _ _ are primarily used for
providing fast data access.
a. Red-black trees
b. Game trees
c. Multidimensional index trees
d. Splay trees
85. If the mining task is classification, and the mining algorithm itself is used to determine the attribute subset, then this is called a _ _ _ _ _ _ approach.
a. Filter
b. Reduction
c. Smoothing
d. Wrapper
86. The discrete wavelet transform is closely related to the _ _ _ _ _ _ _ transform.
a. Discrete Fourier
b. Fourier
c. Laplace
d. Wavelet
87. Principal components analysis is also called the
a. Karhunen-Loeve method
b. Kinen-liva method
c. Kruskal-learn method
d. Kutni-lara method
88. _ _ _ _ _ _ can be used as a data reduction technique since it
allows a large data set to be represented by a much smaller random subset of
the data.
a. Clustering
b. Regression
c. Histograms
d. Sampling
89. Log-linear models are
a. Parametric methods
b. Discrete methods
c. Non-parametric methods
d. Non-discrete methods
90. Which of the following methods is used for the generation of concept hierarchies for categorical data?
a. Specification of a portion of a hierarchy by implicit data grouping
b. Specification of their partial ordering, but not of a set of attributes
c. Specification of a set of attributes, but not of their partial order
d. Specification of only a partial set of entities
91. Which of the following methods uses class information?
a. Histogram analysis
b. Binning
c. Cluster analysis
d. Entropy-based Discretization
92. _ _ _ _ _ _ _ _ _ hierarchies for categorical attributes or
dimensions typically involve a group of attributes
a. Discretization
b. Semantic
c. Index
d. Concept
93. Which of the following is based on the maximal asset values,
which may lead to a highly biased hierarchy?
a. Cluster analysis
b. Segmentation
c. Binning
d. Histogram analysis
94. The _ _ _ _ _ can be used to segment numeric data into
relatively uniform, "natural" intervals.
a. 1-2-3 rule
b. 2-3-4 rule
c. 3-4-5 rule
d. 4-5-6 rule
95. _ _ _ _ _ _ _ _ hierarchies for numeric attributes can be
constructed automatically based on data distribution analysis
a. Concept
b. Discretization
c. Tree
d. Index
96. _ _ _ _ _ _ _ techniques can be used to reduce the number of
values for a given continuous attribute, by dividing the range of the attribute
into intervals
a. Concept hierarchy
b. Discretization
c. Tree-based
d. Index
97. A _ _ _ _ _ _ _ _ _ algorithm can be applied to partition data into groups.
a. Binning
b. Histogram
c. Clustering
d. Entropy-based
98. An information-based measure called _ _ _ _ can be used to
recursively partition the values of a numeric attribute A, resulting in a
hierarchical discretization.
a. Entropy
b. Cluster
c. Binning
d. Segmentation
99. The kinds of knowledge to be mined include
a. Image analysis
b. Query process
c. Association
d. Multimedia analysis
100. Which of the following is a simplicity measure?
a. Rule strength
b. Rule quality
c. Rule reliability
d. Rule length
101. _ _ _ _ _ _ hierarchies can be used to refine or enrich schema-defined hierarchies when the two types of hierarchies are combined.
a. Schema
b. Set-grouping
c. Operation-derived
d. Rule-based
102. _ _ _ _ _ _ _ are those that contribute new information or
increased performance to the given pattern set.
a. Utility patterns
b. Certainty patterns
c. Novelty patterns
d. Simplicity patterns
103. Certainty factor is also known as
a. Rule length
b. Noise threshold
c. Minable view
d. Rule strength
104. Which of the following primitive specifies the data mining
functions to be performed?
a. Task-relevant data
b. The kind of knowledge to be mined
c. Background knowledge
d. Interestingness measures
105. _ _ _ _ _ _ _ may be used to guide the mining process or, after discovery, to evaluate the discovered patterns.
a. Task-relevant data
b. The kind of knowledge to be mined
c. Background knowledge
d. Interestingness measures
106. A _ _ _ _ _ hierarchy is a total or partial order among attributes
in the database schema.
a. Schema
b. Set-grouping
c. Operation-derived
d. Rule-based
108. _ _ _ _ _ hierarchies include the decoding of information-encoded strings, information extraction from complex data objects, and data clustering.
a. Rule-based
b. Operation-derived
c. Schema
d. Set grouping
110. Which of the following clauses is the task-relevant data primitive?
a. In relevance to
b. Use for warehouse
c. Analysis
d. Order by
111. Mining with the use of _ _ _ _ allows additional flexibility for ad hoc rule mining.
a. Image patterns
b. Data patterns
c. Information patterns
d. Meta patterns
112. Which of the following clauses lists the attributes or dimensions for exploration?
a. Order by
b. Group by
c. Having
d. In relevance to
113. Which of the following clauses uses the meta pattern?
a. Analyze
b. In relevance to
c. Matching
d. Use data warehouse
114. Which of the following clauses is used for discrimination?
a. Mine characteristics
b. Mine discriminant
c. Mine association
d. Mine comparison
115. DMQL stands for
a. Data Modeling Queue Level
b. Design Modeling Query language
c. Data Mining Query Language
d. Data & Meta data Query Language
116. The _ _ _ _ _ clause, when used for characterization, specifies aggregate measures, such as count, sum, or count%.
a. Use database
b. Analyze
c. Matching
d. Use hierarchy
117. Which of the following clauses specifies the condition by which groups of data are considered relevant?
a. Having
b. Group by
c. Order by
d. Analyze
118. The _ _ _ _ _ _ _ _ statement is used to specify the kind of
knowledge to be mined.
a. Knowledge-mine-specification
b. Mine-knowledge-specification
c. Knowledge-specification-mine
d. Specification-mine-knowledge
120. CRISP-DM addresses issues such as
a. Mapping from data mining problems to business issues
b. Capturing and misunderstanding the data
c. Disintegrating data mining results within the business context
d. Deploying and maintaining data mining results
121. An example of a set-grouping hierarchy is
a. Define hierarchy age-hierarchy for age as customer on
level1: {young, middle-aged, senior} < level0: all
level2: {20, ..., 39} < level1: young
level2: {40, ..., 59} < level1: middle-aged
level2: {60, ..., 89} < level1: senior
b. Define hierarchy age-hierarchy as age for customer on
level1: {young, middle-aged, senior} < level0: all
level2: {20, ..., 39} < level1: young
level2: {40, ..., 59} < level1: middle-aged
level2: {60, ..., 89} < level1: senior
c. Define hierarchy age-hierarchy for age on customer as
level1: {young, middle-aged, senior} < level0: all
level2: {20, ..., 39} < level1: young
level2: {40, ..., 59} < level1: middle-aged
level2: {60, ..., 89} < level1: senior
d. Define hierarchy age-hierarchy on age for customer as
level1: {young, middle-aged, senior} < level0: all
level2: {20, ..., 39} < level1: young
level2: {40, ..., 59} < level1: middle-aged
level2: {60, ..., 89} < level1: senior
122. Which of the following data mining language constructs uses SQL-like syntax and serves as rule-generation queries for mining association rules?
a. MINE RULE operator
b. RULE MINE operator
c. DATA MINE operator
d. DWH operator
123. Which of the following is not a data mining language?
a. DMQL
b. MSQL
c. PSQL
d. OLE DB for
124. The syntax for defining a schema hierarchy is
a. Define hierarchy location-hierarchy on address as [street, city, country]
b. Define hierarchy location-hierarchy as address on [street, city, country]
c. Define hierarchy location-hierarchy from address to [street, city, country]
d. Define hierarchy location-hierarchy for address all [street, city, country]
125. The DMQL statement syntax is
a. display as result_form
b. display result_form
c. display on result_form
d. display for result_form
126. Which of the following is a data mining query language?
a. PSQL
b. QSQL
c. MSQL
d. RSQL
127. _ _ _ _ _ is used for efficient implementations of a few
essential data mining primitives.
a. No coupling
b. Loose coupling
c. Tight coupling
d. Semi tight coupling
128. _ _ _ _ _ _ _ is a compromise between loose and tight
coupling.
a. No coupling
b. Loose coupling
c. Tight coupling
d. Semi tight coupling
129. Which of the following coupling schemes is used to fetch data from a data repository managed by a database system?
a. No coupling
b. Loose coupling
c. Tight coupling
d. Semi tight coupling
130. A well designed data mining system should offer _ _ _ _ _ _ _
with a data warehouse system
a. Semi tight coupling
b. No coupling
c. Loose coupling
d. Normal coupling
131. With which of the following is it difficult to achieve high scalability and good performance with large data sets?
a. No coupling
b. Tight coupling
c. Semi tight coupling
d. Loose coupling
132. _ _ _ _ _ _ _ _ means that a data mining system will not utilize any function of a data warehouse system.
a. Loose coupling
b. Semi tight coupling
c. Loose coupling
d. No coupling
133. _ _ _ _ _ _ _ _ means that a data mining system is smoothly integrated into the database or data warehouse system.
a. No coupling
b. Loose coupling
c. Tight coupling
d. Semi tight coupling
134. Which of the following provides a concise and succinct summarization
of the given collection of data?
a. Comparison
b. Characterization
c. Summarization
d. Aggregation
135. _ _ _ _ _ _ _ _ data mining describes the data set in a concise and summarative manner and presents interesting general properties of the data.
a. Descriptive
b. Predictive
c. Active
d. Constructive
136. _ _ _ _ _ _ data mining analyzes the data in order to construct one or a set of models and attempts to predict the behavior of new data sets.
a. Descriptive
b. Predictive
c. Active
d. Constructive
137. Attribute removal is based on the following rule: If there is
a large set of distinct values for an attribute of the initial working relation
but,
a. There is generalization operator on the attribute
b. There is no generalization operand on the attribute
c. There is no generalization operator on the attribute
d. There is no aggregation operator on the attribute
138. On-line analytical processing in data warehouses is a purely _ _ _ _ _-controlled process.
a. Machine
b. Database
c. Developer
d. User
139. Which of the following approaches is used to control the generalization process?
a. Generalized relation threshold control
b. Generalized class threshold control
c. Generalized dimension threshold control
d. Generalized query threshold control
140. Many current OLAP systems confine dimensions to _ _ _ _ _ _ _ _ _ _ data.
a. Numeric
b. Non numeric
c. Meta
d. Summarized
141. _ _ _ _ _ _ _ is a process that abstracts a large set of
task-relevant data in a database from a relatively low
conceptual level to higher conceptual levels.
a. Data realization
b. Data characterization
c. Data summarization
d. Data generalization
142. The _ _ _ _ _ _ approach can be considered a data warehouse-based, pre-computation-oriented, materialized-view approach.
a. Object-oriented induction
b. Data cube
c. Attribute-oriented induction
d. Data square
143. Which of the following approaches is a relational database query-oriented, generalization-based, on-line data analysis technique?
a. Attribute-oriented induction
b. Object-oriented approach
c. Data cube
d. Data square
144. _ _ _ _ _ _ _ _ performs off-line aggregation before an OLAP
or Data mining query is submitted for processing.
a. Object-oriented induction
b. Data cube
c. Attribute-oriented induction
d. Data square
146. How can the t-weight and interestingness measures in general
be used by the data mining system to display only the concept descriptions that
it objectively evaluates as interesting?
a. By threshold
b. By generalization
c. By comparison
d. By characterization
147. The data cube implementation of attribute-oriented induction
can be performed by
a. Using defined data cube
b. Using a predefined data cube
c. Using a generalized data cube
d. Using a quantified data cube
148. A _ _ _ _ _ can be represented by a 3-D data cube.
a. Cross-tab
b. Bar chart
c. Pie chart
d. Flow chart
149. Step one of the attribute-oriented induction algorithm is essentially a relational query to collect the task-relevant data into the _ _ _ _ _ _ _ _ _ _.
a. Prime relation
b. Secondary relation
c. Working relation
d. Analyzing relation
150. Which of the following relations collects the statistics of the attribute-oriented induction algorithm?
a. Working relation
b. Prime relation
c. Secondary relation
d. Analyzing relation
151. Descriptions can also be visualized in the form of _ _ _ _ _ _ _ _.
a. Cross-relations
b. Cross-checks
c. Cross-boards
d. Cross-tabs
152. Step three of attribute-oriented induction derives the _ _ _ _ _ _ _ relation.
a. Working
b. Prime
c. Secondary
d. Analyzing
153. The _ _ _ _ _ _ is an interestingness measure that describes the typicality of each disjunct in the rule, or of each tuple in the corresponding generalized relation.
a. Quantitative rule
b. Quantitative characteristic rule
c. c-weight
d. t-weight
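The t-weight can be computed directly from the counts in a generalized relation. This sketch assumes the usual definition, t-weight(q_a) = count(q_a) / Σ count(q_i), where the sum is taken over all generalized tuples q_i of the target class. The function name and sample counts are hypothetical.

```python
def t_weights(counts):
    """t-weight of each generalized tuple q: count(q) / sum of counts over the target class.

    `counts` maps each generalized tuple of the target class to its count.
    """
    total = sum(counts.values())
    return {q: c / total for q, c in counts.items()}


# Hypothetical generalized relation for the target class.
weights = t_weights({("Asia", "TV"): 30, ("Europe", "TV"): 50, ("North America", "TV"): 20})
```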
154. The information gain is obtained by
a. Expected information + entropy
b. Entropy - Expected information
c. Expected information - entropy
d. Entropy Expected information
155. The expected information needed to classify a given sample is
a. $I(s_1, s_2, \ldots, s_m) = \sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$
b. $I(s_1, s_2, \ldots, s_m) = \frac{s_i}{s} \log_2 \frac{s_i}{s}$
c. $I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$
d. $I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$
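The following sketch computes three related quantities:

- The expected information I(s_1, ..., s_m).
- The entropy E(A) of a split on attribute A.
- The gain Gain(A) = I(s_1, ..., s_m) - E(A), which ID3-style decision tree induction uses to choose a splitting attribute.

The function names and the `rows`/`labels` layout are assumptions for illustration.

```python
import math
from collections import Counter


def expected_info(labels):
    """I(s1, ..., sm) = -sum over classes of (si/s) * log2(si/s)."""
    s = len(labels)
    return -sum((si / s) * math.log2(si / s) for si in Counter(labels).values())


def split_entropy(rows, labels, attr):
    """E(A): weighted expected information of the partitions induced by attribute attr."""
    s = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    return sum(len(part) / s * expected_info(part) for part in partitions.values())


def info_gain(rows, labels, attr):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    return expected_info(labels) - split_entropy(rows, labels, attr)
```

Consider 14 training samples with 9 "yes" and 5 "no" labels, where an age attribute splits them into youth (2 yes, 3 no), middle_aged (4 yes, 0 no), and senior (3 yes, 2 no). This gives I ≈ 0.940, E(age) ≈ 0.694, and Gain(age) ≈ 0.247. ID3 would pick the attribute with the largest gain as the splitting attribute.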
156. Class comparison is also called
a. composition
b. aggregation
c. discrimination
d. characterization
157. _ _ _ _ _ _ can be used to perform some preliminary relevance
analysis on the data by removing or generalizing attributes having a very large
number of distinct values.
a. Object-oriented induction
b. Attribute-oriented induction
c. Batch-oriented induction
d. Class-oriented induction
158. Class characterization that includes the analysis of
attribute/dimensions relevance is called _ _ _ _ _ .
a. Analytical comparison
b. Analytical measurement
c. Analytical characterization
d. Analytical difference
159. _ _ _ _ _ _ _ irrelevant and weakly relevant attributes using
the selected relevance analysis measure.
a. Insert
b. Update
c. Modify
d. Remove
160. The _ _ _ _ _ class is the class to be characterized.
a. base
b. target
c. contrasting
d. sub
161. The _ _ _ _ _ _ class is the set of comparable data that are
not in the target class.
a. base
b. target
c. contrasting
d. sub
162. Generalization is performed on the _ _ _ _ _ _ _ _ to the
level controlled by a user or expert-specified dimension threshold, which
results in a _ _ _ __ _ _
a. Target class, Prime target class relation
b. Contrasting class, Prime contrasting class relation
c. Target class, Secondary target class relation
d. Contrasting class, Secondary contrasting class relation
163. Let $q_a$ be a generalized tuple and $C_j$ be the target class. The d-weight for $q_a$ is defined as
a. d-weight = condition( ) / count( )
b. d-weight = condition( ) / $\sum_{i=1}^{m}$ count( )
c. d-weight = condition( ) / count( )
d. d-weight = condition( ) / count( )
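This sketch assumes the commonly used definition of d-weight for a generalized tuple q_a: d-weight = count(q_a ∈ C_j) / Σ_{i=1}^{m} count(q_a ∈ C_i). Here, C_j is the target class, and the sum runs over the target class and the contrasting classes. The function name and counts are hypothetical. A value close to 1 means the tuple comes mostly from the target class.

```python
def d_weight(class_counts, target):
    """d-weight of a generalized tuple q_a for class `target`.

    class_counts maps each class (target plus contrasting classes) to
    count(q_a in that class). The result is count(q_a in target) divided by
    the sum of those counts over all classes, so it lies in [0, 1].
    """
    total = sum(class_counts.values())
    return class_counts[target] / total


# Hypothetical counts of one generalized tuple in two classes.
counts = {"graduate": 90, "undergraduate": 210}
```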
164. Can class comparison mining be implemented efficiently using
data cube techniques?
a. yes
b. no
c. limited
d. difficult
165. Class discrimination is also called
a. class comparison
b. class hierarchy
c. class aggregation
d. class concept
166. The set of relevant data in the database is collected by query processing and is partitioned into a target class and one or a set of _ _ _ _ _ class(es).
a. discrimination
b. contrasting
c. comparable
d. target
167. The range for the d-weight is
a. b. c.
d.
168. A _ _ _ _ _ _ d-weight in the target class indicates that the
concept represented by the generalized tuple is primarily derived from the
target class
a. Low
b. High
c. Average
d. Middle
169. A _ _ _ _ _ _ d-weight implies that the concept is primarily
derived from the contrasting class
a. Low
b. High
c. Average
d. Middle
170. A quantitative discriminant rule for the target class of a given comparison description is written in the form
a. ∀x, target_class(x) ⇐ compare(x) [d: d-weight]
b. ∀x, contrasting_class(x) ⇐ condition(x) [d: d-weight]
c. ∀x, contrasting_class(x) ⇐ compare(x) [d: d-weight]
d. ∀x, target_class(x) ⇐ condition(x) [d: d-weight]
171. In d-weight, d stands for
a. divide
b. dead
c. discrimination
d. degree
172. The interquartile range is defined as
a. First quartile - Third quartile
b. First quartile + Third quartile
c. Third quartile + First quartile
d. Third quartile - First quartile
173. One common rule of thumb for identifying suspected outliers is to single out values falling at least _ _ _ _ _ _ _ above the third quartile or below the first quartile.
a. b. c. d.
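The interquartile range and the usual outlier rule can be checked with Python's `statistics.quantiles`. This sketch assumes the widely used 1.5 × IQR multiplier and the module's default quartile method. Other quartile conventions can give slightly different Q1 and Q3 on small data sets. The function name is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (IQR, suspected outliers) using the k * IQR rule (k = 1.5 assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return iqr, [v for v in values if v < low or v > high]


iqr, outliers = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

For 1, 2, 3, 4, 5, 6, 7, 8, 100, this gives Q1 = 2.5, Q3 = 7.5, and IQR = 5.0. Only 100 is flagged, because the upper fence is 15.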
174. The most commonly used percentiles other than the median are _ _ _ _ _ _.
a. Outliers
b. Boxplots
c. Quartiles
d. Modes
175. A popularly used visual representation of a distribution is the _ _ _ _ _ _ _ _.
a. Boxplot
b. Outlier
c. Quartile
d. Histogram
176. Dispersion is also called
a. Mean
b. Variance
c. Median
d. Mode
177. Which of the following is a central tendency measure?
a. Outliers
b. Variance
c. Quartiles
d. Mode
178. Which of the following is a data dispersion measure?
a. Mean
b. Variance
c. Mode
d. Median
179. The average of the largest and smallest values in a data set is called the
a. Median
b. Mean
c. Midrange
d. Mode
180. The _ _ _ _ _ _ _ _ for a set of data is the value that
occurs most frequently in the set.
a. Median
b. Mean
c. Midrange
d. Mode
181. Which of the following is not a central tendency measure?
a. Variance
b. Mean
c. Median
d. Mode
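The central tendency measures (mean, median, mode, and midrange) and the dispersion measure variance can be computed with Python's `statistics` module, as shown below. The function name `summarize` is hypothetical. `multimode` is used because a data set can have more than one mode. `pvariance` gives the population variance; use `variance` for the sample version.

```python
import statistics


def summarize(values):
    """Central tendency (mean, median, mode, midrange) plus variance for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),          # list, since data may be multimodal
        "midrange": (min(values) + max(values)) / 2,   # average of the largest and smallest
        "variance": statistics.pvariance(values),      # population variance
    }


summary = summarize([1, 2, 2, 3, 7])
```

For 1, 2, 2, 3, 7, this yields a mean of 3, a median of 2, a mode of [2], a midrange of 4.0, and a variance of 4.4.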
182. A _ _ _ _ _ _ _ _ is one of the most effective graphical methods for determining whether there is a relationship, pattern, or trend between two quantitative variables.
a. q-q plot
b. scatter plot
c. quantile plot
d. q-q-q plot
183. A _ _ _ _ _ _ _ _ is another important exploratory graphic
aid that adds a smooth curve to a scatter plot in order to provide better
perception of the pattern of dependence.
a. Loess curve
b. Scatter curve
c. Bar chart
d. Quantile plot
184. Histograms are also called as _ _ _ _ _ _ _ _ _ histograms.
a. frequency
b. variance
c. quartile
d. outlier
185. The word loess is short for
a. Load compression
b. Local compression
c. Load regression
d. Local regression
186. A _ _ _ _ _ _ _ _ _ consists of a set of rectangles that
reflect the counts of the classes present in the given data.
a. Quartile plot
b. q-q plot
c. Histogram
d. Loess curves
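An equal-width histogram counts how many values fall into each of a fixed number of same-width buckets. Those counts are the heights of the rectangles. The function name and sample data are hypothetical. The sketch assumes the values are not all equal; otherwise, the bucket width would be zero.

```python
def equal_width_histogram(values, num_buckets):
    """Count values per bucket for `num_buckets` equal-width buckets over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # The maximum value would land one past the last bucket, so clamp it.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts


heights = equal_width_histogram([1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)
```

For 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10 and three buckets of width 3, it returns [4, 3, 4]. The maximum value, 10, is placed in the last bucket.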
187. A _ _ _ _ _ _ is a simple and effective way to have a first look at a univariate data distribution.
a. q-q plot
b. scatter plot
c. histogram
d. quantile plot