Plenary Lecture
Classification Methods for Bibliomining

Prof. Ioana Moisil
Department of Computer Science and Automatic Control,
Hermann Oberth Faculty of Engineering
Lucian Blaga University of Sibiu
Blvd. Victoriei 10
550024 Sibiu
ROMANIA
Email: ioana.moisil@ulbsibiu.ro
Website: http://csac.ulbsibiu.ro
Abstract: Advances in information technology are having an
important impact on library systems. Large collections of heterogeneous data,
from ancient manuscripts to sound, video and spatial data, are now available in
electronic format. Digital libraries are capturing human knowledge and
distributing it over the web. The increasing volume of data in today's digital
repositories and library data warehouses has led to the wide use of
sophisticated computer-based analysis techniques. Specialized data mining
operations can be performed to answer the questions of librarians and
researchers in information science. In 2003, S. Nicholson and J. Stanton
introduced a new term, bibliomining, for data mining in library systems.
Bibliomining is thus a broad umbrella term covering all data mining methods
based on mathematics, statistics, operational research, machine learning,
evolutionary computing and visualization techniques, together with the
traditional bibliometric methods of analysing groups of bibliographic
references, such as authorship, publications and literature. Librarians and
researchers in information science are mining library data warehouses and other
library data collections in order to discover patterns and to understand library
users’ behaviour and their information and service needs, but also in order to
evaluate and predict the effectiveness of library services, to discover trends
in queries and to identify hot topics. Classification of items based on their
characteristics (features, attributes, properties) into pre-defined categories is
one of the most important bibliomining tasks. Classification is defined as the
ordering of items into pre-defined groups (categories or classes), based on their
similarity. The classification process consists of assigning one of k labels (or
classes) to each of n items derived from a specific problem; that is,
classification predicts categorical labels. The goal of the analysis is to find
a classification, a model or profile for each class, that optimizes a
combinatorial function consisting of assignment costs, based on the individual
choice of label made for each item, and separation costs, based on the pair of
choices made for two related items. In machine learning, classification is a
form of supervised learning.
Classification, as a bibliomining technique, can be used to find hidden
patterns in data by deciding which pre-defined class a record of the data set
should be assigned to, and also to predict group membership for data
instances. This lecture describes the most important classification methods
(traditional approaches such as classification trees, discriminant analysis and
generalized linear models, as well as modern statistical machine learning
algorithms, support vector machines, belief networks, Gaussian processes, neural
networks, evolutionary algorithms, swarm intelligence, boosting and ensemble
methods) and their use in mining library data collections. Research questions regarding
pre-processing operations, attribute relevance and classifiers’ performance will
also be discussed with emphasis on the specificity of the library items to be
classified.
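As a rough illustration of the combinatorial view of classification mentioned in the abstract (assignment costs per item plus separation costs per pair of related items), the following minimal Python sketch evaluates that objective and finds the best labelling by brute force. The toy cost values, the names labeling_cost and best_labeling, and the simple "pay when labels differ" separation function are all assumptions made for illustration; they are not taken from the lecture, and exhaustive search is only feasible for very small n and k.

```python
from itertools import product


def labeling_cost(labels, assign_cost, edges, separation):
    """Assignment costs of each item's label plus weighted separation
    costs over every pair of related items."""
    total = sum(assign_cost[i][lab] for i, lab in enumerate(labels))
    total += sum(w * separation(labels[i], labels[j]) for i, j, w in edges)
    return total


def best_labeling(assign_cost, edges, k, separation):
    """Try all k**n labellings and return the cheapest one (toy sizes only)."""
    n = len(assign_cost)
    return min(
        product(range(k), repeat=n),
        key=lambda labels: labeling_cost(labels, assign_cost, edges, separation),
    )


# Hypothetical toy data: 3 library items, 2 candidate classes.
# assign_cost[i][c] is the cost of giving item i the label c.
assign_cost = [[0.0, 2.0], [1.0, 1.5], [3.0, 0.0]]
# (i, j, weight): items i and j are related (e.g. they share an author).
edges = [(0, 1, 1.0), (1, 2, 0.5)]


def differs(a, b):
    """Assumed separation function: pay 1 whenever two labels differ."""
    return 0 if a == b else 1


best = best_labeling(assign_cost, edges, 2, differs)
print(best, labeling_cost(best, assign_cost, edges, differs))
```

In practice the classifiers listed in the abstract learn such assignments from labelled training data rather than enumerating every labelling; the sketch only makes the two cost terms concrete.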
Brief Biography of the Speaker:
Ioana Moisil received the M.Sc. in Mathematics from the University of Bucharest
in 1971, the scientific degree in Statistical, Epidemiological and Operational
Research Methods Applied in Public Health and Medicine from the Université Libre
de Bruxelles, Belgium, in 1991, and the Ph.D. in Mathematics from the Romanian
Academy in 1997. She previously worked at the National Institute for Research &
Development in Informatics - I.C.I (1971-1986), the Department of Biophysics of
the Carol Davila Faculty of Medicine, Bucharest, and the CCSSDM Center of the
Ministry of Health. At present
she is a full-time Professor and a Senior Researcher at the Department of
Computer Science and Automatic Control – Faculty of Engineering at the “Lucian
Blaga” University of Sibiu. She is the author/co-author of fourteen books and
over 150 scientific papers. Her scientific interests include intelligent
systems, healthcare telematics, web technologies, data-mining, e-learning,
modelling and simulation, uncertainty management, and human-computer interaction.
Professor Moisil has participated in several EU-funded projects as project manager
for the national partner (Telenurse ID ENTITY, MGT, PROPRACTITION, PRO-ACCESS),
in Tempus projects, and in nationally funded projects as research manager and
software development coordinator (INFOSOC – eUNIV, AMTRANS – eCASTOR, INFOSOC –
e-Scribe, INFOSOC – DANTE, e-EDU-Quality, eTransMobility, CNCSIS 2007 – code 33,
Studies on multivariate interpolation, polynomial classifiers and applications,
CNCSIS 2007 – code 1502, Aspects concerning the psycho-cognitive abilities of
artificial intelligent agents and applications in ITC based education). Ioana
Moisil is a member of EARLI (European Association for Research in Learning and
Instruction), the Romanian representative in the IMIA SIG and EFMI WG5
Nursing Informatics, an honorary member of the Bohemian Medical Association
J.E.Purkyne of Bio-engineering and Medical Informatics, and a member of the
ISCB – International Society for Clinical Biostatistics – Romanian National
Group, of the Romanian Association of Engineers, of the IITM – International
Institute of Tele-Medicine, and of the Romanian Society of Mathematical Sciences.
She is vice-president of the Romanian Medical Informatics Society,
vice-president of the HIT Foundation for Health Informatics and Telematics, and a
member of RoCHI-ACM. Professor Moisil serves on several international
peer-review committees and conference scientific boards.