Intelligent Data Analysis
Michael Berthold • David J. Hand (Eds.)
Intelligent
Data Analysis
An Introduction
2nd revised and extended Edition
With 140 Figures, 22 in color and 50 Tables
Springer
Editors
Michael Berthold
Universität Konstanz
FB Informatik und Informationswissenschaft
78457 Konstanz
Germany
David J. Hand
Department of Mathematics
Imperial College
Huxley Building
180 Queen's Gate
London, SW7 2BZ
UK
Library of Congress Control Number: 2003041211
ACM Computing Classification (1998): I.2, H.3, G.3, I.5.1, I.4, J.2, J.1, J.3, F.4.1, F.1
corrected second printing 2007
ISBN-10 3-540-43060-1 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-43060-5 Springer Berlin Heidelberg New York
ISBN-10 3-540-65808-4 1. Edition Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of
this publication or parts thereof is permitted only under the provisions of the German Copyright Law
of September 9, 1965, in its current version, and permission for use must always be obtained from
Springer. Violations are liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 1999, 2003 and 2007
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Typesetting: by the Editors
Production: LE-TeX Jelonek, Schmidt & Vöckler GbR, Leipzig
Cover: KünkelLopka, Heidelberg
Printed on acid-free paper 45/3100YL - 5 4 3 2 1 0
Preface to the Second Edition
We were pleasantly surprised by the success of the first edition of this book.
Many of our colleagues have started to use it for teaching purposes, and feedback
from industrial researchers has also shown that it is useful for practitioners.
So, when Springer-Verlag approached us and asked us to revise the material for a
second edition, we gladly took the opportunity to rearrange some of the existing
material and to invite new authors to write two new chapters. These additional
chapters cover material that has attracted considerable attention since the first
edition of the book appeared. They deal with kernel methods and support vector
machines on the one hand, and visualization on the other. Kernel methods
represent a relatively new technology, but one which is showing great promise.
Visualization methods have been around in some form or other ever since data
analysis began, but are currently experiencing a renaissance in response to the
growth in the number and size of data sets. In addition, the chapter on rule
induction has been replaced with a new version, covering this topic in much more
detail.
As research continues and new tools and methods for data analysis are
developed, it becomes ever more difficult to cover all of the important
techniques. Indeed, we are probably further from this goal than we were with the
original edition: too many new fields have emerged over the past three years.
However, we believe that this revision still provides a solid basis for anyone
interested in the analysis of real data.
We are very grateful to the authors of the new chapters for working with
us to an extremely tight schedule. We would also like to thank the authors of
the existing chapters for spending so much time carefully revising and updating
their chapters. And, again, all this would not have been possible without the
help of many people, including Olfa Nasraoui, Ashley Morris, and Jim Farrand.
Once again, we owe especial thanks to Alfred Hofmann and Ingeborg Mayer
of Springer-Verlag, for their continued support for this book and their patience
with various delays during the preparation of this second edition.
November 2002
South San Francisco, CA, USA Michael R. Berthold
London, UK David J. Hand