Artificial intelligence on a learning curve
Artificial intelligence systems hold real importance in modern society, and researchers are striving to improve them further. Marcello Pelillo of the SIMBAD project explains how the development of a similarity-based approach will change the machine learning and pattern recognition domains.
Enabling the automatic discovery of regularities in data through computer algorithms is a prime goal of researchers in pattern recognition, and while the feature-based approach has long been applied in the field, there is a growing awareness of the importance of similarity information per se. This is recognised as central to human intelligence; identifying similarities allows us to classify and rank objects, while the ability to extract patterns from data plays a similarly important role. “When we look in the sky at night we don’t just see stars, we also see constellations,” points out Marcello Pelillo of the SIMBAD project.
Historically, however, it is the feature-based approach that has been used in both machine learning, in which researchers aim to devise algorithms that mimic human learning abilities, and pattern recognition. “Within the feature based framework we assume that the objects to be classified, or the data to be processed, are represented in terms of a collection of numbers, a collection of features,” explains Pelillo. “For example we might say that a particular face has blue eyes and blond hair, so we extract features from objects. From a mathematical standpoint this means that the objects are described in terms of a collection of numbers – in the classical framework these numbers are then used to classify the objects.”
Under the feature-based approach an object can be considered as a point in a high-dimensional space, where similarities between the objects to be classified are derived from the distances between the corresponding points. This approach allows researchers to use powerful mathematical tools. “With the feature based approach the space where the objects lie is a very convenient space, which we call a Euclidean space, where the distance between two points is measured as the length of the straight line joining them,” says Pelillo.
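The feature-based picture described above can be sketched in a few lines of Python. This is only an illustration, not code from the SIMBAD project: the face names, the two numeric features and the nearest-neighbour rule are all assumptions invented for the example.

```python
import math

# Hypothetical feature vectors (invented for illustration):
# each face is reduced to two numbers, e.g. an eye-colour score and a hair-colour score.
FACES = {
    "face_a": (0.90, 0.80),
    "face_b": (0.85, 0.75),
    "face_c": (0.10, 0.20),
}


def euclidean(p, q):
    """Length of the straight line joining two points in feature space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def most_similar(name):
    """Return the other face whose feature vector lies closest to `name`."""
    others = (n for n in FACES if n != name)
    return min(others, key=lambda n: euclidean(FACES[name], FACES[n]))


if __name__ == "__main__":
    print(most_similar("face_a"))
```

Here similarity is never stated directly; it is derived entirely from the distance between points, which is the assumption the similarity-based approach sets out to relax.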
However, in the real world Euclidean spaces (or generalisations thereof, such as ‘metric spaces’) are rare, encouraging Pelillo and his colleagues to adopt a radical approach. “We aim to go beyond the notion of features, because there are several real-world problems where it’s difficult to extract the right features, and it’s difficult to describe objects in terms of a mere collection of numbers. There are more complicated ways to describe objects – sometimes the data lies in a very high dimensional space, or some features are missing or, more interestingly, complex objects are described in terms of parts and relations between parts (think of a human body and its decomposition into head, torso, arms, legs, etc.), and there is no general way to reduce these ‘relational’ representations to a list of numbers. Yet, in these cases it is usually possible to say, without any notion of a feature, whether two objects are similar,” he says.
“For example, if I show you pictures of two faces, even for just five seconds, then you will probably be able to tell me whether they are similar or not. But if I asked you whether the guy on the left has brown eyes, or blond hair, then I suspect you wouldn’t be able to answer me.”
Evidence suggests people are often able to judge similarity between objects without being able to specify meaningful features. Replicating this ability in information systems could lead to the development of machines capable of responding to external stimuli. “We are studying how machines can learn and categorise information – how they can form classes, abstract patterns and extract regularities in data, without relying explicitly on any notion of feature,” outlines Pelillo.
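To make the idea of learning without features concrete, the sketch below classifies objects using nothing but pairwise dissimilarity judgements. It is a deliberately simple stand-in (a nearest-neighbour rule), not one of the techniques developed by SIMBAD, and the scan names, labels and dissimilarity values are hypothetical.

```python
# Hypothetical pairwise dissimilarities, e.g. collected from expert judgements.
# No object is ever described by a list of features.
PAIRWISE = {
    ("scan_1", "scan_2"): 0.1,
    ("scan_1", "scan_3"): 0.9,
    ("scan_1", "scan_4"): 0.8,
    ("scan_2", "scan_3"): 0.7,
    ("scan_2", "scan_4"): 0.9,
    ("scan_3", "scan_4"): 0.2,
}

# Hypothetical labelled examples.
LABELS = {"scan_1": "healthy", "scan_3": "abnormal"}


def dissimilarity(a, b):
    """Look up the judged dissimilarity of a pair, in either order."""
    if a == b:
        return 0.0
    key = (a, b) if (a, b) in PAIRWISE else (b, a)
    return PAIRWISE[key]


def classify(item):
    """Assign the label of the most similar labelled example."""
    closest = min(LABELS, key=lambda known: dissimilarity(item, known))
    return LABELS[closest]


if __name__ == "__main__":
    for scan in ("scan_2", "scan_4"):
        print(scan, classify(scan))
```

The classifier only ever asks "how alike are these two objects?", which mirrors the way a person can compare two faces without naming eye or hair colour.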
The SIMBAD project’s agenda includes theoretical, computational and application centred research, part of the wider goal of developing improved artificial systems. “We are working in spaces without metricity – so, for example, it’s no longer true that the shortest path between point A and point B is a straight line. This undermines the very foundations of classical pattern recognition and machine learning,” says Pelillo. “There are several application areas where the classical approach clearly doesn’t work. In particular we have in mind biomedical problems; although the classical feature based approach has been applied there, with some success in terms of assisting doctors in diagnosing conditions, doctors often feel more comfortable with a similarity-based approach. This is because they can say at a glance, for example, whether two MRI pictures are similar or not without being able to tell you anything about the features.”
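One way to see what working "without metricity" means is to test whether a set of dissimilarities obeys the triangle inequality, which any true distance must satisfy. The sketch below is an illustration under assumed data: the man, horse and centaur values are a textbook-style example, not measurements from the project.

```python
from itertools import permutations

# Hypothetical dissimilarities: a centaur resembles both a man and a horse,
# yet a man and a horse are judged very different from each other.
PAIRWISE = {
    ("man", "centaur"): 1.0,
    ("centaur", "horse"): 1.0,
    ("man", "horse"): 5.0,
}


def dissimilarity(a, b):
    """Look up the dissimilarity of a pair, in either order."""
    if a == b:
        return 0.0
    key = (a, b) if (a, b) in PAIRWISE else (b, a)
    return PAIRWISE[key]


def triangle_violations(items, d):
    """Return triples (a, b, c) where going a -> b -> c is shorter than a -> c."""
    return [
        (a, b, c)
        for a, b, c in permutations(items, 3)
        if d(a, c) > d(a, b) + d(b, c)
    ]


if __name__ == "__main__":
    print(triangle_violations(["man", "horse", "centaur"], dissimilarity))
```

Any triple reported here is a case where the detour through a third object is "shorter" than the direct comparison, so the data cannot be laid out as points in a Euclidean or metric space without distortion.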
This research has been prompted in large part by the complexities of contemporary society, and the importance placed on developing artificial systems like robots, smart devices and machines to help us negotiate it. Both machine learning and pattern recognition hold enormous relevance in this regard, and while they are separate fields, Pelillo says there is a significant overlap between the two. “When we say that a machine is ‘learning’ we are saying that the machine is trying autonomously to extract useful patterns (or regularities) from raw data. In this respect the machine is trying to perform pattern analysis, or pattern recognition,” he outlines.
Important application fields include computer vision and medical image analysis. “The raw data here could be just the pixels comprising an image or a video,” explains Pelillo. “The problem for the machine is to tell me whether the picture – of which it has only the pixels – is for example the face of Barack Obama, or a friend of mine. It must tell me this just from the raw data. In this case the raw data is just a huge array of numbers, and these numbers are represented by pixels.”
Pelillo says the project overall aims to establish a paradigm shift in the machine learning and pattern recognition domains. “This is the first systematic attempt to attack pattern recognition and machine learning problems from a purely similarity-based perspective,” he stresses. This work is at an early stage, and it has not been possible to definitively answer all the questions raised; nevertheless there have been some tangible advances. SIMBAD has been able to characterise, from a theoretical standpoint, the issues related to non-metricity, and to develop powerful novel techniques using ideas from differential geometry, information theory and game theory.
The project’s work also holds great relevance to the biomedical field. “We are using our techniques to look at two important problems in biomedical image analysis. The first concerns the diagnosis of renal cell carcinoma. We are doing this in collaboration with University Hospital Zurich,” outlines Pelillo. “We are also using our techniques to assist doctors at Verona hospital in the analysis and diagnosis of psychosis, such as schizophrenia or bi-polar disorders. In these applications the problem is image analysis. We are provided with images – this could be tissue microarray (TMA) images or other kinds of data – and the problem is to detect say cancerous cell nuclei. Essentially it’s a shape analysis problem.”
This kind of application-centred work forms a key part of the project’s future agenda. The SIMBAD consortium combines expertise from a range of areas, and while theoretical research will not be neglected, there will be a greater focus on applications in the remaining part of the project. “More emphasis will be placed on the applications, simply because in the first two years we were focused more on techniques,” outlines Pelillo.
Potential applications include human machine interaction, vehicle control and traffic safety, and energy and communication systems. “We have a clear schedule for the next part of the project concerning the two major biomedical problems we’re addressing,” says Pelillo. “We have performed a deep theoretical analysis of similarity-based pattern recognition and machine learning, and provided new techniques and tools in the field. The preliminary results we got from the two main biomedical problems we are addressing are encouraging, and at the moment we think it represents a significant improvement on the classical, feature-based approach in terms of effectiveness. So we are confident that the project is proving successful on the theoretical side, on the algorithmic side, and on the practical side.”
Marcello Pelillo is presently a professor of computer science at Ca’ Foscari University, Venice, where he leads the Computer Vision and Pattern Recognition group. He has held visiting research positions at Yale University, University College London, McGill University, the University of Vienna, York University (UK), and National ICT Australia (NICTA). He is a Fellow of the International Association for Pattern Recognition (IAPR) and a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).
Published: Monday, 13th December 2010