Visual Information Retrieval
Machine (VIRM)
Content-Based Image
Retrieval on the World Wide Web
Lola Deutmeyer, Waka Waktola
Dr. David Eichmann, Project Director
Library and Information Science,
University of Iowa
Iowa City, IA, USA
July 31, 1998
Abstract
This paper presents some background information on visual information retrieval
(VIR) from the World Wide Web and on the Visual Information Retrieval Machine
(VIRM) designed specifically to operate as an image retrieval system.
This paper discusses the graphical user interface implemented by VIRM,
the process of fetching and clustering images, i.e. grouping similar images,
from the Web, and obstacles to a visual information retrieval machine.
It also proposes possibilities for the future development and improvement
of VIRM.
1 Introduction
The Visual Information Retrieval Machine (VIRM) research project evolved
from Web users' need to acquire specific information from the continuous
complex technology of the World Wide Web. Many search engines and
web crawlers now exist for the purpose of document retrieval based on a
specified keyword or words. However, there is very little assistance
available for Web users who wish to search for specific images on the Web,
and with the rapid growth of multimedia use on the Internet, image retrieval
engines are becoming necessary. The purpose of our project was to
develop an extension search engine that makes use of the capabilities of
the exisiting search engines to allow Internet users to directly perform
image inquiries on the World Wide Web. We were particularly interested
in discovering and implementing a graphical user interface for the VIRM
project that would allow users to search through numerous retrieved images
with relative ease.
2 What is VIRM?
The Visual Information Retrieval Machine
(VIRM) is a a Java-based image retrieval system for the World Wide Web.
VIRM is a branch of the Spider project. It is a meta-search engine
that interrogates other search engines for HTML documents based on a given
keyword. Upon receiving the related documents, VIRM crawls them for
image URLs of the form "*.jpg", "*.jpeg", and "*.gif". Our part of
the project was to develop the part of VIRM that is responsible for taking
these image URLs, retrieving the associated image, performing similarity
calculations on the the retrieved images, clustering them according to
our similarity measures, and displaying the resultant groups in a workable
graphical user interface.
2.1 What is Spider?
Spider is the parent of the VIRM project.
It is a Java program that creates and manipulates a relational database
of the Web graph, traversing links having patterns of "*.html" or "http://*".
Spider is a system that retrieves visual information URLs from the World
Wide Web and displays these URLs through the use of the Sulla program.
2.1.2 What is Sulla?
Sulla is a personal user agent for the World Wide Web (that currently runs
on the Spider package) with the following characteristics:
an ablility to acquire, retain, and act upon an interest profile
autonomous action on goals posed to it by its user
user progress appraisal and notification
ability to access a variety of information sources
ethical behavior, as characterized in Ethical Web Agents
2.2 Architecture of Spider
Figure 1 depicts the architecture of the Spider program. When a keyword
to search for is entered by a user, a lexer takes the "http://*" documents
from the Web, breaks them into their component words, and generates a list
of URLs contained in each document. The crawler is instantiated by
the SearchEngine file, and new URLs returned from the web search engines
are fed to the crawler. The crawler returns related documents, which
are sent both to the clusterer to be grouped with other documents and to
an archiver which stores the document to a database. The document,
once sent to the clusterer, generates cluster threads, which are then given
to a grapher. The grapher shows the clusters of similar documents
and shows the edges connecting the various items in each cluster.
These are displayed in the user's window. Spider is a text retrieval
program, and its architecture was the basis for the architecture of VIRM,
adapted for image retrieval.
Web
|
Lexer
|
Search Engine -----> Crawler <----- new URLs
|
Document
|
|
Clusterer Archiver
|
|
Cluster Threads Database
|
Grapher
|
Window
Figure 1. Architecture tree of Spider
3 Existing data on Visual Information
Retrieval (VIR)
We first started our project by obtaining
publications, references, and locations of image retrieval engines from
the web to find background information on visual information retrieval
and graphical user interfaces for VIR methods. We then characterized
the different GUIs we found in order to determine which interface would
work best for our project. We obtained valuable information from
IBM's QBIC (Query By Image Content),
UC Berkeley's Digital Library Project
and Virage's VIR (Visual Information Retrieval)
Engine. They have done related research and also had demonstration
image retrieval engines available on their web sites. Experimention
with these demonstrations prompted us to keep the following points in mind
when developing our Visual Image Retrieval Machine (VIRM).
1. Precision of the various
image colors.
2. Memory limits of image retrieval
3. Use of color, texture, structure,
histogram values, and threshold measurements of the images to determine
similarity.
4 Development of VIRM image retrieval and display
This section describes the process and steps we followed to build on the
Spider program to create VIRM once we finished our background research.
We started by acquainting ourselves with the Java methods used to acquire
images from given image URLs and learning to extract necessary information
about the images required to cluster them. Once this was completed,
we were able to focus our efforts on the graphical user interface for VIRM,
an important part of the VIRM project that gave us a means to be certain
that the image clustering algorithm was truly working as it should be.
4.1 Fetching images from the World Wide Web
Since the first phase of VIRM, sifting
HTML documents for image URLs, was already completed, our mission in this
project was to build a better image search engine based on the prototypes
set by IBM, UC Berkeley, and Virage. We created a simple Java
program that extends the Frame class, fetches images from specified URL
addresses, and displays the acquired images in a new frame. We then
modified the code to display the status of each image, the type of value
of the pixels in the image (i.e., whether they were bytes or integers),
and the image's width and height. We used the Image/J 0.70
package (a photoshop analysis page for image manipulation) and incorporated
a clone of its histogram class in our code in order to view the histogram
graph of the images we fetched. After some more modifications, we
were able to additionally display the red, green, and blue values of each
pixel to see if the color values were real. Each pixel starts out as a
32-bit value. Bits 0 -7 contain the value for the blue component,
bits 8 - 15 contain the green component, bits 16 - 23 contain the red component,
and bits 24 - 31 contain the alpha value, i.e., the level of transparency
of the pixel. The program we created stored the red, green, and blue
components of each pixel in an image to separate arrays. These components
were then compressed down to three bits each using a five bit shift to
the right and then stored to a new array. At this point, we were
able to eliminate the Image/J histogram graph code from our program and
create our own histogram algorithm. Our histogram was a simple array
record of the number of pixels of each color. Taking the compressed
RGB component values, we could construct the separate values into a single
9-bit integer which served as the index of the array. For the integer
construction, we simply used the equation:
new pixel color value = compressed red component
* 64 + compressed green component * 8 + compressed blue component
The new 9-bit color values were integers that ranged from 0 - 511, so we
used these values as the index of the histogram array. The value
of histogram[index] was simply incremented every time another pixel of
a color value equal to index was counted. With this histogram, we
were then able to do image similarity comparisons and clustering.
4.2 Clustering the images
In our part of the VIRM project, we were not directly involved in the creation
of the image clustering algorithm and working code. For more information
on the logistics of the clustering algorithm used in the VIRM project,
refer to the work on Ontology-based Information Fusion done by Dr. David
Eichmann at http://andal.info-science.uiowa.edu/eichmann/Sulla/OBIF.pdf.
4.3 Creating the graphical user interface
for VIRM
Once the VIRM clustering code was completed,
we began creating a new graphical user interface from scratch. This
user interface (a.k.a. image search interface)
is intended to operate solely as an image retrieval engine, not for text
searching. The image search interface is being designed in the hope
of making VIRM more user-friendly. Although it currently resembles
Sulla's control panel interface (shown in Figure 2), VIRM's GUI is built
with more components, such as menu bars with options for new multiple window
files, that give the user the flexibility to run numerous image seach windows
simultaneously. The control panel interface shown in Figure 2 is
one that is fine for developing VIRM, but it is not suitable for a general
web user to operate. Thus, we have worked on developing a new interface
for VIRM that will be better suited to general image searching. The
VIRM GUI also includes a close window menu option, a quit menu, a help
menu option, and button options (start, cancel, and exit). When the
"start" button in the new VIRM GUI is clicked, a new window appears.
This new window contains the thumbnail images, which are displayed in a
Panel within an array of scrollbars, as shown in Figure 3.
The image search interface is still
in its first stages of construction. At this time, the image search
program is able to fetch images from specified URL addresses that are sifted
from documents relating to a given keyword. These images are then
clustered, and the control panel interface allows for the user/developer
to adjust the threshold values of the clusters and thus change the conditions
for grouping. For instance, the "membership threshold" condition
is a number between 0 and 1 that determines how similar images in a given
cluster must be, zero meaning that they need not be similar at all, and
one meaning that they must be identical. The higher the membership
threshold value, the more similar images in a particular cluster are, but
also the more clusters that are generated. The "cluster threshold"
is similar, determining if, in the graph of all the existing clusters,
an edge should be drawn between any clusters to indicate that they fulfill
the cluster threshold measure of similarity. Experimentation needs
to be done with the threshold values to determine which values provide
the best similarity groupings without generating an impractical number
of clusters.
Figure 2. Sulla control panel interface
Figure 3. The VIRM image display interface.
5 Obstacles encountered
As with any pioneering project, we encountered many obstacles in the development
of VIRM. Most of our problems stemmed from the details of image loading
and processing. We were able to resolve some of these difficulties,
but many can still use refinement.
5.1 Image compression
When we worked on compressing the pixel RGB component values from eight
bits to three bits, there was some concern for the loss of so many color
bits in the compression. This is reasonable, considering that the
images start out with 24 color bits, over 16 million possible colors, and
are subsequently cut down to 9 color bits, with only 512 possible colors.
Upon reconstruction of the degraded pixels, though, we discovered that
there actually was not a significant difference between the original image
and the degraded one (Figure 4). When we tried increasing the compression
of the RGB components to two bit color values, the difference between the
original image and the reconstructed image was more marked (Figure 5).
Reducing each RGB component to 2 bits cuts the color range down to only
64 colors. However, this may still be sufficient to compute image
similarity and cluster images while increasing the real time speed of the
program, which, at this time, is still rather slow. Experimentation
is still needed to determine what is the ideal compression range for image
comparison.
Figure 4. Compression of RGB components to 3 bits (L- degraded
image, R- original image)
Figure 5. Compression of RGB components to 2 bits (L- degraded
image, R- original image)
5.2 Memory shortage
Another significant problem with visual information retrieval in general
is the incredible amount of memory images require to process. We
encountered numerous "java.lang.OutOfMemory" errors during execution of
VIRM. We eliminated many unnecessary array-creation instances in
our Java code, which significantly helped the memory situation. Despite
these measures, though, much still needs to be done in saving memory space,
such as improving garbage collection methods and avoiding other unnecessary
data structure instantiations.
5.3 Transparent images
Transparent images posed a problem during VIRM's run time. Images
that contained transparent pixels affected the image similarity computations,
because their pixels could not be registered as a particular color, since
the pixels weren't even visible. Also, pixels have different levels
of transparency and opacity, so even though a certain pixel may be mostly
transparent and show up as very light, it would be registered in the color
histogram as having full color brightness. Thus, we would get images
clustered together as "similar" that would have striking color differences.
5.4 Unsuitable images
We discovered that while the thumbnail images were displayed, VIRM grabbed
not only the user's desired images within the HTML documents but also unnecessary
buttons and logos (Figure 6). Since VIRM simply sifts through HTML
documents for image URLs indiscriminately, it retrieves navigational buttons,
bullets, bars, and other icons, in addition to images of the type that
are actually being searched for. These so-called "functional" images,
which increase the functionality of a page and aid in site navigation,
are most likely not the subject of an image search, so there presence in
the image search results is generally undesired.
Navigational buttons |
|
Bullets |
|
Paragraph separator/bar |
|
Other icons |
|
Figure 6. Example functional images
6 What is next for VIRM?
The ultimate goal of VIRM is to provide the general public with a suitable
image retrieval system for the World Wide Web. At this stage, VIRM
is able to perform image retrieval and display rather crudely. There
are still many problems with the project that need to be resolved, and
there is much room for improvement of existing capabilities. This
section will discuss a few of the possible areas for future work and propose
ways of building VIRM to its highest potential.
6.1 Eliminating functional images
The next step for VIRM would be to give it an ability to distinguish between
the desired, searched-for images in an HTML document and the undesired
functional images such as navigational buttons, bullets, paragraph separators,
and other icons. Elimination of these numerous unwanted images would
save much in the way of computational time and memory, in addition to improving
the quality of the search results presented by VIRM. It has been
proposed by V. Harmandas, M. Sanderson, and M.D. Dunlop in their paper
"Image retrieval by hypertext links" that such unsuitable functional images
could be identified from other non-functional images by the higher number
of links with other pages that these images have. Since functional
images are usually navigational or readability enhancement images, they
tend to be linked with more pages than other images. Harmandas, et
al demonstrate that over 90% of non-functional images are linked to by
only one page, whereas functional images are always linked to by more than
two pages, with almost half of them being linked to by more than five.
Reasonably, then, the possibility of functional images being retrieved
can be reduced by weighting them inversely to the number of pages linking
to them. A second possibility to remove functional images would be
to check each image's color histogram to make sure that there are more
than, say, fifty distinct colors. Since most functional images have
very few different colors, this could help cut down on the number of functional
images that appear in a VIRM run.
6.2 User download option
At this point in VIRM's development, images retrieved from a search can
only be viewed as a thumbnail (scaled down form). This is not of
much use for a user who needs to download or manipulate these images.
Future work needs to include the insertion of Java code that will allow
the user to have access to the original image. Two possibilities
are to either allow a user to download the image directly from the VIRM
thumbnail or to direct the user to the location of the original image.
If the image is to be downloaded from the thumbnail, perhaps a full-scale
view of the image could be displayed in a separate frame by a single mouse
click on the thumbnail, and the full image could then be downloaded by
another mouse click on the full-scale image.
6.3 Dealing with transparent images
The similarity of the images VIRM clusters
together is not particularly strong. Some of this may be due to transparent
images. Images that are transparent or that have transparent sections
cannot have their transparent pixels registered in the color histogram,
since they are not even visible. Since VIRM is written in Java, it
may be helpful in the similarity determination algorithm to use the Java
method called getAlpha(), which returns how transparent each pixel in an
image is. The similarity calculations could them be weighted according
to the transparency or opacity of each pixel. Pixels that are completely
transparent would have an alpha value of zero so they would not be counted
in the color histogram at all, while completely opaque pixels would have
an alpha value of 255 and would be given full weight in the histogram.
Intermediate alpha values would be weighted accordingly. This will
hopefully improve the similarity between retrieved images in a particular
cluster.
6.4 Edge detection
Another area of future improvement for image similarity in VIRM would be
applying edge detection capabilities. In order to avoid the grouping
of images with strong color similarity but obvious structural differences
(e.g., a red rose on a black background and a dark volcano with red molten
lava), an edge detection scheme is important. Not only the similarity
of colors, but the similarity of placement of these colors needs to be
observed. The edge detection or color placement scheme could be applied
in the image clustering code after the affirmation of color resemblance.
6.5 Improving the GUI
VIRM's development also needs to be extended in its graphical user interface
to make it flexible and to give it a better display of windows and images.
VIRM's new user interface is currently under construction and the next
phase would be to instantiate the necessary files in the Spider package
in order for the image search interface to operate on Spider, but particularly
for acquiring images. One question
that still needs to be answered as far as the display of the thumbnails
is concerned is whether proportional scaling should be used to make them
or if a set size thumbnail should be used. Proportional scaling makes
it much easier to tell what an image is supposed to be, but a set size
image is much easier to view in a scroll pane, since all scaled images
will be the same size. Development and improvement of the graphical
user interface is the next big phase of the VIRM project.