The present invention relates to a pattern recognition
method adapted to approximate the distribution of a set of vectors and the boundary
of two or more sets (classes) of vectors in a vector space based on basis functions.

Methods of using basis functions referred to as radial
basis functions (to be referred to as spherical basis functions hereinafter) are
known. Spherical basis functions have been proposed by several study groups independently.
J. E. Moody and C. Darken: "Fast Learning in Networks of Locally-Tuned Processing
Units", Neural Computation 1, pp. 281-294, 1989
, may be cited here as an example of such propositions.

Spherical basis functions have a peak at the center and symmetrical in all directions.
Of spherical basis functions, those of the so-called Gaussian type are the most
popular and are expressed by
$${\mathrm{o}}_{\mathrm{i}}\left(\mathrm{x}\right)\mathrm{=}\mathrm{exp}\left[\mathrm{-},{\mathrm{\Vert}\mathrm{x}\mathrm{-}{\mathrm{\&xgr;}}_{\mathrm{i}}\mathrm{\Vert}}^{\mathrm{2}},\mathrm{/},{\mathrm{2}{\mathrm{\&sgr;}}_{\mathrm{i}}}^{\mathrm{2}}\right]$$

where x is the vector that corresponds to the input pattern and &xgr;_{i}
is the i-th basis vector (parameter indicating the position in the Gaussian distribution),
while &sgr;_{i} is the i-th standard deviation (parameter indicating the
expanse of the Gaussian distribution). The value of the i-th Gaussian type basis
function is o_{i}(x), which is not negative and large when x is close to
&xgr;_{i} and takes the largest value of 1 when x = &xgr;_{i}.
It is possible to approximate the distribution of any arbitrarily selected set of
vectors to a desired accuracy level by providing a sufficient number of basis functions
and using a weighted linear combination as expressed by
$${\mathrm{y}}_{\mathrm{1}}\left(\mathrm{x}\right)\mathrm{=}{\displaystyle \mathrm{\sum}_{\mathrm{i}\mathrm{=}\mathrm{1}}^{\mathrm{H\u02b9}}}{\mathrm{w}}_{\mathrm{1}\mathrm{i}}{\mathrm{o}}_{\mathrm{i}}\left(\mathrm{x}\right)$$

where 1 is the class number of the set of vector and w_{li} is the contribution
ratio (weight parameter) of the i-th basis function to the class 1, while H' is
the number of basis functions. The above formula indicates the extent to which an
unknown input pattern resembles the particular class (degree of similarity) so that
it can be used to classify classes. For example, if
$$\mathrm{C}\left(\mathrm{x}\right)\mathrm{=}{\mathrm{arg\; max}}_{\mathrm{1}}\left[{\mathrm{y}}_{\mathrm{1}},\left(\mathrm{x}\right)\right]$$
it is possible to determine the class of an input pattern according to the class
boundary defined by basis functions. In the formula 3 above, argmax_{l}
[·] is the number of the class that provides the largest value for the degree
of similarity.

The pattern recognition method that uses spherical basis
functions provides advantages including the ability to optimize parameters-by learning,
like feed forward neural nets based on general sigmoid functions; but, unlike general
neural nets, the contribution ratios of individual basis functions are intuitively
comprehensible.

However, the distribution of vectors that corresponds to
a pattern observed from the real world is, more often than not, complex and hence
it is necessary to prepare a large number of basis functions in order to accurately
approximate such a distribution. Conversely, when the number of obtained samples
is small, approximation can produce a state of being too complex relative to the
proper distribution (population distribution) (excessive learning).

Accordingly, it is an object of the present invention to
provide a pattern recognition method adapted to approximate the distribution of
a set of vectors that may be complex and the boundary of classes based on fewer
basis vectors than the number of known comparable methods.

According to one embodiment of the present invention, there
is provided a pattern recognition method of approximating distribution of a set
of vectors and a class boundary in a vector space based on basis functions. The
method includes defining directional basis functions between two basis vectors,
and performing the approximation using a linear combination of the directional basis
functions.

The invention can be more fully understood from the followings
detailed description when taken in conjunction with the accompanying drawings, in
which:

- FIG. 1 is a flowchart of a processing operation of an embodiment of the present
invention, shown as an example;
- FIG. 2 is an illustration of the vectors, the basis functions and the distribution
of the set of vectors that correspond to an input pattern, shown as an example;
- FIG. 3 is an illustration of basis functions, shown as an example;
- FIG. 4 is a schematic illustration of an approximation of a pattern distribution,
shown as an example;
- FIG. 5 is a schematic illustration of an approximation of a pattern distribution
same as that of FIG. 4 made using conventional spherical basis functions;
- FIG. 6 is a schematic illustration of basis functions when the number of obtained
samples is small, shown as an example;
- FIG. 7 is a schematic illustration of an approximation of a pattern distribution
when the number of obtained samples is small, shown as an example;
- FIG. 8 is a schematic illustration of an approximation of the pattern distribution
same as FIG. 7 made using conventional spherical basis functions, shown as an example;
- FIG. 9 is a flowchart illustrating a method of defining parameters by learning
samples; and
- FIG. 10 is a schematic block diagram of hardware, showing the configuration
thereof as an example.

Embodiments of the present invention will be described
below with reference to the drawings.

<First Embodiment>
FIG. 1 is a flowchart of a processing operation of this
embodiment. FIG. 2 is an illustration of the vectors, the basis functions and the
distribution of the set of vectors that correspond to an input pattern.

Firstly, a pattern is input (ST1). The expression of a
pattern as used herein refers to a string of numerical values that a computer can
handle such as the pixel values of a digital image or a row of characteristic quantities.
A pattern can be regarded as vectors having such numerical values as components
(ST2, ST3). The vectors that correspond to the input pattern are expressed by x
in
$$\mathrm{x}\mathrm{=}{\left[{\mathrm{x}}_{\mathrm{1}},,{\mathrm{x}}_{\mathrm{2}},\mathrm{\dots},{\mathrm{x}}_{\mathrm{M}}\right]}^{\mathrm{T}}$$

where x_{1}, x_{2}, ..., x_{M} represent the string of numerical
value of the pattern (the elements of each vector) and M represents the number of
elements.

Then, the values of the basis functions are computationally
determined (ST4 through ST6).

According to this embodiment, basis functions are defined between two basis vectors
so that, if the number of basis vectors is H, the number of basis functions is H^{2}.
This embodiment proposes basis functions o_{ij}(x) expressed by a formula
that has peaks at the positions of the two basis vectors and enhanced characteristics
centered at the line segment connecting the two peaks. The formula is
$${\mathrm{o}}_{\mathrm{i\; j}}\left(\mathrm{x}\right)\mathrm{=}\mathrm{exp}\left[\mathrm{-},{\mathrm{\Vert}\mathrm{x}\mathrm{-}{\mathrm{\&xgr;}}_{\mathrm{i}}\mathrm{\Vert}}^{\mathrm{2}},\mathrm{/},{\mathrm{\Vert}\mathrm{x}\mathrm{-}{\mathrm{\&xgr;}}_{\mathrm{i}}\mathrm{\Vert}}^{\mathrm{2}},/,{\mathrm{2}{\mathrm{\&sgr;}}_{\mathrm{i\; j}}}^{\mathrm{2}}\right]$$

where x represents the vectors that correspond to the input pattern and &xgr;_{i}
represents the i-th basis vector, while &xgr;_{j} represents the j-th
basis vector and &sgr;_{ij} represents the standard deviation of the basis
functions defined between the i-th and j-th basis vectors.

The basis function essentially different from conventional
spherical basis functions in that it is not symmetrical in all directions when the
basis vector is viewed as center (and has enhanced characteristics depending on
the direction of some other basis vector). FIG. 3 is an illustration of basis functions,
shown as an example. The present invention is based on "an assumption of linear
interpolation" or "an assumption that the linear interpolation between two basis
vectors is strong when the distance between them is short". FIG. 3 shows the profiles
of three different basis functions when the distance between the two vectors is
long, medium and short.

Like a spherical basis function, the value of o_{ij}(x)
is non-negative and becomes larger as x approaches &xgr;_{i} or &xgr;_{j}
so that it takes the largest value of 1 when x = &xgr;_{i} or x = &xgr;_{j}.
Additionally and similarly, the distribution y_{l} (x) of the set of vectors
can be approximated by a weighted linear combination (ST7, ST8) as expressed by
the formula below.
$${\mathrm{y}}_{\mathrm{1}}\left(\mathrm{x}\right)\mathrm{=}{\displaystyle \mathrm{\sum}_{\mathrm{i}\mathrm{=}\mathrm{1}}^{\mathrm{H}}{\displaystyle \mathrm{\sum}_{\mathrm{i}\mathrm{=}\mathrm{1}}^{\mathrm{H}}}}{\mathrm{w}}_{\mathrm{1}\mathrm{i\; j}}{\mathrm{o}}_{\mathrm{i\; j}}\left(\mathrm{x}\right)$$

where l represents the class number of the set of vectors and w_{lij} represents
the contribution ratio of the class l of the basis functions defined between the
i-th and j-th basis vectors, while H represents the number of basis function.

Finally, the extent to which the input pattern resembles
the class (the degree of similarity), or C(x), is computationally determined by
means of the formula shown below.
$$\mathrm{C}\left(\mathrm{x}\right)\mathrm{=}{\mathrm{arg\; max}}_{\mathrm{1}}\left[{\mathrm{y}}_{\mathrm{1}},\left(\mathrm{x}\right)\right]$$

where argmax_{l} [·] is the number of the class that provides the largest
value for the degree of similarity.

The basis function of this embodiment has the advantage
that it is possible to learn parameters and the contribution ratios are intuitively
comprehensible like conventional spherical basis functions and additionally overcomes
the problems of the conventional methods. More specifically, a complex distribution
of vectors can be approximated by fewer basis vectors than ever. FIG. 4 is a schematic
illustration of an approximation of a pattern distribution by means of this embodiment.
Since the basis function of this embodiment has a characteristic showing an elliptic
or cylindrical expanse (asymmetric) as indicated by solid lines, it is highly expressive
if compared with a conventional basis function that shows only a circular expanse.

FIG. 5 is a schematic illustration of an approximation
of a pattern distribution same as that of FIG. 4 made using conventional spherical
basis functions. FIG. 5 schematically shows that the conventional basis function
requires more basis vectors than this embodiment (which requires only three basis
vectors).

Generally, when the pattern distribution is complex but
locally continuous (linear in particular) if viewed locally, it is possible to reduce
the number of basis vectors by the interpolation potential of basis functions of
the present invention. A basis function according to this embodiment is equivalent
to a conventional spherical basis function in the worst case where interpolation
cannot contribute to an approximation of a pattern distribution.

Additionally, if the obtained samples are few, the approximation
using basis functions does not become more complex than the population distribution
but is very close to the latter. FIG. 6 is a schematic illustration of basis functions
of this embodiment when the number of obtained samples is small and FIG. 7 is a
schematic illustration of an approximation of a pattern distribution when the number
of obtained samples is small. Basis functions of this embodiment have a means that
local fluctuations of the input pattern are predicted according to the mathematical
model for interpolation of basis vectors.

FIG. 8 is a schematic illustration of an approximation
of the pattern distribution same as FIG. 7 made using conventional spherical basis
functions. FIG. 8 schematically illustrates that conventional spherical basis functions
are located only around the obtained small number of samples to give rise to an
approximation that considerably differs from the population distribution.

Generally, the fluctuations of a pattern observed from
the real world are locally continuous, it is possible to more accurately approximate
a population distribution from fewer samples due to the interpolation potential
of basis functions of the present invention.

<Second Embodiment>.
The present invention is by no means limited to the above-described
first embodiment and can be embodied in various different ways independently from
the above-described first embodiment so long as directional (not symmetric in all
directions) basis functions are defined between two basis vectors. For example,
basis functions as defined by the formula below are proposed for the second embodiment.
$${\mathrm{o\u02b9}}_{\mathrm{i\; j}}\left(\mathrm{x}\right)\mathrm{=}\mathrm{exp}\left[\mathrm{-},{\mathrm{\Vert}\mathrm{x}\mathrm{-}{\mathrm{h}}_{\mathrm{i\; j}}\mathrm{\Vert}}^{\mathrm{2}},\mathrm{/},{\mathrm{\Vert}{\mathrm{\&xgr;}}_{\mathrm{i}}\mathrm{-}{\mathrm{\&xgr;}}_{\mathrm{j}}\mathrm{\Vert}}^{\mathrm{2}},/,{\mathrm{2}{\mathrm{\&sgr;}}_{\mathrm{i\; j}}}^{\mathrm{2}}\right]$$

where x represents the vectors that correspond to an input pattern and h_{ij}
represents the feet of the perpendiculars to the line segment connecting the i-th
and j-th basis vectors from x, while &xgr;_{i} and &xgr;_{j}
respectively represent the i-th and j-th basis vectors and &sgr;_{ij}
represents the standard deviation of the basis functions defined between the i-th
and j-th basis vectors.

The basis vectors are characterized by a cylindrical profile
having the line segment connecting two basis vectors as core and extending in directions
perpendicular to it. In the above formula, h_{ij} is actually expressed
by a formula using x, &xgr;_{i} and &xgr;_{j} so that the number
of parameters is the same as in the first embodiment. The second embodiment is characterized
in that, unlike the first embodiment, two basis vectors are connected by a constant
value that corresponds to the distance between them.

<Third Embodiment>
The third embodiment relates to learning for parameters.
Basis functions of the present invention have four parameters including W_{lij},
&xgr;_{i}, &xgr;_{j} and &sgr;_{ij}.

If the distribution of the pattern to be handled is known, the parameters can be
defined according to the distribution. However, the distribution of a pattern obtained
from the real world is generally unknown.

Thus, this embodiment proposes a technique of defining
the parameters by learning samples. FIG. 9 is a flowchart illustrating a method
of defining parameters by learning samples. Referring to FIG. 9, firstly the parameters
are initialized by using appropriate values (ST11). Then, the parameters are updated
(ST12 through ST14) for the obtained samples according to
$${\mathrm{w\u02b9}}_{\mathrm{1}\mathrm{i\; j}}\mathrm{=}{\mathrm{w}}_{\mathrm{1}\mathrm{i\; j}}\mathrm{-}\mathrm{\&agr;}\left[\mathrm{\partial},\mathrm{\&egr;},\left(\mathrm{x},,\mathrm{y}\right),\mathrm{/},\mathrm{\partial},{\mathrm{w}}_{\mathrm{1}\mathrm{i\; j}}\right]$$
$${\mathrm{\&xgr;\u02b9}}_{\mathrm{i}}\mathrm{=}{\mathrm{\&xgr;}}_{\mathrm{i}}\mathrm{-}\mathrm{\&agr;}\left[\mathrm{\partial},\mathrm{\&egr;},\left(\mathrm{x},,\mathrm{y}\right),\mathrm{/},\mathrm{\partial},{\mathrm{\&xgr;}}_{\mathrm{i}}\right]$$
$${\mathrm{\&xgr;\u02b9}}_{\mathrm{j}}\mathrm{=}{\mathrm{\&xgr;}}_{\mathrm{j}}\mathrm{-}\mathrm{\&agr;}\left[\mathrm{\partial},\mathrm{\&egr;},\left(\mathrm{x},,\mathrm{y}\right),\mathrm{/},\mathrm{\partial},{\mathrm{\&xgr;}}_{\mathrm{j}}\right]$$
$$\mathrm{\&sgr;}{\mathrm{\u02b9}}_{\mathrm{i\; j}}\mathrm{=}{\mathrm{\&sgr;}}_{\mathrm{i}}\mathrm{j}\mathrm{-}\mathrm{\&agr;}\left[\mathrm{\partial},\mathrm{\&egr;},\left(\mathrm{x},,\mathrm{y}\right),\mathrm{/},{\mathrm{\partial}\mathrm{\&sgr;}}_{\mathrm{i\; j}}\right]$$

where &egr;(x, y) is the learning error and expressed by the formula below, using
y_{c} as a teaching signal (desirable value).
$$\mathrm{\&egr;}\left(\mathrm{x},,\mathrm{y}\right)\mathrm{=}{\left({\mathrm{y}}_{\mathrm{c}},\mathrm{-},\mathrm{y},\left(\mathrm{x}\right)\right)}^{\mathrm{2}}\mathrm{/}\mathrm{2}$$

where &agr; is a learning constant, which is a positive value. The parameters
are updated sequentially in this way and the learning session is ended when &egr;(x,
y) shows a sufficiently small value (or when the number of learning sessions exceeds
a predetermined number). The learning for parameters can be realized independently
from the pattern recognition process of the first and second embodiments.

As pointed out above, with a pattern recognition method
adapted to approximate the distribution of a set of vectors and the class boundary
in a vector space based on basis functions, it is possible to approximate the distribution
of a set of vectors and a class boundary with fewer basis vectors by defining directional
basis functions (not symmetric in all directions) between two basis vectors and
performing the approximation based on a linear combination thereof. Additionally,
if the number of obtained samples is small, the approximation using basis functions
does not become more complex than the population distribution but is very close
to the latter unlike conventional basis functions.

Particularly, it is possible to approximate a set of vectors
and a class boundary by modeling the relationship of strong interpolation and short
distance, using basis functions that have peaks at the positions of the two basis
vectors and show characteristics of connecting the two peaks by a non-linear curved
surface whose dimensions correspond to the distance between them.

It is also possible to approximate a set of vectors and
a class boundary by modeling the relationship of strong interpolation and short
distance, using basis functions that have peaks at the positions of the two basis
vectors and are characterized by the cylindrical profile having the line segment
connecting two basis vectors as core and extending in directions perpendicular to
it.

Furthermore, it is possible to define parameters by sequentially
updating the parameters according to an error minimal standard, while inputting
samples, even when the distribution of the pattern to be handled is unknown.

The processing sequences described above for the embodiments
may be written as computer programs (codes) and stored in a computer-readable storage
medium (e.g., a magnetic disk, an optical disk or a semiconductor memory) and any
of the computer program may be read out and executed by means of a computer (processor)
when necessary. Any of such computer programs can be distributed by transmitting
it from a computer to another computer by way of a transmission medium.

Any of the computer programs containing the processing
sequences may be executed by an information processing apparatus (computer) 11 having
hardware resources as shown in FIG. 10. The processor (CPU or the like) 10 shown
in FIG. 10 can execute the computer program loaded in a volatile memory (RAM or
the like) 13 from a storage device (hard disk or the like) 12, display various pieces
of information (the results of the processing operation) on a display 15 according
to the operation at an input device (mouse, keyboard or the like) 14 and/or store
them in the storage device 12.

While certain embodiments of the inventions have been described,
these embodiments have been presented by way of example only, and are not intended
to limit the scope of the inventions. Indeed, the novel methods and systems described
herein maybe embodied in a variety of other forms; furthermore, various omissions,
substitutions and changes in the form of the methods and systems described herein
may be made without departing from the spirit of the inventions. The accompanying
claims and their equivalents are intended to cover such forms or modifications as
would fall within the scope and spirit of the inventions.