PatentDe  


Document identification EP0604572 18.12.1997
EP publication number 0604572
Title MIXING OF VIDEO SIGNALS USING A PATTERN KEY
Applicant Sarnoff Corp., Princeton, N.J., US
Inventors HANNA, Keith, James, Princeton, NJ 08540, US;
BURT, Peter, Jeffrey, Princeton, NJ 08540, US
Representative none currently appointed
DE file number 69223155
Contracting states DE, ES, FR, GB, IT, NL
Language of document English
EP filing date 10.09.1992
EP application number 929208262
WO filing date 10.09.1992
PCT application number US9207498
WO publication number 9306691
WO publication date 01.04.1993
EP laid-open date 06.07.1994
EP date of grant 12.11.1997
Publication date in the Patent Gazette 18.12.1997
IPC main class H04N 5/275

Description [en]

The invention relates to a technique for deriving a composite video image by merging foreground and background video image data supplied from a plurality of separate video signal sources and, more particularly, to a technique employing pattern-key insertion for this purpose.

BACKGROUND OF THE INVENTION

Means for merging two or more video signals to provide a single composite video signal is known in the art. An example of such video merging is presentation of weather-forecasts on television, where a weather-forecaster in the foreground is superimposed on a weather-map in the background.

Such prior-art means normally use a color-key merging technology in which the required foreground scene is recorded using a colored background (usually blue or green). The required background scene is also recorded. In its simplest form, the color-key video merging technique uses the color of each point in the foreground scene to automatically "hard" switch (i.e., binary switch) between the foreground and background video signals. In particular, if a blue pixel is detected in the foreground scene (assuming blue is the color key), then a video switch will direct the video signal from the background scene to the output scene at that point. If a blue pixel is not detected in the foreground scene, then the video switch will direct the video from the foreground scene to the output scene at that point. After all points have been processed in this way, the result is an output scene which is a combination of the input foreground and background scenes.
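By way of illustration only, a minimal sketch of such a "hard" color-key switch is given below, assuming 8-bit RGB frames held in numpy arrays of equal shape; the key color and tolerance are illustrative values, not taken from this document.

```python
import numpy as np

def color_key_merge(foreground, background, key=(0, 0, 255), tol=60):
    """Composite: background wherever the key color appears in the foreground."""
    fg = foreground.astype(np.int16)
    # Per-pixel L1 distance from the key color (blue by default).
    dist = np.abs(fg - np.array(key, dtype=np.int16)).sum(axis=-1)
    is_key = dist < tol                      # the binary "hard" switch decision
    out = foreground.copy()
    out[is_key] = background[is_key]         # route background scene at key pixels
    return out
```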

More complex "soft" forms of the color-key video merging technique are taught by Nakamura et al. in SMPTE Journal, Vol. 90, Feb. 1981, p. 107, and in U.S. Patent US-A-4,409,611. In these more complex forms of the color-key video merging technique, the effects of switching may be hidden and more natural merging may be achieved. For instance, shadows of foreground subjects may be made to appear in the background.

The color-key merging technique is simple, and cheap hardware for this method has been available for some time. As a result, color-key insertion can be performed on both recorded and live video. It is used widely in live television for such purposes as superimposing sports results or images of reporters on top of background scenes, and in the film industry for such purposes as superimposing foreground objects (like space-ships) onto background scenes (like space-scenes).

However, there are two important limitations of color-key merging technology. First, this technique cannot be used to combine video sources where the separation color (e.g., blue or green) in the scene cannot be controlled by the user of this technology. This has often limited the use of color-key insertion to image sequences recorded in a broadcasting or film studio. Second, it is not currently possible to automatically combine video signals in such a way that patterns inserted from one sequence follow the motion of objects (foreground or background) in the other sequence so that the inserted patterns appear to be part of these objects. While, in the past, synchronization of the motions of background and foreground scenes has been performed manually in a very limited number of film productions, such manual synchronization is highly expensive and tedious and requires that the video material be prerecorded rather than 'live'.

The prior art includes a dynamic pattern recognition method which employs a hierarchical structured search for detecting a pattern within a video scene. An example of the use of this method is described in U.S. Patent US-A-5,063,603, the teachings of which are incorporated herein by reference. Briefly, this dynamic pattern recognition method consists of representing a target pattern within a computer as a set of component patterns in a "pattern tree" structure. Components near the root of the tree typically represent large scale features of the target pattern, while components away from the root represent progressively finer detail. The coarse patterns are represented at reduced resolution, while the detailed patterns are represented at high resolution. The search procedure matches the stored component patterns in the pattern tree to patterns in the scene. A match can be found, for example, by correlating the stored pattern with the image (represented in a pyramid format). Patterns are matched sequentially, starting at the root of the tree. As a candidate match is found for each component pattern, its position in the image is used to guide the search for the next component. In this way a complex pattern can be located with relatively little computation.
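A minimal coarse-to-fine correlation search in this spirit is sketched below. It is only an illustration, assuming 8-bit grayscale numpy images, an OpenCV dependency, component templates prepared at each pyramid level (coarsest last), and a pattern lying well inside the image so the refinement window is never smaller than the template.

```python
import cv2
import numpy as np

def hierarchical_search(image, templates):
    """templates[k]: the component pattern at pyramid level k (0 = finest)."""
    levels = len(templates)
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))     # coarser, lower resolution
    # Exhaustive search only at the coarsest level.
    score = cv2.matchTemplate(pyramid[-1], templates[-1], cv2.TM_CCOEFF_NORMED)
    _, _, _, (x, y) = cv2.minMaxLoc(score)
    for k in range(levels - 2, -1, -1):              # guide search at finer levels
        x, y = 2 * x, 2 * y                          # project match to finer level
        h, w = templates[k].shape[:2]
        r = 8                                        # small refinement window
        x0, y0 = max(0, x - r), max(0, y - r)
        roi = pyramid[k][y0:y + h + r, x0:x + w + r]
        score = cv2.matchTemplate(roi, templates[k], cv2.TM_CCOEFF_NORMED)
        _, _, _, (dx, dy) = cv2.minMaxLoc(score)
        x, y = x0 + dx, y0 + dy
    return x, y                                      # top-left corner, full resolution
```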

Further, it is known in the prior art how to estimate the orientation of a flat surface of a given detected pattern in a scene depicted in a video image. The particular parameters that need to be determined are the position of the given detected pattern in the scene, its scale and orientation in the plane of the image, and its tilt into the image plane. Pose is estimated by measuring the geometric distortions of other "landmark" patterns on or near the given detected pattern. Pose may be estimated in two steps.

The first step is to make a rough estimate of pose by locating three or more of such landmark patterns in the scene that are on or near the given detected pattern. The positions of these landmark patterns relative to the given detected pattern are known from training images. However, the positions of these landmark patterns relative to one another change with changes in pose of the given detected pattern. Therefore, the relative positions of the landmark patterns in the observed scene can be used to determine that pose. Landmark patterns can be located using hierarchical structured search, as described above.
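As an illustration, this rough-estimate step can be phrased as a least-squares fit of a 2-D affine pose to the landmark correspondences; the sketch below assumes numpy arrays of matching training and observed positions (three or more pairs) and is not this document's literal procedure.

```python
import numpy as np

def affine_from_landmarks(src, dst):
    """Solve dst ~ A @ src + t; returns the 2x3 pose matrix [A | t]."""
    src = np.asarray(src, dtype=float)       # landmark positions, training image
    dst = np.asarray(dst, dtype=float)       # observed positions in the scene
    M = np.hstack([src, np.ones((len(src), 1))])   # N x 3 design matrix
    params, *_ = np.linalg.lstsq(M, dst, rcond=None)
    return params.T                          # rotation/scale/shear plus shift
```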

The second step, which refines the pose estimate, makes use of "locator patterns" that are on or near the given detected pattern. These "locator patterns" are more extensive patterns than are typically used as landmarks. Stored copies of the pattern are matched to the scene through a process that successively estimates position and orientation, and a process that warps the stored copies into alignment with the observed patterns in the scene. This alignment process, known in the art and called herein "affine precise alignment estimation," can provide a very precise estimate of the pattern positions, and hence of the pose of the given detected pattern. The affine precise alignment estimation process is described in various publications, including "Hierarchical Model-Based Motion Estimation" in the Proc. European Conference on Computer Vision, 1992, pp. 237-252, by Bergen et al. and U.S. Patent US-A-5,067,014 to Bergen et al. and assigned to the assignee of this invention.
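The refinement step can be approximated with any coarse-to-fine affine registration; the sketch below uses OpenCV's ECC routine as a stand-in for the Bergen et al. estimator (an assumption, not this document's own implementation), seeded with the rough pose from the landmark step and operating on float32 grayscale images.

```python
import cv2
import numpy as np

def refine_alignment(locator_ref, scene, warp_init):
    """Refine a 2x3 affine warp mapping locator_ref coordinates into the scene."""
    warp = warp_init.astype(np.float32)           # rough pose as the seed
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 50, 1e-6)
    # ECC iteratively warps and re-estimates, like the alignment loop above.
    _, warp = cv2.findTransformECC(locator_ref, scene, warp,
                                   cv2.MOTION_AFFINE, criteria)
    return warp                                   # precise affine pose estimate
```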

International Patent Application number WO-A-9302524 concerns apparatus which recognizes a portion of a video image and inserts another image in place of the recognized portion. The other image may be a logo or an advertising message. In this system, a user identifies a region to be replaced by the logo and then the system automatically recognizes the region in subsequent video frames. The recognized region is replaced by the logo in a manner such that the logo appears to be part of the original scene.

Japanese Kokai publication number JP-A-02306782 concerns a camera which is equipped with rotary encoders and a potentiometer which give information on the angle and size of the image being viewed. This information is used to geometrically transform a second image which is inserted at a predetermined location in the image being viewed by the camera.

European patent application number EP-A-0 360 576 relates to a video processing system in which a user marks, with a stylus, reference points on a monitor representing corners of a keyframe on each frame of a multi-frame sequence. Based on the marked corners, the system geometrically transforms another image and combines it with the marked image such that the transformed image overlays the marked keyframe and the overlaid image appears to have the same three-dimensional orientation as the marked keyframe.

According to one aspect of the invention, there is provided apparatus for replacing a first pattern in a sequence of successive video image frames of a scene with a second pattern comprising: first means for locating the first pattern in the sequence of successive video image frames and for obtaining respective estimates of orientation and size for the first pattern in each of the image frames; second means for geometrically transforming said second pattern into a sequence of transformed second patterns responsive to the respective estimates of orientation and size of said first pattern; and third means responsive to the estimates of the orientation and size of said first pattern for replacing occurrences of said first pattern in said video image frames with respective ones of said geometrically transformed second patterns, characterized in that the first means includes: means for determining locations for a plurality of landmarks relative to an image represented by one image frame of the sequence of successive video image frames, means responsive to the determined locations of the plurality of landmarks, for automatically determining the location of one of the plurality of landmarks in each of said video image frames in said sequence; and means for estimating, from the determined relative location of the one landmark in each of said video image frames, an orientation and size for said first pattern with respect to each of said video image frames in said sequence.

According to another aspect of the invention, there is provided a method for replacing a first pattern in a sequence of video image frames of a scene with a second pattern comprising the steps of:

  • a) locating the first pattern (A) in the sequence of successive video image frames and obtaining respective estimates of orientation and size for the first pattern in each of the image frames;
  • b) geometrically transforming said second pattern (B) into a sequence of second patterns using the respective estimates of orientation and size of said first pattern; and
  • c) replacing said detected first pattern with a respective one of said geometrically transformed second patterns in response to the estimates of orientation and size of said first pattern in each of said video image frames in said sequence, characterized in that step (a) includes the steps of:
    • a1) determining locations for a plurality of landmarks relative to an image represented by one image frame of the sequence of successive video image frames;
    • a2) automatically determining, responsive to the determined locations of the plurality of landmarks, the relative location of one of the plurality of landmarks in each of said video image frames in said sequence; and
    • a3) estimating an orientation and size for the first pattern with respect to the determined location of the one of the plurality of landmarks in each of said video image frames in said sequence.

BRIEF DESCRIPTION OF THE DRAWING

  • Fig. 1 illustrates the prior-art color-key insertion technique;
  • Figs. 2, 3, 4 and 5 illustrate examples of the pattern-key insertion technique of the invention;
  • Fig. 6 shows an example of "landmark region tracking"; and
  • Fig. 7 illustrates the successive steps performed in implementing the pattern-key insertion technique of the invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Elements common to different Figures have the same numerical designation in each Figure.

Each of Figs. 1 - 5 includes a video switch for merging foreground and background scenes into an output scene. For illustrative purposes, each of these video switches is assumed to be a "hard" switch. However, it should be understood that a "soft" switch, of a type known in the prior art discussed above, could be used instead.

In Fig. 1, an example of the prior-art color-key insertion technique is shown. The video output pixels of each of a sequence of successive image frames being recorded by camera 100-1 that is viewing background scene 102-1 (which comprises the sun and a tree) in real time (or, alternatively, such output pixels of a video playback device that is playing back previously recorded background scene 102-1) are forwarded to output scene 104 through video switch 106 whenever the output from means 108 indicates that means 108 is detecting blue pixels, assuming blue is the key color. The video output pixels of each of a sequence of successive image frames being recorded by camera 100-2 that is viewing foreground scene 102-2 (which comprises a person sitting at a desk, situated in front of a blue screen) in real time (or, alternatively, such output pixels of a video playback device that is playing back previously recorded foreground scene 102-2) are applied as an input to both video switch 106 and means 108. Therefore, video switch 106 forwards background scene 102-1 to output scene 104 only when the video output pixels of camera 100-2 constitute the blue pixels of the blue screen, and forwards foreground scene 102-2 to output scene 104 when the video output pixels of camera 100-2 constitute the non-blue pixels of the person sitting at a desk. Output scene 104 therefore constitutes the video output pixels of each of a sequence of successive composite image frames of the merged sun and tree of background scene 102-1 and the person sitting at a desk of foreground scene 102-2.

The invention is a technique, known as "pattern-key insertion", used to replace a predetermined pattern present in a background scene with an inserted substitute pattern present in a foreground scene. Figs. 2 - 5 show different examples of replacement, with the simplest example being shown in Fig. 2 and more complex examples being shown in Figs. 3 - 5, respectively. Each of the examples shown in Figs. 2 - 5 will first be described in general. Thereafter, the specific operation of the structure functionally shown in these figures will be discussed in more detail.

In Fig. 2, camera 200A records in real time a sequence of successive 2-dimensional video image frames of physical 3-dimensional objects, such as billboard 202, situated in background scene 204A (or, alternatively, this sequence of successive 2-dimensional image frames is derived from a video playback device that is playing back previously recorded background scene 204A). Illustratively, billboard 202 is assumed immovable, but camera 200A is movable both to the left and to the right and toward and away from the objects, including billboard 202, that comprise background scene 204A, as indicated by arrowed lines 206A. This will result in the orientation and scale of each of the sequence of video images of billboard 202 in background scene 204A changing in correspondence with the movement of camera 200A from one position to another, as indicated by arrowed lines 208A.

Billboard 202 in background scene 204A comprises striped logo pattern "A". It is desired to replace striped logo pattern "A" of background scene 204A with immovable striped logo pattern "B" of foreground scene 204B, which logo pattern "B" is assumed in Fig. 2 to be a fixed object defined by a video clip or single still picture. This is accomplished by means 210A which, in response to the video output of camera 200A applied as an input thereto, performs the functions of (1) detecting logo pattern "A" and (2) estimating the pose of detected logo pattern "A" (i.e., estimating the orientation, scale and perspective parameters of logo pattern "A" in each of the sequence of successive 2-dimensional video image frames of background scene 204A with respect to the image of one or more reference objects, such as billboard 202 itself) in that video image frame.

In accordance with the above assumption that video switch 212 is a "hard" switch, means 210A provides a "no pattern A/pattern A" output 211 from means 210A, which is indicative of the presence or absence of detected pixels of logo pattern "A" in the video output from camera 200A, that is used to control the operation of video switch 212 (which performs the same function as video switch 106 in Fig. 1). In the case in which video switch 212 is a "soft" switch, rather than a "hard" switch, the binary "no pattern A/pattern A" output 211 from means 210A, which provides a sharp edge between logo pattern "A" and logo pattern "B", is replaced by a soft edge in which logo pattern "A" blends into logo pattern "B", as described in the above-discussed prior art.

Means 214, responsive to a "pose information" (i.e., orientation, scale, perspective distortion, etc. parameters of detected logo pattern "A") input 215 from means 210A and the one-time selection via input 217 of manually-selected size, shape, orientation, etc. parameters of logo pattern "B" itself, performs the function of geometrically transforming the orientation and scale of logo pattern "B" of foreground scene 204B to match the orientation and scale of the estimated pose of logo pattern "A" in the current image frame (as indicated by diagram 216 in Fig. 2). The operation of video switch 212 merges the video output of camera 200A with the geometrically-transformed logo pattern "B" to result in output scene 218, wherein logo pattern "A" is replaced by inserted logo pattern "B". Thus, the pose of logo pattern "B" in output scene 218 changes in correspondence with the movement of camera 200A from one position to another, as indicated by arrowed lines 220.
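Purely as a sketch of this transform-and-switch step, and assuming a 2x3 affine pose matrix for pattern "A", a binary detection mask, and OpenCV/numpy frames of equal size (all illustrative assumptions), the insertion can be written as:

```python
import cv2
import numpy as np

def insert_pattern(frame, pattern_b, pose_a, mask_a):
    """Warp pattern "B" to the estimated pose of pattern "A", then hard-switch."""
    h, w = frame.shape[:2]
    warped_b = cv2.warpAffine(pattern_b, pose_a, (w, h))   # geometric transform
    out = frame.copy()
    out[mask_a > 0] = warped_b[mask_a > 0]   # switch: pattern B where A detected
    return out
```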

In Fig. 3, the example is directed to the replacement of a predetermined pattern, such as striped logo pattern "A", present in a sequence of successive video image frames of background scene 204A, with an inserted substitute inherently movable pattern, such as movable striped logo pattern "B" (a baseball player at bat), present in a sequence of successive video image frames of foreground scene 204B derived from camera (or video playback device) 200B, wherein camera 200B is itself also movable. Because striped logo pattern "B" is inherently movable, it is possible that at times camera 200B may view only a portion of logo pattern "B", e.g., only the head of the baseball player (which head may be used as a reference pattern for logo pattern "B", as described below). For this reason, the object-insertion pattern-key insertion technique of Fig. 2 cannot be used. Instead, a motion-adaptive video-insertion pattern-key insertion technique is required in Fig. 3 (as well as in Figs. 4-5 to be described below).

More specifically, it is apparent that motion in a sequence of video image frames derived from camera 200B may be due to one or both of the motion of camera 200B itself (as indicated by arrowed lines 206B) and the combined motion of both logo pattern "B" in foreground scene 204B and camera 200B (as indicated by arrowed lines 208B). Means 210B, in response to the video output of camera 200B applied as an input thereto, performs the functions of (1) detecting logo pattern "B" and (2) estimating the pose of detected logo pattern "B" (i.e., estimating the orientation, scale and perspective parameters of logo pattern "B" in each of the sequence of successive 2-dimensional video image frames of foreground scene 204B with respect to the image of a reference pattern, such as the head of the baseball player) in that video image frame of camera 200B. The pose information derived by means 210B is applied as a first input to geometric transform means 214. Means 210A, which performs the same functions in Fig. 3 as described above in connection with Fig. 2, applies the pose information derived thereby as a second input 215A to geometric transform means 214.

Geometric transform means 214 uses the pose information from means 210B applied as a first input 215B thereto to compute a stabilized transformed image of logo pattern "B" for which the reference pattern (i.e., the head of the baseball player) constitutes a fixed origin. A fixed origin means that in the stabilized transformed image of the baseball player of logo pattern "B", the reference pattern (i.e., the head of the baseball player) appears fixed in position in each of the sequence of image frames, even though this reference pattern is moving in foreground scene 204B. Geometric transform means 214 then uses the pose information from means 210A applied as a second input 215A thereto to transform the pose of the stabilized transformed image of logo pattern "B" in the manner described above in connection with Fig. 2 to provide the pose shown in diagram 216. The pose shown in diagram 216 is now inserted in output scene 218 by video switch 212 using the "no pattern A/pattern A" output 211 from means 210A. The result is that camera 200B recording foreground scene 204B and camera 200A recording background scene 204A can move independently of one another, and a sequence of image frames of inherently movable logo pattern "B" of foreground scene 204B can still be inserted onto a sequence of image frames of logo pattern "A" of background scene 204A, thereby replacing logo pattern "A" in output scene 218.

In Figs. 2 and 3, it has been assumed that physical 3-dimensional billboard 202 containing logo pattern "A" is immovable in background scene 204A. In the example of Fig. 4, it is assumed that physical 3-dimensional movable truck 222 contains logo pattern "A", and it is desired to replace logo pattern "A" with the same independently movable foreground-scene 204B logo pattern "B" (a baseball player) as in Fig. 3. Other than the fact that the detection of logo pattern "A" by means 210A in a sequence of image frames of background scene 204A is affected by the movement of truck 222 relative to other objects that may be present in background scene 204A, and the pose information from means 210A applied as a second input to geometric transform means 214 must provide information pertaining thereto, the implementation of Fig. 4 is substantially similar to that of Fig. 3.

In Fig. 5, it is desired to replace physical 3-dimensional moving automobile 224 in background scene 204A with the pattern of truck 226 from a foreground scene 204B in output scene 218. In principle, the design and implementation of pattern-key insertion described above in connection with Figs. 3 and 4 could be used without modification to implement Fig. 5 pattern insertion, but modifications might be necessary to make the insertion appear highly realistic in all viewing circumstances. For example, the orientation of automobile 224 in the image-frame sequence of background scene 204A might change so much throughout the sequence that no geometrical transform of a replacement video of truck 226 can make the inserted video look realistic. To illustrate this, imagine a side-on view of truck 226 as the replacement video, and a front-on view of automobile 224 as the background video. The goal is to replace automobile 224 with truck 226. Such replacement cannot be performed realistically, since the side-on view of the truck contains no information on the image of the front view of the truck. One way of solving this problem is to obtain a set of images of the truck recorded from different view-points, and then geometrically transform the recorded image that gives the best replacement. An alternative solution is shown in Fig. 5. In this case, graphics generator 228, which includes geometric transform means, has pose information about automobile 224 derived by means 210A applied as an input 215A thereto. This permits graphics generator 228 to produce a computationally rendered or generated image of truck 226 in the correct pose as a replacement scene 204B, in which the image of moving automobile 224 is replaced in background scene 204A by an image of moving truck 226.

The operation of means 210A, 210B and 214 in Figs. 2, 3, 4 and/or 5 needed to provide pattern-key insertion, which will now be described, makes use of the prior-art techniques discussed above.

Specifically, in the system of Fig. 2, an operator designates an arbitrary target pattern (logo pattern "A") in one video sequence that corresponds to a stationary object (billboard 202) in background scene 204A. Means 210A of Fig. 2 monitors that sequence continuously; whenever the target pattern is found (detected), it is replaced by a replacement pattern or video (logo pattern "B") taken from a second sequence. The effect of insertion is that the replacement pattern (or video) can be made to appear as part of an object in the first sequence when the target pattern moves in the image due solely to motion of the camera. Thus, camera 200A recording background scene 204A can move across the scene containing logo pattern "A", and logo pattern "B" will remain correctly positioned on top of logo pattern "A" in the output sequence.

More generally, it is often difficult for means 210A to track certain background regions (like the middle of a featureless tennis court) simply because the pattern is not easily distinguished from other similar (or identical) patterns in the background scene. Secondary region tracking overcomes this problem. Instead of means 210A tracking the pattern on which video is to be inserted (the target pattern), a second "landmark" pattern that is easily distinguishable from any other pattern in the background scene can be tracked. The precise location of the landmark pattern in the background scene can then be used to infer the precise location of the target pattern.

Landmark region tracking requires a method for inferring the precise location of the target pattern from the landmark pattern. In means 210A, the coordinate position of the landmark pattern is subtracted from the coordinate position of the target pattern in a single reference image in the background sequence to obtain a difference vector. The difference vector is then added to the recovered location of the landmark pattern to estimate the position of the target pattern throughout the derivation of the background sequence. Put simply, the positional relationship between the landmark and target patterns is assumed to remain translationally fixed (though the patterns may still rotate and/or zoom with respect to one another) throughout the derivation of the background sequence, so the location of the target pattern can be inferred if the landmark pattern can be found.
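The difference-vector bookkeeping is simple enough to state directly; the following minimal sketch (names and (x, y) conventions are illustrative assumptions) computes the vector once from the reference image and reuses it on every frame:

```python
import numpy as np

def make_target_locator(target_ref, landmark_ref):
    """Build a function inferring the target position from the landmark."""
    # Difference vector, computed once on the single reference image.
    diff = np.asarray(target_ref, float) - np.asarray(landmark_ref, float)

    def infer_target(landmark_pos):
        # Recovered landmark position plus the fixed difference vector.
        return np.asarray(landmark_pos, float) + diff

    return infer_target
```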

It is even more difficult to detect a pattern (target or landmark) that is leaving or entering the field of view in the background sequence, because only part of the pattern is visible when it enters or leaves the scene, so the pattern-detector has more difficulty in identifying the region. This problem can be solved using landmark region tracking, since the landmark pattern could be in full view even though the target pattern is only partially in view or even if the target is totally absent from the current field of view. An example of landmark region tracking is shown in Fig. 6. In Fig. 6, background scene 304A consists of the current field of view of camera 300A. As indicated, the current field of view includes the target (billboard 302 comprising logo pattern "A") and landmarks B (a tree) and C (a house), with each of the target and landmarks being positionally displaced from one another. As indicated by block 330 (the current field of view) and block 332 (the world map), the target A and landmarks B and C, which comprise the current field of view 330 of a landmark region, form only a portion of the stored relative positions and poses of patterns in the world map 332 of the landmark region. These stored patterns also include landmarks D and E, which happen to be outside of the current field of view of the landmark region, but may be included in an earlier or later field of view of the landmark region. Means 310A(1), responsive to inputs thereto from both camera 300A and block 332, is able to derive an output therefrom indicative of the location of target A whether pattern A is completely in the field of view, is partially in the field of view, or only one or more landmarks are in the field of view. Means 310A(1) detects pattern A by detecting pattern B and/or C and using world map 332 to infer the position of pattern A. The output from means 310A(1), the location of pattern A, is applied to means 310A(2), not shown, which estimates pose in the manner described above. The output of means 310A(2) is then connected to a video switch (not shown).

Landmark region tracking is also useful when the target itself happens to be occluded in the current field of view, so that its location must be inferred from the locations of one or more non-occluded landmarks.

Landmark region tracking will only solve the problem if the target pattern leaves or enters the field of view in a particular direction. In the example shown in Fig. 6, where each of the landmark patterns within the landmark region lies to the right of the target pattern, landmark pattern tracking only solves the problem if the target pattern leaves the field of view on the left-hand-side of the image.

Multiple landmark tracking overcomes the problem. Instead of detecting a single landmark (or target) pattern, the means 210A of the system could choose to detect one or more landmark patterns within different landmark regions depending on which pattern(s) contributed most to inferring the position of the target pattern. For example, if the target pattern is leaving the field of view on the left-hand-side, then the system could elect to detect a landmark pattern towards the right of the target pattern. On the other hand, if the target pattern is leaving the field of view on the right-hand-side, the system could elect to detect a landmark pattern towards the left of the target pattern. If more than one landmark pattern is visible, the system could elect to detect more than one landmark pattern at any one time in order to infer the position of the target pattern even more precisely. As taught in the prior art, this system can be implemented using the results of pattern detection in a previous image in the background sequence to control pattern detection in the next image of the sequence. Specifically, the system uses the position of the landmark pattern that was detected in the previous image to infer the approximate positions of other landmark patterns in the previous image. These positions are inferred in the same way the position of the target pattern is inferred from a single landmark pattern. The system then elects to detect in the current image the landmark pattern that was nearest the target pattern in the previous image, and that was sufficiently far from the border of the previous image. As a result, when a detected landmark region becomes close to leaving the field of view of the background scene, the system elects to detect another landmark region that is further from the image border.

When a scene cut in the background sequence occurs, or when the system is first turned on, it has no (correct) prior knowledge of the locations of landmarks. When this occurs, the system can elect to search for all the landmark patterns throughout the whole image. Once landmarks have been detected in this way, the system can resume the procedure of directing the landmark search in the next frame using detection results from the current frame.

A problem with the implementation of basic landmark region tracking by means 210A is that a single, fixed difference vector is often not sufficient to characterize the pose relationship between the target pattern and the landmark pattern. For example, distortion at the periphery of the lens of the camera can make distances between features in the scene appear smaller or larger as the camera moves across the scene. The distance between the target and landmark patterns might be 20 pixels when both patterns appear close to the image center, yet the distance might increase to 25 pixels when the patterns appear near the edge of the image. The problem is not limited to changes in distance; changes in pose (scale, orientation and perspective distortion) can occur due to lens distortion, as well as differences in depths of the patterns in the scene. The result is that the position (and pose) of the target pattern is inferred incorrectly and video is inserted in the wrong place in the background scene. To overcome this problem, corrective-landmark-region tracking compensates for errors in the position and pose of the target pattern, using the precisely located positions of the landmark patterns in the background sequence to predict lens-distortion and depth errors at the location of the target pattern. This tracking method includes the following steps (a simplified sketch follows the list):

  • a) computing the precise position and pose of each landmark region with respect to other landmark regions throughout the image sequence;
  • b) computing the position and pose of each landmark region with respect to other landmark regions using the same fixed difference vector throughout the image sequence;
  • c) subtracting b) from a) to determine the error in the simple difference-vector model at each landmark region throughout the image sequence;
  • d) interpolating the error results recovered in c) to predict the error at the location of the target pattern assuming that errors vary smoothly between landmark regions; and
  • e) subtracting the predicted error in position and pose at the target pattern from the original estimate of the position and pose to obtain a more accurate estimate of position and pose at the target pattern.
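The sketch below reduces steps a) through e) to translation only, and uses inverse-distance weighting as one smooth interpolation choice (an assumption; the document does not fix the interpolant). The correction applies the interpolated discrepancy between the measured and difference-vector-modeled landmark positions, which is what removing the model's error at the target amounts to.

```python
import numpy as np

def correct_target(target_est, landmark_measured, landmark_predicted):
    """Correct a difference-vector estimate of the target position."""
    measured = np.asarray(landmark_measured, float)    # step a), per landmark
    predicted = np.asarray(landmark_predicted, float)  # step b), per landmark
    errors = measured - predicted                      # step c), model error
    # Step d): interpolate errors smoothly, weighting nearer landmarks more.
    d = np.linalg.norm(predicted - np.asarray(target_est, float), axis=1)
    w = 1.0 / np.maximum(d, 1e-6)
    w /= w.sum()
    predicted_error = (w[:, None] * errors).sum(axis=0)
    # Step e): remove the predicted model error from the original estimate.
    return np.asarray(target_est, float) + predicted_error
```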

Technically, the most challenging aspect of pattern-key insertion is the detection of target and landmark patterns within the video sequences. Detection must be performed reliably and at video rate. One approach makes use of hierarchical structured search and hierarchical motion (registration) techniques. The following is a detailed description of an implementation of pattern-key insertion that uses these techniques.

In Fig. 7 the successive steps performed in implementing the pattern-key insertion technique of the invention are illustrated. Oval blocks correspond to data, and rectangular blocks correspond to processing modules. If 'N' is the image number in a sequence, then the diagram shows how image 'N' of the first source sequence (i.e., the A sequence derived from the background scene in Figs. 2-5) is merged with image N of the second source sequence (i.e., the B sequence derived from the foreground scene in Figs. 2-5) to produce image N of the destination sequence (i.e., the output scene sequence in Figs. 2-5). There are 3 main components in the pattern-key insertion procedure. In Figs. 2-5, these components comprise means 210A, means 210B and means 214. Means 210A performs the functions of (1) the landmark locator for the first source, detecting patterns in the background scene, and (2) the pose estimator that estimates the pose of landmarks with respect to a reference image (or images) of the first source. Means 210B performs the functions of (1) the landmark locator for the second source, detecting patterns in the foreground scene, and (2) the pose estimator that estimates the pose of landmarks with respect to a reference image (or images) of the second source. Means 214 is a geometric transform module that properly formats the video for insertion into the destination sequence. The landmark locator operation determines a coarse estimate of the location of LANDMARKS in image N, Sequence 1 (i.e., the first source sequence). The pose estimator determines a fine estimate of the location through a procedure that systematically registers image regions that have been called 'LOCATORS'. These two components are labeled 'Locate landmarks' and 'Select locator and Fit affine precise alignment to locator', respectively, in Fig. 7. The final stage of the procedure actually inserts the video from the second source sequence into the first source sequence to produce the destination sequence.

The initial set-up procedures are separated into two parts: the image-specific set-up and the world-specific set-up. They are separated because, in the applications envisaged by the invention, it is expected that the world-specific set-up need only be performed once for a particular image scene. For example, the world-specific set-up might be performed and stored for a particular football field, and then retrieved the next time it is desired to perform pattern-key insertion on any image sequence showing that particular football field. On the other hand, the image-specific set-up is concerned with the position of inserted video in the image sequence.

In the image-specific set-up, before the pattern-key insertion method begins, an operator has to define where in the first source sequence images are to be inserted (the target or DESTINATION region), and where the images will be inserted from in the second source sequence (the SOURCE region). The operator also has to define the position, size and shape of image landmarks and locators. To do this, a REFERENCE image is selected from the first source sequence that contains the destination region. For example, the operator may define the top-left hand corner of this image to be the origin of a world-coordinate system. The operator may define the target or destination region by recording the coordinates of the 4 corners of a closed, convex polygon that encloses the destination region. The four corners of logo pattern "A" in background scene 204A in Figs. 2-5 define such a polygon. This data is called the DESTINATION coordinate set. Similarly, the operator chooses a source position from an image in Sequence 2 (i.e., the second source sequence) that is called the SOURCE coordinate set. Initially, it is assumed that there is no camera motion in Sequence 2 (which is the actual case in the foreground scene of Fig. 2), so that any image can be used to define the source coordinate set. Later, this constraint will be relaxed.

In the world specific set-up, the operator defines the position and size of the LOCATOR regions in the reference image by recording the coordinates of the 4 corners of a closed, convex polygon that encloses the locator region. At least one locator must be defined, but if more locators are defined, the robustness of the system is improved when locators move in and out of camera-view. At least one locator must be in full camera-view when the destination region is in partial or full camera-view if precise video insertion is to be achieved in all parts of the sequence. Typically, about 4 or 5 locator regions are defined that surround the destination region so that at least one locator is in full view of the camera as the destination region moves in or out of view.

The operator also has to choose image LANDMARKS that will be used in the first stage of the insertion procedure. In the pattern-key insertion of Figs. 2 and 3, the location of these landmarks must remain fixed with respect to the world coordinate system while pattern-key insertion is being performed; therefore mobile landmarks (for example, cars and people) are not wise selections as landmarks. Good landmark selections might be the corners of a football stand, or the net supports on a tennis court. In the pattern-key insertion of Fig. 4 or 5, where the target or destination region is on a moving object, the landmarks must appear on the same object on which the target or destination region appears. The operator must define at least one landmark, but if more landmarks are defined, then the robustness of the landmark locator module improves in the presence of landmark occlusion (mobile objects obscuring a landmark), and when landmarks move in and out of the field of view of the camera. In general, about 4 or 5 landmarks are defined that surround the destination region. This means that at least one landmark should be detected and tracked as the destination region comes into view. Landmarks need not appear only in the reference image; landmarks can be located in other images of the first source sequence, as long as their positions are expressed in world coordinates with respect to the reference image. The world coordinate position for each landmark is calculated using a simple planar world model. If it is assumed that landmark 1 is visible in the reference image, and that both landmark 1 and landmark 2 are present in a later image in the sequence, then the world coordinates of landmark 2 are equal to the world coordinates of landmark 1 added to the difference of the coordinates of landmark 2 and landmark 1 in the local coordinate system of the later image. This calculation can be repeated so that the world coordinates of any landmark in the sequence can be estimated, provided that two or more landmarks are visible in an image at all times.
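This planar-world chaining rule can be written out directly; the sketch below (function and argument names are illustrative) encodes world(L2) = world(L1) + (local(L2) - local(L1)) and can be applied repeatedly along the sequence:

```python
import numpy as np

def chain_world_coords(world_l1, local_l1, local_l2):
    """World coordinates of landmark 2, given landmark 1 seen in the same image."""
    return np.asarray(world_l1, float) + (
        np.asarray(local_l2, float) - np.asarray(local_l1, float))
```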

Typically a set of landmarks is selected. The relative positions of these landmarks (as well as the locations of locators) are recorded in a graph structure (labeled WORLD MAP in Fig. 7). In the hierarchical structured search procedure, the search begins with the most prominent landmark. This is typically represented at low resolution. The search then progresses at finer resolutions, using results from coarser resolutions to guide the search. Such a search structure improves the efficiency of the landmark detector module. In Fig. 7, the set of coarse/fine images that records the reference 'template' for each landmark is called the TREE STRUCTURE.

Locate Landmarks is the first processing module of the pattern-key insertion technique shown in Fig. 7. For the first image in the first source sequence, the module takes each tree structure and searches throughout the image for each landmark. To increase processing efficiency, the search procedure first locates, for each landmark, the highest correlation match at a coarse resolution, which limits the search area for the landmark at successively finer resolutions. The output is a coordinate position (with respect to the current image) of each VISIBLE landmark in the image. If a correlation match is below a threshold, then the landmark is declared INVISIBLE and no coordinate position is returned. This might happen if a landmark is out of view, or if a landmark is occluded.

Now that the position of some landmarks with respect to the current image is known, that information can be combined with the world map to produce a LOCAL MAP. The local map is a file containing the estimated coordinates of all landmarks and locators (visible or invisible) with respect to the current image. The local map is basically the world map shifted by the sum of the world coordinates of a detected landmark and the local coordinates of the detected landmark. Therefore, all landmark and locator positions in the local map are defined with respect to the top-left hand corner of the current image. In practice, it is desired to combine information from all detected landmarks to increase the precision of the position estimates. The uncertainty in the detected landmark positions is modeled to determine an optimal estimate of the positions of the other landmarks and locators. Specifically, a fixed uncertainty in detected position with respect to orientation and translational position is assumed for each landmark. Then it can be determined, using the world map, how these fixed uncertainties map onto the translation uncertainties for each undetected landmark. For example, a small uncertainty in the orientation of a detected landmark can result in a large uncertainty in the vertical position of an undetected landmark if the two landmarks are far apart horizontally. For each undetected landmark, therefore, there is an estimated position with associated uncertainty determined from each detected landmark. The position of the undetected landmark is estimated by determining the weighted average of the estimated positions, using the determined uncertainties as the weighting factors.
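One standard way to realize this weighted average (an assumption here; the document does not prescribe the weighting formula) is inverse-variance weighting of the per-landmark estimates:

```python
import numpy as np

def fuse_estimates(positions, sigmas):
    """positions: N x 2 estimated positions of an undetected landmark,
    one per detected landmark; sigmas: the N associated uncertainties."""
    w = 1.0 / np.square(np.asarray(sigmas, float))   # tighter estimates weigh more
    w /= w.sum()
    return (w[:, None] * np.asarray(positions, float)).sum(axis=0)
```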

The coordinate dimensions of the current image are known, and the maximum speed of the camera motion (e.g., 20 pixels motion per frame) can be estimated, so that it can be predicted from the local map whether or not a landmark or locator might be visible in the next image of the sequence. For each landmark, we store this VISIBILITY prediction in the local map.

After the local map has been produced for the first image of the sequence, the local map is used to guide the landmark-locator module for the second image of the sequence. In this case, the landmark-locator module will only search for landmarks that are predicted as VISIBLE in the previous local map, and it will focus its search in the region surrounding the location of each of such landmarks in the previous local map. The width and height of this search region is controlled for each landmark by the estimated maximum speed of the camera motion and also the predicted uncertainty in the position of each landmark. This focused-search procedure dramatically improves processing efficiency.

If a scene change or scene cut occurs in the image sequence and no landmarks are detected within the focused search regions, then the system searches throughout the entire image for landmarks. If landmarks are still not detected, then it is assumed that a completely different world scene is being displayed, and the pattern-key insertion for that image is suspended. This means that sequences showing pattern-key insertion can be freely mixed with other sequences without any adverse effects on insertion quality.

The Select Locator and Fit Affine Precise Alignment module performs the second major operation of pattern-key insertion. The module first selects which of the several locator regions should be used to determine a precise image motion estimate. It is desired that a locator region close to the destination region be used, because the motion of the locator will be used as an estimate of the motion of the destination region. On the other hand, it is desired to ensure that the locator region is fully visible so that the motion estimate is precise. The local map contains the estimated positions of the top-left hand corners of each locator, so that combining this information with the LOCATOR coordinate data set gives an estimate of the coordinates of each corner of each locator with respect to the current image. The selection is accomplished by choosing the locator that lies entirely within the image and has the smallest distance between the centers of gravity of the locator and destination region. If no locator region satisfies this condition, then the affine fitting module is aborted, and only the local map is used; in that case, the motion estimate used is the local-map position of the locator that has the smallest distance between the centers of gravity of the locator and destination region.
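As a sketch of this selection rule (argument names and the corner representation are illustrative assumptions), keep only locators whose four corners lie inside the current image, then take the one whose center of gravity is nearest that of the destination region:

```python
import numpy as np

def select_locator(locator_corners, dest_center, width, height):
    """locator_corners: list of 4x2 corner arrays in current-image coordinates."""
    best, best_d = None, np.inf
    for i, corners in enumerate(locator_corners):
        c = np.asarray(corners, float)
        inside = (c[:, 0] >= 0).all() and (c[:, 0] < width).all() and \
                 (c[:, 1] >= 0).all() and (c[:, 1] < height).all()
        if not inside:
            continue                          # locator not fully visible
        d = np.linalg.norm(c.mean(axis=0) - np.asarray(dest_center, float))
        if d < best_d:
            best, best_d = i, d
    return best                               # None if no locator qualifies
```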

If a locator region lies entirely within the image, then an affine precise alignment model is fitted between the locator region in the reference image and the current image, using the coordinates of the locator region in the local map as a first estimate of the motion. The affine precise alignment model is computed at a coarse resolution before refining the model at progressively finer resolutions. The result is a set of parameters that define the motion between the locator in the reference image and the locator in the current image. The affine precise alignment estimation process is described in more detail in the various prior-art publications set forth above.

In order to know where to insert video, the precise motion between the source region in the second source sequence and the destination region in the first source sequence needs to be determined. This is done by cascading the motion that was recovered from the affine precise alignment estimation procedure with the motions between the source, locator and destination regions that were determined from the coordinates of the corners of these regions. In particular, the following steps are performed:

  • Source region -> Locator in reference image: A motion estimate is determined between these regions by fitting an affine or quadric motion model to the 4 motion vectors determined from the 4 corner coordinates of each region. The same affine fitting procedure as the one mentioned previously is used for this purpose.
  • Locator in reference image -> Locator in current image: This result is already available from the affine precise alignment estimation module.
  • Locator in current image -> Destination region in current image: This motion is approximated as the affine or quadric motion estimate between these regions in the REFERENCE image. The same technique as the one mentioned previously is used for this purpose.

These motion estimates are cascaded to produce a single mapping estimate between the source region in the reference image and the destination region in the current image. Any error in the mapping arises from the third and last mapping procedure where it was assumed that the relative positions of the locator and destination region in the reference image are the same as the relative positions in the current image (i.e., a planar world model was assumed). Due to lens distortion and depth variations in the scene, this assumption is not always valid. However, in real-world scenes, it can be a good approximation, especially if the locator and destination regions are in close proximity in the current image. Moreover, the affine precise alignment estimation module gives information on the difference between the motion that was predicted using a planar world model (the initial motion estimate provided by the local map) and the actual motion that was measured. This information can be used to model the errors arising from the planar world assumption in order to refine the mapping estimate that has been previously determined. Because this error-function model has not yet been implemented, it is displayed in broken lines in Fig. 7.
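In affine terms, cascading the three motion estimates is simply composition in homogeneous coordinates; a minimal sketch with 2x3 matrices (function names are illustrative) follows:

```python
import numpy as np

def to3x3(m):
    """Promote a 2x3 affine matrix to homogeneous 3x3 form."""
    return np.vstack([np.asarray(m, float), [0.0, 0.0, 1.0]])

def cascade(src_to_loc_ref, loc_ref_to_cur, loc_cur_to_dest):
    """Single mapping: source region -> destination region in the current image."""
    full = to3x3(loc_cur_to_dest) @ to3x3(loc_ref_to_cur) @ to3x3(src_to_loc_ref)
    return full[:2, :]                        # back to a 2x3 affine map
```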

Preferably the video is color video. In this case, all processing required for landmark detection and pose estimation of locators is performed on the luminance (brightness) component of a color image sequence. To insert color video into color sequences, each color component of the source video is inserted into the corresponding color of the destination video. For example, the source region of the red intensity image of the second source sequence is inserted into the destination region of the red intensity image of the first source sequence.

In practice, many video sequences are recorded in an interlaced NTSC format where each image of a sequence is comprised of 2 fields, each recorded at a 1/60th second time interval, such that every alternate horizontal line of the image belongs to the same field. As a result, horizontal motion of the camera can result in motion between adjacent lines of the image. For example, a vertical line imaged by an interlace camera panning quickly in a horizontal direction will be recorded as a jagged, vertical line. There are two consequences of this problem for pattern-key analysis. First, interlace can distort the intensity patterns of landmarks and locators so that the landmark locator and affine fitting modules are less robust. Second, inserting video from the second source sequence with a different interlace distortion (or no distortion) from that in the first source sequence could result in unacceptable insertion results. For example, a fast moving camera would result in considerable interlace distortion throughout the image, yet inserted video derived from a stationary camera would contain no interlace distortion, and it would appear synthetic and false.

One way to overcome these problems is to perform pattern-key insertion separately on the first and second fields of the image. In this case, separate reference images, locator coordinates and source coordinate data sets are used, but the same tree structure, world map, local map and destination coordinate data sets are used. In the set-up procedure, locator coordinates and source coordinate data sets in one field are chosen, and the locator coordinates and source coordinates in the second field are determined by computing the affine precise alignment between each locator region in field 1 and the corresponding locator region in field 2. The same affine fit module is used that is used in the pattern-key insertion method. The result is a set of different (although proximate) locator and source coordinates that correspond to each locator in field 2 of the image sequence. Interlace distortion in sequence 2 is therefore replicated precisely in the inserted video sequence.

The result is a pattern-key insertion procedure that exactly replicates the image sequence that would have been observed if a physical 3-dimensional scene object in the first source sequence were replaced by another physical 3-dimensional scene object. In order to achieve this exact replication of the image sequence, the pattern-key insertion procedure should preferably simulate motion blurring.

The following generalizations may be made:

  • a) arbitrary shapes for the locator, source and destination regions can be used rather than a 4-sided, convex polygon.
  • b) the source video can be merged with the destination video to smooth sharp brightness transitions that may occur between the two video sources. Some merging will be necessary to produce band-limited, NTSC compatible signals. Multi-resolution merging may be used.
  • c) illumination changes from, for example, lens vignetting that changes the light detected by the camera light-sensors as the camera pans across the scene can be included in processing the inserted video to match the background.
  • d) the results of the affine-fitting module can be used to refine the local map to increase the efficiency of the method.
  • e) the pattern-key insertion method can be generalized to allow for camera zoom.
  • f) a sophisticated pattern-key insertion system at a broadcasting studio could insert a pattern into a video sequence that is easily identified by a simpler pattern-key insertion system located at smaller broadcasting and cable outlets elsewhere.
  • g) pattern-key insertion systems can be cascaded together so that inserted video is superimposed on top of other inserted video.
  • h) because pattern-key insertion can insert any standard video sequence, all current video manipulation techniques can be used to preprocess the inserted video sequence without any consequence on the pattern-key insertion technique. For example, insertion of zoomed video, that is, video that increases in size within the destination region over time, can be included.
  • i) in the processing required for implementing pattern-key insertion, 3-d models of the scene may be used instead of the above-described 2-d affine models.

While the pattern-key insertion technique of the invention will improve the quality of any video insertion currently performed by the color-key insertion technique, there are other applications that only the pattern-key insertion technique of the invention can perform. These other applications include:

  • a) inserting video into another image sequence recorded in an environment that cannot be controlled by the user.
  • b) automating animation techniques used to combine real-world sequences with animated sequences.
  • c) inserting advertisements on top of other advertisements or image features in a video sequence.
  • d) inserting a moving region within an image sequence into another moving region within the same image sequence. For example, the image of the umpire in a tennis match could be inserted on the scoreboard in the same tennis match.

It is to be understood that the apparatus and method of operation taught herein are illustrative of the invention. Modifications may readily be devised by those skilled in the art without departing from the scope of the invention. While the various embodiments have been described in terms of three dimensional physical objects, it is to be understood that objects can also include two dimensional physical objects, electronically generated images and other types of images which can be detected and/or recorded.


Claims
  1. Apparatus for replacing a first pattern (A) in a sequence of successive video image frames of a scene with a second pattern (B) comprising:
    • first means (210A) for locating the first pattern (A) in the sequence of successive video image frames and for obtaining respective estimates of orientation and size for the first pattern in each of the image frames;
    • second means (214) for geometrically transforming said second pattern (B) into a sequence of transformed second patterns responsive to the respective estimates of orientation and size of said first pattern; and
    • third means (210A, 212) responsive to the estimates of the orientation and size of said first pattern for replacing occurrences of said first pattern in said video image frames with respective ones of said geometrically transformed second patterns, characterized in that the first means includes:
    • means (310A(1)), for determining locations for a plurality of landmarks relative to an image represented by one image frame of the sequence of successive video image frames,
    • means (310A(1)), responsive to the determined locations of the plurality of landmarks, for automatically determining the location of one of the plurality of landmarks in each of said video image frames in said sequence; and
    • means (310A(2)), for estimating, from the determined relative location of the one landmark in each of said video image frames, an orientation and size for said first pattern with respect to each of said video image frames in said sequence.
  2. The apparatus of Claim 1, wherein:

       said first means comprises means (310A(2)), employing affine precise alignment to provide perspective transformation for estimating, from the determined location of the one landmark, the orientation and size of the first pattern in each frame of said sequence of successive video image frames.
  3. The apparatus of Claim 1, wherein:

       said first means includes means (310A(1)), responsive to said sequence of successive video image frames derived from a camera view of the one of said plurality of landmarks, for employing coarse-to-fine search techniques for detecting said one of said plurality of landmarks in said scene.
  4. The apparatus of Claim 1, wherein:

       said second pattern (B) is a predetermined portion of a second image; and said second means includes means, responsive to selected geometric patterns that define said portion of the second image, applied as an input thereto, for use in computing the geometric transformations of said second pattern using the estimated orientation and size of said first pattern in each respective frame of said sequence of successive video frames.
  5. The apparatus of Claim 1, wherein:
    • said second pattern (B) is a moving pattern in a second video scene defined by a further sequence of successive video image frames, wherein a predetermined portion of said moving second pattern constitutes a reference pattern;
    • said apparatus further comprises fourth means (210B) for detecting said moving second pattern in said second scene and for estimating the orientation and size of said detected moving second pattern with respect to said reference-pattern portion thereof; and
    • said second means (214) for geometrically transforming said second pattern uses the reference-pattern portion of said estimated orientation and size of said detected moving second pattern as an origin to provide a stabilized transformed image of said second pattern with respect to the reference-pattern portion thereof, and uses the estimated orientation and size of said detected first pattern to geometrically transform said stabilized transformed image of said second pattern.
  6. The apparatus of Claim 5 wherein:

     said fourth means (210B) is responsive to said sequence of successive video image frames defining said second scene and employs affine precise alignment estimation for estimating the orientation and size of said detected moving second pattern with respect to said reference pattern portion in said second scene.
  7. The apparatus of Claim 1 wherein the respective locations of the plurality of landmarks and of said first pattern are stored in a world map, whereby said first pattern may be only partially included or entirely absent from one of the video image frames in said sequence; and wherein:

       the means (310A(1)) for determining locations for the plurality of landmarks comprises:
    • means, responsive to the respective locations of the landmarks stored in said world map and to the determined location of the one of said plurality of landmarks in a given video image frame, for inferring a relative location of a second one of said plurality of landmarks in the image represented by said given video image frame; and
    • means, responsive to the inferred relative location of the second one of said plurality of landmarks to detect the second one of said plurality of landmarks; and
       the means for estimating the orientation and size of the first pattern includes means responsive to the respective locations of the first and second ones of said plurality of landmarks for estimating the orientation and size of the first pattern in the given video image frame.
  8. The apparatus of Claim 7, wherein each of the first and second ones of the plurality of landmarks is separated from the first pattern by a plurality of pixel positions.
  9. The apparatus of Claim 3, wherein:
    • said first means (210A) is responsive to the determined location of the one of the plurality of landmarks in a first video image frame of said sequence of successive video image frames and to a motion vector for inferring a position for the one of the plurality of landmarks in a subsequent video image frame of said sequence of successive video image frames; and
    • said first means is responsive to the inferred position for the one of the plurality of landmarks for determining the location of the one of said plurality of landmarks in said subsequent video image frame;
    • said first means is responsive to the respective determined positions of the one of the plurality of landmarks in the first video image frame and in the subsequent video image frame to produce a motion vector for use by the first means in determining the location of the one of the plurality of landmarks in a further subsequent video image frame.
  10. The apparatus of Claim 1, wherein:

       each video frame comprises first and second fields, and the said first, second and third means (210A, 214, 212) operate to locate the first pattern (A) in the image and replace the first pattern (A) with the transformed second pattern (B) separately on the first and second fields.
  11. The apparatus of Claim 1, wherein:
    • said video image frames in said sequence are colour video image frames including a luminance signal component and a chrominance signal component;
    • the means (310A(1)) for determining locations for the plurality of landmarks relative to the image represented by the one image frame is responsive to the luminance component of the video image frame; and
    • the means (310A(1)) for estimating an orientation and size for the first pattern with respect to each of said video image frames is responsive to the luminance component of the video image frames.
  12. A method for replacing a first pattern (A) in a sequence of video image frames of a scene with a second pattern (B) comprising the steps of:
    • a) locating the first pattern (A) in the sequence of successive video image frames and obtaining respective estimates of orientation and size for the first pattern in each of the image frames;
    • b) geometrically transforming said second pattern (B) into a sequence of second patterns using the respective estimates of orientation and size of said first pattern; and
    • c) replacing said detected first pattern with a respective one of said geometrically transformed second patterns in response to the estimates of orientation and size of said first pattern in each of said video image frames in said sequence, characterized in that step (a) includes the steps of:
      • a1) determining locations for a plurality of landmarks relative to an image represented by one image frame of the sequence of successive video image frames;
      • a2) automatically determining, responsive to the determined locations of the plurality of landmarks, the relative location of one of the plurality of landmarks in each of said video image frames in said sequence; and
      • a3) estimating an orientation and size for the first pattern with respect to the determined location of the one of the plurality of landmarks in each of said video image frames in said sequence.
  13. The method of Claim 12, wherein:

       step (a3) employs affine precise alignment to provide perspective transformation for estimating the orientation and size of said first pattern with respect to the determined location of the one of the plurality of landmarks in each of said video image frames in said sequence.
  14. The method of Claim 12, wherein:

     step (a2) is responsive to video image frames derived from a camera view of one of said plurality of landmarks and employs coarse-to-fine search techniques for detecting said one of said plurality of landmarks in said scene.
  15. The method of Claim 12, wherein:
    • said second pattern (B) is a fixed pattern; and
    • step (b) is responsive to selected geometric parameters that define said fixed pattern to compute the geometric transform of said second pattern using the estimated orientation and size of said first pattern.
  16. The method of Claim 12 wherein:
    • said second pattern (B) is a moving pattern in a second scene defined by a further sequence of successive video image frames, wherein a predetermined portion of said moving second pattern constitutes a reference pattern;
    • the method further comprises the step of (d) detecting said moving second pattern in said second scene and estimating the orientation and size of said detected moving second pattern with respect to said reference pattern portion thereof; and
    • the step (b) geometrically transforms said second pattern using the estimated orientation and size of the moving second pattern to provide a stabilized transformed image of said second pattern, and uses the estimated orientation and size of the first pattern to geometrically transform said stabilized transformed image of said second pattern.
  17. The method of Claim 15, wherein:

       step (d) is responsive to said sequence of successive video image frames defining said second scene and employs affine precise alignment estimation for estimating the orientation and size of said detected moving second pattern with respect to said reference-pattern portion in said second scene.
  18. The method of Claim 12, wherein:
    • the method further comprises the step of (d) generating said second pattern using a graphics generator (228); and
    • step (b) employs the graphics generator to geometrically transform the second pattern.
  19. The method of Claim 12 wherein the respective locations of the plurality of landmarks and of said first pattern are stored in a world map, whereby said first pattern may be only partially included or entirely absent from one of the video image frames in said sequence; and wherein:
    • step (a1) comprises the steps of:
      • employing the respective locations of the landmarks stored in said world map together with the determined location of the one of said plurality of landmarks in a given video image frame to infer a relative location of a further one of said plurality of landmarks in the image represented by said given video image frame; and
      • employing the inferred relative location of the further one of said plurality of landmarks to determine the location of the further one of said plurality of landmarks; and
    • step (a3) comprises the step of:

         employing the respective locations of the one of said plurality of landmarks and the further one of said plurality of landmarks to estimate the orientation and size of the first pattern in the given video image frame.
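
Purely by way of illustration, the method of claim 12 may be sketched in software with off-the-shelf components standing in for the claimed elements: ORB feature matching and a RANSAC-fitted homography substitute for landmark location and affine precise alignment, the recovered mapping supplies the estimates of orientation and size of the first pattern, and the second pattern is warped into its place. All function and variable names below are the sketch's own assumptions, not the claimed implementation.

    # Illustrative sketch of the claim-12 pipeline (not the claimed
    # implementation): ORB feature matching and a RANSAC homography stand
    # in for landmark location and affine precise alignment.
    import numpy as np
    import cv2

    def replace_pattern(frames, ref_frame, corners_ref, second_pattern):
        # corners_ref: 4x2 float32 corners of pattern A in ref_frame
        # (top-left, top-right, bottom-right, bottom-left).
        gray = lambda im: cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) if im.ndim == 3 else im
        orb = cv2.ORB_create(1000)                 # landmark-like features
        k_ref, d_ref = orb.detectAndCompute(gray(ref_frame), None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        h_p, w_p = second_pattern.shape[:2]
        src = np.float32([[0, 0], [w_p, 0], [w_p, h_p], [0, h_p]])
        out = []
        for f in frames:
            # (a) locate landmarks and estimate pattern A's pose in this frame.
            k, d = orb.detectAndCompute(gray(f), None)
            m = matcher.match(d_ref, d)
            p_ref = np.float32([k_ref[x.queryIdx].pt for x in m])
            p_cur = np.float32([k[x.trainIdx].pt for x in m])
            H, _ = cv2.findHomography(p_ref, p_cur, cv2.RANSAC, 3.0)
            corners = cv2.perspectiveTransform(
                corners_ref.reshape(-1, 1, 2), H).reshape(4, 2)
            # (b) geometrically transform pattern B onto pattern A's location.
            W = cv2.getPerspectiveTransform(src, corners)
            size = (f.shape[1], f.shape[0])
            warped = cv2.warpPerspective(second_pattern, W, size)
            mask = cv2.warpPerspective(np.ones((h_p, w_p), np.float32), W, size)
            # (c) replace the first pattern with the transformed second pattern.
            a = mask[..., None] if f.ndim == 3 else mask
            out.append((a * warped + (1.0 - a) * f).astype(f.dtype))
        return out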





