I wrote about a cool Wii-mote hack yesterday, which provides a richer 3D experience based on the movement of the user's head. In this post (and, if all goes well, some following posts) I'll try to explain some of the basics of how this works. I won't get into deep geometrical or mathematical detail, so as not to bore the readers too much.
I'll start with a very basic case: rendering linear distance changes.
We'll start with the most basic view: standing in front of the screen (which displays the scene) and moving towards it, or backing away from it. Imagine you're standing in front of three towers. The middle one is directly in front of you, and you can see all three towers. Now you start walking towards the one in front of you. This one will become larger and larger (well, it won't actually become any larger, of course, but it'll look as if it grows), and so will the two towers beside it.
The two towers on the sides, though, will move towards the edges of what you see (the scene), until they disappear.
Here are some very basic drawings explaining this. The first one is a top view of the scene, where the viewer is far enough to see all towers:
This might need some explanation. Actually it's pretty simple: think of it as if someone is looking at these towers, and you're flying above him in a helicopter, looking down at the ground. You can see the top of the person's head (the green circle) and the roofs of the three towers (the yellow cubes).
The red and black lines are imaginary: the red lines depict the border lines of the person's view. As you most likely noticed yourself already, a human being isn't able to see everything 180 degrees in front of him without rolling his eyes or turning his head. In this sample image I used a field of view of 90 degrees (which is most likely more than a normal human being can see, but well). The black line depicts where the intersection is taken. Think about it like this: the black line is the top of an enormous piece of paper attached to the towers. Our viewer (the person whose head is the green circle) has an extremely long pencil, and draws the outlines of the towers on this paper from where he stands, as he sees them.
This is what would end up on the paper (well, not only the outlines, but hey…). You could also interpret it as a picture taken by the viewer from where he stands; the red lines would then depict the vision borders of the camera lens:
Here's what happens when the viewer comes somewhat closer. The right tower will be out of sight (at least the front of it; for simplicity I'm not taking depth into account), and the left one will go partially out of sight. Notice that the paper could become smaller, but when displaying this paper on a screen, or printing out a picture you take, the screen size or photo paper size stays the same as in the first view. The size of the sensor area in your eyes also remains constant, so everything will look bigger:
I must confess I forgot to take the towers' height/width aspect ratios into account, my bad. It shouldn't confuse you, though.
Do notice that the viewer can almost no longer see the tops of the towers; if he approaches even more, he'll be so close to the tower that he can only see the front of the lowest floors:
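As a side note, this "dropping out of sight" is easy to check in code. Here's a minimal Python sketch (the function name and the numbers are my own, purely for illustration) that tests whether a point is still between the red border lines, assuming the 90-degree field of view used in the drawings:

```python
import math

def in_view(lateral_offset, distance, fov_degrees=90.0):
    """Is a point at the given sideways offset still inside the field of view?

    The visible half-width at a given distance is distance * tan(fov / 2);
    with a 90-degree field of view that is simply the distance itself.
    """
    half_width = distance * math.tan(math.radians(fov_degrees) / 2)
    return abs(lateral_offset) <= half_width

# A tower 60 units to the side (made-up number):
print(in_view(60, distance=80))  # True  -> still on the 'paper'
print(in_view(60, distance=40))  # False -> too close, the tower drops out of view
```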
Calculating what to display on the screen when a user approaches or moves away from the display (i.e. walks forwards or backwards in the scene) is pretty easy using basic triangles. You don't even need any trigonometry, as long as the viewer remains exactly in front of the scene object.
Let's go through this using a very basic object: a cube that sits exactly in front of the viewer. Adding more objects would be easy: just put them in the scene, but always make sure to only display what lies between the viewer's lines of sight.
Let's say the cube's height/width/depth is 20 (in the scene), and initially the viewer is 50 away from it. Let's say the screen (the black line) in the real world is 200 wide (this 20, 50 and 200 could be pixels, cm, whatever; the metrics of display size on screen are decoupled from the metrics inside the scene). The viewer has a field of view of 90 degrees. Since the viewer is right in front of the cube and has a 90-degree field of view, this 200-wide screen can display 100 (= 2 * 50) units of width of the virtual world. As the cube is 20 wide, it should be displayed at 200 / (100 / 20) = 200 / 5 = 40 wide on the screen.
Do note the image is not correctly scaled at all, and the blue and black numbers represent different units: the blue ones are virtual scene units, the black ones physical screen dimensions.
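For those who'd rather see this as code, here's a minimal Python sketch of that calculation. It assumes the 90-degree field of view from the example, and the function names are my own, just for illustration:

```python
import math

def visible_scene_width(distance, fov_degrees=90.0):
    """Width of the virtual world that fits on the 'paper' at a given distance."""
    return 2 * distance * math.tan(math.radians(fov_degrees) / 2)

def on_screen_width(object_width, distance, screen_width, fov_degrees=90.0):
    """How wide an object right in front of the viewer shows up on the physical screen."""
    return screen_width * object_width / visible_scene_width(distance, fov_degrees)

# The example above: a 20-wide cube, viewer 50 away, 200-wide screen.
print(on_screen_width(object_width=20, distance=50, screen_width=200))  # -> 40.0 (give or take floating-point noise)
```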
Now the viewer walks towards the cube until he's 20 away from it. The width of the black line in the top view becomes smaller, but the width of the cube remains 20, so the cube seems bigger than before: it covers a larger part of the 'paper', while the display size (the size of the screen) remains constant.
How big should it be displayed on the screen? More than the 40 from before, obviously, but how much exactly?
This is pretty easy: the 'paper width' is now 2 * 20 = 40 (in the virtual world), and the cube width remains 20. So the cube should take 100 / (40 / 20)% = 50% of the physical display, and 50% of 200 is 100:
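Using the same little `on_screen_width` sketch from above (again, just my own illustration of the formula), the closer distance gives exactly that:

```python
# Same 20-wide cube and 200-wide screen, but the viewer is now only 20 away:
print(on_screen_width(object_width=20, distance=20, screen_width=200))  # -> 100.0
```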
That's about all there is to know, without touching the subject of 'virtual sight' (which is basically a mirror operation around the intersection plane). If all goes well, more should follow later.
I find this subject quite interesting, and I personally would *love* to see the math, trig and all. I’d also love to see your explanation of the head-tracking math.
I’ve read somewhere that the human angle of vision is considered to be approximately 120/130°.
You'd need to check a more accurate source than my recollection, but the human field of view is definitely more than 90°: just put the corner of a blank sheet of paper under your nose, and you'll easily see far beyond the edges of the sheet on the sides.
hmmmm… déjà vu, almost like I've seen someone do this hack already
Wouldn't it be possible to get the same result with a regular webcam, if you put something easily detectable on your forehead (big colored dots)? Or is the crucial part of the wiimote the detection speed?
@Anonymous: it might take some time until I finish the next article(s); exams here.
@Thomas: I'll look for it. I was just guessing; the angle where you get a good (i.e. well-focused) view looked quite small to me (but maybe it was too late in the evening).
Maybe I'm misunderstanding something, but I think your 'sheet of paper' example is flawed, as the angle it proves to be visible depends on both the width of the paper and the distance between your eyes and the paper…
@John: obviously, check yesterday’s post I link to at the top of this article.
@Joachim: object tracking in image sequences isn't that easy, unless the tracked object is really distinct from the other parts of the image. For the wiimote it's much easier: there'll most likely be only one IR source around, and the IR light could even be of some specific wavelength (inside some range, that is). Normal webcams don't register IR light (or they filter it out of the image) before sending it to the PC, afaik. Maybe I should dig up my old webcam and check whether there are any good Linux drivers by now, and play around with it a little.
Great post… where are the images for the top view, front view, etc.?
The explanation is great, but I can't access the diagrams. How can I access the diagrams? I'm really interested in this subject.